Oh, the many challenges with Alexa Presentation Language (APL)
Our experience with Alexa Skills Challenge: Beyond Voice
Amazon periodically holds hackathon contests to get Alexa developers to build skills that use certain Alexa capabilities. The most recent one is the “Alexa Skills Challenge: Beyond Voice,” which targets Alexa’s APL interfaces.
We entered this contest because our Mavis platform naturally uses APL to provide visually rich responses for Alexa screen devices. We had also just enhanced Mavis to add more response capability. So, the timing was perfect for the contest.
The resulting Mavis functionality provides a new level of rich responses. Each response consists of segments. Each segment can have its own fixed or karaoke text, MP3, image, or video. We also allow background images or videos. Thus, a single Alexa response of 4 or 5 segments can provide incredible depth of experience. You are no longer stuck with a single image or a single MP3/video per response. Now, a single response can interleave multiple MP3s, videos, images, and text with moving or fixed backgrounds. Segment-by-segment visibility is tightly controlled using APL opacity, so we can fade each segment in and out, as well as each individual element within a segment.
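To make the opacity idea concrete, here is a minimal sketch of how segment fades could be expressed as APL commands. It is not the Mavis implementation. The segment ids, hold times, and helper names (fade, segment_commands) are hypothetical, and it assumes each segment is an APL component that starts at opacity 0. It relies only on the standard APL AnimateItem and Sequential commands.

```python
import json


def fade(component_id, start, end, duration_ms, delay_ms=0):
    """Build one APL AnimateItem command that animates a component's opacity."""
    return {
        "type": "AnimateItem",
        "componentId": component_id,
        "delay": delay_ms,        # wait this long before the animation starts
        "duration": duration_ms,  # length of the fade itself
        "value": [{"property": "opacity", "from": start, "to": end}],
    }


def segment_commands(segments, fade_ms=500):
    """Fade each segment in, hold it, then fade it out, one after another."""
    commands = []
    for seg in segments:
        commands.append(fade(seg["id"], 0, 1, fade_ms))
        # The delay on the fade-out acts as the time the segment stays visible.
        commands.append(fade(seg["id"], 1, 0, fade_ms, delay_ms=seg["hold_ms"]))
    return {"type": "Sequential", "commands": commands}


# Hypothetical segments: the ids must match component ids in the APL document.
segments = [
    {"id": "segment1", "hold_ms": 3000},
    {"id": "segment2", "hold_ms": 5000},
]
sequence = segment_commands(segments)
print(json.dumps(sequence, indent=2))
```

The resulting Sequential command could be sent in an ExecuteCommands directive or run from the document's onMount handler. Fading individual elements inside a segment would work the same way, with one AnimateItem per element id.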
The result is very, very cool.
Ah, but there’s a downside to using APL: APL bugs. And a good number of them.
Because Mavis now uses segmented APL responses, we drive Alexa and APL very hard, apparently harder than anyone else. We can crash and reboot Alexa screen devices almost on demand. We’ve had to limit Alexa responses to try to work around Alexa rebooting every third or fourth response in a session. And, no, it’s not device-specific, as all Alexa screen devices crash equally!
We also found that APL is very sensitive to MP3 and video content: under certain circumstances, it won’t play valid MP3s or videos. Thus, seemingly at random, some valid responses are silent, without warning. They just don’t play.
An interesting example is where a video plays fine in the foreground but won’t play as a background. Another example is an MP3 that plays fine on a non-screen device but won’t play using APL.
Working with Amazon on these issues has been challenging. You first have to go through their initial tech support, who are not engineers and not very knowledgeable about APL, so getting them to treat the issues as real is tough. We know APL much better than they do. We have a skill dedicated to reproducing issues that we let Amazon use. They always want a single response JSON so they can reproduce the issue. Unfortunately, it’s often multiple responses hitting Alexa back to back that cause the failures, and reproducing that is generally beyond their capability. The good people will send the failure to the engineering team. The bad people will spin and attempt to provide non-viable workarounds rather than opening engineering bug tickets.
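Since a single response JSON often isn’t enough to show these failures, one option is to capture the whole ordered sequence of a session. The sketch below is only an illustration of that idea, not our repro skill and not a format Amazon asks for. The class name, field names, and sample payloads are all hypothetical.

```python
import json


class SessionRecorder:
    """Collect every request/response pair of one session, in order."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.turns = []

    def record(self, request, response):
        # Number the turns so the exact back-to-back order can be replayed.
        self.turns.append({
            "turn": len(self.turns) + 1,
            "request": request,
            "response": response,
        })

    def dump(self):
        """Return the full session as one JSON string for a bug report."""
        return json.dumps(
            {"sessionId": self.session_id, "turns": self.turns}, indent=2
        )


# Hypothetical, heavily trimmed payloads just to show the shape of the bundle.
recorder = SessionRecorder("example-session")
recorder.record({"type": "LaunchRequest"}, {"directives": ["...first response..."]})
recorder.record({"type": "IntentRequest"}, {"directives": ["...second response..."]})
bundle = recorder.dump()
print(bundle)
```

A bundle like this keeps the order of the responses together with the payloads, which is the part a single response JSON cannot show.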
The overall result is that an Alexa APL implementation cannot support the responses APL was intended to deliver, at least not the wonderfully rich ones Mavis provides. The skills using APL have become non-viable, and we’ve had to revert to an earlier, far less rich model to keep Alexa from breaking and destroying the user experience. Hopefully these bugs will make their way to the engineers and be fixed so we can use APL the way it was designed to be used.
The above is a good example of why clients may not want to dabble in creating their own skills without partnering with a company that does this for a living, like Voice2Biz. Let us deal with the headaches while the client can concentrate on the creative aspect of providing engaging content.