Last September, Amazon unveiled the Voice Interoperability Initiative, a program aimed at ensuring voice-enabled products like smart speakers and displays give users a choice of multiple voice assistants. Today, the company announced the addition of 38 new members, including Dolby, Facebook, Garmin, and Xiaomi, bringing the total number of member companies to 77. (Google remains conspicuously absent from the list.) To mark the milestone, Amazon published what it’s calling the Multi-Agent design guide, a whitepaper outlining design recommendations Voice Interoperability Initiative members should follow when building multi-assistant products.
The Voice Interoperability Initiative is organized around four core principles, the first of which is developing voice services that work “seamlessly” with others while ostensibly preserving privacy. Members aim to build devices that ship with multiple assistants and to accelerate conversational AI research, with the goal of letting users draw on the unique capabilities of Alexa, Cortana, and other voice services on a single platform.
The newly published Multi-Agent design guide covers three key topic areas: (1) customer choice and agent invocation, (2) multi-agent experiences, and (3) privacy and security. It recommends that multi-assistant products help customers discover which assistants are available on a device and explore their capabilities. It also lays out suggestions for agent transfer, which addresses user requests one assistant can’t fulfill without handing off to another, and for universal device commands (UDCs), commands and device controls, like volume and timer functionality, that any assistant can recognize even if it wasn’t the assistant that kicked off the experience. On a device with agent transfer and UDCs, asking Alexa to reserve a restaurant using Google Duplex (a service to which Alexa doesn’t have access) could call up Google Assistant automatically, and Google Assistant could stop a timer started by Alexa.
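To make the UDC idea concrete, here is a minimal, hypothetical sketch of how shared device state might let any agent act on commands regardless of which agent started them. The class and method names are illustrative assumptions, not anything specified in the design guide.

```python
# Hypothetical sketch: universal device commands (UDCs) dispatched
# against device-level state shared by all agents, so one agent can
# stop a timer that a different agent started. All names here are
# illustrative, not taken from the Multi-Agent design guide.
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    """State owned by the device itself, not by any single agent."""
    volume: int = 5
    timers: dict = field(default_factory=dict)  # timer id -> running?


class UDCDispatcher:
    """Routes universal commands (volume, timers) to shared state."""

    def __init__(self, state: DeviceState):
        self.state = state

    def start_timer(self, agent: str, timer_id: str) -> None:
        # The starting agent is recorded only for logging in a real
        # device; it confers no special ownership of the timer.
        self.state.timers[timer_id] = True

    def stop_timer(self, agent: str, timer_id: str) -> bool:
        # Any agent may stop any running timer.
        if self.state.timers.get(timer_id):
            self.state.timers[timer_id] = False
            return True
        return False

    def set_volume(self, agent: str, level: int) -> None:
        # Clamp to a 0-10 range shared by every agent on the device.
        self.state.volume = max(0, min(10, level))


# One agent starts a timer; another agent stops it.
dispatcher = UDCDispatcher(DeviceState())
dispatcher.start_timer("Alexa", "pasta")
print(dispatcher.stop_timer("Google Assistant", "pasta"))  # True
```

The key design point the guide gestures at is that timers and volume belong to the device, not to the agent that last touched them.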
Beyond this, the Multi-Agent design guide recommends that coexisting agents convey at least three core attention states (listening, thinking, and speaking) with visual and sound cues. This paradigm, it says, will make it easier for users to see which assistants are active and when their state changes.
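A rough sketch of that recommendation might look like the following; the class names and cue strings are placeholders of my own, assuming only the three states the guide names.

```python
# Hypothetical sketch of the three core attention states, with a cue
# emitted on every state change so users can tell which agent is
# active. Names and cue strings are illustrative placeholders.
from enum import Enum


class AttentionState(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


class AgentIndicator:
    """Tracks one agent's attention state and announces changes."""

    def __init__(self, agent_name: str):
        self.agent_name = agent_name
        self.state = None  # no attention state while idle

    def set_state(self, new_state: AttentionState) -> str:
        self.state = new_state
        # On a real device this would drive an LED pattern and an
        # earcon distinct to this agent, per the guide's advice.
        return f"{self.agent_name}: {new_state.value} cue"


indicator = AgentIndicator("Alexa")
print(indicator.set_state(AttentionState.LISTENING))  # Alexa: listening cue
```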
The Voice Interoperability Initiative’s launch came a year after Microsoft and Amazon brought Alexa and Cortana to all Echo speakers and Windows 10 users in the U.S., following the formation of a partnership first made public in a 2017 co-announcement featuring Microsoft CEO Satya Nadella and Amazon CEO Jeff Bezos. Each of the assistants brought distinctive features to the table. Cortana, for example, can schedule a meeting with Outlook or draw on LinkedIn to tell you about the people in your next meeting. Alexa, meanwhile, boasts more than 100,000 voice apps tackling a broad range of use cases.