Following the rollout of its cloudless, edge device-focused voice assistant stack, which comprises wake word, speech-to-text transcription, and speech-to-intent capabilities, Picovoice announced a web console that lets you easily create and train your own voice models. Alongside the web console release, the company joined the Arm AI Ecosystem Partner Program, which gives Picovoice deeper access to Arm IP and to chip manufacturers like NXP. Specifically, Picovoice is focused on Arm Cortex-M chip designs, which are extremely low power and can integrate into all manner of IoT devices, yet are powerful enough to support its voice assistant without the need for a cloud connection.
The big idea is that OEMs can use the Picovoice web console to whip up voice controls for their devices, large and small, at minimal cost. Products with voice assistants on board are hot, and although the likes of smart speakers and smart displays get the bulk of the attention, some level of voice control is possible on all manner of low-power edge devices, from coffee makers to lights. Amazon in particular has aggressively pushed into this IoT space with household devices like its microwave and lamp, but those are all part of Amazon’s Alexa ecosystem.
Picovoice sees an opportunity to help other companies capture a chunk of that market.
“Over the course of the past few years, we realized that companies are really struggling to build robust voice experiences, because they have to use several tool sets from different companies and glue them together,” said Picovoice business development chief Mehrdad Majzoobi.
He added that training voice models is resource-intensive — even to simply create a wake word — and requires expertise that not all device makers have available, which drove Picovoice to build a tool that removes that need. “Even non-technical stakeholders in companies [like] product managers and UX designers [can] use the tool to build the experience,” he said. Then, after essentially a one-button export, a company’s engineers can handle integrating the voice capabilities into devices.
The resulting voice models are so efficient, he said, that they can run on multiple classes of tiny Cortex-M microcontrollers. Picovoice’s next goal is for its tool to support 1 billion devices.
In a demo, Picovoice showed VentureBeat how easy the web console is to use.
You start with the wake word section: Simply type in the word or phrase you want to use, select the platform (e.g., ARM Cortex-M or x86_64), and click Create Wake Word Draft. Click Submit to train the model, and in a couple of hours (or less) it will be available for download.
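Majzoobi’s one-button export hands the trained model to engineers, whose integration job is essentially a frame-processing loop: feed short audio frames to the detector and act when the wake word fires. Here is a minimal sketch of that loop’s shape (the `WakeWordDetector` below is a hypothetical stand-in that matches string tokens; a real on-device engine would score fixed-size PCM audio frames instead):

```python
class WakeWordDetector:
    """Hypothetical stand-in for an on-device wake-word engine loaded from
    an exported model. A real engine scores fixed-size PCM audio frames;
    this stub just checks each frame against a marker token."""

    def __init__(self, keyword):
        self.keyword = keyword

    def process(self, frame):
        # Real engines return a detection flag (or keyword index) per frame.
        return frame == self.keyword


def listen(frames, detector):
    """Process frames in order; return the index of the frame where the
    wake word fired, or -1 if it never did."""
    for i, frame in enumerate(frames):
        if detector.process(frame):
            return i
    return -1


detector = WakeWordDetector("hey-device")
print(listen(["noise", "noise", "hey-device", "turn on the lights"], detector))  # → 2
```

On a device, this loop runs continuously against the microphone stream, and a detection hands control to the speech-to-intent stage.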
Next, you create the “speech-to-intent context,” which is the domain you want the assistant to handle, such as smart lighting. Picovoice has templates for common domains, or you can define your own. Then click Create Context, and you’ll see a list of expressions and parameters to adjust: intents like “turnLight” and “turnOffLight,” plus variables such as state and location.
You don’t need to be a programmer to set up the intents, states, and their parameters, but there is a bit of a learning curve: you type in text commands and use characters like “$” to mark the variable parts. You can test your expressions instantly right there in the browser to make sure you’re on the right track, editing or deleting any that don’t work and adding more as needed.
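To illustrate the kind of “$”-marked expressions described above, here is a toy speech-to-intent matcher for a smart-lighting context. The exact grammar is Picovoice’s own; the syntax, slot names, and matching logic below are a simplified illustration, not the console’s implementation:

```python
import re


def compile_expression(expr, slots):
    """Turn a '$name'-style template like 'turn $state the lights in the
    $location' into a regex, where each slot name maps to its allowed values."""
    pattern = re.escape(expr)
    for name, values in slots.items():
        alternatives = "|".join(re.escape(v) for v in values)
        pattern = pattern.replace(re.escape(f"${name}"), f"(?P<{name}>{alternatives})")
    return re.compile(f"^{pattern}$", re.IGNORECASE)


# Slot values and intent names here are invented for the example.
SLOTS = {
    "state": ["on", "off"],
    "location": ["kitchen", "living room", "bedroom"],
}

INTENTS = {
    "changeLightState": compile_expression("turn $state the lights in the $location", SLOTS),
}


def infer(utterance):
    """Return (intent, slot values) for the first matching expression, else None."""
    for intent, regex in INTENTS.items():
        match = regex.match(utterance)
        if match:
            return intent, match.groupdict()
    return None


print(infer("turn off the lights in the kitchen"))
# → ('changeLightState', {'state': 'off', 'location': 'kitchen'})
```

Testing an utterance against the expressions in the browser plays the same role as calling `infer` here: a match yields an intent plus filled-in slots, and a miss tells you the expression needs editing.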
When you’re done, you click Train, and in a matter of hours you’ll have your model.
With the console, you can see clearly how Picovoice’s domain-specific approach makes sense. For smart lighting, you don’t need a universe of possible commands, like your phone assistant would. You just need a certain set of lights in a given location to turn on or off when you say a given word or phrase. The Picovoice console appears to make that easy for non-technical people.