Aaware Embedded Voice (AEV) captures voice in the presence of loud interfering noise using MEMS microphones from Infineon®, and interfaces to popular wake word, automatic speech recognition (ASR), and natural language understanding (NLU) technologies from key partners such as Picovoice™ and Sensory®. Combining Aaware voice capture with these partner technologies enables voice-controlled digital products on a single-chip Xilinx solution. Because the solution does not require third-party cloud processing resources, these products are private, secure, reliable, and robust.
Here is a flow diagram of Aaware Embedded Voice with key integrated voice AI partners and typical customer specific IP.
Getting The Noise Out
The number one job of a great voice user interface (VUI) is accuracy, which is what lets it work wherever the product is used. Accuracy starts with cleaning up the sound field so that the wake word and follow-on ASR and NLU technologies can recognize and understand the voice input. The Aaware voice capture algorithms include proprietary spatial/spectral/temporal source separation, noise suppression, acoustic echo cancellation (AEC), and source localization, which together deliver low-distortion audio of the voice sources in the sound field.
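To make the idea of source localization concrete, here is a minimal sketch of one textbook approach: estimating the time difference of arrival between two microphones by cross-correlation, then converting that delay to an arrival angle. This is not the proprietary Aaware algorithm, which the text does not describe. The function names, the two-microphone far-field geometry, and the parameter values are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means the sound reached `other` later than `ref`.
    Hypothetical helper: brute-force cross-correlation over a small lag range.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += sample * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert a delay into a far-field arrival angle relative to broadside."""
    delay_s = delay_samples / sample_rate_hz
    # Clamp so measurement noise cannot push asin() outside its domain.
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND_M_S / mic_spacing_m))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Synthetic test signal: a short pulse that reaches the second mic 3 samples later.
    mic_a = [0.0] * 20
    mic_a[5:9] = [1.0, -2.0, 3.0, -1.0]
    mic_b = [0.0] * 3 + mic_a[:-3]

    lag = estimate_delay_samples(mic_a, mic_b, max_lag=5)
    print("delay (samples):", lag)
    print("angle (degrees):", arrival_angle_deg(lag, 16000, 0.1))
```

A real array with more microphones, reverberation, and competing sources needs considerably more than this, which is where the combined spatial, spectral, and temporal processing described above comes in.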
Acceleration at the Edge
The Aaware DSP algorithms and AI models are accelerated in the FPGA fabric, which optimizes performance, offloads the CPU, and reduces cost and power. By using the FPGA fabric, Aaware packs more audio processing into this dual-core device than is possible on traditional quad-core processors running at twice the clock speed, allowing a full VUI to run on the Zynq 7010.
Pulling It All Together
The other major effort in making a VUI easier to integrate into your product is pulling the entire voice flow together into a flexible and open platform. To do this, Aaware has partnered with key wake word and ASR/NLU providers, Picovoice and Sensory, and integrated their technology onto the AEV Platform. This combination gives product teams three different complete embedded voice flows integrated with our flexible multi-mic array and Xilinx Zynq All Programmable SoC. Customers can experiment with different mic array configurations (up to 13 mics) and integrate their embedded software within the popular Ubuntu OS. The embedded software runs on the Zynq SoC's dual 32-bit ARM Cortex-A9 processors with NEON™ acceleration. The Aaware DSP algorithms and AI models are accelerated in FPGA hardware, optimizing performance, software footprint, cost, and power.
Easy to Use and Flexible Platform
In addition to three VUI flow choices, product teams can experiment with different mic array configurations (up to 13 Infineon MEMS mics) and integrate embedded software using the popular Ubuntu OS.
Key Features and Benefits
- Enables private, secure, reliable and robust product interaction
- Superior noise interference cancellation - delivering low distortion voice
- Multi-channel Acoustic Echo Cancellation (AEC)
- Accurate speech activity and arrival detection
- Best in class integrated wake word technologies from Picovoice and Sensory
- Superior integrated embedded ASR and NLU technologies from Picovoice and Sensory
- Multi-mic array configurations (up to 13 mics)
- Infineon XENSIV™ MEMS microphones
- 3.5mm audio I2S output - for stereo speakers
- Dual ARM A9 with NEON acceleration
- 512MB of DDR3, 8GB eMMC
- WiFi 802.11 b/g/n
- Ubuntu 18.04
- Three different embedded voice flows demonstrated
  - Commands - single words or short phrases that take immediate action (10-15 words)
  - Speech 2 Intent - more complex expression recognition with built-in NLU that outputs intent (hundreds of expressions)
  - Speech 2 Text - complete language recognition that outputs recognized words (entire language - needs custom NLU)
Applications
- Robotics (Consumer and Industrial)
- Smart Home (Speakers and Home Hubs)
- Industrial IoT (Control Panels)
- Surveillance (Consumer and Industrial)
- Kiosks (Hospitality, Food Service, Wayfinding)
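The three embedded voice flows listed above differ mainly in what they hand to the product's own software: a fixed command, an intent, or raw recognized words. The sketch below models that difference with plain Python data types. It is illustrative only and does not use the Picovoice, Sensory, or Aaware APIs. Every class, field, and function name is hypothetical, and the `slots` field is an assumption about what might accompany an intent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class CommandResult:
    """Commands flow: one word or short phrase from a small fixed vocabulary."""
    command: str


@dataclass
class IntentResult:
    """Speech 2 Intent flow: the built-in NLU outputs an intent."""
    intent: str
    # Assumed extra detail (not stated in the text): named parameters of the intent.
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class TranscriptResult:
    """Speech 2 Text flow: recognized words only; NLU is left to the product."""
    words: List[str]


VoiceResult = Union[CommandResult, IntentResult, TranscriptResult]


def describe_action(result: VoiceResult) -> str:
    """Show what the product software still has to do for each flow."""
    if isinstance(result, CommandResult):
        return f"run command: {result.command}"
    if isinstance(result, IntentResult):
        args = ", ".join(f"{k}={v}" for k, v in sorted(result.slots.items()))
        return f"handle intent: {result.intent}({args})"
    if isinstance(result, TranscriptResult):
        return "pass to custom NLU: " + " ".join(result.words)
    raise TypeError(f"unsupported result type: {type(result).__name__}")


if __name__ == "__main__":
    print(describe_action(CommandResult("lights on")))
    print(describe_action(IntentResult("set_temperature", {"room": "kitchen"})))
    print(describe_action(TranscriptResult(["turn", "on", "the", "lights"])))
```

The trade-off this illustrates is that the Commands flow needs the least downstream logic but covers the smallest vocabulary, while Speech 2 Text covers the entire language but leaves intent extraction to custom NLU.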