
RoadTest Review a Raspberry Pi 3 Model B ! - Review

Scoring

Product Performed to Expectations: 10
Specifications were sufficient to design with: 10
Demo Software was of good quality: 10
Demo was easy to use: 10
Support materials were available: 10
The price to performance ratio was good: 10
Total Score: 60 / 60
  • RoadTest: RoadTest Review a Raspberry Pi 3 Model B !
  • Evaluation Type: Independent Products
  • Application you used the part in: CMU PocketSphinx Speech Recognition For Autonomous Personal Robot
  • Was everything in the box required?: Yes - the RoadTest provided a Raspberry Pi 3 Model B single board computer. A power supply, microphone, speaker, USB power meter, micro-SD card, and cables were temporarily re-purposed for the test.
  • Comparable Products/Other parts you considered: Raspberry Pi B+
  • What were the biggest problems encountered?: The version of pocketsphinx used does not log performance data when performing grammar-based recognition.

  • Detailed Review:

    Raspberry Pi 3 Model B Roadtest:

    Pocket Sphinx Speech Recognition Performance Comparison

     

    Author: Alan McDonley

    Sponsor: element14 (element14.com)

    Hardware (existing personal resources unless otherwise noted):

    Raspberry Pi 3 Model B (sponsor provided)

    Raspberry Pi Model B+ 512MB

    USB microphone

     

    Software:

    OS: Raspbian Jessie-lite v8

    ASR: PocketSphinx 5prealpha branch 4e4e607

    TTS: Festival 2.1 Nov 2010

    Test Program: mymain.py

    (derived from Neil Davenport's makevoicedemo: git clone https://github.com/bynds/makevoicedemo)

    Python: v 2.7

     

    Configuration:

        Unconstrained Large Vocabulary LM: ~20k uni-grams, 1.4M bi-grams, 3M tri-grams

        Small LM: 136 words in 106 phrases, ARPA-format uni-, bi-, and tri-gram language model

        JSGF Medium Grammar: supports all 136 words in the 106 phrases used to create the Small LM

        Pi 3: 883052 KiB total memory, 0 swap used

        Pi B+: 380416 KiB total memory, 0 swap used
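
    The memory totals above can be confirmed with a quick Python read of /proc/meminfo on each board:

        # Print total memory and swap figures from /proc/meminfo (Linux)
        for line in open('/proc/meminfo'):
            if line.startswith(('MemTotal', 'SwapTotal', 'SwapFree')):
                print line.strip()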

     

    Overview:

     

    This roadtest compares the computational performance of the Raspberry Pi 3 Model B with the Raspberry Pi Model B+ running pocketsphinx, Carnegie Mellon University's open source large-vocabulary, speaker-independent, continuous speech recognition engine.

     

    The performance measurements of interest in this test are "xRT" - the ratio of CPU time spent recognizing to the duration of the audio, where values below 1.0 mean faster than real time - and the word error rate "WER" - (substitutions + insertions + deletions) / total words spoken.
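
    As a quick worked example of both metrics (a minimal Python 2.7 sketch; the error counts come from the file-input test reported below, while the CPU and audio seconds are hypothetical values for illustration):

        # WER: fraction of spoken words the recognizer got wrong
        subs, dels, ins, spoken = 2, 1, 2, 67       # counts from the file-input test below
        wer = float(subs + dels + ins) / spoken     # 5/67 = 0.075 -> the ~7% WER reported
        print "WER: %.1f%%" % (wer * 100)

        # xRT: CPU time spent decoding divided by the duration of the audio;
        # below 1.0 the recognizer keeps up with real time
        cpu_seconds, audio_seconds = 104.0, 48.0    # hypothetical values for illustration
        print "xRT: %.3f" % (cpu_seconds / audio_seconds)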

     

    Three test modes:

    1) Large Language Model with file input - 10 simple to complex, unconstrained phrases

    2) Small Language Model with microphone input - 10 in-model phrases

    3) Grammar-based with microphone input - 10 in-grammar phrases

     

    The release of the Raspberry Pi 3 Model B enables versatile speech interfaces in autonomous personal robots. As the results below show, the Pi 3 (using only one of its four cores) can keep up with small language-model speech, and even enables surprisingly comfortable human-robot interaction using large vocabulary, unconstrained, continuous speech, with substantial processing resources in reserve.

     

     

    Test Bench:

     

    The photo shows the Raspberry Pi B+ at the bottom of the current robot, with the sponsored Raspberry Pi 3 Model B at the right front. A Drok USB power meter (blue) sits above a small externally powered speaker. The microphone used, center, has a USB interface. Tests were run via remote ssh, over WiFi, into the device under test from terminal windows on a Mac Mini.

     

    Summary Test Result:

     

    Speech recognition using the large language model with unconstrained audio file input on the Pi 3 is 2.4 times faster than on the Pi B+, with an identical error rate.

     

    Speech recognition using the small language model with in-model microphone input on the Pi 3 is 3.72 times faster than on the Pi B+. The Pi 3 had a zero word error rate in this run (and near zero in others), while the Pi B+ showed a 37% word error rate.

     

    Speech recognition using a medium-size grammar with in-grammar microphone input on the Pi 3 had zero errors, while the Pi B+ showed a 3% WER. (The version of pocketsphinx used does not report its performance summary for grammar-based recognition.) Both processors appeared to keep up with the commands in real time.
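
    Lacking that summary, a rough wall-clock (rather than CPU) xRT for grammar mode can be estimated by timing the decode calls directly. A minimal sketch, assuming a configured pocketsphinx Decoder in "decoder" and one complete 16 kHz, 16-bit mono utterance in "audio_buffer" (both names are illustrative):

        import time

        start = time.time()
        decoder.start_utt()
        decoder.process_raw(audio_buffer, False, True)    # full utterance in one call
        decoder.end_utt()
        decode_secs = time.time() - start

        audio_secs = len(audio_buffer) / (2.0 * 16000)    # 2 bytes per sample at 16 kHz
        print "approx wall-clock xRT: %.3f" % (decode_secs / audio_secs)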

     

    A short video of each processor running is at: https://vimeo.com/169445418

     

    Programs, grammar, language model, corpus text, test wav file, log files from every run, and performance data for each run are at: https://github.com/slowrunner/Pi3RoadTest

     

    This report is located at: https://goo.gl/RrGgCm

     

    Test Procedure:

     

    1) Record test audio of 10 phrases of various lengths (input_file.txt):

    arecord -f S16_LE -r 16000 test16k.wav

    Speak:

    Hello

    What Time is it

    Drive Forward Slowly

    Who do you think will win the election

    What is the weather forecast

    How long have you been running

    Turn forty five degrees left

    one two three four five six seven eight nine ten

    a b c d e f g h i j k l m n o p q r s t u v w x y z

    Goodbye

     

    2) Run large-LM PocketSphinx on the recording in one remote ssh session; run top in another (note %CPU and %Mem)

    pocketsphinx_continuous -infile test16k.wav 2>&1 | tee ./psphinx.log

    (note power consumption in amps during and after recognition)

     

    3) Extract Performance Data (xRT)

    ./perf.sh >result_Pi<model>_file.log
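
    perf.sh is in the GitHub repo linked above. A rough Python 2.7 equivalent, assuming the "TOTAL <pass> <seconds> CPU <value> xRT" summary lines that pocketsphinx normally prints at the end of a run:

        import re

        # Sum the per-pass CPU xRT figures (fwdtree + fwdflat + bestpath)
        total = 0.0
        for line in open("psphinx.log"):
            m = re.search(r"TOTAL (fwdtree|fwdflat|bestpath) \S+ CPU (\S+) xRT", line)
            if m:
                total += float(m.group(2))
                print line.strip()
        print "Total CPU xRT: %.3f" % total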

     

    4) Extract recognized phrases

    tail -14 psphinx.log >reco_Pi<model>_file.log

     

    5) Run PocketSphinx from microphone with small LM, speak 10 in-model phrases (a sketch of the recognition loop follows the phrase list below)

      python mymain.py

    (note power consumption in amps during and after the program)

    (note %CPU %Mem from top during program execution)

    Speak:

    Hello

    What Time is it

    Drive Forward Slowly

    How long have you been running

    Turn forty five degrees left

    Go backward quickly

    Is it going to rain

    Spin

    Stop now

    sudo shutdown minus H now
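
    For reference, the heart of mymain.py's recognition loop looks roughly like the sketch below. The model, LM, and dictionary paths and the buffer size are illustrative; the actual program is in the GitHub repo linked above.

        import pyaudio
        from pocketsphinx import Decoder

        # Illustrative paths - substitute the acoustic model, small LM, and dictionary in use
        config = Decoder.default_config()
        config.set_string('-hmm', '/usr/local/share/pocketsphinx/model/en-us/en-us')
        config.set_string('-lm', 'small.lm')
        config.set_string('-dict', 'small.dic')
        decoder = Decoder(config)

        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                         input=True, frames_per_buffer=1024)

        decoder.start_utt()
        in_speech = False
        while True:
            decoder.process_raw(stream.read(1024), False, False)
            if decoder.get_in_speech() != in_speech:
                in_speech = decoder.get_in_speech()
                if not in_speech:               # silence again: utterance complete
                    decoder.end_utt()
                    if decoder.hyp() is not None:
                        print "Heard:", decoder.hyp().hypstr
                    decoder.start_utt()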

     

    6) Copy terminal output to LMsmall_Pi<model>.log

     

    7) Extract Performance Data (xRT)

    ./perf.sh >result_Pi<model>_10.txt

     

    8) Run PocketSphinx from microphone with medium JSGF grammar, speak 10 in-grammar phrases.

    (note power consumption in amps during and after the program)

    (note %CPU %Mem from top during program execution)

    Speak:

    Hello

    What Time is it

    Drive Forward Slowly

    How long have you been running

    Turn forty five degrees left

    Go backward quickly

    Is it going to rain

    Spin

    Stop

    sudo shutdown minus H now
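
    The grammar run differs from the small-LM run only in decoder configuration; with the Python bindings, the decoder is pointed at the JSGF file instead of the language model, roughly (filename illustrative):

        config.set_string('-jsgf', 'robot.jsgf')    # replaces the '-lm' setting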

     

    9) Copy terminal output to jsgf_Pi<model>.log

     

     

    Detailed Test Results:

     

    1) pocketsphinx_continuous -infile test16k.wav

    Pi B+: top 92-98% CPU, 17% memory, 0.40A at 5.02V (+0.08A)

    2 word substitutions, 1 deletion, 2 insertions = 5 errors / 67 words

    = 7% Word Error Rate (WER)

    5.234 total CPU xRT reported (sum of fwdtree, fwdflat, and bestpath passes)

     

    Pi 3: top 100% CPU, 7% memory, 0.49A (+0.18A)

    2 word substitutions, 1 deletion, 2 insertions = 5 errors / 67 words

    = 7% Word Error Rate (WER)

    2.168 total CPU xRT reported (sum of fwdtree, fwdflat, and bestpath passes)

     

    2) pocketsphinx python using microphone input and small LM

    Pi B+: top 90% CPU, 5% memory, 0.45A (+0.13A)

    4 word substitutions, 9 deletions, 0 insertions = 13 errors / 35 words

    = 37% Word Error Rate (WER)

    3.075 total CPU xRT reported (sum of fwdtree, fwdflat, and bestpath passes)

    Pi 3: top 100% CPU, 3% memory, 0.49A (+0.18A)

    0 word substitutions, 0 deletions, 0 insertions = 0 errors / 35 words

    = 0% Word Error Rate (WER)

    0.826 total CPU xRT reported (sum of fwdtree, fwdflat, and bestpath passes)

     

    3) pocketsphinx python using microphone input and medium grammar:

    Pi B+: 1 word substitution, 0 deletions, 0 insertions = 1 error / 34 words

    = 3% Word Error Rate (WER)

     

    Pi 3: 0 word substitutions, 0 deletions, 0 insertions = 0 errors / 34 words

    = 0% Word Error Rate (WER)

     

     

    Impact of Findings:

     

    There has been a long-standing debate between product developers (pragmatists) and speech interface researchers over the role of grammars in speech interfaces. When processing resources (cycles and memory) are scarce or slow, agreeing on a limited set of words and constraining phrase complexity (a speech grammar) can enable a successful speech interface.

     

    Grammar-based speech interfaces for complex human-machine interaction become arduous to develop and tune. Additionally, such interfaces tend to be fragile, with wide disparity in user success. From a software-coupling standpoint (a drawback), grammar-based speech interfaces require the developer to duplicate effort to keep the grammar and the result interpretation tightly in sync.

     

    Unconstrained, continuous speech interfaces using language-model-based recognition require much higher performance from processing resources, but enable more robust interfaces with much greater utility.

     

    For a simple product (or a personal robot, in my case) of limited functionality, a grammar-based speech command interface is possible on the Raspberry Pi B+.

     

    Since Pi B+ language-model recognition is three to five times slower than real time (at 3.075 xRT, a two-second utterance needs roughly six seconds of CPU time), small language-model speech cannot be used as a control interface, and large vocabulary, unconstrained (language-model-based) recognition is totally out of the question.

     

    The release of the Raspberry Pi 3 Model B enables versatile speech interfaces in autonomous personal robots. The results of this test show that the Pi 3 (using only one of its four cores) can keep up with small language-model speech, freeing the developer from the arduous work of grammar development, and expanding the speech interface capability beyond commands to enable the beginnings of dialog.

     

    The Pi 3 can even enable surprisingly comfortable human-robot interaction using large vocabulary, unconstrained, continuous speech, with tremendous reserve processing resources available.

     

    Sixteen years ago my single-board-computer robot ran programs in 32K of memory at 1 MHz, had only a one-way interface (Morse code to human), and had a situational awareness range of 12 inches.

     

    Today, with the Raspberry Pi 3, there is processing power for two-way communication in human languages, local situational awareness through vision, and global situational awareness - all in the same one-cubic-foot robot.

     

     

    About The Author:

     

    Alan is a Sr. Software Quality Assurance engineer for Broadsoft Hospitality Group, which provides cloud-based communications for hotels and resorts worldwide. Formerly, Alan was a Sr. Development Engineer for IBM's Telephony Speech and Contact Center Services, working with speech recognition, speaker identification, and interactive voice response (IVR) technologies.

     

     

     

    Acknowledgements:

     

    element14.com: sponsored the Raspberry Pi 3 RoadTest and provided the Pi 3 for this test

    Festival Speech Synthesis System: Copyright Univ. of Edinburgh, and Carnegie Mellon Univ.

    PocketSphinx 5prealpha:

    Authors: Alan W Black, Evandro Gouvea, David Huggins-Daines,

    Alexander Solovets, Vyacheslav Klimov

    Assistance: Nickolay V. Shmyrev - you rock guy!

    pocket_sphinx_listener.py, main.py: Neil Davenport

    http://makezine.com/projects/use-raspberry-pi-for-voice-control/

    git clone https://github.com/bynds/makevoicedemo

