We have seen many forms of voice control, and I've used some of them in the past (IoT Alarm Clock using Jasper ) or recently (Running Amazon Echo (Alexa) on Raspberry Pi Zero ).


For this project, I thought I'd try to find a voice control solution that can meet following requirements:

  • work offline
  • easy to customise commands


For example, Alexa is extremely powerful and can understand and answer a lot of questions, while sounding very human. But the data is processed online and wouldn't work without an active internet connection. Another thing is that in order to easily customise commands with Alexa, an additional service like IFTTT is required. So unfortunately, no internet = no voice control.


Luckily, Alan's Raspberry Pi 3 RoadTest was all about speech recognition performance using a Speech To Text tool called PocketSphinx. Alan also refers to a practical application by Neil Davenport for Make. Exactly what I was looking for!




First things first. To be able to do voice control, we need to have an audio input device on the Pi. A USB microphone is probably the easiest and cheapest option.


I found this tiny USB microphone on eBay for about $1:




To verify it was properly detected, I listed the recording devices with following command:


pi@piclock:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0


Next, trigger a recording and generate some sound:


pi@piclock:~ $ arecord -Dhw:1 -r 44100 -f S16_LE file
Recording WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
^CAborted by signal Interrupt...


And finally, verify the recording by playing it out:


pi@piclock:~ $ aplay -Dhw:0 -r 44100 -f S16_LE file
Playing WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono


You should be able to hear the recorded sounds, indicating the microphone is working as expected.






To install the software, I mainly followed the very clear and detailed instructions from Neil on Make: http://makezine.com/projects/use-raspberry-pi-for-voice-control/

There were however some things I needed to adapt or add in order to get it fully working, so here's my take on the PocketSphinx installation




Some dependencies need to be installed, to avoid running into problems when building PocketSpinx or running the code.


pi@piclock:~ $ sudo apt-get install libasound2-dev autoconf libtool bison swig python-dev python-pyaudio


pi@piclock:~ $ curl -O https://bootstrap.pypa.io/get-pip.py
pi@piclock:~ $ sudo python get-pip.py
pi@piclock:~ $ sudo pip install gevent grequests


Once the dependencies are installed, the first bit of software can be installed.




These instructions have been followed as is from Neil's guide, and are used to download the source files for SphinxBase and build it.


pi@piclock:~ $ git clone git://github.com/cmusphinx/sphinxbase.git
pi@piclock:~ $ cd sphinxbase
pi@piclock:~/sphinxbase $ git checkout 3b34d87
pi@piclock:~/sphinxbase $ ./autogen.sh
pi@piclock:~/sphinxbase $ make
pi@piclock:~/sphinxbase $ sudo make install
pi@piclock:~/sphinxbase $ cd ..




After building SphinxBase, the same is done for PocketSphinx:


pi@piclock:~ $ git clone git://github.com/cmusphinx/pocketsphinx.git
pi@piclock:~ $ cd pocketsphinx
pi@piclock:~/pocketsphinx $ git checkout 4e4e607
pi@piclock:~/pocketsphinx $ ./autogen.sh
pi@piclock:~/pocketsphinx $ make
pi@piclock:~/pocketsphinx $ sudo make install




With the installation complete, I first tested PocketSphinx using the microphone input, in continuous listen mode:


pi@piclock:~ $ pocketsphinx_continuous -inmic yes
pocketsphinx_continuous: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory


This returned an error. For some reason, the location of the shared libraries needs to be included in library search path, as it's not part of the defaults:


pi@piclock:~ $ sudo nano /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf


After adding the "/usr/local/lib" path to "/etc/ld.so.conf", apply the change:


pi@piclock:~ $ sudo ldconfig


I then tried again and bumped into another issue:


pi@piclock:~ $ pocketsphinx_continuous -inmic yes
Error opening audio device default for capture: No such file or directory


PocketSphinx searched for the microphone on the default audio card, which is the Pi's onboard audio, that has not input capabilities. This can easily be helped by specifying the microphone's index on the command line interface:


pi@piclock:~ $ pocketsphinx_continuous -inmic yes -adcdev plughw:1


PocketSphinx is then running, trying to recognise speech. Don't worry if at this stage it's not recognising what you say (at all), as it still needs to be configured with meaningful dictionary and grammar data.




Audio Devices


As documented in Neil's post, I changed the <> config to put the USB device at index 0 and only then the onboard audio. This makes the USB device the default for playout and capture, without having to change other files. This works particularly well if you are using a USB sound card with both input and output capabilities.


pi@piclock:~ $ sudo nano /etc/modprobe.d/alsa-base.conf
options snd-usb-audio index=0
options snd_bcm2835 index=1


Then came, what was for me, the most tricky part. Since the USB dongle I used is microphone only, I needed to have the default playout to remain the onboard audio, but the default capture to be the USB mic. After a lot of searches and different tests, this became the resulting audio configuration to get both playout and capture to work on the expected devices:


pi@piclock:~ $ sudo nano /etc/asound.conf

    type hw
    card 0
    type hw
    card 1

    type asym
        type plug
        slave.pcm "onboard"
        type plug
        slave.pcm "mic"


Dictionary & Language Model


The required input is a "corpus" file, a file containing the phrases that need to be recognised. This file can then be fed to Sphinx's lmtool to generate the dictionary and language model files.


As explained on that page:

To use: Create a sentence corpus file, consisting of all sentences you would like the decoder to recognize. The sentences should be one to a line (but do not need to have standard punctuation). You may not need to exhaustively list all possible sentences: the decoder will allow fragments to recombine into new sentences.


Example questions for my application are:

  • is the door of the shed closed
  • is the door of the shed open
  • what is the temperature in the shed
  • turn on the lab light
  • turn off the lab light


Screen Shot 2016-08-02 at 15.37.48.pngScreen Shot 2016-08-02 at 21.22.31.png


Two of these generated files are relevant: the *.dic (dictionary) file and the *.lm (language model)


For ease of use, renamed the files to "dictionary.dic" & "language_model.lm".


Grammar File


The grammar file ("grammar.jsgf") contains the structure of the sentences that will be spoken. Based on Neil's example, I created my own grammar file:


#JSGF V1.0;
grammar commands;

<action> = TURN ON |
  TURN OFF       |
  DOOR ;

<object> = SHED |
  LAB ;

public <command> = <action> THE <object> LIGHT |
  WHAT IS THE <action> OF THE <object> |
  IS THE <action> OF THE <object> CLOSED |
  IS THE <action> OF THE <object> OPEN ;


Be careful though, every word used in the grammar file should be present in the dictionary. Otherwise, an error will be generated at startup and the script will fail to start.




After replacing the files in the demo code by my own, I was able to detect my customised phrases accurately.


Here's a short clip demonstrating the recognition. It is now just a matter of linking the actual actions to the detected phrases.



As you can see, the recognition is extremely fast. Also, everything is done locally, without the need for an internet connection!




Navigate to the next or previous post using the arrows.