On this Pi Day, as it draws to a close, I would like to share some of my experience of working with a Raspberry Pi back in my college days. I was in one of the project groups assigned to build a portable keyword detection module. The end goal of the project was an offline natural language keyword detector that could be carried around in our bags (and made more compact in future research). The background research on possible hardware and software, and the recommendation of a viable solution, fell to me.

 

Hardware: While searching for hardware, I came across an Arduino-compatible Voice Recognition Module from Geeetech. It could be interfaced with an Arduino and trained to detect up to 15 commands. The 15 commands are divided into 3 groups of 5, and the setup was a little messy. It did not scale well, as an external PC was required to program it, and there was the added overhead of reprogramming the Arduino every time a change was required. Since the Raspberry Pi is a widely recognized and scalable platform, I shifted my focus to what would be possible with a Raspberry Pi rather than sticking with a limited module. The advantage was that we could set up the Pi once and then tinker with it as often as we liked without much effort.

 

Software: Those were the days when there was no Snips or TensorFlow around, so we had to rely on the limited resources available. The search brought me to the CMU Sphinx toolkit, an open source speech recognition toolkit developed at Carnegie Mellon University. Its advantage was that it included a module known as 'Pocketsphinx', capable of offline speech recognition. It is accompanied by packages such as sphinxbase and sphinxtrain, which let us train our own voice models. The project also hosts pre-trained voice models that can be run out of the box with minor tweaks. Accent was a major issue with these, however, so it is always advisable to train your own models, which is a huge and time-consuming task.

 

Compared to the hardware module from Geeetech, this toolkit is more customizable, readily usable from multiple languages such as Java and Python, and can be trained to detect a wide variety of keywords rather than a limited 15. The only shortfall was the need to train our own voice model, which is a one-time but time-consuming task. Even so, Pocketsphinx was very easy to set up on the Raspberry Pi.
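To give a feel for how open-ended the keyword set is on this side, here is a minimal Python sketch that writes a keyword list in the one-phrase-per-line "phrase /threshold/" layout that Pocketsphinx's keyword spotting mode reads. This is not the code we used in the project: the keywords, the thresholds, the function names and the file name are all made up for illustration, and the thresholds in particular would need tuning for each phrase.

```python
from pathlib import Path

# Hypothetical keywords with placeholder detection thresholds.
# Real thresholds have to be tuned per phrase against recorded audio.
KEYWORDS = {
    "hello pi": 1e-20,
    "lights on": 1e-25,
    "lights off": 1e-25,
}


def format_keyword_list(keywords):
    """Return the keyword list as text, one 'phrase /threshold/' entry per line."""
    lines = []
    for phrase, threshold in keywords.items():
        # Normalise case and collapse repeated whitespace.
        phrase = " ".join(phrase.lower().split())
        if not phrase:
            raise ValueError("keyword phrase must not be empty")
        if not 0 < threshold <= 1:
            raise ValueError(f"threshold for {phrase!r} must be in (0, 1]")
        lines.append(f"{phrase} /{threshold:.0e}/")
    return "\n".join(lines) + "\n"


def write_keyword_list(keywords, path):
    """Write the formatted keyword list to 'path' and return it as a Path."""
    path = Path(path)
    path.write_text(format_keyword_list(keywords), encoding="utf-8")
    return path


if __name__ == "__main__":
    # 'keywords.list' is an arbitrary file name chosen for this example.
    out = write_keyword_list(KEYWORDS, "keywords.list")
    print(out.read_text(encoding="utf-8"), end="")
```

On the Pocketsphinx releases of that era, a file like this could be handed to the command line tool with something along the lines of `pocketsphinx_continuous -inmic yes -kws keywords.list`, provided every word in it also exists in the pronunciation dictionary. Adding a sixteenth keyword is just one more line in the file, which is exactly the flexibility the hardware module lacked.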

 

So my final verdict was in favor of Raspberry Pi + Pocketsphinx.

 

Additionally, I came across a variety of other toolkits such as Kaldi, Julius and HTK, which also offered offline recognition but were discarded due to poor documentation and a less straightforward setup. Services such as AT&T Watson, Microsoft Speech Server, the Google Speech API and Nuance Recognizer were ruled out entirely because they could not perform offline recognition. If I were asked to do the same task today, there would be a variety of options to choose from, such as Snips, edge processing, TensorFlow, WaveNet and many more NLP packages.

 

Resources:

https://cmusphinx.github.io/

https://www.geeetech.com/wiki/index.php/Arduino_Voice_Recognition_Module   (Seems to be broken currently)