Welcome to this part of my review of the Arduino Nano 33 BLE Sense. The review is split into multiple blog posts. You can find all my thoughts about this Arduino and related parts in the chapters whose names begin with "Review". There are also articles, like this one, describing test projects that I did to gain experience with the board, as well as some tutorials. The main page of the review contains the summary and final score. The following Table of Contents contains links to the other parts of my roadtest review.

 

Table of Contents

 

 

Project 02: Speech Recognition and Machine Learning

As part of the review process, I tried to develop a machine learning app (the first ML app of my life). Sadly, it was mostly unsuccessful, most probably because I am a total novice in machine learning and my experience was insufficient to complete this project. I tried to make a voice recognition app which recognizes the voice commands “red”, “green” and “blue” from microphone data and turns on the corresponding colour on the RGB LED.

 

My first attempt followed this tutorial: EloquentTinyML: Easier Voice Classifier on Nano 33 BLE Sense. The Arduino side was easy to develop and deploy. Because I had no experience with TensorFlow, I spent plenty of time installing it, but after that it also worked seamlessly. I trained the model multiple times, always with 60 samples: 20 samples for each classified word. The “red” command worked and was usually classified quite accurately, but when I said nothing the result was classified as “blue”, and “green” was classified more or less at random. I also tried this with a tweaked gain, because the audio from the microphone had very low volume. Changing the gain made no difference.

 

My second attempt used the Edge Impulse environment. Internally it also uses TensorFlow, but it is probably much better tuned. Some parameters of the model and the training algorithm are configurable, but I did not have the experience to adjust them. The user interface is very user friendly and almost anyone can start developing applications very quickly. I trained a model similar to the first case and the results were slightly better, but the behaviour was mostly the same: “red” worked, “green” triggered a random colour and “blue” did not work at all. In this case I was unable to tweak the gain, because the firmware was provided by Edge Impulse and it did not allow me to adjust any parameters of the peripheral (the microphone in this case).

Finally, after these failures, I thought about the cause. My first suspect was the low volume of the samples, so I collected some samples to check whether they were deformed or noisy. I wrote an application which uploaded samples over a high baud rate UART to a PC and visualized them in a desktop application which I also wrote. I used the nRF52840 stack rather than the Arduino environment so that I could go deeper into the details of the microphone, its interface with the MCU, the timing of the PDM signal and the PDM peripheral, and make sure that all configurations were correct. I said “red”, “green” and “blue” and the results were as follows.
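For readers who want to try a similar capture, here is a minimal sketch of the PC side in Python. It is not the application I wrote. It only shows how a raw byte stream could be turned back into signed 16-bit samples, assuming the firmware sends them as plain little-endian values with no framing or checksum (the function name `decode_pcm16` is made up for this example, and the bytes would come from whatever serial library you use).

```python
import struct
from typing import List

def decode_pcm16(raw: bytes) -> List[int]:
    """Decode little-endian signed 16-bit samples from a raw byte string.

    Assumption: the stream has no framing, so two consecutive bytes
    always form one sample. A trailing odd byte is dropped.
    """
    usable = len(raw) - (len(raw) % 2)
    count = usable // 2
    return list(struct.unpack(f"<{count}h", raw[:usable]))

# Simulate three samples as the (assumed) firmware would send them.
raw = struct.pack("<3h", -300, 0, 300)
samples = decode_pcm16(raw)
print(min(samples), max(samples))
```

With an unframed stream like this, a single lost byte shifts every following sample, so a real tool would probably want some kind of packet marker.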

 

Because I am not an audio expert, I cannot judge the correctness of the audio signal, but I think it is OK. Look at the range of values. The PDM module reports 16-bit samples ranging from a minimum of -32768 to a maximum of 32767. I said the words as I normally speak, but the received amplitudes were in the range of about -300 to 300. I zoomed into the middle part of the signal and I think it looks good and correct. Noise is also pretty low. In the following picture you can see the zoomed middle part of the signal.

 

Finally, I tried clapping to check the amplitude range, and the results were as follows. As you can see, the signal went very high. I received samples with amplitudes between -14699 and 26550. After these tests I think that the PDM microphone, the MCU and the PDM peripheral work well and that the audio signal is not the cause of my ML failure.
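To put the two measurements on a common scale, the peak values can be expressed in decibels relative to 16-bit full scale (dBFS). The short Python sketch below does only this arithmetic on the numbers quoted above. The helper names are mine, and taking 32768 as the full-scale reference is a common convention, not something taken from the PDM driver.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample

def peak_dbfs(peak: int) -> float:
    """Peak level in dB relative to 16-bit full scale (peak must be non-zero)."""
    return 20 * math.log10(abs(peak) / FULL_SCALE)

def bits_used(peak: int) -> int:
    """Number of magnitude bits needed to represent the peak value."""
    return abs(peak).bit_length()

# Peak values taken from the speech and clap measurements in this article.
for label, peak in [("speech", 300), ("clap", 26550)]:
    print(f"{label}: {peak_dbfs(peak):.1f} dBFS, "
          f"{bits_used(peak)} of 15 magnitude bits")
```

Normal speech comes out at roughly -41 dBFS and uses 9 of the 15 magnitude bits, while the clap reaches about -2 dBFS. This only restates the measurements in another unit. It does not by itself show whether the low speech level matters for the classifier.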

 

I also thought about how the samples are processed. When the samples are collected, their RMS (root mean square) is calculated and the RMS values are passed to the ML library. I am not sure this is a good approach. I think that RMS depends mostly on amplitude and not on signal frequency, which I believe is more important in an audio signal. If my thought is correct, it also matches the behaviour of the “red” and non-red voice commands. If you look back at the first chart, you can see that the word “red” (the first part of the chart) has a different amplitude profile than “green” and “blue”, which were often confused with each other, whereas “red” was classified very clearly. I currently have no time for further experiments with passing frequencies rather than amplitudes, because it requires a lot of work. But I plan to return to this in the future and try further experiments.
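The idea that RMS hides frequency information can be checked with a small synthetic experiment. The Python sketch below is only an illustration of that idea, not the processing used by the tutorial or by Edge Impulse. It generates two sine tones with the same amplitude but different frequencies and compares their RMS with the strongest bin of a naive DFT. The 16 kHz sample rate, the 512-sample window and the tone frequencies are assumptions chosen so that both tones fit the window exactly.

```python
import math

SAMPLE_RATE = 16000  # Hz, assumed for illustration

def sine(freq_hz, amplitude, n=512):
    """Generate n samples of a sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def rms(window):
    """Root mean square of a window of samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def dominant_frequency(window):
    """Frequency (Hz) of the strongest bin of a naive DFT, DC excluded."""
    n = len(window)
    best_bin, best_power = 1, -1.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(window))
        im = sum(s * math.sin(2 * math.pi * k * i / n)
                 for i, s in enumerate(window))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * SAMPLE_RATE / n

low = sine(500, 300)    # same amplitude as my speech peaks
high = sine(2000, 300)  # same amplitude, four times the frequency

print("RMS:", round(rms(low)), round(rms(high)))
print("Dominant frequency:", dominant_frequency(low), dominant_frequency(high))
```

Both tones give the same RMS, while the DFT separates them cleanly. Real words are of course not pure tones, so this only supports the suspicion that amplitude-only features throw away information. It does not prove that this was the reason for my results.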

 

Lastly, I want to say that I am not the first who failed with this. In fact, the author of the mentioned tutorial also says that his model and its recognition were not very accurate. On element14, in the discussion on the Edge Impulse webinar page, you can see another user who tried it on the same Arduino (with different words and many more training samples than I used) and also failed.