My Raspberry Pi TALKS!

At first, you will probably say, "That project is too easy." You're only partly right. Let's take away the mouse and the fancy graphics. In fact, let's remove the entire X Window System (a.k.a. X11), including every app that requires a GUI (LXDE among them), and give only 4 MB to the Raspberry Pi's GPU. Install and run Festival, and make it talk... Now, will you still say it's too easy? Here are the web pages I used in making this project possible:


The Lite-image of Raspbian OS.

Command-Line Interface

This project might scare away new Raspberry Pi users, and possibly new Linux users, since I'm using the Linux command-line interface (CLI) exclusively. I chose the text-based user interface (TUI) because it's faster and gives me more control; I also started with the CLI/TUI when I first learned Linux, almost 16 years ago. For me, this is the only way to make a dedicated speech synthesizer. Besides, Raspbian OS can be considered embedded Linux, not necessarily a competitor to its Linux desktop counterpart.

My Goal: Dedicated Hardware

While many people would never think of the Raspberry Pi as a dedicated speech synthesizer, there is no reason why it can't beat a $449 Talking Keyboard.

Strange BBC Micro Fact

"The BBC Micro incorporated the Texas Instruments TMS5220 speech synthesis chip." - Wikipedia, Speech Synthesis - Others

My Speech Synthesis Hobby

During the 1980s, I was fascinated with KITT from Knight Rider, because it was a car that could talk. Also, when I visited the home of my Sunday School teacher, he had the Texas Instruments TI-99/4A computer, and it could talk. And then, for my high school graduation (1988), my dad bought me the Amiga 500, and I quickly discovered it could talk (AmigaOS Speech Synthesis). I've been practicing computer-generated speech synthesis on an almost regular basis since then, but with different computers, of course.

Festival and Flite

One of my long-term favourite speech synthesis applications is festival, and I have just started using flite. Since I have some familiarity with the Scheme programming language (I'm more familiar with Common LISP), it is relatively easy for me to use festival (top picture).
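As a minimal sketch of driving both programs from the command line, the script below speaks a line of text with whichever synthesizer is installed. It assumes the stock `festival --tts` mode (text read from standard input) and the `flite -t` option (text passed as an argument); `pick_synth` and `say` are hypothetical helper names made up for this example, not part of either package.

```shell
#!/bin/sh
# pick_synth and say are hypothetical helper names, not part of festival or flite.

# Print the name of the first synthesizer found on PATH, or "none".
pick_synth() {
    for candidate in festival flite; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "none"
}

# Speak one line of text through whichever synthesizer is installed.
say() {
    case "$(pick_synth)" in
        festival) printf '%s\n' "$1" | festival --tts ;;  # festival reads the text from stdin
        flite)    flite -t "$1" ;;                        # flite takes the text as an argument
        *)        echo "no speech synthesizer installed" >&2; return 1 ;;
    esac
}
```

Once the functions are loaded, `say "My Raspberry Pi talks"` should produce speech on the default audio output.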


My use of Phonemes

Typically, when I speak of speech synthesis projects, people presume I'm talking about text-to-speech (TTS) exclusively. But when they discover I'm able to get a speech synthesizer to speak an unpopular foreign language, such as Filipino/Tagalog (since I'm in the Philippines) [for over a decade with Dragon NaturallySpeaking], they begin to wonder how I can do it. I use phonemes to make the speech synthesizer speak unpopular words (Tagalog words) better. A few times, I had the assistance of a Tagalog language instructor in Seattle to help me get the right phonemes for synthesizing. I actually use a form of phonetic transcription. Traditionally, TTS systems (whether hardware or software based) presume the input language is English. Though there are Spanish phonetic dictionaries available, a TTS system just can't speak Filipino. For those who don't already know, I've been using phonemes for speech synthesis since my speech synthesis hobby started (1980s), so it is nothing new to me.
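One way to feed phonemes straight to a synthesizer, shown here only as a hedged sketch and not necessarily the transcription method described above, is Festival's `SayPhones` function, which speaks a list of phones with fixed duration and pitch. The phone names in the usage note (`pau hh ax l ow pau`, roughly the English word "hello") assume the default US English voice and its phone set; Tagalog words would need their own hand-tuned phone lists. `phones_expr` and `say_phones` are hypothetical helper names.

```shell
#!/bin/sh
# phones_expr and say_phones are hypothetical helper names.

# Build the Scheme expression that hands a phone list to Festival's SayPhones.
phones_expr() {
    printf "(SayPhones '(%s))\n" "$*"
}

# Speak the phones in batch mode; festival evaluates the expression and exits.
say_phones() {
    festival -b "$(phones_expr "$@")"
}
```

With festival installed, `say_phones pau hh ax l ow pau` should speak the phones in order, with a short pause before and after.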

What? No eSpeak?

There are many reasons why I chose not to use eSpeak on my Raspberry Pi. For one, eSpeak requires X11 to be installed:


eSpeak is probably the choice for beginners, since it requires a GUI to function; for that reason, I chose not to install it in my Raspberry Pi project.

Why I'm not using X11...

The on-board audio filter on the Raspberry Pi is not enough when the GPU is running video syncs. What typically results is distorted audio, such as audible pops. In my configuration, I removed all X11-related apps, including the desktop environment (LXDE). Without X11, speech synthesis is greatly improved, even at 700 MHz. There's no need to run at faster speeds, and I'm only allocating 4 MB of RAM to the GPU. Since the CPU does the speech processing, the GPU is no longer needed. I'm using externally powered stereo speakers, and the speech sounds very realistic!

The Magic: amixer cset numid=3 1

Changing the ALSA mixer to use the speakers (the analog audio jack) instead of HDMI improves speech audio performance, since the GPU is no longer interfering with the audio circuits.
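The command in the heading can be wrapped so the route is named instead of numbered. This sketch assumes the value mapping used by control `numid=3` on Raspbian releases of this era (0 = automatic, 1 = analog jack, 2 = HDMI); `route_value` and `set_audio_route` are hypothetical helper names.

```shell
#!/bin/sh
# route_value and set_audio_route are hypothetical helper names.

# Map a readable output name to the value expected by control numid=3
# (assumed mapping: 0 = auto, 1 = analog jack, 2 = HDMI).
route_value() {
    case "$1" in
        auto)   echo 0 ;;
        analog) echo 1 ;;
        hdmi)   echo 2 ;;
        *)      echo "unknown output: $1" >&2; return 1 ;;
    esac
}

# Apply the route; for "analog" this runs: amixer cset numid=3 1
set_audio_route() {
    value=$(route_value "$1") || return 1
    amixer cset numid=3 "$value"
}
```

Calling `set_audio_route analog` is then equivalent to typing `amixer cset numid=3 1` by hand.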

Optimum Goal: Microcontroller Interfacing

What? Did you think I was going to stop at the OS level? Well, I'm not. Nearly all of my Atmel AVR, Atmel ARM and Microchip PIC projects have serial ports, which means they can be interfaced to my Raspberry Pi. Beyond that, I could probably chain microcontrollers together using I²C, expanding the capabilities of my Raspberry Pi as a dedicated speech synthesizer.

Planned: Compiling from Source Code

At the moment, I'm running pre-compiled binaries for speech synthesis. Soon, as I expand my project, I will be compiling from source code that I modify myself. Maybe I want to add new command-line options...

Planned: Artificial Intelligence

Artificial Intelligence (AI) has also been one of my hobbies since the 1980s, also inspired by KITT from Knight Rider. I rarely talk about this hobby, as most people will think it's boring, and my AI techniques don't really work with the techniques of others. Soon, my AI projects will be incorporated into my Raspberry Pi, focusing primarily on domestic and IT security (cybersecurity), and will output data verbally, as if it were human. [A security guard that learns and never needs to sleep.]

Planned: AI-based Automation

Automating stuff in Linux is very easy for me to do; it's just a matter of running scripts. But I'm going beyond that. How about scripts that vary depending on environmental changes? Sure enough, I will have my Raspberry Pi check my Google Mail and Google Calendar, but, just like me (as a human), it won't do that all the time. I will have a talking clock that runs autonomously, but when my room temperature is too warm, it will go silent, presuming I'm sleeping. Most speech synthesis actions might require me to press an ACK button to acknowledge, but with AI-based automation, that ACK button may activate on its own, based on the principles (intelligence) I've programmed into it. (Think of SARAH from Eureka...)
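The talking-clock rule can be sketched as a plain script, long before any AI is involved. Everything here is an assumption for illustration: the helper names `should_speak`, `clock_phrase` and `announce_time` are hypothetical, the 28 degree default threshold is arbitrary, and the room temperature is taken as an argument because no particular sensor is described.

```shell
#!/bin/sh
# should_speak, clock_phrase and announce_time are hypothetical helper names;
# the 28 degree default threshold is an arbitrary example value.

# Succeed (exit 0) when the room is cool enough that the clock may talk.
# Arguments: current temperature and threshold, both in whole degrees Celsius.
should_speak() {
    [ "$1" -lt "$2" ]
}

# Build the sentence to be spoken for the current time.
clock_phrase() {
    date +"The time is %H:%M"
}

# Print the phrase only when the temperature test passes; print nothing otherwise.
announce_time() {
    if should_speak "$1" "${2:-28}"; then
        clock_phrase
    fi
}
```

Run from cron, something like `announce_time "$room_temp" | festival --tts` would speak the time on a cool evening and stay quiet in a warm room, where `room_temp` stands for whatever sensor reading is available.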

Operational: Secure Shell Login

Currently, I can log in using Secure Shell and run festival and flite, outputting speech to my external speakers. I have also used ConnectBot on my Android Gingerbread dual-SIM smartphone to log in.

Thanks for Reading!

Not everyone is interested in speech synthesis. Since I will probably not have a SpeakJet IC anytime soon, making my Raspberry Pi into a dedicated speech synthesizer is the next best thing. My intention is to run it behind the scenes, where it's not the focus of attention. Though present operations may have seemed like science fiction in the past, future operations might seem too much like science fiction nowadays. (My wife says my optimum goals will make me lazy.) Beyond blinking LEDs and reading LCD screens, I still prefer hearing the human voice, whether human or machine generated, except when it's from a telemarketer or a collection agency. (Yes, I pissed off a collector that used an automated voice calling machine to call me; he hung up on my AI-based talking answering machine.) Making the Raspberry Pi into a dedicated speech synthesizer is not for everyone, especially when I plan on including AI technology. I know there are many Raspberry Pi owners who have experimented with speech synthesis, but I don't expect anyone to do what I'm doing, unless they plan on helping the handicapped and/or making profits. As for me, I am only doing this as a hobby, not as a profession. If someone wishes to profit from my project, please give me [some] credit, preferably of monetary value. Thanks again for reading, and have a nice day!


Marcos "Kuya Marc" Miranda


P.S. - I don't have the necessary bandwidth, nor a stable broadband internet connection to upload a video of my speech synthesis projects.

P.P.S. - I do have a text file of the Debian packages (dpkg -l) being used in my Raspberry Pi Speech Synthesis project. It's available upon request.