- 1. Motivation
- 2. FM transmitter
- 3. Light show
- 4. Summary
- 5. References
I've recently been playing with a tiny FPGA and posting my experiences with the technology (Learning Verilog with the Digilent Cmod S7, Vector Display GPU Project), so I thought it would be interesting to continue testing the FPGA on different projects. This time I decided to build a project that sits in-between two P14 themes: Xmas and RF. The plan was to produce an appealing audiovisual show where Xmas music would be broadcast over FM and light effects would synchronize to the music. So why would I want to broadcast FM instead of just connecting a single cable to the line-in of the audio equipment? The main reason is that FM broadcasting, being wireless, can be heard anywhere in the covered radius as long as one has an FM receiver. For the same reason, multiple receivers can pick up the same signal, which could be handy, for instance, to cover a large area with the same audio signal without running any wires.
This project is separated into 2 parts: the software-defined FM transmitter and the light show. Let's begin!
2. FM transmitter
I decided to focus first on the most complex part of the project: the software-defined FM transmitter! The FM transmitter is divided into 3 parts: the FPGA input, the FPGA output, and the frequency modulation and generation of the stereo baseband signal.
2.1 Transmitter Input
The Digilent Cmod S7 comes with very little memory to store digital audio, so the audio must come from an external source. I saw 2 readily available ways to interface it to an audio source:
One alternative was to take the left and right channel analog audio to the FPGA ADC hardware IP (XADC) and multiplex the signals to generate the FM baseband; the other was to stream the audio through the USB virtual serial port to the FPGA via the FTDI FT2232H.
The XADC appeared to be the quickest way to get audio to the FPGA, and was what I first attempted. But even though Vivado provides wizards to set it up, it took me quite a bit of effort and several reads of the user guide to get it working, as some minor details were not very clear and required some guesswork. XADC contains 2 12-bit 1 MSPS ADCs:
CD audio has a resolution of 16 bits and a rate of 44.1 kHz, so I was a few bits short on resolution. Part of the missing ADC bits can be recovered through oversampling, but I still would not get to 16 bits. That would have been acceptable, except for 2 more drawbacks: Digilent wired the 2 analog pins in a way that they can't be sampled at the same time by the 2 ADCs, and the lack of an external voltage reference degrades the sampling even further. For these reasons, and because I was aiming for the highest fidelity, I ended up discarding the XADC as the audio input.
2.1.3 USB Virtual COM
By using the USB virtual serial port I would not have to care about ADC resolution or analog signal conditioning, but this approach came with its own challenges, and I initially had many doubts about it:
- Is the virtual serial port fast enough to transmit CD quality audio?
- How do I avoid buffer under- or overflow?
- How fast do I need to run the FPGA clock to process the UART Rx line?
- Is the bit error rate (BER) significant to the point that I need to tackle it?
To transmit stereo CD-quality audio I need a payload bit rate of 1,411,200 bps (44.1 kHz x 16 bit x 2), which requires at least a 1,764,000 bps (1,411,200 / 8 x 10) link when using 8-bit-no-parity-1-stop-bit (8N1) serial port settings. To test whether the FPGA board could handle that bit rate I wrote a simple loop-back Verilog program that just wired the UART TX to the RX:
Next I built a small Python program that sends serial data and checks that it matches the data received from the FPGA loop-back. I found that I could transmit as fast as 12 Mbit/s, much faster than what I required to transmit high-quality audio.
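The bit-rate arithmetic above can be double-checked with a few lines of Python:

```python
# Payload rate for stereo CD-quality audio, and the UART line rate needed
# once 8N1 framing overhead (10 line bits per 8 payload bits) is included.
SAMPLE_RATE = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

payload_bps = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
link_bps = payload_bps * 10 // 8

print(payload_bps)  # 1411200
print(link_bps)     # 1764000
```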
Buffer under- and overflow avoidance
The signal must be transmitted at a certain rate; a tiny mismatch between the rate the signal is fed into the FPGA and the rate it is being broadcast would eventually cause an under- or overflow, and this would produce an audible glitch. This is usually handled through flow control, which can be implemented in either hardware or software. The FT2232H supports flow control, but it is not implemented in the FT2232H-FPGA UART link. So I thought I would have to implement my own software-based flow control algorithm, but the FPGA digested data so fast that if the PC waited for the FPGA to request more data, the FT2232H TX buffer would be empty before the PC even received the request. Then I thought a bit of control theory could be applied to solve the problem: the plan was to continuously transmit data to the FPGA, which would continuously inform the PC how full the buffer is, so that the PC could slowly tweak the transmission rate to avoid under- or overflowing the buffer. Of course, implementing such a software flow control system would have taken quite a bit of effort, so I looked for a simpler solution.
I found that the FT2232H and the FPGA share the same clock signal, so unless the FT2232H behaved in some unpredictable way, it should stream data at a constant rate in terms of FPGA clock cycles. This simplified the streaming problem a lot: now all that was needed was to keep the FT2232H UART TX buffer as full as possible so that it does not underflow. Overflow is not an issue because the PC-FT2232H link is flow controlled.
Before getting into the implementation of the receiver, let's begin with a quick introduction to how a UART works. The UART, in contrast to I2C or SPI, doesn't use a clock signal. The baud rate is usually manually set at the receiver to match the transmitter; commonly used baud rates are 2400, 9600 and 115200 bps, but a UART can be set to any arbitrary rate. The most popular setting is 8N1, which means that for every 8 data bits, 1 start bit, 1 stop bit and no parity bit are sent; that is, a payload of 8 bits requires 10 bits on the line. Let's take a closer look at how a UART signal looks:
When the line is idle, that is, no data is being transmitted, the line stays at 1. The transmission of a byte begins with a start bit, which is always 0. The moment the receiver detects a falling edge it knows a byte is about to be sent. The receiver samples each bit right in the middle (in terms of time) so that if the transmitter and receiver clocks run at slightly different speeds, the bits still get sampled correctly. The last bit, called the "stop bit", is always 1, so no matter what is being sent through the serial port, there will always be at least 2 transitions for every byte sent. In this way the receiver can synchronize itself to the start bit falling edge to correctly sample the bits of the incoming byte.
Let's dive deeper into how UART receivers usually read serial data. Traditionally, the receiver runs its clock at 16 times the baud rate. The moment it detects a falling edge, it waits 8 clock cycles to sample the midpoint of the start bit, to make sure that the detected falling edge was the beginning of a frame and not just noise. If it detects a 0 in the middle of the start bit, it waits 16 more cycles to sample the midpoint of the first data bit (bit-0), and continues until the whole frame is read.
At a sampling rate of 16 times the baud rate we have a timing uncertainty of 1/16 of a bit period, and if the receiver and transmitter clocks run at different speeds the error grows from the beginning of the frame to its end. If the clock mismatch is low, the receiver will still successfully read the frame: even though it won't sample exactly at the midpoint of every bit, the error won't be large enough to make it sample a different bit.
Some implementations do fancy stuff like sampling multiple times close to the midpoint to reject noise, or even performing baud rate auto-detection, but roughly speaking this is the way UART receivers traditionally operate.
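The traditional 16x receiver described above can be sketched in software (a Python model for illustration only; the real receiver is of course hardware):

```python
OVERSAMPLE = 16  # receiver clock runs at 16x the baud rate

def encode_8n1(byte, oversample=OVERSAMPLE):
    """Line samples (one per receiver clock) for a single 8N1 frame."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data LSB-first, stop
    return [level for bit in bits for level in [bit] * oversample]

def decode_8n1(samples, oversample=OVERSAMPLE):
    """Classic 16x receiver: find the falling edge, confirm the start bit at
    its midpoint, then sample each data bit 16 cycles apart."""
    edge = next(i for i in range(1, len(samples))
                if samples[i - 1] == 1 and samples[i] == 0)
    if samples[edge + oversample // 2] != 0:
        return None  # the edge was a glitch, not a start bit
    byte = 0
    for n in range(8):
        mid = edge + oversample // 2 + (n + 1) * oversample  # midpoint of bit n
        byte |= samples[mid] << n
    return byte

frame = [1] * 4 + encode_8n1(0xA5) + [1] * 4  # idle, frame, idle
print(hex(decode_8n1(frame)))  # 0xa5
```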
A different UART decoding approach
Even though a UART baud rate of at least 1,764,000 bps would have sufficed for this project, I thought it would be a better plan to aim for 12 Mbit/s in case I later required, for whatever reason, a higher speed. Of course 192 MHz (12 MHz x 16) is a bit fast for the FPGA, and even though the FPGA could probably run a UART receiver at 192 MHz without issues, other parts of the design might struggle to meet the timing constraints. As an alternative, only the receiver could run at 192 MHz and the rest of the design at a lower speed, but this would make the design more complex and also waste FPGA clocking resources.
So I decided to write a UART module that could receive data at the maximum speed of 12 Mbit/s while requiring the lowest clock speed possible. I found that the lowest speed at which I could reliably decode the UART signal was twice its baud rate, or 24 MHz for a 12 Mbit/s UART. Let me show you with an example where this value comes from:
The example figure shows what could happen if we don't meet the twice-the-UART-speed requirement. In the first row we see the timeline (units are not relevant). The FPGA clock in this example has a period of 2 time units, and samples the UART at every rising edge. The first 3 sampled values are shown, and the switch from a 1 to a 0 indicates that within that interval a UART falling edge occurred, but the exact moment when it occurred can't be known. The "early signal" and "late signal" show the extremes of how early or how late the falling edge could have occurred based on what we know from our samples. For the same reason, depending on where we sample the UART line we could end up sampling a completely different bit than the one we intend. The probabilities of sampling bit-0 and bit-1 are shown in the next plots. If we sample bit-0 too soon, we may end up sampling the start bit, while if we sample it too late we may end up sampling bit-1. There is just a small interval where we can be sure that we will sample bit-0, and the problem is that this interval is shorter (1) than the FPGA clock period (2). In this example we are lucky and we won't miss the short interval, but in less ideal situations we might! Only when the sampling frequency is at least twice the baud rate are we guaranteed to sample within the interval where the probability of reading the right bit is 1.
The implementation of this approach requires precise timing, and getting it right took me quite a bit of effort. At slow speeds (close to twice the baud rate) there is no room for mistakes: if sampling occurs one cycle earlier or later, the frame gets corrupted. I'll skip the details of how I compute the sampling time, but here is a quick overview. The module can compute the right time to sample through 2 approaches: perfect "error-free math" or "approximated math" (which I make sure is precise enough to not miss the right sampling time). The "error-free math" approach uses fractions and implements a greatest common divisor function (through the Euclidean algorithm); the "approximated math" approach implements a "variation" of fixed-point math. Which method is used is automatically evaluated during synthesis based on which approach requires the least amount of resources, and this depends on the clock and baud rate.
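The "error-free math" idea can be illustrated with exact fractions (a Python sketch of the concept only, not the author's Verilog; the function name and interface are made up for illustration):

```python
from fractions import Fraction

def sample_cycles(clk_hz, baud, n_bits=10):
    """Clock-cycle index, counted from the detected falling edge, at which to
    sample the midpoint of each of the n_bits frame bits (start, 8 data, stop).
    Using Fraction keeps the per-bit arithmetic exact, with no rounding drift."""
    cycles_per_bit = Fraction(clk_hz, baud)
    return [int((n + Fraction(1, 2)) * cycles_per_bit) for n in range(n_bits)]

# At exactly twice the baud rate there is a single valid sampling cycle per bit:
print(sample_cycles(24_000_000, 12_000_000))  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
# With a non-integer ratio (76.8 MHz / 12 Mbit/s = 6.4 cycles per bit) the
# exact math still lands every sample inside the correct bit:
print(sample_cycles(76_800_000, 12_000_000))
```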
Metastability was avoided simply by buffering the incoming bits.
Here is the UART receiver code and its testbench:
To test the UART link I created a design that receives a continuously, sequentially increasing byte stream and checks that the bytes don't get corrupted and are delivered at a regular number of clock cycles. I tested it at 12 Mbit/s with an FPGA clock frequency of 76.8 MHz, and the link failed neither in data integrity nor in timing consistency (I transferred 2 Gb before stopping the test).
Here is the UART receiver testing code and its testbench:
And this is the Python program that continuously sends the sequentially increasing byte sequence.
2.2 Transmitter output
There are multiple approaches that could be used to generate an RF output (a resistor ladder, the SerDes hardware IP, a ΔΣ modulator, etc.), but as time was not on my side, I opted for the simplest solution that could do the job. I computed the phase of a numerically-controlled oscillator (NCO) at 307.2 MHz and output the phase's most significant bit (MSB), which represents the sign of the sine of the phase, to the output pins. An external quarter-wave (~70 cm) antenna was used to increase the signal power. Details will be explained later.
2.3 Frequency modulation
As the name implies, frequency modulation (FM) works by having a baseband signal modulate the frequency of a carrier; mathematically this can be expressed as:
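A standard form of this equation, reconstructed here to be consistent with the definitions that follow, is:

```latex
y(t) = \cos\!\left(2\pi f_c t + 2\pi f_\Delta \int_0^t x(\tau)\,d\tau\right)
```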
y(t) is the modulated signal.
ƒc is the nominal carrier frequency.
ƒΔ is the frequency deviation, or the amount that the carrier frequency can change from its nominal frequency. For FM radio broadcasting this is limited to 75 kHz, so each radio station is limited to swinging the carrier frequency from ƒc−ƒΔ to ƒc+ƒΔ.
x(t) is the baseband signal; its values are limited to [-1, +1].
Let me give you the intuition of how the equation "works". If we remove the ƒΔx(t) term (or just set x(t) to 0), set ƒc=100,000,000 and solve the integral, we get y(t)=cos(200,000,000πt), which is a 100 MHz sine wave. With that in mind it's easy to see that if we set ƒΔ=75,000 and x(t)=1 the wave will now oscillate at 100.075 MHz, while if we set x(t)=−1 it will oscillate at 99.925 MHz. x(t) is of course not constant, and through its value the frequency of the carrier gets modulated.
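This intuition can be verified numerically with scaled-down frequencies (a small sketch, not part of the original transmitter code):

```python
import math

def fm_modulate(x, fc, fdelta, fs):
    """Frequency-modulate baseband samples x (values in [-1, 1])."""
    y, phase = [], 0.0
    for sample in x:
        inst_freq = fc + fdelta * sample        # instantaneous frequency
        phase += 2 * math.pi * inst_freq / fs   # integrating frequency gives phase
        y.append(math.cos(phase))
    return y

# With x(t) = +1 the carrier should sit at exactly fc + fdelta.
fs, fc, fdelta = 200_000, 10_000, 750
y = fm_modulate([1.0] * fs, fc, fdelta, fs)   # one second of signal
crossings = sum(1 for a, b in zip(y, y[1:]) if a * b < 0)
print(crossings)  # ~2 * (fc + fdelta) = 21500 zero crossings per second
```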
In the early years FM transmissions were simple; they only transmitted mono audio:
x(t) is the mono audio baseband signal
l(t) is the left channel
r(t) is the right channel
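From the definitions above, the mono baseband is presumably just the average of the two channels (reconstructed in LaTeX):

```latex
x(t) = \frac{l(t) + r(t)}{2}
```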
In 1961 a stereo system was approved that was backward compatible with mono receivers. The mono signal (L+R)/2 is transmitted as usual, but a difference signal (L-R)/2 is also transmitted in the 23 kHz - 53 kHz band, allowing the receiver to separate the left and right channels through simple math:
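The recovery step, reconstructed here, is simply a sum and a difference of the two received signals:

```latex
l(t) = \frac{l+r}{2} + \frac{l-r}{2},
\qquad
r(t) = \frac{l+r}{2} - \frac{l-r}{2}
```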
The difference signal is modulated with double-sideband suppressed-carrier (DSB-SC) into the baseband. DSB-SC is similar to amplitude modulation (AM): a carrier signal is multiplied by a baseband signal, but in the case of AM the modulating signal never crosses zero, while in DSB-SC it does. As a result, DSB-SC can't be demodulated with an envelope detector and requires a product detector, which performs demodulation through a mixer. To correctly demodulate the signal, it has to be mixed with the same oscillating signal (in frequency and phase) that was used to modulate it. The 19 kHz pilot tone in a stereo baseband signal has two functions: it indicates to the receiver that the broadcast is stereophonic, and doubling its frequency allows the receiver to reconstruct the 38 kHz carrier that was used to modulate the difference signal. This 38 kHz carrier is used by the receiver to demodulate the DSB-SC difference signal. It is also worth noting that the audio signals are limited to the range of 30 Hz to 15 kHz.
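The DSB-SC modulate/demodulate round trip can be sketched in a few lines of Python (an illustration only; the crude moving-average low-pass filter stands in for a real receiver filter):

```python
import math

FS = 400_000     # baseband sample rate used in this project
F_SUB = 38_000   # DSB-SC subcarrier, twice the 19 kHz pilot

def dsb_sc(m, fs=FS, f=F_SUB):
    """DSB-SC modulation: multiply the message by the carrier. Unlike AM, no
    carrier offset is added, so the result crosses zero when the message does."""
    return [s * math.sin(2 * math.pi * f * n / fs) for n, s in enumerate(m)]

def product_detect(s, fs=FS, f=F_SUB):
    """Product detector: mix with a carrier of the same frequency and phase,
    then low-pass filter (a moving average over ~one subcarrier period)."""
    mixed = [v * 2 * math.sin(2 * math.pi * f * n / fs) for n, v in enumerate(s)]
    w = fs // f
    return [sum(mixed[max(0, n - w):n + 1]) / min(n + 1, w + 1)
            for n in range(len(mixed))]

# A 1 kHz "difference" signal survives the modulate/demodulate round trip.
msg = [math.sin(2 * math.pi * 1_000 * n / FS) for n in range(2_000)]
recovered = product_detect(dsb_sc(msg))
```

Mixing by the same 38 kHz carrier turns m(t)·sin(ωt) into m(t)·(1 − cos(2ωt)); the low-pass filter then removes the 76 kHz term, leaving the message.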
The FM baseband can also contain digital information such as the Radio broadcast data system (RBDS).
This is how the FM baseband spectrum of a stereo, RBDS transmitting station looks in theory:
And this is how it looks from a local FM radio station as I captured it with a software-defined radio (SDR):
Let's put it all together; mathematically, the baseband of a stereo FM signal (without RBDS) can be expressed as:
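A commonly used form of the stereo multiplex, reconstructed here (the pilot amplitude, written A_p below, is an assumption; it is typically around 10% of the maximum deviation):

```latex
x(t) = \frac{l(t)+r(t)}{2}
     + A_p \sin(2\pi f_p t)
     + \frac{l(t)-r(t)}{2}\,\sin(4\pi f_p t)
```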
ƒp represents the pilot frequency (19,000 Hz).
2.3.2 FPGA Implementation
My initial plan was to stream both audio channels to the FPGA and let the FPGA multiplex them, but later I decided to multiplex in the CPU and stream the multiplexed signal to the FPGA instead. This considerably reduced development time since debugging, modifying and rerunning Python code is orders of magnitude faster than debugging and resynthesizing Verilog code. The following block diagram shows how the whole system operates:
The CPU multiplexes the channels and generates the baseband, which represents the instantaneous frequency deviation from the nominal FM broadcast frequency. The CPU-generated multiplexed signal has a resolution of 24 bits and a symbol rate of 400 ksps, and is sent to the FT2232H, which transmits it at 12 Mbit/s to the FPGA. The UartRx module converts each 10-bit 8N1 UART frame into a byte and the Deserializer module deserializes these into 24-bit words. The deserialized word is added to the instantaneous nominal frequency Δθ to generate the NCO instantaneous frequency. The MSB of the 32-bit NCO phase accumulator (which represents the sign of the sine wave) is extracted and sent to the wire antenna.
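The phase-accumulator NCO can be sketched in Python (an illustrative model of the idea, not the author's Verilog):

```python
CLK = 307_200_000   # phase-accumulator clock (Hz), as in the post
ACC_BITS = 32

def phase_step(freq_hz):
    """Phase increment that makes the accumulator wrap freq_hz times/second."""
    return round(freq_hz * 2**ACC_BITS / CLK)

def msb_toggles(freq_hz, cycles):
    """Run the accumulator for `cycles` clock ticks and count MSB transitions
    (two transitions per output period)."""
    step = phase_step(freq_hz)
    acc, prev_msb, toggles = 0, 0, 0
    for _ in range(cycles):
        acc = (acc + step) & (2**ACC_BITS - 1)
        msb = acc >> (ACC_BITS - 1)
        toggles += msb != prev_msb
        prev_msb = msb
    return toggles

# Over 1 ms worth of clock cycles a 100 MHz setting gives ~200,000 MSB
# toggles, i.e. the output pin carries a ~100 MHz square wave.
print(msb_toggles(100_000_000, 307_200))
```

Frequency modulation then amounts to adding the deviation word to the nominal phase step on every clock cycle.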
The FPGA clock domain rates were set to powers of 2 times the byte rate of the UART (1,200,000 B/s). This makes it easier to implement FPGA DSP algorithms, although in this particular design I didn't end up needing one. To generate frequencies up to 108 MHz an oscillator must run at least at twice that frequency; I used 307.2 MHz (1,200,000 x 256) for the phase accumulator, and 76.8 MHz (1,200,000 x 64) for the rest of the design (which would not have met the timing constraints if run at 307.2 MHz).
One last aspect worth mentioning is that the generated RF output is squarish, which means it contains lots of odd harmonics. Dithering and filtering can reduce the harmonics' power, but at this low transmitting power I just didn't bother.
Here is the constraint file:
And the top, the transmitter, and deserializer modules:
2.3.3 CPU Implementation
I wrote a short Python program to stream a 1 kHz tone to the FPGA FM-modulator, then captured the broadcast signal with an SDR to compute the spectrum and SINAD of the baseband. The actual SINAD is of course lower, as there is ambient RF noise and the SDR introduces extra noise to the capture. And even though the SINAD doesn't take into account the physiology of the auditory system, it still gives us a hint of how good the audio may sound:
To see how the transmitter performs with actual music I wrote a mono and a stereo audio transmitter. Let's skip the details of the mono transmitter and explain how the stereo transmitter operates. The program first reads a WAV file, and then, for the sake of simplicity, instead of continuously processing and streaming audio chunks, it preprocesses the whole audio file to generate the baseband and then just streams it to the FPGA. The program performs the following steps in order:
- WAV file is read.
- Audio channels are low-pass filtered with 15 kHz cutoff frequency.
- Audio channels are resampled to 400,000 Hz.
- Audio channels are normalized so that they fit in the [-1, 1] range.
- Channels are multiplexed (as shown in 2.3.1).
- Multiplexed signal is conditioned so that it generates a maximum frequency deviation of 75 kHz.
- Multiplexed signal is converted to a stream of 24 bit little-endian symbols.
- Multiplexed signal is streamed to the FPGA.
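The 24-bit little-endian packing step above might look like this (a sketch; the exact scaling used by the original program is an assumption):

```python
import struct

FULL_SCALE = 8_388_607  # max positive value of a signed 24-bit integer

def pack_s24le(samples):
    """Pack floats in [-1, 1] into signed 24-bit little-endian words."""
    out = bytearray()
    for s in samples:
        v = max(-FULL_SCALE - 1, min(FULL_SCALE, round(s * FULL_SCALE)))
        # Pack as 32-bit little-endian and keep the low 3 bytes; for LE the
        # sign information of a 24-bit value lives in those bytes.
        out += struct.pack("<i", v)[:3]
    return bytes(out)

data = pack_s24le([0.0, 1.0, -1.0])
print(data.hex())  # 000000 ffff7f 010080
```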
To stay "legal" I only used tracks from YouTube's audio library; the drawback is that there is not much high-quality content in there. The following videos show the quality of the sound as captured by the SDR receiver software (SDR Console):
The sound quality is noticeably inferior to that of the original tracks, but still much better than what I initially expected. Notice also that there is some crackling at the beginning of each track when the receiver detects that the signal is stereo; this evidently has nothing to do with the quality of the broadcast.
Here is the source code of the test tone, the mono, and the stereo audio broadcasting programs:
3. Light show
The second and easiest part of the project is the music synchronized light show. The complete system looks like this:
The CPU streams data to the FPGA and the ESP32 at the same time. Data is sent to the ESP32 in packets that tell the microcontroller which relays to activate and how to set each LED's (WS2812) RGB components. Each packet is 37 bytes (12 x 3 + 1) wide: each LED requires 3 bytes to set its RGB components, there are 12 LEDs in the ring, and the 4 relays are controlled through a single byte. The CPU connects to the ESP32 at 921,600 bps, so there is a lot of room to quickly update the light intensity. The LED ring generates 3 rotating waves at different speeds on each of its RGB components, while at the same time their intensity is modulated by the volume of the broadcast music. The relays sadly are not solid state and make noise, so their switching speed is software limited. The Xmas string lights turn on when the broadcast music volume is low, while the LED ring does the opposite; in this way an alternating light effect is created. The volume of the broadcast music is computed by low-pass filtering the absolute value of the audio waveform.
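A 37-byte packet could be built like this (a sketch; the field order and relay bit layout are assumptions, not the actual protocol):

```python
def build_packet(leds, relays):
    """Build one light-show packet: 12 LEDs x 3 RGB bytes, plus 1 relay byte.
    `leds` is a list of 12 (r, g, b) tuples; `relays` packs one bit per relay."""
    assert len(leds) == 12
    packet = bytearray()
    for r, g, b in leds:
        packet += bytes((r & 0xFF, g & 0xFF, b & 0xFF))
    packet.append(relays & 0x0F)  # 4 relays -> low 4 bits of the last byte
    return bytes(packet)

pkt = build_packet([(255, 0, 0)] * 12, relays=0b0101)
print(len(pkt))  # 37
```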
Let's see some images of the system:
The first image shows the FPGA board with a wire antenna connected to a pin. The second image shows the FM receiver tuned to the frequency the FPGA is transmitting on. The third image shows the ESP32, the LED ring with a lens to focus the light on the ceiling, and a 4-relay board. The fourth image shows how the system illuminates the room.
Here are some videos:
And here is the ESP32 and CPU code used:
This was a very fun and interesting project that I really enjoyed working on; it allowed me to mix two completely different themes: RF and Xmas. Probably more than 95% of the time was spent on the RF part of the project, where Verilog coding and its verification took most of the time. I found the RF part quite challenging, and that's the reason most of the blog is dedicated to it. I spent a great amount of time sharing all the source code, adding references, explaining important aspects of the project and drawing diagrams to help the reader understand its key aspects. I hope the effort was worth it and the blog was detailed enough to allow anybody interested to replicate any part of the project without much difficulty. Thanks for reading!