
# 1 Introduction

As a follow-up to the Learning Verilog with the Digilent Cmod S7 blog post, and as part of the Project14 | Vision Thing: Beaglebone AI Your Vision Thing Project! contest, I decided to use an FPGA to design a GPU that draws on a vector display [1]. Unlike raster displays, vector displays can deflect the electron beam along any arbitrary path by controlling its X and Y coordinates. One example of this type of display is the Tektronix 4051 [2], [3]:

[Dalby Datormuseum]

Cathode ray tube (CRT) oscilloscopes are another example of displays that use this technology:

As you may guess, that's the analogue oscilloscope that I'm going to use as a vector display. It's very old, so I hope it will make it until the end of the contest. You may think at this point that this technology is dead, and in the form of electron beams it somewhat is, but not yet in the form of lasers! Laser scanners [4] use galvanometers to tilt mirrors and deflect laser beams:

[Wikipedia]

Laser scanners are used in laser shows, laser engravers, 3D printers and LIDARs among other things.

In this Project14 I'll experiment with vector display rendering. I'll mostly use an FPGA to perform the low-level graphics, as its timing is very predictable and its clock frequency is very high.

# 2 Fixed Point Numbers

One of the first things I needed for the project was a real number representation. Synthesizable Verilog does not support real numbers, so they must be implemented by the designer. The most common ways to represent real numbers are floating-point [5] and fixed-point numbers [6]. Floating-point numbers are what floating-point units (FPUs) use; these numbers contain a sign bit, mantissa bits and exponent bits. Standards such as IEEE 754 [7] define floating-point formats like single precision (32 bit) and double precision (64 bit), but their implementation in hardware is quite complex and slow, so I discarded them. Fixed-point numbers, on the other hand, are relatively easy to implement. They are treated internally as integers, except that they represent the numerator of a fraction whose denominator is a fixed power of 2. Let’s take a closer look at how fixed-point numbers represent real numbers.

There are many notations to specify fixed-point numbers, but I’ll use the Q-notation [8]. A Q5.10 fixed-point number is a 16-bit number that contains an implicit sign bit, 5 integer bits and 10 fractional bits. Q15 is a 16-bit number with an implicit sign bit, no integer bits and 15 fractional bits. A “U” prefix indicates that there is no sign bit, as in the 16-bit formats UQ6.10 and UQ16: UQ6.10 has 6 integer bits and 10 fractional bits, while UQ16 has no integer bits and 16 fractional bits.

It might be easier to understand how to work with these numbers with some examples:

The Q3.4 number “0011.0111” represents the number 55 / (2 ** 4) or 55 / 16 (or 110111 / 10000 in binary) or 3.4375. One way to think of fixed-point numbers is as the integer numerator of a fixed denominator.

To perform an addition you just add the numbers; it doesn’t matter where the fractional digits begin, the result is the same. For the FPGA (or a CPU) it’s just an integer addition. Here is an example:

0011.0111 + 0001.0110 = 0100.1101, which in decimal is 3.4375 + 1.375 = 4.8125, or (55 + 22) / 16 = 77 / 16. The same holds for signed values, as they are stored in two’s complement [wiki].
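The two examples above can be checked with a few lines of Python. This is only a sketch: `to_fixed` and `from_fixed` are hypothetical helper names, and Python integers stand in for the FPGA registers.

```python
FRACTION_BITS = 4  # Q3.4: 4 fractional bits, so the denominator is 2**4 = 16


def to_fixed(value, fraction_bits=FRACTION_BITS):
    """Return the integer numerator that represents `value`."""
    return int(round(value * (1 << fraction_bits)))


def from_fixed(raw, fraction_bits=FRACTION_BITS):
    """Return the real number represented by the integer `raw`."""
    return raw / (1 << fraction_bits)


a = 0b00110111   # 0011.0111
b = 0b00010110   # 0001.0110
total = a + b    # plain integer addition, the binary point plays no role

print(a, from_fixed(a))          # 55 3.4375
print(total, from_fixed(total))  # 77 4.8125
print(to_fixed(-1.5) & 0xFF)     # 232, i.e. 1110.1000 in two's complement
```

Negative values need no special treatment when adding: the two's complement pattern of -1.5 is simply the 8-bit wrap-around of the integer -24.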

Multiplications are a bit different:

0000.1000 * 0010.1100 = (8 / 16) * (44 / 16) = (8 * 44) / (16 * 16) = (8 * 44 / 16) / 16 = 22 / 16 = 1.375

As we can see, the number of fractional bits now matters. To multiply, you treat the fixed-point numbers as integers (8 and 44 in the previous example), but after multiplying you must divide by 2 to the power of the number of fractional bits (2**4 in the previous example).

It may look as if divisions and multiplications by powers of 2 are computationally expensive, but they are not: you can perform them using logical [9] or arithmetic shifts [10], which are bitwise operations that shift the bits of the operand.
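As an illustration, the multiplication example can be written with a shift instead of a division. The helper name `fixed_mul` is made up for this sketch; Python's `>>` on integers is an arithmetic shift, which is the behaviour of Verilog's `>>>` on signed values.

```python
FRACTION_BITS = 4  # Q3.4, as in the example above


def fixed_mul(a, b, fraction_bits=FRACTION_BITS):
    """Multiply two fixed-point numbers that share the same format."""
    # The raw product has 2 * fraction_bits fractional bits; the arithmetic
    # right shift drops the extra ones (rounding towards minus infinity).
    return (a * b) >> fraction_bits


a = 0b00001000  # 0000.1000 = 0.5
b = 0b00101100  # 0010.1100 = 2.75
p = fixed_mul(a, b)
print(p, p / 2**FRACTION_BITS)  # 22 1.375

n = fixed_mul(-a, b)            # signed operands work the same way
print(n, n / 2**FRACTION_BITS)  # -22 -1.375
```

Note that the intermediate product needs twice as many bits as the operands, which is why hardware implementations usually compute it in a wider register before shifting.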

Subtractions and divisions are not much different to additions and multiplications, but other operations, such as logarithms, powers or trigonometric functions are much harder to compute.

# 3 FPGA DAC

To draw on the oscilloscope display, I need 2 analog signals (or 3 if I want to use the “Z-axis”). Since the FPGA does not come with a DAC (although it comes with an ADC), I needed an external one. Probably the simplest solution would have been to use a DAC IC, but that would've been boring, so I decided to build my own. There are a couple of ways to do this, and I chose a ΔΣ DAC [11], [12]. The principles of ΔΣ ADCs and DACs are the same: a continuous signal is converted (usually) to a 1-bit signal at a much higher sampling rate. The oversampled digital signal contains the original signal plus a lot of high-frequency noise, as a result of noise shaping [13]. This digital signal can then be low-pass filtered to remove the noise and recover the original signal.

I implemented the modulator like this:

My algorithms generate two fixed-point values per clock tick; these values correspond to the X and Y coordinates of the electron beam position. Each value is fed into its own ΔΣ DAC, and the outputs go into the oscilloscope. Most of the ΔΣ DAC components run in the FPGA fabric, the exception being the low-pass filter (ΔΣ ADCs are the other way around).

ΔΣ modulation can be implemented in many ways, but it’s a good practice to start simple and increase complexity as needed. I decided to use a first order ΔΣ modulator with 1-bit output and an RC low-pass filter. The sampling frequency and RC values will be tuned later as needed and depending on the FPGA design.
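Before looking at the hardware description, the idea can be tried out in software. The following Python sketch models a first-order, 1-bit modulator as an accumulator whose carry bit is the output. The function name `delta_sigma` is made up for the example, and a plain average stands in for the analog RC filter, which is a simplifying assumption.

```python
def delta_sigma(samples, bits=16):
    """Yield one output bit per input sample (first-order, 1-bit output)."""
    mask = (1 << bits) - 1
    acc = 0
    for sample in samples:
        acc += sample & mask   # integrate the input
        yield acc >> bits      # the carry out of the accumulator is the output bit
        acc &= mask            # keep the remainder for the next tick


level = 0x4000                 # 0.25 of full scale for a 16-bit input
stream = list(delta_sigma([level] * 4096))
density = sum(stream) / len(stream)
print(density)  # 0.25
```

The density of ones in the bit stream equals the input divided by 2**bits, so averaging the stream (which is what the low-pass filter does) recovers the input level.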

The Verilog ΔΣ implementation is very simple:

module DeltaSigma(clk, in, out);
    parameter bits = 16;

    input wire clk;
    input wire [bits - 1 : 0] in;
    output reg out;

    // Accumulator with one extra bit that holds the carry.
    reg [bits : 0] sum = 0;

    always @(posedge clk)
    begin
        sum = sum + in;

        // The carry becomes the 1-bit output and is then cleared.
        if (sum[bits])
        begin
            sum[bits] = 0;
            out = 1;
        end
        else
            out = 0;
    end
endmodule



To test it, I built a triangle wave generator:

module Triangle(clk, phase, wave);
    parameter phaseBits = 12;
    parameter waveBits  = 12;

    input  wire clk;
    input  wire [phaseBits - 1 : 0] phase;
    output reg  [waveBits  - 1 : 0] wave = 0;

    always @(posedge clk)
    begin
        // Scale the phase bits below the two quadrant bits to the ramp width.
        if (phaseBits <= waveBits + 1)
            wave = phase[phaseBits - 3 : 0] << (waveBits - phaseBits + 1);
        else
            wave = phase[phaseBits - 3 : 0] >> (phaseBits - waveBits - 1);

        // The two top phase bits select the quadrant; 2'b11 keeps the ramp as is.
        case (phase[phaseBits - 1 : phaseBits - 2])
            2'b00:
                wave = wave | (1 << (waveBits - 1));
            2'b01:
                wave = wave ^ {(waveBits - 1){1'b1}} | (1 << (waveBits - 1));
            2'b10:
                wave = wave ^ {(waveBits - 1){1'b1}};
        endcase
    end
endmodule


The code (made available in [14]) produced the following 1 kHz waveform in my scope:

Note that the peaks of the waveform are slightly rounded. This happens because the bandwidth of the low-pass filter is not high enough, so it attenuates the high-frequency harmonics of the waveform.
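The rounding can be estimated from the filter response. A triangle wave contains only odd harmonics, with amplitudes that fall off as 1/n², and a first-order RC filter attenuates each of them by 1/sqrt(1 + (f/fc)²). The resistor and capacitor values in this Python sketch are hypothetical, chosen only for illustration; they are not the values used in the project.

```python
import math

R = 1_000.0    # ohms (hypothetical value)
C = 33e-9      # farads (hypothetical value)
F0 = 1_000.0   # fundamental of the triangle wave, in Hz

cutoff = 1.0 / (2.0 * math.pi * R * C)


def rc_gain(frequency, cutoff_frequency):
    """Magnitude response of a first-order RC low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (frequency / cutoff_frequency) ** 2)


print(f"cutoff = {cutoff:.0f} Hz")
for n in (1, 3, 5, 7, 9):        # a triangle wave only has odd harmonics
    ideal = 1.0 / n**2           # relative amplitude of harmonic n
    filtered = ideal * rc_gain(n * F0, cutoff)
    print(f"harmonic {n}: ideal {ideal:.4f}, after filter {filtered:.4f}")
```

The higher harmonics are the ones that make the peaks sharp, and they are the ones the filter removes first. Raising the cutoff sharpens the peaks but lets more modulator noise through, so the RC values are a compromise.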

# 4 FPGA ODE Solving

## 4.1 An Introduction to the Lorenz System

With the ΔΣ DAC working properly, I decided to go for something more challenging. Lissajous curves [15] have already been shown too many times on oscilloscopes, so I decided to solve ordinary differential equations [16] in the FPGA. One interesting ODE system is the Lorenz system [17], developed by Edward Lorenz while studying atmospheric convection, which can display chaotic behaviour [18] for certain parameter values. The ODE system is defined as:

$$
\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x) \\
\frac{dy}{dt} &= x (\rho - z) - y \\
\frac{dz}{dt} &= x y - \beta z
\end{aligned}
$$

Where σ, β and ρ are the system parameters. Before writing Verilog code, I empirically explored how the system behaves. I set the system parameters to values that are known to produce chaotic behaviour (σ = 10, β = 8/3, ρ = 28), and recorded the solution trajectories over a long time span to empirically measure the position and radius of the smallest sphere that encloses them. The sphere was found to be approximately at (0, 0, 24.5) with an approximate radius of 34. I wrote a small Python program to plot the attractor from 3 different viewpoints and get a rough idea of what to expect from the FPGA implementation:

import sys
import numpy
import scipy.integrate as integrate
import matplotlib.pyplot as pyplot

sigma = 10.
rho   = 28.
beta  = 8./3.

dt = 0.00001
arraySize = 100000
loopIterations = 50

resolution = 1920
gamma = 0.25

def lorenz(xyz, t):
    x, y, z = xyz
    x_dot = sigma * (y - x)
    y_dot = x * rho - x * z - y
    z_dot = x * y - beta * z
    return [x_dot, y_dot, z_dot]

imageXY = numpy.zeros((resolution, resolution))
imageYZ = numpy.zeros((resolution, resolution))
imageZX = numpy.zeros((resolution, resolution))

t = numpy.arange(arraySize) * dt

# Discard the first run so that the trajectory settles on the attractor.
initial = [8, 8, 23]
solution = integrate.odeint(lorenz, initial, t)
initial = list(solution[-1, :])

for i in range(loopIterations):
    print('.', end = '')
    sys.stdout.flush()

    solution = integrate.odeint(lorenz, initial, t)[1:]
    initial = list(solution[-1, :])

    # Map the trajectory to pixel coordinates.
    x = (( solution[:, 0]         * 0.015 + 0.5) * resolution)
    y = (( solution[:, 1]         * 0.015 + 0.5) * resolution)
    z = (((solution[:, 2] - 23.5) * 0.015 + 0.5) * resolution)

    # Accumulate a 2D histogram for each of the three viewpoints.
    for j in range(arraySize - 1):
        xi = int(x[j])
        yi = int(y[j])
        zi = int(z[j])

        imageXY[yi, xi] = imageXY[yi, xi] + 1
        imageYZ[zi, yi] = imageYZ[zi, yi] + 1
        imageZX[xi, zi] = imageZX[xi, zi] + 1

imageXY = imageXY ** gamma
imageYZ = imageYZ ** gamma
imageZX = imageZX ** gamma

colorMap = pyplot.cm.afmhot

normalizeXY = pyplot.Normalize(vmin = imageXY.min(), vmax = imageXY.max())
normalizeYZ = pyplot.Normalize(vmin = imageYZ.min(), vmax = imageYZ.max())
normalizeZX = pyplot.Normalize(vmin = imageZX.min(), vmax = imageZX.max())

rgbImageXY = colorMap(normalizeXY(imageXY))
rgbImageYZ = colorMap(normalizeYZ(imageYZ))
rgbImageZX = colorMap(normalizeZX(imageZX))

pyplot.imsave('LorenzXY.png', rgbImageXY, origin = 'lower')
pyplot.imsave('LorenzYZ.png', rgbImageYZ, origin = 'lower')
pyplot.imsave('LorenzZX.png', rgbImageZX, origin = 'lower')


And the code output is:

## 4.2 2D Lorenz System FPGA Implementation

The Verilog implementation uses 4 modules: Lorenz, Translate2, Scale2 and DeltaSigma, as can be seen in the schematic:

I used the 12 MHz clock for the whole design. The Lorenz module uses Euler's method [19] to solve the ODE. It takes a dt parameter and outputs a 3D vector. Internally the Lorenz module uses Q6.25 fixed-point numbers and the parameters σ = 10, β = 8/3 and ρ = 28. I only take the X and Z components of the 3D vector and linearly transform them in 3 steps. First the 2D (X, Z) trajectories are centred on the origin (0, 0) (translate1), then they are scaled (scale) so that the components of the vector fall between -0.5 and 0.5, and finally they are translated (translate2) so that they fall between 0 and 1. The linear transformations were implemented with a focus on simplicity and ease of tweaking; if FPGA resources were limited, the transformations could have been implemented in a single step or with transformation matrices [20]. Finally, the output of the linear transformations is fed into two ΔΣ modulators, which are connected to two RC low-pass filters.
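The data path described above can be mimicked with Python integers. This is a sketch under stated assumptions rather than a bit-accurate model: the function names are made up, Python integers never overflow (the FPGA registers do), and the constants are copied from the description in the text.

```python
FRACTION_BITS = 25                  # Q6.25, as in the text
DT_SHIFT = 32                       # dt is stored as an integer scaled by 2**32
ONE = 1 << FRACTION_BITS

SIGMA = int(10.0 * ONE)
BETA = int((8.0 / 3.0) * ONE)
RHO = int(28.0 * ONE)
DT = int(0.0001 * (1 << DT_SHIFT))


def euler_step(x, y, z):
    """One Euler step of the Lorenz system on Q6.25 integers."""
    dxdt = (SIGMA * (y - x)) >> FRACTION_BITS
    dydt = ((x * (RHO - z)) >> FRACTION_BITS) - y
    dzdt = (x * y - BETA * z) >> FRACTION_BITS
    # All three derivatives use the old state; only then is the state updated.
    return (x + ((dxdt * DT) >> DT_SHIFT),
            y + ((dydt * DT) >> DT_SHIFT),
            z + ((dzdt * DT) >> DT_SHIFT))


def to_dac(value, offset, scale=int(0.015 * ONE), half=int(0.5 * ONE)):
    """translate1 -> scale -> translate2, mapping a coordinate into 0..1."""
    centred = value + offset
    scaled = (centred * scale) >> FRACTION_BITS
    return scaled + half


state = (8 * ONE, 8 * ONE, 27 * ONE)
z_offset = int(-23.5 * ONE)
for _ in range(20000):
    state = euler_step(*state)

u = to_dac(state[0], 0) / ONE          # horizontal deflection, 0..1
v = to_dac(state[2], z_offset) / ONE   # vertical deflection, 0..1
print(u, v)
```

Every multiplication of two Q6.25 values is followed by a right shift by the number of fractional bits, and the multiplication by dt is followed by a shift by 32, so the state always stays in Q6.25.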

This is the Verilog code (also made available in [21]):

module Waveform(clk, btn, led, rgb, pinX, pinY);
    localparam integerBits    = 6;
    localparam fractionBits   = 25;
    localparam totalBits      = 1 + integerBits + fractionBits;
    localparam dtBits         = 20;
    localparam dtShift        = 32;
    localparam deltaSigmaBits = 16;
    localparam sigma = $rtoi(       10.0  * (2.0 ** fractionBits));
    localparam beta  = $rtoi((8.0 / 3.0) * (2.0 ** fractionBits));
    localparam rho   = $rtoi(       28.0  * (2.0 ** fractionBits));

    input  wire clk;
    input  wire [1 : 0] btn;
    output reg  [3 : 0] led = 4'b0000;
    output reg  [2 : 0] rgb = 3'b111;
    output wire pinX;
    output wire pinY;

    reg signed [dtBits - 1 : 0] dt = 0.0001 * (2.0 ** dtShift);

    wire signed [totalBits - 1 : 0] x0;
    wire signed [totalBits - 1 : 0] y0;
    wire signed [totalBits - 1 : 0] x1;
    wire signed [totalBits - 1 : 0] y1;
    wire signed [totalBits - 1 : 0] x2;
    wire signed [totalBits - 1 : 0] y2;
    wire signed [totalBits - 1 : 0] x3;
    wire signed [totalBits - 1 : 0] y3;

    Lorenz #
    (
        .integerBits(integerBits),
        .fractionBits(fractionBits),
        .dtBits(dtBits),
        .dtShift(dtShift),
        .sigma(sigma),
        .beta(beta),
        .rho(rho)
    )
    lorenz(.clk(clk), .dt(dt), .x(x0), .y(), .z(y0));

    Translate2 #(.bits(totalBits)) translate1(clk, x0, y0, 0, $rtoi(-23.5 * (2.0 ** fractionBits)), x1, y1);
    Scale2 #(.integerBits(integerBits), .fractionBits(fractionBits)) scale(clk, x1, y1, $rtoi(0.015 * (2.0 ** fractionBits)), $rtoi(0.015 * (2.0 ** fractionBits)), x2, y2);
    Translate2 #(.bits(totalBits)) translate2(clk, x2, y2, $rtoi(0.5 * (2.0 ** fractionBits)), $rtoi(0.5 * (2.0 ** fractionBits)), x3, y3);

    DeltaSigma #(.bits(deltaSigmaBits)) deltaSigmaX(clk, x3[totalBits - 1 : fractionBits - deltaSigmaBits], pinX);
    DeltaSigma #(.bits(deltaSigmaBits)) deltaSigmaY(clk, y3[totalBits - 1 : fractionBits - deltaSigmaBits], pinY);
endmodule

module Lorenz(clk, dt, x, y, z);
    parameter integerBits  = 6;
    parameter fractionBits = 25;
    parameter dtBits       = 16;
    parameter dtShift      = 32;
    parameter signed [integerBits + fractionBits : 0] sigma  =        10.0 * (2.0 ** fractionBits);
    parameter signed [integerBits + fractionBits : 0] beta   = (8.0 / 3.0) * (2.0 ** fractionBits);
    parameter signed [integerBits + fractionBits : 0] rho    =        28.0 * (2.0 ** fractionBits);

    localparam totalBits = 1 + integerBits + fractionBits;

    input  wire clk;
    input  wire signed [dtBits    - 1 : 0] dt;
    output reg  signed [totalBits - 1 : 0] x =  8.00 * (2.0 ** fractionBits);
    output reg  signed [totalBits - 1 : 0] y =  8.00 * (2.0 ** fractionBits);
    output reg  signed [totalBits - 1 : 0] z = 27.00 * (2.0 ** fractionBits);

    // Double-width registers hold the products before they are shifted back.
    reg signed [totalBits * 2 - 1 : 0] dxdt = 0;
    reg signed [totalBits * 2 - 1 : 0] dydt = 0;
    reg signed [totalBits * 2 - 1 : 0] dzdt = 0;

    always @(posedge clk)
    begin
        // Derivatives of the Lorenz system, computed from the current state.
        dxdt = (sigma * (y - x)) >>> fractionBits;
        dydt = ((x * (rho - z)) >>> fractionBits) - y;
        dzdt = (x * y - beta * z) >>> fractionBits;

        // Euler step: state = state + derivative * dt.
        x = x + ((dxdt * dt) >>> dtShift);
        y = y + ((dydt * dt) >>> dtShift);
        z = z + ((dzdt * dt) >>> dtShift);
    end
endmodule

module Scale2(clk, xIn, yIn, xScale, yScale, xOut, yOut);
    parameter integerBits   = 6;
    parameter fractionBits  = 25;

    localparam totalBits = 1 + integerBits + fractionBits;
    localparam multiplicationBits = totalBits + fractionBits;

    input  wire clk;
    input  wire signed [totalBits - 1 : 0] xIn;
    input  wire signed [totalBits - 1 : 0] yIn;
    input  wire signed [totalBits - 1 : 0] xScale;
    input  wire signed [totalBits - 1 : 0] yScale;
    output reg  signed [totalBits - 1 : 0] xOut = 0;
    output reg  signed [totalBits - 1 : 0] yOut = 0;

    // Fixed-point multiplication: multiply, then shift out the extra fraction bits.
    wire signed [multiplicationBits - 1 : 0] x = (xIn * xScale) >>> fractionBits;
    wire signed [multiplicationBits - 1 : 0] y = (yIn * yScale) >>> fractionBits;

    always @(posedge clk)
    begin
        xOut <= x;
        yOut <= y;
    end
endmodule

module Translate2(clk, xIn, yIn, xTranslation, yTranslation, xOut, yOut);
    parameter bits = 32;

    input  wire clk;
    input  wire signed [bits - 1 : 0] xIn;
    input  wire signed [bits - 1 : 0] yIn;
    input  wire signed [bits - 1 : 0] xTranslation;
    input  wire signed [bits - 1 : 0] yTranslation;
    output reg  signed [bits - 1 : 0] xOut = 0;
    output reg  signed [bits - 1 : 0] yOut = 0;

    always @(posedge clk)
    begin
        xOut <= xIn + xTranslation;
        yOut <= yIn + yTranslation;
    end
endmodule

module DeltaSigma(clk, in, out);
    parameter bits = 32;

    input wire clk;
    input wire [bits - 1 : 0] in;
    output reg out = 0;

    reg [bits : 0] sum = 0;

    always @(posedge clk)
    begin
        // Accumulate, output the carry bit, then clear it.
        sum = sum + in;
        out = sum[bits];
        sum[bits] = 0;
    end
endmodule



And here is the video of the output:

## 4.3 3D Lorenz System FPGA Implementation

The previous design shows the Lorenz system trajectory, but the system evolves too fast to let us appreciate its evolution. Moreover, with a fixed viewpoint it is not possible to appreciate the 3D shape of the trajectories. These are the two issues I tried to solve next. A naive approach to the first issue would be to reduce dt so that the system evolves more slowly. The problem with this approach is that it would make the visible trajectories shorter, or even reduce them to a single point, as neither phosphor persistence [22] nor persistence of vision [23] would keep them visible. I solved this by redrawing the trajectory multiple times and slowly advancing the initial position of the trajectory on each redraw. The second issue was solved by continuously rotating the trajectories in 3D.
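The redraw idea can be sketched in a few lines of Python. Floating point is used for brevity, and the frame length, the advance per frame and the function names are hypothetical values chosen to keep the example small; they are not the ones used in the FPGA design.

```python
def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system."""
    x, y, z = state
    return (x + sigma * (y - x) * dt,
            y + (x * (rho - z) - y) * dt,
            z + (x * y - beta * z) * dt)


def frames(start, points_per_frame, advance, frame_count):
    """Yield `frame_count` trajectories, each starting slightly later than the last."""
    for _ in range(frame_count):
        state = start
        trajectory = [state]
        for _ in range(points_per_frame - 1):
            state = lorenz_step(state)
            trajectory.append(state)
        yield trajectory
        # Move the starting point a few steps along the trajectory for the next redraw.
        for _ in range(advance):
            start = lorenz_step(start)


all_frames = list(frames((8.0, 8.0, 27.0), points_per_frame=1000, advance=5, frame_count=3))
print(all_frames[1][0] == all_frames[0][5])  # True: frame 1 starts where frame 0 was after 5 steps
```

Each frame draws a long piece of the trajectory at full speed, so it stays bright, while the slowly moving starting point sets the apparent speed of the animation.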

The design uses the modules ClockWizard (by Xilinx), Lorenz (an updated version), Translate3, Scale3, SinCos, Rotate2, Translate2 and DeltaSigma:

ClockWizard generates two clock signals, one at 25 MHz (clkSlow) and another at 250 MHz (clkFast); the slow clock is used everywhere except in the delta-sigma modulator. The updated version of the Lorenz module redraws the trajectory once every 2^18 clkSlow cycles, and at every redraw the trajectory advances. It is also worth noting that this time all 3 components of the 3D vector are used. As in the 2D Lorenz system, the 3D vector is first translated (translate1) and then scaled (scale), but before the final translation (translate2) I perform two rotations around 2 axes. The rotation angles vary continuously, depending on the values of the xyPhase and yzPhase accumulators. These values are fed into two SinCos modules, which use a look-up table and linear interpolation to compute the sine and cosine of the angle. The sine and cosine values are used by the Rotate2 modules to perform the rotations. Finally, just like in the 2D Lorenz system design, the X and Y values are fed into delta-sigma modulators connected to low-pass filters, which are probed by the oscilloscope.
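The look-up table with linear interpolation and the 2D rotation can be illustrated in Python. The table size, the phase width and the function names below are assumptions made for this sketch, and floating-point table entries are used instead of fixed-point ones.

```python
import math

PHASE_BITS = 16                        # width of the phase accumulator (assumed)
TABLE_BITS = 6                         # 64 table entries per full turn (assumed)
FRACTION_BITS = PHASE_BITS - TABLE_BITS
TABLE = [math.sin(2.0 * math.pi * i / (1 << TABLE_BITS)) for i in range(1 << TABLE_BITS)]


def sin_lut(phase):
    """Sine of an integer phase, where 2**PHASE_BITS corresponds to a full turn."""
    phase &= (1 << PHASE_BITS) - 1
    index = phase >> FRACTION_BITS                 # upper bits select the table entry
    fraction = (phase & ((1 << FRACTION_BITS) - 1)) / (1 << FRACTION_BITS)
    low = TABLE[index]
    high = TABLE[(index + 1) % len(TABLE)]         # wrap around at the end of the table
    return low + (high - low) * fraction           # linear interpolation


def sin_cos(phase):
    """Cosine is the sine advanced by a quarter turn."""
    return sin_lut(phase), sin_lut(phase + (1 << (PHASE_BITS - 2)))


def rotate2(x, y, phase):
    """Rotate the point (x, y) counter-clockwise by the given phase."""
    s, c = sin_cos(phase)
    return x * c - y * s, x * s + y * c


worst = max(abs(sin_lut(p) - math.sin(2.0 * math.pi * p / (1 << PHASE_BITS)))
            for p in range(1 << PHASE_BITS))
print(worst < 2e-3)                              # True
print(rotate2(1.0, 0.0, 1 << (PHASE_BITS - 2)))  # approximately (0.0, 1.0)
```

Because the phase is an integer that wraps around naturally, a continuously rotating view only needs an accumulator that is incremented on every frame. Applying `rotate2` to two different pairs of coordinates gives the rotations around 2 axes.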

The Verilog code is available on GitHub ([24]), and this is the video of the output:

## 4.4 Bonus Track

My initial plan for this project was to use a Zynq to perform high-level operations in the processing system (PS), while leaving linear transformations, interpolations and ΔΣ modulation to the programmable logic (PL). I didn't have a Zynq, so I applied to the Path II Programmable program and to a MiniZed giveaway at that time. I missed both chances of getting a Zynq, so I decided to stick to my Cmod S7 and explore what I could do with it. Next I'll show you what I could not do with the little FPGA, but could do using two different suboptimal approaches. So what did I do? I played a video on the oscilloscope! I didn't turn the oscilloscope into a CRT TV (which has already been done many times), but converted the video into a set of contour lines which the oscilloscope could draw in XY mode.

The process can be separated into two phases: video preprocessing and video display. The preprocessing itself can also be separated into multiple phases. First I conditioned the image for edge detection, then I applied a Canny edge detector [25] (all this was done using OpenCV). The painful part came next: the conversion of the raster edge image into a set of lines which the oscilloscope could draw. Finding the optimal electron gun track, the one that minimizes the distance travelled to draw all lines, is computationally very expensive, so I applied a lot of heuristics to get an acceptable result. The process begins with the segmentation of the edge image into blocks of contiguous pixels. For each of these blocks I tried to extract the longest line that I could make out of its pixels, and repeated that until no pixels were left. These lines represent edges of a graph and are used to build a single track for the electron gun to travel.

Tracks had on average ~8k points (or pixels), but this varied a lot between frames, so it had to be compensated in some way so that every frame takes the same amount of time to draw. The quickest way to see the result was to generate a stereo audio signal and feed it into the oscilloscope:
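To give an idea of what such a heuristic can look like, here is a generic greedy nearest-neighbour ordering in Python. It is not the track generator used for the videos, only a common textbook-style approximation, and the function name and sample lines are made up.

```python
import math


def order_lines(lines, start=(0.0, 0.0)):
    """Greedy nearest-neighbour ordering of polylines.

    Each polyline is a list of (x, y) points. At every step the polyline whose
    nearest end is closest to the current beam position is drawn next, reversed
    if needed so that drawing starts from that end.
    """
    remaining = [list(line) for line in lines]
    position = start
    track = []
    while remaining:
        best_index, best_reversed, best_distance = 0, False, math.inf
        for index, line in enumerate(remaining):
            for reverse, end in ((False, line[0]), (True, line[-1])):
                distance = math.dist(position, end)
                if distance < best_distance:
                    best_index, best_reversed, best_distance = index, reverse, distance
        line = remaining.pop(best_index)
        if best_reversed:
            line.reverse()
        track.extend(line)
        position = line[-1]
    return track


lines = [
    [(10, 10), (12, 10)],
    [(0, 1), (5, 1)],
    [(9, 1), (6, 1)],
]
track = order_lines(lines)
print(track)  # [(0, 1), (5, 1), (6, 1), (9, 1), (10, 10), (12, 10)]
```

The greedy choice is cheap (quadratic in the number of lines) but can leave long jumps for the end of the track, which is one reason why several heuristics are usually combined.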

Vectorization of Pink Floyd's "What shall we do now?" played on sound card
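One possible way to make every frame take the same time is to resample each track to a fixed number of points spaced evenly along its length. The Python sketch below shows that approach; it is an assumption for illustration and not necessarily the compensation used for the videos, and `resample` is a made-up name.

```python
import math


def resample(track, count):
    """Return `count` points spaced evenly along the polyline `track`.

    Requires at least two points in `track` and `count` >= 2.
    """
    # Cumulative distance travelled at each original point.
    distances = [0.0]
    for a, b in zip(track, track[1:]):
        distances.append(distances[-1] + math.dist(a, b))
    total = distances[-1]

    result = []
    segment = 0
    for i in range(count):
        target = total * i / (count - 1)
        # Advance to the segment that contains the target distance.
        while segment < len(track) - 2 and distances[segment + 1] < target:
            segment += 1
        length = distances[segment + 1] - distances[segment]
        t = 0.0 if length == 0.0 else (target - distances[segment]) / length
        (x0, y0), (x1, y1) = track[segment], track[segment + 1]
        result.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return result


track = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
points = resample(track, 5)
print(points)  # [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]
```

With a fixed point count and a fixed output sample rate, every frame lasts equally long. The price is that detailed frames are drawn with coarser steps than simple ones.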

The sound card DAC has high resolution (usually 24 bit), but its low ~22 kHz bandwidth and inability to pass low frequencies make it less than ideal. I found that I need a refresh rate of ~40 Hz to produce a flicker-free (to the eye) image on the oscilloscope, but to reach such a refresh rate I need to drop details of the image; keeping them, on the other hand, produces a noticeable flicker. The inability to pass low frequencies produces a constant jiggling of the image, which was especially noticeable at the end of the video.

I have an ESP32, which comes with two 8-bit DACs that don't have the bandwidth and band-pass limitations of the sound card, so I decided to give it a try. As the ESP32 can't keep the complete video sequence in memory, I decided to feed each frame to the MCU through its serial port (set to 921600 baud), but this is still not fast enough for real-time video rendering. The ESP32 program uses double buffering: while one buffer is being displayed, the other receives the next "frame" track through the serial port. The accelerated playback of the video looks like this:

Vectorization of Pink Floyd's "What shall we do now?" played on ESP32
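A back-of-the-envelope calculation shows why the serial link cannot keep up in real time. The framing and the number of bytes per point are assumptions (8N1 and one byte per axis); the baud rate and the average track length come from the text.

```python
BAUD_RATE = 921_600          # from the text
BITS_PER_BYTE_ON_WIRE = 10   # assumes 8N1 framing: 1 start + 8 data + 1 stop bit
POINTS_PER_FRAME = 8_000     # average track length mentioned earlier
BYTES_PER_POINT = 2          # assumes one byte for X and one for Y (8-bit DACs)

bytes_per_second = BAUD_RATE / BITS_PER_BYTE_ON_WIRE
frame_bytes = POINTS_PER_FRAME * BYTES_PER_POINT
frames_per_second = bytes_per_second / frame_bytes

print(bytes_per_second)   # 92160.0
print(frames_per_second)  # 5.76
```

Under these assumptions the link delivers fewer than 6 average frames per second, far below the ~40 Hz refresh rate mentioned above, so each received frame has to be redrawn several times while the next one arrives.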

The low 8-bit resolution is quite noticeable. A low-pass filter can reduce the pixelation of the lines, but it also makes long-distance electron gun "jumps" more visible. The glowing points appear when the ESP32 receives data from the serial port, which I suspect may be avoidable by using one core for the DAC and another for serial communication.

At this point the reason why the little Cmod S7 could not be used here may be obvious: memory! That doesn't mean it can't be solved in some way, for example by adding SRAM, but I would probably have missed the deadline if I had tried that...

# 5 Final Words

I showed a couple of techniques that I hope others interested in FPGAs will find useful. I showed how to build a low-cost ΔΣ DAC that just requires a single pin, a ground and a low-pass filter. I also showed how fixed-point numbers can be used to perform real number operations, such as computing the sine and cosine, 3D linear transformations such as rotations, translations and scaling, and also ODE solving through Euler's method. And maybe more importantly, I showed how we can use all these techniques to 3D render a Lorenz attractor on a CRT oscilloscope. As an alternative to the Zynq I used a sound card and an ESP32 to trace the contours of a video. Most of the time was spent coding the electron gun track generator, and to a lesser extent coding the ESP32 program and the PC program that fed the ESP32 with the frame tracks.

In the FPGA projects most of the time was spent on debugging, so it's worth mentioning how it was performed. Simple bugs just required me to simulate a few time steps of the top module, as these bugs didn't depend on any particular state of the Lorenz system and could be detected at any moment of the simulation. More complex bugs were related to specific states of the system; to find them I had to simulate a few milliseconds, or in the worst case up to a few seconds (which of course took several minutes in the simulator). Only in very few cases could bugs be found by checking the simulator waveforms; in most cases I had to store them in a (very large) file and then analyze them in Python. I coded many small programs to catch the particular bugs I faced during development, but I also coded a Python version of the Verilog code that I could use to test as much as possible in Python without having to simulate or program the FPGA, and a "software oscilloscope", that is, a program that reads the simulated output and generates the oscilloscope image. As you may guess, debugging was very time consuming!

During development there was one particular bug that took me ages to find, and the reason was that the design worked perfectly fine on the simulator but failed on the FPGA. To find the bug I began removing parts of the code to see if what was left worked as it should. I ended up removing almost everything, and the bug didn't disappear. I thought maybe it had to do with some obscure timing issue, but I couldn't find anything to blame for timing bugs. This was of course very frustrating, but at some point I found a minor difference between my buggy code and another Lorenz attractor I wrote some time ago that just worked. That minor difference caused the simulator to run just fine, but fail on the FPGA. I found that the Verilog implementations of Vivado's simulator and synthesizer differ! Let me show you what I found with an example:

The code:

reg [7 : 0] x = 123.456;


will set x to 123 on the simulator and synthesizer, but:

wire [7 : 0] x;
Increment inc(clk, 123.456, x);

module Increment(clk, xIn, xOut);
    input  wire clk;
    input  wire [7 : 0] xIn;
    output  reg [7 : 0] xOut;

    always @(posedge clk)
    begin
        xOut = xIn + 1;
    end
endmodule


will set x to 123 on the simulator, and to 119 on the synthesizer. What the 119 is, I don't know, but it's likely some binary real number representation.
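One way to test that guess is to look at the raw bits of the constant. The value 119 happens to equal the low byte of the IEEE 754 double-precision encoding of 123.456, which would be consistent with the synthesizer handing the raw 64-bit pattern to the 8-bit port. This is a hypothesis that the experiment above does not prove, and the Python sketch below only shows the arithmetic.

```python
import struct

value = 123.456
# Reinterpret the 64-bit IEEE 754 double as an unsigned integer.
(bits,) = struct.unpack("<Q", struct.pack("<d", value))

print(hex(bits))    # 0x405edd2f1a9fbe77
print(bits & 0xFF)  # 119: the low 8 bits of the raw encoding
print(int(value))   # 123: what a real-to-integer conversion gives
```

Wrapping real constants in an explicit conversion such as `$rtoi`, as the Waveform module does for its parameters, avoids depending on how each tool treats a real value connected to an integer port.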

I have found other differences between the simulator and synthesizer Verilog implementations. For instance, in the Increment example above, if we change line two to "Increment (clk, 123.456, x);", the simulator will complain, but the synthesizer won't! So the message to anyone interested in developing on FPGAs is that it is hard and time consuming!

There are many other variations I thought about exploring during development, such as adding perspective projection, making the animation react to sound (music) or controlling the animation through a soft-core CPU, but what I showed you here is what I managed to do for this Project14.