
FPGA Group

45 posts


I had a great opportunity to test and review Terasic's new flagship, the DE10-Standard FPGA-SoC board.

We will look at what this product has to offer, who it is aimed at, and whether it is worth the money.

Jan Cumps

XuLA2 FPGA - Up the Clock

Posted by Jan Cumps Top Member Feb 4, 2017

The XuLA2 standard runs on a 12 MHz clock. That's plenty for many things, but not enough for some designs.

In my PWM with DeadBand project, for instance, the effective signal frequency that the module outputs is halved for each bit of precision of the duty cycle register.

If you want to have 256 steps between 0 and 100% duty cycle, you need 8-bit precision and your maximum PWM output frequency is 47 kHz. When you need higher PWM frequency with the same duty cycle granularity, you can use a Digital Clock Manager to generate a (much!) faster clock signal for the PWM module.

I need a minimum 1 MHz output for the GaN half-bridge that I'm driving (an LMG5200). The deadband should be in the range of 8 - 10 ns.

Let's assume that we allow for 2 ns steps (we can then set a deadband of 10 ns by skipping 5 ticks).

2 ns means that our input frequency has to be 500 MHz. We can't do that - it's beyond the capabilities of the Digital Clock Manager of the Spartan-6.

4ns is doable but a stretch. It'll require an up-sample to 250 MHz.


Only 12 MHz! Now what

That's the title of the Xess VHDL Tutorial chapter that covers this concept. Look there for the explanation.

I'll focus on a practical application.



library UNISIM;
use UNISIM.VComponents.all;
-- ...
architecture Behavioral of Rotary_Pwm is
-- ...
  signal clk_fast     : std_logic;
begin

   DCM_SP_inst : DCM_SP
   generic map (
      CLKFX_DIVIDE => 1,                     -- Divide value on CLKFX outputs - D - (1-32)
      CLKFX_MULTIPLY => 22                   -- Multiply value on CLKFX outputs - M - (2-32)
   )
   port map (
      CLKFX => clk_fast,    -- 1-bit output: Digital Frequency Synthesizer output (DFS)
      CLKIN => clk_i,       -- 1-bit input: Clock input
      RST => '0'            -- 1-bit input: Active high reset input
   );

  u0 : PwmDeadBand
    port map (
      clk_i  => clk_fast,   -- only the PWM module runs on the fast clock
      duty_i => accumulator_s,
      band_i => 64,
      pwmA_o => pwmA_o,
      pwmB_o => pwmB_o
    );
-- ...



The DCM_SP_inst is an instance of the Spartan-specific DCM_SP primitive. It's not a standard VHDL construct.

At this point, our design becomes device dependent. You can't just port it to another FPGA.


I've used the following design decisions:

To get the 4ns step rate the minimum frequency is 1/4ns = 250 MHz.

Our output frequency is going to be that frequency divided by 256, so we're just under our 1 MHz output goal.

We'll have to up it to at least 256 MHz.

With a 12 MHz clock as input, we'll need to multiply that clock by 22 to get 264 MHz.

Each step will have a period of 3.8 ns - close enough to the 4 ns we're aiming for.
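
The arithmetic behind these decisions can be checked with a few lines of Python. This is only a sketch of the calculation, not part of the FPGA project: the function name plan_clock and its parameters are made up for illustration, it assumes CLKFX_DIVIDE stays at 1, and the 2 to 32 multiplier range is taken from the CLKFX_MULTIPLY comment in the listing above.

```python
import math

def plan_clock(f_in_hz, duty_bits, f_pwm_min_hz, m_range=(2, 32)):
    """Pick the smallest integer clock multiplier (divider assumed to be 1)
    that still gives at least f_pwm_min_hz at the PWM output."""
    steps = 2 ** duty_bits                  # 256 steps for an 8-bit duty register
    f_fast_min = f_pwm_min_hz * steps       # slowest fast clock that meets the goal
    m = math.ceil(f_fast_min / f_in_hz)
    if not m_range[0] <= m <= m_range[1]:
        raise ValueError("multiplier %d is outside the allowed range" % m)
    f_fast = f_in_hz * m
    # multiplier, fast clock (Hz), PWM frequency (Hz), tick period (ns)
    return m, f_fast, f_fast / steps, 1e9 / f_fast

m, f_fast, f_pwm, tick_ns = plan_clock(12e6, 8, 1e6)
print(m, f_fast / 1e6, f_pwm / 1e6, round(tick_ns, 2))
# prints: 22 264.0 1.03125 3.79
```

The same function shows the trade-off quickly: asking for a 9-bit duty register at 1 MHz raises an error, because the required multiplier would be 43.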


We only have to pass that fast clock to the PWM module.

There's no reason to route it to the rotary encoder, it can keep running on the 12 MHz clacker.



photo: Bart Simpson looks over a wall


In the capture above, I've set a deadband of 64 clock ticks (to have something measurable on the scope).

The signal frequency is 1.03127 MHz (theoretical 12 MHz * 22 / 256 = 1.031250 MHz)

The time between the cursors is 244 ns for 64 dead band ticks. So the measured granularity of our deadband is 244ns / 64 = 3.8125 ns.


Here's a capture with 8 clock ticks.


My GaN board will run with 2 or 3 ticks - I'd have to use a different probe technique to show that.

Good captures in the neighbourhood of 8 ns (that's the deadband that I'll try to achieve) are hard on a 50 MHz scope.
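
To relate tick counts to time, here is a small sketch that converts dead band ticks to nanoseconds and back. It assumes the 264 MHz fast clock (12 MHz * 22) discussed in this post; the helper names deadband_ns and ticks_for are hypothetical.

```python
F_FAST_HZ = 12e6 * 22           # 264 MHz fast clock out of the DCM
TICK_NS = 1e9 / F_FAST_HZ       # about 3.79 ns per tick

def deadband_ns(ticks):
    """Dead band duration for a given band_i value."""
    return ticks * TICK_NS

def ticks_for(target_ns):
    """Nearest whole number of ticks for a target dead band."""
    return round(target_ns / TICK_NS)

for ticks in (2, 3, 8, 64):
    print(ticks, round(deadband_ns(ticks), 1))
# prints:
# 2 7.6
# 3 11.4
# 8 30.3
# 64 242.4

print(ticks_for(8), ticks_for(10))   # prints: 2 3
```

The theoretical 242.4 ns for 64 ticks sits close to the 244 ns measured between the cursors, and 2 or 3 ticks bracket the 8 - 10 ns target.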



XuLA2 FPGA - First Impressions of the Development Tools
XuLA2 FPGA - SD Card Read and Write
XuLA2 FPGA - Rotary Encoder and VHDL
XuLA2 FPGA - PWM with Dead Band in VHDL
XuLA2 FPGA - Up the Clock

A PWM module for FPGAs that supports dead band.


A VHDL project that generates two opposite PWM signals with a dead band. You can change the duty cycle with a rotary encoder.

When you drive half-bridge designs, you need a control signal for both transistors in the circuit.

These signals need to be each other's opposite, because you close one transistor when you drive the other.

At the switching time, you introduce a tiny bit of dead time, to allow one transistor to properly shut before the other opens.

If you don't allow for this stabilisation period, your transistors will get hot and the magic smoke will eventually (sooner rather than later) escape.


I've made a VHDL module that generates these complementary signals, including a configurable deadband.

You decide in your design what the frequency and deadband is.

You can then freely change the duty cycle. The FPGA takes care that the dead time is guaranteed.


PWM VHDL module


My design is 100% based on the Xess XuLA2 PWM library. I've added the complementary signal and introduced that delay for rising edges of both outputs.


entity PwmDeadBand is
  port (
    clk_i  : in  std_logic;             -- Input clock.
    duty_i : in  std_logic_vector;      -- Duty-cycle input.
    band_i : in  natural;               -- Number of clock ticks to keep both signals low before a rising edge.
    pwmA_o : out std_logic;             -- PWM output.
    pwmB_o : out std_logic              -- PWM output inverse.
  );
end entity;


duty_i is a register that holds the desired duty cycle. It's configurable - I've set it to 8 bits.


Your clock speed is dependent on the size of this register.

The clock that you present on the clk_i input of the module will be divided by 2^(number of bits). In this case, 2^8 -> 256.

For the standard 12 MHz clock of the XuLA2, the PWM module will beat at 47 kHz.

There are ways to increase the clock frequency in the FPGA and the XuLA2 libs have support for that.


The dead band, in clock ticks, is passed via the band_i pin. The two complementary signals appear on pwmA_o and pwmB_o.

In the constraint file of your project, you assign that to physical Spartan-6 pins:


# PM1 connections for the pwm outputs
net pwmA_o      loc=m16;
net pwmB_o      loc=k16;


If you use a StickIt! motherboard, you get the signals at PM1, pin D4 and D6.

If you tap them from the XuLA2 directly, they are chan4 and chan6 on the expansion header.


The implementation is just an extension of what the original Xess library does.

We introduce an additional channel that's HI when the other is LO, and vice versa.

And we hold off driving either of these channels high until we've waited band_i clock ticks.


architecture arch of PwmDeadBand is
  constant MAX_DUTY_C : std_logic_vector(duty_i'range) := (duty_i'range => ONE);
  signal timer_r      : natural range 0 to 2**duty_i'length-1;
begin

  process(clk_i)
  begin
    if rising_edge(clk_i) then
      pwmA_o  <= LO;                  -- default: A stays low unless set below

      timer_r <= timer_r + 1;
      -- B goes high once the dead band after A's on-time has passed
      if timer_r >= band_i + TO_INTEGER(unsigned(duty_i)) then
        pwmB_o <= HI;
      end if;

      if timer_r < TO_INTEGER(unsigned(duty_i)) then
        pwmB_o <= LO;
        -- A goes high once the dead band at the start of the period has passed
        if timer_r >= band_i then
          pwmA_o <= HI;
        end if;
      end if;
    end if;
  end process;
end architecture;


That's really all for the PWM module.
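
To see what the clocked process does tick by tick, here is a small behavioural model in Python. It is a sketch for reasoning about the timing, not generated from or verified against the VHDL: the function name simulate_pwm_deadband is made up, the counter is assumed to wrap at 2^bits as it would in hardware, and both outputs are assumed to start low.

```python
def simulate_pwm_deadband(duty, band, bits=8, periods=2):
    """Cycle-by-cycle model of the clocked process. Returns a list of
    (pwmA, pwmB) register values, one entry per clock tick."""
    modulus = 2 ** bits
    timer, pwm_a, pwm_b = 0, 0, 0      # assumption: outputs start low
    trace = []
    for _ in range(periods * modulus):
        # Signal assignments take effect at the end of the tick, so every
        # comparison uses the timer value from before the increment.
        next_a = 0                     # pwmA_o <= LO (default)
        next_b = pwm_b                 # pwmB_o keeps its value unless assigned
        if timer >= band + duty:
            next_b = 1
        if timer < duty:
            next_b = 0
            if timer >= band:
                next_a = 1
        timer = (timer + 1) % modulus  # assumption: the counter wraps
        pwm_a, pwm_b = next_a, next_b
        trace.append((pwm_a, pwm_b))
    return trace

trace = simulate_pwm_deadband(duty=127, band=16)
last = trace[-256:]                    # one full PWM period
print(sum(a for a, _ in last), sum(b for _, b in last))   # prints: 111 113
print(any(a and b for a, b in trace))                     # prints: False
```

With duty 127 and a band of 16 ticks, output A is high for 111 ticks and output B for 113 ticks of each 256-tick period. The remaining 32 ticks are the two dead bands, and the outputs are never high at the same time.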


Rotary Encoder module


This section is very short: read the previous blog post.


Patching it together


Also easy. We just have to wire the register that holds the value of the Rotary Encoder to the one that's driving the PWM module.

I just use the same register. That's the simplest way to do this.



entity Rotary_Pwm is
    Port ( clk_i       : in  STD_LOGIC;
           rotEncA_i   : in  std_logic;        -- Rotary encoder phase 1 output.
           rotEncB_i   : in  std_logic;        -- Rotary encoder phase 2 output.
           pwmA_o      : out STD_LOGIC;
           pwmB_o      : out STD_LOGIC
         );
end Rotary_Pwm;

architecture Behavioral of Rotary_Pwm is

  signal accumulator_s : std_logic_vector(7 downto 0) := "01111111"; -- 50%

begin

  u0 : PwmDeadBand
    port map (
      clk_i  => clk_i,
      duty_i => accumulator_s,   -- duty cycle comes straight from the encoder counter
      band_i => 16,
      pwmA_o => pwmA_o,
      pwmB_o => pwmB_o
    );

  u1 : RotaryEncoderWithCounter
    generic map (ALLOW_ROLLOVER_G => true, INITIAL_CNT_G => 127)
    port map (
      clk_i => clk_i,
      a_i   => rotEncA_i,
      b_i   => rotEncB_i,
      cnt_o => accumulator_s
    );

end Behavioral;


So the handover point is the 8-bit register accumulator_s. When you turn the encoder, it changes the value of the register to reflect your action.

In real-time, the duty cycle of the PWM module adapts. There's not a single clock tick between event and action.




I've used PM1 on the StickIt! motherboard, and used these pins:



PM1 pin  | XuLA2 pin | Spartan-6 pin | pin name  | direction | function
6 - +3V3 | +3.3V     | +3.3V         |           | out       | pull-up power for encoder
5 - GND  | GND       | GND           |           | out       | ground for PWM and Rotary Encoder
4 - D6   | CHAN6     | K16           | pwmB_o    | out       | PWM complementary signal B
3 - D4   | CHAN4     | M16           | pwmA_o    | out       | PWM complementary signal A
2 - D2   | CHAN2     | R16           | rotEncB_i | in        | Rotary Encoder pin B
1 - D0   | CHAN0     | R7            | rotEncA_i | in        | Rotary Encoder pin A


If you want to wire the encoder inputs or pwm outputs to other pins, you only have to change the constraint file.





The project is attached, together with the PWM library.

I'm going to use this to drive my GaN experiment board. You?




How to use a rotary encoder with the XuLA2 and the Spartan-6 FPGA.

Another real world example: I'm checking if the Xess Rotary Encoder library works with the encoder I use in a GaN half-bridge design.

TL;DR: yes it works

Xess has a plug-in board with a rotary encoder. I'm not using that module (called a StickIt!) - but I'm using the sample project that comes with it.




Hat Shield Cape Wing. All names were taken except the coolest one.

StickIt!s are tiny modules that work together with the XuLA.

A XuLA board is small. Still it manages to expose loads of FPGA pins. Plug the XuLA in a breadboard and you have access to them.

Alternatively, you can go StickIt!.


To start using the StickIt! modules, there's a motherboard. You dock the XuLA onto it and the signals become available in a few ways:

  • as a Raspberry Pi Hat (it can be used as a Hat or you can plug hats onto it. Your call).
  • as StickIt! ports into which you can plug StickIt! modules. The motherboard can host three modules.


In this blog I'm attaching the rotary encoder contacts to StickIt! port PM1. I just use patch cables to make the connections between my GaN PCB and that connector.

If you don't have the motherboard, you can make the connections directly to the XuLA2. I've pasted the xref tables you need to find out the correct pins.


This is the schematic of my rotary encoder. It's identical to the Xess module - except that I've added debounce capacitors.




The encoder is a Panasonic EVQ-VVD00203B Square SMD Encoder.

That's not the same model as the one on the Xess board but it works the same. This one doesn't have a push-button built in.



I'm plugging it into the PM1 of the motherboard. The power comes from the XuLA board, so you have to populate the XuLA PWR jumper and remove all others.

On this image you can see where you have to insert the jumper wires coming from the encoder.

Pin 6, VCC, goes to the 3V3 of the encoder circuit.

Pin 5, GND, to the circuit's GND.

Pin 1, D0, to one of the encoder's switch contacts.

Pin 2, D2, to the other switch contact.

The table also shows the channel numbers. Use these if you work without the motherboard. They represent the XuLA2 pins.




In this table you can find the mapping between the channel number and the FPGA signal.

You'll use that to define the pins in your project's constraint file.

For the pins that I've used, this is the constraint info.


# PM1 connections for rotary encoder module.
net rotEncA_i   loc=r7;
net rotEncB_i   loc=r16;


Update your project's .ucf file to reflect that.



I haven't changed a single line of code. I just went through the typical FPGA build steps and generated the bitfile.


> xsload --fpga rotaryencodertest.bit
Success: Bitstream in rotaryencodertest.bit downloaded to FPGA on XuLA2-LX25!


Then I used the Python test script that's part of the project. It reads the register that holds the accumulated value (defined in the VHDL project) and prints it to the command line.

To start the test, execute this command:




Rotate the encoder like a madman and see the results on your command prompt:


If you don't have a rotary encoder, break open an old mouse. The scroll wheel is often an encoder.




Let's try to do something real with the Xilinx Spartan-6 FPGA: write a set of data to an SD card.


To boost the FPGA skills, I'm refreshing theory and checking out some real designs.

For a standalone XuLA2 board, talking to SD cards is a good practical example.


There's a Micro SD slot on the XuLA2 models. The only other component you need is a spare Micro SD card.


Don't use an SD card with your wedding photo shoot on it. You'll very likely lose it when you test this project.


The Xess XuLA2 github has two SD card projects. We'll use the SD Card Control Test example.

In this project, the FPGA has two main duties:

  • Read and Write SD data
  • Communicate with your PC over USB

A Python script on your PC will generate test data and send it to the FPGA over USB.

The FPGA writes the data to your SD card. We're using low level protocol here, no filesystem.

When finished, it reads the data back off the SD card and verifies the results.
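
The write, read back and verify flow can be sketched on the host side as follows. This is not the XESS test script: FakeBlockDevice is a hypothetical in-memory stand-in for the card (the real test talks to the FPGA over USB), write_and_verify is a made-up helper, and the only detail taken from the SD world is the 512-byte block size.

```python
import os

BLOCK_SIZE = 512  # bytes per SD card block

class FakeBlockDevice:
    """Hypothetical in-memory stand-in for raw block access to a card."""
    def __init__(self, num_blocks, broken=False):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.broken = broken           # a broken card silently drops writes

    def write_block(self, addr, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly %d bytes" % BLOCK_SIZE)
        if not self.broken:
            self.blocks[addr] = bytes(data)

    def read_block(self, addr):
        return self.blocks[addr]

def write_and_verify(dev, start, count):
    """Write random data to consecutive blocks, read it back and return
    the list of block addresses whose contents do not match."""
    payload = {a: os.urandom(BLOCK_SIZE) for a in range(start, start + count)}
    for addr, data in payload.items():
        dev.write_block(addr, data)
    return [a for a, data in payload.items() if dev.read_block(a) != data]

print(write_and_verify(FakeBlockDevice(8), 0, 4))               # prints: []
print(write_and_verify(FakeBlockDevice(8, broken=True), 0, 4))  # prints: [0, 1, 2, 3]
```

Because the data is random, a card that ignores writes or returns stale contents is flagged on every block it fails.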


VHDL Libraries


Xess made a set of common VHDL libraries. The example uses several of those libs.

The communication with the SD slot is via:

  • SDCard.vhd

Clocking is handled by these two:

  • ClkGen.vhd
  • SyncToClk.vhd

Talking to the PC over USB happens with this one (and the on-board PIC microcontroller):

  • HostIo.vhd

These libs are not only useful, but also a great source to learn reusable VHDL.



How it Works


The XESS blog explains the example in detail. It explains both the electrical connections and how the Micro SD protocol is handled.

The header comments of the SDCard.vhd source file document many implementation details.

Open that file by double-clicking the u3 - SdCardCtrl node in the Implementation view.

If you've ever tried to understand (or port) a microcontroller SDCard lib - maybe the one from Arduino - you'll recognise much of the logic.


Let's now synthesize the project and generate the programming file.

Insert the SD card, connect the XuLA2 with your laptop and load the bitstream:


xsload --fpga sdcardctrltest.bit


Once you've loaded the bitstream, your XuLA2 board sits idle. You need to tell it to read and write data.

There's a Python script that does exactly that.

If you have retrieved the latest XuLA2 FPGA sources from Git, you'll find a Python file named in the project directory.

If you're using the examples that were installed by the XESS installer, you can retrieve the testbed from here:


Run the script to write a random set of data to the SD card, read it back and check if everything is correct:



The data is written using low level protocol. You will not be able to read the data when you insert the SD card into your PC.

The design doesn't use a filesystem or any other advanced disk management protocol. It's SD access for real blokes.


Windows 8 and Windows 10 with Xilinx ISE


If you are running Windows 10 64-bit, you may encounter several ISE problems:


  • Pressing the Open Project button (and several other actions, like selecting the Preferences menu item) crashes ISE.
  • Running the Simulator results in a "failed to link the design" message and the simulator not starting.


Switching to the 32-bit version has solved most of them for me:

To switch,

  • alter the command in the ISE shortcut to
    "<DRIVE>:\Xilinx\14.7\ISE_DS\settings32.bat <DRIVE>:\Xilinx\14.7\ISE_DS\ISE\bin\nt\ise.exe"
  • in "<DRIVE>:\Xilinx\14.7\ISE_DS\ISE\bin\nt", rename fuse.exe to _fuse.exe, and copy the fuse.exe from the "..\nt64" directory over (got that from here)


After that, I still have one issue left: when in Simulator, the Relaunch functionality doesn't work. I have to close iSIM and restart it from within ISE.


I've also tested the design with a known-defective card - one that's rejected by any known operating system and can't be formatted.

This test was successful. The test flagged that the FPGA wasn't able to write data, as expected:



The core of the example can be used in your own design as a persistent storage area.

If you want to use it as a data exchange mechanism, you'll have to find a way to read the raw data from the card. That can be done with an Arduino.

Another - advanced - option is to implement a supported file system (FAT32?) in HDL.

Whatever you do, make it a nice design and share your work.





I purchased a Xess XuLA2. It arrived this morning.

This post is the story of my first steps.



I have a little bit of experience with FPGAs. I learned digital electronics in the early-to-mid '80s.
My VHDL skills are beginner level, and I've worked with the Xilinx Spartan-6 and its development tools.

This is my experience of running a first design on the XuLA2 board.


I have done training work with the Spartan 6 FPGA before. The Xilinx development environment is running on my laptop and works.

I can focus on getting the Xess tools working.

That's not difficult, but I had some issues with Python dependencies that I'll document here.


The USB Driver

Installing the device on a Windows 10 laptop was easy. There's the typical 'signed driver' hurdle to jump.

When you plug in the XuLA, it's recognised by the OS, but in the Device Manager there's a warning next to the device.

The typical Windows "invalid hash signature" warning.


The solution is known. You have to restart Windows in a special mode that allows you to install unsigned drivers (use your google-fu to find the instructions for your version and language).

Once Windows is restarted in the 'allow unsigned drivers' mode, right-click on the device in the Device Manager and select Update Driver.

Windows will find the driver for you and install it. From then on, all is good.


The Loader Tool

The XuLA depends on external tools to synthesize your design into an upload file. I'm using the free Xilinx ISE toolset.

The Xess tools come into play when you want to load your designs to the FPGA.

They have command line and GUI applications. I'm testing the command line Loader tool here.


The command line tools are for Python 2.7 - I got syntax errors in the print() and other functions when trying them with 3.x.

The instructions to install the toolkit are here:

Install went fine. But I ran into a dependency conflict.

The tools depend on a Python library called pubsub. The installer nicely reports that the minimum required version is pypubsub >= 3.1.2.

(I don't know anything about Python - I learned the deep internals of it today while getting all of this working)

The bad thing is that the latest version of that library, 4.x, doesn't work (either with Python 2.7 or with the current Xess tools release - I don't know which).

So I had to force-replace pubsub 4 with pubsub 3 (I've logged an issue on github).


I used these commands:


pip uninstall Pypubsub
pip install -Iv


Finding all of that out took me half a day - don't ask how I did that unless we're in a pub together and you're paying for the drinks.

Once done, all works.


Success: XuLA2-LX25 passed diagnostic test!



Testing the XuLA2

I used the mandatory blinky project. I didn't use an LED though, but an oscilloscope.


The XuLA2 examples are available when you install the Xess XSTOOLS.

I've opened Examples/XuLA2/LX25/blinker in the Xilinx ISE, synthesised it and generated the programming file.

You can also make the project yourself by following the instructions in the Xess tutorial, chapter "Starting a Design in WebPACK"


Then I used the XSTOOLS Loader to beam it to the XuLA2:


xsload --fpga blinker.bit
Success: Bitstream in blinker.bit downloaded to FPGA on XuLA2-LX25!



With my scope attached to the CLK pin at the right lower corner of the PCB, I got the output:



The clock and output setting are defined in blinker.ucf:


net clk_i     loc=a9;  # 12 MHz input clock.
net blinker_o loc=t7 | IOSTANDARD=LVTTL | DRIVE=24 | SLEW=SLOW ;  # Blinker output to LED.




The downcount from 12 MHz clock to approx. 1Hz blink signal is done in the VHDL design:


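library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- provides "+" on std_logic_vector

entity blinker is
    Port ( clk_i : in  STD_LOGIC;
           blinker_o : out  STD_LOGIC);
end blinker;

architecture Behavioral of blinker is
signal cnt_r : std_logic_vector(22 downto 0) := (others=>'0');
begin

process(clk_i) is
begin
  if rising_edge(clk_i) then
    cnt_r <= cnt_r + 1;  -- free-running 23-bit counter
  end if;
end process;

blinker_o <= cnt_r(22);  -- the top bit toggles slowest

end Behavioral;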
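
As a quick check of the "approx. 1 Hz" figure, the blink rate follows directly from the counter width. A minimal sketch, with a made-up helper name blink_frequency_hz:

```python
def blink_frequency_hz(clk_hz, msb_index):
    """Frequency seen on bit msb_index of a free-running binary counter.
    Bit n toggles every 2**n clocks, so a full on/off cycle takes
    2**(n + 1) clocks."""
    return clk_hz / 2 ** (msb_index + 1)

print(round(blink_frequency_hz(12e6, 22), 2))   # prints: 1.43
```

So bit 22 of the counter, clocked at 12 MHz, blinks at roughly 1.43 Hz.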


(if only e14 had a VHDL syntax highlighter)



It works





This is the summary page for the XXICC (21st Century Co-design) project. XXICC was previously hosted at Google Code, which no longer accepts new projects or edits to existing projects. now links to this page.


The latest XXICC release is XXICC (21st Century Co-design) release 0.0q


XXICC (21st Century Co-design) is a not-for-profit research project which attempts to bring digital hardware/software co-design into the 21st Century using an improved programming language and a Reduced Software Complexity philosophy. Its goal is to make it easier and more enjoyable to write and maintain digital hardware and software. XXICC is pronounced “Chicken Coop”, so-called because it has so many layers.


XXICC’s GalaxC programming language narrows the gap between problem domain and language by allowing programmers to extend GalaxC by adding problem domain notations. Instead of adapting the task to the language, they adapt GalaxC to the task. The key extension mechanism in GalaxC is separation of syntax from semantics, a simple yet powerful way to add new notations. This is directly adapted from the Galaxy programming language developed in the late 1980s.


GalaxC programs may consist of ordinary ASCII characters and white space, like C. However, GalaxC programs may also have executable tables, schematic diagrams, comment blocks containing formatted text and figures (not yet implemented), variable names in different fonts, special symbols, string literals containing formatting, and WYSIWYG dialog boxes. This eliminates the need for separate documentation files (which are very hard to keep synchronized with a program) as well as a separate “resource editor”.


These are all edited using the XXICC Object Editor, a unified program and document editor which combines the features of a document editor, spreadsheet program, figure/schematic editor, dialog box editor, and more into a small, easy-to-use program with a consistent user interface. The XXICC Object Editor is written entirely in GalaxC and is used for editing all XXICC software and documentation. XXICC believes in the “take your own medicine” approach to software engineering.


Why GalaxC?


I created GalaxC to address my dissatisfaction with available programming languages.  I found that I spent a great deal of time “encrypting” ideas into restrictive programming languages, and felt it might be more efficient (and certainly more fun) to have a language that can be taught new notations so that the resulting code would be easier to write, understand, and debug.  GalaxC is an attempt to make this possible, or at least blaze a path in the right direction.


If you are perfectly happy with the programming language(s) you are using then you probably shouldn’t waste time learning about GalaxC, except perhaps out of morbid curiosity.  While it is possible that GalaxC’s ideas will revolutionize computer programming and you’ll need to know it to be competitive, it’s unlikely to happen any time soon with the present experimental versions of GalaxC.  OTOH, if you’re adventurous and dissatisfied with current language offerings, read on.


Getting Started


As with any new project, RTFM.  I try to write good documentation and keep it reasonably up to date.  For an overview of GalaxC, read Chapter 1 (Introduction) of Programming in the GalaxC Language.  GalaxC and XXICC try to follow the Reduced Software Complexity philosophy described in Chapter 1 of The XXICC Anthology.  You might find this interesting.


Programming in the GalaxC Language is primarily the reference for the GalaxC language, but tries to provide enough tutorial examples so you can get started. But it also provides considerable detail of how GalaxC’s compiler is implemented.  This is often TMI on a first reading and can be skimmed.  You’ll probably find Chapters 2 and 3 (Tokens and Expressions) to be pretty much the same as C and will go quickly.  Chapter 4 (Types and Variables) diverges quickly from C and gets into how GalaxC’s type inheritance mechanism implements many program features such as variables.  Some parts are TMI on first reading, but are there for the curious.


Chapter 5 (Functions and Macros) is a lot of fun since GalaxC allows them to have any legal syntax.  Here you will really see the power of the language.  Chapter 6 (The Postfix Stack Interpreter, PSI) describes the intermediate interpretive code which GalaxC executes.  What’s remarkable here is how much of GalaxC is implemented using macros and inline functions instead of building these features into the compiler.


Chapter 7 (Control Statements) looks a lot like C, though more like Pascal.  It’s interesting to see how GalaxC implements looping constructs as macros.  Chapter 8 (Programmer-Defined Types) is primarily concerned with data structures, arrays, and pointers.  These mechanisms are similar to C.


Chapters 9 (Generic Macros and Inline Functions) and 10 (Introduction to Special Functions) go deep into how GalaxC is implemented and are TMI for most users.  However, if you are curious as to how GalaxC is constructed, check out Section 10.1 (A Brief Tour of the GalaxC Compiler).


To compile and/or install XXICC on your computer, look at Installing and Running XXICC.  Then look at Compiling and Running GalaxC Programs to learn how to compile and run GalaxC programs.


When you’re ready to try some user interface programming, look at Chapters 3 and 4 of The XXICC Anthology.  They describe the GalaxC Simplified Window Manager (G-SWIM), an easy way to create portable GUI applications.


We have included some sample programs in  These include many of the sample programs in The XXICC Anthology.




For help with XXICC and GalaxC, please ask in the comments section.  This is much better than sending e-mail to the author as he may be too busy to get back to you quickly and others may be able to help you sooner.  Plus, by writing comments you’ll help future users with similar questions.


XXICC is a not-for-profit research project.  If you find XXICC useful or potentially useful, we can use help.


Reproducible Bugs


The single most useful thing we need is ways to reproduce bugs.  There are some known bugs that are maddeningly difficult to reproduce, so if you are able to reproduce strange behavior we’d really like to learn how so we can fix the bug.  My motto is “a reproducible bug is half fixed”, because once it is reproducible you can instrument the code and quickly figure out where and why.


For now, describe issues in the comments.  If there's enough activity I’ll make a separate issues list.


I’d also like to know about typos and other problems with the documents.


Regression Testing


This is a nasty problem with all software projects: you make a change and suddenly something that used to work is now broken.  There’s even a song about this:


99 little bugs in the code,

99 bugs in the code...

take one out, compile again,

100 little bugs in the code.

[Repeat until bug count goes to zero.]


The reason GNU/Linux is so stable is that there are so many people all over the world testing all sorts of versions on all sorts of platforms.  We’ll take all the help we can get for XXICC.




Currently XXICC does not have a mechanism for accepting donations.  We suggest instead supporting not-for-profit organizations that help Free Software and the free exchange of ideas such as the Free Software Foundation, Electronic Frontier Foundation, Wikimedia Foundation, and Software Freedom Law Center.




The author would like to renew thanks to the talented individuals who helped make the original version of Galaxy circa 1988.  Foremost he wishes to thank Anne Beetem for her ideas, inspiration, support, and scholarly collaborations.  He would also like to thank Jong-Min Park and Jim Rose for their contributions to the original implementation of the Galaxy compiler and its environment, and Monty Denneau for asking The Question which led to the original conception of Galaxy.


The author would also like to thank his family and friends for their support and encouragement over the years which led to GalaxC and XXICC reaching this point.  He would also like to acknowledge the literary inspirations of Miguel de Cervantes and Edmond Rostand towards putting Quixotic idealism ahead of practicality, Kurt Vonnegut for Cat’s Cradle, and Jan Potocki’s The Saragossa Manuscript which celebrates performing vast projects by oneself.


This text is © 2011 John F. Beetem and licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA 3.0).  To view a copy of this license, visit  No warranty is expressed or implied.

There are quite a number of boards coming to market this year, 2016.

These include various hats for Raspberry Pi, BeagleBone Black and the like.

However, I've yet to see a really simple OSH design in KiCad format for any iCE40 chip.

If novices and young users are to progress from FPGA development to creating prototype boards, this is a natural requirement.

This crop is almost right, but where are the design files?


I'm currently awaiting my iCEstick order to work through them.


On the subject of tutorials, I published Verilog files for the iCE40HX8K Breakout Board and IceStorm here:

They are to be used in conjunction with the excellent Verilog tutorials from Mojo:

Versions for iCEstick, CATboard, iCEboard, Nandboard and other boards will be published as available.



During the 2015 Community Awards, we asked you to take a cursory look into the future and give us your predictions for the new platforms and technologies that are likely to dominate in 2016.

Even though it didn't make the initial nominations, Field Programmable Gate Arrays (FPGA) evidently captured your attention, as it was a subject that came up time and again in the comments, and also in the Technology of the Year polls.

It's reasonable to consider FPGA, which already has a global market value above $5 billion that's expected to land closer to $10 billion over the next few years, as the evolution of programmable ROMs due to their reconfigurable logic blocks and complex input/output functions.

But they're also so much more.

What's Next for Field Programmable Gate Arrays?

We'd considered how to look deeper into this, and other hot technologies, as part of our 2015 Year in Review. But ultimately it's you guys who have the deeper knowledge, so instead we'd like to look forward into 2016 rather than looking back at 2015.

We want your thoughts on the future of FPGA. How will it evolve, how will it reach a wider user base, and what kind of changes will it undergo over the next 12 months? What will the FPGA look like in New Year 2016, and what kind of projects will you be making with them?

Tell us all about the future of FPGA below (and what you'd like to see, as much as what we will see), and we'll reconvene this time next year to see how close we got to the mark.

Arachne-pnr by Cotton Seed (who also uses pseudonyms cseed and mian2zi3) is an open-source FPGA placement and routing tool for Lattice iCE40 FPGAs.  It's a companion to the open-source Project IceStorm by Clifford Wolf and Mathias Lasser, which has reverse-engineered the iCE40 bitstream.

The usual design flow for IceStorm is to synthesize Verilog source code using Clifford Wolf's Yosys to produce a netlist in the form of a Berkeley Logic Interchange Format (BLIF) file.  Arachne-pnr performs physical placement and routing of the BLIF netlist and produces a text file for IceStorm's icepack tool.  Icepack in turn produces a binary bitstream that can be downloaded to an iCE40 FPGA or SPI Flash memory using IceStorm's iceprog tool.  Here's a typical command sequence:

$ yosys -p "synth_ice40 -blif rot.blif" rot.v

$ arachne-pnr -d 1k -p rot.pcf rot.blif -o rot.txt

$ icepack rot.txt rot.bin

$ [sudo] iceprog rot.bin
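If you want to script this flow, here's a minimal Python sketch that only builds the four argument lists shown above. The function name `icestorm_flow` and its parameters are made up for illustration; they are not part of Yosys, arachne-pnr, or IceStorm. Nothing is executed here, so it runs even without the tools installed.

```python
import shlex


def icestorm_flow(top, device="1k"):
    """Return the four commands of the flow as argument lists.

    `top` is the base file name (e.g. "rot" for rot.v and rot.pcf).
    """
    return [
        ["yosys", "-p", f"synth_ice40 -blif {top}.blif", f"{top}.v"],
        ["arachne-pnr", "-d", device, "-p", f"{top}.pcf",
         f"{top}.blif", "-o", f"{top}.txt"],
        ["icepack", f"{top}.txt", f"{top}.bin"],
        ["iceprog", f"{top}.bin"],
    ]


if __name__ == "__main__":
    # Print the commands in a shell-friendly form instead of running them.
    for argv in icestorm_flow("rot"):
        print("$", " ".join(shlex.quote(arg) for arg in argv))
```

To actually run the steps, each list can be handed to `subprocess.run(argv, check=True)`, which stops at the first tool that fails.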

I'm going to add IceStorm as a synthesis target for my XXICC (21st Century Co-design) project.  Since XXICC already has rudimentary synthesis, I will skip the Yosys step and go directly to arachne-pnr and IceStorm.  This means I need to produce BLIF files from scratch, which isn't described in the arachne-pnr documentation as far as I can tell.  However, BLIF itself is documented and it's fairly easy to figure out how arachne-pnr uses BLIF from arachne-pnr's example files and others produced by Yosys.  This 'blog documents my experience so others can benefit.  I will be adding to it as I learn more.  This content is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License so others can share it.


Arachne-pnr's source code and documentation are at GitHub: cseed/arachne-pnr


Here is my original IceStorm discussion here at element14: Project IceStorm: fully open-source FPGA tools for Lattice iCE40 and my IceStorm notes: John Beetem's IceStorm Notes

IceStorm uses a Lattice iCEstick as a development board.  It's available for US$21 in the USA: Lattice Semiconductor: ICE40HX1K-STICK-EVN iCEstick Evaluation Kit.

Here are instructions for installing IceStorm and its companion tools: Projet IceStorm: le FPGA libéré! [the FPGA set free!].  The instructions are a combination of French and GNU/Linux.  The IceStorm steps are out of date, since IceStorm is now at GitHub.  The arachne-pnr steps are accurate:

1.  Install IceStorm first.  Arachne-pnr needs IceStorm's textual chip database files which it converts to a faster binary form as part of the make process.  For IceStorm installation instructions, see John Beetem's IceStorm Notes

2.  You can install arachne-pnr anywhere, but the standard place is in "/opt".  You'll probably need to change permissions on "/opt" so you can write to it.

$ cd /opt/


3.  Clone arachne-pnr from GitHub:

$ git clone


4.  Compile arachne-pnr.  It's in C++11, so your GCC will need to support it (GCC 4.8.1 or later).

$ cd arachne-pnr

$ make

$ sudo make install

That's it!  Arachne-pnr is ready to go.  There are sample files in /opt/arachne-pnr/examples/rot and /opt/arachne-pnr/tests.

Like many open-source projects, arachne-pnr doesn't have a lot of documentation.  GitHub is your best starting point.  The arachne-pnr program prints usage and options with the "-h" (help) option:

$ arachne-pnr -h

The usual command for arachne-pnr looks like this:

$ arachne-pnr -d 1k -p rot.pcf rot.blif -o rot.txt


"-d 1k" is the target device, in this case the iCE40 HX1K with 1280 logic cells.  You can also specify "-d 8k" for the HX8K with 7,680 logic cells.  If you leave out "-d", you get 1k by default.  It's the FPGA on the Lattice iCEstick.

"-p rot.pcf" is the physical constraint file (PCF).  It specifies the pinout, i.e., how your top-level signals attach to pins, and has lines like this:

set_io a 78

set_io b 87

set_io LED1 99

HX1K pin numbers assume the TQ144 FPGA used by the Lattice iCEstick.  You can specify a different package with the "-P" option.
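If you generate or check PCF files from your own tools, a small reader is handy. Here's a minimal Python sketch that collects the "set_io" lines into a dictionary. The function name `parse_pcf` is hypothetical, the handling of "#" comments is an assumption, and lines with extra options are simply skipped, so this is not a complete PCF parser.

```python
def parse_pcf(text):
    """Map signal name -> pin for each 'set_io <signal> <pin>' line."""
    pins = {}
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "set_io":
            signal, pin = fields[1], fields[2]
            pins[signal] = pin  # pin kept as text, not converted to int
    return pins


if __name__ == "__main__":
    example = "set_io a 78\nset_io b 87\nset_io LED1 99\n"
    print(parse_pcf(example))
```

Pins are kept as strings so the same code would still work if a package used non-numeric pin names.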

If present, "rot.blif" is the input BLIF file.  If absent, I think arachne-pnr reads from standard input.


If present, "-o rot.txt" is the output text file for icepack.  If absent, I think arachne-pnr writes to standard output.

Here are some other useful options:

Like many physical design tools, arachne-pnr uses simulated annealing (SA) for placement.  SA is a pseudo-random algorithm and requires a seed for the pseudo-random number generator.  As of 8 August 2015, arachne-pnr uses 1 as the default seed or you can use the "-r" option to get a random seed, which arachne-pnr prints out.  Earlier arachne-pnr releases always generated a random seed, which meant that each time you ran it you got different results even with the same BLIF and PCF.  You can specify a different fixed seed with the "-s" option.
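To see why the seed matters, here's a toy simulated-annealing placer in Python. It is only a sketch of the general technique, not arachne-pnr's algorithm: the cost function (total length of two-pin nets on a line), the cooling schedule, and all names are assumptions chosen to keep the example short. The point is that the same seed always gives the same placement.

```python
import math
import random


def anneal_placement(nets, n_cells, seed=1, steps=2000):
    """Toy SA: assign cells to slots on a line to shorten two-pin nets."""
    rng = random.Random(seed)       # private generator, fixed by the seed
    place = list(range(n_cells))    # place[cell] = slot number

    def cost(p):
        return sum(abs(p[a] - p[b]) for a, b in nets)

    current = cost(place)
    temp = float(n_cells)
    for _ in range(steps):
        i, j = rng.randrange(n_cells), rng.randrange(n_cells)
        place[i], place[j] = place[j], place[i]          # propose a swap
        new = cost(place)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
        else:
            place[i], place[j] = place[j], place[i]      # undo the swap
        temp = max(temp * 0.995, 1e-3)                   # geometric cooling
    return place, current


if __name__ == "__main__":
    nets = [(0, 5), (1, 4), (2, 3), (0, 3)]
    print(anneal_placement(nets, 6, seed=1))
    print(anneal_placement(nets, 6, seed=1))  # identical to the line above
    print(anneal_placement(nets, 6, seed=2))  # may differ
```

Running with a fixed seed makes results reproducible from one run to the next, which is what you want when comparing changes to a design.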

As we will see shortly, arachne-pnr may pack multiple components -- e.g., a look-up table (LUT) and a flip-flop -- into a single iCE40 logic cell.  The "-B" option creates a post-pack BLIF file to show you what arachne-pnr did.  There's also a "-V" option that creates a post-pack netlist as Verilog.

BLIF and the Silicon Blue Library

BLIF is a general-purpose logic format and I believe it was originally created for specifying logic functions for logic minimization tools.  As such, if you look at the BLIF document you will see logic truth tables and "don't cares" -- all that good stuff you learned about in logic design class.  Remember Karnaugh Maps?

Arachne-pnr doesn't need any of that Boolean stuff.  It assumes all the logic minimization was already done by Yosys or another synthesis tool and logic functions have already been combined into LUTs.  The "gates" used by arachne-pnr are the iCE primitive blocks in Lattice's iCE Technology Library document.  This document is rather hard to find by browsing at the Lattice site: it's hiding at the iCEcube2 Design Software page.  Scroll to the bottom, find "Software Downloads & Documentation", and click "Technical Briefs".  Click on iCE 2015-04 (or whatever) Technology Library.  You'll also need the iCE40 LP/HX Family Data Sheet, especially the Architecture Overview section.

The most important iCE primitives are SB_LUT4 (4-input LUT), SB_CARRY (fast carry logic), SB_DFFxx (D flip-flops with various clock enable and set/reset options) and SB_IO (I/O block with all the options).  "SB" stands for Silicon Blue Technologies, the company that created the iCE40 FPGA and was bought by Lattice Semiconductor.

Here is a diagram showing how SB_LUT4, SB_CARRY, and SB_DFFxx combine to make an iCE40 Logic Cell (LC):


SB_LUT4 has four inputs I0-I3 and output O, which can be the logic cell output O and/or the input to a D flip-flop.  SB_DFFxx is a DFF with various clock-enable and set/reset options, which are listed in the iCE Technology Library document.  For example, SB_DFF has EN=1 and no set/reset, SB_DFFR has asynchronous reset, SB_DFFSS has synchronous set, and SB_DFFESR has a clock-enable input and synchronous reset.  There are also versions with inverted clocks.  LC output O is either the LUT output or the DFF state Q, selected by a configurable multiplexer.

Logic cells are stacked into Programmable Logic Blocks (PLBs), each of which has 8 LCs.  The EN and SR signals and clock polarity must be the same for all LCs in a PLB.  (SR options may be different for each LC.)

Each LC also has carry logic for building high-speed adders, subtractors, comparators, binary counters, etc.  SB_CARRY is the majority function, i.e., the carry-out function of a full adder.  SB_LUT4 calculates the sum output of a full adder and similar functions, setting the configurable mux for LUT input I3 to be the carry in (CI) from the LC below this one.  CI may also be 0 or 1 if this is the lowest LC in a PLB.  The carry out (CO) from SB_CARRY goes to the LC above this one.
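Here's a small Python model of that arrangement, as a sketch rather than a description of the silicon: `sb_carry` is the majority function, `full_adder_sum` stands in for the sum function a LUT would be configured with, and `ripple_add` chains them from the lowest bit upward. The function names are my own.

```python
def sb_carry(i0, i1, ci):
    """Majority of three bits: the carry-out of a full adder."""
    return (i0 & i1) | (i0 & ci) | (i1 & ci)


def full_adder_sum(i0, i1, ci):
    """Sum bit of a full adder (the function the LUT would compute)."""
    return i0 ^ i1 ^ ci


def ripple_add(a, b, width):
    """Add two unsigned integers bit by bit, lowest bit first."""
    carry, total = 0, 0
    for bit in range(width):
        x, y = (a >> bit) & 1, (b >> bit) & 1
        total |= full_adder_sum(x, y, carry) << bit
        carry = sb_carry(x, y, carry)   # carry out feeds the next bit up
    return total, carry


if __name__ == "__main__":
    print(ripple_add(9, 7, 4))  # (0, 1): 9 + 7 = 16 overflows 4 bits
```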

Arachne-pnr tries to combine SB_LUT4, SB_CARRY, and SB_DFFxx whenever possible.  In some cases, the logic cannot be combined and arachne-pnr uses SB_LUT4 as a "pass-through" for carry in/out or a DFF input.  Combining SB_LUT4 and SB_CARRY requires the SB_CARRY I0, I1, and CI input signals to be the same as the SB_LUT4 I1, I2, and I3 inputs.

Here's a BLIF example from a binary counter.  Given the current state p5 of the counter,  it calculates the next state np5:


.gate SB_LUT4  I0=clr I1=$0 I2=p5 I3=pc5 O=np5  # np5 = next p5

.param LUT_INIT 0000010101010000

.gate SB_DFF  C=clk D=np5 Q=p5

.gate SB_CARRY CI=pc5 I0=$0 I1=p5 CO=pc6        # pc6 = carry to p6

SB_LUT4 calculates np5 given p5 and carry-in pc5 from the next lower bit of the counter.  np5 is clocked into SB_DFF to update p5 at the next rising edge of clk.  SB_CARRY calculates carry-out pc6 for the next higher bit of the counter.  Note that SB_CARRY's I0, I1, and CI inputs match SB_LUT4's I1, I2, and I3.  SB_LUT4 I0 is set to signal clr which synchronously resets the counter, and I1 is not used so I set it to the constant 0.  (Yosys uses $true and $false for 1 and 0, but BLIF lets you define them to be something else.)

SB_LUT4 has a LUT_INIT parameter that specifies the binary truth table for the LUT.  Each bit i (numbered from LSb = 0) is the LUT value if I[3:0] = i.  If I0 = clr = 1, the LUT output np5 is 0.  If I0 = 0, the LUT output is I2⊕I3 = p5⊕pc5.
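Working out LUT_INIT strings by hand is error-prone, so here's a minimal Python sketch that builds one from a 4-input function using the bit numbering just described. The helper names `lut_init` and `counter_bit` are hypothetical; the result reproduces the string used in the counter example above.

```python
def lut_init(func):
    """Build the 16-character LUT_INIT string for a 4-input function.

    Bit i (LSb = 0) holds func(i0, i1, i2, i3) where i = I3 I2 I1 I0
    in binary.  The string is written most significant bit first.
    """
    bits = []
    for i in range(16):
        i0, i1, i2, i3 = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        bits.append("1" if func(i0, i1, i2, i3) else "0")
    return "".join(reversed(bits))


def counter_bit(clr, unused, p, carry_in):
    """Next state of one counter bit: 0 on clr, otherwise p XOR carry."""
    return (not clr) and (p ^ carry_in)


if __name__ == "__main__":
    print(lut_init(counter_bit))  # 0000010101010000
```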

Arachne-pnr combines SB_LUT4, SB_DFFxx, and SB_CARRY into an ICESTORM_LC, which includes parameters for DFF and carry chain options.  You can see how it did this using the "-B" option.  I think arachne-pnr understands ICESTORM_LC blocks in its BLIF input if you want to combine LUTs, DFFs, and carry logic yourself before running arachne-pnr.

Finally, let's take a quick look at SB_IO, the I/O block.  The iCE40 I/O block has many options including built-in flip-flops for DDR, tri-state output enable, and pull-up resistor.  See the iCE Technology Library document for details.  If your design has simple inputs and outputs, you can let arachne-pnr generate SB_IO blocks automatically.  However, if you want to use some options like pull-up you may need to include an SB_IO explicitly.

Here's the BLIF for an input pin "en" that has a pull-up:

.gate SB_IO PACKAGE_PIN=en D_IN_0=en.d0

.param PIN_TYPE  000001000001

.param PULLUP 1

This SB_IO has two of its pins connected: PACKAGE_PIN is the external pin "en" and D_IN_0 is the internal data-in signal, which may be latched or registered.  There's also a D_IN_1 registered on a falling clock edge for DDR.  I've named the internal signal "en.d0" and connect it to internal logic elsewhere in the BLIF file.

The PIN_TYPE parameter specifies whether SB_IO input is registered, latched, or combinational, and whether the SB_IO output is disabled, combinational, registered, inverted, DDR, etc.  The PULLUP parameter enables the I/O pin's pull-up resistor.

OK, that's all for now.  I'll add more as needed.

Project IceStorm, by Clifford Wolf and Mathias Lasser, is an amazing project that has reverse-engineered the Lattice iCE40 FPGA's bitstream so that it's finally possible to write open-source FPGA design tools for a real FPGA.  I've been playing with IceStorm and its companion tool arachne-pnr (place & route) over the last few days and it's been loads of fun with very few problems.  I'm going to add IceStorm as a synthesis target for my XXICC (21st Century Co-design) project.  That will give me a complete open-source FPGA design system, something I've been wanting for decades.


This 'blog is for collecting links and notes for IceStorm so that others can find them quickly and easily.  I will be adding to it as I learn more.  This content is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License so others can share it.


Here is the official Project IceStorm wiki: Project IceStorm


Here is my original IceStorm discussion here at element14: Project IceStorm: fully open-source FPGA tools for Lattice iCE40

IceStorm uses a Lattice iCEstick as a development board.  It's available for US$21 in the USA: Lattice Semiconductor: ICE40HX1K-STICK-EVN iCEstick Evaluation Kit.

Here are instructions for installing IceStorm and its companion tools: Projet IceStorm: le FPGA libéré! [the FPGA set free!].  The instructions are a combination of French and GNU/Linux.  The IceStorm steps are out of date, since IceStorm is now at GitHub.  Here are the specific steps for IceStorm:

1.  Install "libftdi-dev" so IceStorm can talk to iCEstick's FT2232H:

$ sudo apt-get install libftdi-dev

2.  You can install IceStorm anywhere, but the standard place is in "/opt".  You'll probably need to change permissions on "/opt" so you can write to it.

$ cd /opt/


3.  Clone IceStorm from GitHub:

$ git clone icestorm


4.  Make the IceStorm software.  Some of it is in C++11, so your GCC will need to support it (GCC 4.8.1 or later).  The "make" step builds textual chip databases using a Python program, which takes a while on a slow computer.

$ cd icestorm

$ make

$ sudo make install

That's it!  IceStorm is ready to go.

Here is a partial list of IceStorm programs.  Like many open-source projects, IceStorm does not have a lot of documentation.  The IceStorm wiki is your best starting point.  However, each program does have a way to print help.

icepack takes FPGA configuration data in text form (usually from arachne-pnr) and packs it into a binary file for downloading.  Here is a typical icepack command:

$ icepack moebius.txt moebius.bin

To list icepack options, give the command "icepack -h".

iceprog downloads a bitstream to an iCE40 FPGA or to a serial flash.  Here is a typical iceprog command:

$ sudo iceprog moebius.bin

If you don't want to use sudo, you can add a "rules" file to "/etc/udev/rules.d": see udev rules file for FTDI FT2232D/H, FT232H, and Papilio DUO.

To list iceprog options, give the "iceprog" command with no arguments.  For an iCEstick, iceprog programs the serial flash and then resets the FPGA so that it programs itself from the serial flash.  You can't program the FPGA's internal SRAM directly because of the way the SPI interfaces are wired together.  It is possible to modify iCEstick to program SRAM directly, but you lose the serial flash chip.  The "iceprog" command with no arguments tells you how to do this.


Here is the new release 0.0q of XXICC.  0.0q adds logic capacity to the Flavia implementations and allows you to specify pull-up, pull-down, and keeper circuits for FPGA I/Os in all implementations.  The Flavia architecture is now more consistent across all implementations: see the Flavia chapter of The XXICC Anthology rev 0.0q which has an updated version of Flavia: the Free Logic Array.

XXICC (21st Century Co-design) is a not-for-profit research project which attempts to bring digital hardware/software co-design into the 21st Century using an improved programming language and a Reduced Software Complexity philosophy.  Its goal is to make it easier and more enjoyable to write and maintain digital hardware and software. XXICC is pronounced "Chicken Coop", so-called because it has so many layers.


For an overview of XXICC, see XXICC: 21st Century Co-design.  For details on the GalaxC programming language, XXICC Object Editor, and GalaxC extensions for Hardware Design (GCHD), here are the latest documents and source code:


Release notes for XXICC rev 0.0q

Programming in the GalaxC Language rev 0.0j: reference and user guide for the GalaxC programming language, unchanged for 0.0q.

The XXICC Anthology rev 0.0q: collection of miscellaneous XXICC topics, including user guides for the XXICC Object Editor, GCHD and Flavia.  0.0q has a major rewrite of the Flavia chapter (12).

XXICC code release 0.0q: all source code for XXICC.

XXICC source code listing rev 0.0q: source code listing as PDF.

XXICC executable binary for Windows rev 0.0q: XXICC executable binary for Microsoft Windows.

GalaxC sample/demo programs rev 0.0k: sample GalaxC programs and GCHD logic libraries, unchanged for 0.0q.

GalaxC sample/demo program listings rev 0.0k: PDF listing of the sample GalaxC programs and GCHD examples, unchanged for 0.0q.

Installing and Running XXICC rev 0.0q: Document describing how to install and run XXICC.

Compiling and Running GalaxC Programs rev 0.0k: Document describing how to compile and run your own GalaxC programs, unchanged for 0.0q.

Editable XXICC documentation files rev 0.0q: editable XOE files for XXICC documentation.

Data files for FlaviaP40 release 0.0q for Papilio One 250K: Data files for the FlaviaP40 implementation of the Free Logic Array.

Data files for FlaviaP60 release 0.0q for Papilio One 500K: Data files for the FlaviaP60 implementation of the Free Logic Array.

Data files for FlaviaPD59 release 0.0p: Data files for the FlaviaPD59 implementation of the Free Logic Array for the Papilio DUO, unchanged for 0.0q.

Data files for FlaviaLP60 release 0.0q for LOGI-Pi: Data files for the FlaviaLP60 implementation of the Free Logic Array for the ValentF(x) LOGI-Pi.

Data files for FlaviaLB60 release 0.0q for LOGI-Bone: Data files for the FlaviaLB60 implementation of the Free Logic Array for the ValentF(x) LOGI-Bone.

Taming the Wild Bitstream (unchanged for 0.0q): Supplement to Flavia: the Free Logic Array.


I've tested XXICC 0.0q on GNU/Linux (Ubuntu on x86 PCs, Raspberry Pi Raspbian, BeagleBone Debian, and ODROID-C1 Ubuntu) and Windows (2000 and 7).  My main machine is Ubuntu, so the others are more likely to have anomalies.  Constructive comments and suggestions are most welcome.  I'd especially like to find out how to reproduce some of the bugs that have been eluding me.


The previous version of XXICC is: XXICC (21st Century Co-design) release 0.0p

The earliest versions of XXICC are at Google Code:


XXICC is a FLOSS (Free as in Liberty Open Source Software) project.  Software is licensed under GPLv3 and other content is licensed under Creative Commons CC-BY-SA 3.0.


How to read an image in FPGA

Posted by nahidku Jun 24, 2015

Dear All,


I am a very new learner of VHDL. I am interested in learning how to read an image with VHDL code.


My target is to develop an image edge detector.



This blog is part 3 of a 4-part series on implementing a gradient filter on an FPGA.  If you have not already read the earlier parts, see the link below to get up to speed before reading this blog.  You can also catch up on some of our previous blog posts, linked below.


Part 1 and 2 of this blog series


Other FPGA blogs by ValentF(x)


In the previous two parts, we designed modules to interface a camera and then created a gradient filter on the FPGA. One key aspect of using an FPGA is that the design needs to be valid by construction. When writing software it's fairly easy to write a buggy first version of an application and then debug it using a step-by-step debugger, or IO (prints on serial, or LEDs), to get working software. On hardware/FPGA you can easily write a hardware description that compiles/synthesizes well but does not work. When this happens you are left with two options:

  • Use a logic analyzer, either physical (a costly piece of equipment) or soft (a logic analyzer you add to your design in the FPGA) and debug your design outputs.
  • Re-write everything hoping for the best


The best approach when writing HDL is to design a test for every component you create (if your component is a structure of already tested components, you should still write a test for it). This test is implemented as a test-bench. A test-bench is a specific HDL component that cannot be synthesized but that can be executed in a simulated environment. The test-bench generates input signals (test vectors) for the device to be tested (Unit Under Test, UUT) and gathers the outputs.


Fig 1 : A test-bench used to consist of the device physically connected to test equipment. In HDL all this is simulated on the designer’s computer.

The test-bench can be instrumented to automatically validate the device under test by comparing the outputs for a given set of inputs to a reference (Unit Testing). Test-benches can also be used to test the device during its lifetime to make sure it still complies with its initial specification when the designer makes changes to it or one of its sub-components (Regression Testing). Because it is impossible to generate all combinations of test inputs, it is very important to make sure that the chosen set will cover most of the cases (test-coverage).


Fig 2 : Minimal HDL development flow

A test-bench is an independent design, and writing a test can sometimes take more time than writing the component itself. A well-designed test will save you a lot of time when it comes to loading your design to the device and will help you better understand your component's behavior.

In the test-bench the input signal can be generated using the usual VHDL syntax plus an extra set of non-synthesizable functions, mainly for handling timing aspects and IOs. The TextIO package provides an interesting set of functions for handling file inputs/outputs to allow reading/writing values from/to files.

The test-bench can then be executed by a simulator (ModelSim, ISim from Xilinx, which is free, GHDL, etc). The simulator interprets your VHDL and simulates the behavior of the FPGA. The simulation can be either functional or timed. A timed simulation takes the propagation time in the logic into account, while a functional simulation doesn't. Because the simulator has to emulate the logic you've written, the simulation can take a very long time. For example, in the next blog post we will write a test-bench for the gradient filter that processes a QVGA image (320x240 pixels); this simulation takes ~30 min to complete. On bigger systems, the simulation time can be well into the range of hours (for regression testing and unit testing, you'd better run these at night). The simulation process is part of what makes HDL development time very long compared to software. For example, when you have an error in your design, it usually takes a minute to fix in the HDL but minutes or hours to validate the fix. If you compare this with the usual software development techniques, you'll understand why it is so important to think your design through before implementing it.

In the following we will design a test for the gradient filter component we designed in Part 2 of this blog series. This test-bench will be implemented in VHDL and simulated using ISE’s integrated simulator, ISim (comes for free with the web edition).

Basic testing : Testing the arithmetic part of the Sobel filter for X gradient values


In this first part of the testing we will consider the arithmetic part of the Sobel filter, which convolves the pixel window with the Sobel kernel (the generic convolution, before optimization using DSP blocks). At the heart of the convolution is a Multiply And Accumulate (MAC) operation that multiplies two 16-bit inputs and adds the product to the previous output to generate a 32-bit result. In the following we will test this simple component. The test will simply stimulate the design with static values so that we can look for potential bugs in the calculated values.

Generating the test-bench skeleton for the unit under test

ISE comes with a nice feature to auto-generate a test-bench template for a specific component. This frees the designer from the hassle of writing the signal declarations and component instantiation, so you can concentrate on the test behavior. To do so, in the file navigator right click and select “New Source”. In the wizard, select “VHDL Test Bench” and fill in the filename and location, then click “Next”. In the next window select the component to test (the component must be part of your project) and click “Finish”. Beware that if your component has syntax errors, the generated file won’t be valid. To check syntax, select your component file in the project navigator and click on “Check Syntax” in the process panel.

Once generated, the test-bench is composed of four parts:

  1. Signals, constants and component declarations.
  2. Components instantiations and wiring
  3. Clocks generations
  4. Stimuli generation


Parts 1, 2, 3 are auto-generated. ISE auto-detects the system clocks (based on the signal names) and by default generates each clock in a separate process. The clock frequency can be tweaked by setting the constant <clock name>_period.  The process looks like this :

clk_process : process
   begin
      clk <= '0';
      wait for clk_period/2;
      clk <= '1';
      wait for clk_period/2;
   end process;

This process runs endlessly and does the following :

  • Sets the clock signal to low
  • Waits for half the clock period. Note that this wait statement is one of the non-synthesizable statements of VHDL
  • Sets the clock signal to high
  • Waits for half the clock period

This process generates a square wave of the configured frequency on the clock signal.


Part 4 is partially generated with comments to help you understand where to write your test code.


stim_proc: process
   begin
      -- hold reset state for 100 ns.
      wait for 100 ns;
      wait for clk_period*10;
      -- insert stimulus here
      wait;   -- suspend forever so the stimulus runs only once
   end process;


The first part deals with the system reset. Drive the reset signal of your UUT active to force the system into reset, and then set it inactive just after the “wait for 100 ns;”. Then there are 10 clock cycles where the test does nothing, and the fun part starts at “-- insert stimulus here”.


Your stimulus is the sequence of inputs that test the unit. The inputs are generated using the traditional HDL assignment operators, and sequencing the inputs is performed using the wait statement. The wait statement can be used either with a time expressed in units such as picoseconds or nanoseconds, or with a boolean condition using the until clause:


wait for 10 ns;

wait until clk = '1';


Testing MAC16


We have generated the test-bench template for MAC16, now let’s write the test process. We will first write a simple test that will stimulate the MAC16 with two simple values.


stim_proc: process
   begin
      -- hold reset state for 100 ns.
      reset <= '1';
      wait for 100 ns;
      reset <= '0';
      wait for clk_period*10;
      -- insert stimulus here
      A <= to_signed(224,16);
      B <= to_signed(3967,16);
      add_subb <= '1';
      wait;   -- suspend forever so the stimulus runs only once
   end process;


After writing the test process, click the “Simulation” check-box in the project navigator window, then select the test bench file and click “Simulate Behavioral” in the process window.




If your test-bench contains no errors, this will launch the ISim tool. After a bit of time you should end-up with the following window.




Use the zoom-out button and the horizontal scroll-bar to get to the beginning of the simulation with an appropriate scale (you should see the clock edges).




To set the signals display format, right click on the “a[15:0]” signal, select “Radix” and “Signed decimal”.  Do the same for “b[15:0]” and “res[31:0]”. You should now have the following trace.




If you zoom in on the res signal between 200 ns and 250 ns, you get the following sequence of results.


888608, 1777216, 2665824, 3554432


As we know the expected behavior of the MAC we can check the result validity :


224*3967 = 888608 -> 888608 + (224*3967) = 1777216 -> 1777216 + (224*3967) = 2665824 …
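The same check can be done in software. Here is a minimal Python sketch that reproduces the sequence read from the waveform, assuming the MAC simply adds the product of its two constant inputs to its previous output on every clock cycle. The function name `mac_trace` is ours and is not part of the design.

```python
from itertools import accumulate


def mac_trace(a, b, cycles):
    """Running sum of a*b over `cycles` clock cycles with constant inputs."""
    return list(accumulate([a * b] * cycles))


if __name__ == "__main__":
    print(mac_trace(224, 3967, 4))  # [888608, 1777216, 2665824, 3554432]
```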


At this point if something fails in your design, you can go back to ISE, edit your file and then in ISim press the relaunch button to restart the simulation as in the following image.






Reporting errors

Now that we know that the design works, we can improve the test to automatically report errors. The “assert” statement allows us to report warnings/errors/failures to the designer from the simulation. This report will then help the designer to spot exactly where the problem occurs. In our case we will report a failure if the result of the first MAC cycle differs from what is expected.


stim_proc: process
   begin
      -- hold reset state for 100 ns.
      reset <= '1';
      wait for 100 ns;
      reset <= '0';
      wait for clk_period*10;
      -- insert stimulus here
      A <= to_signed(224,16);
      B <= to_signed(3967,16);
      add_subb <= '1';
      wait for clk_period;
      -- stop the simulation if the first MAC result is wrong
      ASSERT res = (224*3967) REPORT "Result does not match what is expected" SEVERITY FAILURE;
      wait;   -- suspend forever so the stimulus runs only once
   end process;


In this process if the result is different from the expected result, the simulation will stop. A less critical report would be ERROR or WARNING (won’t stop the simulation) and NOTE would just inform the user. The report message will be printed in the simulator console window.






So far in our test we have only tested the behavior of the MAC16 component for a single pair of values, and we validated the sequence of values by hand. To create a better test that covers more cases, we need to create an input test vector, that is a sequence of inputs to apply to the module, and an output test vector that holds the expected results for that input sequence. These vectors can either be created as a file to be read by the simulation using the TextIO package or directly coded in the test-bench. For the purposes of this blog post we will implement the second method (the first method is better for large tests).


First we need to declare the array vector types for our inputs and outputs:


type input_vector_operand_type is array(natural range <>)  of signed(15 downto 0);

type output_vector_res_type is array(natural range <>)  of integer;


Then we need to create the input vectors and expected outputs as follows:


-- test vectors
    constant a_vector : input_vector_operand_type(0 to 5) := (
        to_signed(0, 16),
        to_signed(256, 16),
        to_signed(-64, 16),
        to_signed(16, 16),
        to_signed(0, 16),
        to_signed(0, 16)
    );

    constant b_vector : input_vector_operand_type(0 to 5) := (
        to_signed(1034, 16),
        to_signed(-1, 16),
        to_signed(-89, 16),
        to_signed(32000, 16),
        to_signed(0, 16),
        to_signed(0, 16)
    );

    constant res_vector : output_vector_res_type(0 to 5) := (









For the results, the two initial 0 values are to take into account the pipeline of the MAC16 component. This component has a latency of two clock cycles before a change on the inputs impacts the output.
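Expected outputs like these are easier to produce with a small software reference model than by hand. The Python sketch below is one way to do it under stated assumptions: the accumulator starts at 0 after reset, it adds a*b on every cycle, and the output is delayed by `latency` cycles. The name `expected_mac_outputs` is hypothetical, and the numbers it prints come from this model, not from the original test-bench; the exact alignment also depends on when the test-bench samples res.

```python
def expected_mac_outputs(a_values, b_values, latency=2):
    """Expected accumulator output for each cycle of the input vectors.

    Assumption: the accumulator is 0 after reset and adds a*b every
    cycle; the first `latency` outputs are 0 because of the pipeline.
    """
    outputs = [0] * latency
    acc = 0
    for a, b in zip(a_values, b_values):
        acc += a * b
        outputs.append(acc)
    # Keep one expected value per input pair.
    return outputs[:len(a_values)]


if __name__ == "__main__":
    a_vector = [0, 256, -64, 16, 0, 0]
    b_vector = [1034, -1, -89, 32000, 0, 0]
    print(expected_mac_outputs(a_vector, b_vector))
```

Comparing such a model against the simulated res values is also a quick way to find out whether the assumed latency matches the real component.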


Then we have to write the process that scans those vectors, and report the errors/failures using assert.


stim_proc: process
   begin
      -- hold reset state for 100 ns.
      reset <= '1';
      wait for 100 ns;
      reset <= '0';
      wait for clk_period*10;
      -- insert stimulus here
      for i in 0 to 5 loop
         -- apply one input pair and compare res with the expected value
         A <= a_vector(i);
         B <= b_vector(i);
         add_subb <= '1';
         ASSERT res = res_vector(i) REPORT "Result does not match what is expected " & integer'IMAGE(res_vector(i)) & " != " & integer'IMAGE(to_integer(res)) SEVERITY FAILURE;
         wait until falling_edge(clk);
      end loop;
      wait;   -- suspend forever so the vectors are applied only once
   end process;


The for loop iterates over the range of the test vectors and for each set of inputs, the result of the MAC16 is tested. If the result does not match the assert condition, the simulation will fail and indicate what went wrong.


Now that the base module of our convolution filter has been proven to work, the other components of the Sobel filter must be tested. Once the MAC16 is tested we can plan to test the full gradient filter. Testing the filter using hand-designed test vectors can be very painful considering the amount of information that needs to be generated in order to test a whole image. In this case debugging at a higher level is a better solution and allows us to evaluate the quality of the filter.


Testing the Sobel filter using images will be the topic of the next blog post.



Creative Commons License
This work is licensed to ValentF(x) under a Creative Commons Attribution 4.0 International License.