The Universal DELAY Building Block

 

This is the first post in this blog in which I will try to actually design something useful based on the ideas introduced so far. Also, instead of just short code snippets, the code examples in this post are complete and they can be simulated, synthesized and used in actual designs.

 

We will try to create a generic and reusable delay module. The goal is to delay a signal of some given type by a fixed but arbitrary number of clocks, which is a fundamental building block in almost any type of hardware design. The latency, or clock delay, will be controlled by a generic parameter called SIZE, and the input and output data ports will be unconstrained. This means that the same module can be used to implement a delay of any width and depth for a given type.

 

While we were able to use local type definitions in the previous examples, which were then used to define signals of those types, we cannot take the same approach with module ports. Ports are essentially the same thing as internal signals, but there is no place for a type definition inside the entity declaration before introducing a port of that type. The only solution is to use packages, so we will assume that we have a package called types_pkg that will collect all the type, operator and function definitions we will need later. In the long term this package can become a relatively complex piece of code, but for today's example it can be as simple as this:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;
use IEEE.MATH_REAL.all;

package TYPES_PKG is
  type BOOLEAN_VECTOR is array(NATURAL range <>) of BOOLEAN;   -- this type is already defined in VHDL-2008, use only for VHDL-93 implementations
  type INTEGER_VECTOR is array(NATURAL range <>) of INTEGER;   -- this type is already defined in VHDL-2008, use only for VHDL-93 implementations
  type UNSIGNED_VECTOR is array(INTEGER range <>) of UNSIGNED; -- works only in VHDL-2008
  type SIGNED_VECTOR is array(INTEGER range <>) of SIGNED;     -- works only in VHDL-2008

  type SFIXED is array(INTEGER range <>) of STD_LOGIC;     -- arbitrary precision fixed point signed number, like SIGNED but lower bound can be negative, OK even in VHDL-93
  type SFIXED_VECTOR is array(INTEGER range <>) of SFIXED; -- unconstrained array of SFIXED, works only in VHDL-2008
  type CFIXED is record RE,IM:SFIXED; end record;          -- arbitrary precision fixed point complex signed number, works only in VHDL-2008
  type CFIXED_VECTOR is array(INTEGER range <>) of CFIXED; -- unconstrained array of CFIXED, works only in VHDL-2008

  function TO_SFIXED(R:REAL;H,L:INTEGER) return SFIXED; -- returns SFIXED(H downto L) result
  function TO_SFIXED(R:REAL;HL:SFIXED) return SFIXED;   -- returns SFIXED(HL'high downto HL'low) result
end TYPES_PKG;

package body TYPES_PKG is
  function TO_SFIXED(R:REAL;H,L:INTEGER) return SFIXED is
    variable RR:REAL;
    variable V:SFIXED(H downto L);
  begin
    assert (R<2.0**H) and (R>=-2.0**H) report "TO_SFIXED vector truncation!" severity warning;
    -- the sign bit V(H) has a weight of -2.0**H
    if R<0.0 then
      V(V'high):='1';
      RR:=R+2.0**V'high;
    else
      V(V'high):='0';
      RR:=R;
    end if;
    -- remaining bits, from most to least significant, bit K has a weight of 2.0**K
    for K in V'high-1 downto V'low loop
      if RR>=2.0**K then
        V(K):='1';
        RR:=RR-2.0**K;
      else
        V(K):='0';
      end if;
    end loop;
    return V;
  end;

  function TO_SFIXED(R:REAL;HL:SFIXED) return SFIXED is
  begin
    return TO_SFIXED(R,HL'high,HL'low);
  end;
end TYPES_PKG;
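
As a quick sanity check of the conversion function, here is a minimal simulation-only sketch. The testbench name TB_TO_SFIXED is made up for this illustration, and the expected bit patterns were worked out by hand from the algorithm above for an SFIXED(1 downto -8) result, so treat it as a sketch rather than a reference test:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity TB_TO_SFIXED is -- hypothetical testbench name, not part of the original design
end TB_TO_SFIXED;

architecture TEST of TB_TO_SFIXED is
begin
  process
    variable A:SFIXED(1 downto -8); -- 10 bits, 2 integer bits (including sign) and 8 fractional bits
  begin
    -- 0.75 = 2.0**(-1) + 2.0**(-2), so only bits -1 and -2 are set
    A:=TO_SFIXED(0.75,1,-8);
    assert A="0011000000" report "unexpected result for 0.75" severity failure;
    -- -0.5 = -2.0**1 + 2.0**0 + 2.0**(-1), the sign bit has a weight of -2.0**1
    A:=TO_SFIXED(-0.5,A); -- second overload, takes the range from A
    assert A="1110000000" report "unexpected result for -0.5" severity failure;
    report "TO_SFIXED checks passed";
    wait;
  end process;
end TEST;
```

Both overloads are exercised here. The second one is convenient when the target range is already carried by an existing signal or variable.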

 

Let's start with a module that is a generic BOOLEAN delay and call it BDELAY:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity BDELAY is
  generic(SIZE:NATURAL:=1);   -- SIZE has a default value of 1 and cannot be negative, this would require traveling back in time
  port(CLK:in STD_LOGIC:='0'; -- an input port with a default value can be left unconnected, this could be useful when SIZE=0
       I:in BOOLEAN;
       O:out BOOLEAN);
end BDELAY;

architecture TEST of BDELAY is
  signal D:BOOLEAN_VECTOR(0 to SIZE):=(others=>FALSE); -- delay line signal is SIZE+1 in length
begin
  D(D'low)<=I;
  process(CLK)
  begin
    if rising_edge(CLK) then
      for K in 1 to SIZE loop
        D(K)<=D(K-1); -- when SIZE=0 this is never executed and we get no registers, just a wire between I and O
      end loop;
    end if;
  end process;
  O<=D(D'high);
end TEST;

 

When SIZE=0 we get a wire between the I and O ports, for SIZE>0 we get a chain of SIZE flip-flops. I tried to give the shortest and simplest possible behavioral implementation for BDELAY. While this very elegant solution is definitely synthesizable, the results might or might not be what we want. In particular, there could be different optimal implementations for different values of SIZE that the synthesis tool might not give us. Addressing this issue will be the subject of future posts; for now let's try to keep things as simple and clear as possible.
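
To see the latency in simulation, a small self-checking testbench along these lines can be used. It is only a sketch: the name TB_BDELAY, the DONE signal and the 10 ns clock period are choices made for this illustration, and the check assumes SIZE is at least 1:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity TB_BDELAY is -- hypothetical testbench name
end TB_BDELAY;

architecture TEST of TB_BDELAY is
  constant SIZE:NATURAL:=3;      -- delay under test, the check below assumes SIZE>=1
  signal CLK:STD_LOGIC:='0';
  signal I,O:BOOLEAN:=FALSE;
  signal DONE:BOOLEAN:=FALSE;    -- stops the clock so the simulation ends on its own
begin
  CLK<=not CLK after 5 ns when not DONE else '0';

  uut:entity work.BDELAY generic map(SIZE=>SIZE)
                         port map(CLK=>CLK,I=>I,O=>O);

  process
  begin
    wait until rising_edge(CLK);
    I<=TRUE;                     -- one clock wide pulse, captured by BDELAY at the next rising edge
    wait until rising_edge(CLK);
    I<=FALSE;
    for K in 1 to SIZE loop
      wait until rising_edge(CLK);
      -- O read right at a clock edge is the value a downstream register would capture
      assert O=(K=SIZE) report "unexpected O value "&INTEGER'image(K)&" clocks after the pulse" severity failure;
    end loop;
    wait until rising_edge(CLK);
    assert not O report "O pulse is longer than one clock" severity failure;
    report "BDELAY check passed";
    DONE<=TRUE;
    wait;
  end process;
end TEST;
```

One caveat when simulating: the loop index K is not a static name, so under the VHDL driver rules (longest static prefix) the process may be treated as driving every element of D, including D(0), which is also driven by the concurrent assignment. If a simulator reports multiple drivers on D, the slice assignment D(1 to SIZE)<=D(0 to SIZE-1) is one possible workaround.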

 

Something very similar can be done for virtually any base type. A generic DELAY for STD_LOGIC signals is the same as BDELAY except that we use STD_LOGIC_VECTOR instead of BOOLEAN_VECTOR for the delay line type and we initialize it with '0' instead of FALSE. Generic DELAY blocks for STD_LOGIC_VECTOR, SIGNED, UNSIGNED and SFIXED are all very similar. This is what an SFIXED block called SDELAY would look like:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity SDELAY is
  generic(SIZE:NATURAL:=1);   -- SIZE has a default value of 1 and cannot be negative, this would require traveling back in time
  port(CLK:in STD_LOGIC:='0'; -- an input port with a default value can be left unconnected, this could be useful when SIZE=0
       I:in SFIXED;
       O:out SFIXED);
end SDELAY;

architecture TEST of SDELAY is
  signal D:SFIXED_VECTOR(0 to SIZE)(I'range):=(others=>TO_SFIXED(0.0,I)); -- delay line signal is SIZE+1 in length
begin
  D(D'low)<=I;
  process(CLK)
  begin
    if rising_edge(CLK) then
      for K in 1 to SIZE loop
        D(K)<=D(K-1); -- when SIZE=0 this is never executed and we get no registers, just a wire between I and O
      end loop;
    end if;
  end process;
  O<=D(D'high);
end TEST;
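
Because the I and O ports are unconstrained, their ranges are taken from whatever signals are connected at instantiation. A minimal sketch of that idea, where the entity name SDELAY_DEMO, the port names and the chosen ranges and depths are all made up for illustration:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity SDELAY_DEMO is -- hypothetical wrapper, only here to show two differently sized instances
  port(CLK:in STD_LOGIC;
       A_I:in SFIXED(0 downto -15);  -- 16 bits wide
       A_O:out SFIXED(0 downto -15);
       B_I:in SFIXED(7 downto -4);   -- 12 bits wide
       B_O:out SFIXED(7 downto -4));
end SDELAY_DEMO;

architecture TEST of SDELAY_DEMO is
begin
  -- same SDELAY entity, 4 clocks deep and 16 bits wide
  da:entity work.SDELAY generic map(SIZE=>4)
                        port map(CLK=>CLK,I=>A_I,O=>A_O);
  -- same SDELAY entity, 32 clocks deep and 12 bits wide
  db:entity work.SDELAY generic map(SIZE=>32)
                        port map(CLK=>CLK,I=>B_I,O=>B_O);
end TEST;
```

Only the depth is passed explicitly. The width and range of each instance follow from the connected ports.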

 

Now not only the depth (the module latency SIZE) is a generic parameter, the width can be anything too, because we are using the unconstrained SFIXED type for the I and O ports. We could instead use two more INTEGER generics, say HI and LO, to specify the range of the I and O ports, which by the way is the only way you could do the same thing in Verilog or SystemVerilog. The VHDL-only syntax that uses unconstrained port types is more concise, very elegant and produces cleaner code. The same generic SDELAY module can now be used everywhere an SFIXED delay is needed, no matter what the depth and width or range of such a delay is. Finally, we can go one step further and create a complex fixed point delay for CFIXED, leveraging the SDELAY module we already have:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity CDELAY is
  generic(SIZE:NATURAL:=1);   -- SIZE has a default value of 1 and cannot be negative, this would require traveling back in time
  port(CLK:in STD_LOGIC:='0'; -- an input port with a default value can be left unconnected, this could be useful when SIZE=0
       I:in CFIXED;
       O:out CFIXED);
end CDELAY;

architecture TEST of CDELAY is
begin
  rd:entity work.SDELAY generic map(SIZE=>SIZE)
                        port map(CLK=>CLK,
                                 I=>I.RE,
                                 O=>O.RE);

  id:entity work.SDELAY generic map(SIZE=>SIZE)
                        port map(CLK=>CLK,
                                 I=>I.IM,
                                 O=>O.IM);
end TEST;

 

Of the modules we have created so far, BDELAY will work in VHDL-93. SDELAY as written and CDELAY require VHDL-2008 support because, as the comments in types_pkg note, the SFIXED_VECTOR and CFIXED types work only in VHDL-2008. In all cases, due to the strongly typed nature of VHDL, the signals connected to the I and O ports must have equal lengths and, of course, the correct types.

 

These unconstrained port modules cannot be synthesized as such because the synthesis tool does not know what the actual ranges of the I and O ports are. These become known at synthesis time only when the generic module is instantiated with signals of known size attached to those ports. Here is an example of how such a CFIXED generic delay would be used in an actual design:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.types_pkg.all;

entity TEST_CDELAY is
  generic(SIZE:NATURAL:=16);
  port(CLK:in STD_LOGIC;
       I:in CFIXED(RE(1 downto -8),IM(1 downto -8));
       O:out CFIXED(RE(1 downto -8),IM(1 downto -8)));
end TEST_CDELAY;

architecture TEST of TEST_CDELAY is
begin
  cd:entity work.CDELAY generic map(SIZE=>SIZE)
                        port map(CLK=>CLK,
                                 I=>I,
                                 O=>O);
end TEST;

 

Here the CDELAY is 16 locations deep and the real and imaginary parts are each 10 bits wide. This is what the synthesis result looks like for the CFIXED block:

And for one of the SFIXED sub-blocks, showing only 3 of the 10 bits:

As mentioned earlier, while this design solution is very simple and elegant, the synthesis results might or might not be what we need. In this particular case we get an input FF, an output FF and 14 intermediate delays mapped into an SRL16, which is half of a LUT6. This implementation uses twice as many FFs as needed and a better choice would be an SRL16 configured as a 15 clock delay followed by a single FF (we will see later in another post why the final FF is required). The synthesis tool infers the extra input FF to make timing closure easier but this is rarely needed and does not justify the use of extra FFs. If you have just one such delay in a design, wasting a few FFs might not matter. If you have lots of them and your FPGA is already almost fully utilized this could become a critical issue.

 

There are many ways a delay line can be implemented in a Xilinx FPGA: as a chain of flip-flops; using the SRL16 and SRL32 primitives, which map up to 16 flip-flops into half a LUT6 and up to 32 flip-flops into a full LUT6, respectively; with distributed RAM and a counter, where we can have the equivalent of up to 64 flip-flops in a single LUT6; or, for even longer delays, with dedicated memory blocks like BRAMs (18Kb and 36Kb) and URAMs (288Kb). Which implementation is better depends on the delay depth and width as well as on the available resources, and this is design dependent.

 

A more useful xDELAY implementation would target different primitives based either on the SIZE value or on another generic that could be used to force a particular implementation independent of SIZE. Such a generic could even be used to choose between the elegant behavioral implementation presented above and various implementations based on primitive instantiations. The same thing could also be achieved using an entity with multiple architecture definitions.
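
A rough sketch of what such a selection mechanism could look like is shown below. The wrapper name BDELAY_SEL and the STYLE generic are hypothetical, only the behavioral branch is filled in, and the primitive-based branches are deliberately left out, so this illustrates the structure rather than a finished implementation:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity BDELAY_SEL is -- hypothetical wrapper name
  generic(SIZE:NATURAL:=1;
          STYLE:STRING:="behavioral"); -- hypothetical generic that forces a particular implementation
  port(CLK:in STD_LOGIC:='0';
       I:in BOOLEAN;
       O:out BOOLEAN);
end BDELAY_SEL;

architecture TEST of BDELAY_SEL is
begin
  -- behavioral branch, the architecture is named explicitly as it would be
  -- when choosing between multiple architectures of the same entity
  g_beh:if STYLE="behavioral" generate
    d:entity work.BDELAY(TEST) generic map(SIZE=>SIZE)
                               port map(CLK=>CLK,I=>I,O=>O);
  end generate;

  -- placeholder for primitive based implementations, not provided in this sketch
  g_other:if STYLE/="behavioral" generate
    assert FALSE report "BDELAY_SEL: unsupported STYLE "&STYLE severity failure;
  end generate;
end TEST;
```

The generate condition could just as well depend on SIZE, for example to pick a different branch above a certain depth.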

 

One final observation: all the xDELAY modules we have created so far are virtually identical. It would be very nice if, instead of having a different delay module definition like BDELAY, SDELAY, CDELAY and so on for each base type, we could have a single DELAY module that would work for all of them, something like this:

 

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity DELAY is
  generic(type T;
          INIT_VAL:T;
          SIZE:INTEGER:=1);
  port(CLK:in STD_LOGIC:='0';
       I:in T;
       O:out T);
end DELAY;

architecture TEST of DELAY is
  type TA is array(0 to SIZE) of T;
  signal D:TA:=(others=>INIT_VAL);
begin
  D(D'low)<=I;
  process(CLK)
  begin
    if rising_edge(CLK) then
      for K in 1 to SIZE loop
        D(K)<=D(K-1);
      end loop;
    end if;
  end process;
  O<=D(D'high);
end TEST;

 

Here the base type T itself is undefined and in theory it can be anything. The delay line type, an array of T, is defined locally, and if we want it initialized to a particular value we have to pass that value as a generic too, since its type is unknown when DELAY is compiled. This is actually valid VHDL-2008 code, but unfortunately Vivado does not yet support generic types in either synthesis or simulation, so we cannot use this higher level and even more elegant coding style. The only workaround available right now is to define a different generic xDELAY module for every type that we want to delay. I promised roadblocks along the way and this is one of them.

 

Update: this feature is in beta in Vivado 2018.3 and works now for synthesis (but not simulation). It will become available in the next tool versions so it is possible now to create a single abstract DELAY entity that can be used to delay a signal of any arbitrary base type, including user defined ones, by an arbitrary number of clocks.
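
Assuming a tool version that accepts VHDL-2008 generic types, instantiating the abstract DELAY entity for two different base types could look roughly like this. The entity name DELAY_DEMO, the port names and the BYTE subtype are made up for illustration:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity DELAY_DEMO is -- hypothetical example entity
  port(CLK:in STD_LOGIC;
       VALID_I:in BOOLEAN;
       VALID_O:out BOOLEAN;
       DATA_I:in STD_LOGIC_VECTOR(7 downto 0);
       DATA_O:out STD_LOGIC_VECTOR(7 downto 0));
end DELAY_DEMO;

architecture TEST of DELAY_DEMO is
  subtype BYTE is STD_LOGIC_VECTOR(7 downto 0); -- constrained subtype used as the actual for T
begin
  -- BOOLEAN delay, 4 clocks deep, initialized with FALSE
  dv:entity work.DELAY generic map(T=>BOOLEAN,INIT_VAL=>FALSE,SIZE=>4)
                       port map(CLK=>CLK,I=>VALID_I,O=>VALID_O);
  -- 8-bit STD_LOGIC_VECTOR delay, same depth, initialized with zeros
  dd:entity work.DELAY generic map(T=>BYTE,INIT_VAL=>BYTE'(others=>'0'),SIZE=>4)
                       port map(CLK=>CLK,I=>DATA_I,O=>DATA_O);
end TEST;
```

The type is passed in the generic map like any other generic, together with an initial value of that type.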

 
