## NUMERIC_STD SIGNED is not a good choice for fixed point arithmetic

Most signal processing applications require fixed point data types, not just integers. Assuming a binary (base 2) representation, these numbers have a binary point, with an integer part to its left and a fractional part to its right. While it is possible to use integers with an implicit binary point position for this, there are numerous problems when trying to use the NUMERIC_STD SIGNED type that way.

First of all, because SIGNED (and STD_LOGIC_VECTOR, for that matter) is unfortunately defined with a NATURAL index range, it is not possible to use ranges with negative 'low values. So trying to do things like this is illegal:

```vhdl
  signal S:SIGNED(7 downto -4); -- illegal: S'low cannot be negative
begin
  S<=TO_SIGNED(3.25,S'high,S'low); -- illegal: there is no such conversion function that could handle fixed point values
```

Secondly, all arithmetic operators defined for SIGNED operands right align them before applying the operation, even when they have different 'low values. The bit representation of a SIGNED value is therefore not tied to its index range, and hence to its numeric value, in a consistent way. Mixing SIGNED operands with different 'low values in expressions is confusing, can lead to unexpected results and should in general be avoided:

```vhdl
  signal A:SIGNED(7 downto 4);
  signal B:SIGNED(7 downto 2);
  signal S:SIGNED(7 downto 0);
begin
  A<=TO_SIGNED(5,A'length);  -- A is "0101" but what is its actual numeric value, 5 or 80?
  B<=TO_SIGNED(10,B'length); -- B is "001010" but is this 10 or 40?
  S<=RESIZE(A,S'length)+RESIZE(B,S'length); -- what is the value of S?
```

The actual value of S is 15, but the meaning of such arithmetic operations is dubious to say the least. Should the result be 15, or 120 as the index ranges suggest? If all the fixed point values in your design have the same 'low value, using SIGNED with an implicit common binary point might work. In the general case, where you can have different fixed point ranges and must accommodate bit growth and handle overflows properly, the SIGNED type is a poor design choice for representing fixed point numbers.
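To make the two candidate answers concrete, here is a minimal self-checking testbench sketch. The entity name `SIGNED_ALIGN_TB` is made up for this illustration, and the second computation (shifting each operand left by its own 'low) is just one possible way to align the binary points by hand, not something NUMERIC_STD does for you.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity SIGNED_ALIGN_TB is -- hypothetical testbench name
end entity;

architecture TEST of SIGNED_ALIGN_TB is
begin
  process
    constant A:SIGNED(7 downto 4):=TO_SIGNED(5,4);  -- "0101"
    constant B:SIGNED(7 downto 2):=TO_SIGNED(10,6); -- "001010"
    variable S:SIGNED(7 downto 0);
  begin
    -- NUMERIC_STD ignores the index ranges and right aligns the operands: 5+10
    S:=RESIZE(A,S'length)+RESIZE(B,S'length);
    assert TO_INTEGER(S)=15 report "right aligned sum should be 15" severity failure;

    -- aligning by hand: shift each operand left by its own 'low, giving 80+40
    S:=SHIFT_LEFT(RESIZE(A,S'length),A'low)+SHIFT_LEFT(RESIZE(B,S'length),B'low);
    assert TO_INTEGER(S)=120 report "binary point aligned sum should be 120" severity failure;
    wait;
  end process;
end architecture;
```

Every operand needs its own shift, and nothing in the type system checks that you got it right, which is exactly the bookkeeping we would like to avoid.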

In conclusion, while SIGNED/UNSIGNED are good for representing integer numbers with signals that have a 'low value of 0, we need something much better for the more general case of fixed point arithmetic. This is where VHDL's ability to let the designer define his own types, operators and conversion functions becomes very useful.

## SFIXED to the rescue

This limitation of the VHDL-93 standard has been recognized and addressed in VHDL-2008, which introduces a new IEEE.FIXED_PKG package with two new types called SFIXED and UFIXED. They are similar to SIGNED and UNSIGNED but for fixed point rather than integer values. Unfortunately the tool adoption rate of VHDL-2008 has been dismally slow; in the land of HDL standards time is measured in decades and VHDL-93 is "modern". The implementation of the VHDL-2008 SFIXED and UFIXED types is complicated and may lead to inefficient synthesis results due to the way overflow, saturation and rounding are handled. Designers who found the transition from STD_LOGIC arithmetic to SIGNED/UNSIGNED difficult will find the use of SFIXED/UFIXED even more challenging. On top of that, the package still has no support for complex fixed point numbers or for arbitrary vectors and matrices of real and complex fixed point numbers, so the user still needs to define her own types and operators for these derived types anyway.

So while I might come back to the VHDL-2008 SFIXED/UFIXED types in future posts, the user defined type I will use as an example here, while still called SFIXED, is not the same thing as the standard VHDL-2008 type; it just happens to use the same name. My SFIXED example is OK to use in VHDL-93 and has a much simpler implementation than the VHDL-2008 equivalent. In particular there is no rounding or saturation, very much like SIGNED. This is also not an attempt at replacing the original VHDL-2008 SFIXED, just an example of a user defined type illustrating advanced VHDL design techniques.

So we start by defining our own VHDL-93 SFIXED type:

```vhdl
  type SFIXED is array(INTEGER range <>) of STD_LOGIC; -- arbitrary precision fixed point signed number, like SIGNED but lower bound can be negative
```

The only difference from the STD_LOGIC_VECTOR and SIGNED/UNSIGNED definitions is the use of INTEGER instead of NATURAL as the index range type. This is very important because it lets us use negative 'low values and makes representing fixed point numbers possible without the need for an implicit binary point: the binary point is always to the right of bit 0.
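One way to see what "the binary point is always to the right of bit 0" means is to write down the weight of each bit. The sketch below does that with a `TO_REAL` helper, which is a hypothetical function I am introducing only for this illustration, as is the testbench name `SFIXED_WEIGHTS_TB`. It assumes a descending (downto) range and two's complement, so bit K weighs 2\*\*K and the MSB weighs -2\*\*X'high.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity SFIXED_WEIGHTS_TB is -- hypothetical testbench name
end entity;

architecture TEST of SFIXED_WEIGHTS_TB is
  type SFIXED is array(INTEGER range <>) of STD_LOGIC;

  -- hypothetical helper: sum the weights of all bits that are '1'
  function TO_REAL(X:SFIXED) return REAL is
    variable R:REAL:=0.0;
  begin
    for K in X'range loop
      if X(K)='1' then
        if K=X'high then
          R:=R-2.0**K; -- the sign bit has a negative weight
        else
          R:=R+2.0**K; -- K<0 gives the fractional weights 0.5, 0.25, ...
        end if;
      end if;
    end loop;
    return R;
  end;
begin
  process
    constant P:SFIXED(3 downto -2):="010110"; -- 0101.10
    constant N:SFIXED(3 downto -2):="111110"; -- 1111.10
  begin
    assert TO_REAL(P)=5.5 report "P should be 5.5" severity failure;
    assert TO_REAL(N)=(-0.5) report "N should be -0.5" severity failure;
    wait;
  end process;
end architecture;
```

The index of a bit is now its exponent, so the range of a signal fully describes its numeric format.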

## User defined operators and functions

We can now start defining our own SFIXED operators. While we use the same code template and in the end reduce SFIXED operators to SIGNED operators, there is a fundamental difference here: SIGNED operators right align their operands, while SFIXED operators align the binary points of their operands. A second fundamental difference is that the range of the result of an SFIXED operation is chosen so that full numerical precision is maintained. While the sum of two SIGNED operands is only as long as the longer operand and any overflow is ignored, the sum of two SFIXED operands is extended in both directions so that the numerical value of the result is always correct. It is the designer's responsibility to limit the bit growth when using the result of the operation, through saturation or wraparound on the MSB side and truncation or rounding on the LSB side; this is not done by the operator itself. This is the main difference between this VHDL-93 user defined SFIXED type and the standard VHDL-2008 one. Let's create a "+" operator for the new user defined SFIXED type:

```vhdl
  function MIN(A,B:INTEGER) return INTEGER is
  begin
    if A<B then
      return A;
    else
      return B;
    end if;
  end;

  function MAX(A,B:INTEGER) return INTEGER is
  begin
    if A>B then
      return A;
    else
      return B;
    end if;
  end;

  function "+"(X,Y:SFIXED) return SFIXED is
    variable SX,SY,SR:SIGNED(MAX(X'high,Y'high)+1-MIN(X'low,Y'low) downto 0);
    variable R:SFIXED(MAX(X'high,Y'high)+1 downto MIN(X'low,Y'low));
  begin
    for K in SX'range loop
      if K<X'low-Y'low then
        SX(K):='0';           -- zero pad X LSBs
      elsif K>X'high-R'low then
        SX(K):=X(X'high);     -- sign extend X MSBs
      else
        SX(K):=X(R'low+K);
      end if;
    end loop;
    for K in SY'range loop
      if K<Y'low-X'low then
        SY(K):='0';           -- zero pad Y LSBs
      elsif K>Y'high-R'low then
        SY(K):=Y(Y'high);     -- sign extend Y MSBs
      else
        SY(K):=Y(R'low+K);
      end if;
    end loop;
    SR:=SX+SY; -- SIGNED addition
    for K in SR'range loop
      R(R'low+K):=SR(K);
    end loop;
    return R;
  end;
```

What we did here was to overload the "+" operator so that it can operate on our newly defined SFIXED type. The two SFIXED operands are first converted to SIGNED, taking advantage of the fact that at the bit level the two types are compatible, as both are arrays of STD_LOGIC. The actual addition is done as NUMERIC_STD SIGNED and the result is then converted back to SFIXED. The conversion loops are written in a way that ensures binary point alignment and a correct numerical result.

Like with SIGNED and UNSIGNED types, RESIZE is your best verbose friend when you need to change the range of an SFIXED signal. Of course, since SFIXED is a user defined type, you have to define your own RESIZE function now:

```vhdl
  function RESIZE(X:SFIXED;H,L:INTEGER) return SFIXED is
    variable R:SFIXED(H downto L);
  begin
    for K in R'range loop
      if K<X'low then
        R(K):='0';           -- zero pad X LSBs
      elsif K>X'high then
        R(K):=X(X'high);     -- sign extend X MSBs
      else
        R(K):=X(K);
      end if;
    end loop;
    return R;
  end;

  function RESIZE(X:SFIXED;HL:SFIXED) return SFIXED is
  begin
    return RESIZE(X,HL'high,HL'low);
  end;
```

We have two versions of RESIZE here: one that specifies the desired result range through high and low values, and a second one that uses an existing SFIXED signal to pass those two values with a single function argument. The second form literally means "resize signal X so that it can be assigned to signal HL". Missing MSBs are sign extended and missing LSBs are zero padded; extra MSBs are dropped (overflow wraps around) and extra LSBs are dropped (truncation).
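Since RESIZE already does sign extension and zero padding by bit index, the alignment loops of the "+" operator could also be expressed through it. The following is only a sketch of that alternative, wrapped in a made-up testbench (`SFIXED_ADD_TB`) so it can be checked in a simulator; the constants H, L, XR and YR are names I chose for the illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity SFIXED_ADD_TB is -- hypothetical testbench name
end entity;

architecture TEST of SFIXED_ADD_TB is
  type SFIXED is array(INTEGER range <>) of STD_LOGIC;

  function MIN(A,B:INTEGER) return INTEGER is
  begin
    if A<B then return A; else return B; end if;
  end;

  function MAX(A,B:INTEGER) return INTEGER is
  begin
    if A>B then return A; else return B; end if;
  end;

  -- same behaviour as the three argument RESIZE shown earlier
  function RESIZE(X:SFIXED;H,L:INTEGER) return SFIXED is
    variable R:SFIXED(H downto L);
  begin
    for K in R'range loop
      if K<X'low then
        R(K):='0';
      elsif K>X'high then
        R(K):=X(X'high);
      else
        R(K):=X(K);
      end if;
    end loop;
    return R;
  end;

  -- alternative sketch of "+": RESIZE aligns the binary points first
  function "+"(X,Y:SFIXED) return SFIXED is
    constant H:INTEGER:=MAX(X'high,Y'high)+1; -- one extra MSB for the carry
    constant L:INTEGER:=MIN(X'low,Y'low);
    constant XR:SFIXED(H downto L):=RESIZE(X,H,L);
    constant YR:SFIXED(H downto L):=RESIZE(Y,H,L);
    variable SX,SY,SR:SIGNED(H-L downto 0);
    variable R:SFIXED(H downto L);
  begin
    for K in SR'range loop -- copy bit by bit, shifting indices by L
      SX(K):=XR(L+K);
      SY(K):=YR(L+K);
    end loop;
    SR:=SX+SY;             -- NUMERIC_STD SIGNED addition
    for K in SR'range loop
      R(L+K):=SR(K);
    end loop;
    return R;
  end;
begin
  process
    constant A:SFIXED(7 downto -4):="000001010100"; -- 5.25
    constant B:SFIXED(5 downto -2):="00001110";     -- 3.5
    constant SUM:SFIXED:=A+B;                       -- range comes from "+"
  begin
    assert (SUM'high=8) and (SUM'low=(-4)) report "unexpected result range" severity failure;
    assert SUM="0000010001100" report "5.25+3.5 should be 8.75" severity failure;
    wait;
  end process;
end architecture;
```

The bit by bit copies are still needed because a direct type conversion to SIGNED would try to carry the negative index bounds over to a NATURAL range.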

We also need a function to assign REAL (double precision floating point actually) values to SFIXED signals:

```vhdl
  function TO_SFIXED(R:REAL;H,L:INTEGER) return SFIXED is
    variable RR:REAL;
    variable V:SFIXED(H downto L);
  begin
    assert (R<2.0**H) and (R>=-2.0**H) report "TO_SFIXED vector truncation!" severity warning;
    if R<0.0 then
      V(V'high):='1';
      RR:=R+2.0**V'high;
    else
      V(V'high):='0';
      RR:=R;
    end if;
    for K in V'high-1 downto V'low loop
      if RR>=2.0**K then
        V(K):='1';
        RR:=RR-2.0**K;
      else
        V(K):='0';
      end if;
    end loop;
    return V;
  end;

  function TO_SFIXED(R:REAL;HL:SFIXED) return SFIXED is
  begin
    return TO_SFIXED(R,HL'high,HL'low);
  end;
```
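It is worth checking what this conversion does with values that are not exactly representable. Tracing the loop by hand, whatever remains in RR after the last bit is simply discarded, so the result is truncated towards minus infinity rather than rounded. The sketch below, with a made-up testbench name `TO_SFIXED_TB`, checks a few values against the bit patterns I expect from that trace.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity TO_SFIXED_TB is -- hypothetical testbench name
end entity;

architecture TEST of TO_SFIXED_TB is
  type SFIXED is array(INTEGER range <>) of STD_LOGIC;

  -- same conversion as the three argument TO_SFIXED shown above
  function TO_SFIXED(R:REAL;H,L:INTEGER) return SFIXED is
    variable RR:REAL;
    variable V:SFIXED(H downto L);
  begin
    assert (R<2.0**H) and (R>=-2.0**H) report "TO_SFIXED vector truncation!" severity warning;
    if R<0.0 then
      V(V'high):='1';
      RR:=R+2.0**V'high; -- remove the negative weight of the sign bit
    else
      V(V'high):='0';
      RR:=R;
    end if;
    for K in V'high-1 downto V'low loop
      if RR>=2.0**K then
        V(K):='1';
        RR:=RR-2.0**K;
      else
        V(K):='0';
      end if;
    end loop;
    return V;                -- any remainder left in RR is dropped
  end;
begin
  process
  begin
    -- exactly representable: 5.25 is 00000101.0100
    assert TO_SFIXED(5.25,7,-4)="000001010100" report "5.25 mismatch" severity failure;
    -- 0.3 with two fractional bits becomes 00.01, which is 0.25
    assert TO_SFIXED(0.3,1,-2)="0001" report "0.3 mismatch" severity failure;
    -- -0.3 becomes 11.10, which is -0.5, not -0.25
    assert TO_SFIXED(-0.3,1,-2)="1110" report "-0.3 mismatch" severity failure;
    wait;
  end process;
end architecture;
```

If you need round to nearest instead, one option is to add half an LSB to the REAL value before converting, but that is a design decision left to the caller.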

Like RESIZE, we have two versions of TO_SFIXED that we can use. We have now all the pieces in place to write very elegant VHDL code like this:

```vhdl
  signal A:SFIXED(7 downto -4);
  signal B:SFIXED(5 downto -2);
  signal S:SFIXED(6 downto -3);
begin
  A<=TO_SFIXED(5.25,A); -- A is "00000101.0100"
  B<=TO_SFIXED(3.5,B);  -- B is   "000011.10"
  S<=RESIZE(A+B,S);     -- S is  "0001000.110" which is 8.75
```

The A+B result is actually SFIXED(8 downto -4) but it is being resized to the actual range of the S signal so it can be assigned to it.
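RESIZE wraps around when the value does not fit in the target range. If you would rather clip, a saturating variant is one way to do it. The `SATURATE` function below is a hypothetical sketch, not part of the code above; it assumes descending ranges and that H is not below X'low. It resizes as before and then overrides the result with the largest or smallest representable value when any dropped MSB disagrees with the sign bit.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity SFIXED_SAT_TB is -- hypothetical testbench name
end entity;

architecture TEST of SFIXED_SAT_TB is
  type SFIXED is array(INTEGER range <>) of STD_LOGIC;

  -- hypothetical saturating resize, assumes H>=X'low
  function SATURATE(X:SFIXED;H,L:INTEGER) return SFIXED is
    variable R:SFIXED(H downto L);
  begin
    for K in R'range loop            -- plain RESIZE behaviour first
      if K<X'low then
        R(K):='0';
      elsif K>X'high then
        R(K):=X(X'high);
      else
        R(K):=X(K);
      end if;
    end loop;
    for K in X'high-1 downto H loop  -- null range when no MSBs are dropped
      if X(K)/=X(X'high) then        -- a significant bit would be lost
        R:=(others=>not X(X'high));  -- all ones if positive, all zeros if negative
        R(H):=X(X'high);             -- keep the original sign
      end if;
    end loop;
    return R;
  end;
begin
  process
    constant BIG:SFIXED(8 downto -4):="0110010000000"; -- 200.0
    constant NEG:SFIXED(8 downto -4):="1001110000000"; -- -200.0
    constant FIT:SFIXED(8 downto -4):="0000010001100"; -- 8.75
  begin
    assert SATURATE(BIG,6,-3)="0111111111" report "should clip to 63.875" severity failure;
    assert SATURATE(NEG,6,-3)="1000000000" report "should clip to -64.0" severity failure;
    assert SATURATE(FIT,6,-3)="0001000110" report "8.75 should pass through" severity failure;
    wait;
  end process;
end architecture;
```

The LSB side is still truncated here; rounding would be a separate step.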

This has been a longer post (too long? please let me know) but it is the direction I plan to take this blog in the future, providing useful code examples to illustrate "The Art of FPGA Design". Next week's post will conclude those dedicated to VHDL types with even more advanced topics, fixed point complex numbers and unconstrained vectors and matrices of such signals.
