Using the Carry-Save Adder, The Constant Coefficient Multiplier Multiplications in Xilinx FPGAs are done using DSP48s, which are primitives that consist of a 25x18 signed multiplier, a 25-bit preadder and a 48-bit postadder/accumulator. In UltraScale/UltraScale+ FPGA families the signed multiplier is 27x18 and the post adder has three inputs instead of just two. Depending on the FPGA size and family there are hundreds to thousands of such DSP48 primitives, that are able to do one multiply ...

Using the Carry-Save Adder, A Generic Adder Tree In this post I will show how to implement an efficient and generic adder tree, we need to compute the sum of N elements, where N can be any value. The numbers we add are also arbitrary precision fixed point values, all the same range but otherwise unconstrained. We can represent the input data as an unconstrained array of unconstrained SFIXED, which requires VHDL-2008 support - with Vivado we can synthesize and implement this but we ...

Using the Carry-Save Adder, Computing a Running Average I will show in the next few posts some design examples where using a 3-input carry-save adder instead of the normal 2-input ripple-carry adder makes a significant difference. The first example is a running average, where we have a stream of input samples and we want to compute a continuous running average every clock, as the average of the last N samples. In mathematical terms: y(n)=1/N*Sum(x(n-k)), k=0..N-1 As a firs ...

The Carry-Save Adder, two for the price of one This post is about buying two adders but paying only for one of them. When developing software the CPU and memory your code is running on is already paid for and there is little incentive to optimize your code to make it either smaller or faster. But as a hardware designer you literally pay for every LUT and FF in the FPGA you are using. If you could make your design smaller and faster you could do more with the same FPGA or you could ...

Counters, Adders and Accumulators One of the most common operation encountered in digital hardware design, especially for digital signal processing applications, is addition. This actually covers a large group of fundamental building blocks, like up/down binary counters, adders/subtractors, comparators, accumulators and so on. The signal types operated on can be IEEE.numeric_std SIGNED/UNSIGNED for integer operands, the user defined SFIXED introduced earlier, or the default VHDL-2008 type ...

The Universal MUX Building Block Part 3, the one with the Dutch Cocoa Box and the Ouroboros We have seen in the previous post that Vivado Synthesis is able to optimally infer a mux form behavioral code for multiplexers with up to 16 inputs, but beyond that not so much. The synthesis results are not bad but for high performance designs where every LUT and especially every logic level counts not bad is not good enough. So in this post I will present a solution to this problem, that ...

The Universal MUX Building Block Part 2 So the question is now what is the most efficient implementation for arbitrary size multiplexers one should expect? If the result the synthesis tools infers from behavioral code is equal or very close to this there is no need for a specialized MUX Building Block. If the difference is significant then there will be a definite need for such a block, especially for designs with large muxes and/or many of them. To simplify the analysis we will f ...

The Universal MUX Building Block The next example in the series of generic building blocks is a multiplexer. This is a combinatorial block - if we need pipelining we can always add that separately to keep it as generic as possible - with an input port I of N elements, an UNSIGNED SEL port and an output port O, which is one of the N elements of the input port I selected by SEL. We want of course the I and SEL ports to be unconstrained arrays. The most generic solution would be one with a g ...

Instantiating LUT6 Primitives Part 2 Today I will show a couple of examples where LUT6 primitive instantiations make sense. To keep things short and simple these are somewhat artificial examples but situations like these tend to show up all the time in hardware designs. Let's say we need a 48-input AND function. This can be coded very easily behaviorally, especially if we take advantage of the new VHDL-2008 features: library IEEE; use IEEE.STD_LOGIC_1164.all; entity WideAN ...

Instantiating LUT6 Primitives In the previous post we have already seen how to instantiate FPGA primitives, SRL16s in that case. The role of the synthesis tool is to take HDL behavioral code and translate it into a netlist of FPGA fundamental building blocks called primitives. This is very much like the software design flow, where a C compiler takes C code and produces machine code that a processor can execute. The same way a C compiler lets you embed assembly code into your C program, th ...

The Universal DELAY Building Block Part 2, the one with the cake In the last post I have introduced an example of a universal delay block that uses a behavioral implementation to create a reusable module that can be used to delay a signal by an arbitrary but fixed value. Both the delay size and its width are generic respectively unconstrained, which makes the design reusable. While the behavioral implementation is quite compact and elegant, the synthesis result is not always what we reall ...

The Universal DELAY Building Block This is the first post in this blog in which I will try to actually design something useful based on the ideas introduced so far. Also, instead of just short code snippets, the code examples in this post are complete and they can be simulated, synthesized and used in actual designs. We will try to create a generic and reusable delay module - the goal is to delay a signal of some given type by a fixed but arbitrary number of clocks, this is a fund ...

We're Not in Kansas Anymore In this post we will try to do things you probably never tried to do using VHDL, assuming they were not even possible in such a low level language. VHDL and Verilog/SystemVerilog are considered low level hardware design languages, compared with C/C++ based HLS flows for example. Only schematic capture is lower on the totem pole and hopefully nobody is still doing FPGA designs using schematics anymore. But the low level label attached to HDLs is only par ...

VHDL User Defined TypesNUMERIC_STD SIGNED is not a good choice for fixed point arithmetic Most signal processing applications require handling fixed point data types, not just integers. These numbers have a binary point, assuming we use binary or base 2 representation, with an integer part to the left of the binary point and a fractional part to the right of it. While it is possible to use integer numbers for that with an implicit binary point position there are numerous problems when try ...

Beyond STD_LOGIC In the last post we just started to scratch the surface of VHDL types. This time we will try to go deeper - which types should we use, what are their properties and limitations, what are the main pitfalls a beginner would encounter and how to solve them. We have seen that while the basic type for software programming is INTEGER, of various but fixed sizes, signed or unsigned, the VHDL fundamental type is the 1-bit STD_LOGIC. It has 9 possible values, some of which ...