The DSP48 Primitive - Small Multiplications - Two For the Price of One The DSP48E2 primitive contains a signed 27x18 multiplier, so any signed multiplication up to this size can be implemented with just one such primitive. Unsigned multiplications are of course possible if you prepend a zero MSB to the operands and then treat them as signed, but the largest unsigned multiplication that can be done that way with one DSP48E2 is 26x17. When the operands are much smaller it becomes possib ...
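The 26x17 limit for unsigned operands follows directly from two's complement ranges, and a small Python model can make that concrete. This is only an illustrative sketch, not code from the post: the helper names `to_signed` and `unsigned_mul_on_signed` are hypothetical, and plain Python integers stand in for the hardware ports.

```python
def to_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits of `bits` as a two's complement number."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def unsigned_mul_on_signed(a: int, b: int, a_port: int = 27, b_port: int = 18) -> int:
    """Multiply two unsigned values on a signed a_port x b_port multiplier.

    Each operand must leave the port MSB at zero, so it may use at most
    a_port - 1 and b_port - 1 bits respectively (26 and 17 by default).
    """
    if not 0 <= a < (1 << (a_port - 1)):
        raise ValueError("a does not fit in a_port - 1 unsigned bits")
    if not 0 <= b < (1 << (b_port - 1)):
        raise ValueError("b does not fit in b_port - 1 unsigned bits")
    # With a zero MSB the signed interpretation equals the unsigned value.
    return to_signed(a, a_port) * to_signed(b, b_port)


# The largest 26-bit unsigned value survives the signed interpretation ...
print(to_signed((1 << 26) - 1, 27) == (1 << 26) - 1)
# ... but a value that needs all 27 bits would be read as negative.
print(to_signed(1 << 26, 27))
```

The second print shows why a full 27-bit unsigned operand cannot be used: its MSB would be taken as the sign bit.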

The DSP48 Primitive - Wide XOR Mode The DSP48 primitive can be used for more than just multiply and accumulate. It can, for example, implement very wide XOR functions. Apart from the obvious ability to XOR two 48-bit operands, using the A concatenated with B (A:B) and C inputs and producing a 48-bit result on the P output, which amounts to 48 XOR2 logic functions, it is also possible to implement 8 XOR12s, 4 XOR24s, 2 XOR48s or one XOR96 with a single DSP48E2. It is also possible to compute a ...
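The arithmetic behind these group counts can be sketched in Python: 96 input bits (48 from A:B and 48 from C) are first combined pairwise, and the 48 results are then reduced in equal groups. This is a behavioral sketch only. The function name `wide_xor` is hypothetical, and the assumption that each group covers a contiguous slice of bits is mine, not a statement about the actual DSP48E2 bit mapping.

```python
def wide_xor(ab: int, c: int, group_inputs: int) -> list[int]:
    """Model a wide XOR of the 48-bit A:B and C operands.

    group_inputs is the number of inputs per XOR function: 12, 24, 48 or 96.
    Returns one parity bit per group, lowest group first.
    """
    if group_inputs not in (12, 24, 48, 96):
        raise ValueError("group_inputs must be 12, 24, 48 or 96")
    mask48 = (1 << 48) - 1
    xor2 = (ab ^ c) & mask48           # 48 XOR2 functions, one per bit position
    bits_per_group = group_inputs // 2  # each XOR2 result already covers 2 inputs
    group_mask = (1 << bits_per_group) - 1
    results = []
    for g in range(48 // bits_per_group):
        chunk = (xor2 >> (g * bits_per_group)) & group_mask
        results.append(bin(chunk).count("1") & 1)  # parity of the group
    return results


print(len(wide_xor(0, 0, 12)))  # number of XOR12 results
print(len(wide_xor(0, 0, 96)))  # number of XOR96 results
```

In every mode the group count times the group width is 96, which is why the options are 8 XOR12s, 4 XOR24s, 2 XOR48s or one XOR96.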

The DSP48 Primitive - Symmetric FIR with DSP48 Primitive Instantiations We will now add the option of choosing between DSP48 inference and primitive instantiation to the symmetric FIR introduced in Post 22. It might make sense to review Post 22 and Post 23 before continuing. We will use the same technique: the generic BEHAVIORAL can be set to TRUE or FALSE to select between the two implementations. This is very similar to the code used in Post 24, except that now we are using the ...

The DSP48 Primitive - FIR with DSP48 Primitive Instantiations After a break I will be resuming my weekly posts on The Art of FPGA Design. In the last posts we started looking at the DSP48 primitive, essentially a signed 27x18 multiplier which also includes a 27-bit pre-adder and a 3-input 48-bit post-adder. In older FPGA families like the 7-series the multiplier is 25x18, the pre-adder is 25 bits wide and the 48-bit post-adder has only two inputs. The DSP48 primitive also includes a lot of p ...

The DSP48 Primitive - Instantiating the DSP48 Behavioral inference has many advantages - relatively simple and compact code, works with signed and unsigned operands of any size, hides the intricacies of the DSP48 primitive from the user. It should definitely be the first choice when coding a DSP-based design if it produces the desired results in terms of device utilization and clock speed. That's a big if; when things do not go as you want there isn't much you can do - fighting wi ...

The DSP48 Primitive - Behavioral Symmetric FIR Inference The DSP48 primitive has an optional preadder function, which can be used to compute things like PCOUT=PCIN+(A+D)*B. When used for implementing symmetric or anti-symmetric FIRs, this can cut the number of multipliers used in half. The following diagram shows how such a symmetric FIR is built, using the case N=4, a symmetric FIR with 8 taps, as an example: The forward data delay line is identical to the one for the non-sym ...
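To see why the preadder halves the multiplier count, here is a small Python sketch that compares a direct 8-tap FIR with a version that uses N=4 operations of the form PCOUT=PCIN+(A+D)*B. It only models the arithmetic, not the pipelining or the delay lines of the hardware. The function names are hypothetical, and samples before the start of the input are assumed to be zero.

```python
def fir_direct(x: list[int], h: list[int]) -> list[int]:
    """Reference FIR: y[n] = sum over k of h[k] * x[n-k], one multiply per tap."""
    def sample(i: int) -> int:
        return x[i] if 0 <= i < len(x) else 0
    return [sum(h[k] * sample(n - k) for k in range(len(h))) for n in range(len(x))]


def fir_symmetric(x: list[int], half: list[int]) -> list[int]:
    """Symmetric FIR with 2N taps where h[k] == h[2N-1-k]; half holds h[0..N-1]."""
    n_half = len(half)
    last = 2 * n_half - 1
    def sample(i: int) -> int:
        return x[i] if 0 <= i < len(x) else 0
    y = []
    for n in range(len(x)):
        pc = 0  # plays the role of the PCIN/PCOUT cascade
        for k in range(n_half):
            a = sample(n - k)           # forward delay line tap
            d = sample(n - (last - k))  # mirrored tap sharing the same coefficient
            pc = pc + (a + d) * half[k]  # PCOUT = PCIN + (A + D) * B
        y.append(pc)
    return y


half = [1, -2, 3, 4]           # N = 4 coefficients
h = half + half[::-1]          # the full 8-tap symmetric response
x = [5, -1, 7, 0, 2, 9, -4, 3, 1, 6]
print(fir_symmetric(x, half) == fir_direct(x, h))
```

Both functions produce the same output, but the symmetric version performs four multiplications per output sample instead of eight.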

The DSP48 Primitive - Behavioral FIR Inference As mentioned earlier, the DSP48 primitive is an essential part of any signal processing FPGA design, and in over 90% of cases it is used either for FIR-like sums of products or for complex multiplications. For this reason we will focus now on efficient implementation of Finite Impulse Response filters with DSP48s, which will also cover other cases where computation of sums of products is required like linear algebra matrix multiplication and convolutional n ...

The DSP48 Primitive This post will start a longer series dedicated to the DSP48 primitive, a MAC (multiply/accumulate) block which is the workhorse for any kind of signal processing design that requires lots of mathematical operations beyond simple additions or subtractions, which are well handled with fabric-based implementations that use the dedicated carry chain primitives. The DSP48, of which there are multiple flavors, one for each Xilinx FPGA family, started as a signed 18x1 ...

Using the Carry-Save Adder, The Constant Coefficient Multiplier Multiplications in Xilinx FPGAs are done using DSP48s, which are primitives that consist of a 25x18 signed multiplier, a 25-bit preadder and a 48-bit postadder/accumulator. In UltraScale/UltraScale+ FPGA families the signed multiplier is 27x18 and the postadder has three inputs instead of just two. Depending on the FPGA size and family there are hundreds to thousands of such DSP48 primitives that are able to do one multiply ...

Using the Carry-Save Adder, A Generic Adder Tree In this post I will show how to implement an efficient and generic adder tree: we need to compute the sum of N elements, where N can be any value. The numbers we add are also arbitrary-precision fixed-point values, all with the same range but otherwise unconstrained. We can represent the input data as an unconstrained array of unconstrained SFIXED, which requires VHDL-2008 support - with Vivado we can synthesize and implement this but we ...
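The idea of a generic adder tree for any N can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the VHDL from the post: plain integers stand in for SFIXED values, each tree node is assumed to be a 3-input adder (as the series title suggests), and the name `adder_tree3` is hypothetical.

```python
def adder_tree3(values) -> tuple[int, int]:
    """Sum any number of values with a tree of 3-input adders.

    Returns (total, depth), where depth is the number of adder levels,
    a rough stand-in for the latency of a pipelined tree.
    """
    level = list(values)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        # Each node adds up to three values; a leftover group of one or two
        # values at the end of a level simply uses fewer adder inputs.
        level = [sum(level[i:i + 3]) for i in range(0, len(level), 3)]
        depth += 1
    return level[0], depth


total, depth = adder_tree3(range(1, 10))  # N = 9 inputs
print(total, depth)
```

With 3-input nodes the depth grows with the base-3 logarithm of N, so nine inputs need two levels where a tree of 2-input adders would need four.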

Using the Carry-Save Adder, Computing a Running Average I will show in the next few posts some design examples where using a 3-input carry-save adder instead of the normal 2-input ripple-carry adder makes a significant difference. The first example is a running average, where we have a stream of input samples and we want to compute a continuous running average every clock, as the average of the last N samples. In mathematical terms: y(n)=1/N*Sum(x(n-k)), k=0..N-1 As a firs ...
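The running average formula above can be checked with a short Python sketch. It shows the direct definition next to a recursive form that updates an accumulator with one addition and one subtraction per sample, which is one natural fit for a 3-input adder. The recursive form is my assumption about where such an adder helps, not necessarily the design from the post. The function names are hypothetical, and samples before the start of the stream are taken as zero.

```python
from collections import deque


def running_average_direct(x: list[int], n: int) -> list[float]:
    """y(i) = 1/N * sum of x(i-k) for k = 0..N-1, computed from the definition."""
    return [sum(x[max(0, i - n + 1):i + 1]) / n for i in range(len(x))]


def running_average_recursive(x: list[int], n: int) -> list[float]:
    """Same result with one 3-input addition per sample: acc + newest - oldest."""
    window = deque([0] * n, maxlen=n)  # delay line holding the last N samples
    acc = 0
    y = []
    for sample in x:
        oldest = window[0]
        acc = acc + sample - oldest    # the 3-input add/subtract
        window.append(sample)          # the oldest sample drops out automatically
        y.append(acc / n)
    return y


x = [4, 8, 12, 16, 20, 24]
print(running_average_recursive(x, 4) == running_average_direct(x, 4))
```

The direct form needs N-1 additions per output, while the recursive form needs a single accumulator update regardless of N.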

The Carry-Save Adder, two for the price of one This post is about buying two adders but paying only for one of them. When developing software the CPU and memory your code is running on are already paid for and there is little incentive to optimize your code to make it either smaller or faster. But as a hardware designer you literally pay for every LUT and FF in the FPGA you are using. If you could make your design smaller and faster you could do more with the same FPGA or you could ...

Counters, Adders and Accumulators One of the most common operations encountered in digital hardware design, especially for digital signal processing applications, is addition. This actually covers a large group of fundamental building blocks, like up/down binary counters, adders/subtractors, comparators, accumulators and so on. The signal types operated on can be IEEE.numeric_std SIGNED/UNSIGNED for integer operands, the user defined SFIXED introduced earlier, or the default VHDL-2008 type ...

The Universal MUX Building Block Part 3, the one with the Dutch Cocoa Box and the Ouroboros We have seen in the previous post that Vivado Synthesis is able to optimally infer a mux from behavioral code for multiplexers with up to 16 inputs, but beyond that not so much. The synthesis results are not bad, but for high performance designs where every LUT and especially every logic level counts, "not bad" is not good enough. So in this post I will present a solution to this problem, that ...

The Universal MUX Building Block Part 2 So the question now is: what is the most efficient implementation one should expect for arbitrary size multiplexers? If the result the synthesis tool infers from behavioral code is equal or very close to this, there is no need for a specialized MUX Building Block. If the difference is significant, then there will be a definite need for such a block, especially for designs with large muxes and/or many of them. To simplify the analysis we will f ...