The DSP58 Primitive

 

Xilinx has recently announced a new 7 nm FPGA family called Versal. Devices in this family will have an improved version of the UltraScale/UltraScale+ DSP48E2 primitive we studied in the previous posts. The new Versal primitive is called DSP58, and it brings numerous improvements over the earlier DSP48.

 

First of all, the signed multiplier, which was 27x18 in the DSP48, is now 27x24, and the post-adder/accumulator has grown from 48 to 58 bits, which is where the name of the new primitive comes from. Together, these two changes make implementing larger FIRs easier: it is now possible to add 256 27x24 product terms together in a DSP58 chain without concerns about internal overflows. The P cascade shift right by 17 bits, which is used to implement larger multipliers, is still supported for backwards compatibility with the DSP48, and there is a new shift right by 23 bits mode to match the larger multiplier.
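
The arithmetic behind both claims can be checked with a few lines of Python. This is only a bit-accurate integer sketch, not a model of the primitive: the function names are made up for illustration, and the 27x47 split is an assumed way of using the 23-bit shift, with the low 23 bits of the wide operand fed to the 24-bit port as a non-negative value.

```python
A_BITS, B_BITS, ACC_BITS = 27, 24, 58   # multiplier and accumulator widths from the text
SHIFT = 23                               # the new P cascade right shift


def signed_range(bits):
    """Return (min, max) of a two's-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def max_safe_terms():
    """Number of worst-case 27x24 products that always fit in 58 bits."""
    a_min, a_max = signed_range(A_BITS)
    b_min, b_max = signed_range(B_BITS)
    acc_min, acc_max = signed_range(ACC_BITS)
    # The extreme products occur at the corners of the operand ranges.
    corners = [a * b for a in (a_min, a_max) for b in (b_min, b_max)]
    p_min, p_max = min(corners), max(corners)
    return min(acc_max // p_max, acc_min // p_min)


def wide_multiply(a, b):
    """27x47 signed multiply from two 27x24 products and a 23-bit shift."""
    mask = (1 << SHIFT) - 1
    b_lo = b & mask                 # low 23 bits, always non-negative
    b_hi = b >> SHIFT               # signed upper 24 bits (arithmetic shift)
    p0 = a * b_lo                   # first (hypothetical) DSP58 in the chain
    p1 = a * b_hi + (p0 >> SHIFT)   # second one adds the shifted cascade input
    # Low 23 result bits come from p0, everything above them from p1.
    return (p1 << SHIFT) | (p0 & mask)
```

Under these assumptions `max_safe_terms()` evaluates to 255 rather than 256. The largest possible product is 2^49, which occurs only when both operands take their most negative value, and 256 such products add up to exactly 2^57, one more than a signed 58-bit accumulator can hold. If at least one operand of every product avoids its most negative value, as fixed filter coefficients usually do, all 256 terms fit.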

 

One entirely new feature is the dot product, or INT8, mode. The 27-bit A port can be divided into three 9-bit parts, A2, A1 and A0, while the 24-bit B port is similarly divided into three 8-bit parts, B2, B1 and B0. The DSP58 can then compute P=±A2*B2±A1*B1±A0*B0 in one operation. This is a major improvement over what could be done with one DSP48, where only two INT8 products could be computed in parallel, and even then they had to share a common factor and were not summed. This new feature will make implementing AI algorithms like deep neural networks much more efficient.
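
As a rough illustration of the dot product mode, here is a small Python model of the packing and of the three-term sum. It assumes that all fields are signed two's-complement values and that A0 and B0 occupy the least significant bits of their ports. Neither detail is stated above, and the helper names `pack`, `unpack` and `dot3` are invented for this sketch.

```python
def pack(fields, width):
    """Pack signed fields into one port word, fields[0] in the lowest bits."""
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(fields):
        word |= (value & mask) << (i * width)
    return word


def unpack(word, width, count=3):
    """Split a port word into `count` signed fields of `width` bits each."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    fields = []
    for i in range(count):
        raw = (word >> (i * width)) & mask
        # Undo the two's-complement wrap for negative fields.
        fields.append(raw - (1 << width) if raw & sign_bit else raw)
    return fields


def dot3(a_word, b_word, signs=(1, 1, 1)):
    """P = s0*A0*B0 + s1*A1*B1 + s2*A2*B2, each sign being +1 or -1."""
    a = unpack(a_word, 9)   # 27-bit A port as three 9-bit parts
    b = unpack(b_word, 8)   # 24-bit B port as three 8-bit parts
    return sum(s * x * y for s, x, y in zip(signs, a, b))
```

For example, `dot3(pack([100, -128, 7], 9), pack([-5, 127, -128], 8))` gives the same value as `100*-5 + -128*127 + 7*-128`, and passing `signs=(1, -1, 1)` subtracts the middle term instead of adding it.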

 

The 12-bit and 24-bit SIMD modes still exist, as do the wide XOR12, XOR24, XOR48 and XOR96 modes, and new XOR22, XOR34, XOR58 and XOR116 modes have been added.

 

Another completely new feature is the complex mode. In this mode two adjacent DSP58s are connected in tandem to implement an 18x18 complex multiplier, where one DSP58 computes the real part of the result and the other the imaginary part. The 3-input post-adder is also available in a complex format. This is a significant improvement over the 3 or 4 DSP48s required to implement a complex multiplication.
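
To see where the "3 or 4 DSP48s" figure comes from, the two classic formulations of a complex product can be written out in plain integer arithmetic. This is a sketch of the math only, not of how the DSPCPLX primitive is built internally, and the function names are illustrative.

```python
def cmul4(ar, ai, br, bi):
    """Direct form: four real multiplications, two additions."""
    re = ar * br - ai * bi
    im = ar * bi + ai * br
    return re, im


def cmul3(ar, ai, br, bi):
    """Three-multiplier form: one shared product, three pre-additions."""
    k = br * (ar + ai)          # shared between real and imaginary parts
    re = k - ai * (br + bi)     # expands to ar*br - ai*bi
    im = k + ar * (bi - br)     # expands to ar*bi + ai*br
    return re, im
```

The three-multiplier version saves one multiplier at the cost of pre-adders, which also make each multiplier operand one bit wider than the inputs.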

 

Finally, the DSP58 now supports floating-point operations. A single DSP58 can implement a floating-point multiplication, either FP32 or FP16, followed by an FP32 adder/accumulator. This replaces the 2 to 4 DSP48s, plus a few hundred LUTs and FFs, that were needed to implement the same functionality in earlier Xilinx FPGA families.
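
The multiply followed by FP32 accumulate data flow can be mimicked in Python using the standard `struct` module to round values to FP32 or FP16. This is a behavioral sketch under a loud assumption: the multiplier and the adder each round their result to FP32 separately, using round to nearest even. The actual rounding behavior of the hardware is not described here, and the names `fp_mac` and `fp_dot` are invented for the example.

```python
import struct


def to_fp32(x):
    """Round a Python float (double precision) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round a Python float (double precision) to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp_mac(acc, a, b, half_inputs=False):
    """One multiply-accumulate step: FP16 or FP32 multiply, FP32 add."""
    quantize = to_fp16 if half_inputs else to_fp32
    # Assumption: the product is rounded to FP32 before the addition.
    product = to_fp32(quantize(a) * quantize(b))
    return to_fp32(acc + product)


def fp_dot(xs, ys, half_inputs=False):
    """Accumulate a dot product through repeated fp_mac steps."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fp_mac(acc, x, y, half_inputs)
    return acc
```

With FP16 inputs the product of two values is exactly representable in FP32, so under this model the only rounding in that mode happens in the accumulator.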

 

As with DSP48s, both behavioral inference and direct primitive instantiation design styles will be available; the new primitives are called DSP58, DSPCPLX and DSPFP32. The same guideline remains valid: if the synthesis tool infers the desired result, use behavioral code; if not, instantiate a device primitive directly.

 
