The Single Rate Half-Band FIR


We started by looking at the most general version of an FIR filter. From a mathematical point of view, this is all that is needed. There are countless variations, such as the odd and even symmetric versions, but they are just particular cases of the general FIR algorithm and hold little interest for a mathematician. From an implementation point of view, however, these particular filter versions do matter. One such example is the half-band FIR, which is the subject of this post.


Half-band filters are odd-symmetric FIRs with a very particular set of coefficients. Every other coefficient is zero, with the exception of the center tap, which is always 0.5. The filter order, counted here as the number of taps, is N=4*K-1, where K is the number of non-trivial distinct coefficients. Here is an example of such a filter, with K=3 and N=11:
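The coefficient layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `halfband_taps`, the ordering of the distinct coefficients (outermost first) and the three numeric values are assumptions made for this sketch, not the output of a real filter design.

```python
def halfband_taps(distinct):
    """Expand K distinct coefficients into the N = 4*K - 1 half-band taps.

    distinct[0] is the outermost coefficient and distinct[-1] the one next
    to the center tap (an ordering chosen for this sketch).
    """
    k = len(distinct)
    n = 4 * k - 1           # total number of taps
    center = 2 * k - 1      # index of the center tap
    taps = [0.0] * n
    for i, c in enumerate(distinct):
        taps[2 * i] = c             # non-zero taps sit at even positions
        taps[n - 1 - 2 * i] = c     # mirrored copy in the other half
    taps[center] = 0.5              # the center tap is always 0.5
    return taps


# Made-up coefficient values, K = 3, so N = 11
taps = halfband_taps([0.01, -0.06, 0.30])
for index, value in enumerate(taps):
    print(index, value)
```

Apart from the center tap at index 5, every odd-indexed tap is zero, and the list reads the same in both directions.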

If we ignore the center tap coefficient for a moment and assume it is also zero, this becomes an even-symmetric FIR in which each delay in the input delay line has been replaced with two delays. We can then use the implementation we have already developed for such filters. The transversal cuts needed to pipeline the post-adder chain add one delay to each delay in the left-to-right delay chain but subtract one from each delay in the right-to-left chain:

For this reason, the forward delay chain is made of 3-clock delays and the backward chain of 1-clock delays. Unfortunately, it is no longer possible to implement the forward delay chain with the AREG registers in the DSP48s, because there are only two such registers and we now need three. Both the forward and the backward delay chains therefore have to be implemented with fabric registers.


Finally, we need to address the issue of the missing center tap. Since the coefficient for that tap is always 0.5, we do not need a multiplier for it. We simply inject the suitably delayed input sample into the post-adder chain using the C input of the first DSP48. We achieve the 0.5 coefficient value by doubling all the other coefficients and then dividing the filter output by 2, which costs nothing and makes full use of the available fixed-point dynamic range. This is a scalable and very efficient FIR implementation for this particular type of filter. The number of DSP48s required is K=(N+1)/4, only about one quarter of the filter order.
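The arithmetic behind this structure (not its clock-by-clock pipelining) can be checked with a small integer model. The sketch below compares a plain direct-form FIR against a folded version that uses K multipliers with pre-adders and adds the center sample without a multiplier. The function names, the 15 fractional bits and the coefficient values are assumptions chosen for illustration only.

```python
def reference_fir(x, taps_q):
    """Direct-form FIR on integers: y[n] = sum(taps_q[i] * x[n - i])."""
    out = []
    for n in range(len(x)):
        acc = 0
        for i, h in enumerate(taps_q):
            if n - i >= 0:
                acc += h * x[n - i]
        out.append(acc)
    return out


def folded_halfband(x, doubled_q, frac_bits):
    """K multiplies per output; the center tap is injected unmultiplied.

    doubled_q[i] holds 2*h[2*i] as an integer with frac_bits fractional bits.
    The result is twice the true output; the divide by 2 is just a one-bit
    move of the binary point, so it is not modelled here.
    """
    k = len(doubled_q)
    n_taps = 4 * k - 1
    center = 2 * k - 1

    def sample(idx):
        return x[idx] if idx >= 0 else 0    # zero initial conditions

    out = []
    for n in range(len(x)):
        # Doubled center coefficient is exactly 1.0: a shift, no multiplier
        acc = sample(n - center) << frac_bits
        for i, c in enumerate(doubled_q):
            pre = sample(n - 2 * i) + sample(n - (n_taps - 1 - 2 * i))
            acc += c * pre                   # pre-adder feeding one multiplier
        out.append(acc)
    return out


FRAC_BITS = 15                               # assumed coefficient format
doubled_q = [655, -3932, 19661]              # made-up doubled coefficients

# Full 11-tap coefficient set at the same (doubled) scale, for the reference
taps_q = [0] * 11
for i, c in enumerate(doubled_q):
    taps_q[2 * i] = c
    taps_q[10 - 2 * i] = c
taps_q[5] = 1 << FRAC_BITS

x = [1, -2, 3, 0, 5, -7, 11, 4, -9, 6, 2, -1, 8, 0, 0, 0]
print(folded_halfband(x, doubled_q, FRAC_BITS) == reference_fir(x, taps_q))
```

The two outputs match sample for sample, which is the point of the trick: with doubled coefficients the center tap becomes exactly 1.0 and needs no multiplier.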


We can reduce the fabric utilization further by using a transposed implementation with a twist. In this case the delays in the post-adder chain are replaced with two-clock delays, while the forward delay chain is made of 0-clock delays and the backward delay chain still consists of 1-clock delays. Fortunately, every DSP48 post-adder can connect to the P output of the previous one either through the dedicated P cascade or through fabric routing and the C input. The C input has an optional pipeline register called CREG, so the PREG of one DSP48 and the CREG of the next one together give us the 2-clock delays we need between post-adders:

This is an even more efficient implementation and, like all transposed FIRs, it also has low and constant latency, independent of the filter order. Like all transposed implementations, however, it suffers from a scalability problem, and speed will start to degrade as the DSP48 chain grows beyond 20. This nevertheless corresponds to a very respectable FIR order of N=79, and the partial pipelining technique illustrated earlier, applied every 20 DSP48s, can be used here too to address the problem for larger filters.
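The resource arithmetic quoted above follows directly from N=4*K-1. As a quick sanity check, here is a minimal sketch; the helper names are hypothetical, and the default limit of 20 DSP48s per chain segment is simply the figure mentioned in the text.

```python
def dsp48_count(n_taps):
    """Number of DSP48s (K) for a half-band FIR with N = 4*K - 1 taps."""
    if n_taps < 3 or (n_taps + 1) % 4 != 0:
        raise ValueError("a half-band FIR needs N = 4*K - 1 taps")
    return (n_taps + 1) // 4


def taps_for_chain(k):
    """Filter size N reachable with a chain of K DSP48s."""
    return 4 * k - 1


def chain_segments(k, max_chain=20):
    """Pipelined segments needed if a chain is split every max_chain DSP48s."""
    return -(-k // max_chain)    # ceiling division


print(dsp48_count(11))       # the K=3, N=11 example
print(taps_for_chain(20))    # largest filter before extra pipelining
print(chain_segments(45))    # a larger, hypothetical filter
```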


These limitations are actually not that important, because the half-band FIR is rarely used in this single-rate configuration, where the input and output sample rates are both equal to the system clock frequency. Virtually all use cases of half-band FIR filters are as a decimator or an interpolator by a factor of 2. For the decimator, the input sample rate is twice the system clock rate and the output sample rate is equal to the clock rate. For the interpolator, the input sample rate is equal to the clock rate and the output sample rate is double.


As we will see in the next post, these multirate configurations bring a further increase in efficiency by another factor of 2, and all the issues we have encountered because of register doubling will also go away.

