DSP Cascading - 2024.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-11-13
Version: 2024.2 English

It is also possible to generate RTL that Vivado can use to cascade DSPs through the direct cascade inputs ACIN, BCIN, and PCIN and outputs ACOUT, BCOUT, and PCOUT, provided that placement allows it.

To do this, instantiate the cascade class from the namespace that matches the target DSP: hls::dsp48e1::cascade, hls::dsp48e2::cascade, or hls::dsp58::cascade.

The class has a public method, mul_add (more methods will be added in the future), that performs the required operation on the inputs cascaded from the previous stage and returns the outputs to be cascaded into the next stage.

Return Struct

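struct R_t {
  A_t acout;  // ACOUT: A value forwarded to ACIN of the next stage
  B_t bcout;  // BCOUT: B value forwarded to BCIN of the next stage
  C_t pcout;  // PCOUT: result forwarded to PCIN of the next stage
};
R_t mul_add(A_t a, B_t b, C_t c);  // consumes cascaded inputs, returns cascaded outputs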

Because mul_add returns a struct, the designer can chain DSPs, for example by connecting the BCOUT output port of one stage to the BCIN input port of the next. The following example models a templated FIR filter with size taps that uses the systolic cascaded architecture.

#include "hls_dsp_builtins.h"
using namespace hls::dsp48e2;
...
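template <int size>
class FIR {
     private:
        const long* coeff_;  // tap coefficients, one per DSP stage
        long bias_;          // added through the c input of the first stage
        // First stage: the B input is registered once (REG_B1).
        cascade<REG_A1 | REG_B1 | REG_M | REG_P > dsp0;
        // Remaining size - 1 stages: REG_B2 adds a second B register, giving
        // the per-tap input delay that the systolic architecture relies on.
        cascade<REG_A1 | REG_B1 | REG_B2 | REG_M | REG_P > dspN[size - 1];

     public:
        FIR(const long* coeff, long bias)
            : coeff_(coeff), bias_(bias) {
            // Fully partition dspN so that each element is a separate DSP instance.
#pragma HLS ARRAY_PARTITION variable=dspN complete dim=1
            }

        long fir(B_t input){
            // Stage 0 takes the new sample on b and the bias on c.
            auto out = dsp0.mul_add(coeff_[0], input, bias_);
            for(int j=1; j < size ; j++){
#pragma HLS unroll
                // BCOUT -> BCIN and PCOUT -> PCIN of the next stage.
                out = dspN[j - 1].mul_add(coeff_[j], out.bcout, out.pcout);
            }
            return out.pcout;  // PCOUT of the last stage is the filter output
        }
};

void test_cascaded_systolic_fir(long b[N], long c, long hw[N])
{
    long a[4] = {1, 12, 3, 4}; // coefficients
#pragma HLS ARRAY_PARTITION variable=a complete dim=1
#pragma HLS ARRAY_PARTITION variable=hw complete dim=1
    FIR<4> my_fir(a, c);  // 4 taps; c is the bias

    LOOP_FIR:
    for(int i = 0 ; i< N ; i++){
        hw[i] = my_fir.fir(b[i]);  // one filter output per input sample
    }
}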

Note the following details:

  • The first stage of the cascade and the remaining stages need two different DSP register configurations (dsp0 and dspN in the example).
  • Passing out.bcout and out.pcout as the b and c inputs of each mul_add call allows Vivado to connect them to the BCIN and PCIN inputs of the next FIR stage.
  • The array of cascade objects (dspN) must be fully partitioned with the ARRAY_PARTITION pragma so that, together with dsp0, the filter uses size independent DSP instances.
  • Vitis HLS generates synthesizable RTL from the code above, which means that:
    • Vivado decides whether to cascade only at placement time, and only if the corresponding DSP units can be placed next to each other.
    • Always check the Vivado reports to confirm that cascading happened; if it did not, use Vivado to determine why (for example, insufficient neighboring DSP resources).
  • For correct cascading in Vivado, the following registers must always be used:
    • REG_A1 and/or REG_A2
    • REG_B1 and/or REG_B2
    • REG_P
  • For mul_add and mul_sub to be used:
    • REG_AD and REG_D cannot be used (this restriction may be lifted in future releases).
    • REG_C cannot be used in the cascade version (this restriction may be lifted in future releases).