It is also possible to generate RTL that Vivado can use (if the placement allows it) to cascade DSPs through the direct ACIN, BCIN, and PCIN inputs and the ACOUT, BCOUT, and PCOUT outputs.
This can be achieved by instantiating the hls::dsp(48e1|48e2|58)::cascade class.
The class has a public method called mul_add (more methods will be added in the future) to perform the required operation with the cascaded inputs from the previous stage and return the cascaded outputs.
Return Struct
struct R_t {
    A_t acout;
    B_t bcout;
    C_t pcout;
};
R_t mul_add(A_t a, B_t b, C_t c);
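To make the chaining concrete, the following is a minimal software-only sketch of how a returned struct can feed the next stage. It is not the hls::dsp library: the `long` aliases for A_t, B_t, and C_t, the `model_mul_add` and `two_stage_chain` functions, and the assumption that the operation computes a * b + c while forwarding a and b unchanged are all illustrative, and pipeline register latency is ignored.

```cpp
// Hypothetical stand-ins for the library types. The real A_t, B_t, and C_t
// are defined by hls_dsp_builtins.h and have DSP-specific bit widths.
using A_t = long;
using B_t = long;
using C_t = long;

struct R_t {
    A_t acout;  // operand forwarded toward the next stage's ACIN
    B_t bcout;  // operand forwarded toward the next stage's BCIN
    C_t pcout;  // result forwarded toward the next stage's PCIN
};

// Assumed behavior, ignoring pipeline registers: P = A * B + C.
inline R_t model_mul_add(A_t a, B_t b, C_t c) {
    return R_t{a, b, a * b + c};
}

// Chain two stages through bcout and pcout, the way a cascade would.
inline C_t two_stage_chain(A_t a0, A_t a1, B_t b, C_t c) {
    R_t s0 = model_mul_add(a0, b, c);
    R_t s1 = model_mul_add(a1, s0.bcout, s0.pcout);
    return s1.pcout;
}
```

Under these assumptions, `two_stage_chain(a0, a1, b, c)` evaluates to a0 * b + a1 * b + c, which is the accumulation pattern that the PCOUT to PCIN path is meant to carry.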
The result of mul_add is a struct that allows the designer to chain DSPs, for example from the BCOUT output port of one stage to the BCIN input port of the next, as shown in the following example, which models a templated FIR with size taps using the systolic cascaded architecture.
#include "hls_dsp_builtins.h"
using namespace hls::dsp48e2;
...
template <int size>
class FIR {
private:
    const long* coeff_;
    long bias_;
    cascade<REG_A1 | REG_B1 | REG_M | REG_P> dsp0;
    cascade<REG_A1 | REG_B1 | REG_B2 | REG_M | REG_P> dspN[size - 1];
public:
    FIR(const long* coeff, long bias)
        : coeff_(coeff), bias_(bias) {
#pragma HLS ARRAY_PARTITION variable=dspN complete dim=1
    }
    long fir(B_t input) {
        auto out = dsp0.mul_add(coeff_[0], input, bias_);
        for (int j = 1; j < size; j++) {
#pragma HLS unroll
            out = dspN[j - 1].mul_add(coeff_[j], out.bcout, out.pcout);
        }
        return out.pcout;
    }
};

void test_cascaded_systolic_fir(long b[N], long c, long hw[N])
{
    long a[4] = {1, 12, 3, 4}; // coefficients
#pragma HLS ARRAY_PARTITION variable=a complete dim=1
#pragma HLS ARRAY_PARTITION variable=hw complete dim=1
    FIR<4> my_fir(a, c);
LOOP_FIR:
    for (int i = 0; i < N; i++) {
        hw[i] = my_fir.fir(b[i]);
    }
}
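The FIR above gives the first stage one B register and every later stage two. The following plain C++ sketch models that register arrangement cycle by cycle to show why the extra B register lines up each delayed sample with the partial sum arriving on the P cascade. It is an illustration only, not the hls::dsp API or the Vitis HLS C simulation model: `SystolicFirModel`, `direct_fir`, `systolic_fir`, the point at which bcout is tapped, and the latency of Taps + 1 cycles are all assumptions of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Clocked plain C++ model of a systolic FIR (hypothetical names throughout).
template <std::size_t Taps>
class SystolicFirModel {
    static_assert(Taps > 0, "at least one tap is required");
public:
    SystolicFirModel(const std::array<long, Taps>& coeff, long bias)
        : coeff_(coeff), bias_(bias) {}

    // Advance one clock cycle with input sample x. Returns the value held in
    // the last stage's P register after the clock edge.
    long clock(long x) {
        std::array<Stage, Taps> next = st_;
        for (std::size_t k = 0; k < Taps; ++k) {
            // Cascade inputs are read from the previous stage's current state.
            long bcin = (k == 0) ? x : bcout(k - 1);
            long pcin = (k == 0) ? bias_ : st_[k - 1].p;
            next[k].b1 = bcin;                  // B1 register
            next[k].b2 = st_[k].b1;             // B2 register (used when k > 0)
            next[k].m = coeff_[k] * bcout(k);   // M register
            next[k].p = st_[k].m + pcin;        // P register
        }
        st_ = next;
        return st_[Taps - 1].p;
    }

private:
    struct Stage { long b1 = 0, b2 = 0, m = 0, p = 0; };

    // Assumption: stage 0 forwards B after one register, later stages after two.
    long bcout(std::size_t k) const { return k == 0 ? st_[k].b1 : st_[k].b2; }

    std::array<long, Taps> coeff_;
    long bias_;
    std::array<Stage, Taps> st_;
};

// Direct-form reference: y[n] = bias + sum_j coeff[j] * x[n - j], x[<0] = 0.
template <std::size_t Taps>
std::vector<long> direct_fir(const std::array<long, Taps>& coeff, long bias,
                             const std::vector<long>& x) {
    std::vector<long> y(x.size(), bias);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t j = 0; j < Taps && j <= n; ++j)
            y[n] += coeff[j] * x[n - j];
    return y;
}

// Run the clocked model, flush it with zeros, and drop the pipeline latency.
template <std::size_t Taps>
std::vector<long> systolic_fir(const std::array<long, Taps>& coeff, long bias,
                               const std::vector<long>& x) {
    const std::size_t latency = Taps + 1;  // holds for this model only
    SystolicFirModel<Taps> fir(coeff, bias);
    std::vector<long> y;
    for (std::size_t n = 0; n < x.size() + latency; ++n) {
        long out = fir.clock(n < x.size() ? x[n] : 0);
        if (n >= latency) y.push_back(out);
    }
    return y;
}
```

In this model each stage delays the sample by two cycles and the partial sum by one, so neighboring taps see samples exactly one cycle apart. Once the latency is removed, the output matches the direct-form reference. The latency of the real hardware depends on the registers actually selected and should be taken from the tool reports.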
Note the following details:
- Two different configurations of the cascaded DSPs are needed for the first stage of the cascade and the rest of the stages.
- Passing out.bcout and out.pcout as the b and c inputs of each mul_add call allows Vivado to use the BCIN and PCIN inputs of the next FIR stage.
- The array of dsp(48e1|48e2|58)::cascade objects must be fully partitioned to create size independent DSP instances by using the array_partition pragma.
- Vitis HLS generates synthesizable RTL from the code above, which means that:
  - Cascading is selected by Vivado only at placement time, and only if the corresponding DSP units can be placed close to each other.
  - Always check the Vivado report to see whether cascading happened, and use Vivado to determine why it did not (for example, insufficient neighboring DSP resources).
- For correct cascading in Vivado, the following registers must always be used:
  - REG_A1 and/or REG_A2
  - REG_B1 and/or REG_B2
  - REG_P
- For mul_add and mul_sub to be used:
  - REG_AD and REG_D cannot be used (this restriction may be lifted in future releases).
  - REG_C cannot be used in the cascade version (this restriction may be lifted in future releases).
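The register rules above can be captured as a compile-time check. The sketch below reads the list as a single set of constraints on a cascade configuration. The `sketch` namespace, the bit values assigned to each flag, and the `cascade_flags_ok` helper are hypothetical; the real flag definitions come from hls_dsp_builtins.h and may differ.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, not hls::dsp

// Illustrative bit assignments only; the library's actual values may differ.
enum : std::uint32_t {
    REG_A1 = 1u << 0, REG_A2 = 1u << 1,
    REG_B1 = 1u << 2, REG_B2 = 1u << 3,
    REG_C  = 1u << 4, REG_D  = 1u << 5,
    REG_AD = 1u << 6, REG_M  = 1u << 7,
    REG_P  = 1u << 8
};

// True when the flag set follows the rules listed above: at least one A
// register, at least one B register, REG_P set, and none of REG_AD, REG_D,
// or REG_C.
constexpr bool cascade_flags_ok(std::uint32_t f) {
    return (f & (REG_A1 | REG_A2)) != 0
        && (f & (REG_B1 | REG_B2)) != 0
        && (f & REG_P) != 0
        && (f & (REG_AD | REG_D | REG_C)) == 0;
}

// The two configurations used in the FIR example pass the check.
static_assert(cascade_flags_ok(REG_A1 | REG_B1 | REG_M | REG_P),
              "first-stage configuration");
static_assert(cascade_flags_ok(REG_A1 | REG_B1 | REG_B2 | REG_M | REG_P),
              "later-stage configuration");
// Leaving out REG_P violates the rules.
static_assert(!cascade_flags_ok(REG_A1 | REG_B1 | REG_M),
              "REG_P is required");

}  // namespace sketch
```

A check like this only catches configuration mistakes early. Whether cascading actually occurs is still decided by Vivado at placement time and must be confirmed in the Vivado report.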