Use the `DATAFLOW` pragma to schedule the three functions `mm2s0`, `dmaHls_rowsToCols`, and `s2mm1` concurrently:
int dma_hls(
      hls::stream<ap_axiu<128, 0, 0, 0>> &strmOut_to_rowiseFFT,
      hls::stream<ap_axiu<128, 0, 0, 0>> &strmInp_from_rowiseFFT,
      hls::stream<ap_axiu<128, 0, 0, 0>> &strmOut_to_colwiseFFT,
      hls::stream<ap_axiu<128, 0, 0, 0>> &strmInp_from_colwiseFFT,
      int matSz, int rows, int cols, int iterCnt
     )
{
   // AXI4-Stream ports to and from the row-wise and column-wise FFT.
   #pragma HLS INTERFACE axis port=strmOut_to_rowiseFFT
   #pragma HLS INTERFACE axis port=strmInp_from_rowiseFFT
   #pragma HLS INTERFACE axis port=strmOut_to_colwiseFFT
   #pragma HLS INTERFACE axis port=strmInp_from_colwiseFFT

   // Scalar arguments and the return value share the AXI4-Lite control bundle.
   #pragma HLS INTERFACE s_axilite port=matSz bundle=control
   #pragma HLS INTERFACE s_axilite port=rows bundle=control
   #pragma HLS INTERFACE s_axilite port=cols bundle=control
   #pragma HLS INTERFACE s_axilite port=iterCnt bundle=control
   #pragma HLS INTERFACE s_axilite port=return bundle=control

   // Schedule the three loops below as concurrent tasks.
   #pragma HLS DATAFLOW

   int stg0_errCnt = 0, stg1_errCnt = 0;

   // Golden reference word: each of the four 32-bit lanes holds 0x00000001.
   ap_uint<128> goldenVal;
   goldenVal.range(127, 64) = 0x0000000100000001;
   goldenVal.range( 63,  0) = 0x0000000100000001;

   LOOP_ITER_MM2S0:for(int i = 0; i < iterCnt; ++i)
   {
      #pragma HLS loop_tripcount min=1 max=8
      mm2s0(strmOut_to_rowiseFFT, matSz);
   }

   LOOP_ITER_S2MM0_TO_MM2S1:for(int i = 0; i < iterCnt; ++i)
   {
      #pragma HLS loop_tripcount min=1 max=8
      dmaHls_rowsToCols(strmInp_from_rowiseFFT, strmOut_to_colwiseFFT,
                        matSz, rows, cols, stg0_errCnt, goldenVal);
   }

   LOOP_ITER_S2MM1:for(int i = 0; i < iterCnt; ++i)
   {
      #pragma HLS loop_tripcount min=1 max=8
      s2mm1(strmInp_from_colwiseFFT, matSz, stg1_errCnt, goldenVal);
   }

   // The kernel result is the total mismatch count from both checking stages.
   return (stg0_errCnt + stg1_errCnt);
}
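
To look at the three-task structure in isolation, the following plain C++ sketch models it in software. It is not the tutorial's implementation: `Word128`, `Fifo`, `source_stage`, `check_and_forward`, `sink_stage`, and `run_chain` are hypothetical names, `std::queue` stands in for `hls::stream`, the FFT kernels that sit between the streams in the real design are omitted, and the assumption that the checking stages compare each word against the golden value is inferred only from the `errCnt` and `goldenVal` arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for one 128-bit stream word, split into two halves.
struct Word128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

inline bool same(const Word128 &a, const Word128 &b) {
    return a.hi == b.hi && a.lo == b.lo;
}

// std::queue stands in for hls::stream in this software-only model.
using Fifo = std::queue<Word128>;

// Models the source task: pushes `words` copies of `value` downstream.
void source_stage(Fifo &out, int words, const Word128 &value) {
    for (int i = 0; i < words; ++i) out.push(value);
}

// Models the middle task (assumed behavior): check each word, then forward it.
void check_and_forward(Fifo &in, Fifo &out, int words,
                       const Word128 &golden, int &errCnt) {
    for (int i = 0; i < words; ++i) {
        assert(!in.empty());
        Word128 w = in.front();
        in.pop();
        if (!same(w, golden)) ++errCnt;
        out.push(w);
    }
}

// Models the sink task (assumed behavior): check each word and discard it.
void sink_stage(Fifo &in, int words, const Word128 &golden, int &errCnt) {
    for (int i = 0; i < words; ++i) {
        assert(!in.empty());
        Word128 w = in.front();
        in.pop();
        if (!same(w, golden)) ++errCnt;
    }
}

// Runs the three loops one after another, as a sequential software run would.
int run_chain(int words, int iterCnt, const Word128 &sent, const Word128 &golden) {
    Fifo first, second;
    int stg0_errCnt = 0, stg1_errCnt = 0;
    for (int i = 0; i < iterCnt; ++i) source_stage(first, words, sent);
    for (int i = 0; i < iterCnt; ++i)
        check_and_forward(first, second, words, golden, stg0_errCnt);
    for (int i = 0; i < iterCnt; ++i)
        sink_stage(second, words, golden, stg1_errCnt);
    return stg0_errCnt + stg1_errCnt;
}
```

Calling `run_chain(4, 2, golden, golden)` with `golden` set to `{0x0000000100000001, 0x0000000100000001}` returns 0, because every word matches in both checking stages. In this model the loops simply run back to back and the queues grow as needed; in hardware, the dataflow region lets the three tasks overlap, with each one stalling only when its stream is empty or full.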
The dma_hls kernel also specifies HLS pragmas to help optimize the kernel code and adhere to interface protocols. See this page for detailed documentation of all HLS pragmas. A summary of the HLS pragmas used in this kernel is given in the following table.
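| Pragma | Description |
|---|---|
| `#pragma HLS INTERFACE` | In C/C++ code, all input and output operations are performed, in zero time, through formal function arguments. In an RTL design, these same input and output operations must be performed through a port in the design interface and typically operate using a specific input/output (I/O) protocol. For more information, see this page. |
| `#pragma HLS PIPELINE II=1` | Reduces the initiation interval (II) for a function or loop by allowing the concurrent execution of operations. The default type of pipeline is defined by the `config_compile -pipeline_style` command, but can be overridden in the `PIPELINE` pragma or directive. |
| `#pragma HLS dataflow` | The `DATAFLOW` pragma enables task-level pipelining, allowing functions and loops to overlap in their operation. This increases the concurrency of the RTL implementation and the overall throughput of the design. |
| `#pragma HLS loop_tripcount` | When manually applied to a loop, this pragma specifies the total number of iterations performed by a loop. The `LOOP_TRIPCOUNT` pragma or directive is for analysis only and does not affect the results of synthesis. |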
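
As a small illustration of how the loop-level pragmas in the table are placed, the sketch below applies `PIPELINE` and `loop_tripcount` to a variable-bound loop. The function `sum_words` and the constant `MAX_WORDS` are hypothetical and do not come from the tutorial kernel. A standard C++ compiler ignores the `#pragma HLS` lines, so the function runs as ordinary software; only an HLS tool acts on them.

```cpp
#include <cstdint>

// Hypothetical upper bound, chosen to match the max=8 trip count hint below.
constexpr int MAX_WORDS = 8;

// Sums the first n entries of data; n is clamped to the range 0..MAX_WORDS.
std::uint32_t sum_words(const std::uint32_t data[MAX_WORDS], int n) {
    if (n > MAX_WORDS) n = MAX_WORDS;
    std::uint32_t sum = 0;
    for (int i = 0; i < n; ++i) {
// The loop bound is a runtime value, so the tool cannot derive the iteration
// count itself; the hint below only feeds the latency report.
#pragma HLS loop_tripcount min=1 max=8
// Request that a new iteration start every clock cycle.
#pragma HLS PIPELINE II=1
        sum += data[i];
    }
    return sum;
}
```

For example, with `data` holding `{1, 2, 3, 4, ...}`, the call `sum_words(data, 4)` returns 10. Removing the `loop_tripcount` line would not change the generated hardware, only the latency figures the tool can report for the loop.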