The kernel definition is exactly the same as in the previous part of this tutorial. The only difference is the graph that encodes this 16-kernel, four-phase filter.
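Why does a four-phase filter need 16 kernels? The C++ sketch below checks the polyphase identity behind it on plain integers, outside the ADF framework. It illustrates the idea under stated assumptions and is not the tutorial's kernel code. The tap split h_q[k] = h[4k+q], the input phases x_c[m] = x[4m+c], and the helper names directFir and ssr4Fir are hypothetical, and the input length is assumed to be a multiple of 4. Each output phase p is the sum of four sub-filters, one per tap phase, which gives 4 x 4 = 16 sub-filters in total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR: y[n] = sum_j h[j] * x[n - j], with x[n] = 0 for n < 0.
std::vector<long long> directFir(const std::vector<int>& h, const std::vector<int>& x) {
    std::vector<long long> y(x.size(), 0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t j = 0; j < h.size() && j <= n; ++j)
            y[n] += static_cast<long long>(h[j]) * x[n - j];
    return y;
}

// Same filter computed as a 4-phase (SSR = 4) polyphase structure.
// Hypothetical decomposition: tap phase q holds h[4k + q], input phase c holds x[4m + c].
std::vector<long long> ssr4Fir(const std::vector<int>& h, const std::vector<int>& x) {
    const int P = 4;
    assert(x.size() % P == 0);  // assumption: whole number of samples per phase
    std::vector<long long> y(x.size(), 0);
    const long long M = static_cast<long long>(x.size()) / P;  // samples per phase
    for (int p = 0; p < P; ++p) {              // output phase
        for (int q = 0; q < P; ++q) {          // tap phase: one sub-filter per (p, q) pair
            int c = (p - q + P) % P;           // input phase feeding this sub-filter
            int extraDelay = (q > p) ? 1 : 0;  // wrapping to an earlier phase costs one sample
            for (long long m = 0; m < M; ++m) {
                for (long long k = 0; P * k + q < static_cast<long long>(h.size()); ++k) {
                    long long idx = m - k - extraDelay;  // index into input phase c
                    if (idx < 0) continue;               // before the start of the signal
                    y[P * m + p] += static_cast<long long>(h[P * k + q]) * x[P * idx + c];
                }
            }
        }
    }
    return y;
}
```

For any input whose length is a multiple of 4, both functions return the same samples. Conceptually, each row of the graph computes one output phase, with the four partial sums of that row accumulated along the cascade stream; the exact assignment of taps and inputs to kernels in the tutorial may follow a different index convention.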
At the graph level, all the kernels are first declared in a class:
class FIRGraph_SSR4: public adf::graph
{
private:
kernel k[4][4];
public:
input_port in[4];
output_port out[4];
The constructor performs all the remaining operations. The first one is to create the kernels. The complete 4x4 grid of kernels is defined as follows:
FIRGraph_SSR4()
{
// k[N][0] is always the first in the cascade stream
// Topology of the TopGraph
//
// 3,3 3,2 3,1 3,0 <--
// --> 2,0 2,1 2,2 2,3
// 1,3 1,2 1,1 1,0 <--
// --> 0,0 0,1 0,2 0,3
k[0][0] = kernel::create_object<SingleStream::FIR_MultiKernel_cout<NUM_SAMPLES,SHIFT>>(taps4_p0);
k[0][1] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p1);
k[0][2] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p2);
k[0][3] = kernel::create_object<SingleStream::FIR_MultiKernel_cin<NUM_SAMPLES,SHIFT>>(taps4_p3);
// ...
k[3][0] = kernel::create_object<SingleStream::FIR_MultiKernel_cout<NUM_SAMPLES,SHIFT>>(taps4_p0);
k[3][1] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p3);
k[3][2] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p2);
k[3][3] = kernel::create_object<SingleStream::FIR_MultiKernel_cin<NUM_SAMPLES,SHIFT>>(taps4_p1);
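The source and header files are then specified for each AI Engine kernel. The location of the first kernel in each row must also be constrained to ease the placer's work. Following the topology comment above, the cascade runs left to right on even rows and right to left on odd rows, so the first kernel sits in column 25 on even rows and in column 28 on odd rows: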
// Constraints: location of the first kernel in the cascade
for(int i=0;i<NPhases;i++)
{
int j = (i%2?28:25); // 25 on even rows and 28 on odd rows
location<kernel>(k[i][0]) = tile(j,i);
}
Constraining these locations also shortens placement time by a few seconds. One constraint per row is sufficient, because the cascade connection forces the other kernels of the row into the neighboring tiles. For row 0, for example, the loop generates the following constraint:
// Constraint generated for row 0: location of the first kernel in the cascade
location<kernel>(k[0][0]) = tile(25,0);
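The placement implied by these constraints can be modeled in a few lines of plain C++. This is a minimal sketch, assuming only what the graph states: k[row][0] sits in column 25 on even rows and 28 on odd rows, and the cascade direction alternates with the row parity. The names Tile, kernelTile, and checkPlacement are hypothetical and are not part of the ADF API.

```cpp
#include <cassert>

// Hypothetical model of the serpentine placement shown in the topology comment:
// the cascade runs left to right on even rows and right to left on odd rows.
struct Tile {
    int col;
    int row;
};

constexpr int kNPhases = 4;

// Column of k[row][0], as constrained by the location loop (25 on even rows, 28 on odd rows).
constexpr int firstKernelColumn(int row) { return (row % 2) ? 28 : 25; }

// Tile of k[row][i], the i-th kernel of the cascade chain in that row.
Tile kernelTile(int row, int i) {
    int step = (row % 2) ? -1 : 1;  // direction of the cascade in this row
    return Tile{firstKernelColumn(row) + step * i, row};
}

// Checks that consecutive kernels of a chain are horizontal neighbors and that
// every row occupies the same four columns, 25 to 28.
void checkPlacement() {
    for (int row = 0; row < kNPhases; ++row) {
        for (int i = 0; i < kNPhases; ++i) {
            Tile t = kernelTile(row, i);
            assert(t.row == row && t.col >= 25 && t.col <= 28);
            if (i > 0) {
                int d = t.col - kernelTile(row, i - 1).col;
                assert(d == 1 || d == -1);
            }
        }
    }
}
```

With this model, kernelTile(1, 3) lands in column 25, directly above k[0][0]: once the first kernel of a row is fixed, the positions of the other three follow from the cascade direction alone.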
Every kernel must discard a specific number of elements before processing. Because this must be done beforehand and only once, it is handled by an initialization function. A loop over the rows and columns assigns one of two initialization functions to each kernel:
SingleStream::FIRinit<0>
SingleStream::FIRinit<1>
Finally, the kernels of each row are chained together by cascade streams, the last kernel of each row drives the corresponding output port, and every kernel is connected to its input stream.
// Cascade Connections
for(int row=0;row<NPhases;row++)
{
for(int i=0;i<NPhases-1;i++) connect<cascade> (k[row][i].out[0],k[row][i+1].in[1]);
connect<stream> (k[row][3].out[0],out[row]);
}
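To make the resulting topology explicit, the sketch below enumerates the connections created by the cascade loop as plain data. The Link record and buildRowConnections are hypothetical helpers that mirror the connect<> calls; they do not call the ADF API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one connection, mirroring the connect<> calls above.
struct Link {
    std::string kind;   // "cascade" or "stream"
    int fromRow, fromCol;
    int toRow, toCol;   // toCol == -1 stands for the graph output port out[toRow]
};

std::vector<Link> buildRowConnections(int nPhases) {
    assert(nPhases >= 1);
    std::vector<Link> links;
    for (int row = 0; row < nPhases; ++row) {
        // k[row][i].out[0] -> k[row][i+1].in[1]: partial sums move along the row
        for (int i = 0; i < nPhases - 1; ++i)
            links.push_back({"cascade", row, i, row, i + 1});
        // the last kernel of the chain drives the row's output stream
        links.push_back({"stream", row, nPhases - 1, row, -1});
    }
    return links;
}
```

For the 4x4 grid this yields 12 cascade links (three per row, never crossing rows) and four output streams, one per output phase.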
// Input Streams connections and DMA FIFO constraints
for(int row = 0;row<NPhases;row++)
for(int col=0;col<NPhases;col++)
{
int col1 = (row%2?NPhases-col-1:col); // kernel col is inverted on odd rows
int fiforow = row; // Each kernel is served by its own FIFO
connect<stream> n0 (in[col],k[row][col1].in[0]);
fifo_depth(n0) = 512;
location<fifo>(n0) = dma_fifo(aie_tile, FirstCol+col, fiforow, 0x0000, 512);
}
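The column inversion on odd rows keeps each input stream in a fixed physical column. The sketch below checks this, assuming FirstCol equals 25 (the column of k[0][0]) and the serpentine placement described earlier; the helper names are hypothetical. It verifies that each kernel receives exactly one input stream and that the DMA FIFO column FirstCol+col matches the column of the kernel it feeds.

```cpp
#include <cassert>

constexpr int kN = 4;          // NPhases
constexpr int kFirstCol = 25;  // assumption: FirstCol equals the column of k[0][0]

// Kernel index in the cascade chain fed by input stream in[col] (same formula as the graph).
constexpr int cascadeIndex(int row, int col) { return (row % 2) ? kN - col - 1 : col; }

// Physical column of k[row][i], assuming the serpentine placement of the cascade chains.
constexpr int kernelColumn(int row, int i) {
    return (row % 2) ? kFirstCol + kN - 1 - i : kFirstCol + i;
}

// Every kernel gets exactly one input stream, and its DMA FIFO
// (placed at column FirstCol + col in the loop above) is in the kernel's own column.
void checkInputMapping() {
    for (int row = 0; row < kN; ++row) {
        bool fed[kN] = {};
        for (int col = 0; col < kN; ++col) {
            int i = cascadeIndex(row, col);
            assert(!fed[i]);  // no kernel receives two input streams
            fed[i] = true;
            assert(kernelColumn(row, i) == kFirstCol + col);  // FIFO column == kernel column
        }
    }
}
```

In other words, in[col] always enters the array at the same column on every row; only the kernel's position along the cascade chain is mirrored on odd rows.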