The kernel definition is exactly the same as in the previous part of this tutorial. The only difference is the graph, which now encodes a 16-kernel, four-phase filter.
At the graph level, all the kernels are first declared in a class:
class FIRGraph_SSR4: public adf::graph
{
private:
kernel k[4][4];
public:
input_port in[4];
output_port out[4];
The constructor performs the remaining operations. The first operation creates the kernels. The following code defines the complete 4x4 grid of kernels:
FIRGraph_SSR4()
{
// k[N][0] is always the first in the cascade stream
// Topology of the TopGraph
//
// 3,3 3,2 3,1 3,0 <--
// --> 2,0 2,1 2,2 2,3
// 1,3 1,2 1,1 1,0 <--
// --> 0,0 0,1 0,2 0,3
k[0][0] = kernel::create_object<SingleStream::FIR_MultiKernel_cout<NUM_SAMPLES,SHIFT>>(taps4_p0);
k[0][1] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p1);
k[0][2] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p2);
k[0][3] = kernel::create_object<SingleStream::FIR_MultiKernel_cin<NUM_SAMPLES,SHIFT>>(taps4_p3);
.
.
.
k[3][0] = kernel::create_object<SingleStream::FIR_MultiKernel_cout<NUM_SAMPLES,SHIFT>>(taps4_p0);
k[3][1] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p3);
k[3][2] = kernel::create_object<SingleStream::FIR_MultiKernel_cincout<NUM_SAMPLES,SHIFT>>(taps4_p2);
k[3][3] = kernel::create_object<SingleStream::FIR_MultiKernel_cin<NUM_SAMPLES,SHIFT>>(taps4_p1);
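The listing above picks the kernel variant from the position in the cascade only: the first kernel has a cascade output, the last has a cascade input, and the ones in between have both. A minimal plain C++ sketch of that rule follows. It is not ADF graph code, `variantForColumn` is a hypothetical helper name, and the per-row tap assignment is deliberately not modeled because the listing elides the middle rows.

```cpp
#include <string>

// Hypothetical helper (not part of the ADF API): returns the name of the
// kernel variant used at position `col` in a cascade of `chainLength` kernels.
inline std::string variantForColumn(int col, int chainLength)
{
    if (col == 0)
        return "FIR_MultiKernel_cout";     // first kernel: cascade output only
    if (col == chainLength - 1)
        return "FIR_MultiKernel_cin";      // last kernel: cascade input only
    return "FIR_MultiKernel_cincout";      // middle kernels: cascade in and out
}
```

Calling `variantForColumn(j, 4)` for `j` from 0 to 3 reproduces the variant order of each row in the listing.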
The source and header locations are then defined for the AI Engine. You must also constrain the location of the first AI Engine in each row to help the placer work:
// Constraints: location of the first kernel in the cascade
for(int i=0;i<NPhases;i++)
{
int j = (i%2?28:25); // 25 on even rows and 28 on odd rows
location<kernel>(k[i][0]) = tile(j,i);
}
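The alternating start column follows the topology comment in the constructor: the cascade runs left to right on even rows and right to left on odd rows. The sketch below is plain C++ rather than ADF code and works out where each kernel would land. It assumes that the placer puts the kernels of a row in adjacent tiles along the cascade, which the source does not state explicitly, and `tileColumn` is a hypothetical helper name.

```cpp
// Column of the first kernel of a row, as in the constraint loop:
// 25 on even rows and 28 on odd rows.
constexpr int firstTileColumn(int row)
{
    return (row % 2) ? 28 : 25;
}

// Assumed column of kernel k[row][i] when the kernels of a row occupy
// adjacent tiles: columns increase on even rows and decrease on odd rows.
constexpr int tileColumn(int row, int i)
{
    return (row % 2) ? firstTileColumn(row) - i : firstTileColumn(row) + i;
}

// Under this assumption every row spans columns 25 to 28.
static_assert(tileColumn(0, 3) == 28, "even rows end at column 28");
static_assert(tileColumn(1, 3) == 25, "odd rows end at column 25");
```

Under that assumption, all four rows occupy the same four columns, which is why the odd rows start at column 28 instead of 25.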
Constraining the core location shortens placement time by a few seconds. Within a row, a single constraint is enough because the cascade connection constrains all the other kernels of that row. For row 0, the loop above resolves to:
// Constraints: location of the first kernel in the cascade of row 0
location<kernel>(k[0][0]) = tile(25,0);
All kernels must discard a specific number of elements. Because the discard must occur before processing starts, the initialization function handles it. You can assign it in a loop over the rows and columns, using two initialization functions:
SingleStream::FIRinit<0> and SingleStream::FIRinit<1>
Finally, connect the kernels of each row with cascade streams, and connect the input streams to all of the kernels:
// Cascade Connections
for(int row=0;row<NPhases;row++)
{
for(int i=0;i<NPhases-1;i++) connect<cascade> (k[row][i].out[0],k[row][i+1].in[1]);
connect<stream> (k[row][3].out[0],out[row]);
}
// Input Streams connections and DMA FIFO constraints
for(int row = 0;row<NPhases;row++)
for(int col=0;col<NPhases;col++)
{
int col1 = (row%2?NPhases-col-1:col); // kernel col is inverted on odd rows
int fiforow = row; // Each Kernel is served by an independent FIFO
connect<stream> n0 (in[col],k[row][col1].in[0]);
fifo_depth(n0) = 512;
location<fifo>(n0) = dma_fifo(aie_tile, FirstCol+col, fiforow, 0x0000, 512);
}
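The `col1` expression in the input loop is easier to follow when the mapping is written out. The plain C++ sketch below is not ADF code. It mirrors that expression and lists, for a given row, which input port index feeds each kernel position. `kernelColumnForInput` and `inputPerKernel` are hypothetical helper names, and `NPhases` is set to 4 to match the 4x4 grid.

```cpp
#include <array>

constexpr int NPhases = 4;  // four phases, matching the 4x4 kernel grid

// Kernel column fed by in[col] on a given row; mirrors the col1 expression.
constexpr int kernelColumnForInput(int row, int col)
{
    return (row % 2) ? NPhases - col - 1 : col;
}

// For one row, returns the input port index that feeds each kernel position,
// so that feed[j] is the index of the port connected to k[row][j].in[0].
inline std::array<int, NPhases> inputPerKernel(int row)
{
    std::array<int, NPhases> feed{};
    for (int col = 0; col < NPhases; ++col)
        feed[kernelColumnForInput(row, col)] = col;
    return feed;
}
```

On even rows, `k[row][j]` reads from `in[j]`. On odd rows the order is reversed, so `k[row][0]` reads from `in[3]`. If the kernels of a row sit in adjacent tiles along the cascade (an assumption, not stated in the source), this inversion keeps `in[col]` feeding the same physical column on every row, which is consistent with the FIFO location `FirstCol+col` used in the loop.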