Kernel Location Constraints
When building large graphs with multiple subgraphs, it is sometimes useful to control the exact mapping of kernels to AI Engines, either relative to other kernels or in an absolute sense. The AI Engine compiler lets you specify location constraints for kernels. Combined with a C++ template class specification, these constraints give a robust, scalable, and predictable mapping of your graph onto the AI Engine array. They also reduce the number of placements the mapper has to try, which can considerably shorten mapping time. Consider the following graph specification:
#include <adf.h>
#include "kernels.h"
#define NUMCORES (COLS*ROWS)
using namespace adf;
template <int COLS, int ROWS, int STARTCOL, int STARTROW>
class indep_nodes_graph1 : public graph {
public:
  kernel kr[NUMCORES];
  port<input> datain[NUMCORES];
  port<output> dataout[NUMCORES];
  indep_nodes_graph1() {
    for (int i = 0; i < COLS; i++) {
      for (int j = 0; j < ROWS; j++) {
        int k = i*ROWS + j;
        kr[k] = kernel::create(mykernel);
        source(kr[k]) = "kernels/kernel.cc";
        runtime<ratio>(kr[k]) = 0.9;
        location<kernel>(kr[k]) = tile(STARTCOL+i, STARTROW+j);
      }
    }
    for (int i = 0; i < NUMCORES; i++) {
      connect<stream, window<64> >(datain[i], kr[i].in[0]);
      connect<window<64>, stream >(kr[i].out[0], dataout[i]);
    }
  };
};
The template parameters identify a COLS x ROWS logical array of kernels (COLS x ROWS = NUMCORES) that are placed within a larger logical device of some dimensionality starting at (STARTCOL, STARTROW) as the origin. Each kernel in that graph is constrained to be placed on a specific AI Engine. This is accomplished using an absolute location constraint for each kernel placing it on a specific processor tile. For example, the following declaration would create a 1 x 2 kernel array starting at offset (3,2). When embedded within a 4 x 4 logical device topology, the kernel array is constrained to the top right corner.
indep_nodes_graph1<1,2,3,2> mygraph;
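To see which tiles a given instantiation pins its kernels to, the placement arithmetic from the constructor can be reproduced in plain C++ without the ADF headers. The following is a minimal sketch only; TileCoord and kernel_tiles are hypothetical helper names used for illustration and are not part of the ADF API.

```cpp
#include <vector>

// Hypothetical stand-in for a tile coordinate; not an ADF type.
struct TileCoord {
    int col;
    int row;
};

// Mirrors the loops in the graph constructor: kernel k = i*ROWS + j
// is placed on tile (STARTCOL + i, STARTROW + j).
template <int COLS, int ROWS, int STARTCOL, int STARTROW>
std::vector<TileCoord> kernel_tiles() {
    std::vector<TileCoord> tiles(COLS * ROWS);
    for (int i = 0; i < COLS; i++) {
        for (int j = 0; j < ROWS; j++) {
            tiles[i * ROWS + j] = TileCoord{STARTCOL + i, STARTROW + j};
        }
    }
    return tiles;
}
```

Calling kernel_tiles<1,2,3,2>() yields two entries: kr[0] on tile (3,2) and kr[1] on tile (3,3), the two tiles that the text above describes as the top right corner of a 4 x 4 array.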
The location<absolute>(k) function for specifying kernel constraints and the proc(x,y) function for specifying a processor tile location are now deprecated. Instead, use location<kernel>(k) to specify the kernel constraints and tile(x,y) to identify a specific tile location. See Adaptive Data Flow Graph Specification Reference for more information.
Buffer Location Constraints
The AI Engine compiler tries to automatically allocate buffers for windows, lookup tables, and run-time parameters in the most efficient manner possible. However, you might want to explicitly control their placement in memory. Similar to the kernels shown previously in this section, buffers inferred on a kernel port can also be constrained to be mapped to specific tiles, banks, or even address offsets using location constraints, as shown in the following example.
#include <adf.h>
#include "kernels.h"
#define NUMCORES (COLS*ROWS)
using namespace adf;
template <int COLS, int ROWS, int STARTCOL, int STARTROW>
class indep_nodes_graph2 : public graph {
public:
  kernel kr[NUMCORES];
  port<input> datain[NUMCORES];
  port<output> dataout[NUMCORES];
  indep_nodes_graph2() {
    for (int i = 0; i < COLS; i++) {
      for (int j = 0; j < ROWS; j++) {
        int k = i*ROWS + j;
        kr[k] = kernel::create(mykernel);
        source(kr[k]) = "kernels/kernel.cc";
        runtime<ratio>(kr[k]) = 0.9;
        location<kernel>(kr[k]) = tile(STARTCOL+i, STARTROW+j); // kernel location
        location<buffer>(kr[k].in[0]) =
          { address(STARTCOL+i, STARTROW+j, 0x0),
            address(STARTCOL+i, STARTROW+j, 0x2000) }; // double buffer location
        location<stack>(kr[k]) = bank(STARTCOL+i, STARTROW+j, 2); // stack location
        location<buffer>(kr[k].out[0]) = location<kernel>(kr[k]); // relative buffer location
      }
    }
    for (int i = 0; i < NUMCORES; i++) {
      connect< stream, window<64> >(datain[i], kr[i].in[0]);
      connect< window<64>, stream >(kr[i].out[0], dataout[i]);
    }
  };
};
In the previous code, the location of the double buffers at port kr[k].in[0] is constrained to the specific memory tile address offsets that are created using the address(col,row,offset) constructor. Furthermore, the location of the system memory (including the stack and static heap) for the processor that executes kernel instance kr[k] is constrained to a particular bank using the bank(col,row,bankid) constructor. Finally, the tile location of the buffers connected to the port kr[k].out[0] is constrained to be the same tile as that of the kernel instance kr[k]. Buffer location constraints are only allowed on window kernel ports.
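When choosing explicit offsets for a double buffer, it can help to confirm that the two buffers do not overlap and to see which bank each offset falls in. The sketch below is plain C++ rather than ADF code. The helper names ranges_overlap and bank_index are hypothetical, window<64> is taken to mean a 64-byte window, and the bank size is left as a parameter because it is not stated in this section.

```cpp
#include <cstdint>

// Returns true if byte ranges [a, a+size) and [b, b+size) share any byte.
constexpr bool ranges_overlap(std::uint32_t a, std::uint32_t b, std::uint32_t size) {
    return a < b + size && b < a + size;
}

// Index of the bank containing `offset`, for banks of `bank_bytes` bytes each.
// The bank size is an assumption supplied by the caller.
constexpr std::uint32_t bank_index(std::uint32_t offset, std::uint32_t bank_bytes) {
    return offset / bank_bytes;
}

// Offsets taken from the double buffer constraint in the example above.
constexpr std::uint32_t kPingOffset = 0x0;
constexpr std::uint32_t kPongOffset = 0x2000;
constexpr std::uint32_t kWindowBytes = 64;  // assumed size of window<64> in bytes

static_assert(!ranges_overlap(kPingOffset, kPongOffset, kWindowBytes),
              "ping and pong buffers must not overlap");
```

If the banks were 0x2000 bytes each (an assumption to verify against the device documentation), bank_index(0x0, 0x2000) would be 0 and bank_index(0x2000, 0x2000) would be 1, so the two buffers would sit in different banks, and neither would share bank 2 with the stack.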
Hierarchical Constraints
When creating complex graphs with multiple subgraph classes, or multiple instances of the same subgraph class, the location constraints described above can also be applied to each kernel instance or kernel port instance individually at the point of subgraph instantiation rather than in the subgraph definition. In this case, specify the graph-qualified name of that kernel instance or kernel port instance in the constraint, as shown below. Also make sure that the constrained kernels and ports are public members of the subgraph.
class ToplevelGraph : public graph {
public:
  indep_nodes_graph1<1,2,3,2> mygraph;
  port<input> datain[2];
  port<output> dataout[2];
  ToplevelGraph() {
    for (int i = 0; i < 2; i++) {
      connect<stream, window<64> >(datain[i], mygraph.datain[i]);
      connect<window<64>, stream >(mygraph.dataout[i], dataout[i]);
      // hierarchical constraints
      location<stack>(mygraph.kr[i]) = bank(3, 2+i, 2);
      location<buffer>(mygraph.kr[i].out[0]) = location<kernel>(mygraph.kr[i]);
    }
  };
};
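In this example, the coordinates in bank(3, 2+i, 2) coincide with the tile on which indep_nodes_graph1<1,2,3,2> places kr[i], so each stack lands on its own kernel's tile. A small plain C++ check of that correspondence might look like the following. Coord, same_tile, kernel_tile, and stack_bank_tile are hypothetical names for illustration, not ADF functions.

```cpp
// Hypothetical coordinate pair; not an ADF type.
struct Coord {
    int col;
    int row;
};

constexpr bool same_tile(Coord a, Coord b) {
    return a.col == b.col && a.row == b.row;
}

// Tile assigned to kernel index k by the subgraph template:
// k = i*ROWS + j, so i = k / ROWS and j = k % ROWS.
constexpr Coord kernel_tile(int k, int rows, int startcol, int startrow) {
    return Coord{startcol + k / rows, startrow + k % rows};
}

// Tile named by the top-level constraint bank(3, 2+i, 2) for kernel i.
constexpr Coord stack_bank_tile(int i) {
    return Coord{3, 2 + i};
}
```

A check such as same_tile(kernel_tile(i, 2, 3, 2), stack_bank_tile(i)) for i = 0 and i = 1 confirms that the hard-coded top-level coordinates still match the subgraph's template arguments. If the template arguments change, the hierarchical constraints have to be updated to match.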