Packet Switching Graph Constructs - 2024.1 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-06-06
Version: 2024.1 English

Packet-switched streams are multiplexed data streams that carry different types of data at different times. Because they can contend for resources with other packet-switched streams, they do not provide deterministic latency. The multiplexed data flows in packets, each consisting of a 32-bit packet header followed by a variable number of payload words. The header word must be sent before the payload data, and the TLAST signal must be asserted on the last word of the packet. Two data types, input_pktstream and output_pktstream, represent the multiplexed data streams as input to or output from a kernel, respectively. For more details on the packet headers and data types, see Packet Stream Operations.

Note: By convention, packets originating in the programmable logic carry row and column values of -1, -1.
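
The header word and the -1, -1 convention can be made concrete with a small host-side sketch. The bit layout used here is an assumption that this section does not state (packet ID in bits 4:0, packet type in bits 14:12, row in bits 20:16, column in bits 27:21, odd parity in bit 31), so check it against Packet Stream Operations before relying on it. makePacketHeader is a hypothetical helper, not part of the ADF API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper (not an ADF API) that builds a 32-bit packet header.
// ASSUMED layout, to be verified against Packet Stream Operations:
//   bits  4:0   packet ID
//   bits 14:12  packet type
//   bits 20:16  row    (-1 by convention for packets from the PL)
//   bits 27:21  column (-1 by convention for packets from the PL)
//   bit  31     odd parity over the other bits
inline std::uint32_t makePacketHeader(unsigned id, unsigned type,
                                      int row = -1, int col = -1) {
    assert(id < 32 && type < 8);  // must fit in the 5-bit and 3-bit fields
    std::uint32_t h = (id & 0x1Fu)
                    | ((type & 0x7u) << 12)
                    | ((static_cast<std::uint32_t>(row) & 0x1Fu) << 16)
                    | ((static_cast<std::uint32_t>(col) & 0x7Fu) << 21);
    // Count the ones so far, then set bit 31 if needed to make the total odd.
    unsigned ones = 0;
    for (std::uint32_t v = h; v != 0; v >>= 1) {
        ones += v & 1u;
    }
    if (ones % 2 == 0) {
        h |= 0x80000000u;
    }
    return h;
}
```

Under these assumptions, makePacketHeader(0, 0) evaluates to 0x8FFF0000 and makePacketHeader(1, 0) to 0x0FFF0001. A 5-bit ID field would also be consistent with a limit of 32 packet streams on one physical channel.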

To explicitly control the multiplexing and de-multiplexing of packets, two templated node classes, pktsplit<n> and pktmerge<n>, are added to the ADF graph library. A node instance of class pktmerge<n> is an n:1 multiplexer of n packet streams producing a single packet stream. A node instance of class pktsplit<n> is a 1:n de-multiplexer of a packet stream producing n different packet streams. A single physical channel supports at most 32 packet streams (n ≤ 32).
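
To illustrate what a 1:n de-multiplexer does with a packet stream, the following is a toy software model, not the ADF pktsplit class. It assumes that the low 5 bits of each header word hold the packet ID and that packet ID k is routed to output k; in a real design the IDs are assigned by the tools. Word and splitPackets are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 32-bit stream word together with its TLAST flag.
struct Word {
    std::uint32_t data;
    bool tlast;
};

// Toy 1:N packet de-multiplexer (illustrative model only).
// ASSUMPTION: header bits 4:0 carry the packet ID, and ID k goes to output k.
template <std::size_t N>
std::array<std::vector<Word>, N> splitPackets(const std::vector<Word>& in) {
    static_assert(N >= 1 && N <= 32, "at most 32 packet streams per channel");
    std::array<std::vector<Word>, N> out;
    bool expectHeader = true;  // the first word of every packet is its header
    std::size_t dest = 0;
    for (const Word& w : in) {
        if (expectHeader) {
            dest = w.data & 0x1Fu;  // latch the destination for this packet
        }
        if (dest < N) {
            out[dest].push_back(w);  // header and payload are both forwarded
        }
        expectHeader = w.tlast;      // TLAST ends the packet; a header follows
    }
    return out;
}
```

For example, feeding a packet with ID 1 followed by a packet with ID 0 into splitPackets<4> places all words of the first packet in output 1 and all words of the second in output 0. A merge node performs the reverse operation, interleaving whole packets from n inputs onto one stream.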

A kernel can receive packets of data either as buffers of data or as an input_pktstream. Likewise, a kernel can send packets of data either as buffers of data or as an output_pktstream.

To connect ports, local buffers, or packet streams to other ports, local buffers, or packet streams, use a connect construct such as:
connect (<SOURCE>.out[0], <DEST>.in[0]);

When a kernel receives packets of data as a buffer of data, the header and TLAST are dropped before the kernel receives the buffer. If the kernel writes an output buffer of data, the packet header and TLAST are inserted automatically when the DMA transfers the buffer to a packet stream.

However, if the kernel receives data as an input_pktstream, the kernel must process the packet header and TLAST in addition to the packet data. Similarly, if the kernel sends data as an output_pktstream, the kernel must insert the packet header and TLAST into the output stream in addition to the packet data.

These concepts are illustrated in the following example.

class ExplicitPacketSwitching: public adf::graph {
 private:
    adf::kernel core[4];
    adf::pktsplit<4> sp;
    adf::pktmerge<4> mg;
 public:
    adf::input_plio in;
    adf::output_plio out;
    ExplicitPacketSwitching() {
      core[0] = adf::kernel::create(aie_core1);
      core[1] = adf::kernel::create(aie_core2);
      core[2] = adf::kernel::create(aie_core3);
      core[3] = adf::kernel::create(aie_core4);
      adf::source(core[0]) = "aie_core1.cpp";
      adf::source(core[1]) = "aie_core2.cpp";
      adf::source(core[2]) = "aie_core3.cpp";
      adf::source(core[3]) = "aie_core4.cpp";

      in=adf::input_plio::create("Datain0", adf::plio_32_bits, "data/input.txt");
      out=adf::output_plio::create("Dataout0", adf::plio_32_bits, "data/output.txt");

      sp = adf::pktsplit<4>::create();
      mg = adf::pktmerge<4>::create();
      for(int i=0;i<4;i++){
        adf::runtime<adf::ratio>(core[i]) = 0.9;
        adf::connect(sp.out[i], core[i].in[0]);
        adf::connect(core[i].out[0], mg.in[i]);
      }
      adf::connect(in.out[0], sp.in[0]);
      adf::connect(mg.out[0], out.in[0]);
    }
};

The graph has one input PLIO port and one output PLIO port. The input packet stream from the PL is split four ways and input to four different AI Engine kernels. The output streams from the four AI Engine kernels are merged into one packet stream which is output to the PL. The Vitis IDE Graph view of the code is shown as follows.

Figure 1. Graph View

An example of the kernel code is as follows.

const uint32 pktType=0;
void aie_core1(input_pktstream *in,output_pktstream *out){
  readincr(in);//read header and discard
  uint32 ID=getPacketid(out,0);//for output pktstream
  writeHeader(out,pktType,ID); //Generate header for output

  bool tlast;
  for(int i=0;i<8;i++){
    int32 tmp=readincr(in,tlast);
    tmp+=1;
    writeincr(out,tmp,i==7);//TLAST=1 for last word
  }
}
Warning: input_pktstream is read as integer input.
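
When checking simulation output, it can help to have a host-side reference for what aie_core1 does to one packet. The following is a minimal sketch in plain C++, not kernel code: StreamWord and referenceCore1 are hypothetical names, and outHeader stands in for whatever header writeHeader generates in the real design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One 32-bit stream word together with its TLAST flag.
struct StreamWord {
    std::uint32_t data;
    bool tlast;
};

// Hypothetical host-side reference model of aie_core1 for a single packet.
// `in` holds the input header followed by the payload words.
std::vector<StreamWord> referenceCore1(const std::vector<StreamWord>& in,
                                       std::uint32_t outHeader) {
    constexpr std::size_t kPayloadWords = 8;  // the kernel loop runs 8 times
    std::vector<StreamWord> out;
    if (in.size() < 1 + kPayloadWords) {
        return out;  // incomplete packet: emit nothing in this model
    }
    out.push_back({outHeader, false});  // new header first, TLAST low
    for (std::size_t i = 0; i < kPayloadWords; ++i) {
        // in[0] is the input header, which the kernel reads and discards.
        // Unsigned addition wraps, giving the same bits as the kernel's int32 add
        // for all values that do not overflow.
        std::uint32_t v = in[1 + i].data + 1u;
        out.push_back({v, i == kPayloadWords - 1});  // TLAST on the last word only
    }
    return out;
}
```

Given a header followed by the payload 0 through 7, this model returns nine words: the new header and then 1 through 8, with TLAST set only on the final word.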

The following example kernel receives and sends floating-point data.

const uint32 pktType=0;
void aie_core1_float(input_pktstream *in,output_pktstream *out){
  readincr(in);//read header and discard
  uint32 ID=getPacketid(out,0);//for output pktstream
  writeHeader(out,pktType,ID); //Generate header for output

  bool tlast;
  for(int i=0;i<4;i++){
    int32 tmp=readincr(in,tlast);//read data as integer type
    float tmp_f=reinterpret_cast<float&>(tmp);//Reinterpret memory as float
    tmp_f+=1.0f;
    writeincr(out,tmp_f,i==3);//TLAST=1 for last word
  }
}

The input data file must also be in integer format. For example, to send the float values 0.0f, 1.0f, 2.0f, and 3.0f to the kernel, write each value in the input data file as the integer that has the same 32-bit pattern:

0
1065353216
1073741824
1077936128

After the kernel execution, the float output values are 1.0f, 2.0f, 3.0f and 4.0f. In a simulation output data file, the output values are in integer format, as follows:

1065353216
1073741824
1077936128
1082130432
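
The integers in these data files are the bit patterns of the float values, not rounded numeric values. A minimal sketch of the conversion in both directions is shown below. floatToWord and wordToFloat are hypothetical helper names for host-side scripts or testbenches, and the code assumes that float is a 32-bit IEEE 754 single-precision type.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE 754 single-precision value on the host.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Returns the integer that has the same bit pattern as f (for input data files).
inline std::uint32_t floatToWord(float f) {
    std::uint32_t w;
    std::memcpy(&w, &f, sizeof w);  // copy the bits, no numeric conversion
    return w;
}

// Recovers the float value from an integer read from an output data file.
inline float wordToFloat(std::uint32_t w) {
    float f;
    std::memcpy(&f, &w, sizeof f);
    return f;
}
```

With these helpers, floatToWord(1.0f) yields 1065353216 and wordToFloat(1082130432) yields 4.0f, matching the file contents listed above. Using memcpy keeps the host code free of the aliasing concerns that a pointer or reference cast can raise in standard C++.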