Buffer Port-Based Access - 2024.1 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-06-06
Version: 2024.1 English

Buffer ports let a kernel operate on a block of data. Each buffer port operates in a single direction, either input or output. A kernel's view of an incoming block of data is called an input buffer. An input buffer is defined by the type of data it contains, and that type must be declared before the kernel can operate on the buffer.

A kernel's view of an outgoing block of data is called an output buffer, and it is likewise defined by a type. The example below declares a kernel named simple. It has an input buffer named in, which contains complex integers whose real and imaginary parts are each 16-bit signed integers, and an output buffer named out, which contains 32-bit signed integers.

void simple(input_buffer<cint16> &in,
            output_buffer<int32> &out);

The example below shows how to declare the input and output buffer port sizes using the adf::extents template parameter.

void simple(input_buffer<cint16, adf::extents<INPUT_SAMPLE_SIZE>> &in,
            output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> &out);

These buffer data structures are inferred by the AI Engine compiler from the data flow graph connections and are declared in the wrapper code implementing the graph control. The kernel functions merely operate on pointers to the buffer data structures that are passed to them as arguments. There is no need to declare these buffer data structures in the data flow graph or kernel program.
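As a rough illustration of a kernel that only works through the buffer views it is handed, the sketch below mirrors the shape of the simple kernel in plain C++. Everything in the mock namespace is a hypothetical host-side stand-in written so the example compiles with a standard compiler. A real kernel would take input_buffer, output_buffer, cint16, and adf::extents from the AI Engine headers instead. The block size SAMPLE_SIZE and the kernel body (adding the real and imaginary parts) are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the buffer port types (NOT the real ADF headers).
namespace mock {

struct cint16 {
    std::int16_t real;
    std::int16_t imag;
};

// Compile-time block size, playing the role of adf::extents<N>.
template <std::size_t N>
struct extents {
    static constexpr std::size_t size = N;
};

// Read-only view of a block of E::size elements of type T.
template <typename T, typename E>
class input_buffer {
public:
    explicit input_buffer(const T* p) : ptr_(p) {}
    const T* data() const { return ptr_; }
    constexpr std::size_t size() const { return E::size; }
private:
    const T* ptr_;
};

// Writable view of a block of E::size elements of type T.
template <typename T, typename E>
class output_buffer {
public:
    explicit output_buffer(T* p) : ptr_(p) {}
    T* data() { return ptr_; }
    constexpr std::size_t size() const { return E::size; }
private:
    T* ptr_;
};

}  // namespace mock

// Assumed block size for this sketch; the source leaves the sizes symbolic.
constexpr std::size_t SAMPLE_SIZE = 4;

// The kernel never allocates the buffers. It only reads and writes through
// the views passed in as arguments.
void simple(mock::input_buffer<mock::cint16, mock::extents<SAMPLE_SIZE>>& in,
            mock::output_buffer<std::int32_t, mock::extents<SAMPLE_SIZE>>& out) {
    const mock::cint16* src = in.data();
    std::int32_t* dst = out.data();
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Widen to 32 bits before adding so the sum cannot overflow.
        dst[i] = static_cast<std::int32_t>(src[i].real) +
                 static_cast<std::int32_t>(src[i].imag);
    }
}
```

In this model the caller owns the storage and wraps it in the two views before invoking simple, which loosely corresponds to the role the generated wrapper code plays in the description above.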

When two kernels (k1, k2) communicate through buffers (the output buffer of k1 is connected to the input buffer of k2), the compiler attempts to place them in tiles that share at least one AI Engine-ML memory module.

  • If the two kernels are located on the same tile, the compiler uses a single memory area to communicate because they are not executed simultaneously (see k1 and k2 in tile (8,0) and the single shared memory block in (7,0) in the following figure). Because the execution of multiple kernels within an AI Engine-ML is sequential, access conflicts are avoided when using the same memory area.
  • If the two kernels are placed in different tiles sharing an AI Engine-ML memory module, the compiler will infer a ping-pong buffer, allowing the two kernels to write and read at the same time but not to the same memory area (see k1 in tile (10,0), k2 in tile (11,0) and the shared buffer implemented as a ping-pong buffer in (10,0) in the following figure).
    Figure 1. Same Tile and Memory Sharing Placement Example

  • If your system performance can handle it, you can replace this ping-pong buffer with a single buffer by applying the single_buffer(<port>) constraint to the kernel ports.
  • If the two kernels are placed in distant tiles, the compiler automatically infers a ping-pong buffer at the output of k1 and another one at the input of k2. The two ping-pong buffers are connected by a DMA, which automatically copies the content of the output buffer of k1 to the input buffer of k2 using a data stream.
    Figure 2. Distant Tiles Placement Example

  • When multiple buffers or streams converge on a kernel, the various paths can have very different latencies, which can lead to a deadlock. To avoid this kind of problem, you can insert a FIFO between the two kernels. The compiler generates the same type of architecture as in the distant-tile case, except that a FIFO is inserted in the middle of the stream connection.
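
The ping-pong behavior described in the placement cases above can be sketched with a small host-side model. This is an illustrative simulation only, not the code the AI Engine compiler generates: the PingPong class, the run_pipeline function, and the block size of 4 are all hypothetical names and values. The model runs sequentially, so it shows only which bank each kernel touches in a given iteration, not the hardware synchronization between them.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model of a ping-pong buffer: two banks of BLOCK samples each.
template <typename T, std::size_t BLOCK>
class PingPong {
public:
    // Bank the producer (k1) fills during iteration `iter`.
    std::array<T, BLOCK>& write_bank(std::size_t iter) {
        return banks_[iter % 2];
    }
    // Bank the consumer (k2) drains during iteration `iter`:
    // the one the producer filled during iteration `iter - 1`.
    const std::array<T, BLOCK>& read_bank(std::size_t iter) const {
        return banks_[(iter + 1) % 2];
    }
private:
    std::array<std::array<T, BLOCK>, 2> banks_{};
};

// Pushes `blocks` blocks through the model and returns every sample
// the consumer read, in the order it read them.
std::vector<int> run_pipeline(std::size_t blocks) {
    constexpr std::size_t BLOCK = 4;  // assumed block size for this sketch
    PingPong<int, BLOCK> buf;
    std::vector<int> consumed;
    int next = 0;
    // One extra iteration lets the consumer drain the final block.
    for (std::size_t iter = 0; iter <= blocks; ++iter) {
        // On hardware the two steps below would overlap in time.
        // They never touch the same bank within one iteration.
        if (iter < blocks) {
            for (int& sample : buf.write_bank(iter)) {
                sample = next++;  // k1 writes a new block
            }
        }
        if (iter > 0) {
            for (int sample : buf.read_bank(iter)) {
                consumed.push_back(sample);  // k2 reads the previous block
            }
        }
    }
    return consumed;
}
```

In this model, a single-buffer variant would make write_bank and read_bank return the same bank, so the producer and consumer could no longer overlap and would have to take turns.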