An input_plio/output_plio object can be
configured to make external stream connections that cross the AI Engine to programmable logic (PL) boundary. This
situation arises when a hardware platform is designed separately and the PL blocks
are already instantiated inside the platform. This hardware design is exported from
the Vivado tools as a package XSA and it should
be specified when creating a new project in the AMD Vitis™
tools using that platform. The XSA contains a logical
architecture interface specification that identifies which AI Engine I/O ports can be supported by the platform. The following
is an example interface specification containing stream ports (looking from the
AI Engine perspective).
AI Engine Port | Annotation | Type | Direction | Data Width | Clock Frequency (MHz) |
---|---|---|---|---|---|
S00_AXIS | Weight0 | stream | slave | 32 | 300 |
S01_AXIS | Datain0 | stream | slave | 32 | 300 |
M00_AXIS | Dataout0 | stream | master | 32 | 300 |
This interface specification describes how the
platform exports two stream input ports (slave ports on the AI Engine array interface) and one stream output port (a master port
on the AI Engine array interface). An input_plio/output_plio attribute specification is used to represent these interface ports and connect
them to their respective destination or source kernel ports in the data
flow graph.
The following example shows how input_plio/output_plio objects for the ports listed in the previous table can be declared in a
program to read input data from a file or write output data to a file. The width and
frequency of each input_plio/output_plio port are also provided as arguments to the create() call.
input_plio wts = input_plio::create("Weight0", adf::plio_32_bits, "inputwts.txt", 300);
input_plio din = input_plio::create("Datain0", adf::plio_32_bits, "din.txt", 300);
output_plio out = output_plio::create("Dataout0", adf::plio_32_bits, "dout.txt", 300);
When simulated, the input weights and data are read from the two supplied files and the output data is produced in the designated output file in a streaming manner.
When a hardware platform is exported, all the AI Engine to PL stream connections are already routed to specific physical channels from the PL side.
Wide Stream Data Path PLIO
Typically, the AI Engine array
runs at a higher clock frequency than the internal programmable logic. The AI Engine compiler accepts the --pl-freq option
to identify the frequency at which
the PL blocks are expected to run. To balance the throughput between the AI Engine and the internal programmable logic, the
PL blocks can be designed for a wider stream data path (64-bit or 128-bit),
which is then automatically serialized into a 32-bit stream on the AI Engine stream network at the AI Engine to PL interface crossing.
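The throughput balance described above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a simplified model in which bandwidth is just width times clock frequency; the helper name min_plio_width_bits and the frequencies used with it are hypothetical and are not part of the AI Engine tools or of the compiler's own selection logic.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: smallest PLIO width (in bits) whose PL-side bandwidth
// matches or exceeds one 32-bit AI Engine stream.
// Assumed model: bandwidth = width_in_bits * clock_in_MHz.
// Returns 0 if even a 128-bit PL interface cannot keep up.
inline int min_plio_width_bits(std::uint32_t aie_mhz, std::uint32_t pl_mhz) {
    assert(aie_mhz > 0 && pl_mhz > 0);
    const std::uint64_t aie_bw = 32ull * aie_mhz;  // Mbit/s on the 32-bit stream
    for (int width : {32, 64, 128}) {
        if (static_cast<std::uint64_t>(width) * pl_mhz >= aie_bw) {
            return width;
        }
    }
    return 0;
}
```

Under this model, an AI Engine stream clocked at an illustrative 1000 MHz is matched by a 128-bit PL interface at 250 MHz or a 64-bit PL interface at 500 MHz.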
The following example shows how wide stream input_plio/output_plio attributes
can be used in a program to read input data from a file or write output data to a
file.
output_plio pl_out = output_plio::create("TestLogicalNameOut", plio_128_bits, "data/output.txt");
input_plio pl_in = input_plio::create("TestLogicalNameIn", plio_128_bits, "data/input.txt");
...
connect(pl_in.out[0], kernel_first.in[0]);
connect(kernel_last.out[0], pl_out.in[0]);
In the previous example, two 128-bit PLIO attributes are declared: one
for input and one for output. The input_plio and output_plio are then hooked up to the graph in
the usual way. Data files specified in the input_plio/output_plio attributes
are then automatically opened for reading the input or writing the output,
respectively.
When simulating input_plio/output_plio with data files, the data should be
organized to accommodate both the width of the PL block and the data type of
the connecting port on the AI Engine block.
For example, a data file representing a 32-bit PL interface to an AI Engine kernel expecting int16
should be organized as two columns per row,
where each column represents a 16-bit value. As another example, a data file
representing a 64-bit PL interface to an AI Engine kernel expecting cint16
should be organized as four columns per row, where each column represents a 16-bit
real or imaginary value. The same 64-bit PL interface feeding an AI Engine kernel with an int32
port would need to organize the data as two columns per row of
32-bit real values. The following examples show the format of the input file for the
two 64-bit scenarios.
Input file for a 64-bit PL interface feeding an AI Engine kernel expecting cint16:
0 0 0 0
1 1 1 1
2 2 2 2
Input file for a 64-bit PL interface feeding an AI Engine kernel expecting int32:
0 0
1 1
2 2
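The row layout rule (columns per row = PLIO width divided by the width of one scalar value) can be sketched as a small standalone formatter. This is an illustrative sketch only: format_plio_rows is a hypothetical helper, not part of the ADF API, and it assumes space-separated decimal values with one PLIO-width row per line, as in the sample files above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lay out scalar values as text rows for a PLIO data file.
// plio_bits: PLIO width (32, 64, or 128).
// elem_bits: width of one column (16 for int16 and for each real or imaginary
//            half of a cint16, 32 for int32).
inline std::string format_plio_rows(const std::vector<std::int64_t>& values,
                                    int plio_bits, int elem_bits) {
    assert(elem_bits > 0 && plio_bits % elem_bits == 0);
    const std::size_t cols = static_cast<std::size_t>(plio_bits / elem_bits);
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << values[i];
        // Break the line after a full row, or after the final value.
        const bool end_of_row = (i + 1) % cols == 0 || i + 1 == values.size();
        out << (end_of_row ? '\n' : ' ');
    }
    return out.str();
}
```

Calling format_plio_rows with a 64-bit width and 16-bit elements yields four columns per row (the cint16 case), while 32-bit elements yield two columns per row (the int32 case).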
With these wide PLIO attribute specifications, the AI Engine compiler automatically generates the AI Engine array interface configuration to convert 64-bit or 128-bit data into a sequence of 32-bit words. The AXI4-Stream protocol followed by all PL IP blocks ensures that partial data can also be sent on a wider data path, with the appropriate strobe signals describing which words are valid.
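As a conceptual illustration of this width conversion, the sketch below models one wide PL-side beat as up to four 32-bit words plus a per-word valid mask, and emits only the valid words as a 32-bit sequence. It is a software model under stated assumptions, not the hardware behavior: the WideBeat type and to_32bit_stream function are hypothetical, the least-significant-word-first ordering is assumed, and the per-word mask is a simplification of the byte-level AXI4-Stream strobe signals.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one beat on a 64-bit or 128-bit PL stream.
struct WideBeat {
    std::uint32_t words[4];  // words[0] is ASSUMED to be the least significant
    int width_bits;          // 64 or 128
    unsigned valid_mask;     // bit i set => words[i] carries valid data
};

// Convert one wide beat into the sequence of valid 32-bit words,
// in assumed least-significant-word-first order.
inline std::vector<std::uint32_t> to_32bit_stream(const WideBeat& beat) {
    assert(beat.width_bits == 64 || beat.width_bits == 128);
    const int n = beat.width_bits / 32;
    std::vector<std::uint32_t> out;
    for (int i = 0; i < n; ++i) {
        if (beat.valid_mask & (1u << i)) {
            out.push_back(beat.words[i]);
        }
    }
    return out;
}
```

In this model, a fully valid 128-bit beat produces four 32-bit words, while a beat whose mask marks only the lower two words as valid produces two.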