The AI Engine runs at 1 GHz (or more, depending on the device) and can write at most two streams with a 32-bit data width per cycle. In contrast, a PL kernel can run at 500 MHz (half the frequency of the AI Engine) while consuming a larger bit width. Rate matching balances the throughput from the producer to the consumer so that neither process becomes a bottleneck for overall performance. The following equation shows the rate matching for each channel:

$$
f_{\text{AIE}} \times W_{\text{AIE}} = f_{\text{PL}} \times W_{\text{PL}}
$$

where $f$ is the clock frequency and $W$ is the data width transferred per cycle on that side of the channel.
The following table shows a PL rate matching example for a 32-bit channel written every cycle by the AI Engine at 1 GHz on -1L speed grade devices. As shown, the PL IP has to consume two times the data at half the frequency, or four times the data at one quarter of the frequency.
| AI Engine Frequency | AI Engine Data per Cycle | PL Frequency | PL Data per Cycle |
|---|---|---|---|
| 1 GHz | 32 bit | 500 MHz | 64 bit |
| 1 GHz | 32 bit | 250 MHz | 128 bit |
Because the need to match frequency and adjust data-path width is well understood by the Vitis compiler (`v++`), the tool automatically extracts the port width from the PL kernel and the frequency from the clock specification, and introduces an upsizer/downsizer to temporarily store the data exchanged between the AI Engine and PL regions to manage the rate match.
To avoid deadlocks when multiple channels are read or written between the AI Engine and the PL, ensure that the required data rate per cycle is achieved concurrently on all channels. For example, if one channel requires 32 bits per cycle and a second channel requires 64 bits, the AI Engine code must write both channels at those rates to avoid back pressure or starvation on either channel. In addition, the order in which the AI Engine writes (or reads) the channels must match the chronological order in which the PL code reads (or writes) them.
The number of interfaces used in the graph function definition for the PL defines the number of AXI4-Stream interfaces. Each argument results in the creation of a separate stream.