Block Design: image_buffer()

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

The image_buffer() block implements the image I/O buffer for the design. It uses PL URAM for image storage and is designed using Vitis HLS. The following figure shows details of the block.

  • The HLS design consists of two phases, SEND_PULSES and UPLOAD_IMAGE. The SEND_PULSES phase transmits the image stored in the PL URAM to the AI Engine over a single AXI-S stream. The BP SAR engine updates the image and returns it to the PL URAM over a second AXI-S stream. This process repeats once per radar pulse, for the number of pulses NPULSE_USE configured by the host software. When all radar pulses are processed, the UPLOAD_IMAGE phase uploads the final SAR target image to DDR over the NoC.

  • The clock rate for the HLS design is 312.5 MHz. The AXI-S I/O streams are 128 bits wide to match the throughput of the 32-bit interface of the AI Engine array running at 1250 MHz. The Vitis tools automatically instantiate the clock domain crossing and data width converter blocks.

  • The main resource cost is the 60 URAM blocks used for the image buffer.

figure
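The width and clock figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of that check; the helper name stream_kbps and the constant names are hypothetical and do not come from the tutorial sources.

```cpp
#include <cstdint>

// Hypothetical helper: raw throughput of a stream in kbit/s,
// computed as clock rate (kHz) times data width (bits).
constexpr std::uint64_t stream_kbps(std::uint64_t clock_khz,
                                    std::uint64_t width_bits) {
    return clock_khz * width_bits;
}

// Values taken from the text; the names are illustrative only.
constexpr std::uint64_t kPlClockKHz   = 312500;   // 312.5 MHz HLS clock
constexpr std::uint64_t kPlWidthBits  = 128;      // PL-side AXI-S width
constexpr std::uint64_t kAieClockKHz  = 1250000;  // 1250 MHz AI Engine interface
constexpr std::uint64_t kAieWidthBits = 32;       // AI Engine stream width

// A 4x wider stream at one quarter of the clock carries the same data rate.
static_assert(stream_kbps(kPlClockKHz, kPlWidthBits) ==
                  stream_kbps(kAieClockKHz, kAieWidthBits),
              "PL and AI Engine stream rates should match");
```

Both sides work out to 40 Gbit/s per stream, which is why the PL side can run at a quarter of the AI Engine interface clock without limiting throughput.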

Write the HLS code for image_buffer() so that it handles backpressure on both the input and output AXI-S streams. The following figure shows the annotated HLS code. Note a few key aspects of the implementation:

  • The SEND_PULSES section of the code on Line 12 contains a while loop that manages the read side and write side of the PL URAM image buffer. The write-side address leads the read-side address by the latency of the AI Engine implementation. For this reason, we expect an offset between the write and read addresses.

  • Backpressure can occur on either the write side or the read side of the buffer. Write-side backpressure happens every radar pulse when the graph stalls waiting for the iFFT to complete before image processing commences. Read-side backpressure can occur in a similar manner when the final image pixels for a radar pulse transfer back to the PL URAM and the AI Engine then stalls.

  • You need the read and write sides to stall independently as they can stall at different times. This is achieved by each side validating the state of the AXI-S streams before performing a read or a write operation. The following figure shows the aspects of the code involved annotated in red.

  • On the write side, a check is performed in Line 16 and writes occur only when the outgoing stream is not already full.

  • On the read side, a check is performed in Line 26 and reads occur only when the incoming stream is not empty.

  • The pragma on Line 7 ensures the PL URAM has a latency of only 1 cycle. This is needed in order to achieve II=1 pipelined operation of the SEND_PULSES task.

  • The UPLOAD_IMAGE section of the code on Line 38 contains a for loop to transfer the final image back to the host using DDR over the NoC.

figure
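The independent-stall pattern described in the bullets above can be sketched in plain C++. This is not the tutorial's HLS source (that is shown only in the figure); it is a simplified software model under stated assumptions. BoundedStream is a hypothetical stand-in for hls::stream that offers the same full(), empty(), read(), and write() calls, the "engine" that adds 1 to each pixel and stalls every third cycle is invented purely to create backpressure, and image_buffer_model and the address names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical bounded FIFO standing in for hls::stream so the sketch
// compiles without the Vitis HLS headers.
template <typename T>
class BoundedStream {
public:
    explicit BoundedStream(std::size_t depth) : depth_(depth) {}
    bool full() const { return q_.size() >= depth_; }
    bool empty() const { return q_.empty(); }
    void write(const T& v) { assert(!full()); q_.push_back(v); }
    T read() { assert(!empty()); T v = q_.front(); q_.pop_front(); return v; }
private:
    std::size_t depth_;
    std::deque<T> q_;
};

// Software model of the two phases. 'image' plays the role of the PL URAM.
// In real HLS code the storage would be bound to URAM with a 1-cycle latency
// (for example via a bind_storage pragma; that detail is an assumption here).
std::vector<std::uint32_t> image_buffer_model(std::vector<std::uint32_t> image,
                                              int npulse_use) {
    BoundedStream<std::uint32_t> to_aie(4);    // outgoing AXI-S stream
    BoundedStream<std::uint32_t> from_aie(4);  // incoming AXI-S stream
    const std::size_t n = image.size();

    // SEND_PULSES: one pass over the image per radar pulse.
    for (int pulse = 0; pulse < npulse_use; ++pulse) {
        std::size_t wr_addr = 0;  // next pixel to send to the engine
        std::size_t rd_addr = 0;  // next pixel to store from the engine
        std::size_t cycle = 0;
        while (rd_addr < n) {
            // Write side: send only if the outgoing stream is not full.
            if (wr_addr < n && !to_aie.full()) {
                to_aie.write(image[wr_addr]);
                ++wr_addr;
            }
            // Read side: store only if the incoming stream is not empty.
            // rd_addr always trails wr_addr, so no unsent pixel is overwritten.
            if (!from_aie.empty()) {
                image[rd_addr] = from_aie.read();
                ++rd_addr;
            }
            // Toy engine (invented): stalls every third cycle, otherwise
            // moves one pixel and adds 1 to it.
            if (cycle % 3 != 0 && !to_aie.empty() && !from_aie.full()) {
                from_aie.write(to_aie.read() + 1);
            }
            ++cycle;
        }
    }

    // UPLOAD_IMAGE: copy the final image out (DDR modeled as a vector).
    std::vector<std::uint32_t> ddr(n);
    for (std::size_t i = 0; i < n; ++i) {
        ddr[i] = image[i];
    }
    return ddr;
}
```

Because each side tests only its own stream, a stalled engine holds back the write side without blocking pixels that are already on their way back, and the reverse also holds. Calling image_buffer_model with a 32-pixel image and npulse_use = 3 returns every pixel incremented by 3, since the toy engine adds 1 per pulse.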