Block Design: image_buffer()

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-08-25
Version: 2025.1 English

The image_buffer() block implements the image I/O buffer for the design; it uses PL URAM for image storage and is designed using Vitis HLS. Details of the block are shown in the following figure.

  • The HLS design consists of two phases. The SEND_PULSES phase transmits the image stored in the PL URAM to the AI Engine over one AXI-S stream; the BP SAR engine updates the image and returns it to the PL URAM over a second AXI-S stream. This process repeats for NPULSE_USE radar pulses, a count configured by the host software. Once all radar pulses have been processed, the UPLOAD_IMAGE phase uploads the final SAR target image to DDR over the NoC.

  • The clock rate for the HLS design is 312.5 MHz. The AXI-S I/O streams are 128 bits wide so that their bandwidth matches the 32-bit @ 1250 MHz interface of the AI Engine array (40 Gb/s on each side). The clock domain crossing and data width converter blocks are instantiated automatically by the Vitis tools.

  • The main resource cost is the 60 URAM blocks used for the image buffer.
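The rate matching between the 128-bit PL streams and the 32-bit AI Engine interface can be checked with a few lines of arithmetic. The following is a minimal sketch in plain C++; the helper name `stream_mbit_per_sec` and the constant names are hypothetical and are not part of the tutorial sources.

```cpp
#include <cstdint>

// Throughput in Mbit/s of a stream that is `width_bits` wide and clocked at
// `clock_khz` (kHz x bits = kbit/s, divided by 1000 to get Mbit/s).
constexpr std::uint64_t stream_mbit_per_sec(std::uint64_t clock_khz,
                                            std::uint64_t width_bits) {
    return clock_khz * width_bits / 1000;
}

// PL side: 312.5 MHz x 128 bit. AI Engine side: 1250 MHz x 32 bit.
constexpr std::uint64_t kPlSideMbps  = stream_mbit_per_sec(312500, 128);
constexpr std::uint64_t kAieSideMbps = stream_mbit_per_sec(1250000, 32);

// Both sides of the width/clock converter must carry the same bandwidth.
static_assert(kPlSideMbps == kAieSideMbps, "PL and AI Engine rates differ");
static_assert(kPlSideMbps == 40000, "expected 40 Gbit/s per stream");
```

Because the clock ratio (4:1) is the inverse of the width ratio (1:4), the converters inserted by the tools neither starve nor throttle either side in steady state.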

Figure: image_buffer() block details

The HLS code for image_buffer() needs to be written carefully to handle backpressure on the input and output AXI-S streams. A diagram of the annotated HLS code is shown below. Note a few key aspects of the implementation:

  • The SEND_PULSES section of the code on Line 12 contains a while loop that manages both the read side and the write side of the PL URAM image buffer. Because returned pixels lag transmitted pixels by the latency of the AI Engine implementation, the write-side address leads the read-side address by a corresponding offset.

  • Backpressure can occur on either the write side or the read side of the buffer. Write-side backpressure happens on every radar pulse, when the graph stalls waiting for the IFFT to complete before image processing commences. Read-side backpressure may occur in a similar manner, when the final image pixels for a radar pulse have been transferred back to the PL URAM and the AI Engine then stalls.

  • The read and write sides must be able to stall independently because they may stall at different times. To achieve this, each side checks the state of its own AXI-S stream before performing a read or a write operation. The parts of the code involved in these checks are annotated in red in the following figure.

  • On the write side, the check on Line 16 allows a write only when the outgoing stream is not full.

  • On the read side, the check on Line 26 allows a read only when the incoming stream is not empty.

  • The pragma on Line 7 ensures the PL URAM has a latency of only 1 cycle. This is needed in order to achieve II=1 pipelined operation of the SEND_PULSES task.

  • The UPLOAD_IMAGE section of the code on Line 38 contains a for loop to transfer the final image back to the host via DDR over the NoC.

Figure: annotated HLS code for image_buffer()
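The independent stall checks described in the list above can be illustrated with a small software model. The following is a minimal sketch in plain C++, not the tutorial's HLS source: `BoundedFifo`, `PulseState`, `send_pulses_step`, `toy_engine_step`, and `run_pulses` are all hypothetical names, the FIFO depth of 4 is arbitrary, and the "engine" simply adds 1 to each pixel in place of the BP SAR update. As a simplification, the model lets each pulse drain completely before starting the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical bounded FIFO standing in for an AXI-S stream. It only mimics
// the full()/empty() checks discussed above; it is not the Vitis HLS class.
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) {}
    bool full() const { return q_.size() >= depth_; }
    bool empty() const { return q_.empty(); }
    void write(const T& v) { assert(!full()); q_.push_back(v); }
    T read() { assert(!empty()); T v = q_.front(); q_.pop_front(); return v; }
private:
    std::size_t depth_;
    std::deque<T> q_;
};

struct PulseState {
    std::size_t send_addr = 0;  // write side: next buffer word to send out
    std::size_t recv_addr = 0;  // read side: next buffer word to overwrite
};

// One iteration of a SEND_PULSES-style loop for a single pulse. Each side
// tests only its own stream, so one side can stall while the other proceeds.
// Returns true once the whole image has been sent and written back.
inline bool send_pulses_step(std::vector<std::uint32_t>& buffer,
                             BoundedFifo<std::uint32_t>& to_engine,
                             BoundedFifo<std::uint32_t>& from_engine,
                             PulseState& st) {
    if (st.send_addr < buffer.size() && !to_engine.full()) {
        to_engine.write(buffer[st.send_addr++]);      // outgoing stream not full
    }
    if (st.recv_addr < buffer.size() && !from_engine.empty()) {
        buffer[st.recv_addr++] = from_engine.read();  // incoming stream not empty
    }
    return st.send_addr == buffer.size() && st.recv_addr == buffer.size();
}

// Toy stand-in for the AI Engine graph: adds 1 to each pixel, and moves data
// only when `active` is true, which creates backpressure on both streams.
inline void toy_engine_step(BoundedFifo<std::uint32_t>& in,
                            BoundedFifo<std::uint32_t>& out, bool active) {
    if (active && !in.empty() && !out.full()) {
        out.write(in.read() + 1u);
    }
}

// Runs `npulse` pulses over the buffer; returns the loop iterations used.
inline std::size_t run_pulses(std::vector<std::uint32_t>& buffer, unsigned npulse) {
    BoundedFifo<std::uint32_t> to_engine(4), from_engine(4);
    std::size_t cycles = 0;
    for (unsigned p = 0; p < npulse; ++p) {
        PulseState st;
        bool done = false;
        while (!done) {
            done = send_pulses_step(buffer, to_engine, from_engine, st);
            toy_engine_step(to_engine, from_engine, (cycles % 3) != 0);  // stall every third cycle
            ++cycles;
        }
    }
    return cycles;
}
```

Calling `run_pulses(buffer, 3)` on a buffer of pixels leaves every pixel incremented by 3, even though the toy engine stalls on every third iteration. Within a pulse, `send_addr` never falls behind `recv_addr`, which mirrors the address offset noted above. In Vitis HLS, the `full()` and `empty()` tests on `hls::stream` play the role that `BoundedFifo` models here.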