The image_buffer() block implements the image I/O buffer for the design; it uses PL URAM for image storage and is designed using Vitis HLS. Details of the block are shown in the following figure.
The HLS design consists of two phases. The SEND_PULSES phase transmits the image stored in the PL URAM to the AI Engine over a single AXI-S stream. The image is updated by the BP SAR engine and returned to the PL URAM over another AXI-S stream. The process is repeated for the number of radar pulses, NPULSE_USE, configured via the host software. Once all radar pulses have been processed, the UPLOAD_IMAGE phase uploads the final SAR target image to DDR over the NoC.
The clock rate for the HLS design is 312.5 MHz. AXI-S I/O streams are 128-bit to align with the 1250 MHz @ 32-bit interface of the AI Engine array. The clock domain crossing and data width converter blocks are instantiated automatically by Vitis tools.
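The width and clock figures above can be checked with a few lines of arithmetic. The sketch below is only a sanity check in plain C++; the constant and function names are hypothetical and do not come from the design sources.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in kbit/s of a stream that moves `width_bits` per clock cycle
// at `clock_khz` (kHz is used so that 312.5 MHz stays an integer).
constexpr std::uint64_t throughput_kbps(std::uint64_t clock_khz,
                                        std::uint64_t width_bits) {
    return clock_khz * width_bits;
}

// Figures quoted in the text; names are illustrative only.
constexpr std::uint64_t kPlClockKhz   = 312500;   // 312.5 MHz HLS clock
constexpr std::uint64_t kPlWidthBits  = 128;      // PL-side AXI-S width
constexpr std::uint64_t kAieClockKhz  = 1250000;  // 1250 MHz AI Engine interface
constexpr std::uint64_t kAieWidthBits = 32;       // AI Engine stream width

// Both sides carry the same data rate, so the width converter and clock
// domain crossing neither starve nor overrun the stream.
static_assert(throughput_kbps(kPlClockKhz, kPlWidthBits) ==
              throughput_kbps(kAieClockKhz, kAieWidthBits),
              "PL and AI Engine stream rates should match");
```

Both products come to 40 Gbit/s, which is why a 4:1 width ratio pairs with a 1:4 clock ratio.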
The main resource requirement is the 60 URAM blocks used for the image buffer.
The HLS code for image_buffer() needs to be written carefully to handle backpressure on the input and output AXI-S streams. A diagram of the annotated HLS code is shown below. Note a few key aspects of the implementation:
The SEND_PULSES section of the code on Line 12 captures a while loop that manages the read side and write side of the PL URAM image buffer. The write-side address leads the read-side address by the latency of the AI Engine implementation. For this reason, we expect an offset between the write and read addresses.
Backpressure can occur on either the write side or the read side of the buffer. Write-side backpressure happens every radar pulse when the graph stalls waiting for the IFFT to complete before image processing commences. Read-side backpressure may occur in a similar manner when the final image pixels for a radar pulse have been transferred back to the PL URAM and the AI Engine then stalls.
We need the read and write sides to stall independently as they may stall at different times. This is achieved by each side validating the state of the AXI-S streams prior to performing a read or a write operation. The aspects of the code involved in these checks are annotated in red in the following figure.
On the write side, a check is performed in Line 16 and writes are performed only when the outgoing stream is not already full.
On the read side, a check is performed in Line 26 and reads are performed only when the incoming stream is not empty.
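The independent stall behavior described above can be illustrated with a small software model. This is a minimal sketch in plain C++, not the tutorial's HLS source: BoundedFifo is a hypothetical stand-in for a bounded AXI-S stream, send_pulses_step models one loop iteration, and run_one_pulse is a toy driver in which the "engine" simply adds 1 to each pixel and stays idle for a configurable number of cycles (loosely mimicking the wait for the IFFT).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a bounded AXI-S stream (not hls::stream).
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) {}
    bool full() const { return q_.size() >= depth_; }
    bool empty() const { return q_.empty(); }
    void write(const T& v) { assert(!full()); q_.push_back(v); }
    T read() { assert(!empty()); T v = q_.front(); q_.pop_front(); return v; }
private:
    std::size_t depth_;
    std::deque<T> q_;
};

struct BufferState {
    std::size_t send_addr = 0;  // next pixel to write to the outgoing stream
    std::size_t recv_addr = 0;  // next pixel to read from the incoming stream
};

// One iteration of a SEND_PULSES-style loop. Each side checks its own stream
// and advances only if it can, so the two sides stall independently.
// Returns true once every pixel has been received back.
bool send_pulses_step(std::vector<std::uint32_t>& image, BufferState& st,
                      BoundedFifo<std::uint32_t>& to_aie,
                      BoundedFifo<std::uint32_t>& from_aie) {
    // Write side: send only when the outgoing stream is not already full.
    if (st.send_addr < image.size() && !to_aie.full()) {
        to_aie.write(image[st.send_addr]);
        ++st.send_addr;
    }
    // Read side: store only when the incoming stream is not empty.
    if (st.recv_addr < image.size() && !from_aie.empty()) {
        image[st.recv_addr] = from_aie.read();
        ++st.recv_addr;
    }
    return st.recv_addr == image.size();
}

// Toy driver (hypothetical): the engine is idle for the first
// `engine_idle_cycles` iterations, then moves one pixel per iteration,
// adding 1 to it. Returns the number of loop iterations used.
std::size_t run_one_pulse(std::vector<std::uint32_t>& image,
                          std::size_t engine_idle_cycles) {
    BoundedFifo<std::uint32_t> to_aie(2), from_aie(2);
    BufferState st;
    std::size_t cycles = 0;
    bool done = image.empty();
    while (!done) {
        done = send_pulses_step(image, st, to_aie, from_aie);
        if (cycles >= engine_idle_cycles && !to_aie.empty() && !from_aie.full()) {
            from_aie.write(to_aie.read() + 1);
        }
        ++cycles;
    }
    return cycles;
}
```

While the toy engine is idle, the outgoing FIFO fills and the write side stops advancing, yet the loop keeps iterating and the read side remains free to accept data as soon as any arrives. In the model the send address always stays ahead of the receive address, which mirrors the address offset noted earlier.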
The pragma on Line 7 ensures the PL URAM has a latency of only 1 cycle. This is needed in order to achieve II=1 pipelined operation of the SEND_PULSES task.
The UPLOAD_IMAGE section of the code on Line 38 contains a for loop to transfer the final image back to the host via DDR over the NoC.
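The overall two-phase control flow can be summarized with a short software model. This is a sketch under stated assumptions rather than the annotated HLS code: image_buffer_model and its parameters are hypothetical names, the per-pulse update is a placeholder for the round trip through the AI Engine, and the storage directive is shown only as a comment in the form Vitis HLS's bind_storage pragma is assumed to take here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model of the two phases; `uram` plays the role of
// the on-chip image buffer and the returned vector plays the role of DDR.
std::vector<std::uint32_t> image_buffer_model(std::vector<std::uint32_t> uram,
                                              unsigned npulse_use) {
    // In Vitis HLS, a directive of roughly this form requests URAM storage
    // with a 1-cycle read latency (assumed form; kept as a comment so this
    // model stays plain C++):
    //   #pragma HLS bind_storage variable=uram type=ram_2p impl=uram latency=1

    // Phase 1 (SEND_PULSES): one full pass over the image per radar pulse.
    for (unsigned pulse = 0; pulse < npulse_use; ++pulse) {
        for (std::size_t i = 0; i < uram.size(); ++i) {
            // Placeholder for "send pixel, engine updates it, store it back".
            uram[i] += pulse + 1;
        }
    }

    // Phase 2 (UPLOAD_IMAGE): copy the final image out to DDR.
    std::vector<std::uint32_t> ddr(uram.size());
    for (std::size_t i = 0; i < uram.size(); ++i) {
        ddr[i] = uram[i];
    }
    return ddr;
}
```

The point of the sketch is the ordering: the image stays in the on-chip buffer across all NPULSE_USE passes and is written to DDR only once, after the last pulse.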