The BP engine uses multi-rate scheduling to control its system-level operation. The schedule is hard-coded yet configurable to process any number of radar pulses.
The design employs a single top-level graph.
A single graph iteration corresponds to a full update of the SAR target image for one radar pulse. Using multi-rate scheduling, each AI Engine kernel executes as many times as required to perform its workload for that pulse.
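A minimal Python sketch shows how "one graph iteration per radar pulse" translates into kernel invocations. It is a simple host-side model, not ADF graph code. The KernelModel class, the run_graph_iteration() and run() helpers, and the interp1_re/interp1_im names for the two interp1() instances are hypothetical. The repetition counts (1 for ifft() and lut(), 256 for the kernels that update the image) and the 512×512 image streamed in 1024-sample buffers are the values the design uses.

```python
from dataclasses import dataclass

# Sizes quoted in the design description.
IMAGE_ROWS, IMAGE_COLS = 512, 512
BUFFER_SAMPLES = 1024  # samples per I/O kernel buffer

# Invocations needed per pulse to stream the whole image through 1024-sample buffers.
BLOCKS_PER_PULSE = IMAGE_ROWS * IMAGE_COLS // BUFFER_SAMPLES  # 256


@dataclass
class KernelModel:
    """Hypothetical host-side stand-in for an AI Engine kernel (not ADF code)."""
    name: str
    repetition_count: int  # firings per graph iteration, i.e. per radar pulse
    invocations: int = 0


def run_graph_iteration(kernels):
    """One graph iteration = one radar pulse; each kernel fires repetition_count times."""
    for k in kernels:
        for _ in range(k.repetition_count):
            k.invocations += 1  # placeholder for the kernel's actual work


def run(num_pulses):
    """Run the fixed schedule for any number of pulses and report total firings."""
    kernels = [
        KernelModel("ifft", 1),
        KernelModel("lut", 1),
        KernelModel("range_gen", BLOCKS_PER_PULSE),
        KernelModel("interp1_re", BLOCKS_PER_PULSE),  # hypothetical names for the
        KernelModel("interp1_im", BLOCKS_PER_PULSE),  # two interp1() instances
    ]
    for _ in range(num_pulses):
        run_graph_iteration(kernels)
    return {k.name: k.invocations for k in kernels}


if __name__ == "__main__":
    print(BLOCKS_PER_PULSE)  # 256
    print(run(num_pulses=3))
```

The repetition counts are fixed per graph iteration. Processing more pulses therefore changes only the number of iterations, not the schedule itself.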
The ifft2k_async() graph contains two kernels. The ifft() kernel performs the inverse transform. The lut() kernel computes the slope and offset LUTs required by the downstream interp1() graph. Each kernel must run once per radar pulse. Consequently, the ifft() and lut() kernels both use a setting of repetition_count=1.

The range_gen() kernel generates the \((x,y,z)\) coordinates of the target image just in time, since this data is known in advance and is easily computed. Generating the coordinates on the fly, rather than storing them, saves considerable storage.

All other graphs in the BP engine perform computations related to updating the SAR target image. The memory footprint of this image exceeds the local tile memory, so partial SAR image data is streamed through the design using double-buffering. The design adopts a size of 1024 samples for these I/O kernel buffers. For a \(512\times 512\) image, \(512\times 512/1024 = 256\) kernel invocations are required per graph iteration to process the full target image. All remaining kernels therefore use repetition_count=256.

Both interp1() kernels (one for the real and one for the imaginary component of the phase correction) must reuse their slope and offset LUTs over multiple kernel invocations to process the full target image. Because these LUTs are computed only once per graph iteration, the design must employ asynchronous buffering for them. Otherwise, the default multi-rate scheduling would require a 256-fold replication of these buffers, which is infeasible. For this reason, the interp1() kernel must be hand-coded to manage this asynchronous buffering. The current DSP library func_approx() IP, which could otherwise perform the required linear interpolation, supports only synchronous buffering.

The ifft() and lut() kernels have a heavy memory footprint and execute only once per graph iteration. Their outputs are held constant over 256 invocations of all remaining kernels. Consequently, the design elects to use single_buffer() designations on the I/O buffers of ifft() and lut(). This has minimal impact on overall system throughput. As noted earlier, most of the DDR input transfers for the next radar pulse can be hidden behind the compute workload of the current radar pulse.
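The LUT reuse that interp1() has to hand-code can be sketched as a Python behavioral model. This is not AI Engine kernel code. It assumes the LUTs describe a piecewise-linear fit on a unit-spaced grid, so that segment i evaluates y = slope[i]*x + offset[i]. The helper names (make_luts(), interp1_block(), interp1_pulse(), lut_storage_bytes()) and the 2048-entry, 4-byte LUT size are illustrative assumptions. The sketch shows one pair of LUT buffers, produced once per pulse, serving all 256 block invocations.

```python
import math

BLOCKS_PER_PULSE = 256  # repetition_count of the image-update kernels
BLOCK_SAMPLES = 1024    # samples per I/O buffer


def make_luts(samples):
    """Hypothetical stand-in for lut(): per-segment slope/offset such that
    y = slope[i]*x + offset[i] interpolates samples[i]..samples[i+1] (unit grid assumed)."""
    slope = [samples[i + 1] - samples[i] for i in range(len(samples) - 1)]
    offset = [samples[i] - slope[i] * i for i in range(len(slope))]
    return slope, offset


def interp1_block(x_block, slope, offset):
    """One interp1() invocation: evaluate the piecewise-linear model on one block."""
    last = len(slope) - 1
    out = []
    for x in x_block:
        i = min(max(math.floor(x), 0), last)  # clamp to a valid segment index
        out.append(slope[i] * x + offset[i])
    return out


def interp1_pulse(blocks, slope, offset):
    """Asynchronous-style reuse: the same LUT buffers are read by every block of
    the pulse, instead of one LUT copy per invocation."""
    return [interp1_block(b, slope, offset) for b in blocks]


def lut_storage_bytes(entries, bytes_per_value=4, copies=1):
    """Slope + offset table storage; copies=BLOCKS_PER_PULSE models 256-fold replication."""
    return 2 * entries * bytes_per_value * copies


if __name__ == "__main__":
    samples = [math.sin(0.01 * n) for n in range(2048)]  # stand-in input data
    slope, offset = make_luts(samples)                    # computed once per pulse
    xs = [(k * 0.37) % 2047 for k in range(BLOCKS_PER_PULSE * BLOCK_SAMPLES)]
    blocks = [xs[b * BLOCK_SAMPLES:(b + 1) * BLOCK_SAMPLES] for b in range(BLOCKS_PER_PULSE)]
    out = interp1_pulse(blocks, slope, offset)
    print(len(out), len(out[0]))  # 256 1024
    print(lut_storage_bytes(2048), lut_storage_bytes(2048, copies=BLOCKS_PER_PULSE))
```

Under these assumed sizes, one copy of the slope and offset tables occupies 16 KiB, whereas a 256-fold replication would occupy 4 MiB. This gives a sense of why the design avoids synchronous buffering for the LUTs.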