The PL-based data mover consists of the dma_hls kernel, which generates constant inputs for matrices A and B and checks the output of the GeMM graph for the expected constant pattern.
It internally comprises four loops (inp_A, inp_B, and out_C), all of which are scheduled concurrently. The data width is 128 bits on both AXI4-Stream I/O sides, and the kernel runs at 312.5 MHz.
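To make the data mover's role concrete, the following is a minimal software sketch of it in plain C++. It is not the dma_hls source: the names Word128, generate_constant, count_mismatches, and peak_bytes_per_second are hypothetical, the 128-bit beat is modeled as two 64-bit halves, and the functions run sequentially, whereas the real kernel schedules its loops concurrently in hardware. The last helper works out the peak per-stream bandwidth implied by the stated width and clock: 128 bits / 8 x 312.5 MHz = 5 GB/s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 128-bit AXI4-Stream beat, modeled as two 64-bit halves (assumption).
struct Word128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

inline bool operator==(const Word128& a, const Word128& b) {
    return a.lo == b.lo && a.hi == b.hi;
}

// Models a generator loop such as inp_A or inp_B:
// emits `beats` copies of one constant word.
std::vector<Word128> generate_constant(std::size_t beats, Word128 pattern) {
    return std::vector<Word128>(beats, pattern);
}

// Models a checker loop such as out_C:
// counts the beats that differ from the expected constant word.
std::size_t count_mismatches(const std::vector<Word128>& out, Word128 expected) {
    std::size_t errors = 0;
    for (const Word128& w : out) {
        if (!(w == expected)) {
            ++errors;
        }
    }
    return errors;
}

// Peak bandwidth of one stream in bytes per second,
// assuming one full-width beat is transferred every clock cycle.
double peak_bytes_per_second(unsigned width_bits, double clock_hz) {
    assert(width_bits % 8 == 0);
    return (width_bits / 8.0) * clock_hz;
}
```

For example, peak_bytes_per_second(128, 312.5e6) evaluates to 5.0e9, and count_mismatches returns 0 when every output beat matches the expected word. The 5 GB/s figure is an upper bound per stream; sustained throughput depends on backpressure from the AI Engine side.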