The PL-based data mover consists of the dma_hls kernel, which generates constant inputs for Mat A and Mat B and checks the output of the GeMM graph against the expected constant pattern.
It internally comprises three loops (inp_A, inp_B, and out_C), all scheduled concurrently. The data width is 128 bits on both AXI4-Stream I/O sides, and the kernel runs at 312.5 MHz.
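
The behavior described above can be approximated in plain C++ as a rough software model. This is only a sketch under stated assumptions, not the dma_hls source: the names `Beat128`, `Stream`, `send_constant`, `check_constant`, and `peak_bytes_per_second` are hypothetical, a `std::queue` stands in for a hardware stream, each 128-bit beat is assumed to carry four 32-bit lanes, and the loops run one after another here rather than concurrently as they do in the PL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical model of one 128-bit stream beat: four 32-bit lanes (assumed layout).
struct Beat128 {
    std::uint32_t lane[4];
};

// Stand-in for a hardware AXI4-Stream; a real kernel would use a stream type instead.
using Stream = std::queue<Beat128>;

// Modeled after the input loops (inp_A / inp_B): push `beats` beats of a constant.
void send_constant(Stream& out, std::uint32_t value, std::size_t beats) {
    for (std::size_t i = 0; i < beats; ++i) {
        out.push(Beat128{{value, value, value, value}});
    }
}

// Modeled after the output loop (out_C): drain `beats` beats and count
// every lane that differs from the expected constant.
std::size_t check_constant(Stream& in, std::uint32_t expected, std::size_t beats) {
    assert(in.size() >= beats);  // the model cannot block like a hardware stream
    std::size_t errors = 0;
    for (std::size_t i = 0; i < beats; ++i) {
        const Beat128 beat = in.front();
        in.pop();
        for (std::uint32_t v : beat.lane) {
            if (v != expected) {
                ++errors;
            }
        }
    }
    return errors;
}

// Peak throughput of one stream: bytes per beat times beats per second.
constexpr double peak_bytes_per_second(unsigned width_bits, double clock_hz) {
    return (width_bits / 8.0) * clock_hz;
}
```

With the figures quoted above, `peak_bytes_per_second(128, 312.5e6)` evaluates to 5.0e9, that is, 16 bytes per cycle at 312.5 MHz gives a peak of 5 GB/s per stream. Calling `send_constant` followed by `check_constant` with the same constant returns zero errors, which mirrors the pass condition of a constant-pattern check.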