The PL-based data mover consists of the dma_hls kernel, which generates constant inputs for matrices A and B and checks the output of the GeMM graph against the expected constant pattern.

  • Internally, it comprises three loops (inp_A, inp_B, and out_C), all scheduled concurrently.

  • The data width is 128 bits on both the AXI4-Stream input and output sides, and the kernel runs at 312.5 MHz.
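
The behavior described above can be approximated with a small software model. This is only a sketch in plain C++, not the dma_hls source: Beat128, Stream, send_constant, check_constant, and peak_bytes_per_second are hypothetical names, std::queue stands in for an AXI4-Stream channel, and the functions run one after another, so the concurrent scheduling of the loops in hardware is not modeled. It also shows the peak per-stream bandwidth implied by a 128-bit interface at 312.5 MHz.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for one 128-bit stream beat, held as two 64-bit halves.
struct Beat128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical stand-in for an AXI4-Stream channel (assumption: a real HLS
// kernel would use a hardware stream type rather than std::queue).
using Stream = std::queue<Beat128>;

// Models an input loop such as inp_A or inp_B: pushes `beats` beats that all
// carry the same constant pattern in both halves.
void send_constant(Stream& s, std::size_t beats, std::uint64_t pattern) {
    for (std::size_t i = 0; i < beats; ++i) {
        s.push(Beat128{pattern, pattern});
    }
}

// Models the output loop out_C: pops `beats` beats and returns the number of
// 64-bit halves that differ from the expected constant pattern.
std::size_t check_constant(Stream& s, std::size_t beats, std::uint64_t expected) {
    assert(s.size() >= beats);  // the model requires the data to be present
    std::size_t errors = 0;
    for (std::size_t i = 0; i < beats; ++i) {
        const Beat128 b = s.front();
        s.pop();
        if (b.lo != expected) ++errors;
        if (b.hi != expected) ++errors;
    }
    return errors;
}

// Peak bandwidth of one stream in bytes per second: one beat per clock cycle.
constexpr double peak_bytes_per_second(double width_bits, double clock_hz) {
    return width_bits / 8.0 * clock_hz;
}
```

With the figures given above, peak_bytes_per_second(128.0, 312.5e6) evaluates to 5.0e9, that is, 16 bytes per cycle or 5 GB/s per stream, assuming one beat is transferred on every clock cycle.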