Profiling Graph Throughput - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English

Graph throughput is defined as the average number of bytes produced (or consumed) per second. To profile it in hardware, follow these steps:

  1. To profile the design and calculate the port throughput, add the event profiling APIs to the host code.

  2. The code changes required to profile the design for port throughput are available in Hardware/src/host_PortTP.cpp. You can either apply the changes to sw/host.cpp manually, using Hardware/src/host_PortTP.cpp as a reference, or replace sw/host.cpp with that file. Back up the original file before replacing it.

  3. The changes in Hardware/src/host_PortTP.cpp that profile the design are summarized as follows:

    a. The original host.cpp uses only native XRT APIs; no ADF APIs are used. For example, the graph handle is created from the device and the xclbin UUID using the xrt::graph API.

    auto cghdl = xrt::graph(device,xclbin_uuid,"mygraph");
    

    b. The graph run and end commands also use the graph handle.

    cghdl.run(NIterations);
    cghdl.end();
    

    c. The Hardware/src/host_PortTP.cpp file uses the ADF APIs instead. This change from native XRT APIs to ADF APIs is required to profile the AI Engine design.

    adf::registerXRT(dhdl, xclbin_uuid.get());
    std::cout<<"Register XRT"<<std::endl;
    
    const int buffer_sizeIn_bytes = 512;
    event::handle handle = event::start_profiling(mygraph.out0,event::io_stream_start_to_bytes_transferred_cycles,buffer_sizeIn_bytes*NIterations);
    if(handle==event::invalid_handle){
        printf("ERROR:Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
        return 1;
    }
    mygraph.run(NIterations);
    mygraph.end();
    ...
    ...
    s2mm_1_rhdl.wait();
    
    long long cycle_count = event::read_profiling(handle);
    std::cout<<"cycle count:"<<cycle_count<<std::endl;
    event::stop_profiling(handle);//Performance counter is released and cleared
    double throughput = (double)buffer_sizeIn_bytes*NIterations / (cycle_count * 0.8 * 1e-3); //bytes per microsecond, i.e. MB/s (0.8 ns per cycle)
    std::cout<<"Throughput of the graph: "<<throughput<<" MB/s"<<std::endl;
    
  4. Update the Makefile as well, so that the host code compiles and links successfully now that the ADF APIs are included. It is recommended that you replace the Makefile with Makefile.host_profile. Back up the original file before replacing it.

  5. Run make host and make package TARGET=hw to regenerate the hardware image, sd_card.img.

  6. Program the device with the new hardware image and observe the following messages in the Linux console, which include the throughput of port out0:

    run mm2s
    run s2mm
    Register XRT
    graph run
    graph end
    After MM2S wait
    After S2MM_1 wait
    cycle count:2965
    Throughput of the graph: 1510.96 MB/s
    

NOTE: The throughput value obtained above matches the values obtained during AI Engine simulation and hardware emulation.