aie::tile has a cycles() method that reads this counter value. See the following example:
aie::tile tile = aie::tile::current(); // get the tile of the kernel
unsigned long long time = tile.cycles(); // cycle counter of the tile
The counter runs continuously, and there is no limit on how many times it can be read. The counter value read by the kernel can be written to memory or streamed out for further analysis. For example, to profile the latency of a loop in a kernel, read the counter value before the loop runs and again after it has run, and send both values out.
The latency of the loop in the kernel can then be computed in the host application as the second value minus the first.
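The read-before, read-after pattern described above can be sketched on the host as follows. This is an illustrative sketch only: MockTile is a hypothetical stand-in for aie::tile so that the example runs off-device, the std::vector stands in for an output stream or memory buffer, and the cost of 3 cycles per iteration is an arbitrary assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for aie::tile (not part of the AIE API).
// In a real kernel, aie::tile::current().cycles() would be used instead.
struct MockTile {
    unsigned long long counter = 0;
    unsigned long long cycles() const { return counter; }  // read the counter
    void advance(unsigned long long n) { counter += n; }   // model elapsed cycles
};

// Kernel-side pattern: read the counter before and after the profiled loop
// and push both raw values out (the vector stands in for a stream or memory).
void profiled_kernel(MockTile &tile, int iterations,
                     std::vector<unsigned long long> &out) {
    out.push_back(tile.cycles());      // first reading, before the loop
    for (int i = 0; i < iterations; ++i) {
        tile.advance(3);               // assumed cost: 3 cycles per iteration
    }
    out.push_back(tile.cycles());      // second reading, after the loop
}

// Host-side step: the latency is the second reading minus the first.
unsigned long long loop_latency(const std::vector<unsigned long long> &readings) {
    assert(readings.size() >= 2);      // both readings must have been received
    return readings[1] - readings[0];
}
```

Only the raw counter values leave the kernel in this sketch. The subtraction is done afterwards on the host, which keeps the extra work inside the profiled region to two counter reads.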
Compare the values read back between different kernel executions or loop iterations to calculate latency. For example, to measure the latency of certain operations on an asynchronous buffer, read the counter before the buffer is acquired and again after it is released.
The latency of acquiring and releasing the asynchronous buffer, plus the inner loop execution time, can then be calculated as the second value minus the first.
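When one counter value is recorded per loop iteration or per kernel execution, the comparison on the host reduces to differences between consecutive readings. The helper below is a minimal sketch of that step. The name iteration_latencies is hypothetical, and it assumes the readings arrive in the order they were taken.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given counter values recorded once per loop iteration (or once per kernel
// execution), return the cycle count between each consecutive pair.
std::vector<unsigned long long>
iteration_latencies(const std::vector<unsigned long long> &stamps) {
    std::vector<unsigned long long> deltas;
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        // Assumes readings are in capture order, so they never decrease.
        assert(stamps[i] >= stamps[i - 1]);
        deltas.push_back(stamps[i] - stamps[i - 1]);
    }
    return deltas;
}
```

With N readings, the function returns N-1 latencies. An empty or single-element input yields an empty result.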
The counter value can be displayed with printf in simulation, or read back by host code in hardware. If no other code uses the written value, apply the volatile qualifier to ensure the counter value is actually stored. This qualifier prevents compiler optimizations from eliminating the store to the variable.
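For example:
static unsigned long long cycle_num[2];
aie::tile tile = aie::tile::current();
// volatile pointer: both stores to cycle_num must be emitted
volatile unsigned long long *p_cycle = cycle_num;
*p_cycle = tile.cycles(); // cycle_num[0], before the loop
for(...){...}
*(p_cycle + 1) = tile.cycles(); // cycle_num[1], after the loop
printf("cycles=%llu\n", cycle_num[1] - cycle_num[0]); // %llu matches unsigned long long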