aie::tile has a method cycles() to read this counter value. For example:
aie::tile tile=aie::tile::current(); //get the tile of the kernel
unsigned long long time=tile.cycles(); //cycle counter of the tile
The counter runs continuously, and there is no limit on how many times it can be read. The value read by the kernel can be written to memory, or it can be streamed out for further analysis. For example, to profile the latency of a piece of code, the counter value is read before the code being profiled and again after the code has run.
The latency of the loop in the kernel can then be determined in the host application by subtracting the first time from the second time.
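The host-side subtraction can be sketched as follows. This is a minimal illustration rather than part of the AI Engine API: elapsed_cycles is a hypothetical helper name, and its two arguments are assumed to be the 64-bit counter values that the kernel wrote out before and after the profiled code.

```cpp
#include <cstdint>

// Hypothetical host-side helper (not an AI Engine API call).
// Returns the number of cycles between two readings of the tile counter.
// The operands are unsigned, so the subtraction is modular: the result is
// still correct if the counter wrapped around once between the readings,
// assuming the counter spans the full 64 bits.
inline std::uint64_t elapsed_cycles(std::uint64_t first, std::uint64_t second) {
    return second - first;
}
```

Calling elapsed_cycles(time1, time2) with the first and second values read back from the kernel gives the latency in cycles.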
Latency can also be calculated by comparing the counter values read back from different executions of the kernel, or from different iterations of a loop. For example, the same technique can be used to measure the latency of certain operations on an asynchronous buffer, such as acquiring and releasing the buffer around an inner loop.
The latency of acquiring and releasing the asynchronous buffer, plus the inner loop execution time, can then be calculated by subtracting the first time from the second time.
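When one counter value is recorded per iteration or per kernel execution, the comparison between successive readings can be sketched on the host as below. The function name iteration_latencies and the idea of collecting the readings into a std::vector are assumptions made for illustration; how the values reach the host (memory or stream) is left to the application.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper (not an AI Engine API call).
// Given counter values recorded in order, returns the cycle count between
// each pair of consecutive readings. Fewer than two readings yields an
// empty result.
std::vector<std::uint64_t> iteration_latencies(const std::vector<std::uint64_t>& stamps) {
    std::vector<std::uint64_t> deltas;
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        // Unsigned subtraction: second reading minus first reading.
        deltas.push_back(stamps[i] - stamps[i - 1]);
    }
    return deltas;
}
```

A large difference between entries of the returned vector points at iterations that stalled, for example while waiting to acquire a buffer.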
The value written to memory can be printed with printf in simulation, or read back by host code in hardware. If the written value is not used by any other code, the volatile qualifier can be used to force the counter value to be stored. This qualifier ensures that compiler optimizations do not eliminate the variable. For example:
static unsigned long long cycle_num[2];
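aie::tile tile=aie::tile::current(); //get the tile of the kernel
volatile unsigned long long *p_cycle=cycle_num; //stores through a volatile pointer are not optimized away
*p_cycle=tile.cycles(); //cycle_num[0]: counter before the profiled code
for(...){...}
*(p_cycle+1)=tile.cycles(); //cycle_num[1]: counter after the profiled code
printf("cycles=%llu\n",cycle_num[1]-cycle_num[0]); //%llu matches unsigned long long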