Partitioning Validation

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

Coding the Hough Transform prototype yields additional insight into the design and a deeper understanding of its performance limitations. As predicted, the histogram update code takes exactly the read-modify-write form anticipated from the start: each rho value is a data-dependent index into COUNTS, and two lanes can target the same bin. There is no obvious way to vectorize this code and remove the performance bottleneck.

template <int COUNT_NUM>
  inline void update_countsA( TT_COUNT (&COUNTS)[COUNT_NUM],
                              aie::vector<TT_COUNT,16>& rho, aie::vector<TT_COUNT,8>& pixels )
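  {
    // Scalar read-modify-write histogram update: each rho lane is a
    // data-dependent bin index, so the 16 updates are issued one at a time.
    // Lanes 4k..4k+3 accumulate pixels[k]; only pixels[0..3] are used here.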
    COUNTS[rho[ 0]] += pixels[0];
    COUNTS[rho[ 1]] += pixels[0];
    COUNTS[rho[ 2]] += pixels[0];
    COUNTS[rho[ 3]] += pixels[0];
    COUNTS[rho[ 4]] += pixels[1];
    COUNTS[rho[ 5]] += pixels[1];
    COUNTS[rho[ 6]] += pixels[1];
    COUNTS[rho[ 7]] += pixels[1];
    COUNTS[rho[ 8]] += pixels[2];
    COUNTS[rho[ 9]] += pixels[2];
    COUNTS[rho[10]] += pixels[2];
    COUNTS[rho[11]] += pixels[2];
    COUNTS[rho[12]] += pixels[3];
    COUNTS[rho[13]] += pixels[3];
    COUNTS[rho[14]] += pixels[3];
    COUNTS[rho[15]] += pixels[3];
  }
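The bottleneck is easier to see with a plain C++ model of the update, written without the AI Engine API. The sketch below is a hypothetical illustration, not part of the tutorial design: `scalar_update()` mirrors the lane order of `update_countsA()` using `std::array` in place of `aie::vector`, while `naive_vector_update()` shows what a straightforward gather/add/scatter rewrite would do. Both function names are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference semantics of update_countsA(): one scalar read-modify-write per
// lane, applied in lane order. Lanes 4k..4k+3 accumulate pixels[k].
template <std::size_t N>
void scalar_update(std::array<std::int32_t, N>& counts,
                   const std::array<int, 16>& rho,
                   const std::array<std::int32_t, 4>& pixels) {
  for (std::size_t i = 0; i < 16; ++i) {
    assert(rho[i] >= 0 && static_cast<std::size_t>(rho[i]) < N);  // bin in range
    counts[rho[i]] += pixels[i / 4];
  }
}

// Hypothetical naive vectorization: gather the 16 bins, add lane-wise,
// then scatter the results back. When two lanes share a bin, the later
// store overwrites the earlier one and an increment is lost.
template <std::size_t N>
void naive_vector_update(std::array<std::int32_t, N>& counts,
                         const std::array<int, 16>& rho,
                         const std::array<std::int32_t, 4>& pixels) {
  std::array<std::int32_t, 16> lanes{};
  for (std::size_t i = 0; i < 16; ++i) lanes[i] = counts[rho[i]];  // gather
  for (std::size_t i = 0; i < 16; ++i) lanes[i] += pixels[i / 4];  // lane-wise add
  for (std::size_t i = 0; i < 16; ++i) counts[rho[i]] = lanes[i];  // scatter
}
```

For example, with rho lanes 0 and 4 both equal to 7 and all pixel values equal to 1, the scalar version leaves bin 7 at 2 while the naive version leaves it at 1. A correct vector version would first have to detect and merge such conflicts, which adds work of its own.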

The following figure tabulates the profiling data generated by the compiler for the 32-tile Hough Transform prototype design running on the \(216\times 240\) image. The key conclusions are:

  • The update_countsA/B() routines require ~183 cycles per call and together account for ~90% of the total cycles.

  • The theta_compute() routine requires 277,204 cycles, ~10% of the total. This is equivalent to an II of ~42 cycles, very close to the spreadsheet estimate of 45.

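  • The overall throughput is \(216\times 240/(0.8\times 2{,}650{,}436) \approx 0.0244\) pixels/ns, or ~24 MP/s, where 0.8 ns is the cycle period of the 1.25 GHz AI Engine clock and 2,650,436 is the total cycle count.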
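The profiling figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming a 0.8 ns clock period and assuming that every cycle not spent in theta_compute() belongs to update_countsA/B(); the constant and function names are hypothetical. The four-pixels-per-call figure comes from update_countsA() reading pixels[0..3].

```cpp
#include <cassert>

// Figures quoted in the profiling summary (one tile, 216 x 240 image).
constexpr double kPixels        = 216.0 * 240.0;  // 51,840 pixels
constexpr double kTotalCycles   = 2650436.0;      // total cycles
constexpr double kThetaCycles   = 277204.0;       // theta_compute()
constexpr double kUpdateCycles  = 183.0;          // one update_countsA/B() call
constexpr double kClockPeriodNs = 0.8;            // assumed 1.25 GHz AI Engine clock
constexpr double kPixelsPerCall = 4.0;            // update_countsA() consumes pixels[0..3]

// Throughput in MP/s: pixels per ns times 1e3 (1e9 ns/s divided by 1e6 px/MP).
double throughput_mps() {
  assert(kTotalCycles > 0.0 && kClockPeriodNs > 0.0);
  return kPixels / (kClockPeriodNs * kTotalCycles) * 1e3;
}

// Fraction of all cycles spent in theta_compute().
double theta_fraction() { return kThetaCycles / kTotalCycles; }

// Update calls implied if every non-theta cycle belongs to update_countsA/B().
double implied_update_calls() { return (kTotalCycles - kThetaCycles) / kUpdateCycles; }

// Update calls expected if each call handles four pixels.
double expected_update_calls() { return kPixels / kPixelsPerCall; }
```

The implied count (about 12,968 calls) is within 0.1% of the 12,960 calls expected at four pixels per call, which supports reading the 183-cycle figure as a per-call cost.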

Through detailed prototyping, you have accurately quantified the throughput of the Hough Transform. Use these results to revise the original spreadsheet estimates and produce accurate projections for reaching the 220 MP/s throughput target.
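One way to turn the measured numbers into a projection is to compare cycles per pixel. Because the throughput formula above divides the image size by a single tile's cycle count, a sketch can express the 220 MP/s target as a per-tile cycle budget per pixel. The helper names below are hypothetical, and the 0.8 ns clock period is the same value used in the throughput formula.

```cpp
#include <cassert>

// Hypothetical helper: the most cycles one tile may spend per pixel to sustain
// target_mps, for a clock period given in ns (0.8 ns assumed for 1.25 GHz).
double cycles_per_pixel_budget(double target_mps, double clock_period_ns) {
  assert(target_mps > 0.0 && clock_period_ns > 0.0);
  // Time per pixel in ns is 1e3 / target_mps (1e9 ns/s over target_mps * 1e6 px/s).
  return 1e3 / (target_mps * clock_period_ns);
}

// Hypothetical helper: cycles per pixel measured for one tile of the prototype.
double measured_cycles_per_pixel(double total_cycles, double pixels) {
  assert(pixels > 0.0);
  return total_cycles / pixels;
}

// Factor by which cycles per pixel must shrink to meet the target.
double required_speedup(double measured, double budget) {
  assert(budget > 0.0);
  return measured / budget;
}
```

With the prototype's figures, `cycles_per_pixel_budget(220.0, 0.8)` gives about 5.7 cycles per pixel, `measured_cycles_per_pixel(2650436.0, 216.0 * 240.0)` gives about 51.1, and `required_speedup()` returns roughly 9.0, the gap the revised design must close.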

Figure: Profiling data for the 32-tile Hough Transform prototype on the \(216\times 240\) image