Hardware Implementation - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)

To understand what kind of hardware implementation is needed given the performance constraints, you can examine the convolution kernel in some detail:

  • The core computation is done in a four-level nested loop, but you can break it down into the computation required per output pixel.

  • The filter source code shows that one output pixel is produced each time the two inner loops run to completion.

  • These two loops compute a sum of products between the coefficient matrix and an image sub-matrix of the same size. The coefficient matrix sets that size, which is 15x15.

  • The two inner loops therefore perform a dot product of size 225 (15x15). In other words, they execute 225 multiply-accumulate (MAC) operations for every output pixel produced.
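
The loop structure described above can be sketched in plain C++ as follows. This is a minimal illustration, not the tutorial's filter source: the names `convolve2d`, `ConvResult`, and `kFilterSize`, the data types, and the zero-padding at the image borders are all assumptions made for the example. The `mac_count` field exists only to make the 225 MACs per output pixel visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Coefficient matrix is 15x15, as stated in the text.
constexpr int kFilterSize = 15;

// Hypothetical result type: the output image plus a MAC counter for illustration.
struct ConvResult {
    std::vector<std::int32_t> pixels;  // output image, row-major
    std::uint64_t mac_count = 0;       // multiply-accumulate operations executed
};

// Hypothetical reference convolution. Pixels outside the image are treated
// as zero (an assumption; the real filter may handle borders differently).
ConvResult convolve2d(const std::vector<std::uint8_t>& src, int width, int height,
                      const std::vector<std::int16_t>& coeffs) {
    ConvResult out;
    out.pixels.assign(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);
    const int half = kFilterSize / 2;

    for (int y = 0; y < height; ++y) {          // outer loop 1: output rows
        for (int x = 0; x < width; ++x) {       // outer loop 2: output columns
            std::int32_t sum = 0;
            // The two inner loops: one 15x15 dot product per output pixel.
            for (int row = 0; row < kFilterSize; ++row) {
                for (int col = 0; col < kFilterSize; ++col) {
                    const int yy = y + row - half;
                    const int xx = x + col - half;
                    const bool inside = yy >= 0 && yy < height && xx >= 0 && xx < width;
                    const std::int32_t pixel =
                        inside ? src[static_cast<std::size_t>(yy * width + xx)] : 0;
                    sum += pixel * coeffs[static_cast<std::size_t>(row * kFilterSize + col)];
                    ++out.mac_count;            // one MAC per inner iteration
                }
            }
            out.pixels[static_cast<std::size_t>(y * width + x)] = sum;
        }
    }
    return out;
}
```

For a `width` x `height` image, the counter ends at `width * height * 225`, which is the per-pixel cost from the last bullet multiplied by the number of output pixels.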