Next, use a spreadsheet analysis to assess compute requirements. Load the system input parameters on the left side of the spreadsheet shown below and analyze compute parameters on the right side. It is useful to tabulate the numbers of processor cycles required by each loop body in the original Matlab model of the Hough Transform. Based on the AI Engine compute capacity of 32 MACs/cycle for int16
data types, you can process two MACs/pixel per $\theta$ value in real time. Based on these vector estimates, the spreadsheet indicates to process 5.7 cycles per pixel to meet the 220 Mpps throughput objective. This is equivalent to 45 cycles for the vector processor with its eight lanes SIMD execution. The compute bound for the vector processor is high at 5000 Mpps. However, assuming an 8-cycle “read-modify-write” instruction to update the histogram tables in the third compute workload, the throughput is limited by the scalar processor to 39 Mpps if using 32 tiles. When projected to more tiles, reaching the 220 Mpps target with even 128 tiles is not possible.