Now having an accurate cost model of the Hough Transform from prototyping, you are in a position to explore alternate solutions via scaling. Here, consider “Image Partitioning” in addition to “Theta Partitioning” as a means to scale up the throughput. In this scheme, partition out portions of the images to 32-tile clusters, where each cluster is computing partial histograms for four \(\theta\) values (as in the prototype). Using this approach, you can scale throughput linearly higher. As shown in the following diagram, you can achieve ~ 220 Mpps with ~275 tiles.