The figure below summarizes the key aspects of the design of the max_pooling2d_w2()
layer. The Jupyter Notebook used for validation is gen_vectors.ipynb.
Max pooling decimates the input images by a factor of 2 in each dimension by applying a max() operation across all four pixels in a 2x2 patch. Successive patches are strided by 2, so they are non-overlapping. This compute workload may be vectorized efficiently using the aie::max() function of the AIE API.
The layer is coded as an outer loop over the output image rows and an inner loop over the image columns. Vectorization creates 16 output channels for four pixels each.
Software pipelining of the inner loop is perfect, with an initiation interval (II) of 16 cycles.
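
The pooling operation described above can be illustrated with a scalar reference model. This is only a minimal sketch in portable C++, not the AIE kernel itself: the function name max_pool_2x2_ref, the float data type, and the row-major [row][column][channel] layout are assumptions made for illustration, and the sketch uses std::max() where the kernel uses aie::max() on vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scalar reference for 2x2 max pooling with stride 2 (hypothetical helper,
// not part of the original design).
// Assumed layout: row-major [row][col][channel], stored as float.
std::vector<float> max_pool_2x2_ref(const std::vector<float>& in,
                                    std::size_t rows, std::size_t cols,
                                    std::size_t chans) {
    const std::size_t out_rows = rows / 2;
    const std::size_t out_cols = cols / 2;
    std::vector<float> out(out_rows * out_cols * chans);

    // Index helper for the assumed [row][col][channel] layout.
    auto at = [&](std::size_t r, std::size_t c, std::size_t k) {
        return in[(r * cols + c) * chans + k];
    };

    // Outer loop over output rows, inner loop over output columns,
    // mirroring the loop structure described in the text.
    for (std::size_t r = 0; r < out_rows; ++r) {
        for (std::size_t c = 0; c < out_cols; ++c) {
            for (std::size_t k = 0; k < chans; ++k) {
                // Reduce the non-overlapping 2x2 patch with three max() calls.
                const float top = std::max(at(2 * r, 2 * c, k),
                                           at(2 * r, 2 * c + 1, k));
                const float bot = std::max(at(2 * r + 1, 2 * c, k),
                                           at(2 * r + 1, 2 * c + 1, k));
                out[(r * out_cols + c) * chans + k] = std::max(top, bot);
            }
        }
    }
    return out;
}
```

A model like this can serve as a golden reference when comparing kernel outputs against test vectors; the innermost channel loop is the part that a vectorized kernel would replace with a single vector max per pair of operands.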