The figure below summarizes the key aspects of the design of the max_pooling2d_w4() layer. Its design closely follows that of the second layer, with slightly smaller image dimensions but more I/O channels to process. The code structure is similar, and the compiler achieves an efficient software pipeline schedule. This layer requires no memory tiles for sample reordering. The Jupyter Notebook used for validation is gen_vectors.ipynb.
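As a rough illustration of the operation this layer computes, the following NumPy sketch implements a reference max pooling over a channels-last image. It is not taken from gen_vectors.ipynb. The function name max_pool2d_ref, the 2x2 window with matching stride, and the (H, W, C) layout are all assumptions made for illustration; the actual window size and data layout should be taken from the figure and the kernel source.

```python
import numpy as np

def max_pool2d_ref(x, pool=2):
    """Hypothetical reference: max pooling over an (H, W, C) array.

    Assumes a pool x pool window with stride equal to pool (not confirmed
    by the tutorial text).
    """
    h, w, c = x.shape
    # Drop trailing rows/columns that do not fill a complete window.
    h_out, w_out = h // pool, w // pool
    x = x[:h_out * pool, :w_out * pool, :]
    # Split each spatial axis into (output index, offset within window).
    windows = x.reshape(h_out, pool, w_out, pool, c)
    # Take the maximum over the two within-window axes.
    return windows.max(axis=(1, 3))

# Small example: one channel, 4x4 image with values 0..15.
example = np.arange(16).reshape(4, 4, 1)
pooled = max_pool2d_ref(example)
print(pooled[:, :, 0])
```

A golden model of this kind can be compared element by element against the output samples of the AI Engine kernel, which is the usual role of a validation notebook.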