In this example, 8 K divided by 4 K is two, so there are two rows. The data space does not matter for this calculation, so the matrix would be two rows and 1 column resulting in three pipeline registers again.
Note: The whole matrix is reproduced to get the extra 8 bits of data
space needed to create the RAM, but that does not matter to the calculation of pipeline
registers.