In this example, 8K divided by 4K is 2, so there are 2 rows. The data space does not affect this calculation, so the matrix is 2 rows by 1 column, which again results in three pipeline registers.
Note: The whole matrix is replicated to provide the extra 8 bits of data space needed to create the RAM, but this replication does not affect the calculation of pipeline registers.
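The row arithmetic above can be sketched as a short calculation. This is only an illustration: the function name `matrix_shape` and its parameters are hypothetical, and it assumes the depth is rounded up to a whole number of 4K-deep primitives. It deliberately stops at the matrix shape, because the rule that maps the shape to a pipeline register count is not restated in this passage.

```python
def matrix_shape(ram_depth: int, primitive_depth: int) -> tuple[int, int]:
    """Return (rows, columns) of the matrix used for the pipeline calculation.

    Hypothetical helper: rows come from the address depth only. The column
    count is fixed at 1 because the data space does not enter this
    calculation (extra data bits are covered by replicating the whole matrix).
    """
    if ram_depth <= 0 or primitive_depth <= 0:
        raise ValueError("depths must be positive")
    rows = -(-ram_depth // primitive_depth)  # ceiling division
    return rows, 1


K = 1024
rows, cols = matrix_shape(8 * K, 4 * K)
print(f"{rows} rows x {cols} column")  # 2 rows x 1 column
```

Rounding up is an assumption for depths that are not an exact multiple of the primitive depth; in the 8K example the division is exact.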