Similarly, something is not right with C: this top-level argument is used in 4 communication channels between processes. This is the observed behavior:
Loop2 visits every location of C and clears it
Loop3 performs an accumulation on each individual location
In short, if we were to merge the 2 loops, each location would be cleared and then receive a single addition, so no real accumulation takes place
Consequences:
Loop2 is not necessary
Loop3 can assign the product directly: C[i][j] = B[i][j] * E[i][j];
We can simplify the code to solve these issues.
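As a rough illustration of this simplification, the sketch below contrasts the clear-then-accumulate pattern with a single direct assignment. The function names, the `int` element type, and the size `N` are assumptions made for this example only; they are not taken from the actual design, and HLS pragmas are left out.

```c
#include <assert.h>

#define N 4 /* hypothetical matrix size, chosen only for this sketch */

/* Pattern described above: one loop clears C, a second one accumulates into it. */
void compute_original(int C[N][N], int B[N][N], int E[N][N]) {
    /* Loop2: clear every location of C */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0;

    /* Loop3: "accumulate" into each location, which only ever adds one term */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] += B[i][j] * E[i][j];
}

/* Simplified version: Loop2 is dropped and Loop3 assigns the product directly. */
void compute_simplified(int C[N][N], int B[N][N], int E[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = B[i][j] * E[i][j];
}

/* Runs both versions on the same inputs and asserts that the results agree. */
void check_equivalence(void) {
    int B[N][N], E[N][N], C_orig[N][N], C_simp[N][N];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            B[i][j] = i + j;
            E[i][j] = i - j;
            C_orig[i][j] = -1; /* stale values that both versions must overwrite */
            C_simp[i][j] = -1;
        }
    }

    compute_original(C_orig, B, E);
    compute_simplified(C_simp, B, E);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            assert(C_orig[i][j] == C_simp[i][j]);
}
```

In the simplified version C is written exactly once per location and never read back, which is the property the consequences above rely on.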
After updating the code, we can check the new version.