Each AI Engine is surrounded by four 32 kB memories, each divided into four pairs of banks. The available bandwidth is high:
2 reads / cycle of 32 bytes (256 bits) each
Each bank has a single port, so the two reads must target different banks to achieve 2x 256 bits/cycle.
1 write / cycle of 32 bytes (256 bits)
The write must target a bank other than the ones being read to achieve the highest bandwidth.
Be aware that you also need to feed the memories using DMAs or other AI Engines.
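
The figures above can be turned into a quick back-of-the-envelope check. The sketch below is plain host-side C++, not AI Engine kernel code, and it is only a simplified model: the names `peak_bytes_per_cycle` and `cycles_needed` are hypothetical, and the rule "accesses that hit the same single-port bank are served one per cycle" is an assumption made for illustration, not a description of the actual hardware arbitration.

```cpp
#include <cassert>
#include <cstddef>

// Figures taken from the text: 2 reads and 1 write per cycle, 32 bytes each.
constexpr std::size_t kBytesPerAccess = 32;  // 256 bits
constexpr std::size_t kReadsPerCycle  = 2;
constexpr std::size_t kWritesPerCycle = 1;

// Peak bytes moved per cycle when no two accesses share a bank.
constexpr std::size_t peak_bytes_per_cycle() {
    return (kReadsPerCycle + kWritesPerCycle) * kBytesPerAccess;
}

// Simplified model (assumption): a single-port bank serves one access per
// cycle, so accesses that target the same bank are serialized.
// Arguments are hypothetical bank indices for the two reads and the write.
// Returns the number of cycles needed to complete all three accesses.
constexpr int cycles_needed(int read_a_bank, int read_b_bank, int write_bank) {
    if (read_a_bank == read_b_bank && read_b_bank == write_bank) {
        return 3;  // all three accesses collide on one bank
    }
    if (read_a_bank == read_b_bank || read_a_bank == write_bank ||
        read_b_bank == write_bank) {
        return 2;  // exactly two accesses collide
    }
    return 1;      // three distinct banks: full bandwidth
}

// Small self-check of the model.
inline void run_examples() {
    assert(peak_bytes_per_cycle() == 96);  // 2 x 32 B + 1 x 32 B
    assert(cycles_needed(0, 1, 2) == 1);   // distinct banks
    assert(cycles_needed(0, 0, 2) == 2);   // both reads on the same bank
    assert(cycles_needed(3, 3, 3) == 3);   // everything on one bank
}
```

Under this model, placing the two input buffers and the output buffer on three different banks is what allows the full 96 bytes per cycle; any shared bank reduces the effective bandwidth.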