Loop Flattening and Unrolling - 2024.2 English

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2024-11-28
Version: 2024.2 English

Loops can be flattened completely with the chess_flatten_loop pragma. This can be useful for small loops that the AI Engine compiler does not optimize well automatically.

For loop flattening, the compiler determines the loop count automatically where it can. In cases where the tool cannot determine the loop count, you can set it using the chess_loop_count pragma. For example:
for(int i=0;i<6;i++) chess_flatten_loop {...}
for(...) chess_loop_count(6) chess_flatten_loop {...}
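
The two forms above can be placed in a compilable context roughly as follows. This is a minimal sketch, not code from the guide: the function names sum_fixed and sum_counted are hypothetical, and the #ifndef __chess__ fallback that turns the annotations into empty macros is an assumption added only so the sketch also builds with an ordinary host C++ compiler.

```cpp
#include <cassert>

// Assumption: when not building with the Chess-based AI Engine compiler,
// define the annotations as empty macros so a host compiler accepts the code.
#ifndef __chess__
#define chess_flatten_loop
#define chess_loop_count(n)
#endif

// The trip count (6) is a compile-time constant, so the compiler can
// determine it without further hints.
int sum_fixed(const int* data) {
    assert(data != nullptr);
    int acc = 0;
    for (int i = 0; i < 6; i++) chess_flatten_loop {
        acc += data[i];
    }
    return acc;
}

// The trip count depends on a run-time argument, so chess_loop_count(6)
// states it explicitly. Hypothetical contract for this sketch: n must be 6.
int sum_counted(const int* data, int n) {
    assert(data != nullptr);
    assert(n == 6);
    int acc = 0;
    for (int i = 0; i < n; i++) chess_loop_count(6) chess_flatten_loop {
        acc += data[i];
    }
    return acc;
}
```

Both functions compute the same sum. The second shows the case where the bound is not visible to the compiler, so the count is supplied by the programmer and must match the actual number of iterations.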

With chess_unroll_loop(N), the loop body is duplicated N-1 times, so that it appears N times in total, and the loop count is divided by N. The loop can also be unrolled completely with chess_unroll_loop(*). In either case, the loop is rewritten as a repeated sequence of similar independent statements.
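
To make the effect of unrolling by a factor N concrete, the following hand-written illustration shows the transformation for N = 2 in plain C++. It is a conceptual sketch only, not the compiler's actual output: the function names scale_rolled and scale_unrolled2 are hypothetical, and the sketch assumes the loop count is a multiple of 2.

```cpp
#include <cassert>

// Original form: 'count' iterations, one copy of the body per iteration.
void scale_rolled(int* data, int count, int factor) {
    for (int i = 0; i < count; i++) {
        data[i] *= factor;
    }
}

// Hand-unrolled by N = 2: the body is duplicated N-1 = 1 time (two copies
// in total) and the loop runs count / 2 iterations.
// Assumption for this sketch: count is a multiple of 2.
void scale_unrolled2(int* data, int count, int factor) {
    assert(count % 2 == 0);
    for (int i = 0; i < count; i += 2) {
        data[i]     *= factor;  // original body
        data[i + 1] *= factor;  // duplicated body for the next element
    }
}
```

The two copies of the body in the unrolled version are independent statements, which is what gives the scheduler more freedom to arrange them.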

Loop flattening is done in the final scheduling phase, so code generation is still based on the loop construct. Loop unrolling, by contrast, duplicates the code of the iterations, and each duplicated copy can be compiled differently. This can allow for better software pipelining of loops, but it can also burden scheduling when the unrolled loop count is large.