Arrays can introduce issues during C/C++ simulation, even before the synthesis step is performed. If you specify a very large array, it might cause the C/C++ simulation to run out of memory and fail, as shown in the following example:
#include "ap_int.h"
int i, acc = 0;
// Use an arbitrary precision type
ap_int<32> la0[10000000], la1[10000000];
for (i = 0; i < 10000000; i++) {
    acc = acc + la0[i] + la1[i];
}
The simulation might run out of memory because the array is placed on the stack, which has a limited size, rather than on the heap, which is managed by the OS and can use local disk space to grow. Several factors make this failure more likely:
- PCs often have less memory available than large Linux machines.
- Arbitrary precision types, as used in the example above, can make the problem worse because they require more memory to model than standard C/C++ types.
- The more complex fixed-point arbitrary precision types available in C++ require even more memory, which makes it even more likely that the design runs out of memory.
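The size of the problem can be estimated with simple arithmetic. The following is a minimal sketch, not tool code: it uses `std::int32_t` as a stand-in for `ap_int<32>` (an assumption, since the simulation model of an arbitrary precision type may need more than 4 bytes per element), and the helper names `required_stack_bytes` and `fits_on_stack` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for ap_int<32>; assumes 4 bytes per element (a lower bound).
using elem_t = std::int32_t;

const std::size_t kElems  = 10000000;  // array length from the example
const std::size_t kArrays = 2;         // la0 and la1

// Lower bound on the stack bytes needed by the two local arrays.
inline std::size_t required_stack_bytes() {
    return kArrays * kElems * sizeof(elem_t);
}

// True if the arrays alone fit in a stack of the given size.
// Other stack frames are ignored, so this is optimistic.
inline bool fits_on_stack(std::size_t stack_bytes) {
    assert(stack_bytes > 0 && "stack size must be positive");
    return required_stack_bytes() <= stack_bytes;
}
```

Under these assumptions the two arrays need at least 80,000,000 bytes, which is far more than a stack of 8 MiB (a size often used as the default on Linux), so `fits_on_stack(8u * 1024 * 1024)` returns false.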
The standard way to make more memory available in C/C++ code development is to increase the size of the stack using linker options, such as the following option, which explicitly sets the stack size to 10485760 bytes (10 MiB):
syn.csimflags -z stack-size=10485760
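Before changing the stack size, it can help to check which stack limit the simulation process actually runs under. The following is a hedged diagnostic sketch for POSIX hosts only, using the standard `getrlimit` call with `RLIMIT_STACK`. It is not part of the tool flow, the names `StackCheck` and `check_stack_limit` are hypothetical, and whether a linker option changes the value reported here depends on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>  // POSIX only: getrlimit, RLIMIT_STACK

enum class StackCheck { Fits, TooSmall, Unknown };

// Compares a required byte count against the current soft stack limit.
// Returns Unknown if the limit cannot be read, and Fits if the limit is
// unlimited (physical memory may still be the real constraint).
inline StackCheck check_stack_limit(std::size_t needed_bytes) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return StackCheck::Unknown;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return StackCheck::Fits;
    }
    return static_cast<rlim_t>(needed_bytes) <= rl.rlim_cur
               ? StackCheck::Fits
               : StackCheck::TooSmall;
}
```

For example, `check_stack_limit(80000000)` reports whether two arrays of 10,000,000 four-byte elements could fit under the current limit, ignoring all other stack usage.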
However, if the machine does not have enough available memory, increasing the stack size will not help. In this case, a solution is to use dynamic memory allocation for simulation but a fixed-size array for synthesis, as shown in the next example. The memory required for the arrays is then allocated on the heap, which is managed by the OS and can use local disk space to grow.
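#include <stdlib.h>
#include "ap_int.h"
int i, acc = 0;
#ifdef __SYNTHESIS__
// Use an arbitrary precision type & array for synthesis
ap_int<32> la0[10000000], la1[10000000];
#else
// Use an arbitrary precision type & dynamic memory for simulation
// (C++ requires an explicit cast of the void* returned by malloc)
ap_int<32> *la0 = (ap_int<32> *)malloc(10000000 * sizeof(ap_int<32>));
ap_int<32> *la1 = (ap_int<32> *)malloc(10000000 * sizeof(ap_int<32>));
#endif
for (i = 0; i < 10000000; i++) {
    acc = acc + la0[i] + la1[i];
}
#ifndef __SYNTHESIS__
// Release the simulation-only heap buffers
free(la0);
free(la1);
#endif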
However, this is not an ideal solution because the simulated code and the synthesized code are not the same. But this might be the only way to complete simulation. If you take this approach, be sure that the C/C++ test bench covers all aspects of accessing the array. The RTL simulation performed by cosim_design will verify that the memory accesses are correct in the synthesized code.
Only use the __SYNTHESIS__ macro on the code to be synthesized. Do not use this macro in the test bench, because it has no significance in the C/C++ simulation or C/C++ RTL co-simulation. Refer to Vitis-HLS-Introductory-Examples/Pipelining/Functions/hier_func for the full version of this example.
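The simulation and synthesis split described above can also be sketched as a complete function that a test bench can check. This is an illustrative variant, not the version from the referenced example: it swaps `malloc` for `std::vector` so the simulation memory is released automatically, uses `std::int32_t` as a stand-in for `ap_int<32>`, and the function name `sum_large_arrays` and its fill arguments are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using elem_t = std::int32_t;          // stand-in for ap_int<32> (assumption)
const std::size_t kLen = 10000000;    // array length from the example

// Hypothetical function to be synthesized. Only this function uses the
// __SYNTHESIS__ macro; the test bench does not.
long long sum_large_arrays(elem_t fill0, elem_t fill1) {
#ifdef __SYNTHESIS__
    // Fixed-size arrays for synthesis
    elem_t la0[kLen], la1[kLen];
#else
    // Heap-backed storage for C/C++ simulation
    std::vector<elem_t> la0(kLen), la1(kLen);
#endif
    // Write every element so that the test bench exercises all accesses
    for (std::size_t i = 0; i < kLen; i++) {
        la0[i] = fill0;
        la1[i] = fill1;
    }
    long long acc = 0;
    for (std::size_t i = 0; i < kLen; i++) {
        acc = acc + la0[i] + la1[i];
    }
    return acc;
}
```

A test bench can compare the result against the closed form `kLen * (fill0 + fill1)`. For instance, `sum_large_arrays(1, 2)` should equal 30000000.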