Description
In FPGA designs, non-burst accesses to DDR memory are expensive and can degrade the overall performance of the design. It is therefore important to reduce the time needed to fetch the required data. The preferred solutions are to rewrite the code so that bursts can be used, or to use manual bursting. If neither works, another option is to use a cache.
A cache provides temporary storage in the M_AXI adapter so that the design can retrieve data more quickly. The effectiveness of a cache is measured by its hit/miss ratio. Caching relies on a property of programs called locality of reference, which is the tendency of a program to access the same set of memory locations repeatedly over a short period of time. When a memory location is accessed, it is likely to be accessed again soon, and nearby locations are likely to be accessed as well. For instance, when a program executes consecutive instructions, the next instructions to be fetched are most likely in a contiguous block of memory nearby.
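The hit/miss behavior described above can be illustrated with a small software model. This is a minimal sketch, not the actual M_AXI adapter. It assumes a single-way (direct-mapped) organization with `lines` slots of `depth` words each, mirroring the parameters of the CACHE directive, and it counts only hits and misses without modeling timing. The class name `LineCacheModel` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Marks a slot that has never been filled.
constexpr std::size_t kEmptySlot = std::numeric_limits<std::size_t>::max();

// Hypothetical software model of a single-way (direct-mapped) read cache,
// sized like the CACHE pragma: 'lines' slots of 'depth' words each.
// Addresses are word indices into the cached array.
class LineCacheModel {
public:
    LineCacheModel(std::size_t lines, std::size_t depth)
        : depth_(depth), slots_(lines, kEmptySlot) {
        assert(lines > 0 && depth > 0);
    }

    // Returns true on a hit. On a miss the model records a line fill,
    // standing in for one line-sized read request to memory.
    bool read(std::size_t word_index) {
        const std::size_t line = word_index / depth_;   // memory line number
        const std::size_t slot = line % slots_.size();  // the only slot it may use
        if (slots_[slot] == line) {
            ++hits_;
            return true;
        }
        slots_[slot] = line;  // evict whatever line was held there
        ++misses_;
        return false;
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

    double hit_ratio() const {
        const std::size_t total = hits_ + misses_;
        return total == 0 ? 0.0
                          : static_cast<double>(hits_) / static_cast<double>(total);
    }

private:
    std::size_t depth_;
    std::vector<std::size_t> slots_;  // line number currently held by each slot
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

For example, reading 1,024 consecutive words through `LineCacheModel(8, 16)` gives 64 misses (one per 16-word line) and 960 hits, a hit ratio of 0.9375. Reading every 16th word instead touches a new line on every access and never hits, which is why a cache helps only when the access pattern has locality.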
syn.interface.m_axi_cache_impl
Syntax
syn.directive.cache=<location> port=<name> lines=<value> depth=<value>
Where:
- <location>: Specifies the function that contains the specified port. This is the top-level function.
- port=<name>: Specifies the port to add a cache to. This argument is required.
- lines=<value>: Indicates the number of cache lines. Specify 1 for a single cache line, or a power of 2 greater than 1 for multiple cache lines. This argument is optional and defaults to 1.
- depth=<value>: Specifies the size of each cache line, in words of the pointer's data type. The depth must be a power of 2.
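Because `depth` is counted in words of the pointer's data type, the size of a line in bytes depends on that type. The helpers below are a minimal sketch of this arithmetic; their names are hypothetical and not part of any HLS API, and they count only data storage, not tags or control logic.

```cpp
#include <cstddef>

// 1 (2^0) and every larger power of 2 satisfy this; 0 does not.
constexpr bool is_power_of_two(std::size_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Bytes in one cache line: 'depth' words of the pointer's element type T.
template <typename T>
constexpr std::size_t line_bytes(std::size_t depth) {
    return depth * sizeof(T);
}

// Data storage of the whole cache: 'lines' lines of 'depth' words each.
template <typename T>
constexpr std::size_t cache_bytes(std::size_t lines, std::size_t depth) {
    return lines * line_bytes<T>(depth);
}

// lines must be 1 or a power of 2 greater than 1; depth must be a power of 2.
constexpr bool valid_cache_args(std::size_t lines, std::size_t depth) {
    return is_power_of_two(lines) && is_power_of_two(depth);
}

// 8 ints and 32 chars give the same 32-byte line, assuming a 4-byte int.
static_assert(sizeof(int) != 4 || line_bytes<int>(8) == line_bytes<char>(32),
              "int depth 8 and char depth 32 should match in bytes");
```

With lines=8 and depth=128 on a `double` pointer, `line_bytes<double>(128)` is 1,024 bytes and `cache_bytes<double>(8, 128)` is 8,192 bytes, assuming an 8-byte `double`.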
Limitations
The CACHE directive or pragma has the following limitations:
- A cache is supported only on read-only ports.
- The cache is implemented as a single-port, single-way (direct-mapped) cache.
- A cache is associated with each read channel of an m_axi port. If there are no channel specifications for a bundle, then all read ports mapped to the bundle must:
  - Either have no cache, or all have a cache with the same line size in bytes and the same number of lines.
  - If the word sizes differ, the depths must also differ so that the line size in bytes is the same.
  - In the code below, the depth (that is, the number of words) of a line must be 4 times larger for the char* than for the int*, because one int contains 4 chars:
    void top(int *int_arr, char *char_arr, …) {
      #pragma HLS interface m_axi port=int_arr bundle=gmem …
      #pragma HLS interface m_axi port=char_arr bundle=gmem …
      #pragma HLS cache port=int_arr lines=8 depth=8
      #pragma HLS cache port=char_arr lines=8 depth=32
  - This code, on the other hand, is illegal because the two caches have different line sizes (8 × 4 = 32 bytes for int_arr versus 16 × 1 = 16 bytes for char_arr):
    void top(int *int_arr, char *char_arr, …) {
      #pragma HLS interface m_axi port=int_arr bundle=gmem …
      #pragma HLS interface m_axi port=char_arr bundle=gmem …
      #pragma HLS cache port=int_arr lines=8 depth=8
      #pragma HLS cache port=char_arr lines=8 depth=16
  - This code is illegal because the number of lines differs (8 versus 4):
    void top(int *int_arr, char *char_arr, …) {
      #pragma HLS interface m_axi port=int_arr bundle=gmem …
      #pragma HLS interface m_axi port=char_arr bundle=gmem …
      #pragma HLS cache port=int_arr lines=8 depth=8
      #pragma HLS cache port=char_arr lines=4 depth=32
  - Finally, this code is illegal because only one of the two ports has a cache:
    void top(int *int_arr, char *char_arr, …) {
      #pragma HLS interface m_axi port=int_arr bundle=gmem …
      #pragma HLS interface m_axi port=char_arr bundle=gmem …
      #pragma HLS cache port=int_arr lines=8 depth=8
- If channels are specified for the bundle, the cache size for each channel can differ, and some channels can have a cache while others do not. Within a channel, the same rules as above apply. For example, the following code is legal even though the two caches have different line sizes (32 × 4 = 128 bytes for int_arr versus 32 × 1 = 32 bytes for char_arr), because each port uses its own channel. However, it consumes more resources than the first example because it instantiates two caches:
    void top(int *int_arr, char *char_arr, …) {
      #pragma HLS interface m_axi port=int_arr bundle=gmem channel=0 …
      #pragma HLS interface m_axi port=char_arr bundle=gmem channel=1 …
      #pragma HLS cache port=int_arr lines=8 depth=32
      #pragma HLS cache port=char_arr lines=8 depth=32
- The cache generates line-sized read requests. This means that:
  - In cosimulation, the array size (in words of the given data type) must be an integer multiple of the line size (also in words of the given data type).
  - In deployment, the host code must allocate the array in DRAM aligned to the line size, and the total size of the array must be a multiple of the line size.
  - Otherwise, the following error message is generated during cosimulation:
ERROR: Index ... out of bound 0 to ...
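Several of the limitations above are plain arithmetic, so they can be checked before synthesis or cosimulation. The sketch below is hypothetical: `PortCache`, `same_bundle_compatible`, `array_size_ok`, `is_aligned`, and `alloc_line_aligned` are illustrative names, not tool APIs. The allocator uses C++17 `std::aligned_alloc` as a stand-in for whatever allocation the host code actually performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical description of one read port's CACHE settings
// (cached == false means the port has no CACHE pragma).
struct PortCache {
    bool cached;
    std::size_t elem_bytes;  // sizeof the pointer's element type
    std::size_t lines;
    std::size_t depth;       // words per line
};

// Rule for two read ports in the same bundle without channel specifications:
// both uncached, or both cached with equal line size in bytes and equal line count.
bool same_bundle_compatible(const PortCache& a, const PortCache& b) {
    if (a.cached != b.cached) return false;
    if (!a.cached) return true;
    return a.elem_bytes * a.depth == b.elem_bytes * b.depth && a.lines == b.lines;
}

// Cosimulation rule: the array size in words must be a whole number of lines.
bool array_size_ok(std::size_t elements, std::size_t depth) {
    return depth != 0 && elements % depth == 0;
}

// True if p is aligned to 'alignment' bytes.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Deployment rule: round the element count up to whole lines and align the
// buffer to the line size in bytes (which must be a power of 2). Writes the
// padded element count to *padded_elements; release the buffer with std::free.
void* alloc_line_aligned(std::size_t elements, std::size_t elem_bytes,
                         std::size_t depth, std::size_t* padded_elements) {
    assert(elem_bytes != 0 && depth != 0 && padded_elements != nullptr);
    const std::size_t line_bytes = elem_bytes * depth;
    const std::size_t padded = (elements + depth - 1) / depth * depth;
    *padded_elements = padded;
    // std::aligned_alloc requires the size to be a multiple of the alignment,
    // which holds because 'padded' is a whole number of lines.
    return std::aligned_alloc(line_bytes, padded * elem_bytes);
}
```

Encoding the four same-bundle examples as `PortCache` values (with `elem_bytes` 4 for int and 1 for char) reproduces their verdicts: only the int depth 8 / char depth 32 pair is compatible.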
Example
The following example shows a design in which overlapping accesses (both in[i] and in[i + 1] are read in every iteration) cause burst inference to fail. Adding the CACHE pragma or directive improves the performance of the design.
extern "C" {
void dut(
    const double *in, // Read-only input vector (holds size + 1 elements)
    double *out,      // Output result
    int size          // Number of output elements
) {
#pragma HLS INTERFACE m_axi port=in bundle=aximm depth = 1026
#pragma HLS INTERFACE m_axi port=out bundle=aximm depth = 1024
#pragma HLS cache port=in lines=8 depth=128
    for (int i = 0; i < size; i++) {
        out[i] = in[i] + in[i + 1];
    }
}
} // extern "C"
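The description mentions rewriting the code as an alternative to a cache. One hypothetical rewrite of the example reads each element of `in` exactly once, in order, and carries the previous value in a local variable, so the loop issues a single sequential read stream instead of two overlapping ones. This is a sketch only: `dut_single_read` is an illustrative name, and whether the tool infers a burst for it is not guaranteed here.

```cpp
#include <cassert>

// Hypothetical rewrite of dut: each element of 'in' is read exactly once,
// and the previous value is kept in 'prev' instead of being re-read.
// As in the original, 'in' must hold size + 1 elements.
void dut_single_read(const double *in, double *out, int size) {
    assert(size >= 0);
    double prev = in[0];
    for (int i = 0; i < size; i++) {
        const double next = in[i + 1];  // the only read from 'in' in the loop
        out[i] = prev + next;           // same result as in[i] + in[i + 1]
        prev = next;
    }
}
```

In an HLS project, the INTERFACE pragmas from the example would still be needed; they are omitted here so that the sketch compiles as plain C++.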