Load and Store - 2020.2 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

Document ID: UG1079
Release Date: 2021-02-04
Version: 2020.2 English

Load and Store From Vector Registers

The compiler supports standard pointer dereferencing and pointer arithmetic on vector types. Post-incrementing the pointer is the most efficient form for scheduling. No special intrinsic functions are needed to load vector registers.

v8int32 * ptr_coeff_buffer = (v8int32 *)ptr_kernel_coeff;
v8int32 kernel_vec0 = *ptr_coeff_buffer++; // 1st 8 values (0 .. 7)
v8int32 kernel_vec1 = *ptr_coeff_buffer;   // 2nd 8 values (8 .. 15)
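
The effect of the post-increment above can be modeled on a host compiler. The following is a minimal sketch in standard C++, not AI Engine code: SimV8Int32 is a hypothetical stand-in for v8int32 (eight 32-bit lanes), and load_two_vectors and make_ramp are illustrative helper names that do not appear in the guide. It only shows that incrementing a vector-typed pointer advances by one full vector, so consecutive dereferences pick up elements 0..7 and then 8..15.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for v8int32: eight 32-bit lanes.
struct SimV8Int32 {
    std::array<int32_t, 8> lane;
};

// Loads two consecutive vectors from a coefficient buffer stored as vectors.
std::pair<SimV8Int32, SimV8Int32> load_two_vectors(const SimV8Int32* ptr_coeff_buffer) {
    SimV8Int32 kernel_vec0 = *ptr_coeff_buffer++; // elements 0..7, then advance one vector
    SimV8Int32 kernel_vec1 = *ptr_coeff_buffer;   // elements 8..15
    return {kernel_vec0, kernel_vec1};
}

// Illustrative helper: builds a buffer holding the values 0..15 as two vectors.
std::array<SimV8Int32, 2> make_ramp() {
    std::array<SimV8Int32, 2> buf{};
    for (int i = 0; i < 16; ++i) {
        buf[i / 8].lane[i % 8] = i;
    }
    return buf;
}
```

Calling load_two_vectors(make_ramp().data()) on a named buffer returns the first and second groups of eight values, mirroring kernel_vec0 and kernel_vec1 in the example above.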

Load and Store From Memory

AI Engine APIs provide access methods for reading and writing data from data memory, streaming data ports, and cascade streaming ports, all of which can be used by AI Engine kernels. For additional details on the window and stream APIs, see Window and Streaming Data API in the AI Engine Documentation flow of the Vitis Unified Software Platform Documentation (UG1416). In the following example, window_readincr_v8(din) reads eight complex int16 samples from the input window into the data vector. Similarly, readincr_v8(cin) reads eight int16 samples from the cin stream into the coef vector. writeincr_v4(cas_out, v) writes the four-lane accumulator vector v to the cascade stream output.

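void func(input_window_cint16 *din,
			input_stream_int16 *cin,
			output_stream_cacc48 *cas_out){
	v8cint16 data=window_readincr_v8(din); // read 8 cint16 samples from the window
	v8int16 coef=readincr_v8(cin);         // read 8 int16 samples from the stream
	v4cacc48 v;                            // 4-lane complex accumulator
	…
	writeincr_v4(cas_out, v);              // write the accumulator to the cascade stream
}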

Load and Store Using Pointers

The kernel function prototype must use the window API types for its inputs and outputs. Inside the kernel body, however, data can be read and written through direct pointer references.

void func(input_window_int16 *w_input, 
			output_window_cint16 *w_output){
	.....
	v16int16 *ptr_in  = (v16int16 *)w_input->ptr;
	v8cint16 *ptr_out = (v8cint16 *)w_output->ptr;
	......
}

The window structure is responsible for managing buffer locks and tracking the buffer type (ping/pong), which can add to the cycle count. This is especially true when loads and stores are out of order (scatter-gather). Using pointers may help reduce the cycle count required for load and store.

Note: When using pointers to load and store data, it is the designer’s responsibility to avoid out-of-bounds memory accesses.
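
One way to keep pointer-based access in bounds is to check every index against the buffer size. The sketch below is standard C++ for a host compiler, not AI Engine code. The names gather_checked, buffer, buffer_len, and indices are hypothetical, and the buffer length is assumed to be known to the kernel author. It models an out-of-order (gather) read through a raw pointer and rejects any index that falls outside the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gathers buffer[indices[i]] into out, in the order given by indices.
// Returns false, leaving out empty, if any index is outside [0, buffer_len).
bool gather_checked(const int16_t* buffer, std::size_t buffer_len,
                    const std::vector<std::size_t>& indices,
                    std::vector<int16_t>& out) {
    out.clear();
    for (std::size_t idx : indices) {
        if (idx >= buffer_len) {  // this read would go past the end of the buffer
            out.clear();
            return false;
        }
        out.push_back(buffer[idx]);
    }
    return true;
}
```

A per-access check like this costs cycles, so in a real kernel it is more likely to serve as a debug aid or to be replaced by loop bounds fixed at design time. That trade-off is an assumption of this sketch, not a recommendation from the guide.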

Load and Store Using Streams

Vector data can also be read from or written to streams, as shown in the following example.

void func(input_stream_int32 *s0, input_stream_int32 *s1, …){
	for(…){
		int32 data0=readincr(s0); // next int32 sample from stream s0
		int32 data1=readincr(s1); // next int32 sample from stream s1
		…
	}
}
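
The access pattern above can be modeled on a host as two FIFOs that are drained one element at a time. This is a minimal sketch in standard C++, not the AI Engine stream API: SimStream and sim_readincr are hypothetical stand-ins for input_stream_int32 and readincr, and sum_streams is an illustrative name. The source elides the loop body, so the pairwise sum used here is an arbitrary placeholder operation.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical host-side model of an input stream: a FIFO of int32 samples.
using SimStream = std::queue<int32_t>;

// Models a stream read: returns the next sample and advances the stream.
int32_t sim_readincr(SimStream& s) {
    assert(!s.empty());  // the model requires data to be present
    int32_t value = s.front();
    s.pop();
    return value;
}

// Reads count samples from each stream and combines them pairwise.
// The sum is a placeholder for whatever the real kernel computes.
std::vector<int32_t> sum_streams(SimStream& s0, SimStream& s1, int count) {
    std::vector<int32_t> result;
    for (int i = 0; i < count; ++i) {
        int32_t data0 = sim_readincr(s0);
        int32_t data1 = sim_readincr(s1);
        result.push_back(data0 + data1);
    }
    return result;
}
```

Each call to sim_readincr consumes exactly one sample, so after count iterations both modeled streams have advanced by count elements.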

For more information about window and streaming data API usage, see Window and Streaming Data API in the AI Engine Documentation flow of the Vitis Unified Software Platform Documentation (UG1416).