Running an Application with Tunables - 5.2 English - 57404

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

Depending on the tunable settings, two states are possible:

  • Default State: None of the parameters is tuned.

  • Tuned State: One of the parameters is tuned with a valid option.

Default State

In this state, none of the parameters is tuned; the library picks the best implementation based on the underlying AMD “Zen” micro-architecture.

Run the application by preloading the tunables-enabled libaocl-libmem.so:

$ LD_PRELOAD=<path to build/lib/libaocl-libmem.so> <executable> <params>
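The preload step can be wrapped in a small guard so that a mistyped library path is reported instead of being silently ignored by the loader. The following is a minimal sketch, not part of AOCL; the function name run_with_libmem is hypothetical, and it assumes a POSIX shell and an external (non-builtin) executable.

```shell
# Hypothetical helper (not part of AOCL): preload libaocl-libmem.so for one
# command, but only if the library file is actually readable.
run_with_libmem() {
    lib_path=$1
    shift
    if [ ! -r "$lib_path" ]; then
        echo "libmem library not found or not readable: $lib_path" >&2
        return 1
    fi
    # The LD_PRELOAD assignment applies only to this one command invocation.
    LD_PRELOAD="$lib_path" "$@"
}
```

It would be called as run_with_libmem <path to build/lib/libaocl-libmem.so> <executable> <params>.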

Tuned State

In this state, one of the parameters is tuned at run time, and the library chooses the implementation based on that valid tuned parameter. Only one of the tunables can be set at a time, and it must use a valid format and valid options as described in the Application Implementations table.
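Because only one tunable is meant to be set at a time, a launch script can check the environment before starting the application. The sketch below is an illustration only; the function name check_single_libmem_tunable is hypothetical, and this guide does not state how the library behaves when both variables are set.

```shell
# Hypothetical pre-launch check (not part of AOCL): fail if both libmem
# tunables are set in the current environment at the same time.
check_single_libmem_tunable() {
    if [ -n "${LIBMEM_OPERATION:-}" ] && [ -n "${LIBMEM_THRESHOLD:-}" ]; then
        echo "set only one of LIBMEM_OPERATION or LIBMEM_THRESHOLD" >&2
        return 1
    fi
}
```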

LIBMEM_OPERATION

You can set the tunable LIBMEM_OPERATION as follows:

LIBMEM_OPERATION=<operations>,<source_alignment>,<destination_alignment>

With this option, the library chooses the best implementation based on the combination of move instructions and the alignment of the source and destination addresses.

Valid Options

  • <operations> = [avx2|avx512|erms]

  • <source_alignment> = [b|w|d|q|x|y|n]

  • <destination_alignment> = [b|w|d|q|x|y|n]
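Since an invalid value is not reported as an error, it can help to check the string before launching. The following sketch checks only the syntax listed under Valid Options; it does not check whether the combination appears in the Application Implementations table. The function name is_valid_libmem_operation is hypothetical.

```shell
# Hypothetical syntax check (not part of AOCL) for a LIBMEM_OPERATION value:
#   <operations>,<source_alignment>,<destination_alignment>
# A case pattern must match the whole string, so extra fields are rejected.
is_valid_libmem_operation() {
    case $1 in
        avx2,[bwdqxyn],[bwdqxyn]) return 0 ;;
        avx512,[bwdqxyn],[bwdqxyn]) return 0 ;;
        erms,[bwdqxyn],[bwdqxyn]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, is_valid_libmem_operation avx2,b,y succeeds, while a value with a missing field or an unknown instruction set fails.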

Use the following table to select the right implementation for your application:

Table 13.1 Application Implementations

Application Requirement                              LIBMEM_OPERATION               Instructions                      Side-effects
---------------------------------------------------  -----------------------------  --------------------------------  --------------------------------------------------------------
Vector unaligned source and destination              [avx2|avx512],b,b              Load: VMOVDQU; Store: VMOVDQU     None
Vector aligned source and destination                [avx2|avx512],y,y              Load: VMOVDQA; Store: VMOVDQA     Unaligned source and/or destination address will lead to crash
Vector aligned source and unaligned destination      [avx2|avx512],y,[b|w|d|q|x]    Load: VMOVDQA; Store: VMOVDQU     None
Vector unaligned source and aligned destination      [avx2|avx512],[b|w|d|q|x],y    Load: VMOVDQU; Store: VMOVDQA     None
Vector non-temporal load and store                   [avx2|avx512],n,n              Load: VMOVNTDQA; Store: VMOVNTDQ  Unaligned source and/or destination address will lead to crash
Vector non-temporal load                             [avx2|avx512],n,[b|w|d|q|x|y]  Load: VMOVNTDQA; Store: VMOVDQU   None
Vector non-temporal store                            [avx2|avx512],[b|w|d|q|x|y],n  Load: VMOVDQU; Store: VMOVNTDQ    None
Rep movs unaligned source or destination             erms,b,b                       REP MOVSB                         None
Rep movs word aligned source and destination         erms,w,w                       REP MOVSW                         Data corruption or crash if the length is not a multiple of 2
Rep movs double word aligned source and destination  erms,d,d                       REP MOVSD                         Data corruption or crash if the length is not a multiple of 4
Rep movs quad word aligned source and destination    erms,q,q                       REP MOVSQ                         Data corruption or crash if the length is not a multiple of 8

Note

A best-fit solution for the underlying micro-architecture will be chosen if the tunable is in an invalid format.

For example, to use only avx2-based move operations with an unaligned source address and an aligned destination address:

$ LD_PRELOAD=<build/lib/libaocl-libmem.so> LIBMEM_OPERATION=avx2,b,y <executable>

LIBMEM_THRESHOLD

You can set the tunable LIBMEM_THRESHOLD as follows:

LIBMEM_THRESHOLD=<repmov_start_threshold>,<repmov_stop_threshold>,<nt_start_threshold>,<nt_stop_threshold>

With this option, the library chooses the implementation using the tuned threshold settings for the supported instruction sets: {vector, rep mov, non-temporal}.

Valid Options

  • <repmov_start_threshold> = [0, positive integers]

  • <repmov_stop_threshold> = [0, positive integers, -1]

  • <nt_start_threshold> = [0, positive integers]

  • <nt_stop_threshold> = [0, positive integers, -1]

Here, -1 refers to the maximum length.
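As with LIBMEM_OPERATION, a malformed value is not reported, so a syntax check before launching can be useful. The sketch below only checks that the four fields have the documented form (non-negative integers, with -1 also allowed for the two stop thresholds); it does not check that each start threshold is below its stop threshold. The function name is_valid_libmem_threshold is hypothetical.

```shell
# Hypothetical syntax check (not part of AOCL) for a LIBMEM_THRESHOLD value:
#   <repmov_start>,<repmov_stop>,<nt_start>,<nt_stop>
# Start fields: 0 or a positive integer. Stop fields: the same, or -1.
is_valid_libmem_threshold() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+,([0-9]+|-1),[0-9]+,([0-9]+|-1)$'
}
```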

Refer to the following table for sample threshold settings:

Table 13.2 Sample Threshold Settings

LIBMEM_THRESHOLD   Vector Range     RepMov Range  Non-Temporal Range
-----------------  ---------------  ------------  ------------------------------------------
0,2048,1048576,-1  [2049, 1048576)  [0, 2048]     [1048576, max value of unsigned long long)
0,0,1048576,-1     [0, 1048576)     [0, 0]        [1048576, max value of unsigned long long)

Note

A system-configured threshold will be chosen if the tunable is in an invalid format.

For example, to use REP MOVS instructions for a range of 1 KB to 2 KB and non-temporal instructions for a range of 512 KB and above:

$ LD_PRELOAD=<build/lib/libaocl-libmem.so> LIBMEM_THRESHOLD=1024,2048,524288,-1 <executable>
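The byte values in this example follow from 1 KB = 1024 bytes. The sketch below builds the same value with shell arithmetic and reports which range a given copy size would fall into. It is only one reading of Table 13.2, not the library's dispatch logic: the function name libmem_range_for_size is hypothetical, and checking the non-temporal range before the rep mov range is an assumption.

```shell
# Hypothetical helper (not part of AOCL): report the range that a copy size
# (in bytes) falls into for a given LIBMEM_THRESHOLD value.
libmem_range_for_size() {
    threshold=$1
    size=$2
    # Split the four comma-separated fields.
    repmov_start=${threshold%%,*}; rest=${threshold#*,}
    repmov_stop=${rest%%,*};       rest=${rest#*,}
    nt_start=${rest%%,*}
    nt_stop=${rest#*,}

    # A stop threshold of -1 means "up to the maximum length".
    if [ "$size" -ge "$nt_start" ] &&
        { [ "$nt_stop" -eq -1 ] || [ "$size" -le "$nt_stop" ]; }; then
        echo non-temporal
    elif [ "$size" -ge "$repmov_start" ] &&
        { [ "$repmov_stop" -eq -1 ] || [ "$size" -le "$repmov_stop" ]; }; then
        echo repmov
    else
        echo vector
    fi
}

# 1 KB, 2 KB, and 512 KB expressed in bytes.
threshold="$((1 * 1024)),$((2 * 1024)),$((512 * 1024)),-1"
echo "$threshold"    # prints 1024,2048,524288,-1
```

With this value, libmem_range_for_size "$threshold" 1536 prints repmov, a size of 4096 prints vector, and a size of 524288 prints non-temporal.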