AOCL-Utils provides utilities that applications can use to tune their performance: CPU detection, thread pinning, and diagnostic logging.
CPU Detection for Optimization
Applications can use the CPU detection APIs to make runtime decisions about code paths:
#include "Au/Cpuid/X86Cpu.hh"
Au::X86Cpu cpu{0}; // detection results for CPU number 0
if (cpu.isAMD() && cpu.hasAVX512()) {
    // Use AVX512 optimized path
} else if (cpu.hasAVX2()) {
    // Use AVX2 optimized path
} else {
    // Use scalar fallback
}
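If the check sits on a hot path, a common refinement is to make the decision once and store the chosen kernel in a function pointer. The sketch below rests on several assumptions. A hypothetical `CpuFeatures` struct stands in for the results of `isAMD()`, `hasAVX512()` and `hasAVX2()` above. `SumKernel`, `sum_scalar` and `select_sum_kernel` are also hypothetical names. Every branch reuses one portable loop, so the sketch shows only the selection logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for answers obtained from the detection calls
// (cpu.isAMD(), cpu.hasAVX512(), cpu.hasAVX2()).
struct CpuFeatures {
    bool isAMD;
    bool hasAVX512;
    bool hasAVX2;
};

using SumFn = double (*)(const double*, std::size_t);

// The selected implementation: a label (useful for logging) plus the function.
struct SumKernel {
    std::string_view name;
    SumFn fn;
};

// Portable reference loop. Real AVX2/AVX512 variants would be separate
// functions built with intrinsics or target-specific flags; here every
// entry reuses this loop so the sketch compiles anywhere.
double sum_scalar(const double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Same decision order as the if/else chain above, evaluated once.
// Callers then invoke kernel.fn(...) without re-checking CPU features.
SumKernel select_sum_kernel(const CpuFeatures& f) {
    if (f.isAMD && f.hasAVX512) return {"avx512", sum_scalar};
    if (f.hasAVX2) return {"avx2", sum_scalar};
    return {"scalar", sum_scalar};
}
```

A caller would run `select_sum_kernel` once at startup and then call `kernel.fn(data, n)` on every invocation.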
The au_cpuid module provides:
CPU architecture detection (e.g. identifying the AMD Zen family)
Instruction set detection (AVX2, AVX512, FMA3, etc.)
Feature flag detection (AVX512_VNNI, AVX512_BF16, etc.)
Cache size queries for data layout decisions
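To illustrate the last point, a cache size can bound the tile size of a blocked algorithm so that each tile stays cache-resident. This is a minimal sketch that takes the cache size as a plain number; how that number is obtained from au_cpuid is not shown here. The name `tile_rows_for_cache` and its default half-cache budget are hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many rows (row_bytes each) of a row-major matrix can be
// processed together while using at most `fraction` of the cache, leaving
// the rest for other data. Always returns at least one row.
std::size_t tile_rows_for_cache(std::size_t cache_bytes,
                                std::size_t row_bytes,
                                double fraction = 0.5) {
    assert(row_bytes > 0);
    assert(fraction > 0.0 && fraction <= 1.0);
    const auto budget = static_cast<std::size_t>(cache_bytes * fraction);
    const std::size_t rows = budget / row_bytes;
    return rows > 0 ? rows : 1;
}
```

For example, with a 1 MiB cache and rows of 1024 doubles (8192 bytes), half the cache holds 64 rows.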
Thread Pinning for Performance
Applications can pin their threads to specific processors to improve performance. For example, with POSIX threads:
#include <pthread.h>
#include "Capi/au/threadpinning.h"
#define NUM_THREADS 8 // example thread count
pthread_t threads[NUM_THREADS];
// Create the threads with pthread_create() first...
// Spread threads across NUMA domains
au_pin_threads_spread(threads, NUM_THREADS);
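On Linux, pinning a thread ultimately means restricting its CPU affinity mask. The sketch below shows that underlying mechanism using the glibc extension `pthread_setaffinity_np`. It is not AOCL-Utils' implementation, and it is Linux-specific. The helper names `pin_thread_to_cpu` and `first_allowed_cpu` are hypothetical.

```cpp
// Linux/glibc only: pthread_setaffinity_np is a GNU extension.
// g++ defines _GNU_SOURCE by default, which exposes cpu_set_t and CPU_* macros.
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Restricts `thread` to run only on logical CPU `cpu`.
// Returns 0 on success or an errno-style code (e.g. EINVAL if the CPU
// is not available to this process).
int pin_thread_to_cpu(pthread_t thread, int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

// Returns the first logical CPU in the calling thread's current affinity
// mask, or -1 if the mask cannot be queried.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```

A pinning library applies the same call once per thread, with the target CPU chosen by its strategy.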
Thread pinning strategies:
CORE: Pins threads to physical cores and avoids simultaneous multithreading (SMT) siblings (best for compute-intensive workloads)
LOGICAL: Pins threads to logical processors sequentially (good for memory-bound workloads)
SPREAD: Spreads threads across NUMA domains to maximize memory bandwidth (best for multi-socket systems)
CUSTOM: User-defined affinity vectors for fine-grained control
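The first three strategies differ only in which logical CPU each thread receives. The sketch below computes those assignments for a hypothetical, idealized `Topology`. It assumes a numbering in which SMT siblings have adjacent IDs. Real systems may number CPUs differently, and AOCL-Utils reads the actual topology rather than assuming one. Treat the `plan_*` functions as an illustration of the policies only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, idealized topology: identical NUMA nodes and cores.
// Assumed numbering: SMT siblings are adjacent, so
// cpu = (node * cores_per_node + core) * smt_per_core + smt.
struct Topology {
    int numa_nodes;
    int cores_per_node;
    int smt_per_core;  // hardware threads per core (2 with SMT enabled)

    int total_cores() const { return numa_nodes * cores_per_node; }
    int total_cpus() const { return total_cores() * smt_per_core; }
    int cpu_id(int node, int core, int smt) const {
        return (node * cores_per_node + core) * smt_per_core + smt;
    }
};

// CORE: thread i -> first hardware thread of core i, so no two threads share
// a core until every core is used (then it wraps around).
std::vector<int> plan_core(const Topology& t, int nthreads) {
    assert(t.total_cpus() > 0);
    std::vector<int> cpus;
    for (int i = 0; i < nthreads; ++i) {
        int core = i % t.total_cores();
        cpus.push_back(t.cpu_id(core / t.cores_per_node, core % t.cores_per_node, 0));
    }
    return cpus;
}

// LOGICAL: thread i -> logical CPU i, filling SMT siblings before moving on.
std::vector<int> plan_logical(const Topology& t, int nthreads) {
    assert(t.total_cpus() > 0);
    std::vector<int> cpus;
    for (int i = 0; i < nthreads; ++i) cpus.push_back(i % t.total_cpus());
    return cpus;
}

// SPREAD: round-robin over NUMA nodes first, then over cores within a node.
std::vector<int> plan_spread(const Topology& t, int nthreads) {
    assert(t.total_cpus() > 0);
    std::vector<int> cpus;
    for (int i = 0; i < nthreads; ++i) {
        int node = i % t.numa_nodes;
        int core = (i / t.numa_nodes) % t.cores_per_node;
        cpus.push_back(t.cpu_id(node, core, 0));
    }
    return cpus;
}
```

Consider 2 NUMA nodes, each with 4 two-way SMT cores, running 4 threads. CORE maps them to CPUs 0, 2, 4, 6. LOGICAL maps them to 0, 1, 2, 3. SPREAD maps them to 0, 8, 2, 10. The same values are checked in the tests below.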
Performance benefits:
Reduces cache misses caused by thread migration
Improves memory locality on NUMA systems
Can improve performance by 10-30% on multi-socket systems, depending on the workload
Makes performance more consistent by preventing the OS from migrating threads
Logger for Diagnostics
Applications can use the logger module for diagnostics with minimal overhead:
Thread-safe logging
Multiple log levels (TRACE, DEBUG, INFO, WARN, ERROR, FATAL)
Asynchronous logging for minimal performance impact
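One common way to combine these three properties is a queue drained by a background thread. Callers filter by level and enqueue, while the worker thread performs the actual output. The sketch below is hypothetical: `AsyncLogger`, `Level` and the sink callback are not the AOCL-Utils logger API, and the code only illustrates the technique.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// The levels listed above, ordered from most to least verbose.
enum class Level { Trace, Debug, Info, Warn, Error, Fatal };

class AsyncLogger {
public:
    using Sink = std::function<void(const std::string&)>;

    AsyncLogger(Level min_level, Sink sink)
        : min_level_(min_level), sink_(std::move(sink)), worker_([this] { run(); }) {
        assert(sink_ && "sink must be callable");
    }

    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains queued messages before exiting
    }

    // Thread-safe: may be called concurrently from many threads.
    void log(Level level, std::string msg) {
        if (level < min_level_) return;  // filtered messages never take the lock
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string msg = std::move(queue_.front());
                queue_.pop_front();
                lock.unlock();  // write output without blocking producers
                sink_(msg);
                lock.lock();
            }
            if (stopping_) return;
        }
    }

    const Level min_level_;
    Sink sink_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Because the level check happens before any locking, a suppressed message costs only a comparison. The worker calls the sink with the mutex released, so slow output does not stall the threads that are logging.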