18.4.2. Application Optimization

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

AOCL-Utils provides utilities that applications can use to optimize their performance.

CPU Detection for Optimization

Applications can use the CPU detection APIs to make runtime decisions about code paths:

#include "Au/Cpuid/X86Cpu.hh"

int main()
{
    // Query CPUID information for logical CPU 0
    Au::X86Cpu cpu{0};

    if (cpu.isAMD() && cpu.hasAVX512()) {
        // Use AVX512 optimized path
    } else if (cpu.hasAVX2()) {
        // Use AVX2 optimized path
    } else {
        // Use scalar fallback
    }
    return 0;
}
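The if/else chain above is evaluated every time it is reached. A common refinement is to make the decision once at startup and keep the chosen kernel in a function pointer. The sketch below assumes hypothetical names (CpuFeatures, SumKernel, sumLanes, selectSum) that are not part of AOCL-Utils: the feature flags would be filled from the X86Cpu queries, and the three kernels are portable C++ stand-ins whose multiple accumulators only imitate 256-bit and 512-bit vector lanes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical feature summary; a real application would fill it from the
// Au::X86Cpu queries (isAMD(), hasAVX512(), hasAVX2()) shown above.
struct CpuFeatures {
    bool isAMD;
    bool avx512;
    bool avx2;
};

using SumFn = double (*)(const double*, std::size_t);

// Portable stand-in kernel: W independent accumulators imitate W vector lanes.
// Real AVX2/AVX-512 kernels would use intrinsics or per-function target flags.
template <int W>
double sumLanes(const double* x, std::size_t n) {
    double acc[W] = {};
    std::size_t i = 0;
    for (; i + W <= n; i += W)
        for (int k = 0; k < W; ++k) acc[k] += x[i + k];
    double s = 0.0;
    for (int k = 0; k < W; ++k) s += acc[k];
    for (; i < n; ++i) s += x[i];  // leftover elements
    return s;
}

struct SumKernel {
    std::string name;
    SumFn fn;
};

// Same decision order as the if/else chain above, but evaluated only once.
SumKernel selectSum(const CpuFeatures& f) {
    if (f.isAMD && f.avx512) return {"avx512", &sumLanes<8>};  // 8 doubles per 512-bit register
    if (f.avx2) return {"avx2", &sumLanes<4>};                 // 4 doubles per 256-bit register
    return {"scalar", &sumLanes<1>};
}
```

In use, the application would call selectSum once (for example into a static const SumKernel) and then call its fn member in hot loops, so no CPU checks remain on the critical path.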

The au_cpuid module provides:

  • CPU architecture detection, including identification of the AMD Zen family

  • Instruction set detection (AVX2, AVX512, FMA3, etc.)

  • Feature flag detection (AVX512_VNNI, AVX512_BF16, etc.)

  • Cache size queries for data layout decisions
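As an illustration of the last item, the sketch below turns a cache size into a tile edge for a blocked double-precision matrix kernel. It relies on a common rule of thumb (three square tiles should fit in half of the cache) rather than any AOCL-Utils recommendation, and tileEdgeForCache is a hypothetical helper; in a real application cacheBytes would come from the au_cpuid cache-size query.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: choose a square tile edge (in elements) so that three
// double-precision tiles (blocks of A, B and C) fit in half of the given cache,
// leaving the other half for everything else.
std::size_t tileEdgeForCache(std::size_t cacheBytes) {
    const std::size_t budget = cacheBytes / 2;                // use half the cache
    const std::size_t elems = budget / (3 * sizeof(double));  // elements per tile
    std::size_t edge = static_cast<std::size_t>(std::sqrt(static_cast<double>(elems)));
    edge -= edge % 8;                                         // multiple of 8 doubles (one 64-byte line)
    return edge < 8 ? 8 : edge;                               // never below one cache line
}
```

With a 1 MiB cache this yields a 144 x 144 tile, and with 512 KiB a 104 x 104 tile; rounding to a multiple of 8 keeps each tile row aligned to whole 64-byte cache lines.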

Thread Pinning for Performance

Applications can use thread pinning to improve performance:

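#include <pthread.h>
#include "Capi/au/threadpinning.h"

#define NUM_THREADS 8  // example thread count

int main(void)
{
    pthread_t threads[NUM_THREADS];
    // Create the threads with pthread_create(), storing each handle in threads[]...

    // Spread threads across NUMA domains
    au_pin_threads_spread(threads, NUM_THREADS);
    return 0;
}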

Thread pinning strategies:

  • CORE: Pins threads to physical cores and avoids simultaneous multithreading (SMT) siblings (best for compute-intensive workloads)

  • LOGICAL: Pins threads to logical processors sequentially (good for memory-bound workloads)

  • SPREAD: Spreads threads across NUMA domains to maximize memory bandwidth (best for multi-socket systems)

  • CUSTOM: User-defined affinity vectors for fine-grained control
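To make the differences between CORE, LOGICAL and SPREAD concrete, the sketch below computes which logical CPU each thread would be placed on under each strategy for an idealized machine. It is a model, not the AOCL-Utils implementation: Topology, planLogical, planCore and planSpread are hypothetical, and the CPU numbering (physical cores numbered consecutively per NUMA node, SMT siblings numbered next to each other) is an assumption. Real systems often number SMT siblings differently, so production code should query the actual topology.

```cpp
#include <vector>

// Hypothetical, idealized topology (assumption, not queried from the OS):
// physical core c belongs to NUMA node c / coresPerNode, and its SMT siblings
// are logical CPUs c*threadsPerCore .. c*threadsPerCore + threadsPerCore - 1.
struct Topology {
    int numaNodes;
    int coresPerNode;
    int threadsPerCore;
    int totalCores() const { return numaNodes * coresPerNode; }
    int totalLogical() const { return totalCores() * threadsPerCore; }
};

// LOGICAL: thread i -> logical CPU i, so SMT siblings fill up first.
std::vector<int> planLogical(const Topology& t, int nThreads) {
    std::vector<int> cpus;
    for (int i = 0; i < nThreads; ++i)
        cpus.push_back(i % t.totalLogical());
    return cpus;
}

// CORE: thread i -> first hardware thread of physical core i (no shared SMT core).
std::vector<int> planCore(const Topology& t, int nThreads) {
    std::vector<int> cpus;
    for (int i = 0; i < nThreads; ++i)
        cpus.push_back((i % t.totalCores()) * t.threadsPerCore);
    return cpus;
}

// SPREAD: deal threads round-robin across NUMA nodes, one physical core each.
std::vector<int> planSpread(const Topology& t, int nThreads) {
    std::vector<int> cpus;
    for (int i = 0; i < nThreads; ++i) {
        int node = i % t.numaNodes;
        int slot = (i / t.numaNodes) % t.coresPerNode;
        int core = node * t.coresPerNode + slot;
        cpus.push_back(core * t.threadsPerCore);
    }
    return cpus;
}
```

For two NUMA nodes with 4 cores each and 2-way SMT, four threads land on CPUs 0-3 under LOGICAL (two cores with both SMT siblings busy), on 0, 2, 4, 6 under CORE (four separate cores, all on node 0), and on 0, 8, 2, 10 under SPREAD (two cores on each node). On Linux, such a per-thread CPU list could be applied with pthread_setaffinity_np.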

Performance benefits:

  • Reduces cache misses due to thread migration

  • Improves memory locality on NUMA systems

  • Can improve performance by 10-30% on multi-socket systems, depending on the workload

  • Makes performance more consistent by preventing the OS from migrating threads between processors

Logger for Diagnostics

Applications can use the logger module for diagnostics with minimal overhead:

  • Thread-safe logging

  • Multiple log levels (TRACE, DEBUG, INFO, WARN, ERROR, FATAL)

  • Asynchronous logging for minimal performance impact
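Asynchronous logging generally means that the calling thread only filters and enqueues a message, while a background thread performs the slow output. The following is a generic sketch of that pattern using only the C++ standard library; it is not the AOCL-Utils logger API, and AsyncLogger, Level, levelName and the sink callback are hypothetical names.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Illustrative severity levels matching the list above (mixed case avoids
// clashes with common DEBUG/ERROR macros).
enum class Level { Trace, Debug, Info, Warn, Error, Fatal };

inline const char* levelName(Level l) {
    static const char* const names[] = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"};
    return names[static_cast<int>(l)];
}

// Hypothetical asynchronous logger: log() filters, prefixes the level and
// enqueues under a mutex; a background thread hands messages to the sink.
class AsyncLogger {
public:
    using Sink = std::function<void(const std::string&)>;

    AsyncLogger(Level minLevel, Sink sink)
        : minLevel_(minLevel), sink_(std::move(sink)), worker_([this] { run(); }) {}

    // Signals shutdown, then joins; messages still queued are delivered first.
    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Safe to call concurrently from any thread.
    void log(Level level, const std::string& msg) {
        if (level < minLevel_) return;  // filtered messages cost one comparison
        std::string line = std::string(levelName(level)) + ": " + msg;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push_back(std::move(line));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mtx_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string line = std::move(queue_.front());
                queue_.pop_front();
                lock.unlock();  // slow output happens without holding the lock
                sink_(line);
                lock.lock();
            }
            if (done_) return;
        }
    }

    // Declaration order matters: worker_ is last, so the thread starts only
    // after every other member has been constructed.
    Level minLevel_;
    Sink sink_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool done_ = false;
    std::thread worker_;
};
```

For example, an AsyncLogger constructed with Level::Info and a sink that writes to std::clog drops TRACE and DEBUG messages at the call site, and its destructor delivers anything still queued before the worker thread exits.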