
AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English

5. AOCL-DLP

AOCL-DLP is an optimized implementation of deep learning primitives for AMD Zen-based processors. It provides high-performance implementations of fundamental operations, including GEMM (General Matrix Multiplication), batch GEMM, element-wise operations, and matrix transformations, with support for multiple data types (FP32, BF16, INT8) and post-operations.

5.7. Quick API Lookup

5.7.1. Core GEMM Operations

Table 5.10 GEMM API Functions

Function Pattern               Description
aocl_gemm_f32f32f32of32        Float32 precision GEMM
aocl_gemm_bf16bf16f32of32      BFloat16 inputs, float32 output
aocl_gemm_u8s8s32os32          Unsigned/signed 8-bit quantized GEMM
aocl_gemm_s8s8s32os8           Signed 8-bit quantized GEMM

5.7.2. Batch Operations

Table 5.11 Batch GEMM Functions

Function Pattern     Description
aocl_batch_gemm_*    Batch processing for multiple matrices

5.7.3. Matrix Utilities

Table 5.12 Matrix Utility Functions

Function Pattern               Description
aocl_get_reorder_buf_size_*    Get buffer size for matrix reordering
aocl_reorder_*                 Reorder matrix for optimal performance
aocl_unreorder_*               Convert reordered matrix back to normal format

5.7.4. Element-wise Operations

Table 5.13 Element-wise Functions

Function Pattern           Description
aocl_gemm_eltwise_ops_*    Apply element-wise operations to matrices

5.7.5. Utility Functions

Table 5.14 Utility Functions

Function Pattern       Description
aocl_gemm_gelu_*       GELU activation functions
aocl_gemm_softmax_*    Softmax functions

5.7.6. Library Management

Table 5.15 Library Functions

Function                             Description
dlp_thread_set_num_threads           Configure thread count
dlp_thread_set_ways                  Configure parallelization strategy
dlp_aocl_enable_instruction_query    Query hardware capabilities

5.8. API Selection Guide

5.8.1. Choose the Right GEMM Variant

By Precision Requirements:

  1. High Precision: f32f32f32of32 for maximum accuracy

  2. Balanced: bf16bf16f32of32 for good accuracy with reduced memory

  3. Quantized: u8s8s32os32 or s8s8s32os8 for inference

By Performance Needs:

  1. Single Operation: Standard GEMM functions

  2. Multiple Operations: Batch GEMM functions

  3. Repeated Operations: Use matrix reordering

5.8.2. Data Type Naming Convention

Function names follow the pattern: [input_A][input_B][accumulation]o[output]

  • f32 = float32

  • bf16 = bfloat16

  • u8 = uint8

  • s8 = int8

  • s32 = int32

Example: bf16bf16f32of32 = bfloat16 inputs, float32 accumulation and output

5.9. See Also

  • API Overview - API design principles and usage patterns

  • GEMM Operations - GEMM operations documentation

  • Post-Operations - Post-operations framework
