Quantized Inference - 5.2 English - 68552

AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English

Typical workflow for inference with quantized neural networks:

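// 1. Load quantized weights and scales
// load_quantized_weights(), load_scales() and setup_quantization_post_ops()
// are assumed to be application-defined helpers.
int8_t *weights_q = load_quantized_weights();  // k x n, signed 8-bit
float *scales = load_scales();

// 2. Set up quantization post-ops
// zero_points is assumed to be supplied by the application alongside scales.
aocl_post_op post_ops;
setup_quantization_post_ops(&post_ops, scales, zero_points);

// 3. Process quantized inputs
// input_q: m x k unsigned 8-bit activations; output_q: m x n signed 8-bit result.
aocl_gemm_u8s8s32os8(
    'R', 'N', 'N', m, n, k,   // row-major order, no transpose of A or B, dimensions
    1, input_q, k, 'N',       // alpha, A, lda, memory format of A
    weights_q, n, 'N',        // B, ldb, memory format of B
    0, output_q, n,           // beta, C, ldc
    &post_ops                 // post-ops applied to the s32 accumulator before the s8 output is written
);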
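
For orientation, the arithmetic behind the call above can be sketched in plain C without any AOCL dependency. This is a minimal reference sketch, not the library's implementation: the names requantize and ref_gemm_u8s8s32os8 are hypothetical, and the per-column scale and zero point, the scale-then-add ordering, and the round-half-away-from-zero rounding are assumptions made for illustration. It shows unsigned 8-bit inputs multiplied by signed 8-bit weights, accumulated in 32-bit integers, then scaled and saturated to signed 8-bit output.

```c
#include <stdint.h>

/* Hypothetical helper: map a 32-bit accumulator to int8 using
 * v = acc * scale + zero_point, clamped to [-128, 127].
 * Rounding is half away from zero (an assumption for this sketch). */
static int8_t requantize(int32_t acc, float scale, int32_t zero_point)
{
    float v = (float)acc * scale + (float)zero_point;
    if (v > 127.0f)  v = 127.0f;
    if (v < -128.0f) v = -128.0f;
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Hypothetical reference: C = requantize(A * B), all matrices row-major.
 * A is m x k (uint8), B is k x n (int8), C is m x n (int8).
 * scales[j] and zero_points[j] apply to output column j (assumed layout). */
static void ref_gemm_u8s8s32os8(int m, int n, int k,
                                const uint8_t *a, const int8_t *b,
                                const float *scales,
                                const int32_t *zero_points,
                                int8_t *c)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            int32_t acc = 0;  /* 32-bit accumulator, the "s32" in the name */
            for (int p = 0; p < k; ++p) {
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = requantize(acc, scales[j], zero_points[j]);
        }
    }
}
```

A reference of this kind can be useful for checking the output of an optimized path on small matrices, provided the scale and zero-point layout matches what the application configured in its post-ops.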