Host Programming for Bare-Metal - 2023.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID
UG1076
Release Date
2023-06-23
Version
2023.1 English

In a bare-metal/standalone environment, AMD provides a standalone board support package (BSP), drivers, and libraries that applications can use to reduce development effort. As with the Linux flow described in Host Programming on Linux, the top-level application for a bare-metal system must also integrate and manage the AI Engine graph and PL kernels.

Tip: The steps to integrate a bare-metal system with the AI Engine graph and PL kernels are described in Building a Bare-Metal System, or in Building a Bare-metal AI Engine in the Vitis IDE.
Figure 1. AI Engine Bare-Metal Software Stack

The following is an example top-level application (main.cpp) for a bare-metal system:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include "platform.h"
#include "xparameters.h"
#include "xil_io.h"
#include "xil_cache.h"
#include "input.h"
#include "golden.h"
...
void InitData(int32_t** out, int size)
{
    int i;
    *out = (int32_t*)malloc(sizeof(int32_t) * size);

    if(!*out) {
        printf("Allocation of memory failed \n");
        exit(-1);
    }

    for(i = 0; i < size; i++) {
        (*out)[i] = 0xABCDEF00;
    }
}

int RunTest(uint64_t mm2s_base, uint64_t s2mm_base, int32_t* in, int32_t* golden, 
    int32_t* out, int input_size, int output_size)
{
    int i;
    int errCount = 0;
    uint64_t memAddr = (uint64_t)in;
    uint64_t mem_outAddr = (uint64_t)out;

    printf("Starting test w/ cu\n");

    Xil_Out32(mm2s_base + MEM_OFFSET, (uint32_t) memAddr);
    Xil_Out32(mm2s_base + MEM_OFFSET + 4, 0);
    Xil_Out32(s2mm_base + MEM_OFFSET, (uint32_t) mem_outAddr);
    Xil_Out32(s2mm_base + MEM_OFFSET + 4, 0);
    Xil_Out32(mm2s_base + SIZE_OFFSET, input_size);
    Xil_Out32(s2mm_base + SIZE_OFFSET, output_size);
    Xil_Out32(mm2s_base + CTRL_OFFSET, 1);
    Xil_Out32(s2mm_base + CTRL_OFFSET, 1);

    printf("GRAPH INIT\n");
    clipgraph.init();

    printf("GRAPH RUN\n");
    clipgraph.run();

    while(1) {
        uint32_t v = Xil_In32(s2mm_base + CTRL_OFFSET);
        if(v & 6) {
            break;
        }
    }

    printf("PLIO IP DONE!\n");

    for(i = 0; i < output_size; i++) {
        if((((int32_t*)out)[i] != ((int32_t*)golden)[i]) ) {
            printf("Error found in sample %d: %d != to the golden %d\n", i+1, ((int32_t*)out)[i], ((int32_t*)golden)[i]);
            errCount++;
        }
        else
            printf("%d\n",((int32_t*)out)[i]);
    }

    printf("Ending test w/ cu\n");
    return errCount;
}

int main()
{
    int i;
    int32_t* out;
    int errCount;

    Xil_DCacheDisable();
    init_platform();
    sleep(1);
    
    printf("Beginning test\n");
    InitData(&out, OUTPUT_SIZE);
    errCount = RunTest(MM2S_BASE, S2MM_BASE, (int32_t*)cint16input, int32golden, out, INPUT_SIZE, OUTPUT_SIZE);

    if(errCount == 0)
        printf("Test passed. \n");
    else
        printf("Test failed! Error count: %d \n",errCount);

    cleanup_platform();
    return errCount;
}

The following are the steps in the code example:

  • The main() function initializes the platform and the data, runs the test, checks the error count, and returns it as the exit code.
  • Xil_DCacheDisable() disables the data cache. This is essential to keep the data in memory synchronized between the processor and the kernels that access the same buffers.
  • InitData() allocates memory for size 32-bit elements and fills the successfully allocated buffer with a known data pattern.
  • RunTest() programs the mm2s and s2mm kernels with the buffer addresses and sizes, runs the graph, and returns the number of output samples that do not match the golden data.
  • clipgraph.init() initializes the tiles that the kernels run on.
  • clipgraph.run() starts the kernels on their associated tiles.
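
The address programming in RunTest() writes the low 32 bits of each buffer address to MEM_OFFSET and a constant 0 to MEM_OFFSET + 4. The following is a minimal sketch of a helper that writes both halves of a 64-bit address, assuming the register at MEM_OFFSET + 4 holds the upper address word. It is not part of the original example. MockOut32(), g_mock_regs, kMemOffset, and WriteBufferAddress() are hypothetical names, and the mock register map only stands in for Xil_Out32() so that the sketch can run on a host PC.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the memory-mapped register space. On the
// target, Xil_Out32() from xil_io.h would perform the actual write.
std::map<uint64_t, uint32_t> g_mock_regs;

void MockOut32(uint64_t addr, uint32_t value) {
    g_mock_regs[addr] = value;
}

// Assumed offset for illustration only. MEM_OFFSET in the example is
// defined in code that is elided from the listing.
constexpr uint64_t kMemOffset = 0x10;

// Write a 64-bit buffer address as two 32-bit words: the low word at the
// offset and the high word at offset + 4, the layout used in RunTest().
void WriteBufferAddress(uint64_t base, uint64_t buffer_addr) {
    assert((base & 0x3u) == 0);  // 32-bit registers are word aligned
    MockOut32(base + kMemOffset,
              static_cast<uint32_t>(buffer_addr & 0xFFFFFFFFu));
    MockOut32(base + kMemOffset + 4,
              static_cast<uint32_t>(buffer_addr >> 32));
}
```

With this helper, a buffer located above the 4 GB boundary would be programmed correctly, whereas writing a constant 0 to the upper word only works for buffers in the low 4 GB of the address space.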

The preceding code example references xparameters.h, which is automatically generated from the bare-metal BSP. Make sure that the bare-metal BSP is generated correctly so that the memory-mapped addresses for all drivers are correctly assigned.

xil_io.h contains general driver I/O APIs. This is the preferred method for accessing drivers.
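
The while(1) loop in RunTest() polls the s2mm control register with Xil_In32() and never exits if the kernel does not finish. The following is a minimal sketch of a bounded polling helper, not the method used in the original example. WaitForDone(), kDoneMask, and the max_polls budget are hypothetical names. The mask value 6 is taken from the loop in RunTest(). The register read is passed in as a callable so that the sketch can run on a host PC without xil_io.h.

```cpp
#include <cassert>
#include <cstdint>

// Bits tested by the polling loop in RunTest() (v & 6).
constexpr uint32_t kDoneMask = 0x6;

// Poll a control register until any bit in kDoneMask is set or the poll
// budget is used up. read_reg is any callable that returns the current
// 32-bit register value; on the target it could wrap Xil_In32().
template <typename ReadReg>
bool WaitForDone(ReadReg read_reg, uint32_t max_polls) {
    assert(max_polls > 0);
    for (uint32_t n = 0; n < max_polls; ++n) {
        if (read_reg() & kDoneMask) {
            return true;   // one of the tested status bits is set
        }
    }
    return false;          // budget exhausted; the caller reports the timeout
}
```

On the target, the callable could be a lambda that returns Xil_In32(s2mm_base + CTRL_OFFSET), with a poll budget chosen for the expected run time of the kernel. Returning false lets the application print a diagnostic and exit instead of hanging.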