Debugging Memory Access Violations - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English

Memory access violations occur when a kernel reads or writes outside the bounds of an object, or reads uninitialized memory. They can show up in several ways, such as a simulator crash or hang. This debug feature detects out-of-range memory accesses from each tile at run time during AI Engine emulation; however, enabling it reduces simulation performance. An access is out of range when its address does not fall inside any of the sections the compiler assigned to the tile. Note that some addresses between sections may not be assigned at all.

The aiesimulator option --enable-memory-check reports these out-of-range memory accesses.
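The following Python sketch illustrates the distinction described above. It sorts an address into one of four cases:

* inside a section the compiler assigned,
* in an unassigned gap between two sections,
* below all of them, or
* beyond all of them.

It is only an illustration and assumes that a violation can be judged by comparing the address with the section ranges. It does not reproduce how aiesimulator implements --enable-memory-check. The names `classify_address` and `Section` are hypothetical. The section list is copied from the WA (data) sections of the readelf output shown in step 2.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence, Tuple


class Section(NamedTuple):
    name: str
    start: int  # first address of the section
    size: int   # size in bytes

    @property
    def end(self) -> int:
        """Exclusive end address of the section."""
        return self.start + self.size


# WA (data) sections of tile 25_0, copied from the readelf -S output in step 2.
DATA_SECTIONS = [
    Section(".bss.DMb.16", 0x29E00, 0x200),
    Section(".bss.DMb.16", 0x2C000, 0x200),
    Section(".bss.DMb.16", 0x31E00, 0x200),
    Section(".data.DMb.4", 0x357E0, 0x020),
    Section(".data.DM_bankA.4", 0x35C00, 0x024),
    Section(".bss.DMb.16", 0x38000, 0x200),
    Section(".bss.DMb.16", 0x3A000, 0x200),
    Section(".bss.DMb.16", 0x3C000, 0x200),
]


def classify_address(
    addr: int, sections: Sequence[Section]
) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (kind, lo, hi); kind is 'inside', 'gap', 'below' or 'beyond'.

    lo/hi bound the matching section or gap (hi is exclusive); None means unbounded.
    """
    ordered = sorted(sections, key=lambda s: s.start)
    if not ordered:
        raise ValueError("no sections given")
    starts = [s.start for s in ordered]
    i = bisect_right(starts, addr) - 1  # last section starting at or before addr
    if i < 0:
        return ("below", None, ordered[0].start)
    if addr < ordered[i].end:
        return ("inside", ordered[i].start, ordered[i].end)
    if i == len(ordered) - 1:
        return ("beyond", ordered[i].end, None)
    return ("gap", ordered[i].end, ordered[i + 1].start)


if __name__ == "__main__":
    for a in (0x29E10, 0x2A100, 0x35C24, 0xBCD30):
        kind, lo, hi = classify_address(a, DATA_SECTIONS)
        print(hex(a), kind,
              hex(lo) if lo is not None else "-",
              hex(hi) if hi is not None else "-")
```

With this section list, an address such as 0xbcd30 is classified as `beyond`, because the last data section ends at 0x3c200. An address such as 0x2a100 falls in the unassigned gap between 0x2a000 and 0x2c000.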

In addition to enabling the simulator option, you need to know the valid address ranges of the design so that you can recognize an invalid address. To do this:

  1. In a Linux terminal where the Vitis environment has been set up, run the following command. It lists the memory addresses and sizes that the AI Engine compiler assigned to a specific tile.

    For example, to get the valid memory addresses of the peak_detect kernel:

    # For Vitis IDE project
    cd ${PROJECT_PATH}/peakDetect/aie_component/build/hw/Work/aie/25_0/Release/25_0
    readelf -S 25_0
    
  2. The output of the readelf command is as follows.

    readelf -S 25_0
    There are 33 section headers, starting at offset 0x47484:
    
    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .shstrtab         STRTAB          00000000 00268f 00016c 00      0   0  1
      [ 2] .strtab           STRTAB          00000000 0027fb 000872 00      0   0  0
      [ 3] .symtab           SYMTAB          00000000 00306d 000410 10      2  41  0
      [ 4] .bss.DMb.16       NOBITS          00029e00 000294 000200 00  WA  0   0  1
      [ 5] .bss.DMb.16       NOBITS          0002c000 000294 000200 00  WA  0   0  1
      [ 6] .bss.DMb.16       NOBITS          00031e00 000294 000200 00  WA  0   0  1
      [ 7] .data.DMb.4       PROGBITS        000357e0 000294 000020 00  WA  0   0  1
      [ 8] .data.DM_bankA.4  PROGBITS        00035c00 0002b4 000024 00  WA  0   0  1
      [ 9] .bss.DMb.16       NOBITS          00038000 0002d8 000200 00  WA  0   0  1
      [10] .bss.DMb.16       NOBITS          0003a000 0002d8 000200 00  WA  0   0  1
      [11] .bss.DMb.16       NOBITS          0003c000 0002d8 000200 00  WA  0   0  1
      [12] .text             PROGBITS        00000000 0002d8 000102 00  AX  0   0  1
      [13] .text             PROGBITS        00000110 0003da 0002d8 00  AX  0   0  1
      [14] .text             PROGBITS        000003f0 0006b2 0003de 00  AX  0   0  1
      [15] .text             PROGBITS        000007d0 000a90 0000be 00  AX  0   0  1
      [16] .text             PROGBITS        00000890 000b4e 000094 00  AX  0   0  1
      [17] .debug_line       PROGBITS        00000000 00347d 00169b 00      0   0  0
      [18] .debug_info       PROGBITS        00000000 004b18 00a78c 00      0   0  0
      [19] .debug_abbrev     PROGBITS        00000000 00f2a4 000755 00      0   0  0
      [20] .debug_loc        PROGBITS        00000000 00f9f9 005581 00      0   0  0
      [21] .debug_frame      PROGBITS        00000000 014f7a 000d54 00      0   0  0
      [22] .debug_ranges     PROGBITS        00000000 015cce 0017b8 00      0   0  0
      [23] .debug_str        STRTAB          00000000 017486 023195 00      0   0  0
      [24] .debug_pubnames   PROGBITS        00000000 03a61b 00ad75 00      0   0  0
      [25] .debug_pubtypes   PROGBITS        00000000 045390 00208c 00      0   0  0
      [26] .tctmemstrtab     STRTAB          00000000 000be2 000a78 00      0   0  0
      [27] .tctmemtab        LOPROC+0x123456 00000000 04741c 000068 08     26   0  0
      [28] .tctmemtab        LOPROC+0x123467 00000000 00165a 000068 08     26   0  0
      [29] .stackinfo        LOPROC+0x123458 00000000 0016c2 000010 10      0   0  0
      [30] .rtstab           LOPROC+0x123469 00000000 0016d2 0000a9 0d      0   0  0
      [31] .eoltab           LOPROC+0x123470 00000000 00177b 0008f4 0c      0   0  0
      [32] .chesstypean[...] LOPROC+0x123468 00000000 00206f 000620 10      0   0  0
      Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      D (mbind), p (processor specific)
    

    Here, the WA (write, alloc) and AX (alloc, execute) flags indicate that a section is loaded into tile memory.

  3. Introduce a memory read violation into the kernel code: open src/kernels/peak_detect.cc and change line 26 to v_in = *(InIter+8500).

  4. Build the [aiengine] domain project, add the --enable-memory-check option to the run configuration, and run the AI Engine simulation.

  5. Observe the following messages in the console.

    Waiting for core(s) of graph mygraph to finish execution ...
    670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dme_lda_e_out access out of boundary! address = 0xbcd20 prog_cntr = 0x00010010010100
    670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dme_ldb_e_out access out of boundary! address = 0xbcd00 prog_cntr = 0x00010010010100
    670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dmo_lda_e_out access out of boundary! address = 0xbcd30 prog_cntr = 0x00010010010100
    670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dmo_ldb_e_out access out of boundary! address = 0xbcd10 prog_cntr = 0x00010010010100
    

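    The address 0xbcd30 lies outside the valid address ranges listed by the readelf command, as do the other reported addresses (0xbcd00 to 0xbcd30). The highest assigned section, [11] .bss.DMb.16 at 0x0003c000 with size 0x200, ends at 0x0003c200.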

  6. The AI Engine simulation generates the ${PROJECT_PATH}/peakDetect/aie_component/build/hw/AIESim_Guidance.json file. To view it, open the ${PROJECT_PATH}/peakDetect/aie_component/build/hw/aiesimulator_output/default.aierun_summary file in Vitis Analyzer.

  7. In the Memory Violations tab, click a value in the PC column to navigate to the kernel function that caused the memory violation.

    NOTE: Currently, if the violation occurs inside an AI Engine API call (aie::mul in this case), clicking the PC might not take you to the exact kernel function. If the source of a memory violation cannot be identified this way, the general recommendation is to use the x86simulator with the valgrind option, as explained in Memory Access Violation and Valgrind Support for the x86simulator.

  8. Revert the source code changes before exercising the other debug features.
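Steps 1 and 2 can be partly automated. The following Python sketch extracts the sections that carry the A (alloc) flag from `readelf -S` output. This covers the WA and AX sections that step 2 identifies as loaded into tile memory. It is a hypothetical helper, not part of the Vitis tools.

The sketch assumes the column layout shown in step 2, in which a header row has ten columns only when its Flg column is non-empty. The names `loaded_sections` and `ElfSection` are illustrative.

```python
import re
import subprocess
import sys
from typing import List, NamedTuple


class ElfSection(NamedTuple):
    num: int    # section header index ([Nr] column)
    name: str
    addr: int
    size: int
    flags: str


# Matches a section-header row such as "  [ 4] .bss.DMb.16  NOBITS ...".
_ROW = re.compile(r"^\s*\[\s*(\d+)\]\s*(.*)$")


def loaded_sections(readelf_text: str) -> List[ElfSection]:
    """Return the sections with the A (alloc) flag, such as WA data and AX code."""
    result = []
    for line in readelf_text.splitlines():
        m = _ROW.match(line)
        if m is None:
            continue  # banner, column header, flag legend or blank line
        tokens = m.group(2).split()
        # Name Type Addr Off Size ES Flg Lk Inf Al: 10 tokens only if Flg is non-empty
        if len(tokens) != 10:
            continue
        name, _type, addr, _off, size, _es, flags = tokens[:7]
        if "A" in flags:
            result.append(
                ElfSection(int(m.group(1)), name, int(addr, 16), int(size, 16), flags)
            )
    return result


# A few rows copied from the readelf output in step 2.
SAMPLE_READELF = """\
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 3] .symtab           SYMTAB          00000000 00306d 000410 10      2  41  0
  [ 8] .data.DM_bankA.4  PROGBITS        00035c00 0002b4 000024 00  WA  0   0  1
  [11] .bss.DMb.16       NOBITS          0003c000 0002d8 000200 00  WA  0   0  1
  [16] .text             PROGBITS        00000890 000b4e 000094 00  AX  0   0  1
  [27] .tctmemtab        LOPROC+0x123456 00000000 04741c 000068 08     26   0  0
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Path to a tile ELF such as the 25_0 file from step 1; needs readelf on PATH.
        text = subprocess.run(
            ["readelf", "-S", sys.argv[1]], capture_output=True, text=True, check=True
        ).stdout
    else:
        text = SAMPLE_READELF
    for s in loaded_sections(text):
        print(f"[{s.num:2}] {s.name:<18} {s.addr:#010x}..{s.addr + s.size:#010x} {s.flags}")
```

For example, if the script is saved as list_sections.py (a hypothetical file name), running it from the directory in step 1 with 25_0 as its argument prints one line per loaded section. Without an argument, it parses the embedded sample rows.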
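When a run produces many memory-check errors, it can help to summarize them per tile before comparing the addresses with the readelf ranges. The following Python sketch groups the flagged addresses by tile.

Its regular expression is written only against the messages shown in step 5. Whether other aiesimulator versions or message types use the same format is an assumption. The names `summarize_violations` and `SAMPLE_LOG` are hypothetical.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, List, Set

# Matches memory-check errors in the format of the step 5 console messages.
_ERR = re.compile(
    r"(?P<time>\d+)\s*ps\s+\[ERROR\]\s+\S*?(?P<tile>tile_\d+_\d+)\S*:\s+"
    r"(?P<port>\w+)\s+access out of\s+boundary!\s+address\s*=\s*(?P<addr>0x[0-9a-fA-F]+)"
)


def summarize_violations(log_text: str) -> Dict[str, List[int]]:
    """Map each tile to the sorted, de-duplicated addresses flagged by the memory check."""
    by_tile: Dict[str, Set[int]] = defaultdict(set)
    for m in _ERR.finditer(log_text):
        by_tile[m.group("tile")].add(int(m.group("addr"), 16))
    return {tile: sorted(addrs) for tile, addrs in sorted(by_tile.items())}


# The console messages from step 5.
SAMPLE_LOG = """\
Waiting for core(s) of graph mygraph to finish execution ...
670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dme_lda_e_out access out of boundary! address = 0xbcd20 prog_cntr = 0x00010010010100
670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dme_ldb_e_out access out of boundary! address = 0xbcd00 prog_cntr = 0x00010010010100
670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dmo_lda_e_out access out of boundary! address = 0xbcd30 prog_cntr = 0x00010010010100
670400 ps [ERROR] tl.aie_logical.aie_xtlm.math_engine.array.tile_25_1.cm.proc: dmo_ldb_e_out access out of boundary! address = 0xbcd10 prog_cntr = 0x00010010010100
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # A saved copy of the simulator console output (hypothetical file).
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE_LOG
    for tile, addrs in summarize_violations(text).items():
        print(f"{tile}: {len(addrs)} address(es), {hex(addrs[0])} to {hex(addrs[-1])}")
```

Applied to the four messages from step 5, the sketch reports tile_25_1 with addresses 0xbcd00 through 0xbcd30. All of these lie above 0x3c200, the end of the highest section listed by readelf.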