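XRT provides error-reporting APIs and tools. These errors fall into two types:
- Synchronous errors
  - Errors that can be detected during an XRT runtime function call.
- Asynchronous errors
  - Errors that originate from the underlying driver, system, hardware, and so on.
In the C++ API, a synchronous error is raised by the failing call itself as an exception, so it can be handled with try/catch, as in this graph-execution example: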
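auto ghdl = xrt::graph(device, uuid, "gr");
try {
  // Any of these calls throws if it detects an error (synchronous error).
  ghdl.update("gr.fir24.in[1]", narrow_filter);
  ghdl.run(16);
  ghdl.read("gr.fir24.inout[0]", coeffs_readback); // Async read
} catch (std::exception const& e) {
  // Report what went wrong instead of only a generic message.
  std::cerr << "Graph Execution Error: " << e.what() << std::endl;
  return 1;
}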
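Asynchronous errors might be unrelated to the current XRT function call or to the running application. They are cached in the driver subsystem and user applications can access them through the asynchronous error-reporting API. Cached errors persist until they are explicitly cleared. A persisted error does not necessarily reflect the current system state: for example, the board might have been reset and be working normally while a previously cached error is still available. To avoid confusion about the current state, each asynchronous error carries a timestamp that indicates when it occurred. For example, this timestamp can be compared with the timestamp of the most recent xbutil reset.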
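The staleness check described above reduces to a single timestamp comparison. The sketch below assumes both timestamps are nanoseconds since the Unix epoch (like the Timestamp field in the sample output in this section); how the reset time is obtained is not shown, and the function name `is_stale` is hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Assumed unit: nanoseconds since the Unix epoch.
using timestamp_ns = std::uint64_t;

// Hypothetical helper: returns true when a cached asynchronous error was
// generated before the most recent reset, i.e. it describes a past state.
// Without a known reset time the error cannot be classified as stale.
bool is_stale(timestamp_ns error_time, std::optional<timestamp_ns> last_reset)
{
  return last_reset.has_value() && error_time < *last_reset;
}
```

An error that is not stale is not proof of a current fault either; it only means nothing newer (a reset) has superseded it.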
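An error cached by the driver contains a system error code plus additional metadata, both defined in https://github.com/Xilinx/XRT/blob/master/src/runtime_src/core/include/xrt_error_code.h. This definition is shared between user space and kernel space.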
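To make the idea of a packed error code concrete, here is a minimal decoding sketch. The bit layout it uses (error number in bits 0-15, driver in bits 16-19, severity in 24-27, module in 32-35, class in 40-43) is an assumption recalled from xrt_error_code.h and must be checked against the header you actually build with; `ErrorFields`, `decode`, and `encode` are hypothetical names.

```cpp
#include <cstdint>

// ASSUMED field positions and widths; verify against xrt_error_code.h.
constexpr unsigned kNumShift = 0, kNumBits = 16;
constexpr unsigned kDriverShift = 16, kSeverityShift = 24;
constexpr unsigned kModuleShift = 32, kClassShift = 40, kSmallBits = 4;

// Hypothetical decoded view of one 64-bit error code.
struct ErrorFields {
  std::uint16_t number;
  std::uint8_t driver;
  std::uint8_t severity;
  std::uint8_t module;
  std::uint8_t error_class;
};

// Extracts `bits` bits starting at bit position `shift`.
constexpr std::uint64_t field(std::uint64_t code, unsigned shift, unsigned bits)
{
  return (code >> shift) & ((std::uint64_t{1} << bits) - 1);
}

constexpr ErrorFields decode(std::uint64_t code)
{
  return ErrorFields{
      static_cast<std::uint16_t>(field(code, kNumShift, kNumBits)),
      static_cast<std::uint8_t>(field(code, kDriverShift, kSmallBits)),
      static_cast<std::uint8_t>(field(code, kSeverityShift, kSmallBits)),
      static_cast<std::uint8_t>(field(code, kModuleShift, kSmallBits)),
      static_cast<std::uint8_t>(field(code, kClassShift, kSmallBits))};
}

// Inverse of decode(); assumes every field already fits its width.
constexpr std::uint64_t encode(const ErrorFields& f)
{
  return (std::uint64_t{f.number} << kNumShift) |
         (std::uint64_t{f.driver} << kDriverShift) |
         (std::uint64_t{f.severity} << kSeverityShift) |
         (std::uint64_t{f.module} << kModuleShift) |
         (std::uint64_t{f.error_class} << kClassShift);
}
```

If the assumed layout is right, the sample error in this section (number 6, driver 4, severity 3, module 3, class 2) packs to 0x020303040006 and decodes back to the same five fields.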
The XRT error-handling API is declared in experimental/xrt_error.h. Example of asynchronous error handling:
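#include <iostream>
#include "experimental/xrt_error.h"

// Query the latest cached asynchronous error of the AI Engine class.
xrt::error error(device, XRT_ERROR_CLASS_AIE);
auto errCode = error.get_error_code();  // raw code, layout in xrt_error_code.h
auto timestamp = error.get_timestamp(); // when the error was generated
auto err_str = error.to_string();       // human-readable description
/* code to deal with this specific error */
std::cout << "Async error: " << err_str << std::endl;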
Example asynchronous error output:
Error Number (6): AIE_ACCESS
Error Driver (4): DRIVER_AIE
Error Severity (3): SEVERITY_CRITICAL
Error Module (3): MODULE_AIE_CORE
Error Class (2): CLASS_AIE
Timestamp: 1637342412366664740
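The Timestamp field is a raw integer. The xbutil JSON report in this section pairs the epoch value 1637342382770339700 with "Fri Nov 19 17:19:42 2021 GMT", which is consistent with nanoseconds since the Unix epoch. Under that assumption, the hypothetical helper below converts such a value into the same textual layout using only the C++ standard library.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Converts nanoseconds since the Unix epoch to a UTC string laid out like
// the sample xbutil report, e.g. "Fri Nov 19 17:19:42 2021 GMT".
// Sub-second precision is dropped.
std::string format_epoch_ns(std::uint64_t ns)
{
  const std::time_t secs = static_cast<std::time_t>(ns / 1000000000ULL);
  // std::gmtime returns a pointer to shared static storage (not thread-safe).
  const std::tm* utc = std::gmtime(&secs);
  if (utc == nullptr)
    return "";
  char buf[64];
  // In the default "C" locale, %a and %b produce English abbreviations.
  if (std::strftime(buf, sizeof buf, "%a %b %d %H:%M:%S %Y GMT", utc) == 0)
    return "";
  return std::string(buf);
}
```

Applied to the Timestamp shown above, this yields "Fri Nov 19 17:20:12 2021 GMT".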
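For each error class, XRT maintains the latest error code and its associated timestamp (indicating when the error was generated). The error information can be interpreted using https://github.com/Xilinx/XRT/blob/master/src/runtime_src/core/include/xrt_error_code.h. For example, Error Module (3): MODULE_AIE_CORE corresponds to XRT_ERROR_MODULE_AIE_CORE in the enum xrtErrorModule.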
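Because only the latest error is kept per class, an earlier error of the same class is no longer visible once a newer one is recorded. The following hypothetical model illustrates that bookkeeping, with class IDs as plain integers (for example 2 for CLASS_AIE, as in the sample output); it is a sketch of the described behavior, not XRT's driver code.

```cpp
#include <cstdint>
#include <map>

// One cached entry: raw error code plus the time it was generated (ns).
struct CachedError {
  std::uint64_t code;
  std::uint64_t timestamp_ns;
};

// Hypothetical model of "latest error per class"; not XRT's implementation.
class LatestErrorCache {
public:
  // Records an error; an older report never overwrites a newer one.
  void report(int error_class, std::uint64_t code, std::uint64_t timestamp_ns)
  {
    auto it = latest_.find(error_class);
    if (it == latest_.end() || timestamp_ns >= it->second.timestamp_ns)
      latest_[error_class] = CachedError{code, timestamp_ns};
  }

  // Returns the latest error of a class, or nullptr if none is cached.
  const CachedError* latest(int error_class) const
  {
    auto it = latest_.find(error_class);
    return it == latest_.end() ? nullptr : &it->second;
  }

  // Models the explicit clear: cached errors persist until this is called.
  void clear() { latest_.clear(); }

private:
  std::map<int, CachedError> latest_;
};
```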
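xbutil can be used to report errors. The error report gathers the errors from all error classes and sorts them by timestamp. The report also queries the driver for the time at which a reset was last requested.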
$ xbutil examine -r error -d 0
Asynchronous Errors
Time Class Module Driver Severity Error Code
Fri Nov 19 17:19:42 2021 GMT CLASS_AIE MODULE_AIE_CORE DRIVER_AIE SEVERITY_CRITICAL AIE_ACCESS
$ xbutil examine -r error -f json -o <OUTPUT_FILE> -d 0
{
  "schema_version": {
    "schema": "JSON",
    "creation_date": "Fri Nov 19 17:58:09 2021 GMT"
  },
  "devices": [
    {
      "interface_type": "pcie",
      "device_id": "0000:00:00.0",
      "asynchronous_errors": [
        {
          "time": {
            "epoch": "1637342382770339700",
            "timestamp": "Fri Nov 19 17:19:42 2021 GMT"
          },
          "class": "CLASS_AIE",
          "module": "MODULE_AIE_CORE",
          "severity": "SEVERITY_CRITICAL",
          "driver": "DRIVER_AIE",
          "error_code": {
            "error_id": "6",
            "error_msg": "AIE_ACCESS"
          }
        }
      ]
    }
  ]
}
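The aggregation described for the xbutil error report can be pictured as follows. This is a sketch of the idea only, not xbutil's source: the `ErrorEntry` layout is hypothetical, the oldest-first order is an assumption (the sample report contains a single entry), and splitting around the reset time is one plausible use of the reset timestamp the report queries.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flattened view of one row of the error report.
struct ErrorEntry {
  std::uint64_t timestamp_ns;
  std::string error_class;  // e.g. "CLASS_AIE"
  std::string error_code;   // e.g. "AIE_ACCESS"
};

struct ErrorReport {
  std::vector<ErrorEntry> since_reset;   // generated at or after the last reset
  std::vector<ErrorEntry> before_reset;  // generated before the last reset
};

// Sorts errors from all classes by timestamp (oldest first, an assumed
// order) and splits them around the last reset time.
ErrorReport build_report(std::vector<ErrorEntry> errors, std::uint64_t last_reset_ns)
{
  std::sort(errors.begin(), errors.end(),
            [](const ErrorEntry& a, const ErrorEntry& b) {
              return a.timestamp_ns < b.timestamp_ns;
            });
  ErrorReport report;
  for (const ErrorEntry& e : errors) {
    if (e.timestamp_ns < last_reset_ns)
      report.before_reset.push_back(e);
    else
      report.since_reset.push_back(e);
  }
  return report;
}
```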
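xbutil can also report the AI Engine run status and read registers for debugging. For example, the following command reads the AI Engine core status after the graph has executed.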
$ xbutil examine -r aie -d 0
--------------------------
1/1 [0000:00:00.0] : edge
--------------------------
Aie
Aie_Metadata
GRAPH[ 0] Name : gr
Status : unknown
SNo. Core [C:R] Iteration_Memory [C:R] Iteration_Memory_Addresses
[ 0] 23:1 23:1 16388
[ 1] 23:2 23:0 6980
[ 2] 23:3 23:1 4
[ 3] 24:1 24:0 4
[ 4] 24:2 24:2 4
[ 5] 24:3 24:1 4
[ 6] 25:1 25:1 4
Core [ 0]
Column : 23
Row : 1
Core:
Status : disabled, core_done
Program Counter : 0x00000308
Link Register : 0x00000290
Stack Pointer : 0x000340a0
DMA:
MM2S:
Channel:
Id : 0
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
Id : 1
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
S2MM:
Channel:
Id : 0
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
Id : 1
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
Locks:
0 : released_for_write
1 : released_for_write
2 : released_for_write
3 : released_for_write
4 : released_for_write
5 : released_for_write
6 : released_for_write
7 : released_for_write
8 : released_for_write
9 : released_for_write
10 : released_for_write
11 : released_for_write
12 : released_for_write
13 : released_for_write
14 : released_for_write
15 : released_for_write
Events:
core : 1, 2, 5, 22, 23, 24, 28, 29, 31, 32, 35, 36, 38, 39, 40, 44, 45, 47, 68
memory : 1, 43, 44, 45, 106, 113
......
Core [ 6]
Column : 25
Row : 1
Core:
Status : enabled, east_lock_stall
Program Counter : 0x000001e6
Link Register : 0x000000b0
Stack Pointer : 0x00030020
DMA:
MM2S:
Channel:
Id : 0
Channel Status : stalled_on_requesting_lock
Queue Size : 0
Queue Status : okay
Current BD : 2
Id : 1
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
S2MM:
Channel:
Id : 0
Channel Status : running
Queue Size : 0
Queue Status : okay
Current BD : 0
Id : 1
Channel Status : idle
Queue Size : 0
Queue Status : okay
Current BD : 0
Locks:
0 : acquired_for_write
1 : released_for_write
2 : released_for_write
3 : released_for_write
4 : released_for_write
5 : released_for_write
6 : released_for_write
7 : released_for_write
8 : released_for_write
9 : released_for_write
10 : released_for_write
11 : released_for_write
12 : released_for_write
13 : released_for_write
14 : released_for_write
15 : released_for_write
Events:
core : 1, 2, 5, 22, 26, 28, 29, 31, 32, 35, 38, 39, 44
memory : 1, 20, 21, 23, 35, 43, 44, 106, 113
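In the sample above, core [6] (column 25, row 1) reports east_lock_stall, its MM2S channel 0 is stalled_on_requesting_lock, and its lock 0 is acquired_for_write, whereas core [0] reports core_done. When the report covers many cores, a small filter helps surface such lines. The helper below is hypothetical and assumes the line format shown in the sample; it only matches the substring "stall" and does not interpret the states.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scans the text of `xbutil examine -r aie` and returns
// every line mentioning a stall, prefixed with the core section it is in.
std::vector<std::string> find_stalls(const std::string& report)
{
  std::vector<std::string> hits;
  std::istringstream in(report);
  std::string line;
  std::string core = "?";  // used if a stall appears before any core header
  while (std::getline(in, line)) {
    // Strip leading whitespace so indented and flattened output both work.
    const auto start = line.find_first_not_of(" \t");
    if (start == std::string::npos)
      continue;
    line.erase(0, start);
    if (line.rfind("Core [", 0) == 0)  // a new per-core section begins
      core = line;
    else if (line.find("stall") != std::string::npos)
      hits.push_back(core + ": " + line);
  }
  return hits;
}
```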
The following command reads a specific register for debugging.
$ xbutil advanced --read-aie-reg -d 0 0 25 Core_Status
Register Core_Status Value of Row:0 Column:25 is 0x00000201
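The raw register value has to be decoded bit by bit against the register description in the Versal Adaptive SoC AI Engine Register Reference (AM015). As a first step, the hypothetical helpers below extract the value from an xbutil output line and list which bits are set (0x00000201 has bits 0 and 9 set); mapping those positions to field names is deliberately left to the AM015 definition of Core_Status.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: pulls the hexadecimal value out of a line such as
// "Register Core_Status Value of Row:0 Column:25 is 0x00000201".
std::uint32_t parse_register_value(const std::string& line)
{
  const auto pos = line.rfind("0x");
  if (pos == std::string::npos)
    throw std::invalid_argument("no hexadecimal value found");
  // std::stoul accepts an optional "0x" prefix when the base is 16.
  return static_cast<std::uint32_t>(std::stoul(line.substr(pos), nullptr, 16));
}

// Lists the indices of the bits set in a 32-bit register value.
std::vector<unsigned> set_bits(std::uint32_t value)
{
  std::vector<unsigned> bits;
  for (unsigned i = 0; i < 32; ++i)
    if (value & (std::uint32_t{1} << i))
      bits.push_back(i);
  return bits;
}
```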
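For the AI Engine register definitions, see the Versal Adaptive SoC AI Engine Register Reference (AM015). For details on using the xbutil commands, see Xilinx Runtime (XRT) Architecture. For information about error analysis in the Vitis IDE, see Analyzing AI Engine Status.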