Before instantiating the hardware VM, users must first pre-compile their regular expression with the software compiler mentioned above to check whether the pattern is supported by the hardware VM. The compiler returns the error code XF_UNSUPPORTED_OPCODE if the pattern is not supported. If the input is a valid pattern, it returns the pass code ONIG_NORMAL along with the configurations (including the instruction list, the bit-set map, etc.). The user should then pass these configurations, together with the input message and its length in bytes, to the hardware VM to trigger the matching process. The hardware VM determines whether the input message matches and writes the offset addresses for each capturing group to the offset buffer.
Note that the hardware VM holds only the internal stack buffer. The user must allocate memory for the bit-set map, instruction buffer, message buffer, and offset buffer outside the hardware instantiation.
The size of the internal stack is set by the template parameter of the hardware VM. Since the stack is stored in URAM, STACK_SIZE should preferably be a multiple of 4096 so that no space in an individual URAM block is wasted. It is also critical to choose the internal stack size wisely: if it is too small, the hardware VM will overflow; if it is too large, no URAMs will be left on the board to instantiate more PUs for higher throughput.
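The sizing guidance above can be sketched as a small compile-time helper. This is only an illustration: the names kUramDepth and roundUpStackSize are hypothetical and not part of the library, and the 4096 figure is taken directly from the recommendation above.

```cpp
#include <cstddef>

// Granularity assumed from the "multiple of 4096" guidance above (hypothetical name).
constexpr std::size_t kUramDepth = 4096;

// Round a requested stack depth up to the next multiple of kUramDepth.
// A request of 0 is bumped to one full block so the stack is never empty.
constexpr std::size_t roundUpStackSize(std::size_t requested) {
    return requested == 0
               ? kUramDepth
               : ((requested + kUramDepth - 1) / kUramDepth) * kUramDepth;
}
```

Because the helper is constexpr, its result could be used directly as the STACK_SIZE template argument. It only rounds the size; deciding how deep the stack needs to be for a given pattern and message length is still up to the user.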
Code Example
This section gives an example of using the regex-VM in a C++ based HLS design.
To use the regex-VM you need to:
- Compile the software regular expression compiler by running the make command in the path L1/tests/text/regex_vm/re_compile
- Include the xf_re_compile.h header in the path L1/include/sw/xf_data_analytics/text and the oniguruma.h header in the path L1/tests/text/regex_vm/re_compile/lib/include
    #include "oniguruma.h"
    #include "xf_re_compile.h"
- Compile your regular expression by calling xf_re_compile
int r = xf_re_compile(pattern, bitset, instr_buff, instr_num, cclass_num, cpgp_num, NULL, NULL);
- Check the return value to see whether the pattern is valid and supported by the hardware VM. ONIG_NORMAL is returned if the pattern is valid, and XF_UNSUPPORTED_OPCODE is returned if the pattern is currently not supported.
    if (r != XF_UNSUPPORTED_OPCODE && r == ONIG_NORMAL) {
        // calling hardware VM here for acceleration
    }
- Once the regular expression is verified as a supported pattern, you can call the hardware VM to match any message:
    // for strlen and fixed-width integer types
    #include <cstring>
    #include <cstdint>
    // for data types used in VM
    #include "ap_int.h"
    // header for hardware VM implementation
    #include "xf_data_analytics/text/regexVM.hpp"

    // allocate memory for bit-set map
    unsigned int bitset[8 * cclass_num];
    // allocate memory for instruction buffer (derived from software compiler)
    uint64_t instr_buff[instr_num];
    // allocate memory for message
    ap_uint<32> msg_buff[MESSAGE_SIZE];
    // set up input message buffer according to input string (4 bytes per 32-bit word)
    unsigned str_len = strlen((const char*)in_str);
    for (unsigned i = 0; i < (str_len + 3) / 4; i++) {
        for (unsigned k = 0; k < 4; k++) {
            if (i * 4 + k < str_len) {
                msg_buff[i].range((k + 1) * 8 - 1, k * 8) = in_str[i * 4 + k];
            } else {
                // pad white-space at the end
                msg_buff[i].range((k + 1) * 8 - 1, k * 8) = ' ';
            }
        }
    }
    // allocate memory for offset addresses for each capturing group
    uint16_t offset_buff[2 * (cpgp_num + 1)];
    // initialize offset buffer (every entry, matching the allocation above)
    for (unsigned i = 0; i < 2 * (cpgp_num + 1); i++) {
        offset_buff[i] = -1;
    }
    ap_uint<2> match = 0;
    // call for hardware acceleration (basic hardware VM implementation)
    xf::data_analytics::text::regexVM<STACK_SIZE>((ap_uint<32>*)bitset, (ap_uint<64>*)instr_buff, msg_buff, str_len, match, offset_buff);
    // or call for hardware acceleration (performance optimized hardware VM implementation)
    xf::data_analytics::text::regexVM_opt<STACK_SIZE>((ap_uint<32>*)bitset, (ap_uint<64>*)instr_buff, msg_buff, str_len, match, offset_buff);
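The message packing step in the listing above can be tried on a host machine without the HLS headers. The sketch below is a stand-in that uses plain std::uint32_t words instead of ap_uint<32>; the function name packMessage is hypothetical and not part of the library. It follows the same layout as the loop above: four bytes per word, first byte in bits 7..0, with the last word padded with spaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a byte string into 32-bit words, four bytes per word, lowest byte first.
// Unused byte lanes in the final word are filled with ' ' (0x20).
std::vector<std::uint32_t> packMessage(const std::string& msg) {
    std::vector<std::uint32_t> words((msg.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t k = 0; k < 4; ++k) {
            const std::size_t pos = i * 4 + k;
            const unsigned char byte =
                pos < msg.size() ? static_cast<unsigned char>(msg[pos]) : ' ';
            words[i] |= static_cast<std::uint32_t>(byte) << (8 * k);
        }
    }
    return words;
}
```

For example, packing "abcde" yields two words, 0x64636261 and 0x20202065. The length passed to the hardware VM remains the original byte count (5 here), not the padded size.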
The match flag and the offset addresses for each capturing group are returned in match and offset_buff respectively, in the formats shown in the tables below.
Truth table for the 2-bit output match flag of the hardware VM:
Value | Description |
0 | mismatched |
1 | matched |
2 | internal stack overflow |
3 | reserved for future use |
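Host code that consumes the flag might map the table above onto an enumeration, as in the following sketch. The names MatchStatus, decodeMatch, and describe are hypothetical helpers for illustration and are not provided by the library.

```cpp
#include <string>

// Values mirror the truth table of the 2-bit match flag.
enum class MatchStatus : unsigned {
    Mismatched = 0,
    Matched = 1,
    StackOverflow = 2,
    Reserved = 3
};

// Keep only the two low bits, since the flag is 2 bits wide.
inline MatchStatus decodeMatch(unsigned rawFlag) {
    return static_cast<MatchStatus>(rawFlag & 0x3u);
}

// Human-readable description, using the wording of the table.
inline std::string describe(MatchStatus status) {
    switch (status) {
        case MatchStatus::Mismatched: return "mismatched";
        case MatchStatus::Matched: return "matched";
        case MatchStatus::StackOverflow: return "internal stack overflow";
        case MatchStatus::Reserved: break;
    }
    return "reserved for future use";
}
```

Treating value 2 separately from 0 matters in practice: a stack overflow says nothing about whether the message matches, so a caller would probably want to retry with a larger STACK_SIZE or fall back to software rather than report a mismatch.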
Arrangement of the offset buffer offsetBuff:
Address | Description |
0 | start position of the whole matched string |
1 | end position of the whole matched string |
2 | start position of the 1st capturing group |
3 | end position of the 1st capturing group |
4 | start position of the 2nd capturing group |
5 | end position of the 2nd capturing group |
… | … |
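Given this layout, the text of a capturing group could be recovered on the host as sketched below. The helper captureGroup is hypothetical. It rests on two assumptions that are not confirmed by the text above: that the end position is exclusive (one past the last matched byte, as in Oniguruma regions), and that entries still holding 0xFFFF (the value left by the -1 initialization of a uint16_t buffer) mean the group did not participate in the match.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sentinel left by initializing a uint16_t entry with -1 (assumed to mean "unset").
constexpr std::uint16_t kUnsetOffset = 0xFFFF;

// Return the text of capturing group `group` (0 = whole match).
// Returns an empty string if the group is out of range, unset, or inconsistent.
// Assumption: the end offset is exclusive.
std::string captureGroup(const std::string& msg,
                         const std::vector<std::uint16_t>& offsets,
                         std::size_t group) {
    if (2 * group + 1 >= offsets.size()) return "";
    const std::uint16_t start = offsets[2 * group];
    const std::uint16_t end = offsets[2 * group + 1];
    if (start == kUnsetOffset || end == kUnsetOffset) return "";
    if (start > end || end > msg.size()) return "";
    return msg.substr(start, static_cast<std::size_t>(end - start));
}
```

With the message "key=value" and offsets {0, 9, 0, 3, 4, 9}, group 0 would be the whole string, group 1 "key", and group 2 "value". If the hardware turns out to report an inclusive end position, the length passed to substr would need one more byte.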