Before instantiating the reEngine, users must pre-compile their regular expression with the software compiler provided in L1 to check whether the pattern is supported by the current implementation of the hardware VM. The compiler returns the error code XF_UNSUPPORTED_OPCODE if the pattern is not supported. If the input is a valid pattern, it returns the pass code ONIG_NORMAL along with the configurations (instruction list, bit-set map, etc.). Users should then pass these configurations, the input messages, and the corresponding message lengths in bytes, in the format defined above, to the reEngineKernel to trigger the matching process. The reEngineKernel automatically splits them into individual inputs, feeds them into the buffers required by the L1 regex-VM, collects the results once matching is done, and places them into the output result buffer accordingly.
As mentioned in Regular Expression Virtual Machine (regex-VM), the hardware VM needs the bit-set map, instruction buffer, message buffer, and offset buffer, but does not manage them itself. Users working at L2 should not have to deal with these L1 details, so the reEngineKernel allocates these buffers automatically according to the template parameters given by users.
Code Example
The following section gives a usage example of reEngine in a C++ based HLS design.
First, the formats of the three input buffers of reEngineKernel are shown here:
cfg_buff
msg_buff
len_buff
To use the regex-VM you need to:
- Compile the software regular expression compiler by running the make command in path L1/tests/text/regex_vm/re_compile
- Include the xf_re_compile.h header in path L1/include/sw/xf_data_analytics/text and the oniguruma.h header in path L1/tests/text/regex_vm/re_compile/lib/include
    #include "oniguruma.h"
    #include "xf_re_compile.h"
- Compile your regular expression by calling xf_re_compile
    // Number of instructions translated from the pattern
    unsigned int instr_num = 0;
    // Number of character classes in the pattern
    unsigned int cclass_num = 0;
    // Number of capturing groups in the pattern
    unsigned int cpgp_num = 0;
    // Bit set map
    unsigned int* bitset = new unsigned int[8 * CCLASS_NM];
    // Configuration buffer
    uint64_t* cfg_buff = aligned_alloc<uint64_t>(INSTRUC_SIZE);
    // Suppose 1k bytes is long enough for the names of all capturing groups
    uint8_t* cpgp_name_val = aligned_alloc<uint8_t>(1024);
    // Suppose the number of capturing groups is less than 20
    uint32_t* cpgp_name_offt = aligned_alloc<uint32_t>(20);
    // Leave two 64-bit blocks for the configuration headers
    int r = xf_re_compile(pattern, bitset, cfg_buff + 2, &instr_num, &cclass_num, &cpgp_num,
                          cpgp_name_val, cpgp_name_offt);
    // Print a name table for all of the capturing groups
    printf("Name Table\n");
    for (unsigned int i = 0; i < cpgp_num; i++) {
        printf("Group-%u: ", i);
        // Name i occupies bytes [cpgp_name_offt[i], cpgp_name_offt[i + 1])
        for (uint32_t j = 0; j < cpgp_name_offt[i + 1] - cpgp_name_offt[i]; j++) {
            printf("%c", cpgp_name_val[j + cpgp_name_offt[i]]);
        }
        printf("\n");
    }
- Check the return value to see whether the pattern is valid and supported by the hardware VM. ONIG_NORMAL is returned if the pattern is valid, and XF_UNSUPPORTED_OPCODE is returned if it is not currently supported.
    if (r != XF_UNSUPPORTED_OPCODE && r == ONIG_NORMAL) {
        // Prepare the buffers and call reEngine for acceleration here
    }
- Once the regular expression is verified as a supported pattern, you may prepare the input buffers and get the results as follows:
    // Header for reEngine
    #include "re_engine_kernel.hpp"
    // Headers for reading the log file as std::string
    #include <iostream>
    #include <fstream>
    #include <string>

    // Total number of configuration blocks
    // leave 2 blocks for configuration header
    unsigned int cfg_nm = 2 + instr_num;
    // Message buffer (64-bit width to fully utilize the 2 memory ports of BRAMs)
    uint64_t* msg_buff = aligned_alloc<uint64_t>(MAX_MSG_SZ);
    // Length buffer
    uint16_t* len_buff = aligned_alloc<uint16_t>(MAX_MSG_NM);
    // Append bit-set map to the tail of instruction list
    for (unsigned int i = 0; i < cclass_num * 4; i++) {
        uint64_t tmp = bitset[i * 2 + 1];
        tmp = tmp << 32;
        tmp += bitset[i * 2];
        cfg_buff[cfg_nm++] = tmp;
    }
    // Set configuration header accordingly
    typedef union {
        struct {
            uint32_t instr_nm;
            uint16_t cc_nm;
            uint16_t gp_nm;
        } head_st;
        uint64_t d;
    } cfg_info;
    cfg_info cfg_h;
    cfg_h.head_st.instr_nm = instr_num;
    cfg_h.head_st.cc_nm = cclass_num;
    cfg_h.head_st.gp_nm = cpgp_num;
    cfg_buff[0] = cfg_nm;
    cfg_buff[1] = cfg_h.d;
    // String of each line in the log
    std::string line;
    // We provide a 5k line apache log
    std::ifstream log_file("log_data/access_5k.log");
    if (log_file.is_open()) {
        // Read the apache log line-by-line
        // (offt, msg_nm, and writeOneLine are not defined in this excerpt)
        while (getline(log_file, line)) {
            if (line.size() > 0) {
                if (writeOneLine(msg_buff, len_buff, offt, msg_nm, line) != 0) {
                    return -1;
                }
            }
        }
        // Set the header of message buffer (number of message blocks in 64-bit)
        msg_buff[0] = offt;
        // Set the header of length buffer (the first 2 blocks concatenated
        // give the total number of messages in msg_buff)
        len_buff[0] = msg_nm / 65536;
        len_buff[1] = msg_nm % 65536;
    } else {
        printf("Opening input log file failed.\n");
        return -1;
    }
    // Result buffer
    uint32_t* out_buff = aligned_alloc<uint32_t>((cpgp_num + 1) * msg_nm);
    // Call reEngine
    reEngineKernel(reinterpret_cast<ap_uint<64>*>(cfg_buff),
                   reinterpret_cast<ap_uint<64>*>(msg_buff),
                   reinterpret_cast<ap_uint<16>*>(len_buff),
                   reinterpret_cast<ap_uint<32>*>(out_buff));
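The packing steps in the listing above (bit-set words into 64-bit blocks, the configuration header word, and the two-block message count) can be isolated into small host-side helpers. The sketch below is only an illustration: packBitset, packCfgHeader, and splitMsgCount are hypothetical names that are not part of the library, and packCfgHeader assumes a little-endian host, where explicit shifts yield the same 64-bit value as the union used in the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not a library API): packs the bit-set map into 64-bit
// configuration blocks. Each character class contributes 8 x 32-bit words,
// i.e. 4 x 64-bit blocks; word 2*i fills the low half of block i and
// word 2*i+1 fills the high half, as in the loop of the listing.
std::vector<uint64_t> packBitset(const uint32_t* bitset, unsigned int cclass_num) {
    assert(bitset != nullptr || cclass_num == 0);
    std::vector<uint64_t> blocks;
    blocks.reserve(cclass_num * 4);
    for (unsigned int i = 0; i < cclass_num * 4; ++i) {
        const uint64_t hi = bitset[i * 2 + 1];
        const uint64_t lo = bitset[i * 2];
        blocks.push_back((hi << 32) | lo);
    }
    return blocks;
}

// Hypothetical helper: builds the second configuration header word with
// explicit shifts. Assumption: little-endian host, so this matches the
// {uint32_t instr_nm; uint16_t cc_nm; uint16_t gp_nm;} union layout.
uint64_t packCfgHeader(uint32_t instr_nm, uint16_t cc_nm, uint16_t gp_nm) {
    return static_cast<uint64_t>(instr_nm) |
           (static_cast<uint64_t>(cc_nm) << 32) |
           (static_cast<uint64_t>(gp_nm) << 48);
}

// Hypothetical helper: splits the message count into the (high, low) 16-bit
// halves that the listing stores in len_buff[0] and len_buff[1].
std::pair<uint16_t, uint16_t> splitMsgCount(uint32_t msg_nm) {
    return {static_cast<uint16_t>(msg_nm / 65536),
            static_cast<uint16_t>(msg_nm % 65536)};
}
```

Using shifts instead of the union avoids relying on type punning, at the cost of fixing the byte order explicitly; whether that trade-off is appropriate depends on the host platform.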
The match flag and offset addresses for each capturing group are presented in out_buff in the format shown in the figure below:
out_buff
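When reporting per-group results, it can help to label each capturing group with the name recorded by the compiler. The sketch below extracts those names from the flat name buffer and offset table filled by xf_re_compile, following the same indexing as the name-table printing loop in the compile step. groupName and groupNames are hypothetical helpers, not library functions, and the sketch assumes the offset table holds one more entry than the number of groups, as that loop implies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a library API): returns the name of capturing
// group i. Name i occupies bytes [name_offt[i], name_offt[i + 1]) of name_val,
// so name_offt is assumed to hold group_num + 1 entries.
std::string groupName(const uint8_t* name_val, const uint32_t* name_offt,
                      unsigned int group_num, unsigned int i) {
    assert(i < group_num);
    const uint32_t begin = name_offt[i];
    const uint32_t end = name_offt[i + 1];
    assert(begin <= end);
    return std::string(reinterpret_cast<const char*>(name_val) + begin, end - begin);
}

// Hypothetical helper: collects the names of all capturing groups in order.
std::vector<std::string> groupNames(const uint8_t* name_val, const uint32_t* name_offt,
                                    unsigned int group_num) {
    std::vector<std::string> names;
    names.reserve(group_num);
    for (unsigned int i = 0; i < group_num; ++i) {
        names.push_back(groupName(name_val, name_offt, group_num, i));
    }
    return names;
}
```

With the buffers from the compile step, a call such as groupNames(cpgp_name_val, cpgp_name_offt, cpgp_num) would return the names in group order.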