Overview - 2023.2 English

Vitis Libraries

The regex-VM targets the conversion of unstructured text, such as log files, into structured data. State-of-the-art high-throughput matching algorithms that compare one string against many patterns, such as the one used in Hyperscan, therefore do not work well in our target context. We chose a VM-based approach because it allows us to offer a drop-in replacement in popular data transformation tools, where regular expressions are often written in a Perl, Python, or Ruby dialect.

The regex-VM consists of two parts: a software compiler written in C and a hardware virtual machine (VM) in C++.

  1. Software compiler: compiles any regular expression given by the user into an instruction list, along with the corresponding bit-set map, the number of instructions, character classes, and capturing groups, and the name of each capturing group (if specified in the input pattern).
  2. Hardware VM: takes the compiler outputs described above to construct a matcher for the string in the message buffer, and emits a 2-bit match flag indicating whether the input string matched the pattern or an internal stack overflow occurred. Furthermore, if the input string matched, the offset addresses of each capturing group are provided in the output offset buffer. Users can extract the sub-strings of interest from the whole input string according to the information in that buffer.
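
The last step, picking sub-strings out of the input string using the offset buffer, can be sketched on the host side as follows. This is a minimal illustration, not the library's API: the `MatchFlag` values, the `extract_groups` helper, and the buffer layout (one begin/end pair per capturing group, with the end offset exclusive and group 0 covering the whole match) are all assumptions made for the example. Consult the library headers for the actual flag encoding and offset format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoding of the 2-bit match flag (assumed for illustration only).
enum class MatchFlag : std::uint8_t { Mismatch = 0, Match = 1, StackOverflow = 2 };

// Hypothetical helper: slices the input message into capturing-group sub-strings.
// Assumed layout: offsets[2*i] is the begin offset and offsets[2*i + 1] is the
// exclusive end offset of capturing group i.
std::vector<std::string> extract_groups(const std::string& msg,
                                        MatchFlag flag,
                                        const std::vector<std::uint32_t>& offsets) {
    std::vector<std::string> groups;
    // Offsets are only meaningful when the VM reported a successful match.
    if (flag != MatchFlag::Match) {
        return groups;
    }
    for (std::size_t i = 0; i + 1 < offsets.size(); i += 2) {
        const std::uint32_t begin = offsets[i];
        const std::uint32_t end = offsets[i + 1];
        // Guard against malformed pairs; emit an empty string for such a group.
        if (begin > end || end > msg.size()) {
            groups.emplace_back();
            continue;
        }
        groups.push_back(msg.substr(begin, end - begin));
    }
    return groups;
}
```

Under these assumptions, a log line such as `2023-10-01 ERROR disk full` matched against a pattern with three capturing groups would yield four entries: the whole match followed by the date, the level, and the message text. A mismatch or stack-overflow flag yields an empty result, so callers can treat both as "no structured record produced" and handle the overflow case separately if needed.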