Due to limited resources and a strict release timeline, the current release implements only the most frequently used OPs from the full OP list defined by the Oniguruma library, which is sufficient to support common regular expressions. The following table shows which OPs are supported:
OP list | Supported |
FINISH | NO |
END | YES |
STR_1 | YES |
STR_2 | YES |
STR_3 | YES |
STR_4 | YES |
STR_5 | YES |
STR_N | YES |
STR_MB2N1 | NO |
STR_MB2N2 | NO |
STR_MB2N3 | NO |
STR_MB2N | NO |
STR_MB3N | NO |
STR_MBN | NO |
CCLASS | YES |
CCLASS_MB | NO |
CCLASS_MIX | NO |
CCLASS_NOT | YES |
CCLASS_MB_NOT | NO |
CCLASS_MIX_NOT | NO |
ANYCHAR | YES |
ANYCHAR_ML | NO |
ANYCHAR_STAR | YES |
ANYCHAR_ML_STAR | NO |
ANYCHAR_STAR_PEEK_NEXT | NO |
ANYCHAR_ML_STAR_PEEK_NEXT | NO |
WORD | NO |
WORD_ASCII | NO |
NO_WORD | NO |
NO_WORD_ASCII | NO |
WORD_BOUNDARY | NO |
NO_WORD_BOUNDARY | NO |
WORD_BEGIN | NO |
WORD_END | NO |
TEXT_SEGMENT_BOUNDARY | NO |
BEGIN_BUF | YES |
END_BUF | YES |
BEGIN_LINE | YES |
END_LINE | YES |
SEMI_END_BUF | NO |
CHECK_POSITION | NO |
BACKREF1 | NO |
BACKREF2 | NO |
BACKREF_N | NO |
BACKREF_N_IC | NO |
BACKREF_MULTI | NO |
BACKREF_MULTI_IC | NO |
BACKREF_WITH_LEVEL | NO |
BACKREF_WITH_LEVEL_IC | NO |
BACKREF_CHECK | NO |
BACKREF_CHECK_WITH_LEVEL | NO |
MEM_START | YES |
MEM_START_PUSH | YES |
MEM_END_PUSH | NO |
MEM_END_PUSH_REC | NO |
MEM_END | YES |
MEM_END_REC | NO |
FAIL | YES |
JUMP | YES |
PUSH | YES |
PUSH_SUPER | NO |
POP | YES |
POP_TO_MARK | YES |
PUSH_OR_JUMP_EXACT1 | YES |
PUSH_IF_PEEK_NEXT | NO |
REPEAT | YES |
REPEAT_NG | NO |
REPEAT_INC | YES |
REPEAT_INC_NG | NO |
EMPTY_CHECK_START | NO |
EMPTY_CHECK_END | NO |
EMPTY_CHECK_END_MEMST | NO |
EMPTY_CHECK_END_MEMST_PUSH | NO |
MOVE | NO |
STEP_BACK_START | YES |
STEP_BACK_NEXT | NO |
CUT_TO_MARK | NO |
MARK | YES |
SAVE_VAL | NO |
UPDATE_VAR | NO |
CALL | NO |
RETURN | NO |
CALLOUT_CONTENTS | NO |
CALLOUT_NAME | NO |
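As an illustration of how the table above might be used, the following minimal sketch checks a list of OP names against the OPs marked YES. The names `SUPPORTED_OPS` and `unsupported_ops` are hypothetical helpers introduced here, not part of the library, and the sketch assumes the OP names of a compiled pattern have already been obtained by some other means.

```python
# OPs marked YES in the table above (hypothetical helper data, not a library API).
SUPPORTED_OPS = frozenset({
    "END",
    "STR_1", "STR_2", "STR_3", "STR_4", "STR_5", "STR_N",
    "CCLASS", "CCLASS_NOT",
    "ANYCHAR", "ANYCHAR_STAR",
    "BEGIN_BUF", "END_BUF", "BEGIN_LINE", "END_LINE",
    "MEM_START", "MEM_START_PUSH", "MEM_END",
    "FAIL", "JUMP", "PUSH", "POP", "POP_TO_MARK", "PUSH_OR_JUMP_EXACT1",
    "REPEAT", "REPEAT_INC",
    "STEP_BACK_START", "MARK",
})


def unsupported_ops(program_ops):
    """Return the OP names that are not marked YES, in first-seen order."""
    seen = set()
    rejected = []
    for op in program_ops:
        if op not in SUPPORTED_OPS and op not in seen:
            seen.add(op)
            rejected.append(op)
    return rejected


# Assumed example input: OP names of an already compiled pattern.
print(unsupported_ops(["WORD", "STR_1", "BACKREF1", "END"]))
```

An empty result means every OP in the list is marked YES in the table; any returned name points to a construct that the current release does not cover.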
Consequently, the supported atomic regular expressions and their descriptions are as follows:
Regex | Description |
^ | asserts position at the start of a line |
$ | asserts position at the end of a line |
\A | asserts position at the start of the string |
\z | asserts position at the end of the string |
\ca | matches the control sequence CTRL+a |
\C | matches one data unit, even in UTF mode (best avoided) |
\c\\ | matches the control sequence CTRL+\ |
\s | matches any whitespace character (equal to [\r\n\t\f\v ] ) |
\S | matches any non-whitespace character (equal to [^\r\n\t\f\v ] ) |
\d | matches a digit (equal to [0-9] ) |
\D | matches any character that’s not a digit (equal to [^0-9] ) |
\h | matches any horizontal whitespace character (equal to [[:blank:]] ) |
\H | matches any character that’s not a horizontal whitespace character |
\w | matches any word character (equal to [a-zA-Z0-9_] ) |
\W | matches any non-word character (equal to [^a-zA-Z0-9_] ) |
\^ | matches the character ^ literally |
\$ | matches the character $ literally |
\N | matches any non-newline character |
\g'0' | recurses the 0th subpattern |
\o{101} | matches the character A (octal 101) |
\x61 | matches the character a (hex 61) literally |
\x{1 2} | matches 1 (hex) or 2 (hex) |
\17 | matches the character with octal code 17 literally |
abc | matches abc literally |
. | matches any character (except for line terminators) |
| | alternation (matches either the left or the right alternative) |
[^a] | matches any single character other than a |
[a-c] | matches a, b, or c |
[abc] | matches a, b, or c |
[:upper:] | matches an uppercase letter [A-Z] |
a? | matches a zero or one time (greedy) |
a* | matches a between zero and unlimited times (greedy) |
a+ | matches a between one and unlimited times (greedy) |
a?? | matches a between zero and one times (lazy) |
a*? | matches a between zero and unlimited times (lazy) |
a+? | matches a between one and unlimited times (lazy) |
a{2} | matches a exactly 2 times |
a{0,} | matches a between zero and unlimited times |
a{1,2} | matches a one or two times |
{,} | matches {,} literally |
(?#blabla) | comment blabla |
(a) | capturing group, matches a literally |
(?<name1> a) | named capturing group name1, matches a literally |
(?:) | non-capturing group |
(?i) | matches the remainder of the pattern case-insensitively (i modifier) |
(?<!a)z | matches any occurrence of z that is not preceded by a (negative look-behind) |
z(?!a) | matches any occurrence of z that is not followed by a (negative look-ahead) |
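To make a few of the descriptions above concrete, the following sketch exercises some of the listed constructs with Python's built-in `re` module. This is only a stand-in for illustration: `re` is not the engine described here and its syntax differs from Oniguruma in places (for example, `^` and `$` need `re.MULTILINE` to act as line anchors), so only constructs that are assumed to behave the same in both are used.

```python
import re

text = "ab12\ncd"

# ^ asserts the start of a line (Python's re needs MULTILINE for this).
line_starts = re.findall(r"^\w", text, flags=re.MULTILINE)

# \A asserts the start of the whole string only.
string_start = re.findall(r"\A\w", text)

# \d matches a digit, [a-c] matches a, b, or c.
digits = re.findall(r"\d", text)
letters = re.findall(r"[a-c]", text)

# Greedy a+ takes as many a's as possible, lazy a+? as few as possible.
greedy = re.match(r"a+", "aaa").group()
lazy = re.match(r"a+?", "aaa").group()

# a{1,2} matches a one or two times.
bounded = re.findall(r"a{1,2}", "aaa")

# Negative look-behind and look-ahead, reported as match positions.
not_after_a = [m.start() for m in re.finditer(r"(?<!a)z", "az bz z")]
not_before_a = [m.start() for m in re.finditer(r"z(?!a)", "za zb")]

print(line_starts, string_start, digits, letters)
print(greedy, lazy, bounded, not_after_a, not_before_a)
```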
Attention
- The only encoding supported in the current release is ASCII (extended ASCII codes are excluded).
- Nested repetition is not supported.
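The two restrictions above could be screened for before a pattern is handed to the library. The sketch below is a rough heuristic under stated assumptions, not part of the library: `is_ascii_only` and `has_nested_repetition` are hypothetical helper names, and "nested repetition" is assumed to mean a quantified group whose body already contains a quantifier, such as `(a*)*`. The scanner is deliberately conservative and may flag patterns in which `{` is only a literal.

```python
QUANTIFIER_START = set("*+?{")


def is_ascii_only(pattern: str) -> bool:
    """True if every character is 7-bit ASCII (no extended ASCII)."""
    return all(ord(ch) < 0x80 for ch in pattern)


def has_nested_repetition(pattern: str) -> bool:
    """Heuristic: is a quantified group wrapping another quantifier?"""
    stack = [False]  # one flag per open group: "body contains a quantifier"
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":            # escaped character: skip it entirely
            i += 2
            continue
        if ch == "[":             # character class: skip to its closing ]
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
            continue
        if ch == "(":
            stack.append(False)
            if i + 1 < n and pattern[i + 1] == "?":
                i += 2            # "(?" introduces group options, not a quantifier
                continue
        elif ch == ")":
            inner = stack.pop() if len(stack) > 1 else False
            quantified = i + 1 < n and pattern[i + 1] in QUANTIFIER_START
            if inner and quantified:
                return True
            if inner:
                stack[-1] = True  # the enclosing group now contains a quantifier
        elif ch in QUANTIFIER_START:
            stack[-1] = True
        i += 1
    return False


for candidate in ["(ab)*c+", "(a*)*", "(?:a{2}){3}"]:
    print(candidate, is_ascii_only(candidate), has_nested_repetition(candidate))
```

A pattern that fails either check is likely outside what the current release supports; a pattern that passes both may still use an unsupported OP, so this is a first filter only.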