From 835bd709a8ebfe5dbe4d671a60a841a0f9a18879 Mon Sep 17 00:00:00 2001
From: Akos Kiss
Date: Fri, 19 Jun 2020 01:05:13 +0200
Subject: [PATCH] Be more precise in documentation of char set when escaping is needed

---
 doc/lexer-rules.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/lexer-rules.md b/doc/lexer-rules.md
index 5070f4790..b12f54a88 100644
--- a/doc/lexer-rules.md
+++ b/doc/lexer-rules.md
@@ -58,7 +58,7 @@ Match that character or sequence of characters. E.g., ’while’ or ’=’.
 [char set]
-Match one of the characters specified in the character set. Interpret x-y as the set of characters between range x and y, inclusively. The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, \f, \uXXXX, and \u{XXXXXX}. To get ], \, or - you must escape them with \.
+Match one of the characters specified in the character set. Interpret x-y as the set of characters between range x and y, inclusively. The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, \f, \uXXXX, and \u{XXXXXX}. To get ] or \ you must escape them with \. To get - you must escape it with \ too, except for the case when - is the first or last character in the set.

You can also include all characters matching Unicode properties (general category, boolean, or enumerated including scripts and blocks) with \p{PropertyName} or \p{EnumProperty=Value}. (You can invert the test with \P{PropertyName} or \P{EnumProperty=Value}).

@@ -90,6 +90,8 @@ UNICODE_ID : [\p{Alpha}\p{General_Category=Other_Letter}] [\p{Alnum}\p{General_C
 EMOJI : [\u{1F4A9}\u{1F926}] ; // note Unicode code points > U+FFFF
 DASHBRACK : [\-\]]+ ; // match - or ] one or more times
+
+DASH : [---] ; // match a single -, i.e., "any character" between - and - (note first and last - not escaped)
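
The dash rule in the updated wording can be checked by hand with a small model. The following is a minimal Python sketch of the rule as the patch describes it, not ANTLR's actual implementation. The function name `expand_char_set` is hypothetical, and only the `\]`, `\\`, and `\-` escapes are modeled (`\n`, `\uXXXX`, `\p{...}` and the rest are deliberately rejected).

```python
def expand_char_set(body):
    """Expand the text between [ and ] into the set of characters it matches.

    Hypothetical helper modeling only the rules quoted in the patch:
    backslash escapes for ], \\ and -, x-y ranges, and an unescaped -
    taken literally when it is the first or last character of the set.
    """
    # Step 1: split the body into (character, was_escaped) items.
    items = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            if body[i + 1] not in "]\\-":
                raise ValueError("escape not modeled in this sketch")
            items.append((body[i + 1], True))
            i += 2
        else:
            items.append((body[i], False))
            i += 1

    # Step 2: walk the items; an unescaped '-' forms a range only when it
    # has an item on both sides, so a first or last '-' stays literal.
    result = set()
    k = 0
    while k < len(items):
        low = items[k][0]
        if k + 2 < len(items) and items[k + 1] == ("-", False):
            high = items[k + 2][0]
            if ord(high) < ord(low):
                raise ValueError("range out of order")
            result.update(chr(c) for c in range(ord(low), ord(high) + 1))
            k += 3
        else:
            result.add(low)
            k += 1
    return result
```

Under this model, `expand_char_set("---")` reads the middle `-` as a range operator between two literal dashes, which mirrors the `DASH` example, while `expand_char_set("\\-\\]")` corresponds to the escaped `DASHBRACK` form.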