Regular expression repetition characters
Notice that the character "?" pulls double duty. When used as the only repetition specifier, "?" means to match zero or one occurrence of the previous character. For example, /a?/ matches either a single "a" or nothing at all. When used as the second character of a repetition specifier, as in "*?", "+?", and "{n,}?", the question mark indicates a minimal match. What is meant by a minimal match?
It is the counterpart to a maximal match, which is the default for JavaScript and Perl regular expressions. A maximal match includes the maximum number of characters in a text that qualify to match a regular expression pattern. For example, in the string "one two three", the pattern /o.*e/ matches the text "one two three". Why? The pattern says to match text that begins with the character "o", followed by zero or more of any character, up to the character "e". Since the default is a maximal match, the whole string is matched, because it begins with "o" and ends with "e". Often, this maximal match behavior is not what is expected or desired.
Now consider a similar match using the minimal match specifier. The string is still "one two three", but the pattern becomes /o.*?e/. Notice that the only difference is the addition of a question mark "?" as the second repetition character after the "*". The text matched this time is "one", which is the minimal number of characters that satisfy the regular expression pattern.
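The two patterns discussed so far can be tried side by side. The following is a minimal sketch using the string and patterns from the text; the variable names are chosen only for illustration.

```javascript
// The sample string and both patterns come from the discussion above.
const text = "one two three";

// Maximal (default) match: ".*" takes as many characters as it can,
// so the match runs to the last "e" in the string.
const maximal = text.match(/o.*e/)[0];

// Minimal match: ".*?" takes as few characters as it can,
// so the match stops at the first "e" after the "o".
const minimal = text.match(/o.*?e/)[0];

console.log(maximal); // one two three
console.log(minimal); // one
```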
So, it might be a good habit to begin reading regular expression patterns with a maximal and minimal vocabulary. As an example, let's spell out how we could read the two patterns in the current example.
• "o.*e" - match text that begins with "o" and has the maximum number of characters possible until the last "e" is encountered.
• "o.*?e" - match text that begins with "o" and has the minimum number of characters possible until the first "e" is encountered.
Sometimes a maximal match is called a greedy match and a minimal match is called a non-greedy match.
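A common place where the greedy default surprises people is text with repeated delimiters. The sketch below assumes a made-up sample sentence containing two quoted words; it is only an illustration of the "last" versus "first" reading described above, not a general-purpose quote parser.

```javascript
// Hypothetical sample input with two quoted words.
const sentence = 'say "yes" or "no"';

// Greedy: from the first quote to the LAST quote in the string.
const greedyQuote = sentence.match(/".*"/)[0];

// Non-greedy: from the first quote to the FIRST quote that follows it.
const lazyQuote = sentence.match(/".*?"/)[0];

// With the "g" flag, match() returns every non-greedy match in turn.
const allQuotes = sentence.match(/".*?"/g);

console.log(greedyQuote); // "yes" or "no"
console.log(lazyQuote);   // "yes"
console.log(allQuotes);   // [ '"yes"', '"no"' ]
```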
Repetition | How many characters matched |
? | Match zero or one occurrence of the previous character or subpattern. Same as {0,1}. |
* | Match zero or more occurrences of the previous character or subpattern. A maximal match, that is, match as many characters as will fulfill the regular expression. Same as {0,}. |
*? | Match zero or more occurrences of the previous character or subpattern. A minimal match, that is, match as few characters as will fulfill the regular expression. Same as {0,}?. |
+ | Match one or more occurrences of the previous character or subpattern. A maximal match, that is, match as many characters as will fulfill the regular expression. Same as {1,}. |
+? | Match one or more occurrences of the previous character or subpattern. A minimal match, that is, match as few characters as will fulfill the regular expression. Same as {1,}?. |
{n} | Match exactly n occurrences of the previous character or subpattern. |
{n,} | Match n or more occurrences of the previous character or subpattern. A maximal match, that is, match as many characters as will fulfill the regular expression. |
{n,}? | Match n or more occurrences of the previous character or subpattern. A minimal match, that is, match as few characters as will fulfill the regular expression. |
{n,m} | Match the previous character or subpattern at least n times but not more than m times. |
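Each repetition specifier in the table can be checked against one short input. The following sketch assumes a sample string of five "a" characters; firstMatch is a hypothetical helper written only for this illustration, and the specific counts (3, 2, 4) stand in for n and m.

```javascript
// Hypothetical helper: return the first match of `pattern` in `text`,
// or null when the pattern does not match anywhere.
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m === null ? null : m[0];
}

const sample = "aaaaa"; // assumed sample input: five "a" characters

const results = {
  "a?":     firstMatch(/a?/, sample),     // zero or one          -> "a"
  "a*":     firstMatch(/a*/, sample),     // zero or more, max    -> "aaaaa"
  "a*?":    firstMatch(/a*?/, sample),    // zero or more, min    -> ""
  "a+":     firstMatch(/a+/, sample),     // one or more, max     -> "aaaaa"
  "a+?":    firstMatch(/a+?/, sample),    // one or more, min     -> "a"
  "a{3}":   firstMatch(/a{3}/, sample),   // exactly 3            -> "aaa"
  "a{2,}":  firstMatch(/a{2,}/, sample),  // 2 or more, max       -> "aaaaa"
  "a{2,}?": firstMatch(/a{2,}?/, sample), // 2 or more, min       -> "aa"
  "a{2,4}": firstMatch(/a{2,4}/, sample), // between 2 and 4, max -> "aaaa"
};

// "?" also accepts zero occurrences: against "bbb" the pattern still
// succeeds, with an empty match at the start of the string.
const emptyMatch = firstMatch(/a?/, "bbb");

console.log(results);
console.log(JSON.stringify(emptyMatch)); // ""
```

Note that the minimal forms return the shortest text that still satisfies the pattern at the first position where a match is possible, which is why "a*?" yields an empty string here.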
Regular expression character classes