Introduction to using regular expressions
A regular expression describes a pattern of characters. Regular expressions are typically used to verify that a text value conforms to a particular pattern (such as verifying that a user-entered phone number has the proper number of digits) or to replace portions of a text value that match a particular pattern.
Matching a set of characters
In the land of regular expressions, most characters match themselves. The only exceptions are the following 12 special characters:
$()*+.?[\^{|
These characters have special meanings in regular expressions. If you want your regex to match any of them, precede them with a backslash. For example, to look for a literal dollar sign, your regex needs to use this:
\$
This is known as escaping the special character.
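As a small illustration of escaping, here is a sketch using Python's built-in re module. The sample string and variable names are made up for this example; re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

# The 12 special characters listed above (the backslash is doubled
# only because this is an ordinary Python string literal).
SPECIAL = "$()*+.?[\\^{|"

price = "Total: $5.00 (approx.)"  # hypothetical sample text

# Escaped dollar sign: matches a literal "$" in the text.
dollar_pos = re.search(r"\$", price).start()

# Unescaped "$" is an end-of-string anchor, so it matches the
# empty position at the very end instead of the "$" character.
end_pos = re.search(r"$", price).start()

# re.escape puts a backslash in front of every special character.
escaped = re.escape("$5.00")
literal_found = re.search(escaped, price) is not None

# Every special character matches itself once it has been escaped.
all_escapable = all(re.fullmatch(re.escape(ch), ch) for ch in SPECIAL)
```

Writing the pattern as a raw string (r"\$") keeps Python from interpreting the backslash before the regex engine sees it.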
Special wildcard characters (metacharacters)
There are a number of common character groups that have their own built-in shortcuts. Digits are one of them: \d means the same thing as [0-9] (some Unicode-aware engines also let \d match digits from other scripts).
Metacharacter | Matches |
\d | Any digit character |
\w | An alphanumeric character or underscore (“word character”) |
\s | Any whitespace character (space, tab, newline, and similar) |
\D | A character that is not a digit |
\W | A character that is not a word character (neither alphanumeric nor underscore) |
\S | A nonwhitespace character |
. | Any character except for newline |
With the exception of the dot (or period), all the wildcard character sequences begin with a backslash. The dot metacharacter matches anything, including a space, punctuation mark, or even itself. The only thing it doesn’t match is a newline. This is a common source of mistakes when composing regular expressions.
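The metacharacters in the table can be tried out with a short sketch like the one below, again assuming Python's re module. The sample string is invented for illustration; re.DOTALL is the standard flag that lets the dot match a newline as well.

```python
import re

sample = "Room 42\tis_open\n"  # hypothetical sample text

digits = re.findall(r"\d", sample)      # each digit character
spaces = re.findall(r"\s", sample)      # space, tab, newline
words = re.findall(r"\w+", sample)      # runs of word characters
non_space = re.findall(r"\S+", sample)  # runs of non-whitespace

# By default the dot skips the newline between "a" and "b".
dot_default = re.findall(r".", "a\nb")

# With re.DOTALL the dot matches the newline too.
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)
```

Note that \w treats the underscore as a word character, so "is_open" comes back as a single match.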
Within square brackets, a dash (-) between two characters can be used to indicate a range of characters, where the ordering is determined by the character’s Unicode number. Characters 0 to 9 sit right next to each other in this ordering (codes 48 to 57), so [0-9] covers all of them and matches any digit.
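The ordering behind ranges can be checked directly. The following is a minimal sketch in Python; ord returns a character's Unicode number, and the sample strings are assumptions chosen for the example.

```python
import re

# The digits occupy consecutive codes 48 through 57.
digit_codes = [ord(c) for c in "0123456789"]

# So the range [0-9] matches exactly those characters.
found_digits = re.findall(r"[0-9]", "a1b22")

# Several ranges can sit in the same pair of brackets.
hex_ok = re.fullmatch(r"[0-9a-fA-F]+", "1aF") is not None

# A range follows code order, not the alphabet: [A-z] spans codes
# 65 to 122, which also covers "_" (95) and a few other symbols.
underscore_in_range = re.fullmatch(r"[A-z]", "_") is not None
```

The last line shows why [A-Za-z] is usually the safer way to write "any ASCII letter".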
Defining what to match with character classes
Although metacharacters are very useful, you often want to be more selective. Character classes let you do just that. A character class specifies a set of permitted characters, any one of which may match at that position. To create a custom character class, list the characters inside a pair of square brackets. For example, [aeiou] matches any single vowel.
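A custom character class might look like the following sketch in Python. The patterns and sample strings are illustrative assumptions, not taken from any particular source.

```python
import re

# [aeiou] matches exactly one character out of the listed set.
vowels = re.findall(r"[aeiou]", "regular expression")

# A class takes up one position inside a longer pattern:
# "gr", then either "a" or "e", then "y".
spellings = re.findall(r"gr[ae]y", "gray grey griy")
```

Each class matches a single character, so "griy" is rejected because "i" is not in the set.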
Repeating parts of a pattern
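We now know how to match a single digit. What if we want to match a whole number, that is, a sequence of one or more digits? Quantifiers handle this: placed after a character, class, or group, they specify how many times it may repeat.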
Quantifier | Meaning |
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match no more than once (makes the character or group optional) |
{n} | Match exactly n times |
{n,m} | Match at least n, but no more than m times |
*? | Match 0 or more times, but as few times as possible |
+? | Match 1 or more times, but as few times as possible |
?? | Match 0 or 1 times, but as few times as possible |
{n,}? | Match at least n times, but as few times as possible |
{n,m}? | Match at least n times, no more than m times, and as few times as possible |
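The difference between the plain (greedy) quantifiers and their "as few times as possible" (lazy) variants is easiest to see side by side. This is a minimal sketch in Python with invented sample strings.

```python
import re

# + answers the question above: one or more digits in a row.
whole_numbers = re.findall(r"\d+", "7 apples, 42 pears, 1000 grapes")

# ? makes the preceding character optional.
optional_u = re.findall(r"colou?r", "color colour")

# {n,m} takes as many repetitions as it can, up to m.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

# {n,m}? stops as soon as the minimum n has been reached.
bounded_lazy = re.findall(r"\d{2,3}?", "1 22 333 4444")

tags = "<b>bold</b> and <i>italic</i>"

# Greedy .+ runs to the last ">" in the string.
greedy = re.findall(r"<.+>", tags)

# Lazy .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", tags)
```

Here greedy holds the whole string as a single match, while lazy holds the four individual tags.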
Examples
\b[1-9][0-9]{3}\b | matches a number between 1000 and 9999 |
\b[1-9][0-9]{2,4}\b | matches a number between 100 and 99999 |
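Both example patterns can be checked against boundary values with a sketch like this one in Python. The constant names and the test string are assumptions made for the example.

```python
import re

# The two patterns from the examples above.
FOUR_DIGIT = re.compile(r"\b[1-9][0-9]{3}\b")
THREE_TO_FIVE = re.compile(r"\b[1-9][0-9]{2,4}\b")

# Values on and around the edges of both ranges.
text = "99 100 0999 1000 9999 10000 99999 100000"

four_digit_hits = FOUR_DIGIT.findall(text)
three_to_five_hits = THREE_TO_FIVE.findall(text)
```

The \b word boundaries keep the patterns from matching part of a longer number, which is why 0999 and 100000 are rejected by both.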
Online Regex Tester regex101.com
regex101.com is an online regex tester that also lets you find ready-to-use regexes according to your needs. Go to the Regex Library, search for the desired term (e.g. phone), and pick the regex that suits you.