Regular expressions: a pattern language for matching text. Flavors differ in syntax and features (PCRE, POSIX BRE/ERE, JS, Python re, .NET, Go RE2). This sheet uses PCRE-style syntax unless noted.
Character classes
| Pattern | Matches |
|---|---|
| `.` | any character except newline (unless s/dotall) |
| `\d` / `\D` | digit / non-digit |
| `\w` / `\W` | word char `[A-Za-z0-9_]` / non-word |
| `\s` / `\S` | whitespace / non-whitespace |
| `\b` / `\B` | word boundary / non-boundary |
| `[abc]` | any of a, b, c |
| `[^abc]` | none of a, b, c |
| `[a-z]` | range |
| `[[:alpha:]]` | POSIX class (alpha, digit, alnum, space, upper, lower, punct, xdigit, …) |
| `\p{L}` | Unicode property: any letter (Unicode mode) |
| `\P{L}` | inverse Unicode property |
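
The classes above can be tried out quickly in Python's `re` module. This is only a sketch: the sample string is made up for illustration, and `\p{L}` is left out because the standard `re` module does not support it.

```python
import re

text = "x=42, y_2 = 7!"  # made-up sample input

digits = re.findall(r"\d+", text)        # runs of digits
idents = re.findall(r"[a-z_]\w*", text)  # a lowercase letter or _ followed by word chars
puncts = re.findall(r"[^\w\s]", text)    # neither word char nor whitespace

print(digits)  # ['42', '2', '7']
print(idents)  # ['x', 'y_2']
print(puncts)  # ['=', ',', '=', '!']
```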
Anchors
| Pattern | Meaning |
|---|---|
| `^` | start of string (or line in multiline mode) |
| `$` | end of string (or line in multiline mode) |
| `\A` | start of string (always) |
| `\z` | end of string (always) |
| `\Z` | end of string, optionally before final newline |
| `\b` | word boundary |
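
The difference between `^`/`$` and `\A` only shows up in multiline mode. A small Python sketch with a made-up two-line input follows. One caveat for Python users: its `\Z` matches only at the very end of the string, so it behaves like the `\z` row above.

```python
import re

log = "first line\nsecond line\n"  # made-up two-line input

# In multiline mode ^ matches at the start of every line...
line_starts = re.findall(r"^\w+", log, flags=re.M)
# ...while \A still matches only at the start of the whole string.
string_start = re.findall(r"\A\w+", log, flags=re.M)
# $ in multiline mode matches before every newline.
line_ends = re.findall(r"\w+$", log, flags=re.M)

print(line_starts)   # ['first', 'second']
print(string_start)  # ['first']
print(line_ends)     # ['line', 'line']
```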
Quantifiers
| Pattern | Meaning |
|---|---|
| `*` | 0 or more (greedy) |
| `+` | 1 or more |
| `?` | 0 or 1 |
| `{n}` | exactly n |
| `{n,}` | n or more |
| `{n,m}` | n to m |
| `*?` `+?` `??` `{n,m}?` | lazy (match as little as possible) |
| `*+` `++` `?+` `{n,m}+` | possessive (no backtracking; PCRE) |
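
Greedy versus lazy matching is easiest to see on quoted text. The sketch below uses Python's `re` with a made-up sentence. The negated class on the last line is a common alternative to a lazy quantifier, shown here as one option rather than the recommended one.

```python
import re

s = 'say "hello" and "bye"'  # made-up sample input

greedy = re.findall(r'".*"', s)      # .* runs to the last quote
lazy = re.findall(r'".*?"', s)       # .*? stops at the nearest quote
negated = re.findall(r'"[^"]*"', s)  # same result without lazy matching

print(greedy)   # ['"hello" and "bye"']
print(lazy)     # ['"hello"', '"bye"']
print(negated)  # ['"hello"', '"bye"']
```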
".*" greedy: matches everything between first and last "
".*?" lazy: matches one quoted token
Groups
(abc) capturing group #1
(?:abc) non-capturing group
(?P<name>abc) named group (Python / PCRE)
(?<name>abc) named group (.NET / modern PCRE / JS)
\1 \2 backreference to group N
\k<name> named backreference
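
Backreferences are what make a pattern like "the same word twice" possible. Here is a minimal Python sketch with a made-up sentence. Note that Python's `re` writes a named backreference as `(?P=name)` and does not accept `\k<name>`.

```python
import re

sentence = "this is is a test"  # made-up sample input

# Numbered backreference: \1 must repeat whatever group 1 captured.
numbered = re.search(r"\b(\w+)\s+\1\b", sentence)
# Named equivalent, in Python's spelling.
named = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", sentence)

print(numbered.group())     # is is
print(numbered.group(1))    # is
print(named.group("word"))  # is
```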
Alternation
cat|dog|bird one of these (try left-to-right)
gr(a|e)y "gray" or "grey"
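
Left-to-right ordering matters when one alternative is a prefix of another. A short Python sketch with made-up inputs shows this. It also shows why a non-capturing group is often the better choice with `re.findall`, which returns group contents when the pattern has a capturing group.

```python
import re

# The first alternative that matches wins, even if a longer one would fit.
first_wins = re.search(r"cat|category", "category").group()
longest_first = re.search(r"category|cat", "category").group()

# With a capturing group, findall returns only the group's text.
captured = re.findall(r"gr(a|e)y", "gray, grey, groy")
# With a non-capturing group, findall returns whole matches.
whole = re.findall(r"gr(?:a|e)y", "gray, grey, groy")

print(first_wins, longest_first)  # cat category
print(captured)                   # ['a', 'e']
print(whole)                      # ['gray', 'grey']
```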
Lookaround (zero-width assertions)
(?=foo) positive lookahead — followed by "foo"
(?!foo) negative lookahead — NOT followed by "foo"
(?<=foo) positive lookbehind — preceded by "foo"
(?<!foo) negative lookbehind — NOT preceded by "foo"
Examples:
\d+(?=px) digits followed by "px" → in "12px" matches "12"
(?<=\$)\d+ digits preceded by "$" → in "$42" matches "42"
^(?!.*admin).*$ any line not containing "admin"
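
The three lookaround examples can be checked in Python's `re` as a sketch; the input strings are made up. Python requires lookbehind patterns to have a fixed width, which `(?<=\$)` satisfies.

```python
import re

# Lookahead: digits only when followed by "px" (the "px" is not consumed).
px = re.findall(r"\d+(?=px)", "12px 3em 40px")

# Lookbehind: digits only when preceded by "$".
price = re.findall(r"(?<=\$)\d+", "$42 and 17")

# Negative lookahead at line start: keep lines that never contain "admin".
lines = ["root login", "admin login", "guest"]
no_admin = [ln for ln in lines if re.search(r"^(?!.*admin).*$", ln)]

print(px)        # ['12', '40']
print(price)     # ['42']
print(no_admin)  # ['root login', 'guest']
```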
Flags / modifiers
| Flag | Name | Effect |
|---|---|---|
| `i` | case-insensitive | A == a |
| `m` | multiline | `^`/`$` match line starts/ends |
| `s` | dotall | `.` matches newline too |
| `x` | extended | ignore whitespace + # comments in pattern |
| `u` | unicode | full Unicode (`\w` covers non-ASCII, etc.) |
| `g` | global (JS) | find all matches (not a flag in most engines) |
Inline form: (?i)foo — enable i from this point on. (?i:foo) — only inside group.
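
In Python the same flags are passed as `re.I`, `re.M`, `re.S`, `re.X`, or written inline. The sketch below uses made-up inputs. One caveat: since Python 3.11 a global inline flag such as `(?i)` must sit at the very start of the pattern, so the scoped form `(?i:...)` is the only way to switch a flag on mid-pattern there.

```python
import re

# Flag argument: case-insensitive search.
ci = re.findall(r"foo", "Foo FOO foo", flags=re.I)

# Scoped inline flag: only "foo" is case-insensitive, "Bar" is not.
scoped = re.findall(r"(?i:foo)Bar", "FOOBar foobar")

# Verbose mode: whitespace and # comments inside the pattern are ignored.
date = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -                 # literal separator
    (?P<month>\d{2})  # two-digit month
    """,
    re.X,
)
m = date.search("released 2024-03")

print(ci)                                 # ['Foo', 'FOO', 'foo']
print(scoped)                             # ['FOOBar']
print(m.group("year"), m.group("month"))  # 2024 03
```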
Escaping
Special characters that often need \: . ^ $ * + ? ( ) [ ] { } | \. Inside [...], most of these are literal; you only need to escape ], \, ^ (if first), and - (if in the middle).
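
When the text to match comes from user input or a variable, escaping by hand is error-prone. In Python, `re.escape` does it for you. A sketch with made-up strings:

```python
import re

literal = "1+1=2 (approx.)"  # made-up text full of metacharacters

# Unescaped, + is a quantifier, so the pattern 1+1 does not match the text "1+1".
naive = re.search(r"1+1", "1+1")

# re.escape backslash-escapes every metacharacter in the string.
exact = re.fullmatch(re.escape(literal), literal)

# Inside a class most metacharacters are literal (^ is literal when not first).
in_class = re.findall(r"[.^$*+?]", "a.b*c?")

print(naive)          # None
print(exact.group())  # 1+1=2 (approx.)
print(in_class)       # ['.', '*', '?']
```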
Substitution (replacement strings)
Backreferences inside the replacement:
$1 $2 $& group 1, group 2, whole match (most engines)
\1 \2 & same in sed (GNU sed also accepts \0); Python uses \1 \2 \g<0>
${name} named group (.NET); JS uses $<name>, Python \g<name>
$$ literal $
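
Python's `re.sub` uses the backslash forms rather than `$1`. The sketch below covers the common cases with made-up inputs. The last call uses a replacement function, since Python has no `\L`-style case conversion in replacement strings.

```python
import re

# Numbered groups: swap two words.
swapped = re.sub(r"(\w+)\s+(\w+)", r"\2 \1", "hello world")

# Named groups: \g<name> in the replacement.
iso = re.sub(
    r"(?P<d>\d{2})/(?P<m>\d{2})/(?P<y>\d{4})",
    r"\g<y>-\g<m>-\g<d>",
    "31/12/2024",
)

# Whole match: \g<0>.
wrapped = re.sub(r"\d+", r"<\g<0>>", "a1b22")

# Replacement function: insert _ and lowercase each capital letter.
snake = re.sub(r"([A-Z])", lambda m: "_" + m.group(1).lower(), "camelCaseName")

print(swapped)  # world hello
print(iso)      # 2024-12-31
print(wrapped)  # a<1>b<22>
print(snake)    # camel_case_name
```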
sed -E 's/([A-Z])/_\L\1/g' file # lowercase + underscore (GNU sed: \L\U\E)
perl -pe 's/(\w+)\s+(\w+)/$2 $1/' # swap pairs
Common patterns
^\s*$ blank line
^\s+|\s+$ leading/trailing whitespace (trim)
\b\d{4}-\d{2}-\d{2}\b ISO date
\b(?:\d{1,3}\.){3}\d{1,3}\b IPv4 (loose — doesn't validate 0–255)
^(?=.*\d)(?=.*[A-Z]).{8,}$ ≥8 chars, ≥1 digit, ≥1 uppercase
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ email (pragmatic)
https?://[^\s)>"]+ URL (loose)
^#{1,6}\s+.+$ markdown heading
\b[0-9a-f]{7,40}\b git short/long SHA
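
Because the IPv4 pattern is deliberately loose, a common approach is to let the regex find candidates and check the numeric range in code. This is a minimal Python sketch. The helper name `find_ipv4` and the sample text are hypothetical, introduced only for illustration.

```python
import re

# Same loose pattern as in the list above.
IPV4_LOOSE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_ipv4(text):
    """Return dotted quads in text whose four octets are all in 0..255."""
    hits = []
    for m in IPV4_LOOSE.finditer(text):
        octets = m.group().split(".")
        if all(int(octet) <= 255 for octet in octets):
            hits.append(m.group())
    return hits

found = find_ipv4("ok 192.168.0.1, bad 999.1.1.1, ok 10.0.0.255")
print(found)  # ['192.168.0.1', '10.0.0.255']
```

Keeping the range check outside the regex avoids the much longer pattern needed to express 0–255 in regex syntax alone.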
Engine differences (gotchas)
- POSIX BRE (`grep`, `sed` without `-E`): `(`, `)`, `{`, `}` must be escaped (`\(`, `\{`, …) to get their special meaning. `|`, `+`, `?` are not BRE operators in POSIX; GNU tools accept `\|`, `\+`, `\?` as extensions. Use `-E` / `egrep` for ERE.
- Go (RE2) and Rust `regex`: linear-time, no backreferences, no lookaround.
- JavaScript (pre-ES2018): no lookbehind. Modern engines support it.
- `\d` in PCRE without `/u` = `[0-9]`. With Unicode mode it includes all decimal digits.
- `.` does not match newline by default; turn on `s`/dotall if you need it.
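
The `\d` and `.` gotchas are worth checking in whichever engine you use. Here is a small sketch for Python's `re`, whose default is the opposite of PCRE's: `str` patterns are Unicode-aware unless `re.ASCII` is set. The test string is made up.

```python
import re

# ASCII 1, ARABIC-INDIC DIGIT TWO, DEVANAGARI DIGIT THREE
mixed = "1\u0662\u0969"

unicode_digits = re.findall(r"\d", mixed)            # Unicode-aware by default
ascii_digits = re.findall(r"\d", mixed, flags=re.A)  # restricted to [0-9]

no_dotall = re.search(r"a.b", "a\nb")              # . skips newline by default
dotall = re.search(r"a.b", "a\nb", flags=re.S)     # re.S lets . match newline

print(len(unicode_digits))  # 3
print(ascii_digits)         # ['1']
print(no_dotall)            # None
print(dotall is not None)   # True
```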
Tool quick reference
# grep — extended regex, only matching part
grep -Eo '\b\w+@\w+\.\w+\b' file
# ripgrep — PCRE2 with -P (lookarounds etc.)
rg -P '(?<=\$)\d+\b' .
# sed — extended regex with -E (POSIX) / -r (GNU)
sed -E 's/(foo)bar/\1baz/g' file
# awk — ERE, /pattern/ matching
awk '/^ERROR/ { print $0 }' app.log
# perl — full PCRE, one-liners
perl -nE 'say $1 if /name=([^&]+)/' file
# Python
import re
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', s)
m.group('year') # named capture
re.findall(r'\d+', s)
re.sub(r'\s+', ' ', s)
re.compile(r'foo', re.I | re.M)
// JavaScript
const m = "abc 123".match(/(\w+)\s+(\d+)/); // m[1] = "abc", m[2] = "123"
"foo foo".replace(/foo/g, "bar"); // global
const re = /(?<year>\d{4})/; // named groups
[..."a1 b2 c3".matchAll(/\w(\d)/g)]; // iterate all matches
Testing and debugging
- regex101.com — interactive, explains each token, picks PCRE/JS/Python/Go flavor.
- regexr.com — JS flavor.
- Use named groups + comments (`x` flag) for any regex over ~30 characters; future-you will thank you.
- Catastrophic backtracking: nested unbounded quantifiers like `(a+)+$` on `"aaaaaaaaaaaaaaaaaaaaa!"` explode. Prefer possessive quantifiers `(a++)+$` or atomic groups `(?>a+)+$`, or use an engine like RE2.
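
Python's `re` only gained possessive quantifiers and atomic groups in version 3.11. On older versions, one known workaround is to emulate an atomic group with a lookahead plus a backreference, because the engine does not backtrack into a lookahead once it has succeeded. The sketch below uses that emulation, which is a swapped-in technique and not the possessive syntax itself. The name `SAFE` and the inputs are made up.

```python
import re

# (?=(a+)) captures the longest run of a's without consuming it,
# \1 then consumes exactly that run. The lookahead is never re-entered,
# so the run of a's cannot be split up in exponentially many ways.
SAFE = re.compile(r"(?:(?=(a+))\1)+$")

hostile = "a" * 40 + "!"  # input that makes (a+)+$ blow up
rejected = SAFE.search(hostile)
accepted = SAFE.search("aaaa")

print(rejected)          # None, and it returns quickly
print(accepted.group())  # aaaa
```

For this particular pattern the simplest fix is to drop the nesting altogether, since `(a+)+$` accepts the same strings as `a+$`.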