Regular expressions — pattern language for matching text. Flavors differ slightly (PCRE, POSIX BRE/ERE, JS, Python re, .NET, Go RE2). This sheet uses PCRE-style unless noted.

Character classes

Pattern        Matches
.              any character except newline (unless s/dotall)
\d / \D        digit / non-digit
\w / \W        word char [A-Za-z0-9_] / non-word
\s / \S        whitespace / non-whitespace
\b / \B        word boundary / non-boundary
[abc]          any of a, b, c
[^abc]         none of a, b, c
[a-z]          range
[[:alpha:]]    POSIX class (alpha, digit, alnum, space, upper, lower, punct, xdigit, …)
\p{L}          Unicode property: any letter (Unicode mode)
\P{L}          inverse Unicode property
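
The classes above can be tried out with a short sketch in Python's standard re module. The sample strings are made up for illustration. Note that the stdlib re module does not accept [[:alpha:]] or \p{L}, so those two rows are not exercised here.

```python
import re

text = "Order #42: 3 apples"

digits = re.findall(r"\d+", text)          # runs of digits
words = re.findall(r"\w+", text)           # runs of word characters
fields = re.split(r"\s+", "a  b\tc")       # split on any whitespace run

in_set = re.findall(r"[abc]", "abcdef")    # any of a, b, c
not_in_set = re.findall(r"[^abc]", "abcdef")  # anything except a, b, c
in_range = re.findall(r"[a-c]", "abcdef")  # range a through c

print(digits, words, fields)
print(in_set, not_in_set, in_range)
```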

Anchors

Pattern        Meaning
^              start of string (or line in multiline mode)
$              end of string (or line in multiline mode)
\A             start of string (always)
\z             end of string (always)
\Z             end of string, optionally before final newline
\b             word boundary
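
A minimal sketch of how the anchors behave, using Python's re module on a made-up two-line string. One flavor difference worth knowing: in Python, \Z means the absolute end of the string (what this sheet lists as \z), while $ is the anchor that tolerates a final newline.

```python
import re

text = "first line\nsecond line\n"

start_default = re.findall(r"^\w+", text)          # ^ only at start of string
start_multi = re.findall(r"^\w+", text, re.M)      # ^ at the start of every line
end_default = re.findall(r"\w+$", text)            # $ at end, or before a final newline
end_multi = re.findall(r"\w+$", text, re.M)        # $ at the end of every line
always_start = re.findall(r"\A\w+", text, re.M)    # \A ignores multiline mode
absolute_end = re.findall(r"\w+\Z", text)          # Python's \Z: very end only

print(start_default, start_multi, end_default, end_multi)
print(always_start, absolute_end)
```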

Quantifiers

Pattern                Meaning
*                      0 or more (greedy)
+                      1 or more
?                      0 or 1
{n}                    exactly n
{n,}                   n or more
{n,m}                  n to m
*? +? ?? {n,m}?        lazy (match as little as possible)
*+ ++ ?+ {n,m}+        possessive (no backtracking; PCRE)

".*"     greedy: matches everything between first and last "
".*?"    lazy:   matches one quoted token

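The greedy/lazy difference is easiest to see on a string with two quoted tokens. This is a small sketch with Python's re module; the sample text is invented. The third pattern, a negated class, is a common alternative to the lazy quantifier and is added here for comparison.

```python
import re

text = 'say "hello" and "bye"'

greedy = re.findall(r'".*"', text)       # first quote through last quote
lazy = re.findall(r'".*?"', text)        # one quoted token at a time
negated = re.findall(r'"[^"]*"', text)   # same result as lazy, without backtracking into quotes

print(greedy)
print(lazy)
print(negated)
```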
Groups

(abc)          capturing group #1
(?:abc)        non-capturing group
(?P<name>abc)  named group (Python / PCRE)
(?<name>abc)   named group (.NET / modern PCRE / JS)
\1  \2         backreference to group N
\k<name>       named backreference (Python: (?P=name))
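
A short sketch of the group forms above in Python's re module, which uses the (?P<name>...) spelling and (?P=name) for named backreferences. The input strings are illustrative only.

```python
import re

# Numbered and named access to the same capturing groups.
m = re.search(r"(?P<key>\w+)=(?P<value>\w+)", "color=blue")
by_number = m.group(1)
by_name = m.group("value")
as_dict = m.groupdict()

# Non-capturing group: grouping for the quantifier only.
repeats = re.findall(r"(?:ab)+", "ababab x ab")

# Numbered backreference: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Named backreference: the closing quote must equal the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group(0)

print(by_number, by_name, as_dict, repeats, doubled, quoted)
```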

Alternation

cat|dog|bird          one of these (try left-to-right)
gr(a|e)y              "gray" or "grey"
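
Because backtracking engines try alternatives left to right, the order of the branches can change the result. A small sketch in Python's re module (sample strings are made up); POSIX-style leftmost-longest engines would behave differently on the first case.

```python
import re

# The first branch that matches wins, even if a later one is longer.
short_first = re.findall(r"cat|category", "category")
long_first = re.findall(r"category|cat", "category")

# Grouped alternation inside a larger pattern (non-capturing so findall
# returns whole matches).
spellings = re.findall(r"gr(?:a|e)y", "gray grey groy")

print(short_first, long_first, spellings)
```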

Lookaround (zero-width assertions)

(?=foo)        positive lookahead   — followed by "foo"
(?!foo)        negative lookahead   — NOT followed by "foo"
(?<=foo)       positive lookbehind  — preceded by "foo"
(?<!foo)       negative lookbehind  — NOT preceded by "foo"

Examples:

\d+(?=px)            digits followed by "px"  →  in "12px" matches "12"
(?<=\$)\d+           digits preceded by "$"   →  in "$42" matches "42"
^(?!.*admin).*$      any line not containing "admin"
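
The three examples above can be checked with a short sketch in Python's re module. The inputs are invented. Python's re requires lookbehind patterns to have a fixed width, which (?<=\$) satisfies.

```python
import re

# Lookahead: digits only when "px" follows; "px" is not part of the match.
px_values = re.findall(r"\d+(?=px)", "12px 30em 7px")

# Lookbehind: digits only when "$" precedes; "$" is not part of the match.
prices = re.findall(r"(?<=\$)\d+", "$42 and 17 and $5")

# Negative lookahead: keep lines that do not contain "admin".
lines = ["root login", "admin login", "guest"]
no_admin = [line for line in lines if re.match(r"^(?!.*admin).*$", line)]

print(px_values, prices, no_admin)
```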

Flags / modifiers

Flag    Name                Effect
i       case-insensitive    A == a
m       multiline           ^/$ match line starts/ends
s       dotall              . matches newline too
x       extended            ignore whitespace + # comments in pattern
u       unicode             full Unicode (\w covers non-ASCII, etc.)
g       global (JS)         find all matches (not a flag in most engines)

Inline form: (?i)foo — enable i from this point on. (?i:foo) — only inside group.
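
A sketch of the flags in Python's re module, where i, s and x are spelled re.I, re.S and re.X. Sample inputs are made up. In recent Python versions a global inline flag such as (?i) has to sit at the very start of the pattern; the scoped form (?i:...) can appear anywhere.

```python
import re

# Global inline flag: applies to the whole pattern.
everywhere = re.findall(r"(?i)foo", "Foo FOO foo")

# Scoped inline flag: only "foo" is case-insensitive, "bar" is not.
scoped = re.findall(r"(?i:foo)bar", "FOObar fooBAR")

# s/dotall: "." also matches a newline.
without_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", re.S)

# x/extended: whitespace and "#" comments inside the pattern are ignored.
date = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year, then a literal hyphen
    (?P<month>\d{2}) -   # two-digit month, then a literal hyphen
    (?P<day>\d{2})       # two-digit day
    """,
    re.X,
)
parts = date.search("released 2024-03-15").groupdict()

print(everywhere, scoped, without_dotall, bool(with_dotall), parts)
```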

Escaping

Special characters that often need \: . ^ $ * + ? ( ) [ ] { } | \. Inside [...], most of these are literal; you only need to escape ], \, ^ (if first), and - (if in the middle).
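
When the text to match comes from data rather than from you, escaping by hand is error-prone. A minimal sketch using Python's re.escape, plus a few character-class cases matching the rules above; the strings are invented for illustration.

```python
import re

literal = "1+1=2 (approx.)"
haystack = "so 1+1=2 (approx.) holds"

# Unescaped, "+", "(", ")" and "." are operators, so the literal text is not found.
unescaped = re.search(literal, haystack)

# re.escape backslash-escapes every special character in the string.
escaped = re.search(re.escape(literal), haystack)

# Inside [...] most metacharacters are already literal.
in_class = re.findall(r"[.*+?$]", "a.b*c?")

# "-" in the middle and "^" at the start of a class still need a backslash.
hyphen = re.findall(r"[a\-z]", "a-m-z")
caret = re.findall(r"[\^a]", "a^b")

print(unescaped, escaped.group(0), in_class, hyphen, caret)
```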

Substitution (replacement strings)

Backreferences inside the replacement:

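$1 $2 $&        group 1, group 2, whole match           (Perl, JS, .NET)
\1 \2 &         same, sed style (\0 for the whole match is a GNU extension)
\1 \2 \g<0>     same, Python re.sub (\g<name> for a named group)
${name}         named group (.NET); JS uses $<name>, Perl uses $+{name}
$$              literal $ (JS, .NET)

sed -E 's/([A-Z])/_\L\1/g' file        # lowercase + underscore (GNU sed: \L\U\E)
perl -pe 's/(\w+)\s+(\w+)/$2 $1/g'     # swap each pair of words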
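
For comparison, here is a sketch of the same ideas with Python's re.sub, which uses \1 and \g<name> in the replacement string and also accepts a function. The inputs are made up; the last call mirrors the sed example above, using a function because Python has no \L case operator.

```python
import re

# Numbered backreferences: swap two words.
swapped = re.sub(r"(\w+)\s+(\w+)", r"\2 \1", "hello world")

# Named backreferences in the replacement.
renamed = re.sub(
    r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace"
)

# Whole match is \g<0> in Python.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# A function replacement receives the match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), "3 apples, 10 pears")

# Equivalent of the sed example: underscore plus lowercased capital.
snake = re.sub(r"([A-Z])", lambda m: "_" + m.group(1).lower(), "someCamelCase")

print(swapped, renamed, bracketed, doubled, snake, sep=" | ")
```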

Common patterns

^\s*$                             blank line
^\s+|\s+$                         leading/trailing whitespace (trim)
\b\d{4}-\d{2}-\d{2}\b             ISO date
\b(?:\d{1,3}\.){3}\d{1,3}\b       IPv4 (loose — doesn't validate 0–255)
^(?=.*\d)(?=.*[A-Z]).{8,}$        ≥8 chars, ≥1 digit, ≥1 uppercase
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$    email (pragmatic)
https?://[^\s)>"]+                URL (loose)
^#{1,6}\s+.+$                     markdown heading
\b[0-9a-f]{7,40}\b                git short/long SHA
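
A few of these patterns in use, as a sketch with Python's re module. The helper is_ipv4 is a hypothetical name introduced here; it pairs the loose IPv4 regex with a numeric range check, which is one common way to cover the 0–255 gap the regex leaves open. All sample inputs are invented.

```python
import re

ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
IPV4_LOOSE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PASSWORD = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")


def is_ipv4(text: str) -> bool:
    """Check the shape with the loose regex, then range-check each octet."""
    if not IPV4_LOOSE.fullmatch(text):
        return False
    return all(int(octet) <= 255 for octet in text.split("."))


dates = ISO_DATE.findall("due 2024-03-15, not 2024-3-5")
trimmed = re.sub(r"^\s+|\s+$", "", "  hi there \n")
strong = bool(PASSWORD.match("Passw0rdX"))
weak = bool(PASSWORD.match("password1"))

print(dates, repr(trimmed), strong, weak)
print(is_ipv4("192.168.1.1"), is_ipv4("999.1.1.1"), is_ipv4("1.2.3"))
```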

Engine differences (gotchas)

  • POSIX BRE (grep, sed without -E): groups and intervals must be written \( \) and \{ \} to get their special meaning. \|, \+ and \? are GNU extensions, not POSIX BRE. Use grep -E / sed -E for ERE (egrep is deprecated).
  • Go (RE2) and Rust's regex crate: linear-time matching, no backreferences, no lookaround.
  • JavaScript (pre-ES2018): no lookbehind. Modern engines support it.
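  • \d in PCRE without Unicode properties = [0-9]. With Unicode properties enabled (PCRE2_UCP) it matches any Unicode decimal digit. Python 3 matches Unicode digits by default on str patterns (re.ASCII restricts it); in JS \d is always [0-9].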
  • . does not match newline by default — turn on s/dotall if you need it.
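
The \d difference is easy to observe in Python, whose re module treats str patterns as Unicode-aware by default. This sketch uses an Arabic-Indic digit (U+0663) as a made-up test input.

```python
import re

text = "\u0663" + "4"   # ARABIC-INDIC DIGIT THREE followed by ASCII 4

unicode_digits = re.findall(r"\d", text)           # default: any Unicode decimal digit
ascii_digits = re.findall(r"\d", text, re.ASCII)   # re.ASCII: only 0-9
explicit_range = re.findall(r"[0-9]", text)        # always only 0-9

print(len(unicode_digits), ascii_digits, explicit_range)
```

If input validation should accept ASCII digits only, writing [0-9] or passing re.ASCII makes that intent explicit regardless of engine defaults.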

Tool quick reference

# grep — extended regex, only matching part
grep -Eo '\b\w+@\w+\.\w+\b' file
 
# ripgrep — PCRE2 with -P (lookarounds etc.)
rg -P '(?<=\$)\d+\b' .
 
# sed — extended regex with -E (POSIX) / -r (GNU)
sed -E 's/(foo)bar/\1baz/g' file
 
# awk — ERE, /pattern/ matching
awk '/^ERROR/ { print $0 }' app.log
 
# perl — full PCRE, one-liners
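perl -nE 'say $1 if /name=([^&]+)/' file
 
# Python
import re
s = "released  2024-03 (build 17)"    # sample input
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', s)
m.group('year')                       # named capture -> '2024'
re.findall(r'\d+', s)                 # all non-overlapping matches
re.sub(r'\s+', ' ', s)                # collapse whitespace runs
re.compile(r'foo', re.I | re.M)       # precompile with flags
 
// JavaScript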
const m = "abc 123".match(/(\w+)\s+(\d+)/);   // m[1] = "abc", m[2] = "123"
"foo foo".replace(/foo/g, "bar");             // global
const re = /(?<year>\d{4})/;                  // named groups
[..."a1 b2 c3".matchAll(/\w(\d)/g)];          // iterate all matches

Testing and debugging

  • regex101.com — interactive, explains each token, picks PCRE/JS/Python/Go flavor.
  • regexr.com — JS flavor.
  • Use named groups + comments (x flag) for any regex over ~30 characters — future-you will thank you.
  • Catastrophic backtracking: nested unbounded quantifiers like (a+)+$ on "aaaaaaaaaaaaaaaaaaaaa!" explode. Prefer possessive quantifiers (a++)+$ or atomic groups (?>a+)+$, or use an engine like RE2.
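
Besides possessive quantifiers and atomic groups, a portable option (not from the list above, named here as an alternative) is to rewrite the pattern so the nesting disappears: (a+)+$ accepts exactly the same strings as a+$. The sketch below only checks that equivalence on short, made-up inputs in Python's re module; it deliberately avoids long failing inputs, which is where the nested form blows up.

```python
import re

nested = re.compile(r"(a+)+$")   # nested unbounded quantifiers: risky
flat = re.compile(r"a+$")        # same language, one quantifier

samples = ["aaa", "baaa", "aaa!", "", "a!a", "!"]

# Compare match / no-match for each sample under both patterns.
results = {s: (bool(nested.search(s)), bool(flat.search(s))) for s in samples}

for s, (n, f) in results.items():
    print(repr(s), n, f)
```

Both patterns agree on every sample, but only the flat one fails in linear time on inputs like a long run of "a" followed by "!".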