Regular Expressions

Overview

Regular expressions are a cross-cutting skill: most programming languages, text editors, command-line tools, and CI pipelines support them in some form. They are powerful for pattern matching but can be cryptic; this skill provides a readable reference.

Core Syntax

Element	Syntax	Description	Example
Literal	`abc`	Matches "abc"	`abc` matches "abc"
Dot	`.`	Any single character	`a.c` matches "abc", "a1c"
Start anchor	`^`	Start of string/line	`^Hello` matches "Hello world"
End anchor	`$`	End of string/line	`world$` matches "Hello world"
Word boundary	`\b`	Word boundary	`\bcat\b` matches "cat" not "scatter"
Character class	`[abc]`	Any of a, b, or c	`[aeiou]` matches vowels
Range	`[a-z]`	Any char in range	`[0-9]` matches digits
Negated class	`[^abc]`	Any char NOT in set	`[^0-9]` matches non-digits
Shorthand	`\d \w \s`	Digit, word char, whitespace	`\d+` matches "123"
Quantifier	`* + ? {n}`	Repetition	`a{3}` matches "aaa"
Alternation	`|`	OR	`cat|dog` matches either
Group	`()`	Capturing group	`(ab)+` matches "abab"
Non-capturing	`(?:)`	Group without capture	`(?:ab)+` matches "abab"
Escape	`\`	Literal special char	`\.` matches "."
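
As a quick check of the table above, the following Python sketch runs several of the example patterns through `re.findall`. The sample strings and the `find_all` helper are illustrative assumptions, not part of the reference itself.

```python
import re


def find_all(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


# (pattern, sample text, expected matches); the sample texts are made up.
EXAMPLES = [
    (r"a.c", "abc a1c ac", ["abc", "a1c"]),          # dot needs exactly one char
    (r"^Hello", "Hello world", ["Hello"]),           # start anchor
    (r"world$", "Hello world", ["world"]),           # end anchor
    (r"\bcat\b", "cat scatter cat.", ["cat", "cat"]),  # word boundaries
    (r"[aeiou]", "regex", ["e", "e"]),               # character class
    (r"[^0-9]", "a1b2", ["a", "b"]),                 # negated class
    (r"\d+", "abc 123 de 45", ["123", "45"]),        # shorthand + quantifier
    (r"a{3}", "aaaa", ["aaa"]),                      # exact repetition
    (r"cat|dog", "cat and dog", ["cat", "dog"]),     # alternation
    (r"(?:ab)+", "abab ab", ["abab", "ab"]),         # non-capturing group
    (r"\.", "a.b", ["."]),                           # escaped special char
]

for pattern, text, expected in EXAMPLES:
    result = find_all(pattern, text)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{pattern!r:12} on {text!r:22} -> {result} [{status}]")
```

Note that `re.findall` returns the captured text rather than the whole match when the pattern contains a capturing group, which is why the sketch uses `(?:ab)+` instead of `(ab)+`.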

Quantifiers

Quantifier	Meaning	Greedy	Lazy
`*`	0 or more	`.*`	`.*?`
`+`	1 or more	`.+`	`.+?`
`?`	0 or 1	`.?`	`.??`
`{n}`	Exactly n	`a{3}`	—
`{n,}`	n or more	`a{2,}`	`a{2,}?`
`{n,m}`	Between n and m	`a{2,4}`	`a{2,4}?`

Greedy quantifiers match as much as possible. Lazy quantifiers (with ? suffix) match as little as possible. Use lazy quantifiers when you need the shortest match.
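
The difference is easiest to see on delimited text. This is a minimal Python sketch; the HTML-like sample string is an assumption chosen for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backtracks to the LAST ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the FIRST ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

A negated class such as `<[^>]+>` gives the same result as the lazy version here and often backtracks less.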

Character Classes

Class	Matches	ASCII equivalent
`\d`	Digit	`[0-9]`
`\D`	Non-digit	`[^0-9]`
`\w`	Word character	`[a-zA-Z0-9_]`
`\W`	Non-word character	`[^a-zA-Z0-9_]`
`\s`	Whitespace	`[ \t\n\r\f\v]`
`\S`	Non-whitespace	`[^ \t\n\r\f\v]`
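
The bracketed equivalents above hold for ASCII matching. In Python 3, for example, the shorthand classes are Unicode-aware on `str` patterns unless `re.ASCII` is passed. A small sketch, with sample strings that are assumptions for illustration:

```python
import re

# "42", a space, then the Arabic-Indic digits four and two (U+0664, U+0662).
text = "42 \u0664\u0662"

unicode_digits = re.findall(r"\d+", text)           # Unicode-aware by default
ascii_digits = re.findall(r"\d+", text, re.ASCII)   # restricts \d to [0-9]
explicit_range = re.findall(r"[0-9]+", text)        # always ASCII digits only

unicode_word = re.findall(r"\w+", "caf\u00e9")          # "café" as one word
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)  # stops before "é"

print(unicode_digits, ascii_digits, explicit_range)
print(unicode_word, ascii_word)
```

If input must be plain ASCII digits, writing `[0-9]` explicitly avoids depending on the flavor's default.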

Common Patterns

Purpose	Pattern	Notes
Email (simplified)	`[\w.+-]+@[\w-]+\.[\w.]+`	Covers most common formats
URL	`https?://[\w\-.]+(:\d+)?(/[\w\-./?%&=]*)?`	HTTP and HTTPS
IPv4	`\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b`	Does not validate range
ISO date	`\d{4}-\d{2}-\d{2}`	YYYY-MM-DD format
UUID	`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`	Lowercase hex
US phone	`\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}`	With optional formatting
Semantic version	`\d+\.\d+\.\d+(-[\w.]+)?(\+[\w.]+)?`	Major.Minor.Patch with optional pre-release
Strong password	`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$`	Min 8 chars, upper, lower, digit, special
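
Several of these patterns check shape only, as the notes say for IPv4. One way to close that gap is to pair the regex with a small amount of ordinary code. The sketch below assumes Python; the helper names `is_ipv4` and `is_iso_date` and the use of `re.ASCII` are illustrative choices, not part of the table.

```python
import re
from datetime import date

# Patterns taken from the table; re.ASCII keeps \d limited to 0-9.
IPV4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", re.ASCII)
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)


def is_ipv4(text: str) -> bool:
    """Shape check by regex, then range check (0-255) in code."""
    if IPV4.fullmatch(text) is None:
        return False
    return all(int(part) <= 255 for part in text.split("."))


def is_iso_date(text: str) -> bool:
    """Shape check by regex, then calendar check via datetime.date."""
    m = ISO_DATE.fullmatch(text)
    if m is None:
        return False
    try:
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. month 13 or 30 February
        return False
    return True


print(is_ipv4("192.168.0.1"), is_ipv4("999.1.1.1"))
print(is_iso_date("2024-02-29"), is_iso_date("2023-02-29"))
```

`fullmatch` is used so that the whole input must fit the pattern; `search` would accept a valid fragment inside a longer string.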

Lookahead and Lookbehind

Type	Syntax	Description
Positive lookahead	`(?=...)`	Matches if followed by
Negative lookahead	`(?!...)`	Matches if NOT followed by
Positive lookbehind	`(?<=...)`	Matches if preceded by
Negative lookbehind	`(?<!...)`	Matches if NOT preceded by

Practical Examples

Positive lookahead — match a number only if followed by "px":

\d+(?=px)

In "12px 5em 8px", matches "12" and "8".

Negative lookahead — match "http" only if NOT followed by "s":

http(?!s)

Matches the "http" in "http://" but not the "http" in "https://".

Positive lookbehind — match a number only if preceded by "$":

(?<=\$)\d+

In "Price: $42 and 10 items", matches "42".

Negative lookbehind — match "cat" only if NOT preceded by "s":

(?<!s)cat

Matches "cat" in "the cat" but not in "scatter".
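
The four examples above can be checked with a short Python sketch. The patterns and most sample strings come from the text; the combined URL string is an assumption added for illustration.

```python
import re

# Positive lookahead: digits only when "px" follows (the "px" is not consumed).
px_numbers = re.findall(r"\d+(?=px)", "12px 5em 8px")

# Negative lookahead: "http" only when the next character is not "s".
plain_http = re.findall(r"http(?!s)", "http://a.example https://b.example")

# Positive lookbehind: digits only when "$" comes immediately before.
prices = re.findall(r"(?<=\$)\d+", "Price: $42 and 10 items")

# Negative lookbehind: "cat" only when not preceded by "s".
cat_in_phrase = re.findall(r"(?<!s)cat", "the cat")
cat_in_scatter = re.findall(r"(?<!s)cat", "scatter")

print(px_numbers, plain_http, prices, cat_in_phrase, cat_in_scatter)
```

In each case the lookaround text is tested but never included in the match, which is why `px_numbers` contains bare digits.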

Flavor Differences

Feature	JavaScript	Python (`re`)	PCRE (PHP/grep -P)	POSIX (grep/sed)
Named groups	`(?<name>...)`	`(?P<name>...)`	`(?P<name>...)` or `(?<name>...)`	Not supported
Lookbehind	Variable-length (ES2018+)	Fixed-width only	Fixed-length per alternative in classic PCRE	Not supported
Unicode	`\p{L}` with the `u` flag	No `\p{L}` in `re`; `\w`, `\d`, `\s` are Unicode-aware by default	`\p{L}`	Limited
Backreferences	`\1`	`\1`	`\1`	`\1`

When writing cross-platform regex, stick to the common subset: basic character classes, quantifiers, and alternation. If you need lookbehind, keep it fixed-length, and remember that POSIX tools support no lookaround at all. Test in multiple flavors if portability matters.
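
A pattern copied from another flavor can fail at compile time rather than silently mismatch. The following sketch probes Python's standard `re` module; the `compiles` helper is a hypothetical name introduced here, and other engines will give different results for the same patterns.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Python's own named-group syntax works and is readable by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
print(m.group("year"), m.group("month"))

print(compiles(r"(?<=\$)\d+"))      # fixed-width lookbehind
print(compiles(r"(?<=\$\s*)\d+"))   # variable-width lookbehind
print(compiles(r"(?<year>\d{4})"))  # JavaScript-style named group
print(compiles(r"\p{L}+"))          # Unicode property escape
```

Only the first `compiles` call is expected to succeed under `re`; the other three patterns use features from other flavors.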

Testing and Debugging

regex101.com: A widely used regex tester. Supports PCRE, JavaScript, Python, and Go flavors, among others. Provides real-time matching, group highlighting, and a detailed explanation of each token.
regexr.com: Interactive regex editor with a community pattern library. Good for learning and exploring.

Always test your regex against both matching and non-matching inputs, including edge cases like empty strings, special characters, and Unicode text.
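
The same habit carries over to code: keep a list of inputs that must match and a list that must not, and run both on every change. A minimal sketch, assuming Python and the semantic-version pattern from Common Patterns; the `failures` helper, the sample inputs, and the `re.ASCII` flag are illustrative assumptions.

```python
import re

# Pattern from the Common Patterns table; re.ASCII limits \d and \w to ASCII.
SEMVER = re.compile(r"\d+\.\d+\.\d+(-[\w.]+)?(\+[\w.]+)?", re.ASCII)

SHOULD_MATCH = ["1.2.3", "0.0.1-alpha.1", "2.0.0+build.5", "1.0.0-rc.1+exp"]
SHOULD_NOT_MATCH = [
    "",                        # empty string
    "1.2",                     # missing patch number
    "v1.2.3",                  # leading prefix
    "1.2.3-",                  # dangling pre-release separator
    "1.2.3 ",                  # trailing whitespace
    "\uff11.\uff12.\uff13",    # fullwidth Unicode digits
]


def failures(pattern, matching, non_matching):
    """Return inputs whose full-match result differs from what was expected."""
    bad = [s for s in matching if pattern.fullmatch(s) is None]
    bad += [s for s in non_matching if pattern.fullmatch(s) is not None]
    return bad


print(failures(SEMVER, SHOULD_MATCH, SHOULD_NOT_MATCH))
```

An empty list means every case behaved as expected; any returned string points directly at the input that needs attention.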

Best Practices

Always test regex with edge cases — empty strings, special characters, very long inputs, and Unicode
Prefer named groups for readability — (?<year>\d{4}) is clearer than (\d{4}) when extracting data
Use non-capturing groups (?:...) when you need grouping but not the captured text; this avoids cluttering group numbering
Avoid catastrophic backtracking — nested quantifiers like (a+)+ can cause exponential time on certain inputs
Use character classes over alternation where possible — [aeiou] is faster and clearer than a|e|i|o|u
Add comments to complex patterns with verbose/extended mode: the x flag or inline (?x) in PCRE and Python (re.VERBOSE); JavaScript's built-in RegExp has no verbose mode
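
Two of these practices, named groups and verbose mode, combine naturally. The sketch below assumes Python, so it uses the `(?P<name>...)` spelling and `re.VERBOSE`; the sample date is made up.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
ISO_DATE = re.compile(
    r"""
    (?P<year>  \d{4} )   # four-digit year
    -
    (?P<month> \d{2} )   # two-digit month
    -
    (?P<day>   \d{2} )   # two-digit day
    """,
    re.VERBOSE,
)

m = ISO_DATE.fullmatch("2024-05-17")
parts = m.groupdict()
print(parts)  # {'year': '2024', 'month': '05', 'day': '17'}
```

To match a literal space or `#` inside a verbose pattern, escape it or place it in a character class.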
