
Regex Tutorial for Beginners: Patterns, Syntax, and Real-World Examples

March 25, 2026 · 7 min read

Regular expressions look like line noise the first time you see them. I get it. A pattern like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ seems impenetrable. But the building blocks? There are only about a dozen of them. Learn those and you can read and write regex patterns for search, validation, and text processing in any language or text editor.

What Are Regular Expressions?

A regular expression (regex) is a sequence of characters that defines a search pattern. You use it to match, extract, or replace text. Regex support is built into JavaScript, Python, Java, Go, PHP, and Ruby. It is available in Rust through the widely used regex crate, and it ships with every major text editor. If you've ever used Ctrl+H in VS Code with the regex toggle enabled, you were already writing regular expressions.

The real value is that regex describes patterns, not fixed strings. Instead of searching for the exact text "2026-03-25", you write \d{4}-\d{2}-\d{2} to match any date in YYYY-MM-DD format.
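Here is that difference as a minimal sketch using Python's standard re module (any language with regex support would do). The log line is made-up sample text.

```python
import re

log = "Deployed on 2026-03-25, rolled back on 2026-03-27."

# A fixed-string search only answers "is this exact text present?"
print("2026-03-25" in log)  # True

# The pattern describes the shape, so it finds every YYYY-MM-DD date.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)
print(dates)  # ['2026-03-25', '2026-03-27']
```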

The Building Blocks

Literal Characters

Most characters just match themselves. The pattern cat matches the string "cat" inside "concatenate". Nothing surprising there.

Metacharacters

Some characters and escape sequences have special meaning in regex:

Pattern     Meaning                                       Example
.           Any character except newline                  c.t matches "cat", "cut", "c3t"
\d          Any digit (0-9)                               \d\d matches "42"
\w          Word character (letter, digit, underscore)    \w+ matches "hello_world"
\s          Whitespace (space, tab, newline)              \s+ matches a run of spaces or tabs
\D, \W, \S  Negated versions of the above                 \D matches "a" but not "5"
^           Start of string (or line, in multiline mode)  ^Hello matches "Hello world"
$           End of string (or line, in multiline mode)    world$ matches "Hello world"
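The quickest way to get a feel for these is to run them. The sketch below assumes Python's re module. Other engines behave the same for these basics, though flag names differ (Python spells multiline mode re.MULTILINE).

```python
import re

# "." matches any one character except a newline.
dot = re.findall(r"c.t", "cat cut c3t c\nt")
print(dot)  # ['cat', 'cut', 'c3t']

# \d, \w and a negated class.
print(re.findall(r"\d\d", "Order 42"))          # ['42']
print(re.findall(r"\w+", "hello_world, hi"))    # ['hello_world', 'hi']
print(re.findall(r"\D", "a5"))                  # ['a']

# ^ and $ anchor to the start and end of the string...
print(bool(re.search(r"^Hello", "Hello world")))  # True
print(bool(re.search(r"world$", "Hello world")))  # True
print(bool(re.search(r"^world", "Hello world")))  # False

# ...or to the start of every line when multiline mode is on.
lines = "first line\nsecond line"
print(re.findall(r"^\w+", lines))                 # ['first']
multi = re.findall(r"^\w+", lines, re.MULTILINE)
print(multi)                                      # ['first', 'second']
```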

Quantifiers

Quantifiers tell regex how many times a pattern should repeat:

Quantifier  Meaning                Example
*           0 or more              ab*c matches "ac", "abc", "abbc"
+           1 or more              ab+c matches "abc", "abbc" (not "ac")
?           0 or 1                 colou?r matches "color" and "colour"
{n}         Exactly n times        \d{4} matches "2026"
{n,m}       Between n and m times  \d{2,4} matches "42" or "2026"
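You can check the table rows directly. This sketch assumes Python's re module and a hypothetical helper called matching. It uses re.fullmatch, which only succeeds when the whole string fits the pattern, so the results line up with the examples above.

```python
import re

def matching(pattern, candidates):
    # Keep only the candidates that the pattern matches from start to end.
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(matching(r"ab*c", ["ac", "abc", "abbc"]))              # ['ac', 'abc', 'abbc']
print(matching(r"ab+c", ["ac", "abc", "abbc"]))              # ['abc', 'abbc']
print(matching(r"colou?r", ["color", "colour", "colouur"]))  # ['color', 'colour']
print(matching(r"\d{4}", ["202", "2026", "20261"]))          # ['2026']
print(matching(r"\d{2,4}", ["4", "42", "2026", "20261"]))    # ['42', '2026']
```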

By default, quantifiers are greedy, meaning they grab as much text as they can. Add ? after a quantifier to make it lazy so it matches as little as possible. You'll use .*? instead of .* more often than you'd expect.
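Here is the difference on a small, made-up HTML-like string, sketched with Python's re module:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the end, then backs off only as far as the last ">".
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>one</b><b>two</b>']

# Lazy: .*? stops at the first ">" it can, so each tag is its own match.
lazy = re.findall(r"<.*?>", html)
print(lazy)    # ['<b>', '</b>', '<b>', '</b>']
```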

Character Classes

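Square brackets define a set of characters to match. [aeiou] matches any lowercase vowel. [0-9] matches any digit. That is the same as \d in most engines, though in some (Python 3, for example) \d also matches non-ASCII digits. [A-Za-z] matches any ASCII letter. Put a caret as the first character inside the brackets to negate the set: [^0-9] matches anything that is NOT a digit.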
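Here are these classes in a short Python sketch using the re module. The last lines show an engine-specific detail. In Python 3, \d on a text string also matches non-ASCII digits unless you pass re.ASCII, while [0-9] never does.

```python
import re

text = "Regex 101!"
print(re.findall(r"[aeiou]", text))    # ['e', 'e']
print(re.findall(r"[0-9]", text))      # ['1', '0', '1']
print(re.findall(r"[A-Za-z]+", text))  # ['Regex']
negated = re.findall(r"[^0-9]", text)
print(negated)                          # ['R', 'e', 'g', 'e', 'x', ' ', '!']

# U+0663 is ARABIC-INDIC DIGIT THREE.
arabic_three = "\u0663"
print(re.findall(r"\d", arabic_three))            # one match: the Arabic-Indic digit
print(re.findall(r"[0-9]", arabic_three))         # []
print(re.findall(r"\d", arabic_three, re.ASCII))  # []
```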

Groups and Alternation

Parentheses create groups, and the pipe | acts as OR:

(cat|dog)        matches "cat" or "dog"
(\d{3})-(\d{4})  captures both parts of a number like 555-1234 separately
(?:non-capture)  groups without capturing
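Here are the three forms in Python's re module, which is assumed here; other engines expose groups in much the same way. The phone-style number is sample data.

```python
import re

# Alternation inside a group: with one group, findall returns what it captured.
pets = re.findall(r"(cat|dog)", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# Two capturing groups pull the two parts of the number out separately.
m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
print(m.group(1), m.group(2))  # 555 1234

# A non-capturing group still groups (here, for the + repeat) but creates no group.
n = re.search(r"(?:ab)+", "xxababab")
print(n.group(0), n.groups())  # ababab ()
```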

Lookaheads and Lookbehinds

Lookaheads and lookbehinds check that something exists (or doesn't) at a position without consuming any characters. They're weird at first, but very useful once they click:

\d+(?= dollars)    matches "100" in "100 dollars" (positive lookahead)
\d+(?! dollars)    matches "100" in "100 euros" (negative lookahead)
(?<=\$)\d+         matches "50" in "$50" (positive lookbehind)
(?<!\$)\d+         matches "50" in "€50" (negative lookbehind)
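Here are the four patterns above, run through Python's re module as a sketch. The last two calls show an easy-to-miss gotcha. When a negative lookahead fails, the engine can backtrack into \d+ and succeed with fewer digits. Adding \b first is one way to prevent that. Python only allows fixed-width lookbehinds, and (?<=\$) qualifies.

```python
import re

print(re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros"))  # ['100']
print(re.findall(r"\d+(?! dollars)", "100 euros"))               # ['100']
print(re.findall(r"(?<=\$)\d+", "$50 or €70"))                   # ['50']
print(re.findall(r"(?<!\$)\d+", "€50"))                          # ['50']

# Gotcha: the lookahead rejects "100", but backtracking tries "10" next,
# which is followed by "0", not " dollars", so it matches.
leaky = re.findall(r"\d+(?! dollars)", "100 dollars")
print(leaky)  # ['10']

# Requiring a word boundary first stops \d+ from giving up digits.
fixed = re.findall(r"\d+\b(?! dollars)", "100 dollars")
print(fixed)  # []
```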

Practical Patterns You Will Actually Use

Validate an Email (Basic)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This accepts most real-world email addresses. The full RFC 5322 grammar is far too complex to express sensibly in a regex, so don't try to cover every edge case. Use this for quick format checks and send a confirmation email for real verification.
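In Python, a quick format check might look like the sketch below. The name looks_like_email is a hypothetical helper. It calls re.fullmatch, which requires the entire string to fit the pattern.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(s: str) -> bool:
    """Format check only; a confirmation email is still the real test."""
    return EMAIL_RE.fullmatch(s) is not None

print(looks_like_email("jane.doe+news@example.co.uk"))  # True
print(looks_like_email("user@localhost"))               # False: no dot in the domain
print(looks_like_email("no-at-sign.example.com"))       # False: no @
```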

Extract URLs from Text

https?://[^\s<>"']+

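This grabs HTTP and HTTPS URLs by matching from the protocol until it hits whitespace, an angle bracket, or a quote character. It's simple and effective, though trailing punctuation such as a sentence-ending period gets swept into the match.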
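Here is a sketch in Python with made-up sample text. The pattern contains both quote characters, so it is written as a raw triple-quoted string. The final step, trimming trailing punctuation with rstrip, is an optional extra heuristic and not part of the pattern above. It would also trim a URL that legitimately ends in one of those characters.

```python
import re

URL_RE = re.compile(r"""https?://[^\s<>"']+""")

text = ('See https://example.com/docs and '
        '<a href="http://test.org/page">this</a>. Or try https://example.net.')

urls = URL_RE.findall(text)
print(urls)     # ['https://example.com/docs', 'http://test.org/page', 'https://example.net.']

# Optional cleanup: drop sentence punctuation that got swept into the match.
cleaned = [u.rstrip(".,;:!?") for u in urls]
print(cleaned)  # ['https://example.com/docs', 'http://test.org/page', 'https://example.net']
```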

Match a Date (YYYY-MM-DD)

\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])

This validates month (01-12) and day (01-31). It won't catch impossible dates like February 30, but it handles most formatting errors. For anything more precise, validate in code after the regex pass.
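Here is one way to do that second step, sketched in Python. It uses the pattern above with the year also wrapped in a group, followed by datetime.date, which raises ValueError for impossible dates. The name parse_iso_date is hypothetical.

```python
import re
from datetime import date
from typing import Optional

DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(s: str) -> Optional[date]:
    m = DATE_RE.fullmatch(s)
    if m is None:
        return None  # wrong format, e.g. month 13
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)  # rejects impossible dates like Feb 30
    except ValueError:
        return None

print(parse_iso_date("2026-03-25"))  # 2026-03-25
print(parse_iso_date("2026-02-30"))  # None
```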

Find Duplicate Words

\b(\w+)\s+\1\b

I use this one all the time. It catches "the the" or "is is", those typos that spellcheck misses. The \1 is a backreference to whatever the first group captured.
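In Python, the same pattern can both find the repeats and fix them, because re.sub accepts the same \1 backreference in the replacement string. Here is a sketch with a made-up sentence:

```python
import re

DUP_RE = re.compile(r"\b(\w+)\s+\1\b")
text = "This is is a test of the the code"

# With one group in the pattern, findall returns the captured word, not "is is".
print(DUP_RE.findall(text))  # ['is', 'the']

# Replace each doubled pair with a single copy of the captured word.
fixed = DUP_RE.sub(r"\1", text)
print(fixed)  # This is a test of the code
```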

Match an IPv4 Address

\b(?:\d{1,3}\.){3}\d{1,3}\b

Matches patterns like "192.168.1.1". If you need strict validation (0-255 per octet), the pattern gets longer and uglier, but this is good enough for most log-parsing work.
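A common compromise is to keep the loose pattern and check the 0-255 range in code rather than in the regex. Here is a sketch in Python. The helper octets_in_range is hypothetical and the log line is sample data.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def octets_in_range(ip: str) -> bool:
    # The strict 0-255 check happens here, not in the regex.
    return all(0 <= int(part) <= 255 for part in ip.split("."))

log_line = "accepted from 192.168.1.1, rejected from 999.10.0.1"
candidates = IPV4_RE.findall(log_line)
print(candidates)  # ['192.168.1.1', '999.10.0.1']

valid = [ip for ip in candidates if octets_in_range(ip)]
print(valid)       # ['192.168.1.1']
```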


Common Mistakes

You'll hit these. Everyone does. The most common is forgetting to escape special characters. To match a literal dot, write \. not . (which matches any character). I've debugged this exact problem more times than I'd like to admit.
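Here is the dot bug in miniature, in Python. When the text you want to match literally comes from somewhere else, such as user input or a config value, re.escape adds the backslashes for you.

```python
import re

text = "example.com exampleXcom"

# Unescaped "." matches any character, so the wrong host matches too.
print(re.findall(r"example.com", text))   # ['example.com', 'exampleXcom']

# Escaped "\." only matches a real dot.
print(re.findall(r"example\.com", text))  # ['example.com']

# re.escape escapes every special character in a literal string.
literal = re.escape("example.com")
print(re.findall(literal, text))          # ['example.com']
```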

Another classic: using greedy matching when you want lazy. <.*> on <b>bold</b> matches the entire string, not just <b>. Use <.*?> for the shortest match.

Then there's anchoring. Without ^ and $, a search-based check will happily accept "not-an-email@foo.com-garbage", because the pattern finds "not-an-email@foo.com" inside it. Anchors force the entire string to match the pattern.
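Here is the difference in a Python sketch. It also shows an engine quirk: in Python, $ matches at the very end of the string or just before a final newline, so "a@b.com\n" passes an anchored re.search. re.fullmatch requires the entire string to match and rejects it.

```python
import re

UNANCHORED = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
ANCHORED = "^" + UNANCHORED + "$"

s = "not-an-email@foo.com-garbage"
found = re.search(UNANCHORED, s)
print(found.group())           # not-an-email@foo.com (so a search-based check passes)
print(re.search(ANCHORED, s))  # None

# Python-specific: $ also matches just before a final newline.
print(bool(re.search(ANCHORED, "a@b.com\n")))       # True
print(bool(re.fullmatch(UNANCHORED, "a@b.com\n")))  # False
```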

And finally, know when to stop. If you need to parse HTML, JSON, or XML, use a proper parser. Regex cannot handle nested structures reliably. Don't be the person who writes a 200-character regex to extract JSON values.
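Here is a small illustration in Python with made-up data. A regex pulls out both "name" values but has no idea which level each came from, while the standard json module gives you the structure.

```python
import json
import re

data = '{"name": "outer", "child": {"name": "inner"}}'

# The regex sees flat text: it finds both values, but not which object each belongs to.
names = re.findall(r'"name":\s*"([^"]*)"', data)
print(names)  # ['outer', 'inner']

# A parser builds the nested structure, so each value has an unambiguous path.
parsed = json.loads(data)
print(parsed["name"], parsed["child"]["name"])  # outer inner
```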

For a quick-reference card, see the Regex Reference tool with all the syntax on one page.

Frequently Asked Questions

What is the difference between .* and .*? in regex?

.* is a greedy quantifier: it matches as many characters as possible. .*? is lazy (non-greedy): it matches as few characters as possible. Given the string <b>one</b><b>two</b>, the greedy <b>.*</b> matches the entire string, while the lazy <b>.*?</b> matches only <b>one</b>.

How do I match a literal dot or bracket in regex?

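Escape special characters with a backslash: \. for a dot, \[ for a bracket. Inside a character class [...], most special characters lose their meaning. The exceptions are ], \, ^ (special only as the first character), and - (special between two characters, where it forms a range).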

Can I use regex to validate email addresses?

You can use regex for basic format checks, like verifying an @ sign and a domain with a dot. A practical pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. But the full email spec (RFC 5322) is too complex for regex. For production systems, combine basic regex validation with a confirmation email.