Guided: Validate and Parse Text Using Regular Expressions in Java SE

Transform your Java programming capabilities with the Regular Expressions in Java Lab, designed to deliver business-critical skills and elevate your resume. This hands-on lab equips you with the expertise to solve complex text-processing challenges, driving efficiency and innovation in professional projects.

Lab Info

- Level: Intermediate
- Last updated: Jun 30, 2026
- Duration: 45m

### Step 1 — Introduction, Pattern and Matcher Classes
Welcome to the Caret & Dollar Text Intelligence Platform team! Your first assignment is to validate product SKU codes before they are written to the catalog. Every SKU at Caret & Dollar follows the shape CDE-1234 — the literal prefix CDE- followed by exactly four digits.

In this step you will:
- Compile your first java.util.regex.Pattern.
- Apply it with Matcher.matches() for whole-input validation and Matcher.find() for searching within a larger string.
- Escape regex metacharacters such as . when you want them treated as literal text.

File you will edit:

src/main/java/com/caretdollar/regex/SkuValidator.java

Reference solution:

If you get stuck, a fully worked solution is provided in solution/SkuValidator.java. Try each task on your own before peeking.

Code up each step, then run the validator.

The starter code intentionally fails the JUnit tests. As you implement each task, more tests turn green.

Concept Overview — The Regex Lifecycle in Java

java.util.regex is a two-step API:
1. Compile a pattern once with Pattern.compile("regex source"). This is relatively expensive, so a pattern that is reused across calls should be stored in a private static final field.
2. Apply it to an input with pattern.matcher(input), then call one of:
   - matcher.matches(): returns true only when the entire input matches.
   - matcher.find(): scans the input and returns true if any substring matches; call it repeatedly to walk through every occurrence.

Metacharacters such as ., *, and +, and escape sequences such as \d, carry special meaning in regex. To match a literal ., you must escape it as \.. In a Java string literal each backslash itself must be doubled, so the regex \. is written "\\." in source code.
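The lifecycle above can be sketched as follows. This is a minimal illustration rather than the lab's reference solution: the class name SkuLifecycleSketch, its method names, and the version-number pattern are assumptions made for the example; only the CDE-1234 shape comes from the lab text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SkuLifecycleSketch {
    // Compiled once and reused across calls.
    private static final Pattern SKU = Pattern.compile("CDE-\\d{4}");

    // The regex \d+\.\d+ needs every backslash doubled in Java source.
    private static final Pattern VERSION = Pattern.compile("\\d+\\.\\d+");

    // matches(): the whole input must be a SKU.
    static boolean isValidSku(String input) {
        return SKU.matcher(input).matches();
    }

    // find(): collect every SKU that occurs anywhere in the text.
    static List<String> findSkus(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SKU.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // The escaped dot only matches a literal '.', so "1x5" is rejected.
    static boolean isVersion(String input) {
        return VERSION.matcher(input).matches();
    }
}
```

With this sketch, isValidSku("order CDE-1234") is false because matches() needs the whole input to fit, while findSkus on the same text still locates the SKU inside it.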

Step 1 Wrap-Up

You have written your first real Java regex code. You learned to:
- Compile a Pattern once and store it as private static final.
- Choose between matches() (whole input) and find() (scan for occurrences).
- Escape metacharacters such as . in the regex source (\.), and double the backslash in the Java string literal ("\\.").

Continue to Step 2 to learn how character classes, grouping, quantifiers, and alternation let you validate richer formats such as email addresses and phone numbers.

### Step 2 — Character Classes, Quantifiers, and Alternation
Step 2 — Introduction

Caret & Dollar Enterprises is launching a self-service signup page. Before any row is written to the customer database, four fields must be validated: phone number, email address, postal code (US ZIP), and country (only a few are launch-supported). You will write each validator using a single, focused regex.

Along the way you will also extract the three component groups of a US phone number so the CRM can store them separately.

In this step you will:
- Use character classes such as [abc], [a-z], [^0-9], and the predefined \d, \w, \s.
- Apply quantifiers: ?, *, +, {n}, {n,}, {n,m}.
- Group sub-patterns with ( ... ) for capturing and (?: ... ) for non-capturing.
- Use alternation with | to match one of several alternatives.
- Pull captured pieces out of a match with Matcher.group(n).

File you will edit:

src/main/java/com/caretdollar/regex/SignupValidator.java

Reference solution:

solution/SignupValidator.java

Concept Overview — Character Classes, Groups, Quantifiers, Alternation

A character class in square brackets matches any single character inside the brackets: [aeiou] matches one vowel; [0-9] matches one digit; [^0-9] matches one non-digit. Java also provides shorthand character classes: \d for digits, \w for word characters (letters, digits, underscore), and \s for whitespace.

A quantifier controls how many times the preceding element matches: ? (zero or one), * (zero or more), + (one or more), {n} (exactly n), {n,} (n or more), {n,m} (between n and m, inclusive).
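As a rough illustration of character classes and quantifiers working together, the sketch below validates a US ZIP code and two other simple shapes. The class QuantifierSketch and the username rule are hypothetical and only serve the example; the lab's own validators may differ.

```java
import java.util.regex.Pattern;

final class QuantifierSketch {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // One lowercase letter, then 2 to 15 lowercase letters, digits, or underscores
    // (a hypothetical username rule, not part of the lab).
    private static final Pattern USERNAME = Pattern.compile("[a-z][a-z0-9_]{2,15}");

    // One or more characters, none of which is a digit.
    private static final Pattern NO_DIGITS = Pattern.compile("[^0-9]+");

    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    static boolean isUsername(String s) {
        return USERNAME.matcher(s).matches();
    }

    static boolean isNonEmptyWithoutDigits(String s) {
        return NO_DIGITS.matcher(s).matches();
    }
}
```

Note that + requires at least one character, so the empty string fails the last check even though it contains no digits.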

Grouping is done with ( ... ). A group is capturing by default, meaning its match is retrievable via Matcher.group(1), group(2), and so on. Use (?: ... ) when you want grouping without capturing.
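One way to picture capturing versus non-capturing groups is the phone-number split mentioned in the introduction. The sketch below assumes a format of 3-3-4 digits with an optional hyphen, dot, or space between the parts and an optional +1 prefix; that format and the class name PhoneGroupsSketch are assumptions, not the lab's specification.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PhoneGroupsSketch {
    // (?: ... ) groups the optional "+1" prefix without capturing it, so the
    // three capturing groups keep the numbers 1, 2, and 3.
    private static final Pattern US_PHONE = Pattern.compile(
            "(?:\\+1[-. ]?)?(\\d{3})[-. ]?(\\d{3})[-. ]?(\\d{4})");

    // Returns {area code, exchange, line number}, or empty if the input does not match.
    static Optional<String[]> parts(String input) {
        Matcher m = US_PHONE.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }
}
```

Because the prefix is non-capturing, adding or removing it does not shift the group numbers used at the call site.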

Alternation is the | operator. The pattern cat|dog matches either word.
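A small sketch may help show how far the | operator reaches. The country list below is hypothetical (the lab does not say which countries are launch-supported), and the class name AlternationSketch is likewise made up for the example.

```java
import java.util.regex.Pattern;

final class AlternationSketch {
    // Hypothetical launch list: exactly one of three country codes.
    private static final Pattern COUNTRY = Pattern.compile("US|CA|MX");

    // The group limits the alternation to the middle letter: "gray" or "grey".
    private static final Pattern GROUPED = Pattern.compile("gr(?:a|e)y");

    // Without a group, | splits the whole pattern: "gra" or "ey".
    private static final Pattern UNGROUPED = Pattern.compile("gra|ey");

    static boolean isSupportedCountry(String code) {
        return COUNTRY.matcher(code).matches();
    }

    static boolean matchesGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean matchesUngrouped(String s) {
        return UNGROUPED.matcher(s).matches();
    }
}
```

The ungrouped variant accepts "gra" and "ey" but not "gray", which is why alternation is usually wrapped in a group when it is only meant to cover part of a pattern.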

A back-reference \1, \2, etc. matches the same text that a previous capturing group already matched. This is useful when two separator characters in a pattern must be identical.
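To illustrate the "identical separators" case, here is a minimal sketch that accepts a phone number only when both separators are the same character. The 3-3-4 format and the name BackReferenceSketch are assumptions for the example.

```java
import java.util.regex.Pattern;

final class BackReferenceSketch {
    // Group 2 captures the first separator ('-' or '.'); \2 then requires
    // the second separator to be that same character.
    private static final Pattern CONSISTENT_PHONE =
            Pattern.compile("(\\d{3})([-.])(\\d{3})\\2(\\d{4})");

    static boolean hasConsistentSeparators(String input) {
        return CONSISTENT_PHONE.matcher(input).matches();
    }
}
```

A character class such as [-.] in both positions would also accept mixed separators like 555-123.4567; the back-reference is what rules that out.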

Step 2 Wrap-Up

You now have the daily-driver regex toolkit: character classes, quantifiers, groups, alternation, and back-references. You also used these tools to write four real-world validators for a signup form.

Step 3 shows you how to make patterns this large readable and maintainable for the next person on your team.

### Step 3 — Comments, Named Capture Groups
The Caret & Dollar senior architect just inherited a microservice from a developer who left the company. Buried in it is one of the worst regexes anyone has ever seen — about 180 characters long, no whitespace, no comments, undocumented. Your job today is to refactor it into a regex a teammate can actually maintain, without changing its behavior. You will also build a brand-new validator written in readable, verbose-mode form from day one.

In this step you will:
- Use Pattern.COMMENTS to allow whitespace and # comments inside a pattern.
- Use named capture groups with (?<name>...) so groups are self-documenting.
- Compose larger patterns from small, named static final String building blocks.

File you will edit:

src/main/java/com/caretdollar/regex/ReadablePatterns.java

Reference solution:

solution/ReadablePatterns.java

Concept Overview — Three Pillars of Readable Regex
1. Pattern.COMMENTS (verbose / extended mode). When this flag is passed as the second argument to Pattern.compile, the regex engine ignores all unescaped whitespace in the pattern source and treats # as the start of a comment that runs to the end of the line. If you actually need to match a space or a # character, escape it: a backslash followed by a space, and \#. Java text blocks (""" ... """, available since Java 15) pair well with this flag because they let you write multi-line patterns directly in source.
2. Named capture groups. Replace (\d{4}) with (?<year>\d{4}), then read it back at the call site with matcher.group("year") instead of matcher.group(1). Named groups survive refactoring because nothing at the call site depends on numeric position.
3. Composition from constants. Declare small static final String building blocks like YEAR = "\\d{4}" and concatenate them. The pattern reads almost like a sentence, and each piece can be unit-tested or reused independently.
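The three techniques can be combined in one small pattern. The sketch below is an illustration under stated assumptions: the ISO-style date format, the class name ReadableDateSketch, and the constant names are invented for the example, and plain string concatenation with \n is used instead of a text block so that it also compiles on older Java versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReadableDateSketch {
    // Small named building blocks, each a named capture group.
    private static final String YEAR  = "(?<year>\\d{4})";
    private static final String MONTH = "(?<month>0[1-9]|1[0-2])";
    private static final String DAY   = "(?<day>0[1-9]|[12]\\d|3[01])";

    // With Pattern.COMMENTS, the padding spaces and the # comments are ignored.
    private static final Pattern ISO_DATE = Pattern.compile(
            YEAR  + "   # four-digit year\n"
          + "-          # literal hyphen\n"
          + MONTH + "   # month 01 to 12\n"
          + "-          # literal hyphen\n"
          + DAY   + "   # day 01 to 31, with no per-month check\n",
            Pattern.COMMENTS);

    // Returns {year, month, day} read back by group name, or null if there is no match.
    static String[] parse(String input) {
        Matcher m = ISO_DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("year"), m.group("month"), m.group("day") };
    }
}
```

Because whitespace in the pattern source is ignored, an input such as "2026 -03-15" does not match: the space would have to be escaped in the pattern to be matched literally.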
Step 3 Wrap-Up

You learned the three techniques that turn regex from write-only code into maintainable code:
- Pattern.COMMENTS for whitespace and inline comments,
- named capture groups for self-documenting matches,
- composition from static final String constants.

Step 4 revisits anchors and boundaries, the small symbols (^, $, \b) that have an outsized effect on what a pattern actually means.

### Step 4 — Caret and Dollar anchors, and word boundaries
A Caret & Dollar production service emits log lines like:
```
2026-03-15T14:30:00Z [ERROR] OrderService: failed to charge card for order 4821
2026-03-15T14:30:01Z [INFO]  OrderService: retrying in 3s
```
The operations team needs you to parse each line into structured pieces (timestamp, level, message) and to detect ERROR mentions as whole words, not as substrings (so the word "TERRORS" does not fire an alert).
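One possible shape for such a parser is sketched below. It is not the lab's reference solution: the class LogLineSketch, the LogEntry holder, and the assumption that the level is a run of uppercase letters in square brackets are all made up for illustration, based only on the two sample lines above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineSketch {
    // ^ and $ anchor the pattern to the whole line; named groups label the pieces.
    private static final Pattern LOG_LINE = Pattern.compile(
            "^(?<timestamp>\\S+)\\s+\\[(?<level>[A-Z]+)\\]\\s+(?<message>.*)$");

    // Hypothetical holder type for the parsed pieces.
    static final class LogEntry {
        final String timestamp;
        final String level;
        final String message;

        LogEntry(String timestamp, String level, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.message = message;
        }
    }

    // Returns the parsed entry, or empty if the line does not have the expected shape.
    static Optional<LogEntry> parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new LogEntry(
                m.group("timestamp"), m.group("level"), m.group("message")));
    }
}
```

The \s+ after the bracketed level absorbs the two spaces in the INFO sample line, so the message group starts at OrderService in both cases.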

In this step you will:
- Use ^ and $ to anchor a pattern to the start and end of input.
- Use \b (word boundary) to distinguish whole words from substrings.
- Understand the difference between matches() (which already implies ^...$) and find() with explicit anchors.

File you will edit:

src/main/java/com/caretdollar/regex/LogLineParser.java

Reference solution:

solution/LogLineParser.java

Concept Overview — Anchors and Boundaries

Anchors are zero-width assertions: they match a position in the input rather than a character.
- ^ matches the start of input (or the start of any line if the MULTILINE flag is set; covered in Step 6).
- $ matches the end of input, or just before a line terminator that ends the input (with MULTILINE, the end of every line).
- \b matches a word boundary: the position between a word character (\w, which is letters, digits, or underscore) and a non-word character or the start or end of input.
- \B matches any position that is not a word boundary.
- \A and \z match the absolute start and end of input, regardless of the MULTILINE flag.

Two things to keep in mind:
- matches() requires the entire input to match, as if the pattern were anchored at both ends. With matches(), explicit ^ and $ are redundant but harmless.
- find() searches anywhere in the input. To require a whole-input match while using find(), write the anchors yourself.
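The points above can be tried out with a short sketch. The class AnchorSketch and its method names are hypothetical; the patterns themselves use only the anchors and boundaries described in this overview.

```java
import java.util.regex.Pattern;

final class AnchorSketch {
    // \b on both sides: ERROR must stand alone as a word.
    private static final Pattern ERROR_WORD = Pattern.compile("\\bERROR\\b");

    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern DIGITS_ANCHORED = Pattern.compile("^\\d+$");

    // True for "[ERROR]" but not for "TERRORS".
    static boolean mentionsError(String text) {
        return ERROR_WORD.matcher(text).find();
    }

    // find() without anchors: digits anywhere in the input are enough.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    // matches() without anchors: the whole input must be digits.
    static boolean allDigitsViaMatches(String s) {
        return DIGITS.matcher(s).matches();
    }

    // find() with explicit ^ and $: the anchors force a whole-input match.
    static boolean allDigitsViaFind(String s) {
        return DIGITS_ANCHORED.matcher(s).find();
    }
}
```

For an input such as "order 4821", containsDigits is true while both whole-input checks are false.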
Step 4 Wrap-Up

Anchors and boundaries look like small symbols but completely change what a pattern means. You used ^ and $ to require a full-input match, \b to constrain a search to whole words, and you saw the practical difference between matches() and find().

Step 5 builds on this foundation with lookarounds — assertions that, like anchors, are zero-width, but can check arbitrary text on either side. You will use them to build a ${placeholder} template engine.

### Step 5 — Lookarounds, Streams, and Lambdas
The Caret & Dollar alerting service sends emails like:
```
Hello ${customerName}, your order ${orderId} totalling ${amount} shipped on ${shipDate}.
```
Templates are stored in a database. At send time, a Map<String, String> of values is provided and every ${...} placeholder must be replaced with the corresponding value. You will build a small template engine using regex lookarounds plus Java streams and lambdas. Along the way you will also use lookarounds to extract dollar amounts from receipts and to find SKUs that are not followed by a specific suffix.

In this step you will:
- Use positive lookahead (?=...) and negative lookahead (?!...).
- Use positive lookbehind (?<=...) and negative lookbehind (?<!...).
- Understand why lookarounds are zero-width: they assert context without consuming characters.
- Use Matcher.replaceAll(Function<MatchResult, String>) — the modern, lambda-friendly replacement API introduced in Java 9.
- Build a stream pipeline over regex matches.
File you will edit:

src/main/java/com/caretdollar/regex/TemplateEngine.java

Reference solution:

solution/TemplateEngine.java

Concept Overview — Lookarounds and Lambda-Based Replacement

A lookaround asserts that a pattern matches (or fails to match) at a position in the input, but does not consume any characters. There are four flavors:

| Construct | Name | Meaning |
|-----------|------|---------|
| (?=X) | positive lookahead | "the next characters are X", without consuming X |
| (?!X) | negative lookahead | "the next characters are NOT X" |
| (?<=X) | positive lookbehind | "the preceding characters are X" |
| (?<!X) | negative lookbehind | "the preceding characters are NOT X" |

Because lookarounds are zero-width, the part you care about can be just the data, with the surrounding context asserted but not captured.
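The following sketch shows three of the four constructs in use. It is only an illustration: the class LookaroundSketch, the amount format (digits with optional two-digit cents), and the "-X" suffix are assumptions chosen for the example, since the lab text does not spell them out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundSketch {
    // Lookbehind: a number preceded by '$'; the '$' is asserted, not matched.
    static final Pattern AMOUNT = Pattern.compile("(?<=\\$)\\d+(?:\\.\\d{2})?");

    // Lookbehind plus lookahead: only the name between "${" and "}".
    static final Pattern PLACEHOLDER_NAME = Pattern.compile("(?<=\\$\\{)\\w+(?=\\})");

    // Negative lookahead: a SKU that is not followed by the hypothetical suffix "-X".
    static final Pattern SKU_WITHOUT_SUFFIX = Pattern.compile("CDE-\\d{4}(?!-X)");

    // Collects the full text of every match of the given pattern.
    static List<String> allMatches(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

In each case m.group() returns just the data, because the surrounding context sits inside a zero-width assertion and is never part of the match.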

Java 9 introduced the method Matcher.replaceAll(Function<MatchResult, String>). It takes a lambda that receives each match and returns the replacement string. This makes lookup-driven replacement (such as a template engine) very concise.
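As a rough sketch of the lambda-based API (Java 9 or later), the example below upper-cases every word wrapped in braces and also lists those words with Matcher.results(), which returns a stream of MatchResult objects. The brace syntax and the class name LambdaReplaceSketch are assumptions for illustration only.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LambdaReplaceSketch {
    // A word wrapped in braces, for example {info}; group 1 is the word itself.
    private static final Pattern BRACED_WORD = Pattern.compile("\\{(\\w+)\\}");

    // The lambda receives each MatchResult and returns its replacement text.
    static String upperCaseBraced(String text) {
        return BRACED_WORD.matcher(text)
                .replaceAll(mr -> mr.group(1).toUpperCase(Locale.ROOT));
    }

    // Matcher.results() streams the matches, so ordinary stream operations apply.
    static List<String> bracedWords(String text) {
        return BRACED_WORD.matcher(text)
                .results()
                .map(mr -> mr.group(1))
                .collect(Collectors.toList());
    }
}
```

The replacement values here contain only letters, digits, and underscores, so no quoting is needed in this particular sketch.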

There is one pitfall worth knowing about up front: the replacement string passed to any replaceAll variant treats $ and \ as special characters. When the replacement value comes from user data — and it might contain a $ — wrap it in Matcher.quoteReplacement(...) to escape those special characters.
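A minimal template-rendering sketch that guards against this pitfall might look as follows. The class TemplateSketch and the choice to leave unknown placeholders untouched are assumptions of the example, not necessarily what the lab's reference solution does.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSketch {
    // Matches ${name}; group 1 is the placeholder name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String render(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(mr -> {
            String value = values.get(mr.group(1));
            // Assumption of this sketch: unknown placeholders are left as they are.
            String replacement = (value != null) ? value : mr.group();
            // quoteReplacement escapes $ and \ so they are inserted literally
            // instead of being read as group references.
            return Matcher.quoteReplacement(replacement);
        });
    }
}
```

With the quoting in place, a value such as "$19.99" is inserted as-is; without it, the leading $1 would be interpreted as a reference to capturing group 1.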

Step 5 Wrap-Up

You learned to assert context without consuming it. You used:
- Lookbehind to strip a leading $ from amount matches.
- A pair of lookarounds to extract just the inner placeholder name.
- A capturing group with replaceAll(Function<MatchResult, String>) plus Matcher.quoteReplacement to build a real ${placeholder} template engine driven by a Map.
- A stream pipeline to compute the intersection of placeholders and provided values.
- Negative lookahead to filter out matches based on what follows them.

Step 6 covers regex flags and modes: CASE_INSENSITIVE, MULTILINE, and DOTALL.

### Step 6 — Flags
Production has started sending multi-line log records that include full stack traces. The single-line parser you built in Step 4 no longer works because each event now spans several lines. The operations team also wants case-insensitive level matching, since some legacy services emit [error] instead of [ERROR]. You will fix both problems with Java regex flags.

In this step you will:
- Use Pattern.CASE_INSENSITIVE for case-blind matching.
- Use Pattern.MULTILINE so ^ and $ match at every line break, not just the start and end of input.
- Use Pattern.DOTALL so . matches \n and a single pattern can span multiple lines.
- Combine flags with bitwise OR (CASE_INSENSITIVE | MULTILINE) and enable them inline with (?i), (?m), (?s).

File you will edit:

src/main/java/com/caretdollar/regex/AdvancedLogProcessor.java

Reference solution:

solution/AdvancedLogProcessor.java

Concept Overview — Flags Change Pattern Behavior, Not Pattern Source

A flag is passed as the second argument to Pattern.compile(...). Flags do not change the source of the pattern; they change how the engine interprets it.
- Pattern.CASE_INSENSITIVE: letters in the pattern match either case in the input. By default this applies to US-ASCII letters only; add Pattern.UNICODE_CASE for Unicode-aware case folding.
- Pattern.MULTILINE: ^ matches at the start of every line and $ matches at the end of every line, where lines end at a line terminator such as \n or \r\n. Without this flag, ^ and $ only match at the start and end of the whole input.
- Pattern.DOTALL: . matches any character including \n. Without this flag, . matches any character except line terminators.

Flags can be combined with bitwise OR: Pattern.compile("...", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE). They can also be enabled inline at the start of a pattern: (?i) for CASE_INSENSITIVE, (?m) for MULTILINE, (?s) for DOTALL.
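The two ways of enabling flags can be compared side by side. In this sketch the class FlagSketch and the task of counting lines that start with a bracketed error level are assumptions chosen to echo the step's scenario.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagSketch {
    // Flags passed as the second argument, combined with bitwise OR.
    static final Pattern WITH_FLAGS = Pattern.compile(
            "^\\[error\\]", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // The same two flags enabled inline at the start of the pattern.
    static final Pattern INLINE = Pattern.compile("(?im)^\\[error\\]");

    // Case-insensitive only: ^ still matches just the start of the whole input.
    static final Pattern NO_MULTILINE = Pattern.compile(
            "^\\[error\\]", Pattern.CASE_INSENSITIVE);

    // Counts how many times the pattern is found in the text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }
}
```

On a four-line input where the first and third lines begin with [ERROR] or [error], the first two patterns find two matches and the third finds only one.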

A reluctant (also called non-greedy) quantifier such as .*? matches as little as possible. It is essential whenever you combine DOTALL with a "from here to the next marker" pattern, otherwise the engine matches as much as possible and skips past the intended end marker.
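The difference is easiest to see on a multi-record input. The sketch below uses hypothetical BEGIN and END markers and a made-up class name, DotallSketch; real log records would use whatever markers the format defines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotallSketch {
    // Greedy: .* runs to the LAST "END" in the input.
    static final Pattern GREEDY = Pattern.compile("BEGIN(.*)END", Pattern.DOTALL);

    // Reluctant: .*? stops at the FIRST "END" after "BEGIN".
    static final Pattern RELUCTANT = Pattern.compile("BEGIN(.*?)END", Pattern.DOTALL);

    // No DOTALL: . cannot cross a line terminator, so a multi-line body never matches.
    static final Pattern NO_DOTALL = Pattern.compile("BEGIN(.*?)END");

    // Returns the body of the first match (group 1), or null if nothing matches.
    static String firstBody(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

Given two records back to back, the greedy pattern swallows both into a single body, while the reluctant one returns only the first record's body.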

Step 6 Wrap-Up

Three small flags change three big behaviors:
- CASE_INSENSITIVE ignores letter case.
- MULTILINE rebinds ^ and $ to line boundaries.
- DOTALL lets . match newlines.

You have now seen the core practical surface of java.util.regex: pattern syntax, character classes, quantifiers, grouping, alternation, anchors, boundaries, lookarounds, and flags. You also applied each technique to a coherent product, Caret & Dollar's Text Intelligence Platform, building real validators and parsers along the way.

To run all of your completed tests, open a terminal and execute the command
```
mvn test
```
You should see it output something like:
```
Tests run: 86, Failures: 0, Errors: 0, Skipped: 0
```
Congratulations on finishing the lab! You now have a transferable, production-quality grasp of Java regular expressions that you can take straight into your own code.

About the author

Victor Grazi

Victor Grazi is an Oracle Java Champion, InfoQ Java Editor, and Java evangelist working at Nomura Securities. He hosts the "Java Concurrent Animated" open source project.
