Guided: Validate and Parse Text Using Regular Expressions in Java SE
Transform your Java programming capabilities with the Regular Expressions in Java Lab, designed to deliver business-critical skills and elevate your resume. This hands-on lab equips you with the expertise to solve complex text-processing challenges, driving efficiency and innovation in professional projects.
### Step 1 — Introduction, Pattern and Matcher Classes
Welcome to the Caret & Dollar Text Intelligence Platform team! Your first assignment is to validate product SKU codes before they are written to the catalog. Every SKU at Caret & Dollar follows the shape `CDE-1234`: the literal prefix `CDE-` followed by exactly four digits.

In this step you will:
- Compile your first `java.util.regex.Pattern`.
- Apply it with `Matcher.matches()` for whole-input validation and `Matcher.find()` for searching within a larger string.
- Escape regex metacharacters such as `.` when you want them treated as literal text.

File you will edit: `src/main/java/com/caretdollar/regex/SkuValidator.java`

Reference solution: If you get stuck, a fully worked solution is provided in `solution/SkuValidator.java`. Try each task on your own before peeking.

Code up each step and run the validator.
The starter code intentionally fails the JUnit tests. As you implement each task, more tests turn green.
Concept Overview: The Regex Lifecycle in Java

`java.util.regex` is a two-step API:
- Compile a pattern once with `Pattern.compile("regex source")`. This is relatively expensive, so a pattern that is reused across calls should be stored in a `private static final` field.
- Apply it to an input with `pattern.matcher(input)`, then call one of:
  - `matcher.matches()`: returns `true` only when the entire input matches.
  - `matcher.find()`: scans the input and returns `true` if any substring matches; call repeatedly to walk through every occurrence.
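
The two-step lifecycle can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the class name `SkuLifecycleSketch` and its method names are hypothetical, and only the `CDE-` plus four digits shape comes from the lab text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the lab's own SkuValidator may be organized differently.
final class SkuLifecycleSketch {
    // Step 1 of the lifecycle: compile once and reuse across calls.
    private static final Pattern SKU = Pattern.compile("CDE-\\d{4}");

    // matches() succeeds only when the entire input is a SKU.
    static boolean isValidSku(String input) {
        return SKU.matcher(input).matches();
    }

    // find() is called repeatedly to collect every SKU inside a larger string.
    static List<String> findSkus(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SKU.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isValidSku("CDE-1234"));        // true
        System.out.println(isValidSku("Order CDE-1234"));  // false
        System.out.println(findSkus("Ship CDE-1234 and CDE-5678 today")); // [CDE-1234, CDE-5678]
    }
}
```

A fresh `Matcher` is created per call because matchers hold per-input state, while the compiled `Pattern` is immutable and safe to share.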
Metacharacters such as `.`, `*`, `+`, and `\d` carry special meaning in regex. To match a literal `.`, you must escape it as `\.`. In a Java string literal each backslash itself must be doubled, so the regex `\.` is written `"\\."` in source code.

Step 1 Wrap-Up
You have written your first real Java regex code. You learned to:
- Compile a `Pattern` once and store it as `private static final`.
- Choose between `matches()` (whole input) and `find()` (scan for occurrences).
- Escape metacharacters such as `.` in the regex source, and double the backslash again in the Java string literal (`"\\."`).
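
To make the escaping point concrete, here is a small sketch comparing an unescaped and an escaped dot. The class `EscapeSketch` and the sample pattern `1.5` are made up for illustration. `Pattern.quote` is a standard JDK helper that is not mentioned in the lab text; it is included as one option when the literal text is not known in advance.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
final class EscapeSketch {
    // Unescaped: '.' is a metacharacter and matches any single character.
    private static final Pattern ANY_CHAR = Pattern.compile("1.5");
    // Escaped: the regex \. (written "\\." in Java source) matches only a literal dot.
    private static final Pattern LITERAL_DOT = Pattern.compile("1\\.5");

    static boolean looseMatch(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    static boolean strictMatch(String s) {
        return LITERAL_DOT.matcher(s).matches();
    }

    // Pattern.quote turns an arbitrary string into a regex that matches it literally.
    static boolean quotedMatch(String literal, String s) {
        return Pattern.compile(Pattern.quote(literal)).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("1x5"));          // true
        System.out.println(strictMatch("1x5"));         // false
        System.out.println(strictMatch("1.5"));         // true
        System.out.println(quotedMatch("1.5", "1x5"));  // false
    }
}
```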
Continue to Step 2 to learn how character classes, grouping, quantifiers, and alternation let you validate richer formats such as email addresses and phone numbers.
### Step 2 — Character Classes, Quantifiers, and Alternation
Step 2 — Introduction
Caret & Dollar Enterprises is launching a self-service signup page. Before any row is written to the customer database, four fields must be validated: phone number, email address, postal code (US ZIP), and country (only a few are launch-supported). You will write each validator using a single, focused regex.
Along the way you will also extract the three component groups of a US phone number so the CRM can store them separately.
In this step you will:
- Use character classes such as `[abc]`, `[a-z]`, `[^0-9]`, and the predefined `\d`, `\w`, `\s`.
- Apply quantifiers: `?`, `*`, `+`, `{n}`, `{n,}`, `{n,m}`.
- Group sub-patterns with `( ... )` for capturing and `(?: ... )` for non-capturing.
- Use alternation with `|` to match one of several alternatives.
- Pull captured pieces out of a match with `Matcher.group(n)`.

File you will edit: `src/main/java/com/caretdollar/regex/SignupValidator.java`

Reference solution: `solution/SignupValidator.java`

Concept Overview: Character Classes, Groups, Quantifiers, Alternation
A character class in square brackets matches any single character inside the brackets: `[aeiou]` matches one vowel; `[0-9]` matches one digit; `[^0-9]` matches one non-digit. Java also provides shorthand character classes: `\d` for digits, `\w` for word characters (letters, digits, underscore), and `\s` for whitespace.

A quantifier controls how many times the preceding element matches: `?` (zero or one), `*` (zero or more), `+` (one or more), `{n}` (exactly n), `{n,}` (at least n), `{n,m}` (between n and m, inclusive).

Grouping is done with `( ... )`. A group is capturing by default, meaning its match is retrievable via `Matcher.group(1)`, `group(2)`, and so on. Use `(?: ... )` when you want grouping without capturing.

Alternation is the `|` operator. The pattern `cat|dog` matches either word.

A back-reference `\1`, `\2`, etc. matches the same text that a previous capturing group already matched. This is useful when two separator characters in a pattern must be identical.

Step 2 Wrap-Up
You now have the daily-driver regex toolkit: character classes, quantifiers, groups, alternation, and back-references. You also used these tools to write four real-world validators for a signup form.
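
As a compact recap, the sketch below combines these tools in one place. Everything specific in it is an assumption made for illustration: the class name `SignupSketch`, the phone format (three groups joined by `-` or `.`), the ZIP+4 option, and the country codes `US`, `CA`, `MX` are not taken from the lab's reference solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; formats and country codes below are illustrative assumptions.
final class SignupSketch {
    // Capturing groups 1, 3, 4 hold the digits. Group 2 captures the first
    // separator, and the back-reference \2 forces the second one to be identical.
    private static final Pattern PHONE =
            Pattern.compile("(\\d{3})([-.])(\\d{3})\\2(\\d{4})");

    // Five digits, optionally followed by a non-capturing "-dddd" extension.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(?:-\\d{4})?");

    // Alternation inside a non-capturing group: exactly one of the listed codes.
    private static final Pattern COUNTRY = Pattern.compile("(?:US|CA|MX)");

    // Returns {area, exchange, line}, or an empty array when the input is invalid.
    static String[] phoneParts(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(3), m.group(4) };
    }

    static boolean isZip(String input) {
        return ZIP.matcher(input).matches();
    }

    static boolean isSupportedCountry(String input) {
        return COUNTRY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", phoneParts("555-123-4567"))); // 555/123/4567
        System.out.println(phoneParts("555-123.4567").length);            // 0 (mixed separators)
        System.out.println(isZip("12345-6789"));                          // true
        System.out.println(isSupportedCountry("FR"));                     // false
    }
}
```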
Step 3 shows you how to make patterns this large readable and maintainable for the next person on your team.
### Step 3 — Comments, Named Capture Groups
The Caret & Dollar senior architect just inherited a microservice from a developer who left the company. Buried in it is one of the worst regexes anyone has ever seen — about 180 characters long, no whitespace, no comments, undocumented. Your job today is to refactor it into a regex a teammate can actually maintain, without changing its behavior. You will also build a brand-new validator written in readable, verbose-mode form from day one.
In this step you will:
- Use `Pattern.COMMENTS` to allow whitespace and `# comments` inside a pattern.
- Use named capture groups with `(?<name>...)` so groups are self-documenting.
- Compose larger patterns from small, named `static final String` building blocks.

File you will edit: `src/main/java/com/caretdollar/regex/ReadablePatterns.java`

Reference solution: `solution/ReadablePatterns.java`

Concept Overview: Three Pillars of Readable Regex
- `Pattern.COMMENTS` (verbose / extended mode). When this flag is passed as the second argument to `Pattern.compile`, the regex engine ignores all unescaped whitespace in the pattern source and treats `#` as the start of a comment that runs to the end of the line. If you actually need to match a space or a `#` character, escape it: `\ ` and `\#`. Java text blocks (`""" ... """`, Java 15 and later) pair well with this flag because they let you write multi-line patterns directly in source.
- Named capture groups. Replace `(\d{4})` with `(?<year>\d{4})`, then read it back at the call site with `matcher.group("year")` instead of `matcher.group(1)`. Named groups survive refactoring because nothing at the call site depends on numeric position.
- Composition from constants. Declare small `static final String` building blocks like `YEAR = "\\d{4}"` and concatenate them. The pattern reads almost like a sentence, and each piece can be unit-tested or reused independently.
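
The three pillars can be combined in one small pattern. The sketch below is an assumption-laden illustration rather than the lab's `ReadablePatterns` code: the class name, the constant names other than `YEAR`, and the ISO-style date format are hypothetical. It uses plain string concatenation with `\n` instead of a text block so that it also compiles on Java versions before 15.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the date format is chosen only to illustrate the techniques.
final class ReadableDateSketch {
    // Pillar 3: small building blocks. Pillar 2: each one is a named capture group.
    static final String YEAR  = "(?<year>\\d{4})";
    static final String MONTH = "(?<month>0[1-9]|1[0-2])";
    static final String DAY   = "(?<day>0[1-9]|[12]\\d|3[01])";

    // Pillar 1: in COMMENTS mode the spaces and the "# ..." comments are ignored,
    // so the effective regex is YEAR-MONTH-DAY.
    static final Pattern ISO_DATE = Pattern.compile(
            YEAR          + "  # four-digit year\n" +
            " - " + MONTH + "  # month 01 to 12\n" +
            " - " + DAY   + "  # day 01 to 31\n",
            Pattern.COMMENTS);

    // Returns the month part, or null when the input is not a valid date shape.
    static String monthOf(String input) {
        Matcher m = ISO_DATE.matcher(input);
        return m.matches() ? m.group("month") : null;
    }

    public static void main(String[] args) {
        System.out.println(monthOf("2026-03-15")); // 03
        System.out.println(monthOf("2026-13-15")); // null
    }
}
```

Reading `m.group("month")` keeps working even if another group is later inserted before it, which is the refactoring benefit described above.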
Step 3 Wrap-Up
You learned the three techniques that turn regex from write-only code into maintainable code:
- `Pattern.COMMENTS` for whitespace and inline comments,
- named capture groups for self-documenting matches,
- composition from `static final String` constants.

Step 4 revisits anchors and boundaries: small symbols (`^`, `$`, `\b`) that have an outsized effect on what a pattern actually means.
### Step 4 — Caret and Dollar anchors, and word boundaries
A Caret & Dollar production service emits log lines like:

    2026-03-15T14:30:00Z [ERROR] OrderService: failed to charge card for order 4821
    2026-03-15T14:30:01Z [INFO] OrderService: retrying in 3s

The operations team needs you to parse each line into structured pieces (timestamp, level, message) and to detect ERROR mentions as whole words, not as substrings (so the word `"TERRORS"` does not fire an alert).

In this step you will:
- Use `^` and `$` to anchor a pattern to the start and end of input.
- Use `\b` (word boundary) to distinguish whole words from substrings.
- Understand the difference between `matches()` (which already implies `^...$`) and `find()` with explicit anchors.

File you will edit: `src/main/java/com/caretdollar/regex/LogLineParser.java`

Reference solution: `solution/LogLineParser.java`

Concept Overview: Anchors and Boundaries
Anchors are zero-width assertions: they match a position in the input rather than a character.
- `^` matches the start of input (or the start of any line if the `MULTILINE` flag is set, covered in Step 6).
- `$` matches the end of input (similar caveat for `MULTILINE`). By default it also matches just before a final line terminator.
- `\b` matches a word boundary: the position between a word character (`\w`, which is letters, digits, or underscore) and a non-word character, or between a word character and the start or end of input.
- `\B` matches a non-word-boundary.
- `\A` and `\z` match the absolute start and end of input, regardless of flags.
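
A minimal sketch of these assertions applied to the sample log lines follows. The class `LogLineSketch`, the group names, and the exact line grammar (timestamp, bracketed upper-case level, message) are assumptions inferred from the two example lines, not the lab's `LogLineParser`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the line grammar is inferred from the two sample log lines.
final class LogLineSketch {
    // ^ and $ pin the pattern to the whole input even though find() is used below.
    private static final Pattern LINE = Pattern.compile(
            "^(?<timestamp>\\S+) \\[(?<level>[A-Z]+)\\] (?<message>.*)$");

    // \b on both sides: ERROR must not be glued to other word characters.
    private static final Pattern ERROR_WORD = Pattern.compile("\\bERROR\\b");

    // Returns the level, or null when the line does not have the expected shape.
    static String levelOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group("level") : null;
    }

    // Returns the message, or null when the line does not have the expected shape.
    static String messageOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group("message") : null;
    }

    static boolean mentionsError(String text) {
        return ERROR_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        String line = "2026-03-15T14:30:01Z [INFO] OrderService: retrying in 3s";
        System.out.println(levelOf(line));                   // INFO
        System.out.println(messageOf(line));                 // OrderService: retrying in 3s
        System.out.println(mentionsError("TERRORS ahead"));  // false
        System.out.println(mentionsError("saw [ERROR] x"));  // true
    }
}
```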
Two things to keep in mind:
- `matches()` requires the entire input to match, as if there were implicit `^` and `$` around the pattern. With `matches()`, explicit `^` and `$` are redundant but harmless.
- `find()` searches anywhere in the input. To require a whole-input match while using `find()`, write the anchors yourself.
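
The difference is easiest to see side by side. In this sketch the class `MatchesVsFindSketch` and the digits-only pattern are hypothetical and chosen only to keep the comparison small.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
final class MatchesVsFindSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern ANCHORED_DIGITS = Pattern.compile("^\\d+$");

    // Whole input must be digits: matches() anchors implicitly.
    static boolean wholeInputViaMatches(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Digits anywhere in the input are enough.
    static boolean anywhereViaFind(String s) {
        return DIGITS.matcher(s).find();
    }

    // find() with explicit anchors behaves like a whole-input check.
    static boolean wholeInputViaFind(String s) {
        return ANCHORED_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputViaMatches("order 4821")); // false
        System.out.println(anywhereViaFind("order 4821"));      // true
        System.out.println(wholeInputViaFind("order 4821"));    // false
        System.out.println(wholeInputViaFind("4821"));          // true
    }
}
```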
Step 4 Wrap-Up
Anchors and boundaries look like small symbols but completely change what a pattern means. You used `^` and `$` to require a full-input match, `\b` to constrain a search to whole words, and you saw the practical difference between `matches()` and `find()`.

Step 5 builds on this foundation with lookarounds: assertions that, like anchors, are zero-width, but can check arbitrary text on either side. You will use them to build a `${placeholder}` template engine.
### Step 5 — Lookarounds, Streams, and Lambdas
The Caret & Dollar alerting service sends emails like:

    Hello ${customerName}, your order ${orderId} totalling ${amount} shipped on ${shipDate}.

Templates are stored in a database. At send time, a `Map<String, String>` of values is provided and every `${...}` placeholder must be replaced with the corresponding value. You will build a small template engine using regex lookarounds plus Java streams and lambdas. Along the way you will also use lookarounds to extract dollar amounts from receipts and to find SKUs that are not followed by a specific suffix.

In this step you will:
- Use positive lookahead `(?=...)` and negative lookahead `(?!...)`.
- Use positive lookbehind `(?<=...)` and negative lookbehind `(?<!...)`.
- Understand why lookarounds are zero-width: they assert context without consuming characters.
- Use `Matcher.replaceAll(Function<MatchResult, String>)`, the modern, lambda-friendly replacement API introduced in Java 9.
- Build a stream pipeline over regex matches.

File you will edit: `src/main/java/com/caretdollar/regex/TemplateEngine.java`

Reference solution: `solution/TemplateEngine.java`

Concept Overview: Lookarounds and Lambda-Based Replacement
A lookaround asserts that a pattern matches (or fails to match) at a position in the input, but does not consume any characters. There are four flavors:
| Construct | Name | Meaning |
|------------|----------------------|---------|
| `(?=X)` | positive lookahead | "the next characters are X", without consuming X |
| `(?!X)` | negative lookahead | "the next characters are NOT X" |
| `(?<=X)` | positive lookbehind | "the preceding characters are X" |
| `(?<!X)` | negative lookbehind | "the preceding characters are NOT X" |

Because lookarounds are zero-width, the part you care about can be just the data, with the surrounding context asserted but not captured.
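
The sketch below shows three of the four flavors returning only the data of interest. It is an illustration under assumptions: the class `LookaroundSketch`, the amount format, and the `-OLD` suffix are hypothetical (the lab does not say which suffix is excluded). `Matcher.results()` is a standard Java 9+ method that streams every match.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class; the amount format and the "-OLD" suffix are assumptions.
final class LookaroundSketch {
    // Positive lookbehind: digits must be preceded by '$', which is not part of the match.
    private static final Pattern AMOUNT =
            Pattern.compile("(?<=\\$)\\d+(?:\\.\\d{2})?");

    // Lookbehind plus lookahead: only the name between "${" and "}" is matched.
    private static final Pattern PLACEHOLDER_NAME =
            Pattern.compile("(?<=\\$\\{)\\w+(?=\\})");

    // Negative lookahead: a SKU that is NOT immediately followed by "-OLD".
    private static final Pattern ACTIVE_SKU =
            Pattern.compile("CDE-\\d{4}(?!-OLD)");

    // Stream pipeline over all matches of a pattern.
    private static List<String> all(Pattern p, String text) {
        return p.matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    static List<String> amounts(String receipt)        { return all(AMOUNT, receipt); }
    static List<String> placeholderNames(String tmpl)  { return all(PLACEHOLDER_NAME, tmpl); }
    static List<String> activeSkus(String text)        { return all(ACTIVE_SKU, text); }

    public static void main(String[] args) {
        System.out.println(amounts("Coffee $4.50, bagel $3"));             // [4.50, 3]
        System.out.println(placeholderNames("Hi ${customerName}, order ${orderId}")); // [customerName, orderId]
        System.out.println(activeSkus("CDE-1234-OLD, CDE-5678"));          // [CDE-5678]
    }
}
```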
Java 9 introduced the method `Matcher.replaceAll(Function<MatchResult, String>)`. It takes a lambda that receives each match and returns the replacement string. This makes lookup-driven replacement (such as a template engine) very concise.

There is one pitfall worth knowing about up front: the replacement string passed to any `replaceAll` variant treats `$` and `\` as special characters. When the replacement value comes from user data, and it might contain a `$`, wrap it in `Matcher.quoteReplacement(...)` to escape those special characters.

Step 5 Wrap-Up
You learned to assert context without consuming it. You used:
- Lookbehind to strip a leading `$` from amount matches.
- A pair of lookarounds to extract just the inner placeholder name.
- A capturing group with `replaceAll(Function<MatchResult, String>)` plus `Matcher.quoteReplacement` to build a real `${placeholder}` template engine driven by a `Map`.
- A stream pipeline to compute the intersection of placeholders and provided values.
- Negative lookahead to filter out matches based on what follows them.
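
A template engine along these lines might look like the sketch below. It is not the lab's `TemplateEngine`: the class `TemplateSketch`, the method names, and the choice to leave unknown placeholders untouched are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class; leaving unknown placeholders in place is a design assumption.
final class TemplateSketch {
    // Group 1 captures the placeholder name between "${" and "}".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String render(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Fall back to the original "${name}" text when no value is provided.
            String replacement = (value != null) ? value : match.group();
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references or escapes.
            return Matcher.quoteReplacement(replacement);
        });
    }

    // Stream pipeline: placeholder names in the template that have a value.
    static Set<String> resolvable(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).results()
                .map(r -> r.group(1))
                .filter(values::containsKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("customerName", "Ada", "amount", "$19.99");
        System.out.println(render("Hello ${customerName}, total ${amount}", values));
        // Hello Ada, total $19.99
    }
}
```

Without `Matcher.quoteReplacement`, a value such as `$19.99` would be read as a reference to capturing group 19, which does not exist here, so the call would fail at runtime.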
Step 6 covers regex flags and modes: `CASE_INSENSITIVE`, `MULTILINE`, and `DOTALL`.
### Step 6 — Flags
Production has started sending multi-line log records that include full stack traces. The single-line parser you built in Step 4 no longer works because each event now spans several lines. The operations team also wants case-insensitive level matching, since some legacy services emit `[error]` instead of `[ERROR]`. You will fix both problems with Java regex flags.

In this step you will:
- Use `Pattern.CASE_INSENSITIVE` for case-blind matching.
- Use `Pattern.MULTILINE` so `^` and `$` match at every line break, not just the start and end of input.
- Use `Pattern.DOTALL` so `.` matches `\n` and a single pattern can span multiple lines.
- See how to combine flags with bitwise OR (`CASE_INSENSITIVE | MULTILINE`) and how to enable them inline with `(?i)`, `(?m)`, `(?s)`.

File you will edit: `src/main/java/com/caretdollar/regex/AdvancedLogProcessor.java`

Reference solution: `solution/AdvancedLogProcessor.java`

Concept Overview: Flags Change Pattern Behavior, Not Pattern Source
A flag is passed as the second argument to `Pattern.compile(...)`. Flags do not change the source of the pattern; they change how the engine interprets it.
- `Pattern.CASE_INSENSITIVE`: letters in the pattern match either case in the input. By default this applies to US-ASCII letters only; add `Pattern.UNICODE_CASE` for Unicode-aware case folding.
- `Pattern.MULTILINE`: `^` matches at the start of every line and `$` matches at the end of every line, where lines are separated by line terminators such as `\n`. Without this flag, `^` and `$` only match the very start and end of the whole input.
- `Pattern.DOTALL`: `.` matches any character including `\n`. Without this flag, `.` matches any character except line terminators.
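
Each flag can be observed in isolation with the same pattern source compiled two ways. The class `FlagsSketch`, the `BEGIN`/`END` markers, and the sample inputs are hypothetical and exist only to make the behavior visible.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class; markers and inputs are illustrative assumptions.
final class FlagsSketch {
    // CASE_INSENSITIVE: "[error]" in the pattern also matches "[ERROR]".
    private static final Pattern ERROR_LEVEL =
            Pattern.compile("\\[error\\]", Pattern.CASE_INSENSITIVE);

    // MULTILINE: ^ matches at the start of every line, not just the input.
    private static final Pattern LINE_LEVEL =
            Pattern.compile("^\\[(\\w+)\\]", Pattern.MULTILINE);

    // Same source, with and without DOTALL.
    private static final Pattern DOT_DEFAULT = Pattern.compile("BEGIN.*END");
    private static final Pattern DOT_ALL = Pattern.compile("BEGIN.*END", Pattern.DOTALL);

    static boolean hasErrorLevel(String text) {
        return ERROR_LEVEL.matcher(text).find();
    }

    // Levels of every line that starts with a bracketed level.
    static List<String> lineLevels(String text) {
        return LINE_LEVEL.matcher(text).results()
                .map(r -> r.group(1))
                .collect(Collectors.toList());
    }

    static boolean matchesDefault(String text) { return DOT_DEFAULT.matcher(text).matches(); }
    static boolean matchesDotAll(String text)  { return DOT_ALL.matcher(text).matches(); }

    public static void main(String[] args) {
        System.out.println(hasErrorLevel("[ERROR] boom"));                          // true
        System.out.println(lineLevels("[INFO] up\n[error] down\n  at Foo.bar"));    // [INFO, error]
        System.out.println(matchesDefault("BEGIN\nstack\nEND"));                    // false
        System.out.println(matchesDotAll("BEGIN\nstack\nEND"));                     // true
    }
}
```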
Flags can be combined with bitwise OR: `Pattern.compile("...", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)`. They can also be enabled inline at the start of a pattern: `(?i)` for `CASE_INSENSITIVE`, `(?m)` for `MULTILINE`, `(?s)` for `DOTALL`.

A reluctant (also called non-greedy) quantifier such as `.*?` matches as little as possible. It is essential whenever you combine `DOTALL` with a "from here to the next marker" pattern; otherwise the engine matches as much as possible and skips past the intended end marker.

Step 6 Wrap-Up
Three small flags change three big behaviors:
- `CASE_INSENSITIVE` ignores letter case.
- `MULTILINE` rebinds `^` and `$` to line boundaries.
- `DOTALL` lets `.` match newlines.
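
Combining flags, and the effect of a reluctant quantifier under `DOTALL`, can be sketched as follows. The class `ReluctantSketch` and the `BEGIN`/`END` record markers are hypothetical; the lab's actual record format is not shown in the text.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class; BEGIN/END markers are an illustrative assumption.
final class ReluctantSketch {
    // Flags combined with bitwise OR. Greedy .* runs on to the LAST "END".
    private static final Pattern GREEDY = Pattern.compile(
            "BEGIN(.*)END", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // The same two flags enabled inline. Reluctant .*? stops at the NEXT "END".
    private static final Pattern RELUCTANT = Pattern.compile("(?is)BEGIN(.*?)END");

    // Collects the trimmed body (group 1) of every match.
    private static List<String> bodies(Pattern p, String text) {
        return p.matcher(text).results()
                .map(r -> r.group(1).trim())
                .collect(Collectors.toList());
    }

    static List<String> greedyBodies(String text)    { return bodies(GREEDY, text); }
    static List<String> reluctantBodies(String text) { return bodies(RELUCTANT, text); }

    public static void main(String[] args) {
        String text = "begin a\nb end BEGIN c END";
        System.out.println(greedyBodies(text).size());    // 1 (both records swallowed)
        System.out.println(reluctantBodies(text).size()); // 2 (one body per record)
    }
}
```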
You have now seen the entire practical surface of `java.util.regex`: pattern syntax, character classes, quantifiers, grouping, alternation, anchors, boundaries, lookarounds, and flags. You also applied each technique to a coherent product, Caret & Dollar's Text Intelligence Platform, building real validators and parsers along the way.

To run all of your completed tests, open a terminal and execute the command `mvn test`. You should see it output something like:

    Tests run: 86, Failures: 0, Errors: 0, Skipped: 0

Congratulations on finishing the lab! You now have a transferable, production-quality grasp of Java regular expressions that you can take straight into your own code.