
Guided: Regex Patterns for Practical Solutions

In this lab, you will process employee data using regular expressions. You will work on structured data to filter employee feedback, extract client information, analyze text, enhance client privacy, and systematically parse addresses.


Lab Info

Level

Intermediate

Last updated

Jan 09, 2026

Duration

1h 13m


Introduction
Hello Developers! Your goal in this lab is to process employee data using regular expressions. You will use Python and a command-line interface to process the data.

Lab Scope

You will work on five steps to complete this lab, as shown:
1. Use negated character class and quantification techniques.
2. Use non-capturing groups and named capturing groups.
3. Utilize lookahead and lookbehind assertions.
4. Implement the re.sub() function.
5. Implement the re.split() function.

Lab Structure

The lab directory structure is shown below:
- content: This directory holds two data files: train_data.csv (used for creating the functions and validating the test cases) and test_data.csv (to be used with the menu.py file). Each of these files has seven features:
  
  fullID: The complete ID of the employee. Its prefix encodes one of four departments: HR (Human Resources), SA (Sales), DE (Developers), and IT (Support). For example, HR-758846
  
  name: The full name of the employee. For example, John Smith
  
  address: The full address of the employee. For example, 35, Second Avenue, Los Angeles, 90210
  
  task: The task assigned to the employee. For example, Project Brave: Work with Emily on the documentation task.
  
  report: The report submitted by the employees on the assigned task. For example, I have been helping Emily for three days.
  
  feedback: Feedback for the employees submitted by their managers. For sales employees, feedback has a date component at the end. For example, John has never let the company down.
  
  cemail: The client email addresses. For example, [email protected]
- src: This directory contains six files - datafile, step1, step2, step3, step4, and step5. The datafile file is already populated with code and fetches the content of the CSV file. You will be working in the remaining step files.
- solutions: This directory contains the solutions for each step. You can access these files if you get stuck.
- menu.py: Run this file after you have validated all your test cases from the five steps. It fetches the functions you wrote and applies them to the test_data.csv file.


Use Negated Character Class and Quantification Techniques

In this first step, you must filter out the employees who are working in the sales department along with their feedback.

You will work in the src/step1.py file using the content/train_data.csv data. The script already has two populated functions:

step1_features() function which fetches the fullID, name, and feedback features for this step.
display_sales_emp() function which displays the names of sales employees.

In the following task you will use the negated character class and `re.match()` function:

The negated character class is denoted by [^pattern], where pattern is the set of characters to exclude. The caret ^ must be placed immediately after the opening square bracket. Use it to match any single character that is not in the set. For instance, [^\d] matches any single character that is not a digit.
The re.match(pattern, string) function tries to match the pattern at the beginning of the string only. It returns a match object on success and None otherwise.
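
As a minimal sketch of the two ideas above, the snippet below applies a negated character class with `re.match()` to a made-up ID in the lab's fullID format. The sample value and variable names are assumptions for illustration, not the lab's solution.

```python
import re

# Hypothetical ID in the lab's fullID format (department code, hyphen, digits).
full_id = "SA-758846"

# [^\d]+ matches one or more characters that are NOT digits.
prefix = re.match(r"[^\d]+", full_id)
print(prefix.group())  # SA-

# re.match() only tries the start of the string, so digits that
# appear later in the string do not produce a match.
print(re.match(r"\d+", full_id))  # None

# A negated class can also describe "everything up to the hyphen".
dept = re.match(r"[^-]+", full_id).group()
print(dept)  # SA
```

Because `re.match()` returns None when nothing matches, check the result before calling `.group()` on real data.
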

The task also requires you to implement a for loop and a list comprehension.

Syntax of a for loop:

for i in a_list: 
   print(i)

Syntax of a list comprehension:

[val_if_cond_true for i in a_list if a_condition]
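
To make the two syntaxes concrete, here is a small sketch that filters the same list once with a for loop and once with a list comprehension. The employee tuples are invented sample data; the lab reads its data from train_data.csv.

```python
import re

# Hypothetical (fullID, name) pairs, invented for illustration.
employees = [
    ("HR-758846", "John Smith"),
    ("SA-104233", "Emily Stone"),
    ("DE-550912", "Ravi Patel"),
    ("SA-998001", "Maria Lopez"),
]

# for loop version: collect names whose ID starts with "SA".
sales_loop = []
for full_id, name in employees:
    if re.match(r"SA", full_id):
        sales_loop.append(name)

# Equivalent list comprehension with the same condition.
sales_comp = [name for full_id, name in employees if re.match(r"SA", full_id)]

print(sales_loop)                # ['Emily Stone', 'Maria Lopez']
print(sales_comp == sales_loop)  # True
```
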

info> **DID YOU KNOW?**
1. The function of the `^` character varies depending on whether it is placed inside or outside the square brackets. Outside, it anchors the match to the start of the string; inside, as the first character, it negates the character class.
2. Python also provides a dictionary comprehension `{}` similar to a list comprehension.

In the following task, you will use greedy quantifiers and the `re.findall()` function, along with a list comprehension that contains a nested `for` loop.

* Greedy quantifiers match as much of the string as possible. Some of the greedy quantifiers are: `*`, `+`, and `{m,n}`.

* The `re.findall(pattern, string)` function searches a string for all non-overlapping matches of a given pattern.

* Syntax of a nested `for` loop list comprehension:

```python
[val for sub in parent for val in sub]
```

Lazy quantification works the opposite way to greedy quantification: it matches the shortest possible string that satisfies the given pattern. The lazy quantifiers are: `*?`, `+?`, and `{m,n}?`.
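
The difference between greedy and lazy matching, and the flattening done by a nested comprehension, can be sketched as follows. The feedback strings and the date format (DD-MM-YYYY) are assumptions made for this example only.

```python
import re

# Hypothetical feedback strings; the date format is an assumption.
feedback = [
    "Met the <Q1> and <Q2> targets. 12-03-2023",
    "Closed two deals on 05-11-2023 and 19-11-2023",
]

text = feedback[0]
# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
print(re.findall(r"<.+>", text))   # ['<Q1> and <Q2>']
# Lazy: .+? stops at the first ">" that completes a match.
print(re.findall(r"<.+?>", text))  # ['<Q1>', '<Q2>']

# {m} quantifier: two digits, hyphen, two digits, hyphen, four digits.
date_pattern = r"\d{2}-\d{2}-\d{4}"
per_row = [re.findall(date_pattern, row) for row in feedback]
print(per_row)  # [['12-03-2023'], ['05-11-2023', '19-11-2023']]

# The nested for loop comprehension flattens the list of lists.
all_dates = [date for row in per_row for date in row]
print(all_dates)  # ['12-03-2023', '05-11-2023', '19-11-2023']
```
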


Use Non-Capturing Groups and Named Capturing Groups
In this step, you will work in the src/step2.py file, which holds the already populated step2_features() function. This function returns the fullID and cemail features.

Groups serve two purposes in regular expressions:
1. Grouping: Unites multiple tokens into a single unit.
2. Capturing: Stores the matched text for later reference.

However, you do not always need to refer back to a group. Non-capturing groups exist for this case: they group tokens without capturing them, and they use the syntax (?:pattern).

In the following task, you will use a non-capturing group and the re.search(pattern, string) function, which scans the string and returns a match object for the first location where the pattern matches, or None if there is no match.

Non-capturing groups are useful when you don't need to reference a specific part of a matched pattern. However, what if you need to extract specific parts from a pattern? This is where named capturing groups prove invaluable.
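
The non-capturing group syntax and `re.search()` described above can be sketched as below. The email address, the surrounding text, and the list of top-level domains are assumptions chosen for illustration.

```python
import re

# Hypothetical text containing a client email.
cemail = "contact: jane.doe@example.org (primary)"

# (?:com|org|net) groups the alternatives without creating a capture group.
pattern = r"[\w.]+@\w+\.(?:com|org|net)"
match = re.search(pattern, cemail)

print(match.group())   # jane.doe@example.org
print(match.span())    # (9, 29)
print(match.groups())  # () because nothing was captured
```

Unlike `re.match()`, `re.search()` finds the pattern anywhere in the string, which is why the leading "contact: " text does not prevent a match.
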

In the following task, you will use named capturing groups to give names to groups and reference them later. They use the syntax (?P<name>pattern).

info> INFORMATION ADD-ON!
The name of a named capturing group must start with either a letter or an underscore. It cannot start with a digit, though it can include digits after the first character. Named groups are also available, with varying syntax, in other programming languages.
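
As a brief sketch of named capturing groups in Python, the example below labels the two parts of a made-up ID. The group names `dept` and `number` are hypothetical choices, not names taken from the lab.

```python
import re

# Hypothetical ID in the lab's fullID format.
full_id = "DE-550912"

# (?P<name>pattern) captures the sub-match and gives it a name.
pattern = r"(?P<dept>[A-Z]{2})-(?P<number>\d+)"
match = re.search(pattern, full_id)

print(match.group("dept"))    # DE
print(match.group("number"))  # 550912
print(match.groupdict())      # {'dept': 'DE', 'number': '550912'}
```
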

Utilize Lookahead and Lookbehind Assertions
In this step, you will work in the src/step3.py file. As in the other steps, the script already has a function, step3_features(), that extracts the features used in this step: task and report.

You will work on the following regular expression assertions:
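- Lookahead, written (?=pattern): Matches content only if it is followed by the pattern. The pattern itself is not part of the match.
- Lookbehind, written (?<=pattern): Matches content only if it is preceded by the pattern. The pattern itself is not part of the match.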
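
A minimal sketch of both assertions, using the sample task text from the Lab Structure section. The patterns are illustrative assumptions rather than the lab's solution.

```python
import re

# Task text in the format shown in the Lab Structure section.
task = "Project Brave: Work with Emily on the documentation task."

# Positive lookbehind: the word that comes right after "Project ".
project = re.findall(r"(?<=Project )\w+", task)
print(project)  # ['Brave']

# Positive lookahead: the word that comes right before " task".
topic = re.findall(r"\w+(?= task)", task)
print(topic)    # ['documentation']
```

Note that Python's `re` module requires a lookbehind pattern to have a fixed width, so `(?<=Project )` works but a lookbehind containing `+` or `*` raises an error.
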

Implement the `re.sub()` Function
Securing the identity of clients is important in any organization. In this step, you will mask the names of your clients in their email addresses with the pattern xxxx.

To complete the task in this step, you will work in the src/step4.py file, which already has a populated function, step4_features(), that returns the clients' email addresses, cemail.

In the following task, you will rely on the re.sub(pattern, replacement, content) function, which finds every match of the regex pattern in content and replaces it with the replacement string, in this case xxxx. You will also create a new Python function. The syntax of a Python function is:
```python
def function_name(argument):
   # body
   return a_value
```

info> **DID YOU KNOW?**
To keep the script short, you can skip creating the `replace_name()` function and instead use the `re.sub()` function directly inside the list comprehension.
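
One possible way to mask the name part of an email with `re.sub()` is sketched below. The email addresses and the helper name `mask_name()` are hypothetical, and the sketch assumes the client name is everything before the `@`.

```python
import re

# Hypothetical client emails; the lab's come from step4_features().
cemails = ["jane.doe@example.com", "raj_k@client.org"]

def mask_name(email):
    # Replace everything before the "@" with the mask xxxx.
    # [^@]+ is a negated character class: one or more non-"@" characters.
    return re.sub(r"^[^@]+", "xxxx", email)

masked = [mask_name(email) for email in cemails]
print(masked)  # ['xxxx@example.com', 'xxxx@client.org']

# Same result with re.sub() called directly inside the comprehension.
masked_inline = [re.sub(r"^[^@]+", "xxxx", email) for email in cemails]
print(masked_inline == masked)  # True
```
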

Implement the `re.split()` Function

In this final step of the lab, you will split each employee's address into separate components: house number, street name, city, and postal code.
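
As a rough sketch of such a split, `re.split()` breaks a string at each match of a pattern, and its `maxsplit` argument caps how many splits are made. The address is the sample from the Lab Structure section; the pattern `,\s*` (a comma plus any following whitespace) and the variable names are assumptions for this example.

```python
import re

# Address in the format shown in the Lab Structure section.
address = "35, Second Avenue, Los Angeles, 90210"

# Split on a comma followed by optional whitespace.
parts = re.split(r",\s*", address)
print(parts)  # ['35', 'Second Avenue', 'Los Angeles', '90210']

# maxsplit=1 performs at most one split and leaves the rest intact.
first_split = re.split(r",\s*", address, maxsplit=1)
print(first_split)  # ['35', 'Second Avenue, Los Angeles, 90210']

# Unpack into named components (names are illustrative).
house_number, street, city, postal_code = parts
```

Splitting on a bare comma also works, but each component after the first would keep its leading space.
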

To do so, you will work in the src/step5.py file, which already has a populated function, step5_features(). This function returns the complete address of each employee. You will use the re.split(pattern, content) function to complete this task. This function splits the content at each match of a given pattern. In this case, the pattern is a comma (,). You can also limit the number of splits this function performs using the maxsplit argument.

Congratulations! You have successfully completed this lab and strengthened your regular expression knowledge.

Test your Functions on New Data

It is time to apply the functions you wrote to untouched data, test_data.csv. Run the menu.py file in the Terminal and follow the menu steps to process the file. Review the output of each step to validate the functionality of your implemented functions.

info> TIPS FOR AN IMPROVED REGEX PATTERN
1. Keep the regex simple and prioritize readability even if it increases the regex length.
2. Do not feed raw data to your regex. First, clean the data.
3. Use lazy quantifiers where they reduce backtracking; this can improve regex performance.
4. Use anchors and quantifiers carefully.
5. Always test your regex pattern against various input data to ensure you have considered all of the cases.

About the author

Chhaya Wagmi

Written content author.

