
Building AI-Ready DSLs and Processors in Python

In this Code Lab, you'll design and implement a Python DSL with a custom template processor. You'll identify core DSL components, build parsing logic to transform structured input, analyze template-driven systems, and integrate AI-assisted code generation. When finished, you'll have a production-ready DSL processor that handles specialized syntax and generates executable output.


Lab Info

Level

Advanced

Last updated

Jul 07, 2026

Duration

12m


Introduction
Welcome to the Building AI-Ready DSLs and Processors in Python Code Lab. In this hands-on lab, you build the processor behind a small domain-specific language that describes data transformation pipelines. You tokenize each line into a structured statement, parse a whole source into an ordered statement list, and render each statement into Python through a per-directive template. You then fold in an AI agent that writes the operators your templates do not cover, and you validate everything the agent returns before the pipeline trusts it.

About the tools and concepts
A **domain-specific language** (DSL) is a small, focused notation built for one job rather than for general programming. Here the DSL describes a pipeline as a sequence of directives such as `load`, `filter`, and `transform`, each carrying a few `key="value"` attributes. A **processor** is the code that reads that notation and turns it into something runnable.
A template string is the simplest form of code generation. Each directive maps to a Python string with named placeholders, and `str.format(**attrs)` fills those placeholders from a statement's attributes, so one template plus one set of attributes produces one line of generated code. Centralizing the templates keeps the generated output consistent and makes the DSL easy to extend.

`re.findall` with a pattern like `(\w+)="([^"]*)"` returns every key and value pair on a line as a list of tuples, which `dict()` folds straight into an attribute dictionary. A dataclass named `Statement` gives each parsed line a stable shape (a `directive` and an `attrs` dictionary), so the rest of the processor reads structured data rather than raw text.

An AI agent extends the processor past its built-in templates. When the DSL uses a directive the templates do not cover, the agent generates the operator's Python implementation on demand. Generated code is never trusted blindly: the `ast` module parses the returned source and reports the functions it defines, so the processor can confirm the agent produced the operator it asked for before the pipeline relies on it.

### Prerequisites
Before starting this lab, you should have:
- Strong Python fundamentals and string manipulation: slicing, splitting, joining, and formatting text
- Familiarity with parsing and processing concepts: turning raw input into structured data
- Understanding of functions, classes, and object-oriented design: composing small units into a larger system
- Knowledge of regular expressions or pattern matching: capturing groups from text
- Experience with Python data structures: lists, dicts, and tuples
- Basic understanding of template engines or code generation: producing source from data

The lab environment is ready to use. Run `python3 --version` from inside the workspace folder at any time to confirm the runtime. The stack is Python 3.12 with pytest 8.x available, and the AI agent is a local, deterministic stand-in for a model service, so the lab runs the same way every time with no network and no API key. The processor uses only the Python standard library (`re`, `ast`, `dataclasses`, and the built-in `str.format`), so there is nothing extra to install. You validate each task by running its check with `python3`.

The Scenario

You are a backend engineer at CarvedRock building a configuration DSL for data transformation pipelines. The DSL uses a template-based syntax to describe a workflow as an ordered list of directives, and your task is to implement the processor that reads it. The processor needs to tokenize each directive and its attributes, parse a full source into structured statements, render the built-in directives into Python through templates, and reach for an AI agent when the DSL uses a custom operator the templates do not cover, validating the generated code before the pipeline trusts it.

The Application Structure

Key files in the lab environment
- `workspace/src/config.py`: the DSL's lexical rules and the output shape for the generator, all read from one place
- `workspace/src/components.py`: the `Statement` dataclass and the directive catalog that classifies each directive as built-in or custom
- `workspace/src/tokenizer.py`: the helper that turns one DSL line into a structured `Statement`
- `workspace/src/parser.py`: the function that reads a full source into an ordered list of statements
- `workspace/src/templates.py`: the per-directive code templates and the renderer that fills them
- `workspace/src/generator.py`: the assembler that wraps rendered lines into a complete pipeline function
- `workspace/src/ai_agent.py`: the local AI agent that generates a custom operator on demand; treat it as a black box
- `workspace/src/validator.py`: the check that confirms generated code parses and defines the expected operator
- `workspace/src/processor.py`: the orchestration that ties parsing, generation, AI assistance, and validation into one report
- `workspace/src/logger.py`: the shared stage logger
- `workspace/run_processor.py`: the end-to-end runner that processes the sample DSL and prints the generated pipeline
- `workspace/data/pipeline.dsl`: the sample CarvedRock pipeline written in the DSL

Complete the tasks in order. Each task builds on the previous one.

Note: If you get stuck, you can refer to the provided solution code for each task, available in the solutions folder.

Once every task is complete, run the full processor from the workspace directory with:
```
python3 run_processor.py
```
Note: Running python3 run_processor.py before completing all tasks will produce errors, as each file depends on the others being fully implemented first.

Establishing DSL Components and Processor Configuration

Setting the DSL's Lexical Rules in One Place

Every processor starts from a handful of rules: which character opens a comment, how an attribute is written, how generated lines are indented, and what the generated function is called. Centralizing these in config.py means the tokenizer, the parser, and the generator all read from one source of truth, and changing the DSL's syntax later becomes a one-line edit rather than a hunt across modules. The values you set here shape everything downstream: the attribute pattern decides what the tokenizer can read, and the indent and function name decide what the generated pipeline looks like.
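
A minimal sketch of what such a config module might hold is shown here. Only `INDENT` and the `run_pipeline` function name appear in the lab text; the other constant names (`COMMENT_MARKER`, `ATTR_PATTERN`, `FUNCTION_NAME`) and the `#` comment character are assumptions made for illustration, not the lab's actual file.

```python
import re

# Hypothetical constant names and values, chosen for illustration only.
COMMENT_MARKER = "#"                            # assumed comment character
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')   # captures key="value" pairs
INDENT = "    "                                 # four spaces per generated line
FUNCTION_NAME = "run_pipeline"                  # name of the generated function

# Every module that imports these constants sees the same rules.
sample = 'load source="orders.csv" format="csv"'   # made-up DSL line
pairs = ATTR_PATTERN.findall(sample)
print(pairs)  # [('source', 'orders.csv'), ('format', 'csv')]
```

Because the pattern has two capturing groups, `findall` returns a list of two-item tuples, which is the shape `dict()` accepts directly.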

Identifying the DSL Components and Routing Custom Directives

A processor needs a stable shape for the data it passes around and a way to tell which directives it already knows how to render. The Statement dataclass gives every parsed line that shape, a directive and an attrs dictionary, so later stages read structured fields rather than raw strings. The directive catalog is what separates the two code paths in this lab: a directive the templates cover is rendered directly, while anything outside the catalog is a "custom" operator that the AI agent fills in later. Classifying a directive once, here, keeps that routing decision in a single place.
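
One way the component and the catalog could look is sketched below. `Statement`, `directive`, `attrs`, and `BUILTIN_DIRECTIVES` are names used in the lab; storing the catalog as a set and the `classify` helper are assumptions for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One parsed DSL line: a directive plus its attributes."""
    directive: str
    attrs: dict = field(default_factory=dict)  # fresh dict per instance


# The four built-in directives named in the lab; a set is an assumed container.
BUILTIN_DIRECTIVES = {"load", "filter", "transform", "write"}


def classify(directive: str) -> str:
    """Hypothetical helper: route a directive to the template or the agent path."""
    return "builtin" if directive in BUILTIN_DIRECTIVES else "custom"


print(classify("load"))    # builtin
print(classify("dedupe"))  # custom
```

Using `field(default_factory=dict)` avoids sharing one mutable dictionary across every `Statement` that omits its attributes.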

Parsing the DSL into Structured Statements

Tokenizing One Line into a Directive and Its Attributes

Before a source can be parsed, each line has to become structured data. The tokenizer does exactly one thing: it takes the first whitespace-delimited word as the directive, then pulls every key="value" pair off the rest of the line into an attribute dictionary. Splitting on whitespace with a limit keeps the directive separate from its attributes, and a single regular expression captures the attributes regardless of how many there are. Wrapping the result in a Statement is what lets the parser, the renderer, and the generator all consume the same shape.
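
The tokenizing step described above can be sketched as follows. The function name `tokenize_line` and the sample line are hypothetical, and the sketch assumes the caller has already filtered out blank lines.

```python
import re
from dataclasses import dataclass, field

ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')  # key="value" pairs


@dataclass
class Statement:
    directive: str
    attrs: dict = field(default_factory=dict)


def tokenize_line(line: str) -> Statement:
    """Hypothetical tokenizer: first word is the directive, the rest are attributes."""
    # maxsplit=1 keeps everything after the directive together in one string.
    parts = line.strip().split(None, 1)
    directive = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return Statement(directive, dict(ATTR_PATTERN.findall(rest)))


stmt = tokenize_line('filter column="status" equals="active"')
print(stmt.directive)  # filter
print(stmt.attrs)      # {'column': 'status', 'equals': 'active'}
```

A directive with no attributes, such as a bare `dedupe`, yields an empty dictionary because `findall` returns an empty list.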

Parsing the Whole Source into an Ordered Statement List

A pipeline is an ordered sequence, so the parser's job is to walk the source top to bottom and keep the statements in order. Blank lines and comments carry no directive, so the parser strips each line and skips the ones that are empty or that open with the comment marker. Everything that remains is a real directive, so the parser tokenizes it and appends the result. The ordered list it returns is the backbone every later stage works from.
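
A sketch of that loop, under the assumption that `#` is the comment marker, might look like this. `parse_source`, `tokenize_line`, and the sample source are illustrative names and data, not the contents of the lab's files.

```python
import re
from dataclasses import dataclass, field

COMMENT_MARKER = "#"  # assumed comment character
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Statement:
    directive: str
    attrs: dict = field(default_factory=dict)


def tokenize_line(line: str) -> Statement:
    parts = line.split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    return Statement(parts[0], dict(ATTR_PATTERN.findall(rest)))


def parse_source(source: str) -> list[Statement]:
    """Walk the source top to bottom, skipping blanks and comments."""
    statements = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith(COMMENT_MARKER):
            continue  # carries no directive
        statements.append(tokenize_line(line))
    return statements


SAMPLE = '''
# made-up pipeline for illustration
load source="orders.csv"

filter column="status" equals="active"
dedupe key="order_id"
'''
statements = parse_source(SAMPLE)
print([s.directive for s in statements])  # ['load', 'filter', 'dedupe']
```

Appending inside a single top-to-bottom loop is what preserves the source order in the returned list.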

Generating Pipeline Code from Templates

Rendering One Statement from Its Template

Template-based generation is what keeps the processor maintainable: each directive owns one template, and rendering is the single point where a statement's attributes flow into that template. The renderer looks up the template by the statement's directive, then fills its named placeholders by expanding the attribute dictionary as keyword arguments. Because every directive renders the same way, adding a new built-in directive later is a matter of adding one template entry, not changing the renderer.
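
The lookup-then-fill step can be sketched as below. `TEMPLATES` is a name from the lab, but the template strings, the `load_rows` helper they mention, and the function name `render_statement` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    directive: str
    attrs: dict = field(default_factory=dict)


# Illustrative templates; the lab's real templates may differ.
TEMPLATES = {
    "load": 'data = load_rows("{source}")',
    "filter": 'data = [row for row in data if row["{column}"] == "{equals}"]',
}


def render_statement(stmt: Statement) -> str:
    """Look up the directive's template and fill its named placeholders."""
    template = TEMPLATES[stmt.directive]
    return template.format(**stmt.attrs)


line = render_statement(Statement("load", {"source": "orders.csv"}))
print(line)  # data = load_rows("orders.csv")
```

In this sketch an unknown directive or a missing attribute raises `KeyError`, which surfaces a malformed statement early rather than emitting a half-filled line.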

Assembling the Rendered Lines into a Pipeline Function

A pipeline is more than a pile of lines: it is a callable function the rest of the system can run. The generator renders every statement, indents each one so it sits inside a function body, and wraps the block under a single function header. Joining the indented lines with newlines and prefixing the header produces one valid function definition. Keeping the header name in config means the whole shape of the generated function is configurable from one place.
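
The assembly step can be sketched as follows, assuming a parameterless function header and taking already-rendered lines as input. The function name `assemble` and the sample lines are hypothetical; `INDENT` and `run_pipeline` come from the lab text.

```python
INDENT = "    "
FUNCTION_NAME = "run_pipeline"


def assemble(rendered_lines: list[str]) -> str:
    """Indent each rendered line and place the block under one function header."""
    header = f"def {FUNCTION_NAME}():"  # assumed signature with no parameters
    body = [INDENT + line for line in rendered_lines]
    return "\n".join([header, *body])


# Made-up rendered lines standing in for template output.
code = assemble(["data = [3, 1, 2]", "data = sorted(data)", "return data"])
print(code)

# Executing the generated source in a scratch namespace shows it is callable.
namespace = {}
exec(code, namespace)
result = namespace[FUNCTION_NAME]()
print(result)  # [1, 2, 3]
```

Note that an empty list of rendered lines would produce a header with no body, which is not valid Python, so a real generator would need to guard against that case.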

AI-Assisted Operator Generation and Validation

Validating What the AI Agent Generates

Generated code cannot be trusted on sight. Before the processor accepts an operator the AI agent wrote, it has to confirm two things: that the source is valid Python and that it actually defines the operator that was requested. Parsing the source with `ast.parse` catches a syntax error without ever running the code, and walking the resulting tree for function definitions reveals exactly what the agent produced. Checking the expected name against that set is what turns "the agent returned something" into "the agent returned the right thing".
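
A sketch of that two-part check is shown below. The lab names the function `validate_generated`; its signature here and the sample sources are assumptions.

```python
import ast


def validate_generated(source: str, expected_name: str) -> bool:
    """Return True only if source parses and defines a function named expected_name."""
    try:
        tree = ast.parse(source)  # parses without executing anything
    except SyntaxError:
        return False
    defined = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }
    return expected_name in defined


good = "def dedupe(rows):\n    return list(dict.fromkeys(rows))\n"
print(validate_generated(good, "dedupe"))                  # True
print(validate_generated(good, "merge"))                   # False
print(validate_generated("def dedupe(rows:\n", "dedupe"))  # False
```

`ast.walk` visits nested nodes as well, so this sketch would also accept a function defined inside another function or class; a stricter check could inspect only `tree.body`.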

Running the AI Step and Assembling the Report

The final task ties every layer together. The processor parses the source, splits the statements into the built-in directives the templates render and the custom directives the agent must fill, and generates the built-in pipeline. For each custom directive it asks the agent for an operator and validates the result, recording whether each one passed. A run is judged valid only when every generated operator checks out, and the report bundles the statement count, the custom operators, the verdict, and the generated code into one object the runner can print. That report is the honest summary a real platform would hand to whatever consumes the processor.
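
The split, validate, and report flow can be sketched in a simplified form that works on directive names rather than full statements. `Report`, `build_report`, `stub_agent`, and `defines_function` are hypothetical names, and the stub stands in for the lab's `ai_agent.py`, whose interface is not shown in the text.

```python
import ast
from dataclasses import dataclass

BUILTIN_DIRECTIVES = {"load", "filter", "transform", "write"}


@dataclass
class Report:
    statement_count: int
    custom_operators: list[str]
    valid: bool
    code: str


def stub_agent(name: str) -> str:
    """Hypothetical stand-in for the agent: returns source defining `name`."""
    return f"def {name}(rows):\n    return list(rows)\n"


def defines_function(source: str, name: str) -> bool:
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(n, ast.FunctionDef) and n.name == name for n in ast.walk(tree)
    )


def build_report(directives: list[str], pipeline_code: str, agent=stub_agent) -> Report:
    # Anything outside the catalog goes down the agent path.
    custom = [d for d in directives if d not in BUILTIN_DIRECTIVES]
    checks = [defines_function(agent(name), name) for name in custom]
    # The run is valid only when every generated operator checks out.
    return Report(len(directives), custom, all(checks), pipeline_code)


report = build_report(
    ["load", "filter", "dedupe", "transform", "write"],  # assumed order
    "def run_pipeline():\n    pass",
)
print(report.statement_count, report.custom_operators, report.valid)
# 5 ['dedupe'] True
```

Since `all([])` is `True`, a source with no custom directives counts as valid in this sketch, which matches the rule that a run fails only when a generated operator fails its check.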

Run the Full Processor
Now that every task is complete, run the end-to-end processor to watch the DSL turn into a pipeline and the agent fill the operator the templates do not cover.
1. Confirm the runtime is available: `python3 --version`
2. Process the sample DSL from the workspace directory: `python3 run_processor.py`
3. Watch the log stream print an [INIT] line as the source loads, then a [PARSE] line reporting the statement count, an [AI] line naming the custom operators the agent generated, a [VALIDATE] line reporting whether every generated operator is valid, the generated run_pipeline function, and finally a [DONE] line reporting how long the run took.
4. Confirm the [PARSE] line reports 5 statements, the [AI] line names dedupe as the custom operator, and the [VALIDATE] line reports True.
5. Read the generated run_pipeline function: the four built-in directives, load, filter, transform, and write, appear rendered in order, while dedupe is handled by the agent path rather than a template, which is exactly the split the processor is built to handle.
Expected Result: Every layer you built is visible in one run: the tokenizer and parser turn the DSL into ordered statements, the templates render the built-in directives into a pipeline function, the AI agent fills the custom dedupe operator, and the validator confirms the generated code before the processor reports a clean, valid result.

Conclusion
Congratulations on completing the Building AI-Ready DSLs and Processors in Python lab! You have built the processor behind a small domain-specific language: tokenizing each line into a structured statement, parsing a full source into an ordered list, rendering directives into Python through templates, and folding in an AI agent that fills the operators your templates do not cover, validating everything it returns. These are the patterns you need to design and extend AI-ready DSLs and code generators.

What You Have Accomplished
1. Set the DSL Syntax and Processor Limits: Defined the comment marker, the attribute pattern, the indent, and the function name once in a shared config every module reads from.
2. Defined the Statement Component and Classified Directives: Gave each parsed line a stable shape and built the catalog that routes a directive to a template or to the AI agent.
3. Tokenized One DSL Line: Split a line into its directive and a clean attribute dictionary captured by a single regular expression.
4. Parsed the Full DSL Source: Walked the source top to bottom, skipped blanks and comments, and returned an ordered list of statements.
5. Rendered One Statement from Its Template: Filled a per-directive template with a statement's attributes to produce one line of generated code.
6. Assembled the Full Pipeline: Wrapped the rendered, indented lines under one function header into a complete pipeline definition.
7. Validated a Generated Operator: Parsed generated source with ast and confirmed it defines the requested operator before trusting it.
8. Ran the AI-Assisted Step and Assembled the Report: Routed custom directives through the agent, validated each one, and returned a report carrying the count, the operators, the verdict, and the code.
Key Takeaways
- A DSL plus a template-based processor separates what a workflow should do from how the code is produced, so the notation stays readable while the generation stays consistent.
- Centralizing the DSL's lexical rules in one config makes the syntax and the generated output configurable from a single place.
- A single capturing regular expression turns key="value" attributes into a dictionary regardless of how many appear on a line.
- A per-directive template plus str.format(**attrs) is the simplest reliable code generator, and adding a directive becomes one new template entry.
- An AI agent extends a processor past its built-in templates, but generated code must be validated, parsed with ast and checked for the expected definition, before the pipeline relies on it.
- Splitting statements into built-in and custom paths keeps template rendering and AI generation cleanly separated while one report ties the whole run together.
Experiment Before You Go

You still have time in the lab environment. Try these explorations:
- Add a new built-in directive by adding one entry to TEMPLATES and to BUILTIN_DIRECTIVES, then add a matching line to data/pipeline.dsl and rerun the processor to watch it render in the generated function.
- Add a second custom directive to data/pipeline.dsl and watch the [AI] line name both operators and the processor validate each one.
- Edit ai_agent.py so the generated source defines a differently named function, then rerun and watch validate_generated flip the run to invalid because the expected operator is missing.
- Change INDENT to a different width in config.py and rerun to see the generated function body shift, all from one constant.
- Add a comment line or a blank line to data/pipeline.dsl and confirm the parser skips it without changing the statement count.

About the author

Angel Sayani

Angel Sayani is a Certified Artificial Intelligence Expert®, CEO of IntellChromatics, author of two books in cybersecurity and IT certifications, world record holder, and a well-known cybersecurity and digital forensics expert.

