Building AI-Ready DSLs and Processors in Python
In this Code Lab, you'll design and implement a Python DSL with a custom template processor. You'll identify core DSL components, build parsing logic to transform structured input, analyze template-driven systems, and integrate AI-assisted code generation. When finished, you'll have a production-ready DSL processor that handles specialized syntax and generates executable output.
Introduction
Welcome to the Building AI-Ready DSLs and Processors in Python Code Lab. In this hands-on lab, you build the processor behind a small domain-specific language that describes data transformation pipelines. The processor tokenizes each line into a structured statement, parses a whole source into an ordered statement list, and renders each statement into Python through a per-directive template. It also folds in an AI agent that writes the operators your templates do not cover, and it validates everything the agent returns before the pipeline trusts it.
About the tools and concepts
A **domain-specific language** (DSL) is a small, focused notation built for one job rather than for general programming. Here the DSL describes a pipeline as a sequence of directives such as `load`, `filter`, and `transform`, each carrying a few `key="value"` attributes. A **processor** is the code that reads that notation and turns it into something runnable.

A **template string** is the simplest form of code generation. Each directive maps to a Python string with named placeholders, and `str.format(**attrs)` fills those placeholders from a statement's attributes, so one template plus one set of attributes produces one line of generated code. Centralizing the templates keeps the generated output consistent and makes the DSL easy to extend.

`re.findall` with a pattern like `(\w+)="([^"]*)"` returns every key and value pair on a line as tuples, which `dict()` folds straight into an attribute dictionary. A `dataclass` named `Statement` gives each parsed line a stable shape, a `directive` and an `attrs` dictionary, so the rest of the processor reads structured data rather than raw text.

An **AI agent** extends the processor past its built-in templates. When the DSL uses a directive the templates do not cover, the agent generates the operator's Python implementation on demand. Generated code is never trusted blindly: the `ast` module parses the returned source and reports the functions it defines, so the processor can confirm the agent produced the operator it asked for before the pipeline relies on it.

### Prerequisites
Before starting this lab, you should have:
- Strong Python fundamentals and string manipulation: slicing, splitting, joining, and formatting text
- Familiarity with parsing and processing concepts: turning raw input into structured data
- Understanding of functions, classes, and object-oriented design: composing small units into a larger system
- Knowledge of regular expressions or pattern matching: capturing groups from text
- Experience with Python data structures: lists, dicts, and tuples
- Basic understanding of template engines or code generation: producing source from data
The lab environment is ready to use. Run `python3 --version` from inside the `workspace` folder at any time to confirm the runtime. The stack is Python 3.12 with pytest 8.x available, and the AI agent is a local, deterministic stand-in for a model service, so the lab runs the same way every time with no network and no API key. The processor uses only the Python standard library: the `re`, `ast`, and `dataclasses` modules plus the built-in `str.format`, so there is nothing extra to install. You validate each task by running its check with `python3`.
The Scenario
You are a backend engineer at CarvedRock building a configuration DSL for data transformation pipelines. The DSL uses a template-based syntax to describe a workflow as an ordered list of directives, and your task is to implement the processor that reads it. The processor needs to tokenize each directive and its attributes, parse a full source into structured statements, render the built-in directives into Python through templates, and reach for an AI agent when the DSL uses a custom operator the templates do not cover, validating the generated code before the pipeline trusts it.
The Application Structure
Key files in the lab environment
- `workspace/src/config.py`: the DSL's lexical rules and the output shape for the generator, all read from one place
- `workspace/src/components.py`: the `Statement` dataclass and the directive catalog that classifies each directive as built-in or custom
- `workspace/src/tokenizer.py`: the helper that turns one DSL line into a structured `Statement`
- `workspace/src/parser.py`: the function that reads a full source into an ordered list of statements
- `workspace/src/templates.py`: the per-directive code templates and the renderer that fills them
- `workspace/src/generator.py`: the assembler that wraps rendered lines into a complete pipeline function
- `workspace/src/ai_agent.py`: the local AI agent that generates a custom operator on demand; treat it as a black box
- `workspace/src/validator.py`: the check that confirms generated code parses and defines the expected operator
- `workspace/src/processor.py`: the orchestration that ties parsing, generation, AI assistance, and validation into one report
- `workspace/src/logger.py`: the shared stage logger
- `workspace/run_processor.py`: the end-to-end runner that processes the sample DSL and prints the generated pipeline
- `workspace/data/pipeline.dsl`: the sample CarvedRock pipeline written in the DSL

Complete the tasks in order. Each task builds on the previous one.

If you get stuck, you can refer to the provided solution code for each task, available in the `solutions` folder.

Run the full processor from the workspace directory at any point with:
`python3 run_processor.py`

Note: Running `python3 run_processor.py` before completing all tasks will produce errors, as each file depends on the others being fully implemented first.
Establishing DSL Components and Processor Configuration
Setting the DSL's Lexical Rules in One Place
Every processor starts from a handful of rules: which character opens a comment, how an attribute is written, how generated lines are indented, and what the generated function is called. Centralizing these in `config.py` means the tokenizer, the parser, and the generator all read from one source of truth, and changing the DSL's syntax later becomes a one-line edit rather than a hunt across modules. The values you set here shape everything downstream: the attribute pattern decides what the tokenizer can read, and the indent and function name decide what the generated pipeline looks like.

Identifying the DSL Components and Routing Custom Directives
A processor needs a stable shape for the data it passes around and a way to tell which directives it already knows how to render. The `Statement` dataclass gives every parsed line that shape, a `directive` and an `attrs` dictionary, so later stages read structured fields rather than raw strings. The directive catalog is what separates the two code paths in this lab: a directive the templates cover is rendered directly, while anything outside the catalog is a "custom" operator that the AI agent fills in later. Classifying a directive once, here, keeps that routing decision in a single place.
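The shared config and the component catalog described in this section might look roughly like the sketch below. Only `Statement`, `INDENT`, `BUILTIN_DIRECTIVES`, the attribute pattern, and the `run_pipeline` name come from the lab text; `COMMENT_MARKER`, `ATTR_PATTERN`, `FUNCTION_NAME`, `is_builtin`, and `classify` are assumed names, and the `#` comment marker and four-space indent are assumed values.

```python
import re
from dataclasses import dataclass, field

# --- config.py (sketch; names and values marked "assumed" are not from the lab) ---
COMMENT_MARKER = "#"                            # assumed comment character
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')   # matches key="value" attributes
INDENT = "    "                                 # assumed width for generated lines
FUNCTION_NAME = "run_pipeline"                  # name of the generated function


# --- components.py (sketch) ---
@dataclass
class Statement:
    """One parsed DSL line: a directive plus its attribute dictionary."""
    directive: str
    attrs: dict[str, str] = field(default_factory=dict)


# Directives that have a template; anything else is routed to the AI agent.
BUILTIN_DIRECTIVES = frozenset({"load", "filter", "transform", "write"})


def is_builtin(directive: str) -> bool:
    """Return True when a template covers this directive (assumed helper)."""
    return directive in BUILTIN_DIRECTIVES


def classify(directive: str) -> str:
    """Label a directive as 'builtin' or 'custom' (assumed helper)."""
    return "builtin" if is_builtin(directive) else "custom"
```

Keeping the catalog as a `frozenset` is one reasonable choice here: membership checks stay cheap and the set cannot be mutated by accident from another module.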
Parsing the DSL into Structured Statements
Tokenizing One Line into a Directive and Its Attributes
Before a source can be parsed, each line has to become structured data. The tokenizer does exactly one thing: it takes the first whitespace-delimited word as the directive, then pulls every `key="value"` pair off the rest of the line into an attribute dictionary. Splitting on whitespace with a limit keeps the directive separate from its attributes, and a single regular expression captures the attributes regardless of how many there are. Wrapping the result in a `Statement` is what lets the parser, the renderer, and the generator all consume the same shape.

Parsing the Whole Source into an Ordered Statement List
A pipeline is an ordered sequence, so the parser's job is to walk the source top to bottom and keep the statements in order. Blank lines and comments carry no directive, so the parser strips each line and skips the ones that are empty or that open with the comment marker. Everything that remains is a real directive, so the parser tokenizes it and appends the result. The ordered list it returns is the backbone every later stage works from.
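As a rough illustration of the tokenizing and parsing steps described in this section, the sketch below splits each line once on whitespace, captures attributes with the `(\w+)="([^"]*)"` pattern, and skips blanks and comments. The function names `tokenize` and `parse`, the `#` comment marker, and the sample source are assumptions for illustration, not the lab's solution code or its `pipeline.dsl`.

```python
import re
from dataclasses import dataclass, field

ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')  # key="value" pairs
COMMENT_MARKER = "#"                           # assumed comment character


@dataclass
class Statement:
    directive: str
    attrs: dict[str, str] = field(default_factory=dict)


def tokenize(line: str) -> Statement:
    """Turn one non-empty DSL line into a Statement (assumed helper name)."""
    # maxsplit=1 keeps the directive apart from the untouched attribute text.
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("cannot tokenize an empty line")
    directive = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # findall returns (key, value) tuples, which dict() folds into attrs.
    return Statement(directive, dict(ATTR_PATTERN.findall(rest)))


def parse(source: str) -> list[Statement]:
    """Walk the source top to bottom, keeping statements in order."""
    statements = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith(COMMENT_MARKER):
            continue  # blank lines and comments carry no directive
        statements.append(tokenize(line))
    return statements


# Hypothetical sample source for illustration only.
SAMPLE = """\
# orders pipeline
load path="orders.csv"

filter column="status" value="shipped"
dedupe key="order_id"
"""

for stmt in parse(SAMPLE):
    print(stmt.directive, stmt.attrs)
```

Because the parser strips each line before checking it, an indented comment or a line containing only spaces is skipped the same way as a truly empty line.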
Generating Pipeline Code from Templates
Rendering One Statement from Its Template
Template-based generation is what keeps the processor maintainable: each directive owns one template, and rendering is the single point where a statement's attributes flow into that template. The renderer looks up the template by the statement's directive, then fills its named placeholders by expanding the attribute dictionary as keyword arguments. Because every directive renders the same way, adding a new built-in directive later is a matter of adding one template entry, not changing the renderer.
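A minimal sketch of that lookup-and-fill step follows. `TEMPLATES` is named in the lab, but the template strings, the `render` function name, and the helper names inside the templates (`load_rows`, `write_rows`) are invented here for illustration; the lab's real templates may generate different code.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    directive: str
    attrs: dict[str, str] = field(default_factory=dict)


# Illustrative templates only; each placeholder name must match an attribute key.
TEMPLATES = {
    "load": 'rows = load_rows("{path}")',
    "filter": 'rows = [r for r in rows if r["{column}"] == "{value}"]',
    "write": 'write_rows(rows, "{path}")',
}


def render(stmt: Statement) -> str:
    """Fill the directive's template from the statement's attributes."""
    try:
        template = TEMPLATES[stmt.directive]
    except KeyError:
        raise KeyError(f"no template for directive {stmt.directive!r}") from None
    # Expanding attrs as keyword arguments fills the named placeholders;
    # a missing attribute surfaces as a KeyError from str.format.
    return template.format(**stmt.attrs)


print(render(Statement("load", {"path": "orders.csv"})))
```

One caveat with `str.format`: a template that needs a literal brace in the generated code has to escape it as `{{` or `}}`, which is worth remembering when adding new entries.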
Assembling the Rendered Lines into a Pipeline Function
A pipeline is more than a pile of lines: it is a callable function the rest of the system can run. The generator renders every statement, indents each one so it sits inside a function body, and wraps the block under a single function header. Joining the indented lines with newlines and prefixing the header produces one valid function definition. Keeping the header name in config means the whole shape of the generated function is configurable from one place.
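The assembly step can be sketched as below. To stay self-contained, this version takes lines that have already been rendered rather than rendering statements itself; the `generate` name, the four-space `INDENT`, the `FUNCTION_NAME` constant, and the `pass` fallback for an empty pipeline are assumptions, not the lab's implementation.

```python
INDENT = "    "                 # assumed indent width
FUNCTION_NAME = "run_pipeline"  # generated function name used in the lab


def generate(rendered_lines: list[str]) -> str:
    """Wrap already-rendered lines under one function header (sketch)."""
    header = f"def {FUNCTION_NAME}():"
    # An empty body would not be valid Python, so fall back to `pass`.
    body = [INDENT + line for line in rendered_lines] or [INDENT + "pass"]
    return "\n".join([header, *body])


# Illustrative input lines, not the lab's real template output.
code = generate([
    "rows = [1, 2, 3]",
    "rows = [r * 2 for r in rows]",
    "return rows",
])
print(code)
```

Since the result is a plain string containing one function definition, it can be checked with `ast.parse` or written to a file before anything executes it.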
AI-Assisted Operator Generation and Validation
Validating What the AI Agent Generates
Generated code cannot be trusted on sight. Before the processor accepts an operator the AI agent wrote, it has to confirm two things: that the source is valid Python and that it actually defines the operator that was requested. Parsing the source with `ast.parse` catches a syntax error without ever running the code, and walking the resulting tree for function definitions reveals exactly what the agent produced. Checking the expected name against that set is what turns "the agent returned something" into "the agent returned the right thing".

Running the AI Step and Assembling the Report
The final task ties every layer together. The processor parses the source, splits the statements into the built-in directives the templates render and the custom directives the agent must fill, and generates the built-in pipeline. For each custom directive it asks the agent for an operator and validates the result, recording whether each one passed. A run is judged valid only when every generated operator checks out, and the report bundles the statement count, the custom operators, the verdict, and the generated code into one object the runner can print. That report is the honest summary a real platform would hand to whatever consumes the processor.
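The validation and reporting steps could be sketched as follows. `validate_generated` is named in the lab, but its signature here is a guess, and `fake_agent`, `Report`, and `build_report` are hypothetical stand-ins: the lab's `ai_agent.py` is a black box and its report object may be shaped differently.

```python
import ast
from dataclasses import dataclass


def validate_generated(source: str, expected_name: str) -> bool:
    """Return True when source parses and defines expected_name (assumed signature)."""
    try:
        tree = ast.parse(source)  # parses only; the code is never executed
    except SyntaxError:
        return False
    defined = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }
    return expected_name in defined


def fake_agent(directive: str) -> str:
    """Hypothetical stand-in for the lab's agent: returns operator source."""
    return f"def {directive}(rows):\n    return list(dict.fromkeys(rows))\n"


@dataclass
class Report:
    """Assumed report shape: count, custom operators, verdict, generated code."""
    statement_count: int
    custom_operators: list[str]
    valid: bool
    code: str


def build_report(directives: list[str], builtin: set[str], code: str) -> Report:
    """Route custom directives through the agent and validate each result."""
    custom = [d for d in directives if d not in builtin]
    results = [validate_generated(fake_agent(d), d) for d in custom]
    # The run is valid only when every generated operator checks out.
    return Report(len(directives), custom, all(results), code)


report = build_report(
    ["load", "filter", "dedupe", "write"],
    {"load", "filter", "transform", "write"},
    "def run_pipeline():\n    pass",
)
print(report.custom_operators, report.valid)
```

Note that `all()` over an empty list is `True`, so in this sketch a source with no custom directives counts as valid. Checking the defined names proves the operator exists, not that it behaves correctly, so a real system would likely pair this with tests on the generated function.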
Run the Full Processor
Now that every task is complete, run the end-to-end processor to watch the DSL turn into a pipeline and the agent fill the operator the templates do not cover.
- Confirm the runtime is available: `python3 --version`
- Process the sample DSL from the workspace directory: `python3 run_processor.py`
- Watch the log stream print an `[INIT]` line as the source loads, then a `[PARSE]` line reporting the statement count, an `[AI]` line naming the custom operators the agent generated, a `[VALIDATE]` line reporting whether every generated operator is valid, the generated `run_pipeline` function, and finally a `[DONE]` line reporting how long the run took.
- Confirm the `[PARSE]` line reports 5 statements, the `[AI]` line names `dedupe` as the custom operator, and the `[VALIDATE]` line reports True.
- Read the generated `run_pipeline` function: the four built-in directives, `load`, `filter`, `transform`, and `write`, appear rendered in order, while `dedupe` is handled by the agent path rather than a template, which is exactly the split the processor is built to handle.

Expected Result: Every layer you built is visible in one run: the tokenizer and parser turn the DSL into ordered statements, the templates render the built-in directives into a pipeline function, the AI agent fills the custom `dedupe` operator, and the validator confirms the generated code before the processor reports a clean, valid result.
Conclusion
Congratulations on completing the Building AI-Ready DSLs and Processors in Python lab! You have built the processor behind a small domain-specific language: tokenizing each line into a structured statement, parsing a full source into an ordered list, rendering directives into Python through templates, and folding in an AI agent that fills the operators your templates do not cover, validating everything it returns. These are the patterns you need to design and extend AI-ready DSLs and code generators.
What You Have Accomplished
- Set the DSL Syntax and Processor Limits: Defined the comment marker, the attribute pattern, the indent, and the function name once in a shared config every module reads from.
- Defined the Statement Component and Classified Directives: Gave each parsed line a stable shape and built the catalog that routes a directive to a template or to the AI agent.
- Tokenized One DSL Line: Split a line into its directive and a clean attribute dictionary captured by a single regular expression.
- Parsed the Full DSL Source: Walked the source top to bottom, skipped blanks and comments, and returned an ordered list of statements.
- Rendered One Statement from Its Template: Filled a per-directive template with a statement's attributes to produce one line of generated code.
- Assembled the Full Pipeline: Wrapped the rendered, indented lines under one function header into a complete pipeline definition.
- Validated a Generated Operator: Parsed generated source with `ast` and confirmed it defines the requested operator before trusting it.
- Ran the AI-Assisted Step and Assembled the Report: Routed custom directives through the agent, validated each one, and returned a report carrying the count, the operators, the verdict, and the code.
Key Takeaways
- A DSL plus a template-based processor separates what a workflow should do from how the code is produced, so the notation stays readable while the generation stays consistent.
- Centralizing the DSL's lexical rules in one config makes the syntax and the generated output configurable from a single place.
- A single capturing regular expression turns `key="value"` attributes into a dictionary regardless of how many appear on a line.
- A per-directive template plus `str.format(**attrs)` is the simplest reliable code generator, and adding a directive becomes one new template entry.
- An AI agent extends a processor past its built-in templates, but generated code must be validated, parsed with `ast` and checked for the expected definition, before the pipeline relies on it.
- Splitting statements into built-in and custom paths keeps template rendering and AI generation cleanly separated while one report ties the whole run together.
Experiment Before You Go
You still have time in the lab environment. Try these explorations:
- Add a new built-in directive by adding one entry to `TEMPLATES` and to `BUILTIN_DIRECTIVES`, then add a matching line to `data/pipeline.dsl` and rerun the processor to watch it render in the generated function.
- Add a second custom directive to `data/pipeline.dsl` and watch the `[AI]` line name both operators and the processor validate each one.
- Edit `ai_agent.py` so the generated source defines a differently named function, then rerun and watch `validate_generated` flip the run to invalid because the expected operator is missing.
- Change `INDENT` to a different width in `config.py` and rerun to see the generated function body shift, all from one constant.
- Add a comment line or a blank line to `data/pipeline.dsl` and confirm the parser skips it without changing the statement count.