
Create Complex DAGs and Task Dependencies in Apache Airflow

This lab introduces you to the core concepts of branching logic and explicit task dependency management in Apache Airflow—essential skills for designing dynamic, modular workflows. You will explore key Airflow functionalities such as the BranchPythonOperator, task dependency methods, and trigger rules to build DAGs that adapt their execution paths at runtime. This approach enables more flexible and maintainable workflow designs that reflect real-world decision-making processes.

Throughout this lab, you will gain hands-on experience in defining a DAG that executes different task paths based on conditional logic, managing task relationships using explicit dependency methods, and controlling downstream behavior with advanced trigger configurations.

This lab is ideal for data engineers, analysts, and developers seeking to create more intelligent and responsive data pipelines with Apache Airflow. By the end of the lab, you will have the practical skills to design adaptive DAGs that respond to runtime conditions and maintain execution flow with precision.


Path Info

Level: Intermediate
Duration: 43m
Published: Apr 23, 2025


Table of Contents

  1. Challenge

    Introduction to Creating Complex DAGs and Task Dependencies

    In this lab, you will design a dynamic, branching DAG using Apache Airflow. You’ll learn how to route execution paths based on runtime logic, connect tasks using explicit dependencies, and finalize workflows with guaranteed post-processing logic using trigger rules.

    Instead of relying on real data, this lab uses randomized logic to simulate branching behavior. You'll implement a Python function that chooses between two branches, apply custom trigger rules, and validate task behavior using both the Airflow CLI and the web UI.


    🟦 Note:
    Many real-world pipelines involve conditional logic—where different paths are taken based on external state, business rules, or data conditions. This lab equips you with the skills to:

    • Implement branching workflows using BranchPythonOperator
    • Ensure consistent execution flow using DummyOperator and trigger rules
    • Validate branching logic and task flow using Airflow's visual and command-line tools
    • Guarantee post-processing with tasks that run unconditionally

    🔍 Key Concepts

    🔀 Branching and Routing with BranchPythonOperator

    • Route workflow execution to one of many branches at runtime
    • Return dynamic task IDs from a Python function

    🔗 Explicit Dependencies with set_downstream()

    • Connect tasks clearly and flexibly
    • Add dummy start and end anchors for visual and logical clarity

    🎯 Trigger Rules for Conditional Completion

    • Use none_failed_min_one_success to converge branching paths
    • Use all_done to execute final tasks regardless of upstream state

    🧪 Validation with the CLI and Web UI

    • Trigger and inspect DAG runs with airflow dags trigger and airflow tasks states-for-dag-run
    • Confirm DAG structure and behavior using Graph View in the Airflow web UI
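    As a sketch, the CLI checks above look like the following (the DAG id `branching_example_dag` and run id `test_run_1` match the names used later in this lab; these commands require a running Airflow 2.x environment):

```shell
# Trigger a run of the lab's DAG with an explicit run id
airflow dags trigger branching_example_dag --run-id test_run_1

# List the state of every task instance in that run;
# one branch task should report "success" and the other "skipped"
airflow tasks states-for-dag-run branching_example_dag test_run_1
```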

    🟩 Learning Objectives

    By the end of this lab, you will:

    • Implement a Python branching function for dynamic execution
    • Register a branching task using BranchPythonOperator
    • Add setup and end tasks using DummyOperator
    • Connect tasks using set_downstream()
    • Configure trigger rules to handle conditional execution
    • Use the Airflow CLI and web UI to monitor and validate DAG behavior

    Now that you're ready to build smart, conditional workflows with Airflow, click Next Step to get started with branching logic! 🚀

  2. Challenge

    Create a Branching DAG Using BranchPythonOperator

    Step 1: Create a Branching DAG Using BranchPythonOperator

    In this step, you will define a branching structure inside your Airflow DAG using the BranchPythonOperator. You’ll create a function that randomly selects a branch, register it in the DAG, and wire it to downstream tasks. After triggering the DAG manually, you’ll verify execution using Airflow CLI tools to confirm the branching behavior.

    By the end of this step, your DAG will conditionally execute one of two downstream tasks and skip the other—depending on which branch is selected. You will also gain confidence in how to inspect DAG and task run states directly from the terminal.


    🟦 Note:

    • Branching allows your DAG to follow dynamic paths based on runtime logic.
    • Understanding how to define and test conditional execution helps build smarter workflows.
    • CLI tools offer efficient visibility into DAG structure and runtime behavior.

    In Airflow, you will:

    • Define a Python function that returns one of two task IDs at random.
    • Register that function with a BranchPythonOperator.
    • Link downstream dummy tasks representing conditional paths.
    • Trigger the DAG manually using the command terminal and inspect execution paths.
    • Use Airflow CLI to validate which task was executed and which was skipped.

    By completing this step, you will gain practical experience implementing branching control flow in Airflow and using CLI tools to validate your DAG’s execution logic.
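    As a sketch, the branching callable described above might look like the following. The task ids `branch_a_task` and `branch_b_task` are the names this lab uses later; the surrounding DAG definition is omitted here and the operator registration is shown only as a comment:

```python
import random


def choose_branch():
    """Randomly pick which branch the DAG should follow.

    BranchPythonOperator expects its callable to return the task id
    (or list of task ids) that should run; every other task directly
    downstream of the branching task is skipped.
    """
    return random.choice(["branch_a_task", "branch_b_task"])


# Inside the DAG file, this function would be registered roughly as:
# branch_decision = BranchPythonOperator(
#     task_id="branch_decision",
#     python_callable=choose_branch,
# )
```

    Because the choice is random, repeated DAG runs will exercise both paths, which is exactly what you will verify with the CLI in this step.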

  3. Challenge

    Set Task Dependencies with Upstream and Downstream Logic

    Step 2: Set Task Dependencies with Upstream and Downstream Logic

    In this step, you will add structure and clarity to your DAG by introducing setup and end tasks using the DummyOperator. You’ll then define task order explicitly using set_downstream() to ensure a clean, consistent execution path—from the DAG’s entry point to its merging endpoint.

    This structure visually frames the DAG, ensuring that tasks are executed in the correct order, even when only one of the branch paths is followed. After wiring the dependencies, you’ll confirm the layout using the Airflow web UI.


    🟦 Note:

    • Start and end tasks improve DAG readability and maintainability.
    • Using set_downstream() gives you full control over task execution flow.
    • Trigger rules and visualization help prevent skipped or misaligned tasks.
    • Visual inspection ensures accuracy before proceeding to post-processing logic.

    In Airflow, you will:

    • Add a setup task named start using DummyOperator to begin the DAG.
    • Add an end task named end that uses the none_failed_min_one_success trigger rule.
    • Use set_downstream() to connect:
      start → branch_decision → (branch_a_task | branch_b_task) → end
      
    • Validate the dependency layout using Graph View in the Airflow web UI.

    By completing this step, your DAG will have a clean structure with a clear entry and exit path. You’ll confirm that your conditional branching logic correctly flows back into a single converging task using explicit dependency methods.
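    Under the assumptions this lab uses (Airflow 2.x, a DAG named `branching_example_dag`), the wiring described above can be sketched as follows. This is a minimal sketch of the dependency pattern, not the complete lab solution, and it requires an Airflow 2.x installation to run:

```python
import random
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator


def choose_branch():
    # Randomly select which branch task should run
    return random.choice(["branch_a_task", "branch_b_task"])


with DAG(
    dag_id="branching_example_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    start = DummyOperator(task_id="start")
    branch_decision = BranchPythonOperator(
        task_id="branch_decision",
        python_callable=choose_branch,
    )
    branch_a_task = DummyOperator(task_id="branch_a_task")
    branch_b_task = DummyOperator(task_id="branch_b_task")
    # end must still run when one branch is skipped, hence the trigger rule
    end = DummyOperator(
        task_id="end",
        trigger_rule="none_failed_min_one_success",
    )

    # Explicit wiring with set_downstream():
    # start -> branch_decision -> (branch_a_task | branch_b_task) -> end
    start.set_downstream(branch_decision)
    branch_decision.set_downstream([branch_a_task, branch_b_task])
    branch_a_task.set_downstream(end)
    branch_b_task.set_downstream(end)
```

    The `none_failed_min_one_success` rule on `end` is what lets the paths converge: without it, `end` would be skipped whenever one of its upstream branch tasks is skipped.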

    🔍 Observation 2.4: Confirm Dependency Structure Using the Airflow UI

    In this observation, you will visually inspect your DAG in the Airflow web UI to confirm that task dependencies have been wired correctly. This check ensures that the flow from start through the branching logic and into the end task is configured as intended.


    🟦 Note:

    • Visual confirmation helps catch misconfigured or missing task connections.
    • A clear graph structure improves readability and maintainability of DAGs.
    • Reviewing structure in the UI is a best practice before production runs.

    🛠 Steps to Confirm the Dependency Structure

    1. Open the Airflow web UI in your lab web browser tab:
      http://localhost:8081
      Username: admin
      Password: admin

    2. In the DAGs list, locate and click on branching_example_dag.

    3. In the DAG view, select Graph View from the top navigation.

    4. Confirm that the DAG structure appears as:

      start → branch_decision → (branch_a_task | branch_b_task) → end
      

    🔍 What You’re Observing

    • start connects to branch_decision, the branching logic point.
    • Only one of the downstream branches will run per DAG execution.
    • Both branch_a_task and branch_b_task converge to the shared end task.
    • This confirms that your explicit set_downstream() calls created the correct dependencies.

    ✅ If the DAG appears as described, your dependency structure is implemented correctly and you're ready to proceed to downstream control logic.

  4. Challenge

    Run a Common Task After Branches Using Trigger Rules

    Step 3: Add a Notification Task with Conditional Trigger Rules

    In this step, you will finalize your DAG by appending a notification task that runs regardless of upstream outcomes. You’ll use a PythonOperator to define the notification logic and apply the all_done trigger rule so that it executes whether the prior task succeeded, failed, or was skipped.

    You’ll also explicitly connect the end task to the notify_completion task and validate its behavior using the Airflow web UI. This ensures a complete and fault-tolerant DAG execution chain.


    🟦 Note:

    • Notification and finalization tasks must run under all conditions.
    • trigger_rule='all_done' ensures cleanup or reporting logic is never skipped.
    • Visual confirmation helps ensure consistent post-processing behavior across DAG runs.

    In Airflow, you will:

    • Define a notify() Python function to simulate notification behavior.
    • Use a PythonOperator to register the notification task in the DAG.
    • Set the trigger rule to all_done so it always executes.
    • Connect the end task to notify_completion using set_downstream().
    • Trigger the DAG multiple times and inspect the Graph View in the Airflow web UI to validate execution flow.

    By completing this step, you will have built a full-cycle DAG that executes a conditional branch and follows it with a guaranteed final task—closing the loop on robust and observable workflow execution.
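    The notification callable itself can be plain Python; a minimal sketch follows (the message text here is an assumption for illustration, not the lab's exact wording, and the operator registration is shown only as a comment):

```python
def notify():
    """Simulate sending a completion notification.

    In the DAG this function would be wrapped in a PythonOperator with
    trigger_rule='all_done', so it runs even when upstream tasks fail
    or are skipped.
    """
    message = "DAG run finished: notification sent."
    print(message)
    return message


# Hypothetical registration inside the DAG (sketch only):
# notify_completion = PythonOperator(
#     task_id="notify_completion",
#     python_callable=notify,
#     trigger_rule="all_done",
# )
# end.set_downstream(notify_completion)
```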

    🔍 Observation 3.3: Validate Notification Behavior in the Airflow UI

    In this observation, you will verify that the final notify_completion task always runs—regardless of which branch was selected or whether it was skipped. You’ll trigger the DAG multiple times and use the Airflow web UI to confirm consistent post-branching execution behavior.

    This observation builds on your work in Tasks 3.1 and 3.2.


    🟦 Why It Matters:

    • Helps confirm that trigger_rule='all_done' is working as intended.
    • Ensures notifications or final cleanup tasks are reliably executed.
    • Provides visual confirmation that the DAG handles branching variability gracefully.

    🛠 Steps to Trigger and Monitor the DAG

    1. Open your command terminal and trigger the DAG multiple times using unique run IDs:

      airflow dags trigger branching_example_dag --run-id test_run_11
      airflow dags trigger branching_example_dag --run-id test_run_22
      airflow dags trigger branching_example_dag --run-id test_run_33
      
    2. Open the Airflow web UI in your lab web browser tab:
      http://localhost:8081
      Username: admin
      Password: admin

    3. Navigate to branching_example_dag and click Graph View.

    4. Observe the following behavior for each run:

      • Only one of the branches (branch_a_task or branch_b_task) is executed.
      • The end task runs if the selected branch succeeds.
      • The notify_completion task always runs, regardless of upstream outcome.

    🔍 What You’re Observing

    • choose_branch() randomly selects which branch runs.
    • The end task runs if one branch completes successfully.
    • notify_completion executes every time due to trigger_rule='all_done'.

    This ensures that notifications, reports, or final logs are reliably triggered even if upstream branches are skipped or fail.


    ✅ Once you’ve observed this behavior consistently across runs, your DAG’s notification logic is fully validated.

    🎉 Congratulations on Completing the Lab!

    You have successfully completed the Create Complex DAGs and Task Dependencies in Apache Airflow lab.
    Throughout this lab, you built a dynamic Airflow DAG that demonstrates conditional logic, explicit dependencies, and downstream task execution control using trigger rules.


    ✅ What You Accomplished

    • Implemented a branching function that randomly selects a task ID using random.choice().
    • Registered the branching logic with BranchPythonOperator.
    • Created start and end anchor tasks using DummyOperator.
    • Connected tasks explicitly using set_downstream() for clear execution flow.
    • Applied trigger_rule='none_failed_min_one_success' to merge branching paths.
    • Configured a final notification task with trigger_rule='all_done'.
    • Triggered and validated DAG runs using the command terminal with airflow dags trigger.
    • Confirmed structure and execution using Graph View in the Airflow web UI.

    🔑 Key Takeaways

    • Branching enables your DAG to follow different paths based on runtime logic.
    • Trigger rules give you control over task execution flow in conditional pipelines.
    • Dummy operators help define clear DAG boundaries and improve readability.
    • Combining CLI commands and UI tools ensures effective monitoring and debugging.

    Amazing work!
    You’ve built a branching DAG that adapts to dynamic logic and ensures consistent execution—even with variable paths. You’re now ready to design intelligent workflows that can react to data and conditions in real time. 🚀

Pinal Dave is a Pluralsight Developer Evangelist.
