Productionalizing Data Pipelines with Apache Airflow

This course will teach you how to master production-grade Data Pipelines with Apache Airflow.
Course info
Rating
(45)
Level
Intermediate
Updated
Dec 9, 2020
Duration
2h 12m
Table of contents
Course Overview
Introducing Apache Airflow
Dissecting the Components of a Pipeline
Demystifying Common DAGs Pitfalls
Abstracting Functionality
Scaling Airflow
Final Thoughts
Description
Description

Production-grade Data Pipelines are hard to get right. Even once they are built, every update is complex because they are a central piece of every organization's infrastructure. In this course, Productionalizing Data Pipelines with Apache Airflow, you’ll learn to master them using Apache Airflow. First, you’ll explore what Airflow is and how it creates Data Pipelines. Next, you’ll discover how to make your pipelines more resilient and predictable. Finally, you’ll learn how to distribute tasks with the Celery and Kubernetes Executors. When you’re finished with this course, you’ll have the skills and knowledge of Apache Airflow needed to make any Data Pipeline production grade.
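To give a flavor of what "Airflow creates Data Pipelines" means in practice, here is a minimal sketch of a DAG definition, assuming Apache Airflow 2.x. The task names and logic are illustrative placeholders, not taken from the course:

```python
# Minimal Airflow DAG sketch (assumes Apache Airflow 2.x is installed).
# Task names and callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system.
    return {"rows": 100}


def transform(ti):
    # Placeholder: read the upstream result from XCom and reshape it.
    raw = ti.xcom_pull(task_ids="extract")
    return raw["rows"] * 2


with DAG(
    dag_id="example_etl",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,               # do not backfill past runs
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs after extract succeeds
```

The pipeline is ordinary Python, so it can be versioned, reviewed, and tested like any other code; the scheduler reads this file and runs the tasks in the declared order.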

About the author

Axel Sirota has a Master's degree in Mathematics with a deep interest in Deep Learning and Machine Learning Operations. After researching Probability, Statistics, and Machine Learning optimization, he currently works at JAMPP as a Machine Learning Research Engineer, leveraging customer data to make accurate predictions in Real-Time Bidding.

More from the author
More courses by Axel Sirota
Section Introduction Transcripts

Course Overview
Hi, everyone. My name is Axel Sirota. Welcome to my course, Productionalizing Data Pipelines with Apache Airflow. I am a machine learning research engineer at Jampp and an MLOps specialist, and I am very excited to present this to you. Production‑grade data pipelines are hard to get right. Even once they are built, every update is complex because they are a central piece of every organization's infrastructure. Apache Airflow is an open source platform to programmatically develop, schedule, and orchestrate workflows. Our journey begins by discovering the architecture of Apache Airflow and how to create data pipelines. Next, we will discover how to make your pipelines more resilient and predictable, and then master them with advanced techniques such as templating, macros, and branching. Finally, we will scale our pipelines to infinity with the CeleryExecutor in a distributed fashion. When you're finished with this course, you will have the skills and knowledge of Apache Airflow needed to make any data pipeline production grade. I hope you'll join me on this journey to learn Apache Airflow with the Productionalizing Data Pipelines with Apache Airflow course, at Pluralsight.