
Building Data Pipelines with Luigi and Python

Many developers implement data pipelines by cobbling together hacky scripts that, over time, turn into liabilities and maintenance nightmares. Take this course to implement sane, maintainable data pipelines with Luigi in Python.
Course info
Level: Intermediate
Updated: Oct 12, 2020
Duration: 1h 33m
Description

Data arrives from various sources and needs further processing. It's very tempting to re-invent the wheel and write your own library to build data pipelines for batch processing. This results in data pipelines that are difficult to maintain. In this course, Building Data Pipelines with Luigi and Python, you’ll learn how to build data pipelines with Luigi and Python. First, you’ll explore how to build your first data pipelines with Luigi. Next, you’ll discover how to configure Luigi pipelines. Finally, you’ll learn how to run Luigi pipelines. When you’re finished with this course, you’ll have the Luigi skills and knowledge for building data pipelines that are easy to maintain.

Course FAQ
What is a data pipeline?

A data pipeline is a series of data processing steps. Data pipelines consist of three components: a source, a processing step or steps, and a destination.
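For illustration, here is a minimal sketch of those three components in plain Python. The file names and field names are hypothetical examples, not taken from the course:

```python
# Minimal sketch of a data pipeline: source -> processing -> destination.
# "sales.csv", "sales_clean.csv", and the "amount" field are hypothetical.
import csv

def extract(path):
    # Source: read raw records from a CSV file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Processing step: drop rows without an amount and convert it to a number.
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if row.get("amount")
    ]

def load(rows, path):
    # Destination: write the processed records to a new CSV file.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract("sales.csv")), "sales_clean.csv")
```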

What prerequisites are needed for this course?

Prerequisites for this course are fluency in Python and familiarity with the Linux command line.

What is Luigi in python?

Luigi is a Python package that helps you build complex pipelines of data-intensive jobs. Luigi handles dependency resolution, workflow management, visualization, failure handling, and command line integration.
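As a rough illustration (not code from the course), a two-task Luigi pipeline might look like the sketch below; the task names and file paths are assumptions made for the example:

```python
# A minimal Luigi pipeline: CleanData depends on DownloadData.
# Task names and paths are illustrative, not taken from the course exercises.
import luigi


class DownloadData(luigi.Task):
    def output(self):
        return luigi.LocalTarget("data/raw.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("raw data\n")


class CleanData(luigi.Task):
    def requires(self):
        # Luigi resolves this dependency and runs DownloadData first.
        return DownloadData()

    def output(self):
        return luigi.LocalTarget("data/clean.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().upper())


if __name__ == "__main__":
    # local_scheduler=True runs the pipeline without the central Luigi daemon.
    luigi.build([CleanData()], local_scheduler=True)
```

The same pipeline can also be triggered from the command line, for example with `python -m luigi --module <your_module> CleanData --local-scheduler`.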

What are the benefits of python?

Some benefits of Python are that it is easy to read, learn, and write; open source; portable; dynamically typed; and backed by extensive support libraries.

What are data pipelines used for?

Data pipelines are primarily used to automate the process of extracting, transforming, and loading (ETL) data.

About the author

As a software engineer and lifelong learner, Dan wrote a PhD thesis and many highly cited publications on decision making and knowledge acquisition in software architecture. Dan used Microsoft technologies for many years, but moved gradually to Python, Linux, and AWS to gain different perspectives on the computing world.

More from the author
Processing Data on AWS (Intermediate, 1h 57m, Jul 17, 2020)
Section Introduction Transcripts

Course Overview
When processing data, watch out. Many teams struggle with bunches of scripts that keep growing into a constant source of bugs and headaches. How about working smarter? How about building data pipelines instead of data headaches? Hi, I'm Dan. I am a software engineer with a PhD and two decades of software engineering experience. I prepared this course to help you build better data pipelines using Luigi and Python. Here is the plan. First, let's get started with Luigi and build some very simple pipelines. Second, let's build larger pipelines with various kinds of tasks. Third, let's configure pipelines and make them more flexible. Finally, let's look into how to run pipelines from development to production. As prerequisites, you need to know some Python and have a bit of command line familiarity. Take this course and start building better data pipelines with Luigi and Python.