Building Data Pipelines with Luigi 3 and Python
Other developers implement data pipelines by stitching together hacky scripts that, over time, turn into liabilities and maintenance nightmares. Take this course to implement sane and smart data pipelines with Luigi in Python.
What you'll learn
Data arrives from various sources and needs further processing. It's very tempting to reinvent the wheel and write your own library for batch data pipelines, but that usually results in pipelines that are difficult to maintain. In this course, Building Data Pipelines with Luigi 3 and Python, you’ll learn how to build data pipelines with Luigi and Python. First, you’ll explore how to build your first data pipelines with Luigi. Next, you’ll discover how to configure Luigi pipelines. Finally, you’ll learn how to run Luigi pipelines. When you’re finished with this course, you’ll have the Luigi skills and knowledge for building data pipelines that are easy to maintain.
A data pipeline is a series of data processing steps. Data pipelines consist of three components: a source, a processing step or steps, and a destination.
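The three components can be sketched in plain Python, without Luigi. This is a minimal illustration only: the sample data and the function names (`read_source`, `clean`, `write_destination`, `run_pipeline`) are hypothetical and chosen for this example.

```python
import csv
import io

# Hypothetical in-memory source standing in for a file, API, or database.
RAW_CSV = "name,amount\nalice,10\nbob,not-a-number\ncarol,32\n"


def read_source(text):
    """Source: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def clean(rows):
    """Processing step: skip rows whose amount is not an integer."""
    for row in rows:
        try:
            yield {"name": row["name"], "amount": int(row["amount"])}
        except ValueError:
            continue  # drop malformed rows instead of failing the whole run


def write_destination(rows):
    """Destination: collect rows into a list (stands in for a file or table)."""
    return list(rows)


def run_pipeline(text):
    """Chain source -> processing -> destination."""
    return write_destination(clean(read_source(text)))


result = run_pipeline(RAW_CSV)
```

Because each stage is a generator, rows flow through one at a time. A hand-rolled chain like this has no notion of retries, reruns, or dependencies between steps, which is the gap a tool such as Luigi is meant to fill.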
Prerequisites for this course are fluency in Python and familiarity with the Linux command line.
Luigi is a Python package that helps you build complex pipelines of data-intensive batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, and command-line integration.
Some benefits of Python: it is easy to read, learn, and write; it is open source and portable; it is dynamically typed; and it has an extensive ecosystem of support libraries.
Data pipelines are primarily used to automate the process of extracting, transforming, and loading data.