Building Your First ETL Pipeline Using Azure Databricks

In this course, you will learn about the Spark-based Azure Databricks platform, see how to set up the environment, quickly build the extract, transform, and load steps of your data pipelines, orchestrate them end to end, and run them automatically and reliably.
Course info
Rating
(53)
Level
Beginner
Updated
Oct 17, 2019
Duration
2h 40m
Description

With exponential growth in data volumes, an increase in the types of data sources, faster data processing needs, and dynamically changing business requirements, traditional ETL tools struggle to keep up with the needs of modern data pipelines. While Apache Spark is very popular for big data processing and can help us overcome these challenges, managing the Spark environment is no cakewalk.

In this course, Building Your First ETL Pipeline Using Azure Databricks, you will gain the ability to use the Spark-based Databricks platform running on Microsoft Azure and leverage its features to quickly build and orchestrate an end-to-end ETL pipeline, all while learning about the collaboration options and optimizations it brings, and without worrying about infrastructure management.

First, you will learn the fundamentals of Spark, the Databricks platform and its features, and how it runs on Microsoft Azure.
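
As an informal taste of those Spark fundamentals, here is a minimal PySpark sketch that builds a small DataFrame and runs a simple aggregation; the column names and sample values are illustrative only and are not taken from the course.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks a SparkSession named `spark` is already provided;
    # getOrCreate() simply returns it (or builds one when run locally).
    spark = SparkSession.builder.appName("spark-fundamentals").getOrCreate()

    # A tiny in-memory DataFrame with illustrative sales rows.
    sales = spark.createDataFrame(
        [("2019-10-01", "Books", 120.0), ("2019-10-01", "Games", 80.0)],
        ["order_date", "category", "amount"],
    )

    # A basic transformation: total amount per category.
    sales.groupBy("category").agg(F.sum("amount").alias("total_amount")).show()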

Next, you will discover how to set up the environment, including the workspace, clusters, and security, and how to build each phase of extract, transform, and load separately to implement a dimensional model.
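
To make the extract-transform-load split concrete, the sketch below shows the three phases as they might look in a Databricks notebook; the mount path, file names, and dimension columns are hypothetical placeholders, not the course's actual dataset.

    from pyspark.sql import functions as F

    # Extract: read raw CSV files from mounted storage
    # (the path is a placeholder; `spark` is the notebook's SparkSession).
    raw_orders = (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/raw/orders/"))

    # Transform: clean the data and derive a simple customer dimension.
    dim_customer = (raw_orders
        .select("customer_id", "customer_name", "country")
        .dropDuplicates(["customer_id"])
        .withColumn("load_date", F.current_date()))

    # Load: write the dimension as a managed table for downstream queries
    # (the "dw" database name is also a placeholder).
    dim_customer.write.mode("overwrite").saveAsTable("dw.dim_customer")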

Finally, you will explore how to orchestrate the pipeline using Databricks jobs and Azure Data Factory, followed by other features, such as the Databricks APIs and Delta Lake, that help you build automated and reliable data pipelines.
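
For a flavor of the orchestration side, this snippet triggers an existing Databricks job through the Jobs REST API (the run-now endpoint) using Python's requests library; the workspace URL, access token, and job ID are placeholders you would supply from your own environment.

    import requests

    # Placeholders: point these at your own workspace, token, and job.
    WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
    TOKEN = "<personal-access-token>"
    JOB_ID = 123  # hypothetical job ID

    # Trigger a run of the job via the Databricks Jobs API.
    response = requests.post(
        f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": JOB_ID},
    )
    response.raise_for_status()
    print("Started run:", response.json()["run_id"])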

When you're finished with this course, you will have the skills and knowledge of the Azure Databricks platform needed to build and orchestrate an end-to-end ETL pipeline.

About the author

Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT), and a consultant. He has 15+ years of experience architecting large-scale Business Intelligence, Data Warehousing, and Big Data solutions with companies such as Microsoft and several leading investment banks.

Section Introduction Transcripts

Course Overview
[Autogenerated] Hi everyone, my name is Mohit Batra, and welcome to my course, Building Your First ETL Pipeline Using Azure Databricks. With exponential growth in data volumes, faster data processing needs, and dynamically changing business requirements, traditional ETL tools face a real challenge. While Apache Spark can help us overcome this, managing the Spark environment is no cakewalk. Wouldn't it be great to have a cloud service that just takes care of that? This course walks you through Azure Databricks, a Spark-based Unified Analytics Platform running on Microsoft Azure, and you'll see how to quickly build the extract, transform, and load steps of your data pipelines. Some of the major topics that we will cover include understanding the architecture and components of Azure Databricks, setting up the Azure Databricks environment, building an end-to-end ETL pipeline on the platform, various ways to orchestrate the pipeline, and other features, like the Databricks APIs and Delta Lake, to help you build automated and reliable pipelines. By the end of this course, you'll be comfortable working on the Azure Databricks platform and building production-ready ETL pipelines. Before beginning the course, I would recommend being familiar with the basics of Microsoft Azure. The beginner courses in our library can quickly get you up to speed. I hope you'll join me on this journey to learn how to build ETL pipelines with the Building Your First ETL Pipeline Using Azure Databricks course, here at Pluralsight.