Building Your First ETL Pipeline Using Azure Databricks

In this course, you will learn about the Spark-based Azure Databricks platform, see how to set up the environment, quickly build the extract, transform, and load steps of your data pipelines, orchestrate them end to end, and run them automatically and reliably.
Course info
Rating: (14)
Level: Beginner
Updated: Oct 17, 2019
Duration: 2h 41m
Description

With an exponential growth in data volumes, an increase in the types of data sources, faster data processing needs, and dynamically changing business requirements, traditional ETL tools struggle to keep up with the needs of modern data pipelines. While Apache Spark is very popular for big data processing and can help us overcome these challenges, managing the Spark environment is no cakewalk. In this course, Building Your First ETL Pipeline Using Azure Databricks, you will gain the ability to use the Spark-based Databricks platform running on Microsoft Azure, and leverage its features to quickly build and orchestrate an end-to-end ETL pipeline, all while learning about the collaboration options and optimizations the platform brings, without worrying about infrastructure management. First, you will learn about the fundamentals of Spark, about the Databricks platform and its features, and how it runs on Microsoft Azure. Next, you will discover how to set up the environment, including the workspace, clusters, and security, and build each phase of extract, transform, and load separately to implement the dimensional model. Finally, you will explore how to orchestrate the pipeline using Databricks jobs and Azure Data Factory, followed by other features, like the Databricks APIs and Delta Lake, that help you build automated and reliable data pipelines. When you’re finished with this course, you will have the skills and knowledge of the Azure Databricks platform needed to build and orchestrate an end-to-end ETL pipeline.
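
To make the extract, transform, and load phases concrete, here is a minimal sketch of how they might look in a Databricks notebook using PySpark. This is not material from the course itself: the file path, column names, and table name are hypothetical placeholders, and spark refers to the SparkSession that Databricks notebooks provide by default.

from pyspark.sql import functions as F

# Extract: read raw source files from cloud storage mounted into the workspace
raw_df = spark.read.option("header", "true").csv("/mnt/raw/sales/*.csv")

# Transform: fix data types, derive columns, and de-duplicate rows
# before loading into the dimensional model
sales_df = (
    raw_df
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropDuplicates(["order_id"])
)

# Load: persist the result as a table that downstream jobs and reports can query
sales_df.write.mode("overwrite").saveAsTable("fact_sales")

Each phase can live in its own notebook, which is what makes it straightforward to orchestrate the steps later with Databricks jobs or Azure Data Factory.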

About the author

Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT), and a consultant. He has 15+ years of extensive experience in architecting large-scale Business Intelligence, Data Warehousing, and Big Data solutions with companies like Microsoft and some leading investment banks.

Section Introduction Transcripts

Course Overview
Hi everyone. My name is Mohit Batra, and welcome to my course, Building Your First ETL Pipeline Using Azure Databricks. With the exponential growth in data volumes, faster data processing needs, and dynamically changing business requirements, traditional ETL tools struggle to keep up. While Apache Spark can help us overcome this, managing the Spark environment is no cakewalk. Wouldn't it be great to have a cloud service just to do that? This course walks you through Azure Databricks, which is a Spark-based Unified Analytics Platform running on Microsoft Azure, and you'll see how to quickly build the extract, transform, and load steps of your data pipelines. Some of the major topics that we'll cover include understanding the architecture and components of Azure Databricks, setting up the Azure Databricks environment, building an end-to-end ETL pipeline on the platform, various ways to orchestrate the pipeline, and other features, like the Databricks APIs and Delta Lake, to help you build automated and reliable pipelines. By the end of this course, you'll be comfortable working on the Azure Databricks platform and building production-ready ETL pipelines. Before beginning the course, I would recommend being familiar with the basics of Microsoft Azure. The beginner courses in our library can quickly get you up to speed. I hope you'll join me on this journey to learn to build ETL pipelines with the Building Your First ETL Pipeline Using Azure Databricks course, here at Pluralsight.
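
As a rough illustration of the Delta Lake feature mentioned above, the following hypothetical snippet shows how a load step might write to a Delta table so that a failed run does not leave partial data behind. The table name and storage path are placeholders, and spark is again the SparkSession that Databricks notebooks provide.

from pyspark.sql import functions as F

# Read the output of an earlier (hypothetical) transform step
orders_df = spark.table("fact_sales")

# Append with an ACID-compliant Delta write, stamping each batch with a load timestamp
(orders_df
    .withColumn("load_ts", F.current_timestamp())
    .write
    .format("delta")
    .mode("append")
    .save("/mnt/curated/fact_sales_delta"))

# Downstream steps read the same path and always see a consistent snapshot
curated_df = spark.read.format("delta").load("/mnt/curated/fact_sales_delta")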