Course

Handling Streaming Data with Azure Databricks Using Spark Structured Streaming

by Mohit Batra

In this course, you will dive deep into Spark Structured Streaming, see its features in action, and use it to build complex, reliable, end-to-end streaming pipelines with PySpark. You will build and run them on the Azure Databricks platform.

What you'll learn

Modern data pipelines often include streaming data that must be processed in real time. In practice, you will need to work with multiple streams and datasets to produce results continuously. In this course, Handling Streaming Data with Azure Databricks Using Spark Structured Streaming, you will learn how to use Spark Structured Streaming on the Databricks platform, running on Microsoft Azure, and leverage its features to build end-to-end streaming pipelines. First, you will get a quick recap of the Spark Structured Streaming processing model, understand the scenario that the course implements, and complete the environment setup. Next, you will learn how to configure sources and sinks and build each phase of the streaming pipeline: extracting data from various sources, transforming it, and loading it into multiple sinks (Azure Data Lake, Azure Event Hubs, and Azure SQL). You will also see the different timestamps associated with an event and how to aggregate data using windows. Then, you will see how to combine a stream with static or historical datasets, and how to combine multiple streams. Finally, you will learn how to build a production-ready pipeline, schedule it as a job in Databricks, and manage jobs using the Databricks CLI. When you are finished with this course, you will be comfortable building complex streaming pipelines on Azure Databricks to solve a variety of business problems.
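To make the idea of event timestamps and window aggregation more concrete, here is a minimal plain-Python sketch. It is not Spark code and is not taken from the course; the field names (`key`, `event_time`, `arrival_time`), the sample events, and the helper functions are all hypothetical. It only illustrates how events can be grouped into fixed-size (tumbling) windows by the time they occurred rather than the time they arrived. In Spark Structured Streaming, the comparable operation is a grouping on `window(...)` from `pyspark.sql.functions`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)


def window_start(event_time, size):
    """Return the start of the tumbling window that contains event_time."""
    elapsed = event_time - EPOCH
    # Floor the elapsed time to a whole number of windows.
    return EPOCH + (elapsed // size) * size


def count_by_window(events, size):
    """Count events per (window start, key), bucketing by event time."""
    counts = Counter()
    for event in events:
        counts[(window_start(event["event_time"], size), event["key"])] += 1
    return dict(counts)


# Hypothetical sample events. The second one arrives late (12:10:05),
# but its event time (12:04:50) still places it in the 12:00 window.
events = [
    {"key": "sensor-a",
     "event_time": datetime(2024, 1, 1, 12, 1, 0, tzinfo=UTC),
     "arrival_time": datetime(2024, 1, 1, 12, 1, 2, tzinfo=UTC)},
    {"key": "sensor-a",
     "event_time": datetime(2024, 1, 1, 12, 4, 50, tzinfo=UTC),
     "arrival_time": datetime(2024, 1, 1, 12, 10, 5, tzinfo=UTC)},
    {"key": "sensor-a",
     "event_time": datetime(2024, 1, 1, 12, 7, 30, tzinfo=UTC),
     "arrival_time": datetime(2024, 1, 1, 12, 7, 31, tzinfo=UTC)},
    {"key": "sensor-b",
     "event_time": datetime(2024, 1, 1, 12, 2, 0, tzinfo=UTC),
     "arrival_time": datetime(2024, 1, 1, 12, 2, 1, tzinfo=UTC)},
]

counts = count_by_window(events, timedelta(minutes=5))
```

Bucketing by `event_time` instead of `arrival_time` is what keeps the late event in the window where it actually occurred. A streaming engine additionally has to decide how long to wait for such late events, which this sketch does not model.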

About the author

Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT), and a consultant. He has 15+ years of experience architecting large-scale Business Intelligence, Data Warehousing, and Big Data solutions with companies like Microsoft and some leading investment banks. As an expert in his field, Mohit has often shared his knowledge of Azure, Spark, SQL Server, and Power BI at various public forums and as a corporate trainer. Mohit truly loves to teach and enjoys producing high-quality,...
