Course info
Level: Intermediate
Updated: Jun 13, 2018
Duration: 1h 39m
Description

OK, so you are using Azure Data Lakes, and you think it's great. You just wish you could improve the performance of your U-SQL queries. Why does that query always read your entire data set? Why does this query take forever to complete? Like anything else in the Big Data world, your Azure Data Lake has to be structured around your data. This course, Improving Azure Data Lake Performance, will show you how to put the right structure in place. Then watch the magic start to happen! First, you'll see how an Azure Data Lake works behind the scenes: how it handles different types of data and how the storage of that data can be optimized. Next, you'll see how to optimize unstructured data. Finally, you'll learn how structuring your data opens up a world of possibilities, including horizontal and vertical partitioning. This is where the real power of the Azure Data Lake comes to light! Horizontal partitioning allows you to defer a lot of control to the Data Lake, whereas vertical partitioning allows you, the developer, to take total control of how your data is partitioned and distributed within the Data Lake. When you're finished with this course, you'll understand how to better optimize your jobs and save some cash. Software required: Visual Studio Community Edition 2017 with the Azure Data Lake and Stream Analytics Tools installed.

About the author

Mike loves to mess around with data and programming problems, the bigger the better. He’s worked with a variety of companies, helping to build and improve systems of all shapes and sizes.

Section Introduction Transcripts

Course Overview
Hello there! My name is Mike McQuillan, and this course is all about Improving Azure Data Lake Performance. I'm a data specialist consulting with organizations large and small, and I'm here to help you improve your big data queries in Azure. You might be using an Azure Data Lake right now, but are you using it efficiently? Maybe your queries are costing you too much money. Maybe they are taking too long to execute. They might even be timing out right now. If any of this sounds familiar to you, you're in the right place. The course goes into detail about how an Azure Data Lake works and the various ways in which data can be structured within the Azure Data Lake. Correctly structuring your data can lead to big performance gains. We'll cover some major topics, including how data is stored and processed within the Azure Data Lake. We'll see how to improve the performance of your U-SQL jobs and how to organize data within your Data Lake, from files to databases to indexes. There's in-depth coverage of the two supported partition schemes: horizontal partitioning and vertical partitioning. By the time you reach the end of this course, you'll know the common pitfalls of an Azure Data Lake and the techniques you need to use to avoid them. You'll have learned that with a bit of careful thought, file-based queries can be optimized to greatly reduce the amount of data your queries need to read. You'll also be well versed in horizontal partitioning, including the distribution schemes it supports, and vertical partitioning, which allows developers to take total control of the Azure Data Lake. Before viewing this course, you should already have some experience with general database concepts and, more importantly, Data Lakes. It's not a big problem if you don't, though. Just watch the introduction to the Azure Data Lake and U-SQL course on Pluralsight first. I look forward to guiding you to better Data Lake performance. Come and learn with me on the Improving Azure Data Lake Performance course at Pluralsight.