Data Management Tools on Databricks

by Kishan Iyer

This course will teach you some of the fundamental techniques to store, manage, and process data using the Databricks platform.

What you'll learn

Data is at the heart of Databricks, and managing it well is a crucial skill for any user of the platform.

In this course, Data Management Tools on Databricks, you’ll learn to load, configure, and access data using the UI, the dbutils library, and a Spark application.

First, you'll explore the Databricks File System (DBFS), how it is implemented as a layer above object storage, and how it can be accessed using the Databricks web UI and the Databricks API. You'll also look into the use of the dbutils library, from its application in file system operations to setting up widgets in a notebook.

Next, you'll delve into the management of structured data in Databricks by creating and then using managed (Delta) tables and external tables, seeing the features available for each, how they are similar, and where they differ.

Finally, you'll turn your attention to consuming and analyzing data from a Spark application built using a notebook, and get a glimpse of the metrics and graphs that are available for tracking executions and resources within Databricks.

When you are finished with this course, you'll have the knowledge and skills in data management and processing on Databricks needed to store and access data securely and efficiently on this platform.

About the author

I have a Master's in Computer Science from Columbia University and have previously worked as a developer and DevOps engineer. I now work at Loonycorn, a studio for high-quality video content. My interests lie in the broad categories of Big Data, ML, and Cloud.
