Performance Optimization in Databricks
Boosting the performance of your Databricks workflows while minimizing costs requires deliberate optimization techniques. This course focuses on building skills in cluster configuration, query tuning, data partitioning, and caching.
What you'll learn
Databricks performance optimization ensures efficient, scalable, and cost-effective data processing in Delta tables and Parquet file outputs. Best practices evolve over time, and Databricks is keeping pace with the open-source community.
In this course, Performance Optimization in Databricks, you will dive into the causes of and solutions for performance issues like skewed data and long-running queries.
First, you will see strategies like Z-ordering and the optimize method for compacting data storage.
Next, you will learn techniques for query optimization, including best practices for writing efficient SQL and partitioning strategies to reduce execution time.
Finally, you will investigate cluster configuration for resource allocation, choosing the right cluster size with autoscaling, and leveraging Databricks’ compute options like Photon for enhanced processing speed.
When you are finished with this course, you will have the skills needed to performance-tune processes and enhance your Databricks environment while minimizing cost.
About the author
Microsoft Data Platform MVP specializing in SQL Server, Microsoft Business Intelligence (SSIS, SSAS & SSRS) and Power BI.
More Courses by Thomas