Apache Spark on Databricks
Apache Spark on Databricks is a unified analytics platform that combines the data processing capabilities of Apache Spark with the collaborative, managed environment of Databricks. It enables scalable and efficient big data processing, real-time analytics, and machine learning in a cloud-native architecture. This learning path gives learners the foundational skills to start working with Apache Spark on Databricks for these purposes.
Content in this path
Beginner
You will learn Spark transformations, actions, visualizations, and functions using the Databricks API. You will also learn how to transform and aggregate batch data with built-in and user-defined functions, and how to perform windowing and join operations on batch data.
Intermediate
You will learn how to use Spark abstractions for streaming data and perform transformations on it using the Spark Structured Streaming APIs on Databricks, as well as how to apply windowing, watermarking, and join operations to streaming data for your specific use cases.
Advanced
You will understand and implement important techniques for predictive analytics, such as regression and classification, using the Apache Spark MLlib APIs on Databricks. You will also learn how to implement graph algorithms such as Triangle Count and PageRank and visualize them using the GraphFrames API on Databricks. Finally, you will learn how to optimize the performance of Spark clusters by identifying and mitigating performance issues such as data ingestion problems, and by leveraging the new features offered by Spark 3.
- In Apache Spark on Databricks you will learn the ins and outs of Apache Spark via Databricks. You will learn how to process batch and streaming data, perform windowing and join operations, build predictive analytics with MLlib, execute graph algorithms, and optimize Apache Spark.
- Intermediate programming experience in Python or Scala.
- Beginner experience with the DataFrame API.
- Apache Spark
- Databricks
- Python