Applying the Lambda Architecture with Spark, Kafka, and Cassandra
This course shows how to build robust, scalable, real-time big data systems using a variety of Apache Spark's APIs, including the Streaming, DataFrame, SQL, and DataSources APIs, integrated with Apache Kafka, HDFS, and Apache Cassandra.
What you'll learn
This course aims to get beyond the hype in the big data world and focus on what really works for building robust, highly scalable batch and real-time systems. In this course, Applying the Lambda Architecture with Spark, Kafka, and Cassandra, you'll string together technologies that fit well together and were designed by organizations ranging from companies with some of the most demanding data requirements (such as Facebook, Twitter, and LinkedIn) to those leading the way in the design of data processing frameworks like Apache Spark, which plays an integral role throughout this course. You'll look at each component individually and work out the details of its architecture that make it a good fit for a system based on the Lambda Architecture. You'll then build out a full application from scratch, starting with a small application that simulates the production of data in a stream and going all the way to addressing global state, non-associative calculations, application upgrades and restarts, and finally presenting real-time and batch views in Cassandra. When you're finished with this course, you'll be ready to hit the ground running with these technologies and build better data systems than ever.
Table of contents
- Defining the Lambda Architecture 5m
- What Are We Building? 2m
- Setting up Your Environment: Demo 5m
- Tools We'll Need: Demo 4m
- Installing the Course VM: Demo 5m
- Fast Track to Scala: Basics 6m
- Fast Track to Scala: Language Features 12m
- Fast Track to Scala: Collections 4m
- Spark with Zeppelin: Demo 6m
- Summary 2m
- Introduction to Spark 7m
- Spark Components and Scheduling 7m
- Getting Started: Log Producer Demo 11m
- First Spark Job: Demo 5m
- Aggregations with RDD API: Demo 10m
- Aggregations with DataFrame API: Demo 9m
- Saving to HDFS and Executing on YARN: Demo 9m
- Querying Data with Spark DataSources API: Demo 4m
- Summary 1m
- Intro 1m
- Spark Streaming Fundamentals 6m
- DStream vs. RDD 2m
- Using transform and foreachRDD 2m
- SparkSQL in Streaming Applications 1m
- Streaming Receiver Model 4m
- Creating Spark Streaming Application: Demo 9m
- Streaming Log Producer: Running with Zeppelin: Demo 6m
- Refactoring Streaming Application: Demo 9m
- Spark Streaming with SparkSQL Aggregations: Demo 9m
- Streaming Aggregations with Zeppelin: Demo 9m
- Summary 1m
- Intro 1m
- Checkpointing in Spark 2m
- Window Operations 8m
- Visualizing Stateful Transformations 3m
- Stateful Transformations: updateStateByKey 7m
- State Management Using updateStateByKey: Demo 10m
- Stateful Transformations: mapWithState 6m
- Better State Management Using mapWithState: Demo 8m
- Stateful Cardinality Estimation: Unique Counts Using HyperLogLog 3m
- Approximating Unique Visitors Using HLL: Demo 14m
- Evaluating Approximation Performance with Zeppelin: Demo 7m
- Summary 1m
- Introduction to Kafka 5m
- Kafka Broker 6m
- Kafka Producer 3m
- Partition Assignment and Consumers 7m
- Messaging Models 3m
- Kafka Producer: Demo 10m
- Spark Streaming Kafka Receiver: Demo 7m
- Spark Kafka Receiver API 6m
- Spark Kafka Direct Streaming API 3m
- Direct Streaming API: Demo 3m
- Direct Stream to HDFS 3m
- Direct Stream to HDFS: Demo 15m
- Streaming Resiliency: Demo 8m
- Batch Processing from HDFS with Data Sources API: Demo 3m
- Summary 1m
- Introduction 1m
- Cassandra's Design 3m
- Relational Database vs. Cassandra 1m
- Spark Cassandra Connector 1m
- Reading Using DataFrames and Spark SQL 2m
- Creating Keyspace and Cassandra Tables: Demo 6m
- Data Modeling with Cassandra: Part 1 3m
- Data Modeling with Cassandra: Part 2 3m
- Composite Keys in Cassandra 2m
- Modeling Time Series Data with Cassandra 2m
- Spark Streaming Realtime Cassandra Views: Demo 7m
- Spark Batch Cassandra Views: Demo 1m
- Summary 1m