Applying the Lambda Architecture with Spark, Kafka, and Cassandra

by Ahmad Alkilani

This course introduces how to build robust, scalable, real-time big data systems using a variety of Apache Spark's APIs, including the Streaming, DataFrame, SQL, and DataSources APIs, integrated with Apache Kafka, HDFS, and Apache Cassandra.

Try for free

Get this course plus top-rated picks in tech skills and other popular topics.

$29.00

per month after 10-day trial

Your 10-day Standard free trial includes

Expert-led courses

Keep up with the pace of change with thousands of expert-led, in-depth courses.

For teams

Give up to 50 users access to our full library, including this course, free for 30 days

Course info

Rating: (120)
Level: Beginner
Updated: Jun 14, 2024
Duration: 6h 4m

What you'll learn

This course aims to look past the hype in the big data world and focus on what really works for building robust, highly scalable batch and real-time systems. In this course, Applying the Lambda Architecture with Spark, Kafka, and Cassandra, you'll string together technologies that fit well together and were designed by companies ranging from those with the most demanding data requirements (such as Facebook, Twitter, and LinkedIn) to those leading the design of data processing frameworks such as Apache Spark, which plays an integral role throughout this course. You'll examine each component and work through the architectural details that make it a good fit for a system based on the Lambda Architecture. You'll then build a full application from scratch, starting with a small application that simulates the production of data in a stream and going all the way to addressing global state, non-associative calculations, application upgrades and restarts, and finally presenting real-time and batch views in Cassandra. When you're finished with this course, you'll be ready to hit the ground running with these technologies to build better data systems than ever.

Table of contents

Course Overview

2mins

Course Overview 2m

A Modern Big Data Architecture

51mins

Batch Layer with Apache Spark

64mins

Introduction to Spark 7m
Spark Components and Scheduling 7m
Getting Started: Log Producer Demo 11m
First Spark Job: Demo 5m
Aggregations with RDD API: Demo 10m
Aggregations with DataFrame API: Demo 9m
Saving to HDFS and Executing on YARN: Demo 9m
Querying Data with Spark DataSources API: Demo 4m
Summary 1m

Speed Layer with Spark Streaming

60mins

Intro 1m
Spark Streaming Fundamentals 6m
DStream vs. RDD 2m
Using transform and foreachRDD 2m
SparkSQL in Streaming Applications 1m
Streaming Receiver Model 4m
Creating Spark Streaming Application: Demo 9m
Streaming Log Producer: Running with Zeppelin: Demo 6m
Refactoring Streaming Application: Demo 9m
Spark Streaming with SparkSQL Aggregations: Demo 9m
Streaming Aggregations with Zeppelin: Demo 9m
Summary 1m

Advanced Streaming Operations

70mins

Intro 1m
Checkpointing in Spark 2m
Window Operations 8m
Visualizing Stateful Transformations 3m
Stateful Transformations: updateStateByKey 7m
State Management Using updateStateByKey: Demo 10m
Stateful Transformations: mapWithState 6m
Better State Management Using mapWithState: Demo 8m
Stateful Cardinality Estimation: Unique Counts Using HyperLogLog 3m
Approximating Unique Visitors Using HLL: Demo 14m
Evaluating Approximation Performance with Zeppelin: Demo 7m
Summary 1m

Streaming Ingest with Kafka and Spark Streaming

82mins

Introduction to Kafka 5m
Kafka Broker 6m
Kafka Producer 3m
Partition Assignment and Consumers 7m
Messaging Models 3m
Kafka Producer: Demo 10m
Spark Streaming Kafka Receiver: Demo 7m
Spark Kafka Receiver API 6m
Spark Kafka Direct Streaming API 3m
Direct Streaming API: Demo 3m
Direct Stream to HDFS 3m
Direct Stream to HDFS: Demo 15m
Streaming Resiliency: Demo 8m
Batch Processing from HDFS with Data Sources API: Demo 3m
Summary 1m

Persisting with Cassandra

32mins

Introduction 1m
Cassandra's Design 3m
Relational Database vs. Cassandra 1m
Spark Cassandra Connector 1m
Reading Using DataFrames and Spark SQL 2m
Creating Keyspace and Cassandra Tables: Demo 6m
Data Modeling with Cassandra: Part 1 3m
Data Modeling with Cassandra: Part 2 3m
Composite Keys in Cassandra 2m
Modeling Time Series Data with Cassandra 2m
Spark Streaming Realtime Cassandra Views: Demo 7m
Spark Batch Cassandra Views: Demo 1m
Summary 1m

About the author

Ahmad Alkilani

Ahmad Alkilani is a Data Architect specializing in the implementation of high-performance compute platforms, data warehouses, and BI systems. He is the author of ForestFlow, an LFAI policy-based machine learning model server. Ahmad has over 16 years of broad IT experience, from traditional ODBMS to large-scale big data systems and NoSQL databases. He enjoys speaking at various user groups and national conferences. When not tinkering with new code or consulting on projects, Ahmad takes pleasure in spen...
