Processing Streaming Data Using Apache Flink

Apache Flink is built on the concept of stream-first architecture where the stream is the source of truth.
Course info
Level: Intermediate
Updated: Dec 7, 2020
Duration: 3h 20m
Table of contents
Course Overview
Getting Started with a Standalone Cluster in Flink
Integrating Flink with Apache Kafka
Processing High-velocity Streaming Data Using Windowing Operations
Processing Streaming Data Using Join Operations
Description

Flink is a stateful, fault-tolerant, and large-scale system that works with bounded and unbounded datasets using the same underlying stream-first architecture. In this course, Processing Streaming Data Using Apache Flink, you will integrate your Flink applications with real-time Twitter feeds to perform analysis on high-velocity streams.

First, you’ll see how you can set up a standalone Flink cluster using virtual machines on a cloud platform. Next, you will install and work with the Apache Kafka reliable messaging service.
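Once the cluster is up, Flink applications are packaged as JARs and submitted to the JobManager, for example with the flink run command-line client or from the web dashboard. Below is a minimal sketch, in Java, of the kind of streaming job you might submit; the socket source, port, and job name are placeholder assumptions rather than the course's actual code.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UppercaseJob {
    public static void main(String[] args) throws Exception {
        // On a cluster, this picks up the JobManager configured for the deployment.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: read lines of text from a local socket.
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // A trivial transformation so the job has something to execute and print.
        lines.map(line -> line.toUpperCase()).print();

        // The job name shows up in the Flink web dashboard while the job runs.
        env.execute("Uppercase demo job");
    }
}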

Finally, you will perform a number of transformation operations on Twitter streams, including windowing and join operations.
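As a flavor of what such a windowing operation can look like in Flink's DataStream API, here is a hedged sketch in Java; the hashtag-counting scenario, the countPerHashtag method, and the 30-second window size are illustrative assumptions, not the course's exact code.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HashtagCounts {
    // Counts occurrences per hashtag over 30-second processing-time windows.
    // Input elements are assumed to be (hashtag, 1) pairs extracted upstream.
    public static DataStream<Tuple2<String, Integer>> countPerHashtag(
            DataStream<Tuple2<String, Integer>> taggedTweets) {
        return taggedTweets
                .keyBy(value -> value.f0)                                    // key by hashtag
                .window(TumblingProcessingTimeWindows.of(Time.seconds(30))) // 30s tumbling windows
                .sum(1);                                                     // sum the count field per window
    }
}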

When you are finished with this course, you will have the skills and knowledge to work with high-volume and high-velocity data using Flink, and to integrate it with Apache Kafka to process streaming data.
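For reference, the Flink-Kafka integration the course describes typically reads from a topic through the flink-connector-kafka dependency; the sketch below assumes a FlinkKafkaConsumer, a topic named "tweets", and a local broker, all placeholders rather than the course's exact setup.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToFlink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Where the Kafka brokers live and which consumer group this job joins.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-tweet-reader");

        // Each Kafka record's value is deserialized as a plain UTF-8 string.
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("tweets", new SimpleStringSchema(), props);

        DataStream<String> tweets = env.addSource(consumer);
        tweets.print();  // windowing and join operators would be chained here instead

        env.execute("Read tweets from Kafka");
    }
}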

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Summarizing Data and Deducing Probabilities (Intermediate, 2h 50m, Jul 8, 2021)
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
Hi. My name is Janani Ravi, and welcome to this course on Processing Streaming Data Using Apache Flink. A little about myself: I have a master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

Flink is a stateful, fault-tolerant, and large-scale system that works with bounded and unbounded datasets using the same underlying stream-first architecture. In this course, you will integrate your Flink applications with real-time Twitter feeds to perform analysis on high-velocity streams.

First, you'll see how you can set up a standalone Flink cluster using virtual machines on a cloud platform. You will configure password-less SSH to allow cluster machines to communicate, and install and set up Flink on every machine. You will then run and monitor Flink streaming applications on this cluster.

Next, you will install and work with the Apache Kafka reliable messaging service. You'll understand how Kafka publishers, consumers, and topics work, and you'll integrate your Flink streaming application to read and write data to topics in Kafka. You'll also set up a Twitter developer account, which you will use to stream Twitter messages to a Kafka topic, which can then be processed by a streaming application.

Finally, you'll perform a number of transformation operations on Twitter streams, including windowing and join operations. We'll round this course off by exploring how you can perform unit testing and end-to-end testing of your Flink pipelines.

When you're finished with this course, you will have the skills and knowledge to work with high-volume and high-velocity data using Apache Flink and integrate with Apache Kafka to process streaming data.
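The transcript also mentions unit testing of Flink pipelines. As a minimal sketch of that idea, assuming a JUnit 4 setup, a stateless user-defined function can be exercised directly without starting a cluster; the HashtagExtractor function and the sample tweet below are hypothetical, not the course's code.

import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;
import org.junit.Test;

public class HashtagExtractorTest {

    // Hypothetical function under test: emits every #hashtag found in a tweet.
    static class HashtagExtractor implements FlatMapFunction<String, String> {
        @Override
        public void flatMap(String tweet, Collector<String> out) {
            for (String word : tweet.split("\\s+")) {
                if (word.startsWith("#")) {
                    out.collect(word);
                }
            }
        }
    }

    @Test
    public void emitsEachHashtag() throws Exception {
        List<String> emitted = new ArrayList<>();

        // A throwaway Collector that records emitted values instead of forwarding them.
        Collector<String> collector = new Collector<String>() {
            @Override
            public void collect(String record) { emitted.add(record); }
            @Override
            public void close() { }
        };

        new HashtagExtractor().flatMap("learning #flink with #kafka", collector);

        assertEquals(List.of("#flink", "#kafka"), emitted);
    }
}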