Understanding Apache Flink

Apache Flink is a new, fourth-generation Big Data processing tool that is changing the landscape of data processing technologies. This course teaches the basic skills you need to develop applications using Apache Flink.
Course info
Rating
(42)
Level
Beginner
Updated
Jun 16, 2017
Duration
1h 46m
Table of contents
Description
Description

Year after year, the world generates more and more data, and to process it we need better, more sophisticated tools. Apache Flink is a new, next-generation Big Data processing tool capable of complex stream and batch data processing. In this course, Understanding Apache Flink, you'll learn how to write simple and complex data processing applications using Apache Flink. First, you'll get an overview of how Apache Flink works under the hood and what it brings to the world of Big Data. Next, you'll learn the ins and outs of processing data with Apache Flink. Finally, you'll explore how to apply Apache Flink in practice. When you're finished with this course, you'll have a solid understanding of how to write applications in Apache Flink and a good foundation for learning more advanced Apache Flink features.

About the author

Ivan is a software development engineer at Amazon on the CloudWatch team. Before joining Amazon, he worked at Samsung R&D. In his free time, he contributes to Apache Flink.

More from the author
AWS DynamoDB Deep Dive
Intermediate
3h 21m
Oct 2, 2017
Section Introduction Transcripts

Course Overview
Hi everyone. My name is Ivan Mushketyk, and welcome to my course, Understanding Apache Flink. I am a principal software development engineer at ViaSat, but before that, I worked at Amazon Web Services and at Samsung R&D. Every day, the world generates more and more data; the amount of data we currently have is enormous, and to process it, we need better, more powerful tools. This course is an introduction to the new fourth-generation big data processing tool called Apache Flink. The course starts with a short overview of big data technologies and describes Flink's unique features. After that, you will learn how to process finite amounts of data using Flink, and then how to process infinite streams of data, which is where Apache Flink really shines. You won't just learn theory. You will see how to apply Flink in practice, using a dataset of movie ratings and a real-time stream of tweets. There are not many prerequisites for this course. You don't need prior exposure to big data, but you should have a good knowledge of Java. By the end of this course, you will have a solid foundation to get started developing your own Flink applications. From here, you should feel comfortable diving into other courses about the big data ecosystem. I hope you'll join me on this journey to learn Apache Flink with the Understanding Apache Flink course, at Pluralsight.

Processing Finite Amounts of Data
Hi, and welcome to this module. Previously, we talked about big data and how Flink is changing the landscape of big data processing. In this module, we'll focus on how to start using Flink to process finite amounts of data. Even though Flink's standout feature is stream processing, we will start with batch processing for two reasons. First, I think the batch processing API is easier to learn. In many ways, it's similar to SQL and relational databases, which a lot of developers are familiar with. Second, many concepts in Flink are shared between batch and stream processing, so the skills you learn here will be useful in either case. In this module, you will see how to apply Flink to implement real-world batch processing applications. The module starts with some simple examples and gradually builds up to more advanced batch processing features. To fully utilize Flink's capabilities, we need to know how it works under the hood. Later in this module, I will talk about this, and you will see how to use this knowledge when developing your applications. For simplicity, most of the examples in this module will be executed in an IDE, but at the end of the module, I will also show you how to run one of the algorithms on a local Flink cluster. During this module, you will work as a big data engineer for a company called Globomantics that specializes in data processing. Globomantics has acquired a dataset of movie ratings from thousands of users, and it now needs to extract useful information from this vast amount of data. You are in charge of this project, and you have decided to use Apache Flink, since it's the latest-generation big data tool. Let's get started.
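To give a flavor of the SQL-like batch API described above, here is a minimal sketch of a Flink batch job in the spirit of the module's movie-ratings scenario. It is an illustration under stated assumptions, not the course's actual code: it uses Flink's 1.x DataSet API (assumed to be on the classpath), and the file name "ratings.csv" and its (movieId, rating) layout are hypothetical.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MovieRatingsJob {
    public static void main(String[] args) throws Exception {
        // Entry point for batch programs; picks a local or cluster environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read (movieId, rating) pairs from a hypothetical CSV file
        DataSet<Tuple2<Integer, Double>> ratings = env
                .readCsvFile("ratings.csv")
                .types(Integer.class, Double.class);

        // Group by movie id (tuple field 0) and sum the ratings (field 1),
        // much like GROUP BY ... SUM(...) in SQL
        ratings.groupBy(0)
               .sum(1)
               .print(); // for batch jobs, print() also triggers execution
    }
}
```

Running this in an IDE, as the module does, starts an embedded local Flink environment automatically; no cluster setup is needed.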

Processing Infinite Streams of Data
Hi, and welcome to the stream processing module of this course. Before, we were writing batch applications where all the data we needed to process was available in advance, but in many cases, new data constantly arrives, and we need to process an infinite stream of data rather than a finite chunk. To tackle this type of data, we will learn about the most innovative Flink feature: stream processing. While stream processing is very different from batch processing, many of the concepts we've learned so far carry over, so we'll build new skills on top of existing knowledge. As in the previous module, we will apply the new skills in practice and develop a number of stream processing applications. So what do we need stream processing for? Let's take a look at a few examples. One example is analyzing stock prices. Stock prices are always changing, and the stream of new prices never ends. If we want to make real-time decisions, as in algorithmic trading, we need to process them here and now, and the faster we can do this, the more valuable the result of the analysis may be. Another example is fraud detection. A financial institution would like to process financial transactions in real time and immediately react to any suspicious activity, rather than wait to accumulate all possible data and only then act on it. Log analysis is another example. Servers in data centers constantly produce new log data, and we may want to analyze it in real time. There are many other examples of stream processing, and it's not surprising that the developers of Flink decided to focus their product on stream processing. It's a vast topic. Just as in the previous module, you will continue to help Globomantics make sense of big data. But now the company is expanding, and it wants to analyze a stream of tweets in real time and extract valuable information from it. This task is a perfect match for stream processing with Flink.
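To make the tweet-analysis scenario concrete, here is a minimal sketch of a Flink streaming job that counts words across incoming tweets. It is a hypothetical illustration, not the course's actual code: it assumes Flink's DataStream API is on the classpath, and `fromElements` stands in for a real tweet source connector, which the course would configure separately.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TweetWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // fromElements stands in for a real streaming source of tweets
        env.fromElements("flink makes streaming simple", "streaming with flink")
           .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
               @Override
               public void flatMap(String tweet, Collector<Tuple2<String, Integer>> out) {
                   // Split each tweet into words and emit (word, 1) pairs
                   for (String word : tweet.toLowerCase().split("\\W+")) {
                       out.collect(Tuple2.of(word, 1));
                   }
               }
           })
           .keyBy(value -> value.f0) // partition the unbounded stream by word
           .sum(1)                   // emit an updated running count per word
           .print();

        // Unlike batch jobs, streaming jobs need an explicit execute() call
        env.execute("Tweet word count");
    }
}
```

Note how the structure mirrors the batch job from the previous module: the same group-and-aggregate idea, but `keyBy`/`sum` produce a continuously updating result instead of a single final answer, which is the core difference this module explores.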