The Building Blocks of Hadoop - HDFS, MapReduce, and YARN

Processing billions of records requires a deep understanding of distributed computing. In this course, you'll be introduced to Hadoop, an open-source distributed computing framework that can help you do just that.
Course info
Rating
(175)
Level
Beginner
Updated
November 4, 2016
Duration
2h 18m
Description

You know how to write Java code and you know what processing you want to perform on your huge dataset. But can you use the Hadoop distributed framework effectively to get your work done? This course, The Building Blocks of Hadoop - HDFS, MapReduce, and YARN, gives you a fundamental understanding of the building blocks of Hadoop: HDFS for storage, MapReduce for processing, and YARN for cluster management, to help you bridge the gap between programming and big data analysis. First, you'll get a complete architecture overview for Hadoop. Next, you'll learn how to set up a pseudo-distributed Hadoop environment and submit and monitor tasks on that environment. Finally, you'll understand the configuration choices you can make for stability, reliability, and optimized task scheduling on your distributed system. By the end of this course, you'll have a strong understanding of the building blocks you need to use Hadoop effectively.
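
To make the "MapReduce for processing" idea concrete, here is a minimal sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python running on one machine, not Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for illustration. In a real Hadoop job, the framework distributes the map and reduce steps across the cluster and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the grouping done between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Keeping the map and reduce functions free of shared state is what would allow a framework to run them on different machines; only the grouping step needs to see all pairs for a given key.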

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Transcript

Hi, my name is Janani Ravi and I’m very happy to meet you today. I have a master's in EE from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold 4 patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

This course focuses on the most widely used distributed computing environment today, the Hadoop framework. This course gives you a fundamental understanding of the building blocks of Hadoop: HDFS for storage, the MapReduce programming model for processing, and finally the resource negotiator, YARN, for cluster management.

This will help you:
1. Bridge the gap between plain vanilla programming and big data analysis
2. Learn how to set up a pseudo-distributed Hadoop environment
3. Submit and monitor tasks on that environment using the built-in web interface
4. Understand the configuration choices you can make for stability, reliability and optimized task scheduling on your distributed system.

I hope you’ll join me on this journey with the course The Building Blocks of Hadoop - HDFS, MapReduce, and YARN, at Pluralsight.