Processing billions of records requires a deep understanding of distributed computing. In this course, you'll get introduced to Hadoop, an open-source distributed computing framework that can help you do just that.
You know how to write Java code and you know what processing you want to perform on your huge dataset. But, can you use the Hadoop distributed framework effectively to get your work done?
This course, The Building Blocks of Hadoop: HDFS, MapReduce, and YARN, gives you a fundamental understanding of the building blocks of Hadoop:
HDFS for storage
MapReduce for processing
YARN for cluster management
to help you bridge the gap between programming and big data analysis.
First, you'll get a complete architecture overview of Hadoop. Next, you'll learn how to set up a pseudo-distributed Hadoop environment and submit and monitor tasks in that environment. Finally, you'll understand the configuration choices you can make for stability, reliability, and optimized task scheduling on your distributed system.
By the end of this Hadoop tutorial, you'll have gained a strong understanding of the building blocks you need to use Hadoop effectively.
What is HDFS?
HDFS is the Hadoop Distributed File System, the primary data storage system used by Hadoop applications to scale a single Apache Hadoop cluster to hundreds of nodes.
What is MapReduce?
MapReduce is a framework and Java-based programming model used for processing large amounts of data. The map procedure filters and sorts the data, and the reduce procedure performs a summary operation.
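The model is easiest to see with word count, the canonical MapReduce example. The sketch below is plain Java with no Hadoop dependencies, simulating the three phases (map, shuffle/sort, reduce) in a single process; in real Hadoop you would instead extend the `Mapper` and `Reducer` classes from the `org.apache.hadoop.mapreduce` API, and the framework would run the phases across the cluster. Class and method names here are illustrative, not part of any Hadoop API.

```java
import java.util.*;
import java.util.stream.*;

// A minimal, dependency-free sketch of the MapReduce programming
// model: map emits key-value pairs, the framework groups them by
// key (shuffle/sort), and reduce summarizes each group.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");

        // Shuffle/sort: group mapped pairs by key, as the framework
        // would between the map and reduce phases.
        Map<String, List<Integer>> grouped = input.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue,
                                           Collectors.toList())));

        grouped.forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(word, counts)));
    }
}
```

Because map works line by line and reduce works per key, both phases parallelize naturally, which is what lets Hadoop scale the same logic from one machine to hundreds of nodes.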
What is Hadoop YARN?
YARN stands for Yet Another Resource Negotiator. It is Hadoop's cluster resource management layer, often described as a large-scale, distributed operating system for big data applications, and it allows data stored in HDFS to be processed by multiple data processing engines.
What will I learn in this Hadoop tutorial?
This course will introduce you to Hadoop and its basic building blocks. Topics covered include:
An introduction to Hadoop
Storing data with HDFS
Processing data with MapReduce
Scheduling and managing tasks with YARN
Who should take this course?
Anyone who wants to learn Hadoop and its building blocks of HDFS, MapReduce, and YARN should take this tutorial! If you need help processing vast numbers of records and want to understand distributed computing, this course is for you.
What prerequisites are there to this course?
If you know how to write Java code and you know what processing you want to perform on your huge dataset, then you should be good to go in this course. No prior experience with Hadoop is required.
A problem solver at heart, Janani has a Masters degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.