The Building Blocks of Hadoop - HDFS, MapReduce, and YARN
Processing billions of records requires a deep understanding of distributed computing. In this course, you'll be introduced to Hadoop, an open-source distributed computing framework that can help you do just that.
What you'll learn
You know how to write Java code and you know what processing you want to perform on your huge dataset. But, can you use the Hadoop distributed framework effectively to get your work done?
This course, The Building Blocks of Hadoop - HDFS, MapReduce, and YARN, gives you a fundamental understanding of the building blocks of Hadoop:
- HDFS for storage
- MapReduce for processing
- YARN for cluster management
First, you'll get a complete overview of Hadoop's architecture.
Next, you'll learn how to set up a pseudo-distributed Hadoop environment and submit and monitor tasks on that environment.
Finally, you'll understand the configuration choices you can make for stability, reliability, and optimized task scheduling on your distributed system.
By the end of this Hadoop tutorial, you'll have a strong understanding of the building blocks you need to use Hadoop effectively.
Table of contents
- Hadoop Install Modes 5m
- Installing Hadoop in Standalone Mode 7m
- Pseudo-distributed Mode: Setting up SSH 5m
- Pseudo-distributed Mode: The JAVA_HOME Environment Variable 3m
- Pseudo-distributed Mode: Configuration Settings 4m
- Pseudo-distributed Mode: Starting HDFS and YARN 5m
- Pseudo-distributed Mode: Monitoring the Cluster 5m
HDFS is the Hadoop Distributed File System, the primary data storage system used by Hadoop applications to scale a single Apache Hadoop cluster to hundreds of nodes.
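To make the storage model concrete, here is a toy sketch (plain Python, not the real HDFS API) of the two ideas behind HDFS scaling: files are split into fixed-size blocks, and each block is replicated across several nodes. The block size, node names, and round-robin placement below are simplifications chosen for illustration; real HDFS uses 128 MB blocks by default and rack-aware replica placement.

```python
# Toy illustration of HDFS concepts -- NOT the real HDFS API.
BLOCK_SIZE = 4    # real HDFS defaults to 128 MB; tiny here for clarity
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into HDFS-style fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin,
    standing in for HDFS's rack-aware placement policy)."""
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hadoop!")
layout = place_replicas(blocks, nodes=["node1", "node2", "node3", "node4"])
# Each block now lives on three different nodes, so losing one node
# never loses data -- the core idea behind HDFS fault tolerance.
```

Because every block exists on multiple nodes, the cluster survives individual node failures, and MapReduce tasks can be scheduled on whichever node already holds a local copy of the data.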
MapReduce is a framework and Java-based programming model for processing large amounts of data. The map procedure filters and sorts the data, and the reduce procedure performs a summary operation.
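The map-then-reduce flow can be sketched in a few lines. This is an in-process Python simulation of the classic word-count job, showing the three stages the framework runs for you: map, shuffle/sort, and reduce. Real Hadoop jobs instead subclass `Mapper` and `Reducer` from `org.apache.hadoop.mapreduce` in Java, and the shuffle happens across the network between nodes.

```python
# In-process simulation of the MapReduce programming model (word count).
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(key, values):
    """Reducer: summarize the grouped values -- here, a sum."""
    return key, sum(values)

lines = ["hadoop stores data", "hadoop processes data"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs))
# counts == {"data": 2, "hadoop": 2, "processes": 1, "stores": 1}
```

The key point is that the mapper and reducer only ever see one record or one key group at a time, which is what lets Hadoop run thousands of copies of them in parallel across a cluster.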
YARN stands for Yet Another Resource Negotiator. It acts as a large-scale, distributed operating system for big data applications, allowing data stored in HDFS to be processed by multiple data processing engines.
This course will introduce you to Hadoop and its basic building blocks. Topics covered include:
- An introduction to Hadoop
- Installing Hadoop
- Storing data with HDFS
- Processing data with MapReduce
- Scheduling and managing tasks with YARN
- Much more
Anyone who wants to learn Hadoop and its building blocks of HDFS, MapReduce, and YARN should take this tutorial! If you need help processing vast numbers of records and want to understand distributed computing, this course is for you.
If you know how to write Java code and you know what processing you want to perform on your huge dataset, you're ready for this course. No prior experience with Hadoop is required.