The Building Blocks of Hadoop - HDFS, MapReduce, and YARN

by Janani Ravi

Processing billions of records requires a deep understanding of distributed computing. In this course, you'll be introduced to Hadoop, an open-source distributed computing framework that can help you do just that.

What you'll learn

You know how to write Java code, and you know what processing you want to perform on your huge dataset. But can you use the Hadoop distributed framework effectively to get your work done?

This course, The Building Blocks of Hadoop - HDFS, MapReduce, and YARN, gives you a fundamental understanding of the building blocks of Hadoop:

  • HDFS for storage
  • MapReduce for processing
  • YARN for cluster management
to help you bridge the gap between programming and big data analysis.

First, you'll get a complete architecture overview for Hadoop.
Next, you'll learn how to set up a pseudo-distributed Hadoop environment and submit and monitor tasks on that environment.
Finally, you'll understand the configuration choices you can make for stability, reliability, and optimized task scheduling on your distributed system.

By the end of this Hadoop tutorial, you'll have a strong understanding of the building blocks you need to use Hadoop effectively.

Course FAQ

What is HDFS?

HDFS is the Hadoop Distributed File System, the primary data storage system used by Hadoop applications to scale a single Apache Hadoop cluster to hundreds of nodes.
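To make the storage idea concrete, here is a minimal Java sketch of the arithmetic behind splitting a file into blocks and replicating each block across nodes. It does not call the Hadoop API. The class name `HdfsBlockSketch` and its methods are hypothetical, and the 128 MB block size and replication factor of 3 are assumptions based on Hadoop's usual defaults; a real cluster may be configured differently.

```java
public class HdfsBlockSketch {

    // Assumption: 128 MB blocks, the usual default in recent Hadoop versions.
    static final long BLOCK_SIZE_BYTES = 128L * 1024 * 1024;

    // Assumption: each block is stored on 3 nodes, the usual default replication factor.
    static final int REPLICATION = 3;

    // Number of blocks needed to hold the file (the last block may be partially filled).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES;
    }

    // Total block copies spread across the cluster.
    static long replicaCount(long fileSizeBytes) {
        return blockCount(fileSizeBytes) * REPLICATION;
    }

    // Raw disk space consumed once every block has been replicated.
    static long rawStorageBytes(long fileSizeBytes) {
        return fileSizeBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println("blocks:   " + blockCount(oneGiB));      // blocks:   8
        System.out.println("replicas: " + replicaCount(oneGiB));    // replicas: 24
        System.out.println("raw GiB:  " + rawStorageBytes(oneGiB) / oneGiB); // raw GiB:  3
    }
}
```

Because the copies of each block can live on different machines, adding nodes adds both capacity and places to read data from, which is what lets a cluster grow to hundreds of nodes.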

What is MapReduce?

MapReduce is a framework and Java-based programming model used for processing large amounts of data. The map step filters and sorts the data, and the reduce step performs a summary operation.
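The classic illustration of this model is a word count. The sketch below imitates the map, group, and reduce steps in plain Java on a single machine; it deliberately avoids the Hadoop API, so the class `WordCountSketch` and its method names are hypothetical and chosen only for illustration. In this sketch the grouping and sorting of keys is written as its own `shuffle` step between map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every non-empty word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group emitted values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: summarize all values for one key, here by summing them.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in sequence over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(run(List.of("big data big cluster", "data node")));
    }
}
```

On a real cluster, many map and reduce tasks would run in parallel on different nodes, each handling a slice of the input; the sketch only shows the shape of the computation.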

What is Hadoop YARN?

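YARN stands for Yet Another Resource Negotiator. It is Hadoop's cluster resource management and job scheduling layer, often described as a large-scale distributed operating system for big data applications. It allocates cluster resources so that data processing engines can run against the data stored in HDFS.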

What will I learn in this Hadoop tutorial?

This course will introduce you to Hadoop and its basic building blocks. Topics covered include:

  • An introduction to Hadoop
  • Installing Hadoop
  • Storing data with HDFS
  • Processing data with MapReduce
  • Scheduling and managing tasks with YARN
  • Much more

Who should take this course?

Anyone who wants to learn Hadoop and its building blocks of HDFS, MapReduce, and YARN should take this tutorial! If you need help processing vast numbers of records and want to understand distributed computing, this course is for you.

What prerequisites are there to this course?

If you know how to write Java code and you know what processing you want to perform on your huge dataset, you should be ready for this course. No prior experience with Hadoop is required.

About the author

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
