The Building Blocks of Hadoop - HDFS, MapReduce, and YARN

by Janani Ravi

Processing billions of records requires a deep understanding of distributed computing. In this course, you'll get introduced to Hadoop, an open-source distributed computing framework that can help you do just that.

Course info

Rating

(445)

Level

Beginner

Updated

Jun 20, 2024

Duration

2h 18m

What you'll learn

You know how to write Java code, and you know what processing you want to perform on your huge dataset. But can you use the Hadoop distributed framework effectively to get your work done?

This course, The Building Blocks of Hadoop - HDFS, MapReduce, and YARN, gives you a fundamental understanding of the building blocks of Hadoop:

HDFS for storage
MapReduce for processing
YARN for cluster management

to help you bridge the gap between programming and big data analysis.

First, you'll get a complete architecture overview for Hadoop.
Next, you'll learn how to set up a pseudo-distributed Hadoop environment and submit and monitor tasks on that environment.
Finally, you'll understand the configuration choices you can make for stability, reliability, and optimized task scheduling on your distributed system.

By the end of this Hadoop tutorial, you'll have a strong understanding of the building blocks you need to use Hadoop effectively.

Course Overview

1min

Course Overview 2m

Introducing Hadoop

20mins

Installing Hadoop

33mins

Hadoop Install Modes 5m
Installing Hadoop in Standalone Mode 7m
Pseudo-distributed Mode: Setting up SSH 5m
Pseudo-distributed Mode: The JAVA_HOME Environment Variable 3m
Pseudo-distributed Mode: Configuration Settings 4m
Pseudo-distributed Mode: Starting HDFS and YARN 5m
Pseudo-distributed Mode: Monitoring the Cluster 5m

Storing Data with HDFS

34mins

The Name Node and Data Nodes 5m
Storing and Reading Files from HDFS 5m
Introduction to HDFS Commands 5m
Copying Files to and from Hadoop 5m
Fault Tolerance with Replication 7m
Name Node Failure Management 8m

Processing Data with MapReduce

26mins

The Map and Reduce Phases to Process Data 5m
Data Flow in a MapReduce 5m
Implement MapReduce in Java 4m
Set up the Map, Reduce, and Main Classes 6m
Submit a Jar to Hadoop 4m
Monitor the MapReduce Job Using the Web Interface 3m

Scheduling and Managing Tasks with YARN

22mins

Anatomy of a Job Run in YARN 6m
The First in First out Scheduler 4m
The Capacity Scheduler 4m
The Fair Scheduler 3m
Running Jobs on a Specific Queue 6m

Course FAQ

What is HDFS?

HDFS is the Hadoop Distributed File System, the primary data storage system used by Hadoop applications. It spreads data across the machines in a cluster, which lets a single Apache Hadoop cluster scale to hundreds of nodes.

What is MapReduce?

MapReduce is a framework and Java-based programming model used for processing large amounts of data. The map procedure filters and sorts the data, and the reduce procedure performs a summary operation.
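
To make the two phases concrete, here is a minimal word-count sketch in plain Java. It does not use the Hadoop API and is not taken from the course: the class name `WordCountSketch` and its methods are hypothetical, and the in-memory `shuffle` step only stands in for the grouping that a real framework performs between the map and reduce phases across many machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map -> shuffle -> reduce flow.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step (assumption: done in memory here): group values by key.
    // A TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: summarize all values for one key, here by summing them.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Drive the three steps over a small list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(allPairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job, the map and reduce logic would live in separate classes that the framework runs on different nodes. The sketch keeps everything in one process so the data flow is easy to follow.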

What is Hadoop YARN?

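YARN stands for Yet Another Resource Negotiator. It is Hadoop's resource management and job scheduling layer, sometimes described as a large-scale, distributed operating system for big data applications. It allocates cluster resources so that data processing engines can run against the data stored in HDFS.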

What will I learn in this Hadoop tutorial?

This course will introduce you to Hadoop and its basic building blocks. Topics covered include:

An introduction to Hadoop
Installing Hadoop
Storing data with HDFS
Processing data with MapReduce
Scheduling and managing tasks with YARN
Much more

Who should take this course?

Anyone who wants to learn Hadoop and its building blocks of HDFS, MapReduce, and YARN should take this tutorial! If you need help processing vast numbers of records and want to understand distributed computing, this course is for you.

What prerequisites are there to this course?

If you know how to write Java code and you know what processing you want to perform on your huge dataset, you should be good to go in this course. No prior experience with Hadoop is required.

About the author

Janani Ravi

Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
