
Deploy and Configure a Single-Node Hadoop Cluster

Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM/container image, which lowers the barrier to entry for those wishing to get started with Hadoop. In this hands-on lab, you will have the opportunity to deploy a single-node Hadoop cluster in a pseudo-distributed configuration. Doing so walks you through the deployment and configuration of each individual Hadoop component and prepares you for the move to a multi-node cluster, where Hadoop services are separated and clustered. In this learning activity, you will perform the following:

  • Installing Java
  • Deploying Hadoop from an archive file
  • Configuring Hadoop's `JAVA_HOME`
  • Configuring the default filesystem for Hadoop
  • Configuring HDFS replication
  • Setting up passwordless SSH
  • Formatting the Hadoop Distributed File System (HDFS)
  • Starting Hadoop
  • Creating files and directories in Hadoop
  • Examining a text file with a MapReduce job


Path Info

Level: Beginner
Duration: 2h 0m
Published: Jan 14, 2019


Table of Contents

  1. Challenge

    Install Java

    Log into Node 1 as cloud_user and install the java-19-amazon-corretto-devel package:

    sudo yum -y install java-19-amazon-corretto-devel
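
    To optionally confirm that Java installed correctly and is on the PATH, check the reported version:

    java -version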
    
  2. Challenge

    Deploy Hadoop

    From the cloud_user home directory, download Hadoop 3.3.4 from your desired Apache mirror:

    curl -O http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
    

    Unpack the archive in place:

    tar -xzf hadoop-3.3.4.tar.gz
    

    Delete the archive file:

    rm hadoop-3.3.4.tar.gz
    

    Rename the installation directory:

    mv hadoop-3.3.4/ hadoop/
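
    Optionally, confirm the layout of the renamed installation directory (you should see bin, etc, sbin, and share, among others):

    ls hadoop/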
    
  3. Challenge

    Configure JAVA_HOME

    From /home/cloud_user/hadoop, set JAVA_HOME in etc/hadoop/hadoop-env.sh by changing the following line:

    export JAVA_HOME=${JAVA_HOME}
    

    Change it to this:

    export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/
    

    Save and close the file.
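
    If you prefer to make the change from the command line instead of an editor, a sed one-liner such as the following (run from /home/cloud_user/hadoop, and assuming the line reads exactly as shown above) accomplishes the same edit:

    sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/|' etc/hadoop/hadoop-env.sh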

  4. Challenge

    Configure Core Hadoop

    Set the default filesystem to hdfs on localhost in /home/cloud_user/hadoop/etc/hadoop/core-site.xml by changing the following lines:

    <configuration>
    </configuration>
    

    Change them to this:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    

    Save and close the file.
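
    To optionally confirm that Hadoop picks up the new setting, query it from /home/cloud_user/hadoop with:

    bin/hdfs getconf -confKey fs.defaultFS

    It should print hdfs://localhost:9000.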

  5. Challenge

    Configure HDFS

    Set the default block replication to 1 in /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml by changing the following lines:

    <configuration>
    </configuration>
    

    Change them to this:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
    

    Save and close the file.
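
    As with the previous step, you can optionally verify the value from /home/cloud_user/hadoop with:

    bin/hdfs getconf -confKey dfs.replication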

  6. Challenge

    Set Up Passwordless SSH Access to localhost

    As cloud_user, generate a public/private RSA key pair with:

    ssh-keygen
    

    The default option for each prompt will suffice.

    Add your newly generated public key to your authorized keys list with:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
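
    Depending on your umask, you may also need to tighten permissions on the authorized keys file:

    chmod 600 ~/.ssh/authorized_keys

    You can then verify passwordless access (answer yes to the host key prompt on the first connection):

    ssh localhost exit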
    
  7. Challenge

    Format the Filesystem

    From /home/cloud_user/hadoop/, format the DFS with:

    bin/hdfs namenode -format
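
    The command prints a fair amount of output; toward the end you should see a message indicating that the storage directory has been successfully formatted. You can also check the exit status with:

    echo $?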
    
  8. Challenge

    Start Hadoop

    Start the NameNode and DataNode daemons from /home/cloud_user/hadoop with:

    sbin/start-dfs.sh
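
    Optionally, confirm that the daemons came up by listing the running Java processes with:

    jps

    You should see NameNode, DataNode, and SecondaryNameNode entries.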
    
  9. Challenge

    Download and Copy the Latin Text to Hadoop

    From /home/cloud_user/hadoop, download the latin.txt file with:

    curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt
    

    From /home/cloud_user/hadoop, create the /user and /user/cloud_user directories in Hadoop with:

    bin/hdfs dfs -mkdir -p /user/cloud_user
    

    From /home/cloud_user/hadoop/, copy the latin.txt file to Hadoop at /user/cloud_user/latin with:

    bin/hdfs dfs -put latin.txt latin
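
    Optionally, verify that the file landed in HDFS with:

    bin/hdfs dfs -ls /user/cloud_user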
    
  10. Challenge

    Examine the latin.txt Text with MapReduce

    From /home/cloud_user/hadoop/, use the hadoop-mapreduce-examples-*.jar to calculate the average length of the words in the /user/cloud_user/latin file and save the job output to /user/cloud_user/latin_wordmean_output in Hadoop with:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordmean latin latin_wordmean_output
    

    From /home/cloud_user/hadoop/, examine your wordmean job output files with:

    bin/hdfs dfs -cat latin_wordmean_output/*
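
    For reference, a successful MapReduce job writes a _SUCCESS marker alongside its part-r-* result files, which you can see by listing the output directory:

    bin/hdfs dfs -ls latin_wordmean_output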
    

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
