- Lab
- A Cloud Guru
Deploy and Configure a Single-Node Hadoop Cluster
Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM or container image, which lowers the barrier to entry for anyone getting started with Hadoop. In this hands-on lab, you will deploy a single-node Hadoop cluster in a pseudo-distributed configuration, in which each Hadoop daemon runs as a separate Java process on the same host. Deploying and configuring each component yourself prepares you for working with a multi-node cluster, where Hadoop services are separated and spread across machines. In this learning activity, you will be performing the following:
* Installing Java
* Deploying Hadoop from an archive file
* Configuring Hadoop's `JAVA_HOME`
* Configuring the default filesystem for Hadoop
* Configuring HDFS replication
* Setting up passwordless SSH
* Formatting the Hadoop Distributed File System (HDFS)
* Starting Hadoop
* Creating files and directories in Hadoop
* Examining a text file with a MapReduce job
-
Challenge
Install Java
Log in to Node 1 as `cloud_user` and install the `java-19-amazon-corretto-devel` package:
sudo yum -y install java-19-amazon-corretto-devel
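Before moving on, it can help to confirm that the JDK tools actually landed on the `PATH`. The following is a minimal POSIX shell sketch; the function name `require_cmds` is hypothetical and not part of the lab or of Hadoop.

```shell
# Hypothetical helper: succeed only if every named command is on PATH,
# printing each missing one to stderr.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on Node 1 after the yum install:
#   require_cmds java javac && java -version
```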
-
Challenge
Deploy Hadoop
From the `cloud_user` home directory, download Hadoop 3.3.4 from your desired Apache mirror, for example:
curl -O http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
Unpack the archive in place:
tar -xzf hadoop-3.3.4.tar.gz
Delete the archive file:
rm hadoop-3.3.4.tar.gz
Rename the installation directory:
mv hadoop-3.3.4/ hadoop/
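Mirrors usually carry only current releases, so a fixed mirror URL can stop working over time. The sketch below wraps the four deploy steps in one function and, as an assumption, builds the URL from the Apache release archive layout (`archive.apache.org/dist/hadoop/common/...`) rather than the lab's mirror. Both function names are hypothetical.

```shell
# Hypothetical helper: build the download URL for a Hadoop release.
# Assumption: the Apache archive keeps old releases under this layout.
hadoop_url() {
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$1" "$1"
}

# Hypothetical helper: the download/unpack/delete/rename steps above.
# The body runs in a subshell so the cd does not leak to the caller.
deploy_hadoop() (
  version="$1"
  cd "$HOME" || exit 1
  # Refuse to run if ~/hadoop exists; mv would nest the new tree inside it.
  if [ -e hadoop ]; then
    echo "$HOME/hadoop already exists" >&2
    exit 1
  fi
  curl -fLO "$(hadoop_url "$version")" || exit 1
  tar -xzf "hadoop-$version.tar.gz" || exit 1
  rm "hadoop-$version.tar.gz"
  mv "hadoop-$version/" hadoop/
)

# Usage: deploy_hadoop 3.3.4
```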
-
Challenge
Configure java_home
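From `/home/cloud_user/hadoop`, set `JAVA_HOME` in `etc/hadoop/hadoop-env.sh` by changing the following line (in Hadoop 3.x copies of this file it may instead appear commented out as `# export JAVA_HOME=`):
export JAVA_HOME=${JAVA_HOME}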
Change it to this:
export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/
Save and close the file.
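If you prefer not to edit the file by hand, appending an `export` line has the same effect, because when `hadoop-env.sh` is sourced a later assignment overrides any earlier one. This is a sketch under that assumption; `set_hadoop_java_home` and `java_home_from_javac` are hypothetical names. On a typical alternatives setup, `readlink -f` may resolve to a directory with an architecture suffix, which should still be the same JDK.

```shell
# Hypothetical helper: pin JAVA_HOME by appending to hadoop-env.sh.
# Works whether the original line is commented out or not.
set_hadoop_java_home() {
  env_file="$1"
  java_home="$2"
  printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
}

# Hypothetical helper: derive a JDK home from a resolved javac path
# by stripping the trailing /bin/javac.
java_home_from_javac() {
  printf '%s\n' "${1%/bin/javac}"
}

# Usage from /home/cloud_user/hadoop:
#   set_hadoop_java_home etc/hadoop/hadoop-env.sh \
#     "$(java_home_from_javac "$(readlink -f "$(command -v javac)")")"
```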
-
Challenge
Configure Core Hadoop
Set the default filesystem to `hdfs` on `localhost` in `/home/cloud_user/hadoop/etc/hadoop/core-site.xml` by changing the following lines:
<configuration>
</configuration>
Change them to this:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Save and close the file.
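The same change can be scripted with a heredoc. This minimal sketch assumes you are happy to overwrite the whole file, including the license comment header that ships with it; `write_core_site` is a hypothetical name.

```shell
# Hypothetical helper: write a minimal core-site.xml that points the
# default filesystem at a NameNode on localhost:9000.
write_core_site() {
  conf_dir="$1"
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage: write_core_site /home/cloud_user/hadoop/etc/hadoop
```

Afterwards, `bin/hdfs getconf -confKey fs.defaultFS` run from `/home/cloud_user/hadoop` should echo back the configured value.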
-
Challenge
Configure HDFS
Set the default block replication to `1` in `/home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml` by changing the following lines:
<configuration>
</configuration>
Change them to this:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Save and close the file.
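Because both site files here hold a single property, a small generic writer covers either one. This is a sketch, not a Hadoop tool: `write_single_property` is hypothetical, it overwrites the target file, and it assumes the name and value contain no XML special characters (`<`, `>`, `&`).

```shell
# Hypothetical helper: write a Hadoop *-site.xml holding one property.
# Assumes NAME and VALUE need no XML escaping.
write_single_property() {
  file="$1"
  name="$2"
  value="$3"
  {
    printf '<?xml version="1.0" encoding="UTF-8"?>\n'
    printf '<configuration>\n'
    printf '  <property>\n'
    printf '    <name>%s</name>\n' "$name"
    printf '    <value>%s</value>\n' "$value"
    printf '  </property>\n'
    printf '</configuration>\n'
  } > "$file"
}

# Usage:
#   write_single_property /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml dfs.replication 1
```

A replication factor of 1 matches the single DataNode in this setup; with the usual default of 3, every block would be reported as under-replicated.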
-
Challenge
Set Up Passwordless SSH Access to localhost
As `cloud_user`, generate a public/private RSA key pair with:
ssh-keygen -t rsa
Accept the default for each prompt.
Add your newly generated public key to your authorized keys list with:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
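The two SSH steps can also run non-interactively and safely more than once. This sketch assumes an RSA key with an empty passphrase (`-N ''`), which is what accepting the defaults produces; `setup_local_ssh` is a hypothetical name, and the optional argument exists only so the directory can be overridden.

```shell
# Hypothetical helper: idempotent, non-interactive SSH key setup.
# The first argument overrides the SSH directory (default: ~/.ssh).
setup_local_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  # Generate a key only if one does not already exist.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa" || return 1
  fi
  touch "$ssh_dir/authorized_keys"
  # Append the public key only if that exact line is not present yet.
  grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" ||
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  # With sshd's default StrictModes, writable-by-others files are rejected.
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage: setup_local_ssh && ssh localhost true
```

The first `ssh localhost` still asks you to accept the host key; answering once lets `start-dfs.sh` connect without prompting later.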
-
Challenge
Format the Filesystem
From `/home/cloud_user/hadoop/`, format the DFS with:
bin/hdfs namenode -format
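Formatting wipes existing NameNode metadata, so it is worth guarding against running it twice. A formatted storage directory contains `current/VERSION`. The sketch assumes the default storage location (`hadoop.tmp.dir` of `/tmp/hadoop-${user.name}` and no `dfs.namenode.name.dir` override), which this lab does not change; `namenode_formatted` is a hypothetical name.

```shell
# Hypothetical helper: succeed if a NameNode storage directory has
# already been formatted (formatting writes current/VERSION into it).
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage from /home/cloud_user/hadoop (default storage path assumed):
#   namenode_formatted "/tmp/hadoop-$USER/dfs/name" || bin/hdfs namenode -format
```

Because that default lives under `/tmp`, it may be cleared on reboot, after which HDFS would need to be formatted again.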
-
Challenge
Start Hadoop
Start the `NameNode` and `DataNode` daemons from `/home/cloud_user/hadoop` with:
sbin/start-dfs.sh
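One way to confirm the daemons came up is the JDK's `jps` tool, which prints one `PID ClassName` line per Java process. This sketch parses that output; `check_daemons` is a hypothetical name, and it matches the class-name column exactly so that `NameNode` is not confused with `SecondaryNameNode` (which `start-dfs.sh` typically starts as well).

```shell
# Hypothetical helper: read `jps` output on stdin and succeed only if
# every named daemon appears in the class-name column.
check_daemons() {
  jps_out=$(cat)
  rc=0
  for daemon in "$@"; do
    if ! printf '%s\n' "$jps_out" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $daemon" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: jps | check_daemons NameNode DataNode
```

In Hadoop 3.x the NameNode web UI is also served at `http://localhost:9870/`, which gives a second, visual check.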
-
Challenge
Download and Copy the Latin Text to Hadoop
From `/home/cloud_user/hadoop`, download the `latin.txt` file with:
curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt
From `/home/cloud_user/hadoop`, create the `/user` and `/user/cloud_user` directories in Hadoop with:
bin/hdfs dfs -mkdir -p /user/cloud_user
From `/home/cloud_user/hadoop/`, copy the `latin.txt` file to Hadoop at `/user/cloud_user/latin` with:
bin/hdfs dfs -put latin.txt latin
Because `latin` is a relative path, HDFS resolves it against the user's home directory, `/user/cloud_user`.
-
Challenge
Examine the latin.txt Text with MapReduce
From `/home/cloud_user/hadoop/`, use the `hadoop-mapreduce-examples-*.jar` to calculate the average length of the words in the `/user/cloud_user/latin` file and save the job output to `/user/cloud_user/latin_wordmean_output` in Hadoop with:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordmean latin latin_wordmean_output
From `/home/cloud_user/hadoop/`, examine your wordmean job output files with:
bin/hdfs dfs -cat latin_wordmean_output/*
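As a sanity check, you can recompute the totals locally on `latin.txt` and compare. To our understanding, the `wordmean` example splits each line on whitespace and writes two totals, `count` (number of words) and `length` (sum of word lengths), whose ratio is the mean. The awk sketch below mirrors that under the assumption of plain ASCII text (awk's `length()` counts characters, while Java's `String.length()` counts UTF-16 units). `local_wordmean` is a hypothetical name.

```shell
# Hypothetical helper: whitespace-split word count, total word length,
# and mean word length for the given files (or stdin).
local_wordmean() {
  awk '
    { for (i = 1; i <= NF; i++) { count++; total += length($i) } }
    END {
      printf "count\t%d\nlength\t%d\n", count, total
      if (count > 0) printf "mean\t%.6f\n", total / count
    }
  ' "$@"
}

# Usage from /home/cloud_user/hadoop: local_wordmean latin.txt
```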