Deploy and Configure a Single-Node Hadoop Cluster
Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM or container image, which lowers the barrier to entry for anyone getting started with Hadoop. In this hands-on lab, you will deploy a single-node Hadoop cluster in a pseudo-distributed configuration. Deploying and configuring each Hadoop component by hand prepares you for working with a multi-node cluster, where the Hadoop services are separated and clustered across machines.

In this learning activity, you will be performing the following:

* Installing Java
* Deploying Hadoop from an archive file
* Configuring Hadoop's `JAVA_HOME`
* Configuring the default filesystem for Hadoop
* Configuring HDFS replication
* Setting up passwordless SSH
* Formatting the Hadoop Distributed File System (HDFS)
* Starting Hadoop
* Creating files and directories in Hadoop
* Examining a text file with a MapReduce job
Challenge
Install Java
Log into Node 1 as `cloud_user` and install the `java-19-amazon-corretto-devel` package:

    sudo yum -y install java-19-amazon-corretto-devel
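Before moving on, you may want to confirm that the JDK landed on the `PATH`. The following is a minimal sketch rather than part of the lab: `java_major_version` is a hypothetical helper name, and it assumes the first line of `java -version` contains a quoted version string such as `"19.0.2"`.

```shell
# Hypothetical helper (not part of the lab): read `java -version` output on
# stdin and print the leading number of the first quoted version string.
# Note: legacy "1.8.0_x" style versions would print 1, not 8.
java_major_version() {
    sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (java writes its version banner to stderr, hence 2>&1):
#   java -version 2>&1 | java_major_version
```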
Challenge
Deploy Hadoop
From the `cloud_user` home directory, download Hadoop 3.3.5 from your desired mirror. This lab uses `dlcdn.apache.org`:

    curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz

Unpack the archive in place:

    tar -xzf hadoop-3.3.5.tar.gz

Delete the archive file:

    rm hadoop-3.3.5.tar.gz

Rename the installation directory:

    mv hadoop-3.3.5/ hadoop/
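Apache publishes a SHA-512 digest alongside each release archive, so it is worth checking the download before you delete the archive. The sketch below is an optional extra, not a lab step: `verify_sha512` is a hypothetical helper, it assumes GNU `sha512sum` is available (as on most Linux distributions), and it expects you to copy the published digest by hand.

```shell
# Hypothetical helper (not part of the lab): succeed only if the SHA-512
# digest of FILE equals EXPECTED. Hex case is normalized before comparing.
verify_sha512() {
    file=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage, with the digest copied from the release's published .sha512 file:
#   verify_sha512 hadoop-3.3.5.tar.gz "<published digest>" && echo "archive OK"
```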
Challenge
Configure JAVA_HOME
From `/home/cloud_user/hadoop`, set `JAVA_HOME` in `etc/hadoop/hadoop-env.sh` by changing the following line:

    export JAVA_HOME=${JAVA_HOME}

Change it to this:

    export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/

Save and close the file.
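If you would rather script this change than open an editor, something like the following could work. It is a sketch under assumptions: `set_java_home` is a hypothetical helper, it assumes the JDK path contains no `@` or `&` characters, and it rewrites every `export JAVA_HOME=` line it finds (commented out or not), appending one if none exists.

```shell
# Hypothetical helper (not part of the lab): point JAVA_HOME in an env file
# at the given JDK path. A backup of the original is kept as FILE.bak.
set_java_home() {
    env_file=$1
    jdk_path=$2
    if grep -Eq '^#? *export JAVA_HOME=' "$env_file"; then
        # Replace the existing (possibly commented-out) export line.
        sed -i.bak -E "s@^#? *export JAVA_HOME=.*@export JAVA_HOME=${jdk_path}@" "$env_file"
    else
        # No export line present, so append one.
        printf 'export JAVA_HOME=%s\n' "$jdk_path" >> "$env_file"
    fi
}

# Usage, from /home/cloud_user/hadoop:
#   set_java_home etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-19-amazon-corretto/
```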
Challenge
Configure Core Hadoop
Set the default filesystem to `hdfs` on `localhost` in `/home/cloud_user/hadoop/etc/hadoop/core-site.xml` by changing the following lines:

    <configuration>
    </configuration>

Change them to this:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

Save and close the file.
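The same edit can be scripted with a here-document. This is only a sketch: `write_site_xml` is a hypothetical helper, and it overwrites the whole file (including the license header Hadoop ships in it), so it only suits files that hold a single property, as in this lab.

```shell
# Hypothetical helper (not part of the lab): write a Hadoop *-site.xml file
# containing exactly one property. Any existing content is replaced.
write_site_xml() {
    dest=$1
    name=$2
    value=$3
    cat > "$dest" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Usage, from /home/cloud_user/hadoop:
#   write_site_xml etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000
```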
Challenge
Configure HDFS
Set the default block replication to `1` in `/home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml` by changing the following lines:

    <configuration>
    </configuration>

Change them to this:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

Save and close the file.
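A quick way to double-check what you saved is to read the value back out of the file. The sketch below is a rough text match, not an XML parser: `site_property` is a hypothetical helper, it assumes the `<value>` tag opens and closes on one line, and dots in the property name are treated as regex wildcards.

```shell
# Hypothetical helper (not part of the lab): print the <value> of the named
# property in a *-site.xml file, or nothing if the property is absent.
site_property() {
    file=$1
    name=$2
    # Within the block from the matching <name> to </property>, print the
    # text between <value> and </value>; keep only the first hit.
    sed -n "/<name>${name}<\/name>/,/<\/property>/ s:.*<value>\(.*\)</value>.*:\1:p" "$file" | head -n 1
}

# Usage, from /home/cloud_user/hadoop:
#   site_property etc/hadoop/hdfs-site.xml dfs.replication
```

Hadoop's own `bin/hdfs getconf -confKey dfs.replication` reports the effective value after defaults are applied, which is the more reliable check once the installation is in place.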
Challenge
Set Up Passwordless SSH Access to localhost
As `cloud_user`, generate a public/private RSA key pair with:

    ssh-keygen

The default option for each prompt will suffice.

Add your newly generated public key to your authorized keys list with:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
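Running the `cat` command twice appends the key twice, and `sshd` may ignore an `authorized_keys` file whose permissions are too loose. The following sketch handles both cases; `authorize_key_once` is a hypothetical helper, not something the lab provides.

```shell
# Hypothetical helper (not part of the lab): add a public key to an
# authorized_keys file only if that exact line is not already present,
# then tighten permissions on the file and its directory.
authorize_key_once() {
    pub_key=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    if ! grep -qxF "$(cat "$pub_key")" "$auth_file"; then
        cat "$pub_key" >> "$auth_file"
    fi
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# Usage:
#   authorize_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Afterwards, `ssh localhost` should log you in without a password prompt; the first connection still asks you to confirm the host key.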
Challenge
Format the Filesystem
From `/home/cloud_user/hadoop/`, format the DFS with:

    bin/hdfs namenode -format
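Formatting a NameNode that already holds metadata discards that metadata, so a guard can be useful if you script this step. The sketch below rests on assumptions: `namenode_is_formatted` is a hypothetical helper, and the directory in the usage comment assumes the Hadoop defaults (`hadoop.tmp.dir` under `/tmp/hadoop-<user>`), which this lab does not override.

```shell
# Hypothetical helper (not part of the lab): succeed if NAME_DIR already
# contains NameNode metadata, detected by its current/VERSION file.
namenode_is_formatted() {
    name_dir=$1
    [ -f "$name_dir/current/VERSION" ]
}

# Usage, from /home/cloud_user/hadoop (directory is an assumed default):
#   namenode_is_formatted /tmp/hadoop-cloud_user/dfs/name \
#       || bin/hdfs namenode -format
```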
Challenge
Start Hadoop
Start the `NameNode` and `DataNode` daemons from `/home/cloud_user/hadoop` with:

    sbin/start-dfs.sh
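To confirm the daemons actually came up, you can inspect the running Java processes with the JDK's `jps` tool. The helper below is a sketch: `missing_hdfs_daemons` is a hypothetical name, and it assumes `jps` prints one `<pid> <name>` pair per line and that `start-dfs.sh` also launches a `SecondaryNameNode`.

```shell
# Hypothetical helper (not part of the lab): read `jps` output on stdin and
# print each expected HDFS daemon that does not appear in it.
missing_hdfs_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Anchor the match so "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" | grep -qE "^[0-9]+ ${daemon}\$"; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage: no output means all three daemons are running.
#   jps | missing_hdfs_daemons
```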
Challenge
Download and Copy the Latin Text to Hadoop
From `/home/cloud_user/hadoop`, download the `latin.txt` file with:

    curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt

From `/home/cloud_user/hadoop`, create the `/user` and `/user/cloud_user` directories in Hadoop with:

    bin/hdfs dfs -mkdir -p /user/cloud_user

From `/home/cloud_user/hadoop/`, copy the `latin.txt` file to Hadoop at `/user/cloud_user/latin` with:

    bin/hdfs dfs -put latin.txt latin
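The `-put` command above names its destination simply as `latin`, yet the file ends up at `/user/cloud_user/latin`. That is because HDFS resolves relative paths against the current user's home directory, `/user/<username>`. The sketch below only illustrates that rule in plain shell; `resolve_hdfs_path` is a hypothetical helper and does not talk to Hadoop.

```shell
# Hypothetical helper (illustration only): show where HDFS would place a
# path for a given user. Absolute paths are kept as-is; relative paths are
# resolved under /user/<name>.
resolve_hdfs_path() {
    user=$1
    path=$2
    case $path in
        /*) printf '%s\n' "$path" ;;
        *)  printf '/user/%s/%s\n' "$user" "$path" ;;
    esac
}

# Usage:
#   resolve_hdfs_path cloud_user latin
```

On the cluster itself, `bin/hdfs dfs -ls` with no path lists that home directory, which is a quick way to confirm the upload.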
Challenge
Examine the latin.txt Text with MapReduce
From `/home/cloud_user/hadoop/`, use the `hadoop-mapreduce-examples-*.jar` to calculate the average length of the words in the `/user/cloud_user/latin` file and save the job output to `/user/cloud_user/latin_wordmean_output` in Hadoop with:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordmean latin latin_wordmean_output

From `/home/cloud_user/hadoop/`, examine your wordmean job output files with:

    bin/hdfs dfs -cat latin_wordmean_output/*
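As a sanity check on the job, you could compute the same statistic on the local copy of `latin.txt`. This is a sketch under an assumption: it treats a word as any whitespace-separated token, which is how the `wordmean` example is understood to tokenize, so small differences from the job's figure are possible. `local_word_mean` is a hypothetical helper.

```shell
# Hypothetical helper (not part of the lab): print the mean length of the
# whitespace-separated words in a local file, to four decimal places.
local_word_mean() {
    awk '{ for (i = 1; i <= NF; i++) { count++; total += length($i) } }
         END { if (count > 0) printf "%.4f\n", total / count }' "$1"
}

# Usage, from /home/cloud_user/hadoop:
#   local_word_mean latin.txt
```

If you rerun the MapReduce job, remove the previous output first with `bin/hdfs dfs -rm -r latin_wordmean_output`, because a job will not write into an output directory that already exists.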