
Deploy and Configure a Single-Node Hadoop Cluster

Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM/container image, which lowers the barrier to entry for those wishing to get started with Hadoop. In this hands-on lab, you will have the opportunity to deploy a single-node Hadoop cluster in a pseudo-distributed configuration. Doing so walks you through the deployment and configuration of each individual Hadoop component and prepares you for the move to a multi-node cluster, where Hadoop services are separated and clustered. In this learning activity, you will perform the following:

  • Installing Java
  • Deploying Hadoop from an archive file
  • Configuring Hadoop's `JAVA_HOME`
  • Configuring the default filesystem for Hadoop
  • Configuring HDFS replication
  • Setting up passwordless SSH
  • Formatting the Hadoop Distributed File System (HDFS)
  • Starting Hadoop
  • Creating files and directories in Hadoop
  • Examining a text file with a MapReduce job


Path Info

Level: Beginner
Duration: 2h 0m
Published: Jan 14, 2019


Table of Contents

  1. Challenge

    Install Java

    Log into Node 1 as cloud_user and install the java-19-amazon-corretto-devel package:

    sudo yum -y install java-19-amazon-corretto-devel
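
    To optionally confirm that Java installed correctly and is on the PATH, check the reported version:

    java -version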
    
  2. Challenge

    Deploy Hadoop

    From the cloud_user home directory, download Hadoop 3.3.4 from your desired Apache mirror:

    curl -O http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
    

    Unpack the archive in place:

    tar -xzf hadoop-3.3.4.tar.gz
    

    Delete the archive file:

    rm hadoop-3.3.4.tar.gz
    

    Rename the installation directory:

    mv hadoop-3.3.4/ hadoop/
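
    Optionally, confirm the layout of the renamed installation directory (you should see bin, etc, sbin, and share, among others):

    ls hadoop/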
    
  3. Challenge

    Configure JAVA_HOME

    From /home/cloud_user/hadoop, set JAVA_HOME in etc/hadoop/hadoop-env.sh by changing the following line:

    export JAVA_HOME=${JAVA_HOME}
    

    Change it to this:

    export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/
    

    Save and close the file.
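
    If you prefer to make the change from the command line instead of an editor, a sed one-liner such as the following (run from /home/cloud_user/hadoop, and assuming the line reads exactly as shown above) accomplishes the same edit:

    sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/|' etc/hadoop/hadoop-env.sh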

  4. Challenge

    Configure Core Hadoop

    Set the default filesystem to hdfs on localhost in /home/cloud_user/hadoop/etc/hadoop/core-site.xml by changing the following lines:

    <configuration>
    </configuration>
    

    Change them to this:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    

    Save and close the file.
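
    To optionally confirm that Hadoop picks up the new setting, query it from /home/cloud_user/hadoop with:

    bin/hdfs getconf -confKey fs.defaultFS

    It should print hdfs://localhost:9000.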

  5. Challenge

    Configure HDFS

    Set the default block replication to 1 in /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml by changing the following lines:

    <configuration>
    </configuration>
    

    Change them to this:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
    

    Save and close the file.
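
    As with the previous step, you can optionally verify the value from /home/cloud_user/hadoop with:

    bin/hdfs getconf -confKey dfs.replication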

  6. Challenge

    Set Up Passwordless SSH Access to localhost

    As cloud_user, generate a public/private RSA key pair with:

    ssh-keygen
    

    The default option for each prompt will suffice.

    Add your newly generated public key to your authorized keys list with:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
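
    Depending on your umask, you may also need to tighten permissions on the authorized keys file:

    chmod 600 ~/.ssh/authorized_keys

    You can then verify passwordless access (answer yes to the host key prompt on the first connection):

    ssh localhost exit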
    
  7. Challenge

    Format the Filesystem

    From /home/cloud_user/hadoop/, format the DFS with:

    bin/hdfs namenode -format
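
    The command prints a fair amount of output; toward the end you should see a message indicating that the storage directory has been successfully formatted. You can also check the exit status with:

    echo $?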
    
  8. Challenge

    Start Hadoop

    Start the NameNode and DataNode daemons from /home/cloud_user/hadoop with:

    sbin/start-dfs.sh
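
    Optionally, confirm that the daemons came up by listing the running Java processes with:

    jps

    You should see NameNode, DataNode, and SecondaryNameNode entries.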
    
  9. Challenge

    Download and Copy the Latin Text to Hadoop

    From /home/cloud_user/hadoop, download the latin.txt file with:

    curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt
    

    From /home/cloud_user/hadoop, create the /user and /user/cloud_user directories in Hadoop with:

    bin/hdfs dfs -mkdir -p /user/cloud_user
    

    From /home/cloud_user/hadoop/, copy the latin.txt file to Hadoop at /user/cloud_user/latin with:

    bin/hdfs dfs -put latin.txt latin
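
    Optionally, verify that the file landed in HDFS with:

    bin/hdfs dfs -ls /user/cloud_user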
    
  10. Challenge

    Examine the latin.txt Text with MapReduce

    From /home/cloud_user/hadoop/, use the hadoop-mapreduce-examples-*.jar to calculate the average length of the words in the /user/cloud_user/latin file and save the job output to /user/cloud_user/latin_wordmean_output in Hadoop with:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordmean latin latin_wordmean_output
    

    From /home/cloud_user/hadoop/, examine your wordmean job output files with:

    bin/hdfs dfs -cat latin_wordmean_output/*
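
    For reference, a successful MapReduce job writes a _SUCCESS marker alongside its part-r-* result files, which you can see by listing the output directory:

    bin/hdfs dfs -ls latin_wordmean_output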
    

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
