Getting Started with HDFS

Learning to work with the Hadoop Distributed File System (HDFS) is a baseline skill for anyone administering or developing in the Hadoop ecosystem. In this course, you will learn how to work with HDFS, Hive, Pig, Sqoop, and HBase from the command line.
Course info
Rating
(109)
Level
Beginner
Updated
Feb 16, 2016
Duration
2h 48m
Table of contents
Understanding HDFS
Creating, Manipulating, and Retrieving HDFS Files
Transferring Relational Data to HDFS Using Sqoop
Querying Data with Pig and Hive
Processing Sparse Data with HBase
Automating Basic HDFS Operations
Description

Getting Started with Hadoop Distributed File System (HDFS) is designed to give you everything you need to learn about how to use HDFS to read, store, and remove files. In addition to working with files in Hadoop, you will learn how to take data from relational databases and import it into HDFS using Sqoop. After we have our data inside HDFS, we will learn how to use Pig and Hive to query that data. Building on our HDFS skills, we will look at how to use HBase for near real-time data processing. Whether you are a developer, administrator, or data analyst, the concepts in this course are essential to getting started with HDFS.
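The workflow the description outlines can be sketched at the command line. This is an illustrative sketch only, not part of the course material: the file paths, JDBC URL, and table names are hypothetical, and the commands assume a configured Hadoop client with Sqoop and Hive installed on a running cluster.

```shell
# Store, read, and remove a file in HDFS
# (paths are hypothetical examples)
hdfs dfs -put sales.csv /data/sales.csv
hdfs dfs -cat /data/sales.csv
hdfs dfs -rm /data/sales.csv

# Import a relational table into HDFS with Sqoop
# (JDBC URL and table name are hypothetical)
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --table orders \
  --target-dir /data/orders

# Query the imported data with Hive
hive -e "SELECT COUNT(*) FROM orders;"
```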

About the author

Thomas is a Senior Software Engineer and Certified ScrumMaster. He spends most of his time working with the Hortonworks Data Platform and on Agile coaching.

More from the author
Analyzing Machine Data with Splunk
Beginner
2h 38m
4 Nov 2016
Pig Latin: Getting Started
Beginner
1h 56m
1 May 2015