Creating Your First Big Data Hadoop Cluster Using Cloudera CDH
Data by itself has no meaning; it is what you do with it that counts. In this course, you'll get on the fast track to Hadoop and Big Data with the Cloudera QuickStart VM, and then you'll learn how to set up a Hadoop cluster with Cloudera CDH.
What you'll learn
"Ask Bigger Questions" is Cloudera's vision. You may not be familiar with this phrase, but you're likely familiar with "Knowledge is Power". To get knowledge, you need to analyze and understand huge amounts of structured and unstructured data: Big Data. In this course, Creating Your First Big Data Hadoop Cluster Using Cloudera CDH, you'll get started on Big Data with Cloudera, taking your first steps with Hadoop on a pseudo-cluster and then moving on to set up your own cluster using CDH, which stands for Cloudera's Distribution including Hadoop. First, you'll explore the case for Hadoop, Big Data, and Cloudera. Next, you'll take the fast track to Big Data with Cloudera's QuickStart VM and learn how to create a virtualization environment with VirtualBox. Then, you'll discover how to create a clean Linux cluster with CentOS. Finally, you'll follow the steps to install and configure a cluster with the help of Cloudera Manager. By the end of this course, you'll have a Hadoop cluster, and you'll be ready to start your journey into Big Data.
Table of contents
- Getting Started with the Cloudera QuickStart VM 2m
- Getting a Virtualization Environment with Demo 3m
- Getting the Cloudera QuickStart VM with Demo 3m
- A Quick Tour Around CDH 3m
- Cloudera Manager: What Is It and How to Start It with Demo 3m
- Demo: Load a StackExchange Site into HDFS and Ask a Few Questions 8m
- What’s Next? A Real Cluster! Important Considerations to Take 2m
- Requirements and Supported Versions 5m
- Takeaway 0m
Hadoop clusters are collections of computers, known as nodes, that are networked together to perform parallel computations on big data sets. A Hadoop cluster consists of connected master and slave (worker) nodes that run on low-cost commodity hardware while still providing high availability.
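To make "parallel computations on big data sets" more concrete, here is a minimal single-machine sketch of the MapReduce pattern that Hadoop distributes across its nodes. It is not Hadoop code: threads stand in for worker nodes, the input list stands in for a file stored in HDFS, and the function names (`map_block`, `shuffle`, `reduce_groups`, `word_count`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_block(block: List[str]) -> List[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one block of lines."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], num_workers: int = 3) -> Dict[str, int]:
    # Split the input into roughly equal blocks, one per simulated worker node.
    size = max(1, -(-len(lines) // num_workers))  # ceiling division
    blocks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for worker nodes that process their blocks in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    # The per-block results are then grouped by key and reduced.
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    data = ["big data has no meaning", "what you do with data counts", "ask bigger questions"]
    print(word_count(data))
```

The useful property shown here is that each map call only needs its own block, which is why the work can be spread over many inexpensive machines; only the shuffle step moves data between them.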
Cloudera is a software company that provides an enterprise data cloud accessible via a subscription. Its platform is built on open source technology and uses analytics and machine learning to yield insights from data over a secure connection.
To complete this course, you will need the Cloudera QuickStart VM and Cloudera CDH software.
A data cluster is a subgroup of data whose members share similar characteristics and differ significantly from other clusters in a data set; such clusters are usually identified with the statistical technique of cluster analysis.
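As an illustration of cluster analysis, the sketch below implements plain k-means, one common clustering technique, in pure Python. The choice of k-means, the naive "first k points" initialization, and the function names are assumptions made for this example; the description above does not name a specific algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise average of a non-empty list of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def k_means(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain k-means: returns a cluster label per point and the final centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = mean(members)
    return labels, centroids


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(k_means(pts, k=2))
```

Points within a cluster end up close to their shared centroid and far from the other centroid, which matches the "similar within, different between" definition. Real tools usually use smarter initialization (for example k-means++) because the naive choice can converge to poor clusterings.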
In this course, you will learn about big data and how to set up Hadoop clusters. You will also learn how to create a virtualization environment with VirtualBox. Finally, you'll discover how to create a clean Linux cluster with CentOS. By the end of this course, you will have a Hadoop cluster, and you'll be ready to embark on your journey into big data.