Big Data is a natural evolution of data analysis, scaling beyond the limits of conventional databases. However, databases are still an important part of a Hadoop cluster. Learn how to set up databases for Cloudera CDH and install a production-grade cluster.
Big Data is a natural evolution of data analysis, scaling beyond the limits of conventional databases. However, this does not mean that databases are dead. On the contrary, they are still an important part of a Hadoop cluster and are used by multiple services to store all kinds of information. In this course, Preparing a Production Hadoop Cluster with Cloudera: Databases, you'll learn how to set up databases for Cloudera CDH and install a production-grade cluster using Cloudera's Installation Path B. First, you'll discover how to select, install, and initialize a supported database. Next, you'll explore how to configure a database with Cloudera's recommended settings, and how to create databases for CDH services. Finally, you'll learn how to complete a CDH deployment. By the end of this course, you'll be able to deploy a production-grade cluster.
Xavier is very passionate about teaching, helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, trainer, and holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, along with being a Microsoft MVP.
Course Overview Hi everyone. My name is Xavier Morera, and welcome to my course, Preparing Databases for Your Production Hadoop Cloudera Cluster. I have been working with search and big data for many years, and now I have the pleasure of being part of your journey into the Hadoop ecosystem with Cloudera. Did you know that many years ago we used databases to analyze our data, find trends, and discover insights? But as the size of the data grew, at some point databases could not keep up, and so we entered a big data world, primarily with Hadoop, scaling where no database could scale before. In this course, we're going to learn how to set up the required databases for deploying a Hadoop cluster using Cloudera's distribution, CDH. Some of the topics that we will cover include how to select, install, and initialize a supported database, configure a database with Cloudera's recommended settings, create databases for CDH services, and finally complete a CDH deployment following Cloudera's Path B installation method. By the end of this course, you will know the steps required to deploy a production-grade cluster using an external database. Before beginning this course, it is recommended that you are familiar with the basics of Hadoop and Cloudera. But don't worry if you are not. You can still follow along and learn the process. I hope you'll join me on this journey to learn about databases in Hadoop with the Preparing Databases for Your Production Hadoop Cloudera Cluster course at Pluralsight.
Setting up a Production Database for Your Hadoop Cluster Welcome to module 2, Setting up a Production Database for Your Hadoop Cluster with Cloudera. In this module, we will take our first step, selecting which database you will use for your cluster. This step is important because the choice has implications that can affect you in the long run. So it is better to take your time and decide up front which database suits your organization best. Let's talk about the process of selecting the right database for you.
Configuring Your MySQL Database for Cloudera Manager (Path B) Welcome to module 3, Configuring Your MySQL Database for Cloudera Manager, as part of what's called a Path B installation. Up until now we have taken the first two steps. First, we selected which one of the Cloudera-supported databases we will use for our cluster, our pick being MySQL. Then we continued our journey by installing the database server with the help of yum. However, we left all of the default installation settings in place. In this module, we will take the next steps required, starting by configuring the database server with the recommended engine and setting the correct startup options and variables required by Cloudera Manager and CDH services.
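To make the idea of startup options and variables more concrete, here is a minimal Python sketch that renders a `[mysqld]` section for a MySQL option file. The option names are real MySQL options, but the specific values, the `MYSQLD_SETTINGS` name, and the `render_mysqld_section` helper are assumptions for illustration and are not taken from the course; check Cloudera's documentation for the settings that match your CDH and MySQL versions.

```python
import configparser
import io

# Illustrative values only (assumptions, not taken from the course).
# Verify every setting against Cloudera's documentation before use.
MYSQLD_SETTINGS = {
    "default-storage-engine": "InnoDB",
    "transaction-isolation": "READ-COMMITTED",
    "max_connections": "550",
    "innodb_file_per_table": "1",
    "binlog_format": "mixed",
}


def render_mysqld_section(settings):
    """Return a my.cnf-style [mysqld] section as text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # Keep option names exactly as written; the default would lowercase them.
    parser.optionxform = str
    parser["mysqld"] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before merging it into my.cnf.
    print(render_mysqld_section(MYSQLD_SETTINGS))
```

Generating the fragment from a dictionary keeps the settings in one reviewable place. The database server still has to be restarted for changes to startup options to take effect.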
Preparing Your Databases and Deploying CDH Welcome to module 4, Preparing Your Databases and Deploying CDH Services. In the previous modules, we selected which database we wanted to use, namely MySQL. We installed it and configured it with the necessary options and parameters so that we could use the database for deploying CDH. In this module, we will take the three final steps that are related to the database to set up our cluster for production. We will start by preparing Cloudera Manager for an external database, then create the databases for the Hadoop services, and then finish the CDH deployment. However, let me mention something that is very relevant to what we will be doing in this module. The main topic of this course revolves around preparing your databases for deploying your cluster, and this matters because with an embedded database you will not be able to scale your cluster. However, there is a secondary objective of this course, which involves all the steps, besides databases, for completing a Path B installation. So in this module, we will also be completing the necessary prerequisites for your cluster setup, installing the JDK, Cloudera Manager Server, and Cloudera Manager daemons. We will be completing a CDH deployment, including selecting which services we would like to deploy. And finally, we will learn about a very useful tool to work with databases in a Hadoop cluster, the Hue DB Query app. Let's begin with the prerequisites.
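As a rough illustration of creating the databases for the Hadoop services, the following Python sketch generates the MySQL statements for one database and user per service. The service list, the database and user names, the `utf8` character set, the `'%'` host wildcard, and the `create_statements` helper are all assumptions for illustration, not the course's exact commands; adjust them to the services you deploy and to your security policy.

```python
import re

# Hypothetical service -> (database, user) pairs, for illustration only.
SERVICE_DATABASES = {
    "Cloudera Manager Server": ("scm", "scm"),
    "Activity Monitor": ("amon", "amon"),
    "Reports Manager": ("rman", "rman"),
    "Hue": ("hue", "hue"),
    "Hive Metastore Server": ("metastore", "hive"),
    "Oozie": ("oozie", "oozie"),
}

_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def create_statements(database, user, password):
    """Return the SQL statements that create one service database and its user."""
    if not _NAME.match(database) or not _NAME.match(user):
        raise ValueError("database and user must be alphanumeric or underscore")
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain quotes or backslashes")
    return [
        f"CREATE DATABASE {database} DEFAULT CHARACTER SET utf8;",
        f"CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';",
        f"GRANT ALL ON {database}.* TO '{user}'@'%';",
    ]


if __name__ == "__main__":
    for service, (database, user) in SERVICE_DATABASES.items():
        print(f"-- {service}")
        # "changeme" is a placeholder; use a strong, unique password per service.
        for statement in create_statements(database, user, "changeme"):
            print(statement)
```

The output can be reviewed and then run in the `mysql` client as an administrative user. Using a separate database and account per service keeps each service's privileges limited to its own data.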
Preparing Your Database for High Availability Welcome to module 5, Preparing Your Databases for High Availability. So far, we have gone through all the steps necessary to install and configure everything related to databases for a Hadoop production cluster deployed with Cloudera Manager. Once we deployed our cluster, we configured the DB Query app from Hue, the Hadoop User Experience, to be able to work with our cluster's databases. And now let's go one step further by learning how to configure our database for replication, which is a required step to achieve high availability, also known as HA.