SQL on Hadoop - Analyzing Big Data with Hive

This course teaches you the Hive query language and how to apply it to common Big Data problems. It includes an introduction to distributed computing, Hadoop, and MapReduce fundamentals, as well as the latest features released with Hive 0.11.
Course info
Rating
(570)
Level
Intermediate
Updated
Oct 8, 2013
Duration
4h 16m
Table of contents
Introduction to Hadoop
Introduction to Hive
Hive Query Language
Advanced HiveQL
Storage and The Eco-System
Description

From developer to analyst, this Hive SQL course tackles a few big questions about big data:

  • Why does this technology exist and why do I need it?
  • How can I get the best out of it utilizing something familiar like SQL?
  • How does this all fit together in an ever-evolving eco-system?
This course introduces the concepts of distributed computing, Hadoop, and MapReduce, and then goes into great detail on Apache Hive, which provides an SQL-like query language that can be used with Hadoop and NoSQL databases like HBase and Cassandra.

The course presents some challenges you might experience solving real production problems and how Apache Hive makes that task easier to accomplish.

Course FAQ
What is Hadoop?

Hadoop is a software framework for storing and processing large data sets across clusters of hardware. It provides distributed storage for all kinds of data and spreads processing across the machines in the cluster, so many tasks can run in parallel.

What is the difference between Hive and SQL?

Hive is a data warehouse software project built on top of Hadoop that provides data query and analysis. SQL is a query language for working with data in relational databases. Hive's query language, HiveQL, is SQL-like, but it is designed for querying very large data sets stored in Hadoop rather than in a relational database.

What will I learn in this course?

This course will introduce you to Hadoop and the Hive query language. Some of the topics covered include:

  • The concepts of distributed computing
  • What is MapReduce
  • Creating databases and tables with HiveQL
  • Multi inserts and dynamic partition inserts
  • Bucket and block sampling
  • Storage and the ecosystem
  • Much more
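As a rough illustration of the MapReduce topic in the list above, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is not taken from the course and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for this example. On a real cluster the framework would run these steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input, chosen only for illustration.
lines = ["big data with hive", "hive on hadoop", "big data"]
counts = word_count(lines)

# A SQL-like query expresses the same aggregation declaratively, for example:
#   SELECT word, COUNT(*) FROM words GROUP BY word;
# (the table name "words" is hypothetical)
```

The closing comment hints at why an SQL-like layer is attractive: the grouping and counting logic that takes several functions here collapses into a single `GROUP BY` statement.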
Who is this course for?

This course is great for anyone who wants to learn Hadoop, Hive, and the Hive query language (HiveQL). If you want to be able to solve common Big Data problems, then this is perfect for you.

Are there prerequisites to this course?

This is an intermediate level course, so it does assume some prior knowledge of working with Big Data and query languages like SQL. However, no prior knowledge of Hadoop or Hive is expected.

About the author

Ahmad is a Data Architect specializing in the implementation of high-performance data warehouses and BI systems and enjoys speaking at various user groups and conferences.
