SQL on Hadoop - Analyzing Big Data with Hive

by Ahmad Alkilani

This course will teach you the Hive query language and how to apply it to solve common Big Data problems. It includes an introduction to distributed computing, Hadoop, and MapReduce fundamentals, as well as the latest features released with Hive 0.11.

What you'll learn

From developer to analyst, this Hive SQL course tackles a few big questions about big data:

  • Why does this technology exist and why do I need it?
  • How can I get the best out of it utilizing something familiar like SQL?
  • How does this all fit together in an ever-evolving ecosystem?

This course introduces the concepts of distributed computing, Hadoop, and MapReduce, and then goes into great detail on Apache Hive, which provides an SQL-like query language that can be used with Hadoop and NoSQL databases like HBase and Cassandra.

The course presents some of the challenges you might face when solving real production problems and shows how Apache Hive makes them easier to tackle.

Course FAQ

What is Hadoop?

Hadoop is a software framework for storing and processing large data sets across clusters of computers. It provides distributed storage for all kinds of data, and it processes that data in parallel by running many tasks across the cluster at the same time.
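
The course covers MapReduce, the processing model Hadoop uses to spread work across a cluster. As a rough illustration only, the following single-machine Python sketch mimics its three stages (map, shuffle, reduce) with a word count. The function names are made up for this example and are not part of Hadoop's API; a real job would run these stages in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hive data"]))
```

On a cluster, each input split would be handed to a separate map task, and each key group to a reduce task. That independence between tasks is what lets the work scale out.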

What is the difference between Hive and SQL?

Hive is a data warehouse software project built on top of Hadoop that provides data query and analysis through HiveQL, an SQL-like query language. SQL is the standard query language for working with data in relational databases. The two look similar, but Hive runs its queries as distributed jobs over data stored in Hadoop, which makes it better suited to very large data sets than a traditional relational database.

What will I learn in this course?

This course will introduce you to Hadoop and the Hive query language. Some of the topics covered include:

  • The concepts of distributed computing
  • What MapReduce is
  • Creating databases and tables with HiveQL
  • Multi inserts and dynamic partition inserts
  • Bucket and block sampling
  • Storage and the ecosystem
  • Much more

Who is this course for?

This course is great for anyone who wants to learn Hadoop, Hive, and the Hive query language (HiveQL). If you want to be able to solve common Big Data problems, then this is perfect for you.

Are there prerequisites to this course?

This is an intermediate-level course, so it assumes some prior experience with Big Data and with query languages like SQL. However, no prior knowledge of Hadoop or Hive is expected.

About the author

Ahmad Alkilani is a Data Architect specializing in the implementation of high-performance compute platforms, data warehouses, and BI systems. He is the author of ForestFlow, an LFAI policy-based machine learning model server. Ahmad has over 16 years of broad IT experience, from traditional ODBMS to large-scale big data systems and NoSQL databases. He enjoys speaking at various user groups and national conferences. When not tinkering with new code or consulting on projects, Ahmad takes pleasure in spen...
