SQL on Hadoop - Analyzing Big Data with Hive
Course info



Description
From developer to analyst, this Hive SQL course tackles a few big questions about big data:
- Why does this technology exist and why do I need it?
- How can I get the best out of it using something familiar like SQL?
- How does this all fit together in an ever-evolving ecosystem?
The course presents challenges you might face when solving real production problems and shows how Apache Hive makes them easier to solve.
Course FAQ
Hadoop is a software framework for storing and processing large data sets across clusters of computers. It can store many kinds of data, scales its processing power by adding nodes to the cluster, and can run a large number of tasks in parallel.
Hive is a data warehouse software project built on top of Hadoop that provides data query and analysis. SQL is a declarative language for querying and managing data in relational databases. Hive exposes a SQL-like language, HiveQL, so the two look similar, but Hive is designed to query very large data sets distributed across a Hadoop cluster rather than data held in a single relational database.
This course will introduce you to Hadoop and the Hive query language. Some of the topics covered include:
- The concepts of distributed computing
- What MapReduce is
- Creating databases and tables with HiveQL
- Multi inserts and dynamic partition inserts
- Bucket and block sampling
- Storage and the ecosystem
- Much more
This course is great for anyone who wants to learn Hadoop, Hive, and the Hive query language (HiveQL). If you want to be able to solve common big data problems, this course is a good fit for you.
This is an intermediate-level course, so it assumes some prior experience with big data and with query languages like SQL. However, no prior knowledge of Hadoop or Hive is expected.