Getting Started with Hive for Relational Database Developers

Traditional databases focus on transactional processing, whereas Hive supports analytical processing of huge datasets. This course focuses on the similarities and differences between SQL and Hive.
Course info
Rating
(67)
Level
Beginner
Updated
Feb 24, 2017
Duration
2h 56m
Table of contents
Course Overview
Hive vs. RDBMS
Getting Started with Basic Queries in Hive
Creating Databases and Tables
Using Complex Data Types and Table Generating Functions
Understanding Constraints in Subqueries and Views
Designing Schema for Hive
Description

Transactional processing focuses on accessing and updating individual records. Analytical processing works on data in bulk and deals more with summaries across the dataset, trends, and insights. The differences in requirements and in the kind of data each works on lead to differences between Hive and traditional databases. This course, Getting Started with Hive for Relational Database Developers, teaches you about several gotchas involved in using familiar SQL constructs in Hive. You'll learn about loading and parsing data from files, views, subqueries, and some cool built-in functionality such as table generating functions. The course also demonstrates the constraints imposed by Hive architecture choices such as schema on read, denormalized storage in HDFS, and high latency of operations. These constraints serve as a guide for the choices you make when storing and querying data. By the end of this course, you'll feel confident using Hive for your own relational database use cases.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this introductory course on Hive. Let me introduce myself first. I have a master's degree in electrical engineering from Stanford and have worked with companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. Traditional databases are usually used for transactional processing. This involves accessing and updating individual records in real time. Updates have to be reflected in the database right away, and they have to be made in an ACID-compliant manner. Analytical processing, on the other hand, involves huge datasets, summarizing and extracting insights from this data, and calculating trends. Analytical processing is usually carried out using a data warehouse. Hive is an open source data warehouse which runs on top of Hadoop. Hadoop, which is probably familiar to you all, is a very widely used distributed computing framework. This course focuses on the similarities and the differences between SQL and Hive, with an emphasis on understanding what happens behind the scenes when a Hive query runs. This course makes the user aware of several gotchas involved in using familiar SQL constructs in Hive. Loading and parsing data from files, views, subqueries, and some cool built-in functionality such as table generating functions are all covered. This course also demonstrates the constraints imposed by Hive architecture choices, such as schema on read, denormalized storage in HDFS, high latency of operations, et cetera. This serves as a guide for the choices a user makes when storing and querying data.