Data Transformations with Apache Pig

Pig is an open source engine for executing parallelized data transformations that run on Hadoop. This course shows you how Pig can help you work with incomplete data that has an inconsistent schema, or perhaps no schema at all.

What you'll learn

Pig is open source software that is part of the Hadoop ecosystem of technologies. Pig is well suited to data that falls outside traditional data warehouses: it handles missing, incomplete, and inconsistent data, including data that has no schema at all. In this course, Data Transformations with Apache Pig, you'll learn how to transform data with Apache Pig. First, you'll start with the basics: how to install Pig and get started with the Grunt shell. Next, you'll discover how to load data into relations in Pig and store transformed results to files with the LOAD and STORE commands. Then, you'll work with a real-world dataset, analyzing accidents in NYC using collision data from the City of New York. Finally, you'll explore advanced constructs such as the nested foreach and get a brief glimpse into the world of MapReduce, including how easy it is to implement MapReduce in Pig. By the end of this course, you'll have a better understanding of data transformations with Apache Pig.

Course Overview (2m)

Course Overview 2m

Introducing Pig (20m)

Using the Grunt Shell (18m)

Install and Set up Pig on Your Local Machine 5m
Pig Modes of Operation 4m
Basic Commands and Configuring Log Messages 4m
Running Pig Scripts in Batch Mode 2m
Behind the Scenes of Pig Commands 3m

Loading Data into Relations (45m)

The Structure of a Pig Script and the Concept of Relations 5m
Loading Data from Files and Directories 4m
Loading Data with Schema 3m
Storing Relations in Directories 3m
Case-sensitivity in Pig 1m
Scalar Data Types 3m
Complex Data Types: The Tuple 9m
Complex Data Types: The Bag 5m
Complex Data Types: The Map 6m
Working with Partial Schema Specification 5m

Working with Basic Data Transformations (36m)

Foreach-generate: Visualization 2m
Foreach-generate: Indexes and Column Names 4m
Foreach-generate: Complex Data Types 6m
Categories of Pig Functions 4m
Math, String, and Date-time Functions 6m
The Filter Operation 6m
Distinct, Limit, and Order By 4m
The Split Operation 5m

Working with Advanced Data Transformations (48m)

Download NYC Collision Data 7m
Visualize the Group by Operation 3m
The Group by Operation 5m
Aggregations on Grouped Data 5m
Join Operations on Relations 5m
Types of Joins 5m
Implement the Left Outer, Self, and Cross Joins 4m
The Union Operation 3m
The Union Onschema Operation 7m
The Flatten Function 5m

Executing MapReduce Using Pig (24m)

The Nested Foreach Operation 3m
Analyze NYC Collision Data Using the Nested Foreach 10m
An Overview of the MapReduce Programming Model 3m
Dataflow Through a MapReduce Operation 4m
MapReduce Operations in Pig Latin 5m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...