Applying MapReduce to Common Data Problems

Knowing how to program MapReduce is only half the battle. In this course, you'll learn how to choose and set up the right MapReduce pattern for what you want to accomplish.
Course info
Rating
(56)
Level
Beginner
Updated
Oct 26, 2016
Duration
2h 2m
Description

This course, Applying MapReduce to Common Data Problems, helps you with three distinct MapReduce patterns: summarizing numeric data, filtering large datasets, and building an index for fast data lookup. First, you'll learn how to start "Thinking MapReduce": what's involved, and how a problem needs to be broken down before you can express it in these terms. Next, you'll explore how to compute numeric summary metrics and how to filter large datasets. Finally, you'll wrap up the course by learning about building indices and why an inverted index is so important in the context of search engines. After watching this course, you'll have the confidence to spot patterns in MapReduce problems and will be on your way to mastering this programming model.
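
To make the numeric summarization pattern a little more concrete, here is a minimal sketch in plain Java. It is not taken from the course and does not use the Hadoop API: the class name `NumericSummarySketch`, the `group,value` record format, and the in-memory grouping that stands in for the shuffle step are all assumptions made for illustration. The mapper emits a key-value pair per record, and the reducer combines all values for a key into an average.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, framework-free sketch of the numeric summarization pattern.
public class NumericSummarySketch {

    // Map phase: turn one "group,value" record (assumed format) into a (key, value) pair.
    static Map.Entry<String, Double> map(String record) {
        String[] fields = record.split(",");
        return new SimpleEntry<>(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    // Reduce phase: combine every value that shares a key into its average.
    static double reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Driver: groups mapper output by key in memory, standing in for the shuffle,
    // then calls the reducer once per key.
    static Map<String, Double> run(List<String> records) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Double> pair = map(record);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            averages.put(entry.getKey(), reduce(entry.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample records, used only to exercise the sketch.
        List<String> records = Arrays.asList("groupA,40", "groupB,45", "groupA,50");
        System.out.println(run(records));
    }
}
```

In a real cluster the grouping would be done by the framework between the map and reduce phases; the sketch only shows which key-value pairs the mapper produces and which combining operation the reducer applies.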

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Using PyTorch in the Cloud: PyTorch Playbook
Intermediate
2h 21m
Apr 25, 2019
Building Clustering Models with scikit-learn
Intermediate
2h 33m
Apr 24, 2019
Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and I'm very happy to meet you today. I have a master's degree in Electrical Engineering from Stanford, and have worked with companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loony Corn, a studio for high-quality video content. This course focuses on the backbone of data technologies, the MapReduce programming model. You'll see that different problems require different MapReduce design patterns, and you'll study three of the most common ones: numeric summarization, filtering records, and building an index. This course will help you see how to identify the key-value output of the mapper and the combining operation performed in the reducer for summarization, filtering, and indexing problems. All of this is accompanied by actual code-alongs in Java, so you can see your solutions come to life. You'll see how marital status affects the working hours of individuals based on census data, and you'll build an inverted index to help power your own basic search engine.
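
As a rough illustration of what "the key-value output of the mapper" and "the combining operation in the reducer" can look like for an inverted index, here is a minimal sketch in plain Java. It is not the course's code and does not use the Hadoop API: the class name `InvertedIndexSketch`, the tokenization rule, and the in-memory grouping that stands in for the shuffle are assumptions made for illustration. The mapper emits a (word, document ID) pair for every word, and the reducer combines the document IDs for each word into a sorted, duplicate-free set.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, framework-free sketch of the inverted index pattern.
public class InvertedIndexSketch {

    // Map phase: emit one (word, docId) pair for every word in the document.
    // Lowercasing and splitting on non-word characters is an assumed tokenization.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, docId));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all docIds seen for one word into a sorted set,
    // which also removes duplicates.
    static Set<String> reduce(List<String> docIds) {
        return new TreeSet<>(docIds);
    }

    // Driver: groups mapper output by word in memory, standing in for the shuffle,
    // then calls the reducer once per word.
    static Map<String, Set<String>> run(Map<String, String> documents) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, String> pair : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            index.put(entry.getKey(), reduce(entry.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample documents, used only to exercise the sketch.
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("doc1", "MapReduce builds an index");
        documents.put("doc2", "An index powers search");
        System.out.println(run(documents));
    }
}
```

Looking up a word in the resulting map returns the documents that contain it, which is the basic lookup a simple search engine needs.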