Course info
Sep 22, 2016
1h 48m

Processing millions of records requires that you first understand the art of breaking down your tasks into parallel processes. The MapReduce programming model, part of the Hadoop ecosystem, gives you a framework for defining your solution in terms of parallel tasks, whose outputs are then combined to give you the final result. In this course, Understanding the MapReduce Programming Model, you'll get an introduction to the MapReduce paradigm. First, you'll learn to visualize how data flows through the map, partition, shuffle, and sort phases before it reaches the reduce phase, which produces the final result. Next, you'll be guided through your very first MapReduce program in Java. Finally, you'll learn to extend the framework's Mapper and Reducer classes to plug in your own logic and then run this code on your local machine without using a Hadoop cluster. By the end of this course, you will be able to break big data problems into parallel tasks to tackle large-scale data munging operations.
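
To make the phases named above concrete, here is a minimal word-count sketch in plain Java. It is not the Hadoop API and is not taken from the course: the class name WordCountSketch, its method names, and the hash-based partition rule are all assumptions chosen for illustration. It only imitates, in a single process, how records might move through map, partition, shuffle and sort, and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical single-process imitation of the MapReduce phases (no Hadoop classes used).
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Partition phase: pick which reducer receives a key (assumed hash-based rule).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: collapse all values seen for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static SortedMap<String, Integer> run(List<String> lines, int numReducers) {
        // One bucket per reducer. Each TreeMap groups values by key and keeps
        // keys sorted, standing in for the shuffle and sort phases.
        List<SortedMap<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new TreeMap<>());
        }

        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int p = partition(pair.getKey(), numReducers);
                buckets.get(p)
                       .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Each bucket is reduced independently; the results are merged at the end.
        SortedMap<String, Integer> result = new TreeMap<>();
        for (SortedMap<String, List<Integer>> bucket : buckets) {
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {cat=1, dog=1, sat=2, the=2}
        System.out.println(run(Arrays.asList("the cat sat", "the dog sat"), 2));
    }
}
```

Because every line is mapped independently and every bucket is reduced independently, both loops could in principle run in parallel; that independence is what a real cluster would exploit.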

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked at Google for 7+ years. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
Hi! My name is Janani, and I'm very happy to meet you today. I have a master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs. And I hold four patents for it on the _____ Technologies. This course, Understanding MapReduce, introduces you to the backbone of big data technologies--the MapReduce programming model. MapReduce is beautiful in its simplicity, but courses on MapReduce tend to focus on the implementation rather than on a meta-level understanding of how data flows and is transformed in this system. MapReduce is the foundation of the art of thinking parallel, and this course helps you develop this art. This course will walk you through what MapReduce is from first principles. It will guide you as you write your first MapReduce Hello World program. And then, finally, it will show you how you can optimize the MapReduce process by taking advantage of the inherent parallelism available in a distributed computing environment.