Course

Data Science & Hadoop Workflows at Scale With Scalding

Learn how to use Scalding and Algebird and join Twitter, Etsy, eBay, and others to efficiently extract value and process data at scale on Hadoop.

Intermediate

2h 8m

(68)

Created by Ahmad Alkilani

Last Updated Nov 01, 2023

This course is included in the libraries shown below:

Data

What you'll learn

This course teaches you how to use Scalding, a domain-specific language built on Scala and Cascading, to build distributed applications on Hadoop. It also covers the data science side using Algebird, an abstract algebra library for Scala, to solve real-world sketching and streaming problems on distributed systems. You will learn how to reason about a variety of problems, how to build and test locally, and how to deploy on Hadoop.

You will also learn the algorithms used to solve problems at scale, where performance, compute and memory resources, and the window of time available to process streaming data are all constraints, and how Scalding and Algebird help you work within them. The course also covers some Scala basics to get you up to speed and looks at how to monitor, visualize, and troubleshoot your application's workflow and performance problems.

Watch this course if you are considering, or already know how to use, Pig, Hive, or another DSL for Hadoop and want not only more power over your workflows but also a DSL that is actively being developed to support up-and-coming execution frameworks such as Apache Tez and Apache Spark, with all the flexibility that a full functional programming language like Scala has to offer. If you're serious about learning how to build enterprise-grade applications on Hadoop, data science, and Lambda architectures, then this course is for you.

About the author

Ahmad Alkilani

4 courses

4.5 author rating

937 ratings

Ahmad is a Data Architect specializing in the implementation of high-performance data warehouses and BI systems and enjoys speaking at various user groups and conferences.

Table of contents

Introduction to Scalding 37m

Building Applications With Scalding 37m

Scalding on Hadoop 16m

Data Science With Scalding 37m
