Building Features from Numeric Data

by Janani Ravi

This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.

Course info

Rating

(24)

Level

Beginner

Updated

Apr 8, 2019

Duration

2h 25m

What you'll learn

The quality of the preprocessing applied to numeric data is an important determinant of the results of machine learning models built using that data. With smart, optimized data preprocessing, you can significantly speed up model training and validation, saving both time and money, and greatly improve model performance in prediction.

In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines.

First, you will learn the importance of normalization, standardization and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as dispersion of a data feature.
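
As a rough illustration of what adjusting central tendency and dispersion means, the sketch below standardizes and min-max scales a small list in plain Python. The function names and the sample data are hypothetical and are not taken from the course; scikit-learn's StandardScaler and MinMaxScaler apply the same idea per feature column.

```python
import statistics

def standardize(values):
    """Shift to zero mean and rescale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; a constant column would divide by zero
    return [(v - mean) / std for v in values]

def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map the observed minimum to `low` and the maximum to `high`."""
    v_min, v_max = min(values), max(values)
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample data
z_scores = standardize(heights_cm)   # centered on 0 with unit spread
scaled = min_max_scale(heights_cm)   # squeezed into the range [0, 1]
```

Standardization changes both the center and the spread of the feature, while min-max scaling only fixes the range, so a single extreme value can compress everything else toward one end.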

Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1-norm, L2-norm and Max norm, seek to transform feature vectors to have uniform magnitude. Such techniques find wide usage in ML model building - for instance in computing the cosine similarity of document vectors, and in transforming images before techniques such as convolutional neural networks are applied to them.
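
A minimal sketch of the three norms mentioned above, assuming each sample is a plain list of floats. The helper names and the toy document vectors are hypothetical; scikit-learn exposes the same choices through the `norm` parameter of `sklearn.preprocessing.normalize` and the `Normalizer` class.

```python
import math

def normalize(vector, norm="l2"):
    """Divide a vector by its L1, L2 or max norm so the chosen norm becomes 1."""
    if norm == "l1":
        size = sum(abs(x) for x in vector)
    elif norm == "l2":
        size = math.sqrt(sum(x * x for x in vector))
    elif norm == "max":
        size = max(abs(x) for x in vector)
    else:
        raise ValueError(f"unknown norm: {norm}")
    return [x / size for x in vector]  # an all-zero vector would divide by zero

def cosine_similarity(a, b):
    """Cosine similarity is the dot product of the L2-normalized vectors."""
    unit_a, unit_b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(unit_a, unit_b))

# Toy term-count vectors: doc_b is doc_a repeated twice, doc_c shares no terms.
doc_a = [3.0, 4.0, 0.0]
doc_b = [6.0, 8.0, 0.0]
doc_c = [0.0, 0.0, 5.0]

same_direction = cosine_similarity(doc_a, doc_b)  # close to 1.0
orthogonal = cosine_similarity(doc_a, doc_c)      # 0.0
```

Because normalization removes magnitude, a document and a longer copy of itself end up with the same unit vector, which is why cosine similarity is a common choice for comparing document vectors of different lengths.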

You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations - the Yeo-Johnson transform, the Box-Cox transform and the quantile transformation - in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell curve shape that many models work best with.
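
The two power transforms named above can be sketched for a single value and a fixed exponent as follows. This is an illustration under a stated assumption: `lmbda` is chosen by hand here, whereas scikit-learn's PowerTransformer estimates it from the data by maximum likelihood. The function names are hypothetical.

```python
import math

def box_cox(x, lmbda):
    """Box-Cox transform of a strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for strictly positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda

def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform for a fixed lambda; accepts zero and negative values."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)

# Made-up right-skewed sample: each value doubles the previous one.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
logged = [box_cox(v, 0) for v in skewed]  # lambda = 0 reduces to the natural log
```

With lambda set to 0, Box-Cox is the plain log transform, so the doubling sequence becomes evenly spaced. Yeo-Johnson follows the same idea but extends it to zero and negative inputs, which Box-Cox cannot handle.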

When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.

Course Overview

1min

Course Overview 2m

Using Numeric Data in Machine Learning Algorithms

54mins

Building Features Using Normalization

34mins

Module Overview 1m
What Is Normalization? 2m
Normalization and Cosine Similarity 8m
Demo: Cosine Similarity and the L2 Norm 7m
Demo: Normalizing Data to Simplify Cosine Similarity Calculations 4m
Demo: K-means Clustering with Cosine Similarity 4m
L1, L2 and Max Norms 3m
Demo: Normalization Using L1, L2 and Max Norms 5m
Summary 1m

Building Features Using Scaling and Transformations

54mins

Module Overview 1m
Converting Continuous Data to Categorical 3m
Demo: Convert Numeric Data to Binary Categories Using a Binarizer 5m
Demo: Using the KBinsDiscretizer to Categorize Numeric Values 6m
Demo: Using Bin Values to Flag Outliers 3m
Scaling Data 2m
Demo: Scaling with the MaxAbsScaler 2m
Demo: Scaling with the MinMaxScaler 3m
Custom Transformations 1m
Demo: Performing Custom Transforms Using the FunctionTransformer 3m
Generating Polynomial Features 2m
Demo: Using Polynomial Features to Transform Data 6m
Transforming Features to Gaussian-like Distributions Using Power Transformers 1m
Demo: Working with Chi Squared Distributed Input Features 5m
Demo: Applying Power Transformers to Get Normal Distributions 4m
Transforming Data to Normal or Uniform Distributions Using Quantile Transformers 1m
Demo: Transforming to a Normal Distribution Using the QuantileTransformer 4m
Summary and Further Study 2m
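
Several of the transforms named in this module are simple enough to sketch by hand. The plain-Python functions below are hypothetical stand-ins, not the course's demo code; they mirror what scikit-learn's Binarizer, KBinsDiscretizer (with equal-width bins and ordinal output), MaxAbsScaler and PolynomialFeatures (degree 2, two input features) compute, under the assumption of a single small feature column.

```python
def binarize(values, threshold=0.0):
    """Map each value to 1 if it is strictly above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def uniform_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    v_min, v_max = min(values), max(values)
    width = (v_max - v_min) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - v_min) / width), n_bins - 1) for v in values]

def max_abs_scale(values):
    """Divide by the largest absolute value so results fall in [-1, 1]."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]

def degree_two_features(a, b):
    """Expand two features into [1, a, b, a^2, a*b, b^2]."""
    return [1.0, a, b, a * a, a * b, b * b]

ages = [18.0, 25.0, 40.0, 62.0, 98.0]  # made-up sample data
over_thirty = binarize(ages, threshold=30.0)
age_bins = uniform_bins(ages, n_bins=4)
```

Binning can also help flag unusual values: a sample that sits alone in the first or last bin is a candidate outlier worth a closer look.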

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
