Preparing Data for Modeling with scikit-learn

This course covers important data pre-processing steps, including standardization, normalization, novelty and outlier detection, preparing image and text data, and explicit kernel approximations such as the RBF and Nystroem methods.

What you'll learn

Even as the number of machine learning frameworks and libraries keeps growing, scikit-learn remains popular. Scikit-learn makes the common use cases in machine learning - clustering, classification, dimensionality reduction, and regression - easy to implement. In this course, Preparing Data for Modeling with scikit-learn, you will gain the ability to pre-process data appropriately, identify outliers, and apply kernel approximations. First, you will learn how pre-processing techniques such as standardization and scaling help improve the efficacy of ML algorithms. Next, you will discover how novelty and outlier detection are implemented in scikit-learn. Then, you will understand the typical steps needed to work with both text and image data in scikit-learn. Finally, you will round out your knowledge by applying implicit and explicit kernel transformations to map data into higher dimensions. When you’re finished with this course, you will have the skills and knowledge to identify the correct data pre-processing technique for your use case and to detect outliers using theoretically robust techniques.
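As a rough illustration of the steps named above (not taken from the course material), the sketch below standardizes numeric data, flags outliers with an isolation forest, and builds an explicit approximation of RBF kernel features. The synthetic data, the planted outliers, and every parameter value are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.kernel_approximation import RBFSampler
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 200 points in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
X[:5] += 200.0  # shift five rows far from the rest to act as outliers

# Standardization: rescale each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Outlier detection: fit_predict returns +1 for inliers and -1 for outliers.
detector = IsolationForest(random_state=0)
labels = detector.fit_predict(X_scaled)

# Explicit kernel approximation: map the data to 50 random features
# whose inner products approximate an RBF kernel.
rbf = RBFSampler(gamma=1.0, n_components=50, random_state=0)
X_features = rbf.fit_transform(X_scaled)

print("flagged as outliers:", int((labels == -1).sum()))
print("approximate kernel feature shape:", X_features.shape)
```

The transformed features can be passed to a linear model, which is one common way to compare explicit kernel features against a kernelized classifier.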

Course Overview

1min

Course Overview 2m

Preparing Numeric Data for Machine Learning

46mins

Understanding and Implementing Novelty and Outlier Detection

47mins

Module Overview 1m
Outliers and Novelties 3m
Detecting and Coping with Outlier Data 4m
Local Outlier Factor 3m
Elliptic Envelope 3m
Isolation Forest 4m
Outlier Detection Using Local Outlier Factor 7m
Outlier Detection Using Isolation Forest 5m
Outlier Detection Using Elliptic Envelope 3m
Novelty Detection Using Local Outlier Factor 5m
Using the Predict, Score Samples, and Decision Function Methods 3m
Outlier Detection Using the Head Brain Dataset 4m
Module Summary 1m

Preparing Text Data for Machine Learning

30mins

Module Overview 1m
Representing Text Data in Numeric Form 5m
Bag-of-words and Bag-of-n-grams Models 3m
Vectorize Text Using the Bag-of-words Model 5m
Vectorize Text Using the Bag-of-n-grams Model 3m
Vectorize Text Using Tf-Idf Scores 3m
Hashing for Dimensionality Reduction 3m
Reducing Dimensions Using the Hashing Vectorizer 3m
Performing Feature Extraction on a Python Dictionary 2m
Module Summary 1m

Preparing Image Data for Machine Learning

34mins

Module Overview 1m
Representing Images as Matrices 3m
Feature Extraction from Images 6m
Extracting Patches from Image Data 4m
Using Dictionary Learning to Denoise and Reconstruct Images 7m
Clustering Image Data Using a Pixel Connectivity Graph 7m
Clustering Images Using a Gradient Connectivity Graph 6m
Module Summary 1m

Working with Specialized Datasets

27mins

Module Overview 1m
Internal, Artificial, and External Datasets in scikit-learn 3m
Exploring Internal Datasets 7m
Creating Artificial Datasets for Regression, Classification, Clustering, and Dimensionality Reduction 8m
Generating Manifold Data 7m
Module Summary 1m

Performing Kernel Approximations

32mins

Module Overview 1m
Support Vector Classifiers and the Kernel Trick 4m
Kernel Approximations 7m
Preparing Image Data 5m
Comparing Classifiers Trained Using Implicit and Explicit Features 7m
Comparing Accuracy and Runtime for Different Sample Sizes 7m
Summary and Further Study 2m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
