Preparing Data for Modeling with scikit-learn

This course covers important steps in the pre-processing of data, including standardization, normalization, novelty and outlier detection, pre-processing image and text data, as well as explicit kernel approximations such as the RBF and Nystroem methods.
Course info
Level
Advanced
Updated
Aug 12, 2019
Duration
3h 41m
Table of contents
Course Overview
Preparing Numeric Data for Machine Learning
Understanding and Implementing Novelty and Outlier Detection
Preparing Text Data for Machine Learning
Working with Specialized Datasets
Performing Kernel Approximations
Preparing Image Data for Machine Learning
Description

Even as the number of machine learning frameworks and libraries increases on a daily basis, scikit-learn is retaining its popularity with ease. Scikit-learn makes the common use cases in machine learning (clustering, classification, dimensionality reduction, and regression) incredibly easy. In this course, Preparing Data for Modeling with scikit-learn, you will gain the ability to appropriately pre-process data, identify outliers, and apply kernel approximations. First, you will learn how pre-processing techniques such as standardization and scaling help improve the efficacy of ML algorithms. Next, you will discover how novelty and outlier detection are implemented in scikit-learn. Then, you will understand the typical set of steps needed to work with both text and image data in scikit-learn. Finally, you will round out your knowledge by applying implicit and explicit kernel transformations to map data into higher dimensions. When you’re finished with this course, you will have the skills and knowledge to identify the correct data pre-processing technique for your use case and detect outliers using theoretically robust techniques.
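As a rough illustration of the three themes named above (scaling, outlier detection, and explicit kernel approximation), here is a minimal sketch. It is not taken from the course material: the synthetic data, the planted outlier, and every parameter value (`n_neighbors=20`, `n_components=10`, the seeds) are assumptions chosen only for demonstration.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 100 points around (5, 5),
# with one obvious outlier planted at index 0.
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
X[0] = [50.0, 50.0]

# Standardization: rescale each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Outlier detection: LocalOutlierFactor labels outliers -1 and inliers 1.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X_scaled)

# Explicit kernel approximation: Nystroem builds an approximate RBF
# feature map with n_components output features.
nystroem = Nystroem(kernel="rbf", n_components=10, random_state=0)
X_features = nystroem.fit_transform(X_scaled)

print(X_features.shape)
print(labels[0])
```

In this sketch the scaler is fit on data that still contains the planted outlier, which inflates the estimated variance. In practice, detecting outliers first or using a robust scaler may be preferable.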

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Building Features from Image Data
Advanced
2h 10m
Aug 13, 2019
Designing a Machine Learning Model
Intermediate
3h 25m
Aug 13, 2019
Building Features from Nominal Data
Intermediate
2h 40m
Aug 12, 2019
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
[Autogenerated] Hi, my name is Janani Ravi, and welcome to this course on Preparing Data for Modeling with scikit-learn. A little about myself: I have a master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. Even as the number of machine learning frameworks and libraries increases on a daily basis, scikit-learn is retaining its popularity with ease. Scikit-learn makes the common use cases in machine learning (clustering, classification, dimensionality reduction, and regression) incredibly easy. In this course, you will gain the ability to appropriately pre-process data, identify outliers, and apply kernel approximations. First, you will learn how pre-processing techniques such as standardization and scaling help improve the efficacy of ML algorithms. Next, you will discover how novelty and outlier detection is implemented in scikit-learn. You will then understand the typical set of steps needed to work with both text and image data in scikit-learn. Finally, you'll round out your knowledge by applying implicit and explicit kernel transformations to transform data to higher dimensions. When you're finished with this course, you will have the skills and knowledge to identify the correct data pre-processing technique for your use case, and you'll be able to detect outliers in your dataset using theoretically robust techniques.