Building Features from Numeric Data

This course covers the data preprocessing techniques and transforms available in scikit-learn, showing you how to construct features that are scaled, normalized, and transformed in mathematically sound ways so that machine learning techniques can be applied to their full potential.
Course info
Level
Beginner
Updated
Apr 8, 2019
Duration
2h 25m
Table of contents
Course Overview
Using Numeric Data in Machine Learning Algorithms
Building Features Using Normalization
Building Features Using Scaling and Transformations
Description

The quality of preprocessing that numeric data is subjected to is an important determinant of the results of machine learning models built using that data. With smart, optimized data preprocessing, you can significantly speed up model training and validation, saving both time and money, as well as greatly improve model performance in prediction. In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines.

First, you will learn the importance of normalization, standardization, and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as the dispersion of a data feature. Next, you will discover how to identify and deal with outliers and possibly anomalous data.

You will then learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1 norm, L2 norm, and max norm, seek to transform feature vectors to have uniform magnitude. They find wide use in ML model building, for instance in computing the cosine similarity of document vectors and in transforming images before techniques such as convolutional neural networks are applied to them.

You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations, the Yeo-Johnson transform, the Box-Cox transform, and the quantile transformation, in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell curve shape that many models work best with.
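As a taste of the normalization techniques described above, here is a minimal sketch using scikit-learn's `normalize` helper; the array values are purely illustrative:

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two example feature vectors (values are illustrative)
X = np.array([[3.0, 4.0],
              [1.0, -1.0]])

l2 = normalize(X, norm="l2")   # each row scaled to unit Euclidean (L2) length
l1 = normalize(X, norm="l1")   # absolute values of each row sum to 1
mx = normalize(X, norm="max")  # each row divided by its largest absolute value

print(l2[0])  # [0.6 0.8] -- the unit vector in the direction of (3, 4)
```

After L2 normalization, the dot product of two row vectors is exactly their cosine similarity, which is why this step is common in document-vector pipelines.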
When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.
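The three transformations compared at the end of the course can be sketched as follows; this minimal example assumes scikit-learn's `PowerTransformer` and `QuantileTransformer`, and the lognormal sample is synthetic:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

rng = np.random.default_rng(0)
# Lognormal data: heavily right-skewed and strictly positive
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Yeo-Johnson accepts any real-valued input
yj = PowerTransformer(method="yeo-johnson").fit_transform(X)
# Box-Cox requires strictly positive input
bc = PowerTransformer(method="box-cox").fit_transform(X)
# Quantile transform maps the empirical distribution onto a normal
qt = QuantileTransformer(output_distribution="normal",
                         n_quantiles=100).fit_transform(X)
```

`PowerTransformer` standardizes its output by default, so the transformed columns come out with zero mean and unit variance; all three results are far closer to a bell curve than the raw lognormal input.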

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Building Features from Image Data
Advanced
2h 10m
Aug 13, 2019
Designing a Machine Learning Model
Intermediate
3h 25m
Aug 13, 2019
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
(Music playing) Hi, my name is Janani Ravi, and welcome to this course on Building Features from Numeric Data. A little about myself, I have a master's degree in electrical engineering from Stanford, and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high quality video content. In this course, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines. First, you will learn the importance of normalization, standardization, and scaling, and understand the intuition and mechanics of tweaking the central tendency, as well as dispersion of a data feature. Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1 norm, L2 norm, and the max norm, seek to transform feature vectors to have uniform magnitude. You will then move on to scaling and transforming data. Such transformations include quantization, as well as the construction of custom transformers for bespoke use cases. Finally, you'll explore how to implement log and power transformations. You'll round out the course by comparing the results of three important transformations, the Yeo-Johnson transform, the Box-Cox transform, and the quantile transformation, in converting data with non-normal characteristics into the familiar bell curve shape that many models work best with. When you're finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.
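The quantization and custom-transformer steps mentioned in the overview can be sketched as follows; this is a minimal example assuming scikit-learn's `FunctionTransformer` and `KBinsDiscretizer`, and the data values are made up:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, KBinsDiscretizer

X = np.array([[1.0], [10.0], [100.0], [1000.0]])  # values span several orders of magnitude

# A custom transformer wrapping a log transform, with its inverse for round-tripping
log_tf = FunctionTransformer(np.log1p, inverse_func=np.expm1)
X_log = log_tf.fit_transform(X)

# Quantization: bin raw values into ordinal buckets of equal width
binner = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(X)  # first three values land in bin 0, the last in bin 1
```

`FunctionTransformer` is the simplest way to drop a bespoke function into a scikit-learn pipeline; supplying `inverse_func` keeps `inverse_transform` available downstream.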