Preparing Data for Machine Learning
By Janani Ravi
Course info



Description
As Machine Learning explodes in popularity, it is becoming ever more important to know precisely how to prepare the data going into the model in a manner appropriate to the problem we are trying to solve.
In this course, Preparing Data for Machine Learning, you will gain the ability to explore, clean, and structure your data in ways that get the best out of your machine learning model.
First, you will learn why data cleaning and data preparation are so important, and how missing data, outliers, and other data-related problems can be solved. Next, you will discover how models that read too much into data suffer from a problem called overfitting, in which models perform well under test conditions but struggle in live deployments. You will also understand how models that are trained with insufficient or unrepresentative data suffer from a different set of problems, and how these problems can be mitigated.
Finally, you will round out your knowledge by applying different methods for feature selection, dealing with missing data using imputation, and building your models using the most relevant features.
When you’re finished with this course, you will have the skills and knowledge to identify the right data procedures for data cleaning and data preparation to set your model up for success.
Section Introduction Transcripts
Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Preparing Data for Machine Learning. A little about myself: I have a Master's degree in Electrical Engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. As machine learning explodes in popularity, it is becoming even more important to know precisely how to prepare the data going into the model in a manner appropriate to the problem we're trying to solve. In this course, you will gain the ability to explore, clean, and structure your data in ways that get the best out of your machine learning model. First, you will learn why data cleaning and data preparation are so important, and how missing data, outliers, and other data-related problems can be solved. Next, you will discover how models that read too much into data suffer from a problem called overfitting, in which models perform well under test conditions but struggle in live deployments. You will also understand how models that are trained with insufficient or unrepresentative data suffer from a different set of problems, and how these problems can be mitigated. Finally, you will round out your knowledge by applying different methods for feature selection, dealing with missing data using imputation, and building your models using the most relevant features. When you're finished with this course, you will have the skills and knowledge to identify the right data procedures for data cleaning and data preparation to set your model up for success.