Building Machine Learning Models in Python with scikit-learn

This course will help engineers and data scientists learn how to build machine learning models using scikit-learn, one of the most popular ML libraries in Python. No prior experience with ML is needed, only basic Python programming knowledge.
Course info
Rating
(75)
Level
Beginner
Updated
Apr 30, 2018
Duration
3h 13m
Table of contents
Course Overview
Processing Data with scikit-learn
Building Specialized Regression Models in scikit-learn
Building SVM and Gradient Boosting Models in scikit-learn
Implementing Clustering and Dimensionality Reduction in scikit-learn
Description

The Python scikit-learn library is extremely popular for building traditional ML models i.e. those models that do not rely on neural networks. In this course, Building Machine Learning Models in Python with scikit-learn, you will see how to work with scikit-learn, and how it can be used to build a variety of machine learning models. First, you will learn how to use libraries for working with continuous, categorical, text as well as image data. Next, you will get to go beyond ordinary regression models, seeing how to implement specialized regression models such as Lasso and Ridge regression using the scikit-learn libraries. Finally, in addition to supervised learning techniques, you will also understand and implement unsupervised models such as clustering using the mean-shift algorithm and dimensionality reduction using principal components analysis. At the end of this course, you will have a good understanding of the pros and cons of the various regression, classification, and unsupervised learning models covered and you will be extremely comfortable using the Python scikit-learn library to build and train your models. Software required: scikit-learn, Python 3.x.

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Analyzing Data with Qlik Sense
Intermediate
2h 11m
Jun 17, 2019
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on building machine learning models in Python with scikit-learn. A little about myself: I have a master's degree in electrical engineering from Stanford and have worked with companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. This course is a beginner's course for engineers and data scientists who want to understand and learn how to build machine learning models using scikit-learn, one of the most popular ML libraries in Python. This course covers scikit-learn support for data processing and feature extraction. You'll learn how to use libraries for working with continuous, categorical, text, as well as image data. This course goes beyond ordinary regression models. You'll understand and learn to implement specialized regression models such as lasso and ridge regression. Classification algorithms such as support vector machines and ensemble learning techniques such as gradient boosting in scikit-learn are also covered. In addition to supervised learning techniques, you'll also understand and implement unsupervised models such as clustering using the mean-shift algorithm and dimensionality reduction using principal components analysis. At the end of this course, you will have a good understanding of the pros and cons of the various regression, classification, and unsupervised learning models covered, and you'll be extremely comfortable using the Python scikit-learn library to build and train your models.

Processing Data with scikit-learn
Hi, and welcome to this course on building machine learning models in Python using scikit-learn. Scikit-learn is an extremely popular open source Python library with implementations for a wide range of machine learning problems, such as classification, regression, clustering, dimensionality reduction, and so on. This is typically the first library that a student encounters when she starts her study of machine learning, which is why we'll start off this course by understanding the different types of ML algorithms and where they might be used. We'll then see how we can work with numerical as well as categorical data. In the world of machine learning there are well understood ways in which we deal with data in a continuous range, and data which can only take on discrete values. We'll see how we can standardize numerical data when we want to feed it as an input to an ML model. Standardization of numerical data is an important prerequisite for many machine learning algorithms to get a stable and robust solution. ML algorithms only recognize numeric input, but we want our machine learning models to work with text data as well, such as for sentiment analysis. This is an important preprocessing step: we need to be able to represent our text in numerical form. We also want our machine learning algorithms to be able to work with images, which means we ought to know how we can represent pixel intensities and extract other features from images. This module will cover that as well.

Building Specialized Regression Models in scikit-learn
Hi, and welcome to this module on building specialized regression models using scikit-learn. Regression is a very common machine learning technique which is used to predict an output which is a continuous variable. The price of a house given where it's located, or the length of a sports person's career given his health and fitness: these are all examples of regression. We'll talk about regression models and how we measure the fit of a model to its underlying data. When you're building your machine learning model it's possible that your model does very well during training but performs poorly on the test data. Such models are called overfitted models. We'll speak about those and the bias-variance trade-off that is inherent when you build any machine learning model. Lasso and ridge regression are alternative methods that you can use beyond ordinary least squares regression; these mitigate the problem of overfitting. We'll also study another form of regression called support vector regression. These are built using the same principles as support vector machines for classification but use a different objective function.
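The comparison the module describes, ordinary least squares versus the regularized lasso and ridge variants, can be sketched on a synthetic dataset. The `alpha` values here are arbitrary illustrative choices, not tuned recommendations:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data: 20 features, moderate noise
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit ordinary least squares alongside L1 (lasso) and L2 (ridge) regularized models
scores = {}
for model in (LinearRegression(), Lasso(alpha=1.0), Ridge(alpha=1.0)):
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)  # R^2 on held-out data

print(scores)
```

On noisy or correlated features the regularized models often generalize better than plain least squares, which is the overfitting mitigation the transcript refers to.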

Building SVM and Gradient Boosting Models in scikit-learn
Hi and welcome to this module on building support vector machines and gradient boosting models using scikit-learn. We studied support vector machines briefly in a previous module when we spoke of support vector regression. Support vector machines are actually a very popular ML technique for classification. In this module, we will see how we can use SVMs to work on text as well as images. We'll use SVMs to classify documents by topic and to find the digit represented by an image. We spoke briefly about ensemble learning earlier, which allows us to mitigate the overfitting problem in machine learning models. Often you can have many ML models work together as an ensemble to build a stronger model. We will see how this can be done in gradient boosting regression, which uses several weak decision trees to build a stronger regression model.
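Both ideas in this module can be sketched briefly: an SVM classifying digit images (using scikit-learn's bundled digits dataset) and a gradient boosting regressor combining many weak decision trees. The hyperparameters here are illustrative defaults, not the course's exact settings:

```python
from sklearn.datasets import load_digits, make_regression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# SVM classifier: predict the digit represented by an 8x8 image
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)
svm = SVC(kernel="rbf").fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))

# Gradient boosting: an ensemble of shallow (weak) trees built sequentially
X_r, y_r = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
gbr = GradientBoostingRegressor(
    n_estimators=100, max_depth=3, random_state=0).fit(X_r, y_r)
print("GBR R^2:", gbr.score(X_r, y_r))
```

Each tree in the boosted ensemble corrects the residual errors of the trees before it, which is how many weak learners combine into a stronger model.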

Implementing Clustering and Dimensionality Reduction in scikit-learn
Hi, and welcome to this module on unsupervised learning techniques. We'll see how we can implement clustering and dimensionality reduction in scikit-learn. Clustering is a popular and elegant unsupervised learning technique which helps find patterns in the underlying data. Clustering does not use any Y variables or labels on the data. It looks at the data structure itself. Common clustering algorithms are the K-means algorithm, hierarchical clustering, and mean shift clustering. Today the problem is no longer one of scarcity of data. We have a lot of data, and a lot of that data might be meaningless. Dimensionality reduction represents the input data in terms of its most significant features, and tends to improve the performance of machine learning models. One of the most widely used techniques for dimensionality reduction is principal components analysis. We studied earlier in this course that machine learning models can be divided into two broad categories. Supervised learning techniques require labeled training data. Unsupervised learning techniques do not need labeled instances. Instead, they try to find patterns within the data itself.
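The two unsupervised techniques this module covers can be sketched on synthetic blob data. Note that mean shift discovers the number of clusters on its own, and PCA keeps only the most significant directions of variance; the dataset and parameters below are made up for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import MeanShift
from sklearn.decomposition import PCA

# Unlabeled data: three well-separated blobs (labels discarded)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Mean shift clustering: no need to specify the number of clusters up front
ms = MeanShift().fit(X)
print("clusters found:", len(ms.cluster_centers_))

# PCA: project the data onto its single most significant component
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Because neither estimator sees labels, both work purely from the structure of the data itself, which is the defining property of unsupervised learning the transcript describes.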