Course info
Dec 31, 2018
1h 33m

At the core of applied machine learning is supervised machine learning. In this course, Machine Learning with XGBoost Using scikit-learn in Python, you will learn how to build supervised learning models using one of the most accurate algorithms in existence. First, you will discover what XGBoost is and why it’s revolutionized competitive modeling. Next, you will explore the importance of data wrangling and see how clean data affects XGBoost’s performance. Finally, you will learn how to build, train, and score XGBoost models for real-world performance. When you are finished with this course, you will have a foundational knowledge of XGBoost that will help you as you move forward to becoming a machine learning engineer.

About the author

Mike has Bachelor of Science degrees in Business and Psychology. He's passionate about machine learning and data engineering.

Section Introduction Transcripts

Course Overview
Hello. My name is Mike West, and welcome to my course, Machine Learning with XGBoost Using scikit-learn in Python. While artificial neural networks are getting all the attention, a class of models known as gradient boosters is doing all the winning in the competitive modeling space. The most famous gradient booster is XGBoost, which stands for Extreme Gradient Boosting. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. Additionally, because so much of applied machine learning is supervised, XGBoost is being widely adopted as the model of choice for highly structured datasets in the real world.

This course will provide you with the foundation you'll need to build highly performant models using XGBoost. This course will introduce you to decision trees, which are used as the base model in XGBoost. XGBoost combines many decision trees into an ensemble model that predicts better than any single tree.

You'll learn how machine learning engineers massage their data into highly structured, highly cleansed arrays that machine learning models understand. You'll also learn how data is segmented into training and test sets. Separating your data is critical to detecting overfitting, and boosting algorithms like XGBoost are prone to it. Overfitting happens when the model learns the training data too well, noise included, and then performs poorly on data it hasn't seen.

Once your data has been cleansed and the XGBoost model has been trained and tested on fresh data, you'll learn how to persist, or save, those models to disk. The standard tool for saving models in Python is the pickle module. Every step in the machine learning process is critical to building highly accurate models in XGBoost. I hope you'll join me on this journey to learn more about XGBoost in Python, at Pluralsight.