XGBoost is one of the most successful supervised machine learning algorithms in competitive modeling on structured datasets. This course teaches you the fundamentals of XGBoost, including its basic syntax and core functions, and how to apply the model to real-world data.
At the core of applied machine learning is supervised machine learning. In this course, Machine Learning with XGBoost Using scikit-learn in Python, you will learn how to build supervised learning models using one of the most accurate algorithms in existence. First, you will discover what XGBoost is and why it has revolutionized competitive modeling. Next, you will explore the importance of data wrangling and see how clean data affects XGBoost's performance. Finally, you will learn how to build, train, and score XGBoost models for real-world performance. When you are finished with this course, you will have a foundational knowledge of XGBoost that will help you as you work toward becoming a machine learning engineer.
Preparing Data for Gradient Boosting

Hello, my name is Mike West, and welcome back to Introduction to XGBoost Using Scikit-learn in Python. In this module, you'll learn how data is prepared for machine learning models. A model is only as good as the data passed into it. In machine learning, the process of massaging data into a modellable state is called data wrangling; you'll learn what data wrangling is and why it's so important to the machine learning process.

The module also discusses the AI hierarchy. You'll learn about the two core types of models: artificial neural networks and traditional models. XGBoost is a traditional model, but data wrangling is the same process for traditional models as it is for artificial neural networks.

Machine learning is separated into two core types of learning: supervised and unsupervised. This module covers the difference between the two and why most applied machine learning is supervised learning. You'll also learn about the applied machine learning world versus the world of research and academia. Supervised learning is all about data, and real-world machine learning is all about cleaning data and modeling that cleansed dataset.

The module covers the array at a high level. Linear algebra is the mathematics of structured data, and the array is the core object that houses that data. You'll learn what an array is and how to navigate arrays using indexes, and you'll become familiar with data cleansing within the context of an array.

Machine learning is very process oriented: machine learning engineers follow the same steps to build predictive models. This module covers that process and explains the importance of data wrangling within it. Lastly, you'll wrangle the Titanic dataset and use XGBoost to create a highly accurate model against that cleansed dataset.
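The array navigation and cleansing described above can be sketched with NumPy. This is a minimal illustration with made-up values, not the course's dataset:

```python
import numpy as np

# A small 2-D array: rows are samples, columns are features.
data = np.array([[22.0, 7.25, 0.0],
                 [38.0, 71.28, 1.0],
                 [26.0, 7.92, 1.0]])

first_row = data[0]        # index a whole row
age_column = data[:, 0]    # slice out a column
cell = data[1, 2]          # a single element: row 1, column 2

# Simple cleansing inside an array: replace NaN entries with the column mean.
raw = np.array([[22.0, np.nan],
                [38.0, 80.0],
                [np.nan, 10.0]])
col_means = np.nanmean(raw, axis=0)            # per-column mean, ignoring NaN
cleaned = np.where(np.isnan(raw), col_means, raw)
```

Row indexing, column slicing, and boolean masking like `np.isnan` are the building blocks of most array-based wrangling.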
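Typical wrangling steps on a Titanic-style table can be sketched with pandas. The column names mirror the real dataset, but the four rows here are made up purely for illustration:

```python
import pandas as pd

# A tiny, made-up stand-in for the Titanic dataset (illustration only).
df = pd.DataFrame({
    "sex":      ["male", "female", "female", "male"],
    "age":      [22.0, 38.0, None, 35.0],
    "fare":     [7.25, 71.28, 7.92, 8.05],
    "survived": [0, 1, 1, 0],
})

# Two common wrangling steps:
# 1. Impute missing numeric values (here, with the column median).
df["age"] = df["age"].fillna(df["age"].median())
# 2. Encode a categorical column as numbers the model can consume.
df["sex"] = df["sex"].map({"male": 0, "female": 1})

# Split into features (X) and the label (y), ready for a model.
X = df[["sex", "age", "fare"]].to_numpy()
y = df["survived"].to_numpy()
```

Once every column is numeric and free of missing values, the resulting arrays can be passed straight into XGBoost or any other scikit-learn-compatible model.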
Selecting Features in Gradient Boosting

Hello, my name is Mike West, and welcome back to Introduction to XGBoost Using Scikit-learn in Python. In this module, you'll learn about feature engineering. When your goal is to get the best possible results from a predictive model, you need to get the most you can from the data you have. Features are the numbers fed into the model; if you're working with structured data, think of a feature as a column in an array or table.

There are three general classes of feature selection algorithms: filter methods, wrapper methods, and embedded methods. Each is defined in this module. You can use the wrong model, or a less optimal one, and still get good results, because most models can pick up on well-structured data. The flexibility of good features lets you use less complex models that are faster to run, easier to understand, and easier to maintain.

One benefit of gradient boosting is that after the boosted trees are constructed, you can retrieve an importance score for each attribute. XGBoost is a gradient boosting model. Generally, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model.

This module also covers feature construction, the technique most people are referring to when they talk about feature engineering. It is the process of manually constructing new attributes from raw data: intelligently combining or splitting existing raw features into new ones with higher predictive power. Finally, we'll use feature selection and feature engineering in various demonstrations, all of which use XGBoost.