Applying statistical techniques to your data within Azure Machine Learning Service will often boost model performance. This course will teach you the basics of data cleansing, including basic syntax and functions.
At the core of applied machine learning is data. In this course, Building Features from Nominal and Numeric Data in Microsoft Azure, you will learn how to cleanse data within the confines of Azure Machine Learning Service. First, you will discover the options you have within Azure Machine Learning Service for building your models end to end. Next, you will explore the importance of applying statistical techniques to your data to improve model performance. Finally, you will learn how to apply various data cleansing techniques to your data to enhance real-world performance. When you are finished with this course, you will have a foundational knowledge of Azure Machine Learning Service and a solid understanding of how to apply statistical techniques to your data, which will help you as you move forward to becoming a machine learning engineer.
Course Overview
Hello everyone. My name is Mike West. Welcome to my course, Building Features from Nominal and Numeric Data in Microsoft Azure. Applied machine learning is data driven. In the real world, data is dirty, and it's the responsibility of the machine-learning engineer to ensure that the data used to build models is well cleansed. Microsoft is a top player in the machine-learning and data-science space. This course is a quick introduction to data cleansing within the confines of Microsoft Azure Machine Learning Service. The course will introduce you to the machine-learning process. Within that process are various statistical and programmatic techniques you can apply to your data. Applying these techniques to your data prior to modeling will often help you get better performance from your models. Within Azure Machine Learning Service are Notebook Virtual Machines. These Notebook VMs are almost identical to Jupyter Notebooks, the gold standard for building end-to-end models in the applied space. The course will introduce you to Gaussian distributions. Gaussian distributions are very well understood, so much so that large parts of the field of statistics are dedicated to methods for this distribution. You'll learn how machine-learning engineers use imputation to fill gaps in their data. Machine-learning models don't like missing values, and handling these values is often critical to the model's performance. You'll learn multiple approaches to standardizing and scaling your data. Whenever you're dealing with features that differ from each other in their range of values, you'll often normalize the data so that the difference in these ranges does not affect your outcome. You'll learn about the different types of data and statistics and some recommended approaches to handling categorical data. Categorical data is data that's not numeric.
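To make the techniques named in the overview concrete, here is a minimal sketch of mean imputation, standardization, min-max scaling, and one-hot encoding. It uses only the Python standard library so that it runs anywhere. The function names and the sample values are made up for illustration and are not taken from the course, which may well use library helpers inside a Notebook VM instead.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Fill gaps (None) with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit population standard deviation.

    Assumes the values are not all identical (non-zero spread).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale to the range [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each label as a 0/1 vector, one column per sorted level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


# Hypothetical sample data: a numeric feature with a gap and a nominal feature.
ages = [20, None, 40, 60]
colors = ["red", "blue", "red"]

filled = impute_mean(ages)       # the gap becomes the mean of 20, 40, 60
z_scores = standardize(filled)   # centered on 0 with unit spread
scaled = min_max_scale(filled)   # smallest value maps to 0, largest to 1
encoded = one_hot(colors)        # columns are ordered "blue", "red"
```

Standardization is usually the better fit when a feature is roughly Gaussian, while min-max scaling is handy when a bounded range is needed. Which one helps a given model is an empirical question.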
Data preparation is one of the most important factors in the outcome of your machine-learning models. I hope you'll join me on this journey to learn more about Building Features from Nominal and Numeric Data in Microsoft Azure, at Pluralsight.