Everything you need to know about machine learning: part 2
This is the second of a three-part series. Part 1 covered the basics of machine learning and introduced some basic terminology, along with a high-level example that took you from your raw data all the way to testing a model. This article takes a more in-depth look at Microsoft Azure Machine Learning (MAML) and how to access it via Web services, and Part 3 will go through some examples. Here in Part 2, we'll walk through that process again with a practical example based on the Titanic data set, and we'll finish by making a prediction from new input values. Since MAML runs on Microsoft Azure’s public cloud, I'm going to show you how to use Web services both to send the input data and to get the prediction back. The next section jumps straight into a hands-on example using the terminology introduced in Part 1, so it might be a good idea to re-read the first part as a quick refresher.
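Sending input to a Web service and reading back a prediction usually comes down to posting a JSON body over HTTPS. The Python sketch below only builds such a request with the standard library; it does not send it. The body layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) is modeled on the request-response format that classic Azure ML Studio web services used, but treat it as an assumption: the endpoint URL, API key, column names, and the helper names `build_request_body` and `build_request` are hypothetical placeholders, so check your own service's API help page for the exact shape.

```python
import json
import urllib.request


def build_request_body(rows, columns=("Pclass", "Sex", "Age", "Fare")):
    """Pack passenger rows into a column-names-plus-values JSON body.

    The layout is an assumption modeled on classic Azure ML Studio
    request-response services; values are sent as strings.
    """
    body = {
        "Inputs": {
            "input1": {  # assumed input name; check your service's API help page
                "ColumnNames": list(columns),
                "Values": [[str(row[c]) for c in columns] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    return json.dumps(body)


def build_request(url, api_key, rows):
    """Create (but do not send) a POST request carrying the body and an API key."""
    return urllib.request.Request(
        url,
        data=build_request_body(rows).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical endpoint and key; a real service lists its own on its API page.
req = build_request(
    "https://example.invalid/score",
    "YOUR-API-KEY",
    [{"Pclass": 1, "Sex": "female", "Age": 29, "Fare": 80.0}],
)
# urllib.request.urlopen(req) would send it; the response body would carry the prediction as JSON.
```

Keeping the body construction separate from the HTTP call makes it easy to inspect or log exactly what will be sent before pointing the request at a live endpoint.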
Hands-on example
Microsoft provides an online, Web-based interface for working with MAML called Azure Machine Learning Studio. I’m not going to get into the finer details of Azure ML Studio, such as how to drag, drop and link modules together, but in Part 3 of this series, you'll get a video that walks you through how I assembled all of the modules, or blocks. If you need additional guidance, head here and enter “Azure Machine Learning” in the search box to find more information. The Kaggle Titanic data is an example of a supervised machine learning scenario: each record has a label and a set of features. In this case, the features I'll use to create my model are “passenger class,” “sex,” “age” and “fare,” and the label is whether that particular person survived or not, which is recorded in “survived.” This is what the workflow for this example can look like in MAML Studio:
If we go block-by-block starting from the top, briefly:
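To make the split between features and label concrete, here is a minimal Python sketch that reads rows shaped like Kaggle's Titanic training CSV and separates each one into a feature dictionary and a 0/1 label. The column names (`Pclass`, `Sex`, `Age`, `Fare`, `Survived`) are assumed to follow the Kaggle file layout, the two sample rows are invented rather than real passengers, and `to_example` is a hypothetical helper name.

```python
import csv
import io

# Assumed Kaggle-style column names for the four features and the label.
FEATURE_COLUMNS = ["Pclass", "Sex", "Age", "Fare"]
LABEL_COLUMN = "Survived"


def to_example(row):
    """Split one CSV row (a dict of strings) into (features, label)."""
    features = {col: row[col] for col in FEATURE_COLUMNS}  # drops Name, Ticket, etc.
    features["Pclass"] = int(features["Pclass"])
    # Age is blank for some passengers; keep it as None so a later step can fill it.
    features["Age"] = float(features["Age"]) if features["Age"] else None
    features["Fare"] = float(features["Fare"])
    return features, int(row[LABEL_COLUMN])


# Two invented rows in the assumed Kaggle column layout.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Example, Mr. Hypothetical",male,,0,0,X1,7.25,,S
2,1,1,"Example, Mrs. Invented",female,38,1,0,X2,71.28,C85,C
"""

examples = [to_example(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for features, label in examples:
    print(features, "->", label)
```

The first sample row has an empty age, which is exactly the kind of gap the cleaning step in the workflow has to deal with.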
- Add our data set.
- Select certain features (because such things as the name may not be relevant to training a model).
- Clean our data (some passengers had an empty age, so I filled in empty values with the median age).
- Split the data (I chose to use 80 percent of my data to train my model, and the remaining 20 percent to test it).
- Set up the model as a “Multiclass Decision Forest,” which is a classification algorithm, and train it on the 80 percent training split.
- Score the model (run the trained model on the 20 percent test split to produce predictions).
- Evaluate the model (compare those predictions against the actual “survived” values).
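The blocks above can be sketched in plain Python to show what each step does to the data. This is a minimal, standard-library-only sketch, not what Azure ML Studio runs internally: the Kaggle-style column names, the helper names (`select_columns`, `fill_missing_age`, `split`, `train_forest`, `score`, `evaluate`), and the synthetic passengers are assumptions for illustration. Because a full decision forest is too long to show here, a small bagged ensemble of one-split decision stumps stands in for the Multiclass Decision Forest module.

```python
import random
import statistics
from collections import Counter

# Assumed Kaggle-style column names for the chosen features and the label.
FEATURES = ["Pclass", "Sex", "Age", "Fare"]
LABEL = "Survived"


def select_columns(rows):
    """Select features: keep only the chosen columns plus the label."""
    return [{k: row[k] for k in FEATURES + [LABEL]} for row in rows]


def fill_missing_age(rows):
    """Clean data: replace missing (None) ages with the median known age."""
    median_age = statistics.median([r["Age"] for r in rows if r["Age"] is not None])
    return [dict(r, Age=median_age if r["Age"] is None else r["Age"]) for r in rows]


def split(rows, train_fraction=0.8, seed=42):
    """Split data: shuffle, then use train_fraction for training, the rest for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def encode(row):
    """Turn a row into numbers; Sex becomes 1 for female, 0 otherwise."""
    return [row["Pclass"], 1 if row["Sex"] == "female" else 0, row["Age"], row["Fare"]]


def fit_stump(X, y):
    """Find the single feature/threshold split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            left_vote = Counter(left).most_common(1)[0][0]  # never empty: t is a data value
            right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
            errors = sum(v != left_vote for v in left) + sum(v != right_vote for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_vote, right_vote)
    return best[1:]  # (feature index, threshold, left prediction, right prediction)


def train_forest(rows, n_trees=15, seed=0):
    """Train: fit each stump on a bootstrap resample (bagging) of the training rows."""
    rng = random.Random(seed)
    X = [encode(r) for r in rows]
    y = [r[LABEL] for r in rows]
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def score(forest, row):
    """Score: majority vote of all stumps for one passenger."""
    x = encode(row)
    votes = Counter(lv if x[f] <= t else rv for f, t, lv, rv in forest)
    return votes.most_common(1)[0][0]


def evaluate(forest, rows):
    """Evaluate: accuracy, the fraction of held-out rows predicted correctly."""
    return sum(score(forest, r) == r[LABEL] for r in rows) / len(rows)


if __name__ == "__main__":
    # Synthetic passengers invented for illustration; not real Titanic records.
    rng = random.Random(1)
    rows = [{"Name": f"Passenger {i}", "Pclass": rng.choice([1, 2, 3]),
             "Sex": "female" if i % 2 == 0 else "male",
             "Age": None if i % 7 == 0 else round(rng.uniform(1, 70)),
             "Fare": round(rng.uniform(5, 100), 2), "Survived": 1 if i % 2 == 0 else 0}
            for i in range(40)]
    train, test = split(fill_missing_age(select_columns(rows)))
    forest = train_forest(train)
    print(f"held-out accuracy: {evaluate(forest, test):.2f}")
```

Each function lines up with one block in the list, so swapping the stump ensemble for a real decision-forest implementation would only touch `train_forest` and `score`; the select, clean, split and evaluate steps stay the same.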