Everything you need to know about machine learning: part 2
This is the second of a three-part series. Part 1 covers the basics of machine learning, while this article gives a more in-depth look into Microsoft Azure Machine Learning and how to access it via Web services. Finally, the third part will go through some examples.
In the first part of this series on Microsoft Azure Machine Learning (MAML), I laid out the basics of machine learning and provided some basic terminology. I also showed a high-level example that takes you from your raw data all the way to testing the model.
In Part 2, we'll go through this process again with a practical example based on the Titanic data set. At the end, we'll wrap up by supplying some new information and making a prediction based on the input values.
Since MAML is all about cloud computing on Microsoft Azure’s public cloud, I'm going to show you how to use Web services to send the input data and get the prediction returned. In the next section, I'll jump right into things using the terminology introduced in Part 1 to go through the details of a hands-on example, so it might be a good idea for you to go back and re-read the first part as a quick refresher.
If we go block-by-block starting from the top, briefly:
Using my trained model, I predicted non-survivors (“0”) with a 90.1 percent success rate, and I predicted survivors (“1”) with a 67.2 percent success rate. So, my model does pretty well with predicting non-survivors, but just so-so with survivors. This raises the question: is there a way to improve on the success rate of my predictions?
So now I’ve initialized, trained and scored both models, and I can evaluate them. Not only that, but I can combine the evaluation into one single view:
The “confusion matrix” to the left is the same as my original experiment, and the one to the right is my new matrix based on the algorithm I added. If you compare both matrices, you might notice that my second model was more accurate in predicting non-survivors successfully (now 97.3 percent instead of 90.1 percent), but it was worse when predicting survivors (now 55.2 percent instead of 67.2 percent).
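To make the percentages concrete, here is a minimal Python sketch of how a per-class success rate can be computed from actual and predicted labels. It assumes the figures quoted above are per-class rates (correct predictions for a class divided by the number of passengers actually in that class). The labels below are made up for illustration and are not the results of my experiment, and the function name is hypothetical.

```python
from collections import Counter


def per_class_success_rate(actual, predicted):
    """Return {label: fraction of that label's rows predicted correctly}."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels: 10 non-survivors (0) followed by 6 survivors (1).
actual = [0] * 10 + [1] * 6
# Hypothetical predictions: one non-survivor and two survivors are missed.
predicted = [0] * 9 + [1] + [1] * 4 + [0] * 2

rates = per_class_success_rate(actual, predicted)
for label in sorted(rates):
    print(f"class {label}: {rates[label]:.1%} predicted correctly")
```

Comparing two models then comes down to running the same calculation on each model's predictions against the same test labels.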
It’s quite possible that another classification algorithm would provide better predictions, and there’s no shortage of possibilities as MAML currently provides 14 different algorithms. There’s a lot of experimenting that can be done here to try to find the best model in a particular scenario.
In the final part of this series, I’ll talk more about algorithm choices. You need to pick the right type of algorithm, because each one is likely optimized for a particular kind of machine learning task, such as classification or clustering (MAML Studio can help guide you with its samples). Here’s a snapshot of three current scenarios:
For example, sample five would likely fit well with what I've done with the Kaggle Titanic data experiment. Now, I need to talk about how to access the Web service remotely, because that’s what this is all about -- think, “Data Scientist as a Service.”
Notice that the important features I passed included passenger class: 1, sex: F, age: 35 and fare: 75. My model is telling me it predicts that this person should have survived (the “1” at the end). If you’re interested, you can find my sample code here.
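If you would rather work in Python than PowerShell, a request carrying the same feature values might be assembled roughly like this. Treat it as a sketch under stated assumptions: the service URL and API key are placeholders, and the column names, the "female" encoding and the JSON layout are my assumptions about what a published service expects. Use the sample code that MAML Studio generates for your own service as the authority.

```python
import json
import urllib.request

# Placeholders: substitute the values shown for your own published service.
SERVICE_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_request(pclass, sex, age, fare, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request holding one passenger's features."""
    # Assumed payload layout and column names; check them against your service.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["Pclass", "Sex", "Age", "Fare"],
                "Values": [[str(pclass), sex, str(age), str(fare)]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Passenger class 1, female, age 35, fare 75.
request = build_request(1, "female", 35, 75)

# To call a live service, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to inspect the payload before spending a call on the live service.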
This sets up my Web service in a staging environment that doesn’t necessarily guarantee any kind of uptime. To get a published service level, I need to “productionize” my Web service, which we'll discuss next. (At the time of writing, I couldn't find anything officially stating what service level Microsoft guaranteed for this service.)
Hands-on example
Microsoft has provided an online Web-based interface to work with MAML, named Azure Machine Learning Studio. I’m not going to get into the finer details of Azure ML Studio, like how to drag and drop and link things together, but in Part 3 of this series, you'll get a video that walks you through how I’ve assembled all of the modules or blocks together. If you need additional guidance, head here and enter “Azure Machine Learning” in the search box, and additional information will be available.
Using the Kaggle Titanic data is an example of a supervised machine learning scenario. Basically, I have data that I will define as having a label and features. In this case, the features I'll use to create my model are “passenger class,” “sex,” “age” and “fare,” and the label is whether the particular person survived or not, which is given by “survived.” This is what the workflow for this example can look like in MAML Studio:
- Add our data set.
- Select certain features (because such things as the name may not be relevant to training a model).
- Clean our data (some passengers had an empty age, so I filled in empty values with the median age).
- Split the data (I chose to use 80 percent of my data to train my model, and the remaining 20 percent to test).
- Set up the model with 80 percent of the original data and initialize it as a “Multiclass Decision Forest,” which is a classification-type algorithm.
- Score the model.
- Evaluate the model.
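The data preparation steps in this list (selecting features, filling in empty ages with the median, and splitting 80/20) can be sketched outside MAML Studio in plain Python. This is only an illustration of the idea, not what the Studio modules do internally. The rows are made up, and the column and function names are hypothetical.

```python
import random
from statistics import median

# Made-up rows shaped loosely like the Kaggle Titanic data.
COLUMNS = ["name", "pclass", "sex", "age", "fare", "survived"]
RAW = [
    ("Passenger 1", 1, "female", 35.0, 75.0, 1),
    ("Passenger 2", 3, "male", 22.0, 7.25, 0),
    ("Passenger 3", 3, "male", None, 8.05, 0),
    ("Passenger 4", 2, "female", 28.0, 13.0, 1),
    ("Passenger 5", 1, "male", 54.0, 51.86, 0),
    ("Passenger 6", 3, "female", None, 7.75, 1),
    ("Passenger 7", 2, "male", 40.0, 26.0, 0),
    ("Passenger 8", 3, "male", 19.0, 7.9, 0),
    ("Passenger 9", 1, "female", 48.0, 80.0, 1),
    ("Passenger 10", 2, "male", 30.0, 10.5, 0),
]
rows = [dict(zip(COLUMNS, values)) for values in RAW]

FEATURES = ["pclass", "sex", "age", "fare"]
LABEL = "survived"


def select_columns(rows, columns):
    """Keep only the named columns (drops things like the passenger name)."""
    return [{c: r[c] for c in columns} for r in rows]


def fill_missing_with_median(rows, column):
    """Replace None in one column with the median of the known values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


selected = select_columns(rows, FEATURES + [LABEL])
cleaned = fill_missing_with_median(selected, "age")
train, test = split_rows(cleaned)
print(len(train), "training rows,", len(test), "test rows")
```

The training rows would then go to whichever classification algorithm you initialize, and the held-back test rows are what the score and evaluate steps use.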

Getting a better prediction
I used an algorithm named “Multiclass Decision Forest,” but was that the best choice? Remembering that I’m dealing with a classification-type problem, MAML Studio provides another algorithm named “Multiclass Neural Network.” This part of MAML is awesome; with just a few clicks, I can add another model to my experiment. I’m just showing the important part of the workflow here where I’ve added the new algorithm and joined all of the blocks together, and I’m using the same training and test data:


Using the Web service
The steps above walk us through an example that ends with a model we can use to make predictions. The point of having MAML in the cloud is that you want to be able to interact with it easily. You can publish your model by setting the appropriate locations in your MAML workflow as a published input and output (see the video in Part 3 of this series, the MAML Studio documentation or Channel9 videos for more details). Then, you can easily access the Web service from R, Python or C# (with sample code provided in MAML Studio to help you). Below, I’m showing an example where I'm using Windows PowerShell to access the Web service: