Everything you need to know about machine learning: part 3
This is the final installment of a three-part series. Part 1 covers the basics of machine learning, while part 2 provides a more in-depth look into Microsoft Azure Machine Learning and how to access it via Web services. Now, in part 3, we'll put all that knowledge to work with some examples.
To complete our three-part series on machine learning, you'll watch three videos that offer a quick overview of what we covered in part 2. You'll also learn the importance of choosing the right algorithm, and how the R programming language can be used to extend the service's features. The information here can be consumed however you like -- feel free to dive right into the videos (which mainly apply to our discussion in part 2) or save them for later. Here they are:
- Video 1: A complete example using the Titanic data
- Video 2: Adding an additional algorithm
- Video 3: Configuring and querying using a Web service
(Please note that Microsoft recently updated the service, so I've rewritten my script to query the Web service. Here's the new copy of my Windows PowerShell script on GitHub.)
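If you'd rather query the Web service from Python than from PowerShell, the request could be assembled roughly as follows. This is only a sketch under assumptions: the URL, the API key, the column names, and the payload layout (`Inputs`, `input1`, `ColumnNames`, `Values`) are placeholders that do not come from this article, so check them against the sample request shown on your own service's dashboard. The code builds the request but deliberately does not send it.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values from your own service.
SCORING_URL = "https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID/execute?api-version=2.0"
API_KEY = "YOUR_API_KEY"


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a single scoring call.

    The payload layout is an assumption, not something the article confirms.
    """
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# One made-up passenger; the column names are illustrative only.
request = build_scoring_request(
    SCORING_URL,
    API_KEY,
    ["Pclass", "Sex", "Age"],
    [["3", "male", "22"]],
)

# To actually call the service you would run something like:
#   response = urllib.request.urlopen(request).read()
```

Keeping the request construction separate from the network call makes it easy to inspect the JSON body before anything is sent.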
To get started, let's expand on our part 2 discussion of algorithm choice.
Choosing your algorithm
While it's important to understand the type of learning problem you're dealing with, it's not quite as vital to know the specific algorithm to use. In the case of the Titanic example used in this series, I know I have a classification problem, so I started using the “Multiclass Decision Forest” algorithm to train my model, but I could have used another, and then tried to determine which was more accurate. Earlier in the series (and in the second video accompanying this post), I added a second algorithm to my experiment with mixed results. In actuality, this is part of the science in data science (read: experiment over and over).
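The "try another algorithm and compare" loop can be sketched outside MAML in a few lines of Python. This is a toy illustration, not the experiment from the videos: the rows are made up (they are not the real Titanic data), and the two learners are a majority-class baseline and a simple one-rule classifier standing in for the algorithms MAML offers.

```python
from collections import Counter

# Made-up toy rows, NOT the real Titanic data:
# (passenger_class, is_female, survived)
TRAIN = [
    (1, 1, 1), (1, 0, 0), (2, 1, 1), (2, 0, 0),
    (3, 1, 1), (3, 0, 0), (3, 0, 0), (2, 0, 0),
]
HELD_OUT = [(1, 1, 1), (2, 0, 0), (3, 1, 0), (3, 0, 0), (2, 1, 1)]


def majority_label(rows):
    """Most common label (last column) in the given rows."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def train_majority(rows):
    """Baseline: always predict the most common training label."""
    default = majority_label(rows)
    return lambda features: default


def train_one_rule(rows):
    """Pick the single feature whose value-to-label table fits training best."""
    default = majority_label(rows)
    best_rule, best_hits = None, -1
    for i in range(len(rows[0]) - 1):
        # Map each value of feature i to the majority label for that value.
        table = {
            value: majority_label([r for r in rows if r[i] == value])
            for value in {row[i] for row in rows}
        }
        hits = sum(table[row[i]] == row[-1] for row in rows)
        if hits > best_hits:
            best_rule, best_hits = (i, table), hits
    index, table = best_rule
    # Fall back to the overall majority for values never seen in training.
    return lambda features: table.get(features[index], default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(row[:-1]) == row[-1] for row in rows) / len(rows)


results = {
    "majority": accuracy(train_majority(TRAIN), HELD_OUT),
    "one_rule": accuracy(train_one_rule(TRAIN), HELD_OUT),
}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

On this toy data the baseline scores 0.60 and the one-rule model 0.80 on the held-out rows. The point is the workflow, training several candidates and scoring them on the same held-out data, rather than these particular numbers.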
One thing to remember is that more data can beat a better algorithm. In other words, having more data (and clean data) to train your model will often matter more than which algorithm you pick. Recently, it's been demonstrated that most of these algorithms seem to converge to very similar models as the quality and quantity of training data grow.
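One way to see how training-set size and algorithm choice interact is to plot a learning curve for more than one learner. The sketch below is a hedged illustration on synthetic, made-up data: the two learners (a midpoint threshold and a nearest-neighbor rule), the data generator, and the training sizes are all assumptions chosen for brevity. It shows how to measure the effect; it does not prove the claim.

```python
import random
from statistics import mean


def sample(rng, n):
    """n made-up 1-D points, alternating labels: class 0 near 0.0, class 1 near 2.0."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]


def fit_midpoint(rows):
    """Threshold halfway between the class means; assumes class 1 lies higher."""
    m0 = mean(x for x, y in rows if y == 0)
    m1 = mean(x for x, y in rows if y == 1)
    threshold = (m0 + m1) / 2.0
    return lambda x: int(x > threshold)


def fit_nearest(rows):
    """1-nearest-neighbor: copy the label of the closest training point."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def learning_curve(sizes, seed=0):
    """Train both learners on growing samples; score each on one fixed test set."""
    rng = random.Random(seed)
    test_rows = sample(rng, 500)
    curve = []
    for n in sizes:
        train_rows = sample(rng, n)
        scores = {
            "midpoint": accuracy(fit_midpoint(train_rows), test_rows),
            "nearest": accuracy(fit_nearest(train_rows), test_rows),
        }
        curve.append((n, scores))
    return curve


SIZES = [4, 16, 64, 256, 1024]  # even sizes, so both classes are always present
curve = learning_curve(SIZES)
for n, scores in curve:
    print(f"{n:5d} rows  midpoint={scores['midpoint']:.3f}  nearest={scores['nearest']:.3f}")
```

Running the same comparison at several training sizes, rather than once, is what lets you judge whether collecting more data or switching algorithms is the better use of your time.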
MAML provides many features, and it can also be extended using other popular languages. Let's talk about the R language next.
R programming support
I've been surprised at the popularity of the R programming language. MAML supports this rise in popularity, which extends beyond academics: R was the first extensibility interface the service provided. As a really simple example using R, let's create an experiment that lists all of the installed R packages that we can use in MAML. In the image below, you'll see a block that can run any R script passed to it. Look to the right side and you'll notice the basic script that outputs this information:
Once I run this, I can click on the left side output port, and visualize the resulting output. Currently, there are 410 installed packages! If you have any existing R scripts or even libraries, you can easily upload them into MAML -- by doing this, you'll create a Web service that is publicly available.
At the time of this writing, Python support is available, and there's been talk of plans for some kind of SDK that could be used to extend the service to other languages in the future. By now, you may be getting anxious to try some of these things on your own. If so, you'll be happy to know that you can now get started for free. In November, as the service exited preview, Microsoft announced a free tier for MAML. The free tier limits you to 10GB of input data and restricts the Web service to staging-only. The only thing required to use the free tier is a free Microsoft account. Simply head here to get started.
Looking ahead
Can MAML help you predict the winning lottery numbers? Unfortunately not, but it is a service with a lot of promise, thanks to Microsoft listening to its users. For example, during the preview a user was unable to retrain a model programmatically, but that feature was added when the service came out of preview (even if its implementation did seem a bit complicated).
Bottom line? As a Version 1 product from Microsoft, this service shows some serious promise. I'm excited about it and I think it's a building block to Internet of Things solutions in the future.