Everything you need to know about machine learning: part 3

This is the final installment of a three-part series. Part 1 covers the basics of machine learning, while part 2 provides a more in-depth look into Microsoft Azure Machine Learning and how to access it via Web services. Now, in part 3, we'll put all that knowledge to work with some examples.

To complete our three-part series on machine learning, you'll watch three videos that offer a quick overview of what we covered in part 2. You'll also learn the importance of choosing the right algorithm, and how the R programming language can be used to extend the service's features. The information here can be consumed however you like -- feel free to dive right into the videos (which mainly apply to our discussion in part 2) or save them for later. Here they are:


(Please note that Microsoft recently updated the service, so I've rewritten my script to query the Web service. Here's the new copy of my Windows PowerShell script on GitHub.)
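
For readers who don't use PowerShell, here is a minimal Python sketch of what querying such a Web service can look like. It is not my GitHub script. The URL, API key, column names and the "input1" port name are placeholders I made up for illustration, and the JSON layout is an assumption based on the request-response format the service documented after the update, so check the API help page of your own Web service for the exact shape. The sketch only builds the request; the line that would send it is left commented out.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, columns, rows):
    """Build (but do not send) a POST request for a scoring Web service."""
    # Assumed payload layout: one input port holding column names and rows.
    body = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        # The API key is sent as a bearer token.
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical values: replace them with the ones shown for your own service.
request = build_scoring_request(
    url="https://example.invalid/execute?api-version=2.0",
    api_key="YOUR_API_KEY",
    columns=["Pclass", "Sex", "Age"],
    rows=[["3", "male", "22"]],
)

# Sending is a single call once the placeholders are real:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything is sent.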

To get started, let's expand on our part 2 discussion of algorithm choice.


Choosing your algorithm

While it's important to understand the type of learning problem you're dealing with, it's not quite as vital to know the specific algorithm to use. In the Titanic example used in this series, I know I have a classification problem, so I started with the “Multiclass Decision Forest” algorithm to train my model. I could have used another algorithm instead and then compared the two to determine which was more accurate. Earlier in the series (and in the second video accompanying this post), I added a second algorithm to my experiment, with mixed results. This is the science in data science (read: experiment over and over).
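
To make the "try another algorithm and compare" idea concrete outside of MAML, here is a small Python sketch. It does not use the Multiclass Decision Forest or the real Titanic data. The rows are a made-up toy set, and the two "algorithms" are deliberately simple stand-ins: a majority-class baseline and a one-feature lookup rule. The point is only the workflow: train each candidate on the same training rows, score each on the same held-out rows, then compare.

```python
from collections import Counter, defaultdict

# Made-up toy rows, NOT the real Titanic data: ((sex, passenger_class), survived)
TRAIN = [
    (("female", 1), 1), (("female", 2), 1), (("female", 3), 0),
    (("male", 1), 0), (("male", 2), 0), (("male", 3), 0),
    (("female", 1), 1), (("male", 3), 0),
]
TEST = [
    (("female", 2), 1), (("male", 3), 0), (("female", 3), 0),
    (("male", 2), 0), (("female", 1), 1),
]


def most_common_label(rows):
    """Return the label that occurs most often in the given rows."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def train_majority(rows):
    """Baseline: always predict the most common training label."""
    label = most_common_label(rows)
    return lambda features: label


def train_one_rule(rows, feature_index):
    """Predict the most common training label seen for one feature's value."""
    votes = defaultdict(Counter)
    for features, label in rows:
        votes[features[feature_index]][label] += 1
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    fallback = most_common_label(rows)  # used for feature values never seen in training
    return lambda features: table.get(features[feature_index], fallback)


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(features) == label for features, label in rows) / len(rows)


# Same training rows and same held-out rows for every candidate.
scores = {
    "majority baseline": accuracy(train_majority(TRAIN), TEST),
    "one-rule on sex": accuracy(train_one_rule(TRAIN, 0), TEST),
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

In MAML the same comparison is done visually by scoring and evaluating two trained models side by side. With a handful of toy rows the numbers mean nothing by themselves, which is one more reason to rerun the experiment as your data grows.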

One thing to remember is that more data can beat a better algorithm. In other words, having more data (and clean data) to train your model will often matter more than which algorithm you choose. Recently, it's been demonstrated that most of these algorithms seem to converge to very similar models as the quality and quantity of training data grow.

MAML provides many features, and it can also be extended using other popular languages. Let's talk about the R language next.


R programming support

I've been surprised at the popularity of the R programming language, which now extends well beyond academia. MAML supports this rise in popularity: R was the first extensibility interface the service provided. As a really simple example using R, let's create an experiment that lists all of the installed R packages that we can use in MAML. In the image below, you'll see a block that can run any R script passed to it. Look to the right side and you'll notice the basic script that outputs this information:


Once I run this, I can click on the left side output port, and visualize the resulting output. Currently, there are 410 installed packages! If you have any existing R scripts or even libraries, you can easily upload them into MAML -- by doing this, you'll create a Web service that is publicly available.
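
The R script itself lives in the image and isn't reproduced in the text. As a rough analogue in Python (a swapped-in language, not the script from the experiment), the sketch below lists the packages installed in whatever Python environment runs it. It assumes Python 3.8 or later and says nothing about how a script is wired to input and output ports inside MAML.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for installed distributions."""
    rows = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            rows.add((name, str(dist.version)))
    # Sort case-insensitively by name, then by version for stable output.
    return sorted(rows, key=lambda row: (row[0].lower(), row[1]))


packages = installed_packages()
print(len(packages), "installed packages")
for name, version in packages[:10]:
    print(name, version)
```

The count you see depends entirely on the environment running the script, so don't expect it to match the 410 R packages mentioned above.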

At the time of this writing, Python support is available, and there's been talk of plans for some kind of SDK that could be used to extend the service to other languages in the future. By now, you may be getting anxious to try some of these things on your own. If so, you'll be happy to know that you can now get started for free. In November, as the service exited preview, Microsoft announced a free tier for MAML. The free tier limits you to 10GB of input data and restricts the Web service to staging-only. The only thing required to use the free tier is a free Microsoft account. Simply head here to get started.


Looking ahead

Can MAML help you predict the winning lottery numbers? Unfortunately not, but it is a service with a lot of promise, thanks to Microsoft listening to its users. For example, during the preview, users were unable to retrain a model programmatically, but that feature was added when the service came out of preview (even if its implementation did seem a bit complicated).

Bottom line? As a Version 1 product from Microsoft, this service shows some serious promise. I'm excited about it, and I think it will be a building block for Internet of Things solutions in the future.


Marco Shaw

Marco Shaw is an IT consultant working in Canada. He has been working in the IT industry for over 12 years. He was awarded the Microsoft MVP award for his contributions to the Windows PowerShell community for 5 consecutive years (2007-2011). He has co-authored a book on Windows PowerShell, contributed to Microsoft Press and Microsoft TechNet magazine, and also contributed chapters for other books such as Microsoft System Center Operations Manager and Microsoft SQL Server. He has spoken at Microsoft TechDays in Canada and at TechMentor in the United States. He currently holds the GIAC GSEC and RHCE certifications, and is actively working on others.