Artificial Intelligence and Machine Learning: AWS vs Azure vs GCP
This post compares the Artificial Intelligence and Machine Learning service offerings of AWS, Azure, and GCP.
Jun 08, 2023 • 12 Minute Read
This post in our Cloud Provider Comparisons series jumps into a space that’s super dynamic for cloud providers – artificial intelligence and machine learning. Artificial Intelligence (AI) and Machine Learning (ML) combine massive data handling with virtually limitless computing power and a pay-only-for-what-you-need economic model. To see how the AI and ML services of AWS, Azure, and GCP stack up, keep reading.
Table of contents
- Kickstart your career development
- Are AI and ML the same thing?
- What are the ingredients for machine learning?
- Machine learning building block services
- Machine learning platforms
- Machine Learning Infrastructure
- More ACG ML training resources
Kickstart your career development
Get started with ACG and kickstart your Machine Learning career with courses and real hands-on labs in AWS, Microsoft Azure, Google Cloud, and beyond. (There's nothing artificial about the intelligence you'll develop!)
Are AI and ML the same thing?
Mainstream media often use artificial intelligence and machine learning interchangeably, but they are not the same thing. AI is the pursuit of simulating human thought and decision-making in an automated fashion. As Arthur Samuel (who coined the term “machine learning” in 1959) explained, ML is “the field of study that gives computers the ability to learn without being explicitly programmed.” In other words, ML is one method we can use to try to achieve artificial intelligence.
What are the ingredients for machine learning?
So, what are the core ingredients needed to get an ML system going?
In a nutshell:
- Lots of data.
- A way to apply computation or algorithms to that data.
- Knowledge (to know what you’re doing).
Not too long ago, the capability to do machine learning was highly specialized and prohibitively expensive. Only governments and a few universities could afford it.
But cloud computing has managed to bring these tools within reach of anyone with an internet connection. Today, you can manage massive amounts of data and harness immense computing power using point-and-click tools that cloud providers have created. Best of all, you only pay for the specific parts you need. Cloud providers have also created some turnkey services that let us make use of very powerful ML technology through a simple API call.
We are going to compare the AI and ML offerings of AWS, Azure, and GCP across three different areas: machine learning building block services, machine learning platforms, and machine learning infrastructure.
Machine learning building block services
Machine learning building blocks are the services you can use without having to know much about machine learning in the first place. Most people start with machine learning building blocks because the barrier to entry is so low.
These blocks are available either as an API call or through the cloud provider's SDK. All the providers we’ll talk about below offer REST APIs for their machine learning services.
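To give a feel for what "a simple API call" looks like, here is a minimal Python sketch that builds a JSON request for a translation-style service. The endpoint URL, the API key, and the `text`/`target` field names are all made up for illustration; each provider defines its own URL scheme, authentication, and payload, so check their docs for the real shapes. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint and credentials from your provider's docs.
ENDPOINT = "https://ml.example.com/v1/translate"
API_KEY = "YOUR_API_KEY"


def build_translate_request(text, target_language):
    """Build (but do not send) a JSON POST request for a made-up translation API."""
    payload = json.dumps({"text": text, "target": target_language}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_translate_request("Hello, world", "es")
print(request.get_method(), request.full_url)
# Actually sending it would be urllib.request.urlopen(request),
# which only works against a real endpoint with real credentials.
```

In practice most people reach for the provider's SDK instead, which wraps this same request/response pattern and handles signing and retries for you.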
Let’s see what each brings to the table:
Speech to text and text to speech
For speech to text, AWS has a service called Amazon Transcribe. Azure and GCP both name their offerings (perhaps obviously) Speech to Text.
For converting text to audible speech, the AWS service name is Amazon Polly, while Azure and GCP have Text to Speech.
Like it or not, chatbots have become commonplace as the first line of customer support. Cloud providers are doing their part to help chatbots offer a better experience (or at least be a little less disappointing) by creating services to support and improve them.
Thankfully, translation services have come a long way since Babel Fish (now there’s a ’90s callback!). They are now a very standard offering. The names for the cloud providers’ translation services are pretty straightforward: AWS has Amazon Translate, Azure includes Translator, and GCP provides Translation.
Image and video analysis
These services can recognize objects and people in images, map faces, or detect potentially objectionable content.
Computers are pretty good at detecting when things are out of the ordinary, but you typically have to tell them what to watch for. Cloud providers use machine learning to create services that can watch a stream of events or data and figure out what’s different within the data set. This process is called anomaly detection.
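For a sense of the underlying idea, here is a toy Python sketch that flags values sitting far from the rest of a series. This is a plain statistical rule (a z-score check), not what the managed services actually run; they learn what "normal" looks like from your data. The function name, the sample numbers, and the threshold of 2.0 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []  # not enough data to estimate spread
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical metric: requests per minute, with one obvious spike.
requests_per_minute = [101, 98, 103, 99, 100, 102, 97, 250, 100, 99]
print(find_anomalies(requests_per_minute))
```

A fixed rule like this breaks down quickly on data with trends or daily cycles, which is a big part of why a learned, managed detector is attractive.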
Recommendation engines are becoming a popular addition to ecommerce sites. It's no wonder cloud providers have tried to do some heavy lifting here.
One thing to keep in mind: your recommendations will only be as good as the data you are able to feed into your system.
In fact, that goes for all the above services. If your source data is sketchy, the end results are likely to turn out quite disappointing!
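To make the recommendation case concrete, here is a deliberately tiny Python sketch of a "customers who bought this also bought" recommender. It is an illustration of the general idea under made-up data, not how any provider's recommendation service works; the customer names, items, and scoring rule are all assumptions.

```python
from collections import Counter


def recommend(purchases, customer, top_n=2):
    """Suggest items bought by customers who share at least one item with `customer`."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # how similar the two baskets are
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    # Sort by score (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


purchases = {
    "ana": {"laptop", "mouse", "dock"},
    "ben": {"laptop", "mouse", "headset"},
    "cho": {"mouse", "mousepad"},
    "dev": {"novel"},
}
print(recommend(purchases, "ana"))
print(recommend(purchases, "dev"))
```

Notice that the customer with a single, unshared purchase gets no recommendations at all. That is the data-quality point in miniature: with thin or unrepresentative history, there is nothing useful for the system to work from.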
Machine learning platforms
When we talk about machine learning platforms, we’re referring to the workbench and tools that ML practitioners use. It’s analogous to a developer using an IDE and some libraries to write their code.
For machine learning, Jupyter Notebook is the de facto workbench for data scientists. Unsurprisingly, all three cloud providers offer Jupyter Notebooks or some slightly rebranded version as part of their platforms.
Another consistency across the board is support for major machine learning frameworks, including TensorFlow, MXNet, Keras, PyTorch, Chainer, scikit-learn, and several more. Cloud providers integrate features like security, collaboration, and data management in their platforms.
Guided model development
For those of you just starting out on your ML journey, cloud providers have invested in some gentle introductions. For example, AWS’ “just getting started” service is called SageMaker Autopilot, Azure has Automated ML and a drag-and-drop tool called Azure Machine Learning designer, and GCP has a line of guided model creation tools that they call AutoML.
Full ML workbench
AWS also has Augmented AI (Amazon A2I), which is something we haven't seen yet on the other platforms (although it’s surely just a matter of time). Augmented AI is a way to enlist the power of real, living, breathing humans to help improve your machine learning service.
Here’s an extremely practical example:
Let's say you've determined that your machine learning model is about 95% accurate at identifying images of angry ferrets, but you need to have 100% accuracy. For those cases where the ML model’s confidence is low, you can direct the ferret picture in question over to a live human, who can then determine if the ferret is angry or not.
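The routing logic behind that idea is simple enough to sketch in a few lines of Python. This is not the Amazon A2I API; it only shows the pattern of auto-accepting confident predictions and queueing the rest for a person. The `Prediction` class, the `route` function, and the 0.90 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    image_id: str
    label: str
    confidence: float  # model's confidence in `label`, from 0.0 to 1.0


def route(predictions, min_confidence=0.90):
    """Split predictions into auto-accepted ones and ones queued for human review."""
    accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= min_confidence:
            accepted.append(p)
        else:
            needs_review.append(p)
    return accepted, needs_review


batch = [
    Prediction("ferret-001.jpg", "angry", 0.98),
    Prediction("ferret-002.jpg", "angry", 0.55),
    Prediction("ferret-003.jpg", "not angry", 0.93),
]
accepted, needs_review = route(batch)
print("auto-accepted:", [p.image_id for p in accepted])
print("send to a human:", [p.image_id for p in needs_review])
```

Where you set the cutoff is a cost tradeoff: a higher threshold sends more ferrets to humans, which improves accuracy but costs more review time.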
Machine Learning Infrastructure
All the cloud providers really like containers for their respective machine learning platforms, and for good reason. Containers are relatively lightweight, portable, and can be shuffled around without much hassle.
All three providers offer push-button container deployment for specific versions of the ML frameworks, optimized for training, validation, and inference. If you’re more of a DIY person, all the providers have platform-optimized virtual machines for all the major frameworks as well. The latter is what most people use if they already have a model trained on-prem.
There’s a bit of a cloud provider arms race going on with machine learning. All three are leaning into optimized hardware, with each provider claiming superior performance and economics. All of the providers offer various levels of CPU and GPU virtual machine types. Additionally, some have also invested in specialized hardware in the form of application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs).
- AWS offers Habana Gaudi ASIC instances and a custom processor they call AWS Trainium, optimized for model training. AWS also offers an ASIC called Inferentia for machine learning inference.
- Azure has a line of FPGA-based virtual machines tuned specifically for machine learning workloads.
- GCP offers its custom Tensor Processing Unit (TPU), an ASIC optimized for the TensorFlow framework.
As always, there’s a tradeoff. These specialized hardware platforms are really good at machine learning tasks, but economically speaking, they’re not very useful for anything else. CPU- and GPU-based machines are much more flexible, and are generally what people use first as they develop and refine their ML models.
Machine learning explainability and bias
For all its inherent promise and opportunity, developing quality ML models is really hard. If you happen to get it wrong, the resulting ML-generated decisions can range anywhere from slightly embarrassing to downright immoral, and they can land you in regulatory trouble as well as ethical trouble.
We need to be able to explain how our ML model makes its decisions. Practitioners call this explainability, and fortunately, cloud providers have tools to help out with this.
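To show what an "explanation" can look like, here is a small Python sketch for the simplest possible case: a linear scoring model, where each feature's contribution is just its weight times how far the input sits from a baseline. The model, weights, feature names, and applicant values are invented for illustration, and the providers' explainability tools deal with far more complex models than this.

```python
def explain_linear(weights, bias, baseline, features):
    """Return the score, the baseline score, and each feature's contribution to the difference."""
    contributions = {
        name: weights[name] * (features[name] - baseline[name]) for name in weights
    }
    baseline_score = bias + sum(weights[name] * baseline[name] for name in weights)
    score = baseline_score + sum(contributions.values())
    return score, baseline_score, contributions


# Hypothetical credit-style scoring model.
weights = {"income_k": 2, "late_payments": -15, "years_at_job": 3}
bias = 10
baseline = {"income_k": 50, "late_payments": 1, "years_at_job": 4}  # a "typical" applicant
applicant = {"income_k": 60, "late_payments": 3, "years_at_job": 2}

score, baseline_score, contributions = explain_linear(weights, bias, baseline, applicant)
print(f"baseline score: {baseline_score}, applicant score: {score}")
# List features from most negative to most positive impact.
for name, delta in sorted(contributions.items(), key=lambda pair: pair[1]):
    print(f"{name}: {delta:+d}")
```

The output tells you not just that this applicant scored below the baseline, but which inputs pushed the score down and by how much. That per-feature breakdown is the kind of answer explainability tooling aims to give for models that are much harder to read than a weighted sum.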
More ACG ML training resources
Machine learning is a rapidly evolving space, and the cloud has accelerated that process even more. If you're just getting started on your ML journey, check out Intro to Machine Learning (by yours truly) on the ACG platform.
Then, after you have the basics down, choose your cloud and dive deeper. We have lots of courses and hands-on labs on the ML offerings in AWS, Azure, and GCP.