7 essential skills for Machine Learning and AI developers on AWS

By Jorge Vasquez

Machine learning and artificial intelligence (ML/AI) are two advanced technologies with the power to transform the way businesses operate and humans interact. ML/AI is already reshaping industries such as IT, FinTech, healthcare, education and transportation, and it won’t stop there. According to Forrester Research, 2020 will be a big year for AI: companies will become laser-focused on AI value, leave the experimentation phase and concentrate on accelerating adoption. This means that software engineers prepared to take on ML/AI development roles will soon be in higher demand than ever before.

These are the seven skills you need to take advantage of the growing opportunity to build great ML/AI solutions:

1. Programming languages

To become an expert in machine learning, it’s important to build experience with programming languages. According to GitHub, these are the top 10 machine learning languages to take a look at:

    • Python

    • C++

    • JavaScript

    • Java

    • C#

    • Julia

    • Shell

    • R

    • TypeScript

    • Scala

While Python is the most common language among machine learning repositories on GitHub, Scala is becoming increasingly common, especially when it comes to interacting with big data frameworks such as Apache Spark.

2. Data engineering

The first step in machine learning development is pre-processing and storing raw data generated by your systems. For example, let’s imagine an online store that sells a variety of products to customers around the world. This online store will create lots of data related to particular events. When a customer clicks on a product description or purchases a product, new data is generated, and you’ll need to create Extract, Transform and Load (ETL) pipelines that process, clean up and store data so it’s easily accessible for other processes, such as analytics and predictions.

For storing data, you could use an object store such as Amazon S3 or a data warehouse such as Amazon Redshift.
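The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event format, the field names (`event`, `product_id`, `price`, `ts`) and the in-memory `sink` are hypothetical, and a real pipeline would read from your systems and write to a store such as S3 or Redshift.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw events, as an online store might emit them (one JSON document per line).
RAW_EVENTS = [
    '{"event": "click", "product_id": " A12 ", "price": "19.99", "ts": 1577836800}',
    '{"event": "purchase", "product_id": "B07", "price": "5.50", "ts": 1577836860}',
    '{"event": "purchase", "product_id": "", "price": "oops", "ts": 1577836920}',
    'not valid json',
]


def extract(lines):
    """Extract: parse each raw line as JSON, skipping lines that cannot be parsed."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Transform: normalize fields and drop records with missing or invalid values."""
    for rec in records:
        product_id = str(rec.get("product_id", "")).strip()
        try:
            price = float(rec.get("price"))
        except (TypeError, ValueError):
            continue  # price is missing or not a number
        if not product_id:
            continue  # product_id is required
        yield {
            "event": rec.get("event"),
            "product_id": product_id,
            "price": price,
            "timestamp": datetime.fromtimestamp(rec["ts"], tz=timezone.utc).isoformat(),
        }


def load(records, sink):
    """Load: append cleaned records to a sink (a plain list stands in for real storage)."""
    for rec in records:
        sink.append(rec)
    return sink


cleaned = load(transform(extract(RAW_EVENTS)), [])
for row in cleaned:
    print(row)
```

Writing each stage as a generator keeps the stages independent, so any one of them can be swapped out (for example, replacing `load` with a writer for your storage service) without touching the others.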

3. Exploratory data analysis

Being able to perform exploratory data analysis on a dataset is an especially important skill because it allows you to discover interesting patterns in data, identify certain anomalies and test hypotheses. You should be able to:

• Create summary statistics for a dataset, for example:

    • Number of rows and columns

    • Column data types

    • Columns that are nullable or not

    • Column mean, standard deviation, minimum and maximum values, percentiles, etc.

• Create graphical representations that allow for easy data visualization, for example:

    • Histograms: for visualizing the distribution of a dataset

    • Box plots: for a standardized way of displaying the distribution of quantitative data based on a five-number summary: minimum, first quartile, median, third quartile and maximum. You can also use them to identify outliers and to see whether the distribution of the data is symmetrical or skewed.

    • Heatmaps: for identifying correlations between variables of a dataset

• Sanitize and prepare data for modelling, for example:

    • Remove outliers from your dataset

    • Remove highly correlated (redundant) variables

• Perform feature engineering to extract more information from your dataset, so you can improve the machine learning model(s) you will build
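Several of the tasks above (summary statistics, the five-number summary behind a box plot, outlier removal and a correlation check) can be tried with nothing more than Python's standard `statistics` module. The sketch below uses made-up order data, and the 1.5 × IQR outlier rule is one common convention rather than the only option; in practice you would more likely reach for a dataframe library.

```python
import statistics

# Hypothetical order data for illustration; 250.0 is a deliberately planted outlier.
order_values = [12.0, 15.0, 14.0, 10.0, 13.0, 16.0, 11.0, 250.0]
items_per_order = [2, 3, 3, 1, 2, 4, 1, 40]


def summarize(values):
    """Return basic summary statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


summary = summarize(order_values)
cleaned = remove_outliers_iqr(order_values)
correlation = pearson(order_values, items_per_order)

print(summary)
print(five_number_summary(order_values))
print(cleaned)
print(correlation)
```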

4. Models

If you want to be a pro in machine learning, you need to be proficient in machine learning algorithms. But that’s not enough; you also need to know when to apply them.

For example, if you have a dataset of inputs with their corresponding outputs and you want to find a model that describes the relationship between them, you should use supervised learning algorithms. These can be further grouped into regression algorithms (when the output variable is a real value, such as “weight” or “age”) and classification algorithms (when the output variable is a category, such as “yes/no”).
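To make the regression/classification distinction concrete, here is a minimal sketch of each, written from scratch with made-up data. The least-squares line fit and the nearest-neighbor rule are chosen only because they are short; they stand in for the much wider family of supervised algorithms.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(x, examples):
    """Classification: return the label of the closest labelled example."""
    closest = min(examples, key=lambda example: abs(example[0] - x))
    return closest[1]


# Regression: the output is a real value (hypothetical data lying on y = 2x + 1).
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predicted = slope * 5.0 + intercept

# Classification: the output is a category (hypothetical labelled examples).
labelled = [(1.0, "no"), (2.0, "no"), (8.0, "yes"), (9.0, "yes")]
label = nearest_neighbor_label(7.5, labelled)

print(slope, intercept, predicted, label)
```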

If you only have a set of inputs with no outputs, and you want to identify different patterns in the inputs and cluster them according to similarities, you’ll want to use unsupervised learning algorithms.
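Clustering by similarity can be illustrated with a bare-bones k-means loop. This is a sketch under simplifying assumptions: the points are invented, the starting centroids are supplied by hand instead of being chosen at random, and the loop runs for a fixed number of iterations rather than checking for convergence.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled inputs forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(centroids)
print(clusters)
```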

For more complex tasks such as image classification, object detection, face recognition, machine translation or dialogue generation, you will need deep learning algorithms, which are based on artificial neural networks.

5. Services

Once you have defined the most appropriate machine learning model for solving a given problem, you’ll then need to decide whether to implement the model from scratch or use existing services. For example:

• If you need to integrate conversational interfaces (chatbots) into an application using voice and text, Amazon Lex provides deep learning capabilities for automatic speech recognition (ASR), which converts speech to text, and natural language understanding (NLU), which recognizes the intent of the text. Together they let you build applications with engaging user experiences and lifelike conversational interactions.

• To uncover insights and relationships in unstructured text data, Amazon Comprehend helps you with:

    • Identifying the language of the text

    • Extracting key phrases, places, people, brands or events

    • Conducting sentiment analysis, to determine how positive or negative a text is

    • Automatically organizing a collection of text files by topic

• If you want to integrate a neural machine translation service that delivers fast, high-quality and affordable language translation into your app, take a look at Amazon Translate.

• If you want to add image and video analysis to your applications, Amazon Rekognition is worth considering: it provides an API for identifying objects, people, faces, text, scenes and activities, as well as for detecting inappropriate content.

Finally, if you need to build your own machine learning models and want a fully managed platform that lets you quickly build, train and deploy them into a production-ready hosted environment, Amazon SageMaker is a great choice.
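As an illustration of using an existing service instead of building a model yourself, the sketch below calls the Comprehend service listed above through the `boto3` SDK, using its `detect_dominant_language` and `detect_sentiment` operations. The helper names, the default region and the `FakeComprehend` stub are assumptions made for this example; the stub only mimics the response fields the helper reads, so the logic can be tried without AWS credentials.

```python
def analyze_text(client, text):
    """Detect the dominant language of a text, then its sentiment in that language."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return {"language": language_code, "sentiment": sentiment["Sentiment"]}


def make_comprehend_client(region_name="us-east-1"):
    """Create a real client; needs the boto3 package and configured AWS credentials."""
    import boto3  # imported lazily so the offline demo below runs without it

    return boto3.client("comprehend", region_name=region_name)


class FakeComprehend:
    """Hypothetical stand-in that returns canned responses for offline experiments."""

    def detect_dominant_language(self, Text):
        return {"Languages": [{"LanguageCode": "en", "Score": 0.99}]}

    def detect_sentiment(self, Text, LanguageCode):
        return {"Sentiment": "POSITIVE"}


result = analyze_text(FakeComprehend(), "I love this product!")
print(result)
```

To call the real service, pass `make_comprehend_client()` in place of `FakeComprehend()`.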

6. Deploying

When deploying machine learning solutions to AWS, you need to take into consideration key requirements such as performance, availability, scalability, resiliency and fault tolerance. To that end, AWS provides solutions and best practices that will help you in the process. For example, you can enable monitoring on your solutions so you can check performance and scale your services up or down accordingly. You can even enable auto scaling so AWS takes care of that for you. You can also deploy your solutions to multiple Availability Zones to improve availability.
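A quick back-of-the-envelope calculation shows why spreading a deployment across several zones helps. It assumes that zones fail independently and that the service stays up as long as at least one zone is up; the 99% per-zone figure is purely illustrative and not an AWS number.

```python
def combined_availability(zone_availability, zones):
    """Probability that at least one of `zones` independent zones is available."""
    return 1 - (1 - zone_availability) ** zones


# Illustrative figures only: each zone is assumed to be up 99% of the time.
for zone_count in (1, 2, 3):
    print(zone_count, combined_availability(0.99, zone_count))
```

Under these assumptions, going from one zone to two cuts the expected unavailability from 1% to 0.01%.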

7. Security

As in every software solution, managing security for AWS machine learning solutions is a crucial task, especially because machine learning models need a lot of data to be trained, and access to that data should be granted to authorized people and applications only. The good news is that AWS has a specific service for this: AWS Identity and Access Management (IAM).
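Restricting access to training data usually comes down to attaching a narrowly scoped IAM policy to the role or user that needs it. The sketch below builds such a policy document in Python and prints it as JSON. The bucket name `example-training-data`, the `datasets/` prefix and the helper name are hypothetical; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON policy format.

```python
import json


def read_only_bucket_policy(bucket_name, prefix=""):
    """Build a policy document allowing read-only access to one S3 bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListTrainingData",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix}*"],
            },
        ],
    }


# Hypothetical bucket and prefix, used for illustration only.
policy_json = json.dumps(
    read_only_bucket_policy("example-training-data", "datasets/"), indent=2
)
print(policy_json)
```

Granting only `s3:ListBucket` and `s3:GetObject` follows the principle of least privilege: a training job that only reads data has no need for write or delete permissions.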

Rock AWS Machine Learning

The role of an ML/AI developer is becoming a strategic necessity for most organizations. As you can see, there are several skills that you need to make the biggest impact possible in this role. Dedicate yourself to the essentials and you can make it.

About the author

Jorge is a passionate person who loves building quality software that allows people to solve their problems. He also loves to teach, which is why he has spent several years teaching programming and software development to university students. He has experience developing highly performant backend systems with Java and Node.js, building ETL processes with Python and Scala, and working with cloud platforms such as Amazon Web Services. His current areas of interest include Deep Neural Networks, Computer Vision, Natural Language Processing and Reinforcement Learning. When not developing software or teaching, he enjoys photography and spending time with his family.