Building Features from Text Data in Microsoft Azure

by Michael Heydt

This course covers building text features for machine learning on Azure Machine Learning Service virtual machines, using natural language processing techniques such as tokenization, stopword removal, and feature vectorization.

What you'll learn

Creating text features is key to using text data in machine learning models. In this course, Building Features from Text Data in Microsoft Azure, you'll learn several ways to structure your data for use in machine learning models, working on Microsoft Azure Machine Learning Service virtual machines. First, you'll discover how to use natural language processing to prepare text data, applying techniques such as document tokenization, stopword removal, frequency filtering, stemming and lemmatization, part-of-speech tagging, and n-gram identification. Then, you'll explore documents as text features, learning to represent documents as feature vectors with techniques including one-hot and count vector encodings, frequency-based encodings, word embeddings, hashing, and locality-sensitive hashing. Finally, you'll delve into using BERT to generate word embeddings. By the end of this course, you'll have the skills and knowledge to use textual data and Microsoft Azure in conceptually sound ways to create text features for machine learning models.
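To give a feel for a few of the techniques named above, here is a minimal Python sketch of tokenization, stopword removal, n-gram identification, count vector encoding, and feature hashing. It is not taken from the course: the stopword list, the sample documents, and every function name are hypothetical, and it uses only the standard library rather than any Azure or NLP toolkit.

```python
import re
import zlib
from collections import Counter

# Hypothetical stopword list for illustration only; real lists are much longer.
STOPWORDS = {"the", "a", "is", "on", "and", "of"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Return contiguous n-token sequences joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def hashed_vector(tokens, n_buckets=8):
    """Feature hashing: map each token to a bucket without storing a vocabulary."""
    vec = [0] * n_buckets
    for t in tokens:
        # crc32 is used because it is stable across runs, unlike built-in hash().
        vec[zlib.crc32(t.encode("utf-8")) % n_buckets] += 1
    return vec


docs = ["The cat sat on the mat", "The dog sat on the log"]
cleaned = [remove_stopwords(tokenize(d)) for d in docs]

# The vocabulary is the sorted set of all remaining tokens across documents.
vocabulary = sorted({t for doc in cleaned for t in doc})
vectors = [count_vector(doc, vocabulary) for doc in cleaned]

print(vocabulary)          # ['cat', 'dog', 'log', 'mat', 'sat']
print(vectors)             # [[1, 0, 0, 1, 1], [0, 1, 1, 0, 1]]
print(ngrams(cleaned[0]))  # ['cat sat', 'sat mat']
```

In practice a library would likely handle these steps, along with the stemming, lemmatization, and embedding techniques that this sketch leaves out.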

About the author

Mike is a seasoned software developer, IT guy, cloud architect, IoT fanatic, and overall gadget hound. He is currently a freelance developer, DevOps engineer, author, trainer, and speaker.
