Building Features from Text Data

This course covers extracting information from text documents and building classification models, drawing on natural language processing techniques such as feature vectorization, locality-sensitive hashing, stopword removal, and lemmatization.

What you'll learn

From chatbots to machine-generated literature, many of today's most prominent ML and AI applications work with data in textual form.

In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models.

First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based, and prediction-based techniques. You will see how to improve these representations based on the meaning, or semantics, of the document.
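As a rough illustration of the one-hot and frequency-based representations mentioned above, the sketch below builds them by hand in plain Python. The toy corpus, the whitespace tokenizer, and all function names are assumptions made for this example; the course itself works with scikit-learn's vectorizers, and the unsmoothed tf-idf weighting shown here differs from scikit-learn's default smoothed variant. Prediction-based techniques are not covered by this sketch.

```python
import math
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]

# Vocabulary: every distinct token, sorted so that column order is stable.
vocab = sorted({tok for doc in tokenized for tok in doc})

def one_hot_vector(tokens):
    """1 if the term occurs in the document at all, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def count_vector(tokens):
    """Bag-of-words: how many times each vocabulary term occurs."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def idf(term):
    """Unsmoothed inverse document frequency: log(N / document frequency)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)

def tfidf_vector(tokens):
    """Term count weighted by idf, so terms common across documents count less."""
    counts = Counter(tokens)
    return [counts[term] * idf(term) for term in vocab]

first_counts = count_vector(tokenized[0])
first_tfidf = tfidf_vector(tokenized[0])
```

In the first document, "the" occurs twice but also appears in another document, so its tf-idf weight ends up lower than that of "cat", which occurs once and only in this document.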

Next, you will discover how to apply natural language processing techniques such as stopword removal, frequency filtering, stemming and lemmatization, and parts-of-speech tagging to simplify text before building features.
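To make a few of these steps concrete, here is a minimal sketch of stopword removal, stemming, and frequency filtering in plain Python. The stopword set, the suffix-stripping rule, and the `min_df` threshold are all hypothetical simplifications chosen for this example; they are not NLTK's stopword list or a real stemming algorithm such as Porter's. Lemmatization and parts-of-speech tagging need linguistic resources and are left out.

```python
from collections import Counter

# Hypothetical, tiny stopword set; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "on", "and", "of", "in"}

docs = [
    "the cats are sitting on the mat",
    "a dog is sitting in the garden",
    "cats and dogs are playing",
]

def tokenize(text):
    return text.lower().split()

def remove_stopwords(tokens):
    """Drop very common words that carry little information."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Crude suffix stripping (an assumption for illustration, not Porter)."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [naive_stem(t) for t in remove_stopwords(tokenize(text))]

def filter_by_doc_frequency(token_lists, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    df = Counter(tok for tokens in token_lists for tok in set(tokens))
    keep = {tok for tok, n in df.items() if n >= min_df}
    return [[t for t in tokens if t in keep] for tokens in token_lists]

processed = [preprocess(d) for d in docs]
filtered = filter_by_doc_frequency(processed, min_df=2)
```

Note that the crude stemmer turns "sitting" into "sitt": stems need not be dictionary words, which is one reason lemmatization is often preferred when readable output matters.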

Finally, you will see how locality-sensitive hashing can be used to reduce the dimensionality of documents while still keeping similar documents close together.
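One common way to realize this idea is MinHash with banding, sketched below using only the standard library. This is an assumption about the flavor of locality-sensitive hashing involved, not a description of the course's own code; the function names, the 64-hash signature length, and the 4-row bands are hypothetical choices. Each document (as a set of words) is reduced to a fixed-length signature, and the fraction of matching signature positions estimates the Jaccard index between the original sets.

```python
import hashlib

def jaccard(a, b):
    """Exact Jaccard index: size of intersection over size of union."""
    return len(a & b) / len(a | b)

def stable_hash(token, seed):
    """Deterministic 64-bit hash of a token, varied by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(stable_hash(t, seed) for t in tokens) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def band_keys(signature, rows_per_band=4):
    """Split a signature into bands; documents sharing any band key
    land in the same bucket and become candidate near-duplicates."""
    return {
        (start, tuple(signature[start:start + rows_per_band]))
        for start in range(0, len(signature), rows_per_band)
    }

# Hypothetical documents represented as sets of words.
doc_a = set("the cat sat on the mat".split())
doc_b = set("the cat sat on the rug".split())
doc_c = set("stock prices fell sharply today".split())

sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)
sig_c = minhash_signature(doc_c)

exact_ab = jaccard(doc_a, doc_b)                  # 4 shared words out of 6 distinct
approx_ab = estimated_jaccard(sig_a, sig_b)       # an estimate of exact_ab
shared_buckets_ac = band_keys(sig_a) & band_keys(sig_c)
```

The signatures have the same length regardless of vocabulary size, which is where the dimensionality reduction comes from. Similar documents such as `doc_a` and `doc_b` are likely, though not guaranteed, to share at least one band, while unrelated documents such as `doc_a` and `doc_c` almost never do.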

You will round out the course by implementing a classification model on text documents using many of these techniques.
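The table of contents lists Naive Bayes as the classifier for this final module. As a hedged illustration of how such a model consumes word counts, the sketch below implements a small multinomial Naive Bayes with add-one smoothing from scratch. The training examples, labels, and class name are hypothetical, and in practice a library implementation would be used on top of a vectorizer rather than this hand-rolled version.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples, used only for illustration.
train = [
    ("free prize click now", "spam"),
    ("win free money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and meeting notes", "ham"),
]

def tokenize(text):
    return text.lower().split()

class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                # Add-one smoothing avoids log(0) for unseen class/word pairs.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

model = TinyNaiveBayes().fit(train)
prediction_spam = model.predict("free money")
prediction_ham = model.predict("meeting notes")
```

Any of the preprocessing steps described earlier (stopword removal, stemming, frequency filtering) could be applied inside `tokenize` to change which features the classifier sees.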

When you’re finished with this course, you will have the skills and knowledge to use documents and textual data in conceptually and practically sound ways and represent such data for use in machine learning models.

Course Overview

1min

Course Overview 2m

Representing Text as Features for Machine Learning

37mins

Building Feature Vector Representations of Text

27mins

Module Overview 1m
Bag-of-words and Bag-of-n-grams 3m
Bag-of-words Using the Count Vectorizer 7m
Inverse Transform Using the Count Vectorizer 2m
Bag-of-n-grams Using the Count Vectorizer 6m
Generating N-grams Using NLTK 3m
Bag-of-words Using the Tf-Idf Vectorizer 4m
Module Summary 1m

Simplifying Text Processing Using Natural Language Processing

33mins

Module Overview 1m
Natural Language Processing Operations 6m
Stopword Removal Using NLTK and scikit-learn 7m
Frequency Filtering Using scikit-learn 3m
Stemming 6m
Lemmatization 4m
Parts-of-speech Tagging 6m
Module Summary 1m

Reducing Dimensions in Text Using Hashing

27mins

Module Overview 1m
Feature Hashing 2m
Reducing Dimensions Using the Feature Hasher 4m
Reducing Dimensions at Scale Using the Hashing Vectorizer 6m
Locality-sensitive Hashing 5m
Similar Documents Using Jaccard Index and Locality-sensitive Hashing 7m
Module Summary 1m

Applying Text Feature Extraction Techniques to Machine Learning

27mins

Module Overview 1m
Naive Bayes for Classification 3m
Classification Using the Hashing Vectorizer 8m
Pre-process Text Using a Stemmer, Build Features Using the Hashing Vectorizer 3m
Building Features Using the Count Vectorizer 2m
Pre-processing with Stopword Removal, Building Features Using Count Vectorizer 2m
Pre-processing with Stopword Removal, Frequency Filtering, Building Features Using Count Vectorizer 3m
Building Features Using the Tf-Idf Vectorizer 2m
Building Features Using Bag-of-n-grams Model 2m
Summary and Further Study 2m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
