Feature Engineering

Author: Janani Ravi

Feature engineering is the process of using domain knowledge and insight into data to define features that enable machine learning algorithms to work successfully.

What you will learn:

  • Qualities of effective features and how to assess them
  • Numeric techniques (quantization and binning, binarization, transforms, scaling, normalization)
  • Text techniques (bag-of-x, filtering, n-grams, phrase detection)
  • Categorical data techniques (one-hot encoding, hashing, bin counting, etc.)
  • Dimensionality reduction (PCA)
  • Nonlinear featurization (K-means clustering, model stacking)
  • Image processing techniques (feature extraction)

Prerequisites

  • Data Literacy
  • Data Analytics Literacy
  • Statistics
  • Machine Learning Literacy

Beginner

Learn how feature engineering fits into the machine learning workflow, and build your first features from numerical data.

Preparing Data for Feature Engineering and Machine Learning

by Janani Ravi

Oct 29, 2019 / 3h 18m

Description

However well designed and well implemented a machine learning model is, if the data fed into it is poorly engineered, its predictions will be disappointing. In this course, Preparing Data for Feature Engineering and Machine Learning, you will gain the ability to appropriately pre-process your data, in effect engineer it, so that you can get the best out of your ML models. First, you will learn how feature selection techniques can be used to find the predictors that contain the most information. Feature selection can be broadly grouped into three categories, known as filter, wrapper, and embedded techniques, and you will understand and implement all three. Next, you will discover how feature extraction differs from feature selection, in that the data is substantially re-expressed, sometimes in forms that are hard to interpret. You will then understand techniques for feature extraction from image and text data. Finally, you will round out your knowledge by understanding how to leverage powerful Python libraries for working with images, text, dates, and geo-spatial data. When you're finished with this course, you will have the skills and knowledge to identify the correct feature engineering techniques and the appropriate solutions for your use case.
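As a minimal sketch of the filter approach to feature selection described above, the snippet below scores each feature independently and keeps the highest-scoring ones. It assumes scikit-learn; the dataset and the choice of scoring function are illustrative, not taken from the course.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# A small sample dataset: 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Filter-style selection: score each feature with an ANOVA F-test
# against the target, then keep only the k highest-scoring predictors
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # 4 features reduced to 2
```

Filter methods like this are cheap because they never train the downstream model; wrapper and embedded methods trade more computation for selections tailored to a specific estimator.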

Table of contents
  1. Course Overview
  2. Understanding the Role of Features in Machine Learning
  3. Preparing Data for Machine Learning
  4. Understanding and Implementing Feature Selection
  5. Exploring Feature Extraction Techniques
  6. Implementing Feature Extraction

Building Features from Numeric Data

by Janani Ravi

Apr 8, 2019 / 2h 25m

Description

The quality of preprocessing that numeric data is subjected to is an important determinant of the results of machine learning models built using that data. With smart, optimized data pre-processing, you can significantly speed up model training and validation, saving both time and money, and greatly improve model performance in prediction. In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data pre-processing pipelines. First, you will learn the importance of normalization, standardization, and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as the dispersion of a data feature. Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1 norm, L2 norm, and max norm, seek to transform feature vectors to have uniform magnitude; they find wide use in ML model building, for instance in computing the cosine similarity of document vectors and in transforming images before techniques such as convolutional neural networks are applied to them. You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations, the Yeo-Johnson transform, the Box-Cox transform, and the quantile transformation, in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell curve shape that many models work best with.
When you're finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.
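Two of the techniques named above, L2 normalization and the Yeo-Johnson power transform, can be sketched in a few lines with scikit-learn. The synthetic lognormal data here is an illustrative assumption, not material from the course.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, PowerTransformer

rng = np.random.default_rng(0)
# Skewed, non-normal data: 200 samples of 3 lognormal features
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 3))

# L2 normalization: rescale each row (feature vector) to unit magnitude,
# as used before cosine-similarity computations
X_l2 = Normalizer(norm="l2").fit_transform(X)

# Yeo-Johnson power transform: pull the skewed data toward a bell curve
# (by default the result is also standardized to zero mean, unit variance)
X_yj = PowerTransformer(method="yeo-johnson").fit_transform(X)
```

After normalization every row of `X_l2` has Euclidean length 1, so only the direction of each feature vector carries information; after the power transform each column of `X_yj` is centered and far less skewed than the lognormal input.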

Table of contents
  1. Course Overview
  2. Using Numeric Data in Machine Learning Algorithms
  3. Building Features Using Normalization
  4. Building Features Using Scaling and Transformations

Intermediate

Transform nominal data, such as names or categories, into features appropriate for machine learning, and apply techniques for simplifying large data sets.

Building Features from Nominal Data

by Janani Ravi

Aug 12, 2019 / 2h 40m

Description

The quality of preprocessing that nominal data is subjected to is an important determinant of the results of machine learning models built using that data. In this course, Building Features from Nominal Data, you will gain the ability to encode categorical data in ways that increase the statistical power of models. First, you will learn the different types of continuous and categorical data, including the differences between ratio and interval scale data and between nominal and ordinal data. Next, you will discover how to encode categorical data using one-hot and label encoding, and how to avoid the dummy variable trap in linear regression. Finally, you will explore how to implement different forms of contrast coding, such as simple, Helmert, and orthogonal polynomial coding, so that regression results closely mirror the hypotheses that you wish to test. When you're finished with this course, you will have the skills and knowledge of encoding categorical data needed to increase the statistical power of linear regressions that include such data.
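One-hot encoding and the dummy variable trap mentioned above can be illustrated with pandas (an illustrative choice; the course's own tooling may differ):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full one-hot encoding: one indicator column per category
full = pd.get_dummies(df["color"])

# Dummy coding: drop one level so the remaining indicators are not
# perfectly collinear with the intercept (the indicators of a full
# encoding always sum to 1, which is the dummy variable trap)
dummies = pd.get_dummies(df["color"], drop_first=True)

print(full.shape, dummies.shape)  # 3 columns vs. 2 columns
```

With the dropped level as the baseline, each remaining coefficient in a linear regression measures the difference from that baseline category rather than an absolute effect.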

Table of contents
  1. Course Overview
  2. Implementing Approaches to Working with Categorical Data
  3. Understanding and Implementing Dummy Coding
  4. Understanding and Implementing Contrast Coding
  5. Implementing Bin Counting and Feature Hashing

Reducing Complexity in Data

by Janani Ravi

Apr 11, 2019 / 3h 20m

Description

Machine learning techniques have grown significantly more powerful in recent years, but excessive complexity in data is still a major problem. There are several reasons for this: distinguishing signal from noise gets harder with more complex data, and the risks of overfitting go up as well. Finally, as cloud-based machine learning becomes more and more popular, reducing complexity in data is crucial in making training more affordable, since cloud-based ML solutions can be very expensive indeed. In this course, Reducing Complexity in Data, you will learn how to make the data fed into machine learning models more tractable and more manageable, without resorting to any hacks or shortcuts, and without compromising on quality or correctness. First, you will learn the importance of parsimony in data, and understand the pitfalls of working with data of excessively high dimensionality, often referred to as the curse of dimensionality. Next, you will discover how and when to resort to feature selection, employing statistically sound techniques to find a subset of the input features based on their information content and their link to the output. Finally, you will explore how to use two advanced techniques, clustering and autoencoding. Both are applications of unsupervised learning used to simplify data as a precursor to a supervised learning algorithm, and each often relies on a sophisticated implementation such as deep learning using neural networks. When you're finished with this course, you will have the skills and knowledge of conceptually sound complexity reduction needed to reduce the complexity of data used in supervised machine learning applications.
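The classic linear technique for this kind of complexity reduction is PCA. The sketch below, using scikit-learn on synthetic data (an illustrative assumption, not course material), builds 10-dimensional data whose variance actually lives in a 3-dimensional subspace, then recovers that subspace:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 500 samples of correlated 10-dimensional data: a 3-dimensional
# latent signal mapped linearly into 10 dimensions, plus tiny noise
base = rng.normal(size=(500, 3))
X = base @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(500, 10))

# Project onto the 3 directions of greatest variance
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

# The retained components capture almost all of the variance
print(X_reduced.shape, pca.explained_variance_ratio_.sum())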

Table of contents
  1. Course Overview
  2. Understanding the Need for Dimensionality Reduction
  3. Using Statistical Techniques for Feature Selection
  4. Reducing Complexity in Linear Data
  5. Reducing Complexity in Nonlinear Data
  6. Dimensionality Reduction Using Clustering and Autoencoding Techniques

Advanced

Extract features from text documents and images.

Building Features from Text Data

by Janani Ravi

Jun 28, 2019 / 2h 36m

Description

From chatbots to machine-generated literature, some of the hottest applications of ML and AI these days are for data in textual form. In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models. First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based, and prediction-based techniques. You will see how to improve these representations based on the meaning, or semantics, of the document. Next, you will discover how to leverage various language modeling features such as stopword removal, frequency filtering, stemming and lemmatization, and part-of-speech tagging. Finally, you will see how locality-sensitive hashing can be used to reduce the dimensionality of documents while still keeping similar documents close together. You will round out the course by implementing a classification model on text documents using many of these modeling abstractions. When you're finished with this course, you will have the skills and knowledge to use documents and textual data in conceptually and practically sound ways and to represent such data for use in machine learning models.

Table of contents
  1. Course Overview
  2. Representing Text as Features for Machine Learning
  3. Building Feature Vector Representations of Text
  4. Simplifying Text Processing Using Natural Language Processing
  5. Reducing Dimensions in Text Using Hashing
  6. Applying Text Feature Extraction Techniques to Machine Learning

Building Features from Image Data

by Janani Ravi

Aug 13, 2019 / 2h 10m

Description

From machine-generated art to visualizations of black holes, some of the hottest applications of ML and AI these days are to data in image form. In this course, Building Features from Image Data, you will gain the ability to structure image data in a manner ideal for use in ML models. First, you will learn how to pre-process images using operations such as making the aspect ratio uniform, normalizing pixel magnitudes, and cropping images to be square in shape. Next, you will discover how to implement denoising techniques such as ZCA whitening and batch normalization to remove variations. Finally, you will explore how to identify points and blobs of interest and calculate image descriptors using algorithms such as Histogram of Oriented Gradients and Scale Invariant Feature Transform. You will round out the course by implementing dimensionality reduction using dictionary learning, feature extraction using convolutional kernels, and latent factor identification using autoencoders. When you’re finished with this course, you will have the skills and knowledge to move on to pre-process images in conceptually and practically sound ways to extract features from such data for use in machine learning models.
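The pixel-magnitude normalization step described above can be sketched with plain NumPy; the random "images" and the per-image standardization choice are illustrative assumptions, not course material:

```python
import numpy as np

rng = np.random.default_rng(7)
# A batch of ten 8-bit grayscale "images" (values 0-255), 32x32 pixels
images = rng.integers(0, 256, size=(10, 32, 32)).astype(np.float64)

# Normalize pixel magnitudes into [0, 1], a common first step
images_scaled = images / 255.0

# Per-image standardization: zero mean, unit variance per image,
# which removes differences in overall brightness and contrast
means = images_scaled.mean(axis=(1, 2), keepdims=True)
stds = images_scaled.std(axis=(1, 2), keepdims=True)
images_std = (images_scaled - means) / stds

# Flatten each image into a 1024-dimensional feature vector
features = images_std.reshape(len(images_std), -1)
print(features.shape)
```

Techniques like ZCA whitening go one step further, decorrelating neighboring pixels across the whole batch rather than just centering and scaling each image.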

Table of contents
  1. Course Overview
  2. Representing Images as Features for Machine Learning
  3. Detecting Features and Text in Images
  4. Simplifying Image Processing Using Dimensionality Reduction