Data Engineering for Machine Learning

Expand your software engineering expertise by mastering essential data engineering skills for machine learning. Learn how to gather, clean, validate, and preprocess data effectively, transforming it into ML-ready datasets.

by Brian Letort

What you'll learn

In this course, Data Engineering for Machine Learning, you'll gain hands-on expertise in preparing, validating, and transforming raw data into high-quality datasets ready for machine learning models. You'll build scalable data ingestion pipelines, implement feature engineering techniques, and explore automation strategies, while also addressing the ethical considerations that affect model performance and reliability.

First, you'll learn core data engineering concepts and explore methods to gather and ingest data efficiently from diverse sources such as APIs, databases, and CSV and JSON files. Through practical Python demonstrations using VS Code and libraries like Pandas, you'll build scalable data ingestion pipelines capable of managing both batch and real-time data streams. Then, you'll master essential techniques for data cleaning, preprocessing, and validation, since the accuracy and quality of the data significantly affect downstream ML model performance. Finally, you'll learn best practices for automating pipelines, handling growing data volumes, and integrating feature engineering processes, all while ensuring responsible and compliant data handling through built-in ethical considerations like bias prevention and data privacy.

By the end of the course, you'll have the hands-on skills and practical knowledge to engineer robust, scalable, and ethically sound data pipelines, prepare data for machine learning projects, and lay a foundation for advanced MLOps practices.
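As a rough illustration of the gather, clean, and validate steps described above, here is a minimal sketch. It is not taken from the course: the sample data, the column names (`user_id`, `age`, `country`), the 0 to 120 age rule, and the helper names `ingest`, `clean`, and `validate` are all assumptions made for the example. It uses only the Python standard library so it runs anywhere; the course itself works with Pandas, where `pd.read_csv` and `pd.read_json` would cover the ingestion step.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV export and a JSON API response.
CSV_TEXT = "user_id,age,country\n1,34,US\n2,,DE\n2,,DE\n3,-5,FR\n"
JSON_TEXT = '[{"user_id": 4, "age": 29, "country": "us"}]'


def ingest(csv_text, json_text):
    """Gather rows from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.extend(json.loads(json_text))
    return rows


def clean(rows):
    """Normalize types, drop exact duplicates, and keep missing ages as None."""
    seen, cleaned = set(), []
    for row in rows:
        age = row.get("age")
        record = {
            "user_id": int(row["user_id"]),
            "age": int(age) if age not in (None, "") else None,
            "country": str(row["country"]).strip().upper(),
        }
        key = tuple(record.items())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


def validate(rows):
    """Split rows into valid and rejected using a simple range rule on age."""
    valid, rejected = [], []
    for row in rows:
        ok = row["age"] is None or 0 <= row["age"] <= 120
        (valid if ok else rejected).append(row)
    return valid, rejected


valid_rows, rejected_rows = validate(clean(ingest(CSV_TEXT, JSON_TEXT)))
print(len(valid_rows), "valid;", len(rejected_rows), "rejected")
```

Keeping rejected rows in a separate list, instead of silently dropping them, is one possible design choice: it makes data quality problems visible before they reach a model.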

About the author

Brian Letort

Dr. Daniel “Brian” Letort is a 22+ year veteran of information technology. During a 21-year tenure at Northrop Grumman, Brian held roles in software engineering and systems engineering and served as Chief Applications Architect, Chief Data Scientist, and Chief Enterprise Architect. Brian held the NG Fellow title for six years and the Technical Fellow title for the four years before that. In 2022, Brian joined Digital Realty as the Chief Architect - Product and Artificial Intelligence. Alongside the role at Digital Realty, Brian has 12+ years of experience teaching Data Science and Computer Science classes as an adjunct professor. Brian has authored two books and holds two patents.
