Cleaning Data: Python Data Playbook

Cleaning a dataset is an essential part of any data project, but it can be challenging. This course teaches you the basics of cleaning datasets with pandas, along with techniques that you can apply immediately in real-world projects.
Course info
Rating: (38)
Level: Beginner
Updated: Dec 10, 2018
Duration: 1h 8m
Description

At the core of any successful project that involves a real-world dataset is a thorough knowledge of how to clean missing, bad, or inaccurate data out of that dataset. In this course, Cleaning Data: Python Data Playbook, you'll learn how to use pandas to clean a real-world dataset. First, you'll learn how to understand, view, and explore the data you have. Next, you'll explore how to access just the data that you want to keep in your dataset. Finally, you'll discover different ways to handle bad and missing data. When you're finished with this course, you'll have a foundational knowledge of cleaning real-world datasets with pandas that will help you as you move on to real-world data science or machine learning problems.

About the author

Chris is a software consultant focused on web, mobile, and machine learning. He uses React, React Native, Node.js, Ruby on Rails, and Python.

Section Introduction Transcripts

Course Overview
Hi everyone. My name is Chris Achard, and welcome to my course, Cleaning Data: Python Data Playbook. I'm an independent software consultant with 10 years of industry experience in web and mobile apps and machine learning. In every project I've done that deals with a lot of data, cleaning the data always seems to take an enormous amount of time, and I've always been looking for ways to speed up that process. With pandas and Python, I've found a toolset that makes it fast and efficient to sniff out bad data and gives you the confidence that you can clean up any bad data in your dataset. It's an often overlooked topic, but I'm excited to teach you how to clean data because of how much it can help in your projects. In this course, we are going to look at a real-world dataset with 69,000 rows. Some of the major topics that this course will cover include selecting and renaming columns and rows, filtering the dataset, transforming or manipulating your data, and identifying and dealing with bad data. By the end of this course, you will have the toolset that you need to tackle and clean up any structured dataset and prepare it for the next step in your project, whether that's a machine learning model, sending it off to a database, or connecting to your API. Before beginning this course, you should have a basic understanding of Python, but you don't need to know anything specifically about pandas before we get started. I hope that you'll join me on this journey to learn how to clean up data with pandas, at Pluralsight.
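To make the topics above a little more concrete, here is a minimal pandas sketch of a typical cleaning pass: viewing the data, renaming and selecting columns, marking bad values as missing, and then handling the missing data. It is an illustration only, not code from the course. The tiny DataFrame, its column names, the rule that a negative age counts as bad data, and the choice to fill remaining gaps with the median are all hypothetical assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real-world dataset.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Dana", "Eve"],
    "Age": [34, -1, 29, np.nan, 40],
    "City": ["NYC", "Boston", "NYC", "Chicago", "NYC"],
})

# 1. View and explore the data.
print(df.head())
df.info()

# 2. Rename columns to a consistent lowercase style.
df = df.rename(columns={"Name": "name", "Age": "age", "City": "city"})

# 3. Identify bad data: assume a negative age is invalid and mark it as missing.
df.loc[df["age"] < 0, "age"] = np.nan

# 4. Keep only the columns we want.
df = df[["name", "age"]]

# 5. Handle missing data: drop rows with no name,
#    then fill any remaining missing ages with the median age.
df = df.dropna(subset=["name"])
df = df.fillna({"age": df["age"].median()})

print(df)
```

Filling with the median is just one option; depending on the dataset, dropping the rows or using a domain-specific default may be more appropriate.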