Pandas Playbook: Manipulating Data

Pandas is one of the most popular software packages for data analysis. This course focuses on the core functionality of Pandas for data wrangling, teaching you how to tackle the everyday tasks of a data analyst or data scientist.
Course info
Rating: (25)
Level: Intermediate
Updated: May 24, 2018
Duration: 2h 15m
Description

Pandas is not just one of the most popular software packages for data analysis; it is also, without a doubt, the most convenient and fun way to work with your data. In this course, Pandas Playbook: Manipulating Data, you will cover the most important core functionality of Pandas, focusing on its two main classes: the DataFrame and the Series. First, you will take a look at a new dataset and try to get a feeling for it. How many rows and columns are there? What data types does it consist of? You will do some basic statistical exploration as well. Then, you'll focus on getting information out of your dataset, which is basically about asking the right questions and drilling down into your data. Finally, you will learn how to clean and transform your data. Here, you will see how to run Python functions against your data, including functions you write yourself; how to use a very powerful feature called groupby(); how to change the structure of your columns and rows; and how to combine multiple DataFrames into one. After watching this course, you will be ready for just about any data wrangling job that you might come across.

About the author

After years of working in software development, Reindert-Jan Ekker decided to pursue another passion of his: education. He currently works as a college professor of Computer Science in the Netherlands, teaching subjects such as web development, algorithms and data structures, and Scrum.

Section Introduction Transcripts

Course Overview
Hi everyone, my name is Reindert-Jan Ekker, and welcome to this playbook about manipulating data with Pandas. I'm a senior developer and freelance educator, and in this course I'll teach you about Pandas, the most popular Python library for data science and analysis. The role of Pandas in data analysis has grown rapidly over the past few years, and it has become nearly impossible to do without it. This course covers the core tasks you will need to perform when working with any real-world dataset, and we'll do so in a very hands-on way. Some of the major topics that we will cover include exploring a new dataset; selecting, sorting, and filtering your data so that you can drill down into it to answer specific questions; cleaning a dataset, which means things like fixing bad or missing data points and removing outliers; and transforming your data, either by doing calculations on it or by changing the structure of your dataset. By the end of this course, you'll have a good understanding of the core functionality of a Pandas DataFrame, and you'll be able to handle most everyday tasks. Before beginning the course, you should be familiar with the very basics of Python and data science. I hope you'll join me on this journey to learn how to manipulate your data with the Pandas Playbook: Manipulating Data course, at Pluralsight.

Exploring Data
Hi, in this module we'll look at the first steps a data scientist takes when working with a new dataset: exploring it to get a feeling for the kind of data we have, its statistical properties, and more. Whenever you start to work with a new dataset, the first thing you normally want to do is explore what the data looks like. How many items are there in the set, and what kind of measurements are there? To get a feeling for the dataset, you usually also want a quick overview of basic statistics like the mean, maximum, and minimum values, and maybe even a few simple plots while you're at it. That is exactly what I'll do in this module: I'll show you how to get up and running with a new dataset.

Selecting, Filtering, and Sorting Data
Hi, in this module I'll introduce you to several ways to select, retrieve, and sort data with Pandas. After watching this module, you'll be able to retrieve exactly the subset of data you need from any Pandas DataFrame. Usually there are several different ways to accomplish this, and I will compare and contrast them for you. Most of this module is about indexing, which is the primary way to get to your data. There's a lot to say about indexing, because Pandas offers many powerful options there. I'll start with the basics of the indexing operator and what happens when you pass it a single value to select a single row or column. Next, we'll see what happens when you index a DataFrame with a more complex data structure like a list or a slice, and we'll look at two DataFrame attributes called loc and iloc, which allow us to index in even more powerful ways. Then we'll move on to Boolean filtering: selecting data using a comparison, for example all rows that have a value higher than x. We'll see that you can assign values to the part of a DataFrame that you select with an index, and finally, I'll say a few words about sorting. Let's start with a demo about indexing. I'll begin with the basics, selecting single rows and columns, and gradually move towards more complex use cases like indexing with slices and lists, and indexing with loc and iloc.
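The indexing options described above can be sketched roughly as follows, assuming a small hypothetical price list with string row labels. It shows the indexing operator with a single label, a list and a slice, label-based `loc` versus position-based `iloc`, Boolean filtering, assignment through a selection, and `sort_values`.

```python
import pandas as pd

# Hypothetical toy data with string row labels
df = pd.DataFrame(
    {"price": [3.5, 1.2, 7.8, 2.4], "stock": [10, 0, 4, 25]},
    index=["apple", "banana", "cherry", "date"],
)

# Indexing operator with a single label selects one column (a Series)
prices = df["price"]

# A list of labels selects several columns (a DataFrame)
subset = df[["price", "stock"]]

# A slice in the indexing operator selects rows by position
first_two = df[:2]

# loc is label-based, iloc is position-based
cherry_price = df.loc["cherry", "price"]
last_row = df.iloc[-1]

# Label slices in loc include the end label
apple_to_cherry = df.loc["apple":"cherry", "stock"]

# Boolean filtering: rows where the price is above 2
expensive = df[df["price"] > 2]

# Combine conditions with & and |, wrapping each comparison in parentheses
in_stock_and_cheap = df[(df["stock"] > 0) & (df["price"] < 3)]

# Assign through a selection (hypothetical rule: out-of-stock items get price 0)
df.loc[df["stock"] == 0, "price"] = 0.0

# Sort by a column, highest price first
by_price = df.sort_values("price", ascending=False)
```

Using `df.loc[mask, column] = value` in a single step, rather than chaining two indexing operations, makes sure the assignment modifies `df` itself instead of a temporary copy.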

Cleaning Data
Hi, in this module we're going to look at the common ways to clean up your dataset. When you start working with a new dataset, there are usually some steps you want to take to tidy it up. In this module, I will go over some of the more common scenarios and how to tackle them. To start, we will investigate missing data, also called null values: how to detect these values and take a closer look at them, how to remove them, and how to replace them, either by filling in all missing values with a filler value or by interpolating them from the surrounding data. You might also have other unwanted data in your dataset, like outliers or data that occurs more than once, and of course we'll see how to handle these as well. We'll also see how to fix values that don't have the correct data type, and finally, we'll take a look at tidying up the index of your DataFrame.
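A rough sketch of these cleaning steps, assuming a small hypothetical dataset with one missing temperature, one duplicated row, one implausible outlier and a numeric column stored as text. The outlier cutoff of 50 is an arbitrary assumption for this example, not a general rule.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a NaN, a duplicate row (index 4), an outlier (99.0),
# and visitor counts that were read in as strings
raw = pd.DataFrame({
    "day": [1, 2, 3, 4, 4, 5],
    "temp": [10.0, np.nan, 14.0, 15.0, 15.0, 99.0],
    "visitors": ["100", "120", "115", "130", "130", "90"],
})

# Detect missing values: count per column, then look at the affected rows
missing_per_column = raw.isna().sum()
rows_with_missing = raw[raw.isna().any(axis=1)]

# Option 1: drop rows that contain a missing value
dropped = raw.dropna()

# Option 2: replace missing values with a filler (here the column mean)
filled = raw.fillna({"temp": raw["temp"].mean()})

# Option 3: interpolate from surrounding values (linear by default)
clean = raw.copy()
clean["temp"] = clean["temp"].interpolate()

# Remove rows that occur more than once
clean = clean.drop_duplicates()

# Remove outliers (assumed cutoff: anything at or above 50 degrees)
clean = clean[clean["temp"] < 50]

# Fix the data type: convert the text column to integers
clean = clean.astype({"visitors": int})

# Tidy up the index: use 'day' as the row labels
clean = clean.set_index("day")
print(clean)
```

Which of the three options for missing values fits best depends on the data; interpolation only makes sense when neighbouring rows are meaningfully related, as with consecutive days here.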

Transforming Data
Hi, welcome to the last module of this course, in which we'll see how to transform our data into something different. So you have a dataset and you've cleaned it up; let's look at some things we can do with it. First of all, let's do some calculations. We'll see how to apply the basic math operators and mathematical functions, and how to define our own functions and apply those to our data. After that, we'll look at groupby, a very powerful feature in Pandas that allows us to split up our data into groups, apply functions to those groups, and then aggregate the results into a new DataFrame. This might sound a little abstract right now, but trust me, it's a really powerful and cool feature. Then we'll move on to Pandas operations that change not the values of your data, but its structure. A lot of the time, when you receive a dataset, it doesn't have the structure you want. In this part of the module, we'll see several ways to restructure your data, either from rows to columns or vice versa. Finally, we'll see several techniques to combine multiple datasets.
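A compact sketch of the transformations listed above, assuming a hypothetical sales table. It uses element-wise arithmetic, a NumPy function, a self-written function via `Series.apply`, split-apply-combine with `groupby`, `pivot`/`melt` to move data between rows and columns, and `concat`/`merge` to combine datasets. The helper `order_size` and all data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    "store": ["north", "north", "south", "south", "south"],
    "month": ["jan", "feb", "jan", "feb", "mar"],
    "units": [10, 12, 7, 9, 11],
    "price": [2.0, 2.0, 3.0, 3.0, 3.0],
})

# Basic math operators work element-wise on whole columns
sales["revenue"] = sales["units"] * sales["price"]

# NumPy math functions are applied element-wise as well
sales["log_units"] = np.log(sales["units"])

# A function we write ourselves (hypothetical helper), applied to each value
def order_size(units):
    return "large" if units >= 10 else "small"

sales["order_size"] = sales["units"].apply(order_size)

# groupby: split by store, apply sum to each group, combine into one result
revenue_per_store = sales.groupby("store")["revenue"].sum()

# groupby also accepts our own aggregation function
unit_range = sales.groupby("store")["units"].agg(lambda s: s.max() - s.min())

# Restructure: months from rows into columns (missing combinations become NaN)
wide = sales.pivot(index="store", columns="month", values="units")

# ...and back from columns into rows, dropping the NaN introduced by pivot
long = (
    wide.reset_index()
    .melt(id_vars="store", var_name="month", value_name="units")
    .dropna()
)

# Combine datasets: stack rows with concat, join on a key column with merge
more_sales = pd.DataFrame(
    {"store": ["east"], "month": ["jan"], "units": [5], "price": [4.0]}
)
all_sales = pd.concat([sales, more_sales], ignore_index=True)

managers = pd.DataFrame({"store": ["north", "south"], "manager": ["Ann", "Bob"]})
with_managers = sales.merge(managers, on="store", how="left")
print(with_managers)
```

`concat` fills columns that exist in only one of the inputs with NaN, while `merge` with `how="left"` keeps every row of `sales` even if a store has no matching manager.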