Importing Data: Python Data Playbook

To work with data in Python, you need to know how to get data into Python. This playbook provides recipes for the common data import problems you'll encounter using Python.
Course info
Level
Beginner
Updated
Nov 17, 2018
Duration
1h 35m
Description

Python is one of the most powerful and widely used languages to work with data. In this course, Importing Data: Python Data Playbook, you will learn foundational knowledge and gain the ability to import data from multiple different file formats, including text, tabular data, and binary formats, as well as from databases. First, you will learn how to import text and CSV files. Next, you will discover how to import data from JSON, XML, SAS, Stata, HDF5, MATLAB, Pickle files, and more. Finally, you will explore how to import relational data from databases, including SQLite, MySQL, and PostgreSQL. When you're finished with this course, you will have the skills and knowledge of importing data into Python needed to analyze, visualize, and in general work with data.

About the author

Xavier is very passionate about teaching, helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, trainer, and holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, along with being a Microsoft MVP.

Section Introduction Transcripts

Course Overview
(Music playing) Hi everyone, my name is Xavier Morera, and welcome to my course Importing Data: Python Data Playbook. I am very passionate about working with data, and do you know which is one of the most powerful and widely used languages to work with data? If you said Python, you are right. In this course, we're going to learn how to import data from multiple different formats with Python. We will learn which libraries are used for this purpose: csv, pandas, NumPy, ElementTree, json, and more. Some of the formats we will work with include text, CSV, JSON, XML, Excel, SAS, HDF5, Stata, Pickle files, as well as other file formats. But we will also cover how to import data from different relational databases, including SQLite, MySQL, and PostgreSQL, using Python's SQL toolkit, SQLAlchemy. By the end of this course, you will be able to quickly and easily import data stored in different formats into Python, but before beginning the course, you should be familiar with the basic mechanics of Python, including how to use the REPL and how to import libraries. This course is not a deep dive on Python. Instead, it complements your existing Python knowledge to help you get the data you need for your work, so if you need to import data, get your environment up and running, have your packages ready, and come join me on this journey to learn how to load data with Importing Data: Python Data Playbook at Pluralsight.

Importing CSV Data into Python Using csv and pandas
CSV, or comma-separated value files, have historically been one of the most common ways of exporting, sharing, and importing data; for example, for spreadsheets and databases in general. And for a long time, there was no official specification. Therefore, you could find differences between files that contained the same data. It is not as easy as splitting each line by a comma, as there may be fields that contain one or more commas. In other cases, a different delimiter may be used, or you may have a different enclosing character, or even worse, none at all. And so in this module, we will go deeper into how we can import CSV data into Python using two popular and powerful libraries, Python's csv and pandas. Let's begin.
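As a small illustration of why splitting on commas is not enough, here is a minimal sketch using the standard library's csv module. The sample data, field names, and the pipe-delimited variant are made up for this example and do not come from the course files.

```python
import csv
import io

# Hypothetical sample data: the first field of the second line contains
# a comma, so it is wrapped in double quotes.
raw = 'name,city\n"Doe, Jane","San Jose"\nJohn,Boston\n'

# Naive splitting breaks the quoted field apart and keeps the quote characters.
naive = raw.splitlines()[1].split(",")

# csv.reader honors the quoting rules and keeps the field intact.
rows = list(csv.reader(io.StringIO(raw)))

# A different delimiter and enclosing character can be passed explicitly.
piped = "name|city\n'Doe | Jane'|'San Jose'\n"
piped_rows = list(csv.reader(io.StringIO(piped), delimiter="|", quotechar="'"))

print(naive)          # three fragments instead of two fields
print(rows[1])        # two clean fields
print(piped_rows[1])  # two clean fields, parsed with the custom dialect
```

If pandas is installed, pandas.read_csv accepts comparable sep and quotechar parameters and returns a DataFrame instead of a list of rows.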

Import Data into Python from JSON and XML Files
Importing data into Python from JSON and XML files. In the previous modules, we covered how to import data into Python from some common file formats, like text, files generated with NumPy, and comma-separated value files using the csv library or pandas, with which we could also load tab-separated value files, just to name a few. And they worked quite well, but let me ask you something. What do you see here? If you said data, you are right! One thing we did not see, though, is any information about this data, namely, metadata, and that's where some common data interchange formats have an edge, as they allow for storing and sharing data plus the associated metadata, like these two examples. This conveys more information on the data that we are importing, which makes it much easier, as we not only have a defined structure, but we may also have information on the types. And now in this module, I will show you how to import data into Python from two of the most commonly used data interchange formats, JSON and XML, including some of the most commonly used parameters available when loading the data. So let's begin!
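To make the point about structure and type information concrete, here is a minimal sketch using the standard library's json and xml.etree.ElementTree modules. The user record is invented for illustration and is not one of the course's data files.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample documents describing the same user.
json_text = '{"user": {"id": 7, "name": "Ana", "active": true}}'
xml_text = '<users><user id="7"><name>Ana</name></user></users>'

# JSON carries field names and basic types: the id arrives as an int
# and the flag as a bool, without any manual conversion.
user = json.loads(json_text)["user"]

# XML carries field names through tags and attributes, but values are
# strings, so the id is converted explicitly here.
root = ET.fromstring(xml_text)
records = [
    {"id": int(node.get("id")), "name": node.findtext("name")}
    for node in root.findall("user")
]

print(user)
print(records)
```

To read from a file rather than a string, json.load takes an open file object and ET.parse takes a path or file object.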

Import Data into Python from Excel Files
Importing Data into Python from Excel Files. In previous modules, we learned how to load data from several useful file formats, like CSV or TSV, as well as some commonly used data interchange formats like JSON and XML. And now we're going to learn how to load data from another file format that needs no introduction, as according to Microsoft, there are more than a billion people worldwide using it: Excel. Why so many? Well, Microsoft Excel spreadsheet software is an integral part of most businesses around the world. It comes loaded with many features, and it is very flexible, thus providing everyone from beginners to advanced users with a good way of working with data with a really low barrier to entry, which means that we can take all this data stored in Excel spreadsheets, import it into Python, and start analyzing and manipulating data; that is, taking your data analysis game to the next level. And for this purpose, we will use functions like read_excel or create an object using ExcelFile. All this with our beloved library, pandas. We will list sheets, load from a single worksheet or many, load ranges, as well as learn about the available options that we have when loading data from Excel with Python. Let's begin.

Import Data into Python from Common Binary Data File Formats
Importing Data into Python from Common Binary Data File Formats. CSV is really common, XML is well-known, and in the early 2000s, it was touted as the one-stop shop data interchange format. But fast forward a few years and JSON showed up, and it became quite popular. We also have databases, which, I will not lie to you, are quite useful. However, there are some particular cases where a highly specialized format is required, be it because you're a data scientist who needs the functionality of a particular application, or for whatever other reason. And so in this module, you will learn how to import data into Python from specialized file formats, which include Pickle files, MATLAB, SAS, Stata, and HDF5. So let's begin.
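Of the formats listed, Pickle is the one that needs nothing beyond the standard library, so it makes a convenient minimal sketch of a binary round trip. The record below is invented for illustration, and an in-memory buffer stands in for a file on disk.

```python
import io
import pickle

# Hypothetical Python object to serialize.
record = {"name": "Ana", "scores": [9.5, 8.0], "active": True}

# Write the object in binary form. An in-memory buffer stands in for a
# file that would normally be opened with open(path, "wb").
buffer = io.BytesIO()
pickle.dump(record, buffer)

# Rewind and load it back. Only unpickle data from sources you trust,
# because loading a pickle can execute arbitrary code.
buffer.seek(0)
loaded = pickle.load(buffer)

print(loaded)
```

The other formats rely on third-party packages. For example, pandas provides read_pickle, read_sas, read_stata, and read_hdf, and SciPy provides scipy.io.loadmat for MATLAB files.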

Import Data into Python from Relational Databases
Importing Data into Python from Relational Databases. Databases are another useful and popular way of storing data. In fact, if all of a sudden all databases disappeared from the face of the earth, we would be in quite a pickle. No pun intended! And so in this module, you will learn how to import data from SQLite database files. Then we will expand our knowledge by adding pandas to get our results as DataFrames by several means, including read_sql, read_sql_query, and read_sql_table. Then we will improve our skills by using libraries like sqlite3 and psycopg2, plus a great toolkit that provides an abstraction of our databases, SQLAlchemy. And so we will easily import relational data from other databases, like MySQL, as well as read data from PostgreSQL. For this particular module, I have created three separate databases that contain users, posts, and tags. Let's begin.
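As a rough sketch of the first step, reading rows from SQLite with the standard library's sqlite3 module might look like the following. The users table and its contents are made up here and are not the course's actual databases, and an in-memory database is used so the example needs no file.

```python
import sqlite3

# An in-memory database stands in for a SQLite file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only so there is something to query.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ana",), ("Luis",)])
conn.commit()

# Run a query and collect the column names and the rows.
cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
print(rows)
```

With pandas installed, passing the same query and an open connection to pandas.read_sql_query returns the result as a DataFrame instead of a list of tuples.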