Web Scraping: Python Data Playbook

Learn how to scrape information from a static web page with BeautifulSoup4 when no API is available, then tell a compelling graphical data story with Seaborn in a Jupyter Notebook.
Course info
Rating
(37)
Level
Beginner
Updated
May 2, 2019
Duration
1h 17m
Description

Scrape data from a static web page with BeautifulSoup4 and turn it into a compelling graphical data story in a Jupyter Notebook. In this course, Web Scraping: The Python Data Playbook, you will gain the ability to scrape data and present it graphically. First, you will learn to scrape using the requests module and BeautifulSoup4. Next, you will discover how to write a trustworthy scraping module backed by a unit test. Finally, you will explore how to turn the columns of data into a graphical story that will change the opinions of your colleagues. When you're finished with this course, you will have the skills and knowledge of web scraping needed to create a graphically compelling Jupyter Notebook without the use of an API.

About the author

Ian is an Interim Chief Data Scientist and Coach. He co-organises the annual PyDataLondon conference, which has 700+ attendees, and the associated monthly meetup with 9,500+ members. With 16 years' experience, he runs Mor Consulting in London, speaks internationally, often as a keynote speaker, and is the author of the bestselling O'Reilly book High Performance Python. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee.

Section Introduction Transcripts

Course Overview
Hello everyone, my name is Ian Ozsvald, and welcome to my course, Web Scraping: The Python Data Playbook. I'm an Interim Chief Data Scientist inside my consultancy, Mor Consulting, and I work with teams to accelerate their data science delivery. You want to make decisions with some data. You can see it in a web page, but you can't access it, as there's no API. If you could access it, you could make data-driven decisions, you could augment other datasets, and you could tell a visual story with your data to change people's opinions. In this course, we are going to scrape a static web page with BeautifulSoup4, interactively investigate the scraped data, write a trustworthy scraping module, and visually explore relationships in the data which we can share with our colleagues. Some of the major topics that we will cover include using PyCharm to develop, interactively debug, and refactor our module; an efficient workflow for investigating text, numeric, and categorical data; processes for identifying outliers and relationships in our extracted pandas data frame; and building powerful visual explanations of our data using matplotlib and the statistical plotting library, Seaborn. By the end of this course, you'll know how to write a static web page scraper in Python, how to use unit tests to build trust in the data, how to interactively explore your own data to explain relationships, and how to summarize a visual story for your colleagues. Before beginning the course, you should be familiar with the basics of Python programming. I hope you'll join me on this journey to learn to scrape data and tell a visual story with Web Scraping: The Python Data Playbook, at Pluralsight.
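
As a rough illustration of the scrape-then-test workflow the overview describes, here is a minimal sketch. It is not taken from the course: the HTML snippet, the table id "prices", and the function names parse_prices and test_parse_prices are all hypothetical. It assumes BeautifulSoup4 is installed, and it parses an inline string rather than a live page; in practice the HTML would typically come from something like requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a static page; a real scraper would fetch
# this text over HTTP instead of hard-coding it.
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Espresso</td><td>2.50</td></tr>
  <tr><td>Flat white</td><td>3.10</td></tr>
</table>
"""


def parse_prices(html):
    """Return a list of {'item': str, 'price': float} dicts from the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="prices")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 2:  # the header row uses <th>, so it has no <td> cells
            continue
        rows.append({
            "item": cells[0].get_text(strip=True),
            "price": float(cells[1].get_text(strip=True)),
        })
    return rows


def test_parse_prices():
    """A small unit test pinning down what the scraper should extract."""
    rows = parse_prices(SAMPLE_HTML)
    assert len(rows) == 2
    assert rows[0] == {"item": "Espresso", "price": 2.5}
    assert rows[1]["item"] == "Flat white"


if __name__ == "__main__":
    test_parse_prices()
    print(parse_prices(SAMPLE_HTML))
```

Testing against a saved copy of the page, as above, keeps the test independent of the network. The resulting list of dicts could then be passed to pandas.DataFrame(rows) for exploration and plotting.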