Scraping Your First Web Page with Python

by Janani Ravi

This course covers the important tools for retrieving web content using HTTP libraries such as Requests, httplib2, and urllib, as well as technologies for parsing web content. These include Beautiful Soup, a popular parsing library, and Scrapy, a production-grade scraping framework.

Course info

Rating: (40)
Level: Beginner
Updated: Nov 5, 2019
Duration: 2h 39m

What you'll learn

Web scraping is widely used as the first step in many workflows in data mining, information retrieval, and text-based machine learning. In this course, Scraping Your First Web Page with Python, you will gain the ability to apply different scraping techniques, including Beautiful Soup and Scrapy. First, you will use HTTP client libraries such as Requests, httplib2, and urllib to download HTML content. Next, you will discover Beautiful Soup, an extremely popular Python library, and see where it does better than regular expressions: it fixes up badly formed HTML and constructs a parse tree that can be traversed and queried. Finally, you will add Scrapy to your toolkit, a full-fledged web scraping framework that combines the steps of retrieving and parsing web content and does so at production scale. When you’re finished with this course, you will have the skills and knowledge to identify the relative strengths and use cases of different web retrieval and scraping technologies such as regular expressions, Beautiful Soup, and Scrapy.
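
To make the regex-versus-parser comparison concrete, here is a minimal sketch that is not taken from the course itself. It uses only the Python standard library, with `html.parser` standing in for Beautiful Soup as an assumption so that the example runs without third-party packages. Unlike Beautiful Soup, `html.parser` does not build a parse tree; the sketch only illustrates that a tolerant parser copes with markup variations that a naive pattern misses. The sample markup `BAD_HTML` and the names `links_with_regex`, `LinkCollector`, and `links_with_parser` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: unclosed <p> and <li> tags, an uppercase tag,
# single-quoted and unquoted attribute values, and a missing </html>.
BAD_HTML = """
<html><body>
<p>Intro paragraph with no closing tag
<ul>
  <li><a href="/first">First</a>
  <li><A HREF='/second'>Second</A>
  <li><a class="x" href=third.html>Third</a>
</ul>
</body>
"""


def links_with_regex(html):
    """Naive approach: assumes a lowercase tag with a double-quoted href
    as its first attribute."""
    return re.findall(r'<a href="([^"]*)"', html)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_with_parser(html):
    """Parser-based approach: tolerant of case, quoting, and attribute order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex :", links_with_regex(BAD_HTML))
    print("parser:", links_with_parser(BAD_HTML))
```

On this input the pattern finds only the first link, while the parser reports all three. Beautiful Soup goes further by repairing the markup into a tree that can be traversed and queried.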

Course Overview

1min

Course Overview 2m

Getting Started with Web Scraping

46mins

Working with the Parse Tree in BeautifulSoup

39mins

Module Overview 1m
The HTML Parse Tree 4m
Beautiful Soup for HTML Parsing 2m
Introducing Beautiful Soup 5m
Extracting Specific Page Elements 6m
Filtering Elements Using Find and Find All 7m
Searching and Filtering Using Custom Functions 3m
Extracting Links from a Page 6m
Using a Soup Strainer to Parse a Subset of a Document 4m
Module Summary 1m

Selecting Elements Using the Scrapy Shell

36mins

Module Overview 1m
Parsing Web Content 2m
Introducing Scrapy 4m
Getting Started with Scrapy 4m
Introducing the Scrapy Shell 4m
Selecting Elements Using CSS Selectors 7m
Advanced Selections Using CSS Selectors 5m
Selecting Elements Using XPath Selectors 7m
Module Summary 1m

Scraping Web Sites Using Scrapy Spiders

35mins

Module Overview 1m
How Scrapy Works 3m
Creating Your First Custom Spider 7m
Writing Scraped Contents to a File 2m
Exploring Items Using the Scrapy Shell 4m
Using Items to Store Extracted Content 4m
Using Item Loaders and Input and Output Processors for Scraped Data 7m
Using Pipelines to Transform Scraped Data 5m
Module Summary 1m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
