Extracting Data from HTML with BeautifulSoup

This course covers the important aspects of scraping websites using Beautiful Soup. You will learn to build, manipulate and traverse the parse tree, as well as to leverage advanced features such as working with filters, CSS and XPath.
Course info
Level
Intermediate
Updated
Nov 1, 2019
Duration
2h 25m
Table of contents
Course Overview
Getting Started with BeautifulSoup
Navigating the Parse Tree
Searching for Elements in the Parse Tree
Leveraging Advanced Features of BeautifulSoup
Description

Web scraping is an important technique that is widely used as the first step in many workflows in data mining, information retrieval, and text-based machine learning.

In this course, Extracting Data from HTML with BeautifulSoup, you will gain the ability to build robust, maintainable web scraping solutions using the Beautiful Soup library in Python.

First, you will learn how regular expressions can be used to scrape web content, and how Beautiful Soup does it better in important ways. Next, you will discover how Beautiful Soup parses HTML from web content, fixes up badly formed tags, and builds a clean, easily traversable parse tree. You will then see how that parse tree can be used to find and retrieve specific patterns.
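To make the contrast between regular expressions and a parse tree concrete, here is a minimal sketch. It is not taken from the course. The HTML snippet, its URLs, and the `item` class are made up for illustration, and the built-in `html.parser` backend is assumed so that no extra parser needs to be installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, deliberately sloppy HTML: the <li> tags are never closed,
# and the two links use different attribute orders and quote styles.
html = """
<ul>
  <li><a href="/a" class="item">Alpha</a>
  <li><a class='item' href='/b'>Beta</a>
</ul>
"""

# Regex approach: this pattern only matches links written exactly as
# <a href="...">, so it silently misses the second link.
regex_links = re.findall(r'<a href="([^"]+)"', html)

# Beautiful Soup approach: parse the markup into a tree, then search the tree.
soup = BeautifulSoup(html, "html.parser")
soup_links = [a["href"] for a in soup.find_all("a", class_="item")]
soup_texts = [a.get_text() for a in soup.find_all("a")]

# Traversal: step from the first link up to the tag that contains it.
first_parent = soup.find("a").parent.name

print(regex_links)   # links found by the regex
print(soup_links)    # links found through the parse tree
print(soup_texts, first_parent)
```

The regex depends on the exact spelling of the tag, while `find_all` works on the parsed structure, which is why it tolerates the differences in quoting and attribute order.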

Finally, you will round out your knowledge by leveraging advanced features of Beautiful Soup such as working with CSS and XPath. When you’re finished with this course, you will have the skills and knowledge to implement robust web scraping using Beautiful Soup.
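As a rough illustration of the CSS side of those advanced features, the sketch below uses Beautiful Soup's `select()` and `select_one()` methods. The markup, the `courses` id, and the class names are hypothetical. XPath is left out on purpose: as far as I know, Beautiful Soup has no XPath API of its own, and XPath queries are typically run through a separate library such as lxml, so how the course covers it is not assumed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div id="courses">
  <p class="course beginner">Intro</p>
  <p class="course intermediate">Scraping</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector and returns every matching tag.
titles = [p.get_text() for p in soup.select("div#courses p.intermediate")]

# select_one() returns only the first match (or None if nothing matches).
first_course = soup.select_one("p.course").get_text()

print(titles, first_course)
```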

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Getting Started with Tensorflow 2.0
Beginner
3h 9m
Jul 23, 2020
Section Introduction Transcripts

Course Overview
Hi. My name is Janani Ravi, and welcome to this course on Extracting Data from HTML with Beautiful Soup. A little about myself. I have a Master's degree in Electrical Engineering from Stanford, and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for the timeline technologies. I currently work on my own start-up, Loonycorn, a studio for high-quality video content.

Web scraping is an important technique that is widely used as the first step in many workflows in data mining, information retrieval, and text-based machine learning. In this course, you will gain the ability to build robust, maintainable web-scraping solutions using the Beautiful Soup library in Python.

First, you'll learn how regular expressions can be used to scrape web content and how Beautiful Soup does it better in important ways. Next, you will discover how Beautiful Soup parses HTML from web content, fixes up badly formed tags, and builds a clean, easily traversable parse tree. You will then see how that parse tree can be used in order to find and retrieve specific patterns. Finally, you'll round out your knowledge by leveraging advanced features of Beautiful Soup, such as working with CSS and XPath.

When you're finished with this course, you will have the skills and knowledge to implement robust web scraping using Beautiful Soup.