Web Scraping with Python

Basics

Open a Web Browser

import webbrowser

webbrowser.open('https://www.google.com')

Downloading from the Web with the Requests Module

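import requests

# Download the file; requests.get() returns a Response object.
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
# Raise an exception if the download failed (for example, a 404 error).
res.raise_for_status()

print(res.text[:250])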

Parsing HTML with the Beautiful Soup Module

Web pages are plain-text files formatted as HTML, which can be parsed with the Beautiful Soup module. The module is imported under the name bs4. Pass a string containing the HTML to the bs4.BeautifulSoup() function to get a BeautifulSoup object. Its select() method takes a CSS selector string and returns a list of matching Tag objects. You can get a selector for a specific element from the browser’s developer tools by right-clicking the element in the inspector and choosing the Copy option for its CSS selector or CSS path (the exact menu label varies by browser).
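
As a minimal sketch of the steps above, the snippet below parses a small HTML string and selects elements from it. The HTML content, the variable names, and the choice of the built-in 'html.parser' parser are assumptions made for illustration; in practice the string would usually be res.text from a requests download.

```python
import bs4

# Hypothetical HTML standing in for a downloaded page (e.g. res.text).
html = """
<html><body>
<h1 id="title">Example Page</h1>
<p class="author">Example Author</p>
<a href="https://example.com/one">First link</a>
<a href="https://example.com/two">Second link</a>
</body></html>
"""

# Naming the parser explicitly avoids a warning and keeps results
# consistent across machines; 'html.parser' ships with Python.
soup = bs4.BeautifulSoup(html, 'html.parser')

# select() returns a list of Tag objects matching the CSS selector.
title_elems = soup.select('#title')
title_text = title_elems[0].get_text()

# Each Tag exposes its attributes through get().
link_urls = [a.get('href') for a in soup.select('a')]

print(title_text)
print(link_urls)
```

If a selector matches nothing, select() returns an empty list, so it is worth checking the length before indexing into the result.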

Ivan Bu
