How to scrape a webpage with Pandas using Python

Apr 30th 2022 • 1 min

Pandas brings a lot to the table when it comes to data manipulation.

One of its core features is its ability to read almost any file format, and that even includes some web scraping.

Pandas can parse static HTML pages with the pandas.read_html() function. Under the hood it relies on an HTML parser, so lxml (or BeautifulSoup4 together with html5lib) must be installed.

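It reads every <table> element on the page and returns a list of DataFrames, one per table. It won't work if the table is rendered with JavaScript, because read_html() only sees the raw HTML the server sends back.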
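To see what read_html() returns without making a network request, here is a minimal sketch that parses a small inline HTML string. The two tables and their contents are made up for illustration. Wrapping the string in io.StringIO is how recent pandas versions expect literal HTML to be passed (passing a raw HTML string is deprecated).

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML snippet containing two <table> elements
html = """
<table>
  <thead><tr><th>Team</th><th>City</th></tr></thead>
  <tbody><tr><td>Lakers</td><td>Los Angeles</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Player</th><th>Points</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>100</td></tr>
    <tr><td>Player B</td><td>90</td></tr>
  </tbody>
</table>
"""

# read_html() returns a list with one DataFrame per <table> it finds;
# the <th> cells in <thead> become the column names
tables = pd.read_html(StringIO(html))

print(len(tables))  # 2
print(tables[1])    # the second table, i.e. index 1
```

This is the same list-then-index pattern used on the Wikipedia page: you get every table back and pick the one you need.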

Here is the code

Here we scrape the Wikipedia page listing the NBA's career scoring leaders.

# To work with dataframe
import pandas as pd

# We set the page we want to scrape
url = "https://en.wikipedia.org/wiki/List_of_National_Basketball_Association_career_scoring_leaders"

# Parse the HTML with pandas.read_html()
# It returns a list of DataFrames (one per <table>), so we pick the relevant one:
# here, the second table (index 1). This index depends on the page layout and may change.
df = pd.read_html(url)[1]

# We check our DataFrame
print(df)
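Picking a table by position (index 1) breaks if the page's layout changes. One alternative, sketched below, uses read_html()'s match parameter, which keeps only the tables whose text matches a given string or regex. The HTML is a made-up stand-in for a real page; on the Wikipedia page you would pass some text that appears only in the table you want, and which text works depends on the page's current content.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: a navigation table followed by a scoring table
html = """
<table>
  <thead><tr><th>Section</th></tr></thead>
  <tbody><tr><td>History</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Player</th><th>Points</th></tr></thead>
  <tbody><tr><td>Player A</td><td>100</td></tr></tbody>
</table>
"""

# match= filters the tables by their text content, so the selection
# no longer depends on where the table sits on the page.
# read_html() raises ValueError if no table matches.
tables = pd.read_html(StringIO(html), match="Points")
scorers = tables[0]

print(scorers)
```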


There you have it! You can now scrape any static web page that contains a <table> tag.

Hey! I'm Bastien! 👋
