How to scrape a webpage with Pandas using Python


Pandas brings a lot to the table when it comes to data manipulation.

One of its core features is its ability to read many file formats, and that extends to some light web scraping.

Pandas can parse static HTML pages with the pandas.read_html() function.

It reads every <table> tag on the page and returns each one as a DataFrame. It will not work if the tables are rendered with JavaScript, because Pandas only sees the raw HTML.
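
To see this behaviour without touching the network, here is a minimal sketch that feeds read_html() a small HTML string. The page content, player names, and numbers are made up for illustration, and it assumes an HTML parser that Pandas can use (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical static page with two <table> tags (made up for illustration)
html = """
<html><body>
  <table>
    <tr><th>Player</th><th>Points</th></tr>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>7</td></tr>
  </table>
  <table>
    <tr><th>Team</th><th>Wins</th></tr>
    <tr><td>X</td><td>3</td></tr>
  </table>
</body></html>
"""

# read_html() returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(html))

print(len(tables))
print(tables[0])
```

Wrapping the string in StringIO gives read_html() a file-like object, which is the form recent Pandas versions expect for literal HTML.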

Here is the code. It scrapes the Wikipedia page listing the NBA career scoring leaders.

# Pandas provides the DataFrame and read_html()
import pandas as pd

# The page we want to scrape
url = "https://en.wikipedia.org/wiki/List_of_National_Basketball_Association_career_scoring_leaders"

# Parse the HTML with pandas.read_html()
# It returns a list of DataFrames, so we pick the relevant one,
# here the second (index 1)
df = pd.read_html(url)[1]

# Check our DataFrame
print(df)
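
Picking a table by position can break if the page layout changes. As a hedged alternative, read_html() accepts a match argument that keeps only the tables containing a given text. The sketch below runs on a made-up HTML string standing in for a real page; the column names and values are assumptions, not the actual Wikipedia content.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a page with a legend table and a data table
html = """
<table>
  <tr><th>Legend</th><th>Meaning</th></tr>
  <tr><td>*</td><td>Active player</td></tr>
</table>
<table>
  <tr><th>Rank</th><th>Player</th><th>Total points</th></tr>
  <tr><td>1</td><td>Player A</td><td>300</td></tr>
  <tr><td>2</td><td>Player B</td><td>250</td></tr>
</table>
"""

# Keep only the tables whose text matches "Total points"
matches = pd.read_html(StringIO(html), match="Total points")

# Still a list, so we take the first (and here only) match
df = matches[0]
print(df)
```

If no table matches, read_html() raises a ValueError, so a typo in the text shows up immediately rather than silently selecting the wrong table.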

There you go! You can now scrape any static webpage that contains a <table> tag.