How to scrape a webpage with Pandas using Python
Pandas brings a lot to the table when it comes to data manipulation.
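One of its core features is its ability to read a wide range of file formats, and that even covers some web scraping.
Pandas can parse static HTML pages with the pandas.read_html() function. It needs an HTML parser to be installed, such as lxml (or BeautifulSoup4 together with html5lib).
It reads every <table> element on the page and returns a list of DataFrames, one per table. If the page builds its content with JavaScript, this won't work, because read_html() only sees the HTML the server sends back, not what the browser renders afterwards.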
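To see what read_html() returns without touching the network, here is a minimal sketch that parses a small HTML string containing two made-up tables (the table contents are hypothetical). The string is wrapped in io.StringIO because recent pandas versions deprecate passing literal HTML directly. It assumes lxml (or BeautifulSoup4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; the header rows use <th> cells
html = """
<html><body>
  <table>
    <tr><th>Player</th><th>Points</th></tr>
    <tr><td>Alice</td><td>120</td></tr>
    <tr><td>Bob</td><td>95</td></tr>
  </table>
  <table>
    <tr><th>Team</th><th>Wins</th></tr>
    <tr><td>Red</td><td>10</td></tr>
  </table>
</body></html>
"""

# read_html() always returns a list: one DataFrame per <table> found
tables = pd.read_html(StringIO(html))

print(len(tables))  # 2
print(tables[0])    # the Player/Points table
```

Because the result is a list even when the page holds a single table, you always index into it (or loop over it) to get a DataFrame.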
Here is the code.
In this example, we scrape the Wikipedia page listing the NBA's career scoring leaders.
```python
# To work with DataFrames
import pandas as pd

# The page we want to scrape
url = "https://en.wikipedia.org/wiki/List_of_National_Basketball_Association_career_scoring_leaders"

# pandas.read_html() returns a list of DataFrames, one per <table> on the page,
# so we pick the relevant one: here, the second (index 1).
# This index depends on the page layout and may change if the page is edited.
df = pd.read_html(url)[1]

# Check the resulting DataFrame
print(df)
```
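Hard-coding index 1 works, but it silently picks the wrong table if the page gains or loses a table. A somewhat more robust option is read_html()'s match parameter, which keeps only the tables whose text matches a string or regular expression. The minimal sketch below uses a hypothetical two-table HTML snippet; on the real Wikipedia page you would pass a word you expect to find in the target table, such as one of its column headers.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: an unrelated table first, the one we want second
html = """
<table>
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>Updated</td><td>2024</td></tr>
</table>
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Alice</td><td>120</td></tr>
  <tr><td>Bob</td><td>95</td></tr>
</table>
"""

# match keeps only the tables whose text matches this regex
by_text = pd.read_html(StringIO(html), match="Points")

# Only one table contains "Points", so index 0 is the one we want
df = by_text[0]
print(df)
```

If several tables match, you still get a list back, so it is worth checking len() before indexing. If none match, read_html() raises a ValueError instead of returning an empty list.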
There you go! You can now scrape the tables from any static webpage that contains a <table> element.