How to sample a DataFrame using Pandas


Sometimes you might need to sample a dataset, for instance to compute statistics on a sample of your population rather than on the whole dataset.

To do so, Pandas comes with a .sample() method.

The method takes either the number of rows you want (n) or the fraction of the dataset to draw (frac).

First, we load a dataset to sample from.

import pandas as pd

# We read a sample dataset from the web.
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

We draw a sample using a number of rows

We draw a sample of 20 rows. Note that the argument is named n, not size.

df.sample(n=20)
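Each call to .sample() returns different rows unless a seed is fixed. As a minimal sketch, the snippet below passes random_state to make the draw repeatable. The small DataFrame and its column name are made up for illustration so that the example runs without downloading anything, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in dataset (not the iris file used above).
df = pd.DataFrame({"value": range(100)})

# n is the number of rows to draw; random_state fixes the random seed.
first = df.sample(n=20, random_state=42)
second = df.sample(n=20, random_state=42)

print(len(first))            # number of rows drawn
print(first.equals(second))  # same seed, same rows
```

By default rows are drawn without replacement, so no row appears twice in the sample.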

We draw a sample using a fraction

We draw a sample containing 20% of the dataset's rows.

df.sample(frac=0.2)
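The frac argument can also be combined with replace=True to draw rows with replacement, which is one common way to build a bootstrap resample. The sketch below assumes a made-up 100-row DataFrame instead of the iris file, and the seed value is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in dataset (not the iris file used above).
df = pd.DataFrame({"value": range(100)})

# 20% of the rows, drawn without replacement (the default).
subset = df.sample(frac=0.2, random_state=0)

# A resample as large as the original, drawn with replacement,
# so the same row may be picked more than once.
bootstrap = df.sample(frac=1.0, replace=True, random_state=0)

print(len(subset))
print(len(bootstrap))
print(bootstrap.index.is_unique)
```

Keep in mind that n and frac cannot be passed together: pandas expects one or the other.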

There you go: you now know how to sample a DataFrame!