What are Pandas DataFrames?

The Pandas library is one of the most powerful libraries for data analysis and data manipulation.

The Pandas library has two core elements.

  • DataFrame
  • Series

Each column in a DataFrame is a Series.
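
To make the relationship between the two concrete, here is a minimal sketch. The column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [31, 25, 40],
})

# Selecting a single column returns a Series.
ages = df["age"]
print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series

# The Series keeps the column name and shares the DataFrame's row index.
print(ages.name)                    # age
print(ages.index.equals(df.index))  # True
```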

The anatomy of a DataFrame

A Pandas DataFrame is basically like an Excel sheet: a set of columns, rows, and cells that store data. It also comes with an extensive set of tools for data analysis and data manipulation.
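
The spreadsheet analogy can be sketched in a few lines. The products, prices, and row labels below are invented for the example.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame(
    {"product": ["pen", "book", "lamp"], "price": [1.5, 12.0, 30.0]},
    index=["r1", "r2", "r3"],
)

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(list(df.columns))  # column labels
print(list(df.index))    # row labels

row = df.loc["r2"]            # one row, returned as a Series
cell = df.loc["r2", "price"]  # a single cell, by row label and column label
same_cell = df.iloc[1, 1]     # the same cell, by integer position
```

Here `loc` addresses data by label and `iloc` by position, much like a cell reference versus a row and column number in a sheet.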

Some of its features (Source: Pandas.org):

  • Easy handling of missing data.
  • Columns can be inserted and deleted quite easily (e.g. df["my_new_column"] = list_of_values)
  • Easy aggregation and transformation.
  • Slicing, indexing, subsetting for large data sets.
  • Intuitive merging and joining data sets.
  • Flexible reshaping and pivoting.
  • Fast reading and writing of almost any data file format (JSON, CSV, Excel, pickle, Parquet, SQL, etc.).
  • Time-series capabilities (shift, lag, statistics, conversion, moving-window, etc.).
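
Several of the features above can be tried in one short script. This is only a sketch: the `sales` and `stores` tables, their column names, and the dates are hypothetical, and filling a missing price with the mean is just one of several possible strategies.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sales records; one price is missing.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "item": ["pen", "book", "pen", "book"],
    "price": [1.5, np.nan, 2.0, 10.0],
    "qty": [10, 2, 5, 1],
})

# Missing data: fill the gap with the mean of the known prices.
sales["price"] = sales["price"].fillna(sales["price"].mean())

# Insert a column computed from existing ones.
sales["revenue"] = sales["price"] * sales["qty"]

# Delete a column (returns a new DataFrame, the original is unchanged).
without_qty = sales.drop(columns=["qty"])

# Aggregation: total revenue per store.
per_store = sales.groupby("store")["revenue"].sum()

# Merging: attach a city to each store.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Paris", "Lyon"]})
merged = sales.merge(stores, on="store", how="left")

# Reshaping: one row per store, one column per item.
wide = sales.pivot(index="store", columns="item", values="qty")

# Read and write: round-trip through CSV using an in-memory buffer.
csv_text = sales.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))

# Time series: shift and 2-day moving average over a daily index.
daily = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
shifted = daily.shift(1)
moving_avg = daily.rolling(window=2).mean()
```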

Furthermore, much of Pandas is built on NumPy and compiled C/Cython code, so operations applied to whole columns at once run far faster than the equivalent loops written in pure Python.
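
The speed comes from operating on whole columns at once rather than looping over values in Python. The sketch below, using a made-up column `x`, shows the two styles side by side and checks that they agree. It does not measure timing; you can wrap each version in `timeit` to compare them on your own machine.

```python
import numpy as np
import pandas as pd

# Hypothetical column of 100,000 numbers.
df = pd.DataFrame({"x": np.arange(100_000, dtype="float64")})

# Pure-Python loop: every value passes through the interpreter.
looped = [value * 2.0 + 1.0 for value in df["x"]]

# Vectorized: one expression applied to the whole column in compiled code.
vectorized = df["x"] * 2.0 + 1.0

# Both approaches produce the same numbers.
results_match = np.allclose(looped, vectorized.to_numpy())
print(results_match)  # True
```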

Nowadays, many data pipelines use Pandas because it is a fast and easy way to manipulate data.

Examples include data scraping, Zapier-like automations, data science, machine learning, financial analysis, financial engineering, and quantitative analysis.

If you want to learn how to use it, it won't take long. I wrote a bunch of articles on the topic accessible right here:

More on DataFrames

If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic, just here.