What are Pandas DataFrames
The Pandas library is one of the most powerful libraries for data analysis and data manipulation.
The Pandas library has two core elements: the Series and the DataFrame.
Each column in a DataFrame is a Series.
A Pandas DataFrame is basically like an Excel sheet: a bunch of columns, rows, and cells that store data. It also comes with an extensive set of tools for data analysis and data manipulation.
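To make the spreadsheet analogy concrete, here is a minimal sketch. The column names and values are made up for illustration; only `pd.DataFrame`, column selection, and `.loc` come from Pandas itself.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [31, 45, 27],
        "city": ["Paris", "Berlin", "Madrid"],
    }
)

# shape is (number of rows, number of columns), like the size of a sheet.
print(df.shape)

# Selecting a single column returns a Series.
ages = df["age"]
print(type(ages).__name__)

# A single cell is addressed by row label and column name.
cell = df.loc[1, "city"]
print(cell)
```

Rows get the default integer labels 0, 1, 2 here, which is why `df.loc[1, "city"]` points at the second row.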
Some of its features (Source: Pandas.org):
- Easy handling of missing data.
- Columns can be inserted and deleted quite easily (e.g. df["my_new_column"] = list_of_values)
- Easy aggregation and transformation.
- Slicing, indexing, subsetting for large data sets.
- Intuitive merging and joining data sets.
- Flexible reshaping and pivoting.
- Fast read and write for almost any data file format (JSON, CSV, Excel, pickle, Parquet, SQL, etc.)
- Time-series capabilities (shift, lag, statistics, conversion, moving-window, etc...)
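A few of the features listed above can be sketched in one short script. The `sales` and `regions` tables are hypothetical, and filling missing values with 0 is just one possible choice made for this example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value in "units".
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "units": [10, np.nan, 7, 5],
        "price": [2.0, 2.0, 3.0, 3.0],
    }
)

# Missing data: replace NaN with 0 (an assumption made for this example).
sales["units"] = sales["units"].fillna(0)

# Column insertion: a new column computed from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Aggregation: total revenue per store.
revenue_by_store = sales.groupby("store")["revenue"].sum()

# Merging: attach a region to each row through the shared "store" column.
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = sales.merge(regions, on="store", how="left")

# Time-series helpers: previous value and a 2-row moving average.
sales["prev_units"] = sales["units"].shift(1)
sales["rolling_mean"] = sales["units"].rolling(window=2).mean()

# Read and write: round-trip through CSV using an in-memory buffer.
buffer = io.StringIO()
sales.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(revenue_by_store)
print(merged)
print(restored)
```

The in-memory buffer stands in for a real file path, so the sketch runs without touching the disk.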
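Furthermore, much of Pandas is built on NumPy and compiled C/Cython code, so vectorized operations on whole columns run much faster than equivalent pure-Python loops. Operations that fall back to Python, such as row-by-row loops, do not get that speed.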
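As a rough illustration of the speed point, the sketch below computes the same result twice: once as a whole-column (vectorized) operation and once with a Python loop. The data and the array size are arbitrary, and the timings depend on your machine, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test data: 100,000 floating-point numbers.
values = pd.Series(np.arange(100_000, dtype="float64"))


def vectorized_version():
    # One expression over the whole column, executed in compiled code.
    return values * 2 + 1


def looped_version():
    # The same arithmetic, one element at a time in the Python interpreter.
    return pd.Series([v * 2 + 1 for v in values])


vectorized = vectorized_version()
looped = looped_version()

# Both approaches give identical results.
print(vectorized.equals(looped))

# Time three runs of each; the vectorized one is typically much faster.
print("vectorized:", timeit.timeit(vectorized_version, number=3))
print("looped:    ", timeit.timeit(looped_version, number=3))
```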
Nowadays, a lot of data pipelines use Pandas because it is a fast and easy way to manipulate data. Examples include data scraping, Zapier-like automations, data science, machine learning, financial analysis, financial engineering, and quantitative analysis.
If you want to learn how to use it, it won't take long. I wrote a bunch of articles on the topic accessible right here:
More on DataFrames
If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic.