How to store data efficiently in Python

Whatever you do in Python, sooner or later you will need to store data for the long term.

There are many things to consider beforehand.

  1. What kind of data do you want to save?
  2. How often do you save this data?
  3. How often do you need to access it?

In this article, we are going to focus mainly on local data storage, but it is good to know that you could also save it online to allow others to access it.

The formats

The most commonly used formats in the industry are the following:

  • text
  • csv
  • json
  • excel
  • pickle
  • hdf5
  • parquet

It is important to note that each of these formats has its own best use case.

TEXT

A text file is simply a file containing plain text, with no imposed structure.

If you only need to store some text once, this may be the right choice. But keep in mind that structured data is always easier to work with.
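
As a minimal sketch, assuming a hypothetical file named notes.txt created in a temporary directory, plain text can be written and read back with the standard library alone:

```python
import tempfile
from pathlib import Path

# Hypothetical file name, created in a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "notes.txt"

    # Write plain text; stating the encoding avoids platform-dependent defaults.
    path.write_text("first line\nsecond line\n", encoding="utf-8")

    # Read the whole file back as one string, then split it into lines.
    content = path.read_text(encoding="utf-8")
    lines = content.splitlines()

print(lines)
```

Note that everything comes back as strings: any structure (numbers, columns, nesting) has to be parsed by hand, which is why the structured formats below are usually more convenient.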

How to read a text file in Python
The simplest way to read a text file in Python.
How to write a text file in Python
Learn how to write a text file in Python in only 4 lines of code.

CSV

This is probably the one you will encounter the most in Python.

This file type stores tabular data, the kind a DataFrame represents. It is a bit like an Excel spreadsheet.
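
Here is a small sketch of a CSV round trip with Pandas. The sample data and the file name people.csv are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 28]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "people.csv"

    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)

    # Column types are not stored in a CSV; pandas infers them again on read.
    loaded = pd.read_csv(path)

print(loaded)
```

Because a CSV is plain text, column types are guessed when the file is read, so dates or identifiers with leading zeros may need extra care.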

I wrote two articles on how to read and save CSV files with the Pandas library.

How to read a CSV file in Python using Pandas
The simplest way to read a CSV file in Python. Using the Pandas library, we can transform the CSV into a workable DataFrame object that looks just like an Excel sheet.
How to save as CSV file in Python using Pandas
The simplest way to save a CSV file in Python using the Pandas library and to_csv() method.

JSON

JSON is one of the most popular formats for sharing data across the web.

It is simple to understand and very good at keeping things tidy.
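
A short sketch with the standard json module shows how nested data survives a round trip. The dictionary contents and the file name settings.json are made up for illustration:

```python
import json
import tempfile
from pathlib import Path

# Made-up nested data, for illustration only.
settings = {"user": "ada", "tags": ["python", "data"], "retries": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "settings.json"

    # indent=2 makes the file easy to read in a text editor.
    with path.open("w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)

    # json.load rebuilds dictionaries, lists, strings and numbers.
    with path.open("r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == settings)
```

Keep in mind that JSON only knows a handful of types (objects, arrays, strings, numbers, booleans and null), so Python-specific objects such as sets or dates need to be converted first.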

Here are two articles that will show you how to read and write in JSON format.

How to read a JSON file in Python
How to read any JSON file. We explore the simplest methods to read JSON files, with two use cases using the json and Pandas libraries.
How to save data in JSON format
How to save data in JSON format in Python using the json library. A simple example of how to save data.

Excel

Excel is one of the most widely used file formats, relied on by millions of people. However, it is not the preferred one for web services or any kind of programmatic data exchange.

It is not lightweight, and it originated as a proprietary Microsoft format.
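
As a rough sketch, an Excel round trip with Pandas could look like this. It assumes the optional openpyxl package is installed, since Pandas relies on it for .xlsx files; the data, file name and sheet name are made up:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({"product": ["apple", "pear"], "quantity": [12, 7]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "stock.xlsx"

    # Writing .xlsx files needs an engine such as openpyxl (assumed installed).
    df.to_excel(path, sheet_name="stock", index=False)

    # A workbook can hold several sheets, so we name the one we want.
    loaded = pd.read_excel(path, sheet_name="stock")

print(loaded)
```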

You will surely need to read or write such files, so here are two articles that will help you with that.

Read Excel spreadsheet using python
Here is how to read an Excel spreadsheet in two lines of Python.
How to save data to an excel spreadsheet with Python
How to save data to an excel spreadsheet with Python. Learn more about the simple method using the Pandas library.

Pickle

The Pickle format is one of the formats used for serializing Python objects.

This format is useful for storing Python objects as binary files and reading them back without losing their type.
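
To illustrate what keeping the type means, here is a minimal sketch with the standard pickle module. The record and the file name record.pkl are made up for illustration:

```python
import pickle
import tempfile
from datetime import date
from pathlib import Path

# A made-up object mixing types that JSON or CSV would not keep as-is.
record = {"ids": {1, 2, 3}, "point": (4.5, 6.0), "created": date(2024, 1, 31)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "record.pkl"

    # Pickle is a binary format, so the file is opened in binary mode.
    with path.open("wb") as f:
        pickle.dump(record, f)

    # Only unpickle files you trust: loading can execute arbitrary code.
    with path.open("rb") as f:
        loaded = pickle.load(f)

# The set, the tuple and the date come back with their original types.
print(type(loaded["ids"]), type(loaded["point"]), type(loaded["created"]))
```

The trade-off is that pickle files are Python-specific and should only be loaded from sources you trust.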

How to read a Pickle format with Pandas
How to read a Pickle format with Pandas. Learn how to use the .read_pickle() method to read a pickle format.
How to save a Pandas DataFrame in Pickle format
How to save a Pandas DataFrame in Pickle format. Learn how to use the .to_pickle() method.

HDF5

HDF5 is one of the formats used for Big Data and is efficient for storing matrices.

This is one of the most efficient ways to store big DataFrames that contain a lot of numbers. This format is ideal for Machine Learning datasets.
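
The following sketch stores a numeric DataFrame in HDF5 with Pandas. It assumes the optional PyTables package (installed as tables) is available, since Pandas depends on it for this format; the random data, the file name and the key "features" are made up:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Made-up numeric data standing in for a larger dataset.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((1000, 4)), columns=["a", "b", "c", "d"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "dataset.h5")

    # One HDF5 file can hold several objects; "key" names this one.
    # mode="w" creates the file, replacing any existing one.
    df.to_hdf(path, key="features", mode="w")

    loaded = pd.read_hdf(path, key="features")

print(loaded.shape)
```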

How to read hdf5 format with Pandas
How to read HDF5 format with Pandas. Learn how to use the .read_hdf() method to read an HDF5 format.
How to save a Pandas DataFrame in HDF5 format
How to save a Pandas DataFrame in HDF5 format. Learn how to use the .to_hdf() method.

Parquet

Parquet is another format that is popular for Big Data and for storing huge amounts of data.

Parquet is efficient at compressing data and is an open-source project originally built for the Hadoop ecosystem.

How to read a Parquet format with Pandas
How to read a Parquet format with Pandas. Learn how to use the .read_parquet() method to read a parquet format.
How to save a Pandas DataFrame in Parquet format
How to save a Pandas DataFrame in Parquet format. Learn how to use the .to_parquet() method.

There you have it! You now know the main file formats for storing data in Python.

More on DataFrames

If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic:

Pandas - The Python You Need
We gathered the only Python essentials that you will probably ever need.