Optimizing pandas memory usage through the effective use of datatypes

Managing large datasets with pandas is a pretty common problem, and a number of libraries and tools have been developed to ease that pain. Take, for instance, the pydatatable library mentioned below.

Even so, there are a few tips and tricks that can help us manage memory in pandas to an extent. They might not offer the best solution, but they can prove handy at times, so there is no harm in getting to know them. I talked about two such alternative ways of loading large datasets in pandas in one of my previous articles.

These techniques are:

  • Chunking: subdividing datasets into smaller parts
  • Using SQL and pandas to read large data files
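As a quick refresher on the first technique, passing `chunksize` to `pd.read_csv` returns an iterator of DataFrames, so only one chunk has to fit in memory at a time. Here is a minimal sketch; the in-memory CSV buffer stands in for a large file on disk:

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a large file on disk.
csv_data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
buffer = io.StringIO(csv_data)

# chunksize makes read_csv yield DataFrames of at most 4 rows each,
# so only one chunk is held in memory at a time.
row_count = 0
for chunk in pd.read_csv(buffer, chunksize=4):
    row_count += len(chunk)  # process each chunk, e.g. aggregate

print(row_count)  # 10
```

With a real file you would pass the path instead of the buffer and do your aggregation inside the loop.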

This article is a sort of continuation of the above techniques. Hence, if you haven’t read the previous article, it’ll be a good idea to do so now 😃. In this article, we’ll cover ways to optimize memory use through the effective use of datatypes. But first, let’s get to know pandas datatypes in detail.
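To preview what "effective use of datatypes" means in practice, here is a minimal sketch (with made-up column names and data): downcasting numeric columns to smaller types and converting low-cardinality strings to the `category` dtype can shrink a DataFrame considerably.

```python
import numpy as np
import pandas as pd

# Toy DataFrame with pandas' default, memory-hungry dtypes.
df = pd.DataFrame({
    "id": np.arange(1_000, dtype=np.int64),               # 8 bytes per value
    "score": np.random.rand(1_000),                       # float64
    "city": np.random.choice(["NY", "LA", "SF"], 1_000),  # object (strings)
})

before = df.memory_usage(deep=True).sum()

# Downcast to the smallest type that holds the values,
# and use a categorical for the repetitive string column.
df["id"] = pd.to_numeric(df["id"], downcast="unsigned")    # -> uint16
df["score"] = pd.to_numeric(df["score"], downcast="float") # -> float32
df["city"] = df["city"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```

The exact savings depend on the data, but on this toy frame the footprint drops by well over half; the rest of the article looks at the datatypes behind these conversions.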

