These 3 tricks will make pandas more Memory Efficient


You should start using these 3 tricks right now and stop using unnecessary tools.

Many data analysis tasks are still performed on a laptop. This speeds up the analysis because your familiar work environment is already set up with all of your tools. But chances are your laptop is not “the latest beast” with x-GB of main memory.

Then a MemoryError surprises you!


What should you do? Use Dask? You have never worked with it, and tools like this usually have some quirks. Should you request a Spark cluster? Or is Spark overkill at this point?

Calm down… breathe.


Before you think about using another tool, ask yourself the following question:

Do I need all the rows and columns to perform the analysis?

Tip 1: Filter rows while reading

If you don’t need all rows, you can read the dataset in chunks and drop the unnecessary rows from each chunk to reduce memory usage:

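import pandas as pd
# Read 1000 rows at a time and keep only the rows where 'field' exceeds the threshold `constant`
iter_csv = pd.read_csv('dataset.csv', chunksize=1000)
df = pd.concat([chunk[chunk['field'] > constant] for chunk in iter_csv])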

Reading a dataset in chunks is slower than reading it all at once. I would recommend using this approach only with larger-than-memory datasets.
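To try the chunked filter without a real dataset, here is a minimal, self-contained sketch. The helper name read_filtered, the throwaway CSV, and the threshold value are assumptions made for illustration; only read_csv with chunksize and concat come from the snippet above.

```python
import os
import tempfile

import pandas as pd


def read_filtered(path, column, threshold, chunksize=1000):
    """Read a CSV in chunks and keep only rows where column > threshold."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    chunks = pd.read_csv(path, chunksize=chunksize)
    # Filter each chunk before concatenating, so the full file
    # is never held in memory at once
    kept = [chunk[chunk[column] > threshold] for chunk in chunks]
    return pd.concat(kept, ignore_index=True)


# Build a small throwaway CSV so the sketch runs on its own (hypothetical data)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.csv")
    sample = pd.DataFrame({"field": range(10), "other": list("abcdefghij")})
    sample.to_csv(path, index=False)
    filtered = read_filtered(path, "field", threshold=6, chunksize=3)

print(filtered)
```

Only the filtered rows of each chunk are kept, so peak memory depends on the chunk size and on how many rows pass the filter, not on the size of the file.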

Tip 2: Filter columns while reading

If you don’t need all columns, you can specify the required columns with the “usecols” argument when reading a dataset:

df = pd.read_csv('file.csv', usecols=['col1', 'col2'])

This approach generally speeds up reading and reduces memory consumption, so I would recommend using it with every dataset.
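One way to check the saving yourself is to compare memory_usage for a full read and a usecols read. This is a small sketch on made-up in-memory CSV text (the column names and values are assumptions), not a benchmark:

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for 'file.csv' (hypothetical data)
csv_text = "col1,col2,col3\n1,a,x\n2,b,y\n3,c,z\n"

# Read every column, then only the two columns the analysis needs
full = pd.read_csv(io.StringIO(csv_text))
subset = pd.read_csv(io.StringIO(csv_text), usecols=["col1", "col2"])

# deep=True also counts the memory held by string (object) values
full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()

print(list(subset.columns))
print(full_bytes, subset_bytes)
```

On a real dataset the difference grows with the number and width of the columns you skip.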
