400x Faster Pandas Data Frame Iteration

Avoid using iterrows() function

Data processing and data wrangling are important components of a data science model development pipeline. A data scientist is often said to spend about 80% of their time preparing the dataset to make it fit for modeling. Data wrangling and exploration on a large dataset can become tedious, leaving one to either wait a long time for the computations to complete or shift to some form of parallel processing.

Pandas is one of the most popular Python libraries and has a vast API, but it scales poorly. For large datasets, it can take a long time, sometimes even hours, just to iterate over the rows, and even for small datasets, iterating over the data frame using standard loops is quite time-consuming.

In this article, we will discuss techniques to speed up iteration over large datasets.
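As a rough illustration of the kind of alternatives to iterrows() that are commonly used, the sketch below computes the same quantity three ways: with iterrows(), with itertuples(), and with a vectorized column operation. The data frame, the column names "a" and "b", and the function names are hypothetical and chosen only for this example. They are not taken from the article, and the actual speed-up will depend on the data and the hardware.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names "a" and "b" are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, size=1_000),
    "b": rng.integers(0, 100, size=1_000),
})


def total_iterrows(frame):
    """Sum of a*b using iterrows(): builds a Series for every row."""
    total = 0
    for _, row in frame.iterrows():
        total += row["a"] * row["b"]
    return total


def total_itertuples(frame):
    """Sum of a*b using itertuples(): yields lightweight namedtuples."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.a * row.b
    return total


def total_vectorized(frame):
    """Sum of a*b using whole-column arithmetic, with no Python-level loop."""
    return (frame["a"] * frame["b"]).sum()


slow = total_iterrows(df)
medium = total_itertuples(df)
fast = total_vectorized(df)
```

All three functions return the same total, so they can be swapped freely. To compare their running times on your own data, wrap each call with the standard library's timeit module.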

(Image by Author), Time comparison of approaches for iterating over the data frame



towardsdatascience.com
