6 Ways of Applying a Function to Pandas DataFrame Rows

Applying a function to rows of a Pandas DataFrame is one of the most common operations during data wrangling. There are many ways of doing it. In this article, you will measure the performance of 6 common alternatives. With a companion Colab, you can do it all in your browser. No need to install anything on your machine.

Problem

Recently, I was analyzing user behavior data for an e-commerce app. Depending on the number of times a user did text and voice searches, I assigned each user to one of four cohorts:

  • No Search: Users who did no search at all
  • Text Only: Users who did text searches only
  • Voice Only: Users who did voice searches only
  • Both: Users who did both text and voice search
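As a rough illustration of that cohort rule, here is a minimal sketch in plain Python. The function name search_cohort and the two count arguments are hypothetical; the original dataset and its column names are not shown in this article.

```python
def search_cohort(text_searches: int, voice_searches: int) -> str:
    """Map a user's search counts to one of the four cohorts listed above."""
    if text_searches > 0 and voice_searches > 0:
        return "Both"
    if text_searches > 0:
        return "Text Only"
    if voice_searches > 0:
        return "Voice Only"
    return "No Search"


# Made-up (text, voice) search counts, one pair per cohort.
examples = [(0, 0), (5, 0), (0, 2), (3, 1)]
cohorts = [search_cohort(text, voice) for text, voice in examples]
print(cohorts)
```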

It was a huge data set, with 100k to a million users depending on the chosen time slice. Computing the cohorts with the Pandas apply() function was excruciatingly slow, so I evaluated alternatives. This article distills the lessons from that exercise.

I can’t share that dataset, so I am using a similar problem to show the solutions: the Eisenhower Method.

The Eisenhower Method: Tasks put into 4 bins depending on importance and urgency. (Image by Author)

Based on a task’s importance and urgency, the Eisenhower Method assigns it to one of 4 bins. Each bin has an associated action:

  • Important and Urgent: Do right away
  • Important but not Urgent: Schedule for later
  • Not Important but Urgent: Delegate to someone else
  • Neither Important nor Urgent: Delete time wasters

We will use the boolean matrix shown in the figure. The importance and urgency booleans together form a two-bit binary integer for each action, with importance as the high bit and urgency as the low bit: DO(3), SCHEDULE(2), DELEGATE(1), DELETE(0).
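To make the encoding concrete, here is a minimal sketch of the mapping applied row by row with apply(). The names Action, eisenhower_action, is_important and is_urgent are assumptions chosen for illustration; the companion Colab may name them differently.

```python
from enum import IntEnum

import pandas as pd


class Action(IntEnum):
    """Integer codes for the four Eisenhower actions."""
    DELETE = 0
    DELEGATE = 1
    SCHEDULE = 2
    DO = 3


def eisenhower_action(is_important: bool, is_urgent: bool) -> int:
    """Importance is the high bit, urgency is the low bit."""
    return 2 * int(is_important) + int(is_urgent)


# One illustrative task per bin.
tasks = pd.DataFrame({
    "is_important": [True, True, False, False],
    "is_urgent": [True, False, True, False],
})

# axis=1 passes each row to the function as a Series.
tasks["action"] = tasks.apply(
    lambda row: eisenhower_action(row["is_important"], row["is_urgent"]),
    axis=1,
)
action_names = [Action(int(code)).name for code in tasks["action"]]
print(tasks.assign(action_name=action_names))
```

Encoding the two booleans as bits keeps the mapping free of branches, which matters once the same rule has to run over many rows.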

We will profile the performance of mapping tasks to one of the actions. We will measure which of the 6 alternatives takes the least amount of time. And we will plot the performance for up to a million tasks.
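A timing harness for this kind of measurement could look like the sketch below. It times only the row-wise apply() baseline on randomly generated tasks, using the standard timeit module. The helper names make_tasks and with_apply, the column names, and the small row counts are assumptions made to keep the example quick to run; it is not the measurement code from the companion Colab.

```python
import timeit

import numpy as np
import pandas as pd


def make_tasks(n: int, seed: int = 0) -> pd.DataFrame:
    """Build n synthetic tasks with random importance and urgency flags."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "is_important": rng.integers(0, 2, size=n).astype(bool),
        "is_urgent": rng.integers(0, 2, size=n).astype(bool),
    })


def with_apply(df: pd.DataFrame) -> pd.Series:
    """Row-wise baseline: compute the action code for every task."""
    return df.apply(
        lambda row: 2 * int(row["is_important"]) + int(row["is_urgent"]),
        axis=1,
    )


timings = {}
for n in (1_000, 10_000):
    df = make_tasks(n)
    # Take the best of 3 single runs to reduce noise from other processes.
    timings[n] = min(timeit.repeat(lambda: with_apply(df), number=1, repeat=3))

for n, seconds in timings.items():
    print(f"{n:>7,} rows: {seconds:.4f} s")
```

Raising the row counts toward a million gives the data points for a plot, at the cost of a longer run.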

Now is a good time to open the companion Colab. If you want to see the code in action, you can execute the cells in the Colab as you read along. Go ahead and execute all the cells in the Setup section.
