BYOL: Bring Your Own Loss

Dear connoisseurs, I invite you to take a look inside Careem’s food delivery platform. Specifically, we are going to look at how we use machine learning to improve the customer experience for delivery time tracking.

Intro

When planning a meal, timing is crucial. This is why we take great care in estimating the delivery time of our orders. Delivery time, however, depends on several complex factors, which is why machine learning is the right tool for predicting the ETA.

Careem’s Food Delivery platform interface for delivery time estimation

At first glance, this is nothing but a typical regression problem: collect some features, train a reasonable model on historical delivery times to minimize RMSE, estimate the expected decrease in average error with a suitable cross-validation strategy, share it with leadership, deploy, announce it broadly, and gain respect, trust, promotion…

In this post, I’m going to try to explain what is wrong with this approach. I will describe our solution to the problem and how we measured its impact on users. I will then show you how we built a custom loss function that optimizes more directly for customers’ satisfaction with their orders.

What is the problem

Training your model against RMSE.
RMSE, MAE, Huber and similar losses are the typical choices for most regression problems. But do any of these losses reflect how our customers’ stomachs feel? The biggest problem is that they are symmetric: they don’t distinguish between an order delivered 20 minutes early and one delivered 20 minutes late. My gut can definitely tell the difference… It would be nice if our model showed a little more empathy.
Evaluating success via average error.
Here we go again. Imagine your team has spent a month setting up real-time traffic data streaming, with all its costly infrastructure, to get features that would decrease the error, and now the average error has finally dropped from 3 minutes to 2.5 minutes! A reduction of roughly 17%, what a result! It was definitely worth it… Was it? Hard to say; it depends a lot on the particular problem. For us, and for many other applications, there is a good chance that customers won’t even notice the change.
Gaining respect, trust, promotion.
This one is quite simple! Work for fun, not for a promotion!
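To make the symmetry problem from the first point concrete, here is a minimal Python sketch comparing plain squared error with a hypothetical asymmetric variant. The function names, the 40-minute ETA and the `late_weight=3.0` penalty are illustrative assumptions, not the loss Careem actually uses; the sketch only shows that a symmetric loss scores a 20-minute-early and a 20-minute-late delivery identically, while an asymmetric one can tell them apart.

```python
def squared_error(actual_min: float, predicted_min: float) -> float:
    """Symmetric loss: only the size of the miss matters, not its direction."""
    return (actual_min - predicted_min) ** 2


def asymmetric_squared_error(
    actual_min: float, predicted_min: float, late_weight: float = 3.0
) -> float:
    """Hypothetical asymmetric loss: a late delivery (actual > predicted) is
    penalized late_weight times more than an equally large early delivery.
    The default of 3.0 is an arbitrary illustrative choice."""
    residual = actual_min - predicted_min  # positive = order arrived late
    weight = late_weight if residual > 0 else 1.0
    return weight * residual ** 2


predicted = 40.0      # ETA shown to the customer, in minutes (made up)
arrived_early = 20.0  # order arrives 20 minutes early
arrived_late = 60.0   # order arrives 20 minutes late

print(squared_error(arrived_early, predicted), squared_error(arrived_late, predicted))
# 400.0 400.0  -> early and late look identical
print(asymmetric_squared_error(arrived_early, predicted),
      asymmetric_squared_error(arrived_late, predicted))
# 400.0 1200.0 -> lateness now costs more
```

A multiplicative weight on the late side is only one simple way to break the symmetry; how heavily to penalize lateness is a product decision that this sketch does not settle.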
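The second point can be illustrated the same way. The Python sketch below uses made-up toy errors (actual minus predicted delivery time, in minutes) chosen so that the mean absolute error falls from 3.0 to 2.5 minutes, as in the example above, and adds a hypothetical 5-minute threshold for a miss a customer would actually notice. The helper names, the toy data and the threshold are assumptions for illustration only.

```python
def mean_absolute_error(errors_min: list[float]) -> float:
    """Average absolute miss in minutes; the sign (early/late) is ignored."""
    return sum(abs(e) for e in errors_min) / len(errors_min)


def share_noticeable(errors_min: list[float], threshold_min: float = 5.0) -> float:
    """Fraction of orders whose miss exceeds a hypothetical 'noticeable' threshold."""
    return sum(1 for e in errors_min if abs(e) > threshold_min) / len(errors_min)


# Toy, made-up errors: actual - predicted, in minutes (positive = late)
old_errors = [1.0, -1.0, 2.0, -2.0, 9.0]
new_errors = [0.5, -0.5, 1.0, -1.0, 9.5]

old_mae = mean_absolute_error(old_errors)
new_mae = mean_absolute_error(new_errors)
relative_reduction = (old_mae - new_mae) / old_mae

print(f"MAE: {old_mae:.2f} -> {new_mae:.2f} ({relative_reduction:.1%} lower)")
# MAE: 3.00 -> 2.50 (16.7% lower)
print(f"Noticeable misses: {share_noticeable(old_errors):.0%} -> "
      f"{share_noticeable(new_errors):.0%}")
# Noticeable misses: 20% -> 20%
```

Average error drops by about 17%, yet the share of orders with a noticeable miss stays at 20%: the one badly late order is still badly late (slightly later, in fact), and that is the order a customer would remember.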

towardsdatascience.com