A few weeks ago I had a chance to interview an amazing person and a total rockstar when it comes to modeling and understanding customer data.

We’ll talk about:

  • How to build models that are robust to change
  • How to become a leader in a technical organization
  • How to focus on the “right” questions
  • Why model ensembling can be more important in real life than in competitions
  • and way more!

Check out the full video of our conversation:

What follows is not a 1–1 transcript but rather a cleaned-up, structured, and rephrased version of it.

You can watch the video to get the raw content.

What is your role right now?

I am a director of customer analytics at deepsense.ai and we do projects related to forecasting in the area of customer behavior.

For example, we do things like:

  • whether a customer will repay their credit or not
  • what kind of product we should recommend to a client
  • what a customer's lifetime value is, to see if it makes sense to reach out to them or not

Things that are mostly related to real customer data.

The data that we are working with typically takes the shape of events and is collected online. Think data from e-commerce websites: you have events, out of these events you do feature engineering, and then you build some models.

How is your team structured?

I think that typically the most efficient size for a machine learning or artificial intelligence team is between 5 and 10 people per project. With 5 to 10 people, you can easily tackle most of the important aspects of an AI project. With more people, it becomes difficult to manage; there is a lot of communication overhead.

Since we actually have many projects going on, we need to scale that team size. That is also true for larger companies where ML modeling is at the core of their business. From my experience, their (core) modeling team is around 5 to 10 people. So 5 to 10 people for a single task should always be enough to solve it.

5 to 10 people for a single task should always be enough to solve it

What is your current project?

Right now we are working on projects for a global ad network. This project is very difficult in terms of deployment and production needs because we need to deliver 5 million point forecasts per second, within 100 milliseconds of latency.

It’s a very difficult project because we need to create those forecasts for every customer and every product that they want to advertise via our clients’ network. Multiply the number of customers by the number of products and clients, and the number of predictions our model needs to generate is just huge. Forecasts need to be delivered every second, it is online, so yeah, it is a tough one.

This project has actually taught me a lot about running a large-scale forecasting system in production. We made a couple of unintuitive moves that improved the performance a lot. For example:

  • we went from one big LightGBM model to 20 small LightGBM models
  • we used both large (500 trees predicting 20 classes) and small (50 trees predicting 2 classes) models

And those models differ basically by the data that was used to train them or by a different random seed.

The problem that we are facing requires us to stabilize the results, and averaging predictions across 20 models is much better than taking one prediction from one big model.

When you are averaging predictions from 20 models, what you can also do is look at the standard deviations of those predictions. So, if all models vote positively for a given product, it means you should show it to the customer. If, say, five models vote positively and 15 models vote negatively, then you have a problem. The good thing is that we can take advantage of model ensembling.

I know that forecasting on large datasets where you need low latency doesn’t seem like a typical scenario for model ensembling, but that’s why I’m saying it is not intuitive. In such a production scenario, you would typically expect the model to be as simple as possible.

I think that in a production setting, model ensembling is much more important than, for example, in competitions. Typically people claim that you only use those monster ensembles to win competitions, but you will never use them in real life. I say that in real life it is much more important than in competitions, for a very simple reason: you get a lot of non-stationarity.

in real life model ensembling is much more important than in competitions, for a very simple reason: you get a lot of non-stationarity

Non-stationarity is something common in real life, which you don’t experience in competitions. And in non-stationary problems, having a diverse group of models helps a lot.

We were actually spending a couple hundred thousand dollars on Amazon every month, but we decided to move from one model to 20 models because the performance improvement was so big.

#deep-learning #machine-learning #data-science #data-analysis

Interview with a Head of AI: Pawel Godula