Estimating Customer Lifetime Value via Cohort Retention

This is Part I of a two-part series on estimating customer lifetime value. In this post, I describe how to estimate LTV on a conceptual level, to explain what we’re going to be doing in Part II with the Python code.

First of all, why LTV? There are two reasons: creating a benchmark for customer acquisition costs (CAC), and comparing customers, e.g. checking whether we’re targeting those who spend more or less than the average customer.

Many sources talk about using churn or retention to estimate customer lifetime value (LTV), and while the core idea remains the same, approaches to its calculation differ dramatically. So, while any analyst may benefit from reading this article, its primary objective is to explain how historical retention data can be used to estimate LTV for customers. We are not going to use statistical techniques to estimate churn and build our predictions. Instead, we will make use of historical retention, which is an easier place to start.

Why retention? The issue with customer lifetime value is the customer lifetime. For a subscription-based service, a reasonable estimate of the value a customer brings in each period is recurring revenue (RR), i.e. the amount the customer pays for a subscription. If your customers can skip a period, however, do not forget to adjust for that (estimate the average % of skipped periods).
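To make the skip adjustment concrete, here is a minimal sketch. It assumes the simplest possible adjustment, scaling the subscription price by the share of periods that are not skipped; the function name and the numbers are made up for illustration.

```python
def adjusted_recurring_revenue(price_per_period: float, skip_rate: float) -> float:
    """Expected revenue per customer per period, after accounting for skips.

    skip_rate is the average share of periods a customer skips (0.0 to 1.0).
    This is an assumed, simplified adjustment, not a prescribed formula.
    """
    if not 0.0 <= skip_rate <= 1.0:
        raise ValueError("skip_rate must be between 0 and 1")
    return price_per_period * (1.0 - skip_rate)


# Hypothetical numbers: a 40.00 subscription where 15% of periods are skipped.
expected_rr = adjusted_recurring_revenue(40.0, 0.15)
print(round(expected_rr, 2))  # 34.0
```

If your data lets you estimate skips per cohort or per plan, the same adjustment can be applied at that level instead of using one overall average.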

What we do not know is how long a new customer will stay with the business, so we try to make an educated guess based on customers acquired earlier. It is often suggested that we calculate lifetime as an overall metric for the whole customer base, which gives a confusing average: it mixes customers who could, at least potentially, have spent years with the business and customers who joined last week or yesterday. At the same time, while older cohorts are good for analysis, we’d like our metrics to be actionable and, hence, to make estimates for younger cohorts. A retention matrix, or a retention curve, visually represents how many of the acquired customers stayed with the business and continued to generate revenue. It is based on actual data, so you can start identifying patterns and approximate those for newer customers. So, how?

_Cohorts and retention matrix_

Because customers join the business at different times, there should be a way to “normalize” their retention. A simple example: 10% of the customers who joined a year ago are still with the business, while 90% of last month’s customers are still with us. By no means does this imply that customers who joined last month are better (or worse) than last year’s customers. They have simply had less time to show how “sticky” or valuable your business is for them.

For this reason, we can (and should) split customers into cohorts (groups) based on the time they joined. Normally, cohorts and their retention are analyzed by looking at a retention matrix or, similarly, a retention curve. In the matrix below, each square represents the proportion of originally acquired users that moved on (re-ordered, re-subscribed) to the next month. For simplicity, I colour-coded them, as also shown below.
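Separately from the colour-coded matrix, here is a rough sketch of how such a matrix could be computed from raw activity data. It is not the code from Part II. It assumes a hypothetical order log of (customer_id, month_index) pairs, treats a customer's first active month as their cohort, and uses only the standard library.

```python
from collections import defaultdict


def retention_matrix(orders):
    """Return {cohort: [share active in month 0, month 1, ...]}.

    orders is a list of (customer_id, month_index) pairs, where month_index
    is an integer month counter (e.g. 0 = January, 1 = February).
    Each share is relative to the cohort's original size.
    """
    # A customer's cohort is the first month in which they appear.
    first_month = {}
    for customer, month in orders:
        first_month[customer] = min(month, first_month.get(customer, month))

    # Distinct active customers per (cohort, months since joining).
    active = defaultdict(set)
    for customer, month in orders:
        cohort = first_month[customer]
        active[(cohort, month - cohort)].add(customer)

    last_month = max(month for _, month in orders)
    matrix = {}
    for cohort in sorted(set(first_month.values())):
        size = len(active[(cohort, 0)])
        # Only report ages the cohort has had time to reach.
        matrix[cohort] = [
            len(active[(cohort, age)]) / size
            for age in range(last_month - cohort + 1)
        ]
    return matrix


# Hypothetical order log: (customer_id, month_index)
orders = [
    ("a", 0), ("b", 0), ("c", 0), ("d", 0),
    ("a", 1), ("b", 1), ("c", 1),
    ("a", 2),
    ("e", 1), ("f", 1),
    ("e", 2),
]
for cohort, row in retention_matrix(orders).items():
    print(cohort, [f"{share:.0%}" for share in row])
```

With this made-up data, the cohort that joined in month 0 retains 100%, 75% and 25% of its customers over its first three months, while the month 1 cohort shows 100% and 50%. The younger cohort has a shorter row simply because it has had less time, which is exactly why cohorts should be compared at the same age.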

