Customer churn and its role in developing customer lifetime value models have been previously discussed. Churn rate, or its complement, retention rate, is one of the key factors that modifies the customer lifetime value formula to properly account for future cash flows.

In this article we will show how to predict customer churn, and hence retention rates, assuming we have the data in the right format. Getting the data into this format involves some amount of number crunching on the transactional data, which will not be covered here. A typical transactional dataset contains multiple rows for the same customer id, with columns such as date of purchase, a purchase flag (yes/no), purchase amount, and so on.

The input data for churn prediction, however, is somewhat unique. Each row contains the purchase amounts (or flags) for the last few transactions. For example, in the case of cell phone customer transactions, we could have a few hundred rows of data for one customer, where each row corresponds to the number of minutes used by the customer in prior weeks: the columns of a row would contain minutes used 4 weeks earlier, 3 weeks earlier, 2 weeks earlier, and the current week. A final column holds the label, or target value: a true/false flag indicating whether the customer is likely to churn next week. The input to our churn prediction model will be formatted as shown below.
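The reshaping described above can be sketched in a few lines of Python. The column names and the usage numbers here are hypothetical stand-ins, not the article's actual dataset; the point is the lagged-window structure of each row.

```python
import pandas as pd

# Hypothetical weekly minutes used by one customer, oldest week first.
minutes = [512, 480, 495, 470, 489, 460, 475, 450, 440]

# Build one row per week: minutes used 4, 3, and 2 weeks earlier plus the
# current week, with a churn label for the following week.
rows = []
for t in range(4, len(minutes)):
    rows.append({
        "wk_minus_4": minutes[t - 4],
        "wk_minus_3": minutes[t - 3],
        "wk_minus_2": minutes[t - 2],
        "current_wk": minutes[t],
        "churn_next_week": False,  # flipped to True once churn is observed
    })

df = pd.DataFrame(rows)
print(df)
```

Each successive row slides the window forward by one week, which is why a value that appears as "current week" in one row reappears as a prior week in the next.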

In the above case, row 16 could correspond to the current week, and we know for a fact that this customer (id = c_1) churned. Thus the label in the last row of the dataset is changed to "true". Note that what was the prior week in row 16 is the current week for row 15 (observe the "489" entry in both cells).

Once we have the data in this format, we can train almost any learning algorithm to predict customer churn. For example, Linear Regression yields a class recall of about 57% for the "true" cases; Logistic Regression does better, with a class recall of about 67% for the "true" cases. In either case, the class recall for the "false" cases is in the 90+% range, giving an overall accuracy of around 90%. The lower class recall for the "true" class is due to its smaller sample size. (Read how to address this data imbalance issue and get better performance.)
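The training-and-evaluation step can be sketched with scikit-learn on synthetic data (an assumption on our part; the article itself works in RapidMiner, and these recall numbers will not match the article's figures). The sketch deliberately makes the "true" class rare to illustrate why its recall lags.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, imbalanced churn data: roughly 10-15% "true" (churn) cases.
n = 2000
X = rng.normal(size=(n, 4))  # four lagged weekly-usage features
y = (X[:, -1] + 0.5 * X[:, 0] + rng.normal(size=n) < -1.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Per-class recall: the rare "true" class is recovered far less reliably
# than the abundant "false" class, mirroring the pattern described above.
rec_true = recall_score(y_te, pred, pos_label=1)
rec_false = recall_score(y_te, pred, pos_label=0)
print(f"recall (true):  {rec_true:.2f}")
print(f"recall (false): {rec_false:.2f}")
```

Swapping `LogisticRegression` for another estimator is a one-line change, which is the algorithm-comparison workflow the article describes.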

This is one of the unique advantages of a program like RapidMiner, which allows you to easily switch back and forth between various algorithms and choose the best fit for your data. Once you have a customer churn prediction model that you are happy with, the next step is to deploy it on "live" data. For example, every week we can select a cohort of customers and, based on their usage behavior, identify who is likely to churn the next week.
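Weekly scoring of a live cohort might look like the following sketch. The customer ids, the synthetic training data, and the 0.5 flagging threshold are all illustrative assumptions; the key point is that `predict_proba` returns both confidence(false) and confidence(true) for each customer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Fit a model on historical lagged-usage rows (synthetic stand-in data).
X_hist = rng.normal(size=(500, 4))
y_hist = (X_hist.sum(axis=1) < -2).astype(int)
clf = LogisticRegression().fit(X_hist, y_hist)

# "Live" scoring: this week's cohort, one row of lagged usage per customer.
cohort_ids = ["c_1", "c_2", "c_3"]
X_live = rng.normal(size=(3, 4))

# Columns follow clf.classes_, i.e. [conf(false), conf(true)] for 0/1 labels.
proba = clf.predict_proba(X_live)

for cid, (conf_false, conf_true) in zip(cohort_ids, proba):
    flag = "LIKELY TO CHURN" if conf_true > 0.5 else "ok"
    print(f"{cid}: conf(true)={conf_true:.2f} conf(false)={conf_false:.2f} {flag}")
```

The same scoring call can be rerun each week on the freshly assembled cohort table.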

The above example shows how this is done for one customer. The confidence(false) value can be plugged in as the retention rate in the customer lifetime value formula to complete the CLV calculation.
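As a worked illustration of that last step, here is one common simplified CLV formula, CLV = m · r / (1 + d − r), with the model's confidence(false) used as the retention rate r. This specific formula and the numbers below are our assumption for illustration; the article's whitepaper covers the actual components of its CLV formula.

```python
def clv(margin, retention, discount):
    """Simplified infinite-horizon CLV: m * r / (1 + d - r).

    margin    -- m, net margin per period (hypothetical)
    retention -- r, e.g. the model's confidence(false) for this customer
    discount  -- d, per-period discount rate (hypothetical)
    """
    return margin * retention / (1 + discount - retention)

# Example: conf(false) = 0.80 from the churn model used as retention rate.
print(clv(margin=100.0, retention=0.80, discount=0.10))
```

With a $100 per-period margin, 80% retention, and a 10% discount rate, this evaluates to roughly $266.67 of lifetime value.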

Download our free whitepaper on components of the CLV formula for more details.