Customer sales data can be extremely valuable for optimizing production scheduling. It is very important to have a good idea about two critical dimensions when it comes to customer sales: which customers make up the bulk of sales – the classic 80-20 rule, and what are the purchase patterns of these customers. This data can then be used to improve time series forecasting models which in turn will be used to more accurately plan production schedules.
The idea behind using customer segmentation to improve time series forecasting is built upon the following logic:
- for each of the major customers, understand the buying patterns
- if required, build a time series model for each major customer
- compare the characteristics of these individual time series to the aggregate model for the entire business: look for similarities in trend coefficients, seasonality indices and overall levels
- adjust the aggregate series as needed. For example, if one of the customers consistently is responsible for 60% of the sales , and if this customer has a seasonal spike in March, adjust the aggregate time series to account for this.
The first step in this process is to apply the 80-20 rule to our customer data and segment the top few customers who make up 80% or more of the sales. This type of customer segmentation will show that not all customers are alike and is an important step. In this article we will demonstrate at a high level how this can be done and what the output can look like. In a follow up article, we will show how the individual time series for these major customers look like and how to adjust the aggregate model based on this information. While the process steps shown here may not be exactly applicable to all circumstances, the general flow of the process will be similar no matter the state of the initial data.
Step 1: Export customer data from database
In our example, the data was available in an ERP database which allows us to extract the top few customers. This data must be available for at least the last 2 years in order to be compatible with time series forecasting. However data in this raw form does not exactly reveal what we are interested in and requires a little more processing. Here is the structure of the available data.
Step 2: De-pivot data to enable building bubble or pie charts
From the raw time series data for the top few customers, we need to create a spreadsheet that lists the customer name as an attribute so that we can use a tool such as Tableau to create nice visuals for bubble charts for each month in the time series. Using bubbles or pie charts we can visualize the distribution of sales by customer for each month during the course of our time series. To enable this, we need to reformat the data as shown below.
This requires us to “de-pivot” the raw data so that the customer can be added as a new attribute. This is easily done using an open source tool like RapidMiner, which has a Depivot operator. The depivot operator “is usually used when your ExampleSet has multiple attributes that measure the same characteristic (may be at different time intervals) and you want to merge these observations into a single attribute without loss of information“. This is what we need: our multiple attributes (in the raw data table) are the customer columns and we want to merge these columns into a single “CustomerName” attribute without losing any information. The process below with the parameter settings shwon will help us to do just that. Make sure you deselect the date attribute before sending the data to the depivot operator as shown below.
The output from this process will need one final tweak before we import it into Tableau for the segmentation views. Import the Rapidminer output into a spreadsheet and sort the newly created column “CompanyName”. Rename the values in this column according to your customer list and you are done.
Step 3: Visualize depivoted data to understand customer segmentation over time
When the final data set is imported into a BI tool such as Tableau, it is easy to create a variety of visuals which will help you understand how the segmentation changes over time. We show a couple of charts below and see that Customer 1 and Customer 2 are clearly the dominant ones for Product 2 and Product 1 sales. Furthermore, the series plots show us that customer 1 has a seasonal peak in March for product 2 and customer 2 has a seasonal peaks in June and August for product 1.
In a next article we will show how to use this segmented information to update our aggregate time series forecasts.
Originally posted on Tue, Sep 10, 2013 @ 10:12 AM