Time series forecasting is one of the most basic predictive analytics needs of many businesses. Many data elements arrive as time series: product sales, shipping and transportation costs, commodity prices and so on. From a strategic perspective, managers and decision makers frequently need to predict trends and seasonal patterns for these elements.
As an example, a production manager may want to forecast sales for the next month in order to schedule labor, raw material orders and equipment availability to meet the estimated demand. Similarly, a transportation manager may want to forecast shipping costs for the next quarter in order to prepare a budget and present the department's estimated costs to the profit and loss team. As a third example, a purchasing manager may want to predict the future costs of regularly ordered commodities in order to help the sales teams quickly prepare price quotes for the products manufactured from these commodities. Time series forecasting is one of the “entry-level” predictive analytics processes that businesses need to develop confidence in implementing and deploying.
If your business is looking to use this valuable decision support system, there are a few process-related questions that a business manager (who may use an internal analytics team or hire an external consulting team) must be willing to ask. These questions are listed below in order.
1. Is the time series “forecastable”?
This is one of the first checks you need to run on your data. In many ways it is a data quality check. What makes a series “forecastable”? The simple answer is “if the time series data is not random”. A random walk is a series whose successive changes are unpredictable, so the best available forecast is simply the last observed value. Quite appropriately, there is something called a random walk test which you can apply to the data to confirm that your time series can indeed be forecasted.
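One common way to check for a random walk is a variance ratio: for a random walk, the variance of changes over q periods is about q times the variance of one-period changes, so the ratio is close to 1. The following is a minimal sketch of that idea, not necessarily the specific test your team will use. The function name `variance_ratio` and the simulated series are illustrative assumptions, and a formal test would also attach a standard error to the ratio.

```python
import math
import random
from statistics import pvariance


def variance_ratio(series, q):
    """Variance of q-step changes divided by q times the variance of 1-step changes."""
    one_step = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    q_step = [series[i + q] - series[i] for i in range(len(series) - q)]
    return pvariance(q_step) / (q * pvariance(one_step))


# Simulated data for illustration only (not real business data).
rng = random.Random(0)

# A pure random walk: each value is the previous value plus random noise.
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + rng.gauss(0, 1))

# A strongly seasonal series: a 12-period cycle plus a little noise.
seasonal = [10 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1)
            for t in range(5000)]

walk_ratio = variance_ratio(walk, 4)           # expected to be close to 1
seasonal_ratio = variance_ratio(seasonal, 12)  # expected to be far below 1
print(f"random walk ratio: {walk_ratio:.2f}")
print(f"seasonal ratio:    {seasonal_ratio:.2f}")
```

A ratio near 1 is consistent with a random walk (little to forecast), while a ratio far from 1 suggests structure that a forecasting model may be able to exploit.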
2. Is there sufficient good quality data for a time series analysis and forecasting?
If the data passes the random walk test (data quality), you must next make sure you have enough of it (data quantity) to extract the seasonal and trend patterns needed to make forecasts. Seasonality cannot be estimated without at least 2 years of good quality monthly data, if the seasons follow a yearly cycle. If you are dealing with daily data with hourly “seasonality”, then you need at least 2 days of hourly data, and so on. Furthermore, for practical time series forecasting, the data needs to be split into a training period and a validation period. The validation period is the most recent data, typically covering an entire seasonal cycle (1 year or 1 day, etc.), and the training period is the previous 2 seasonal cycles (2 years or 2 days). Thus a total of 3 years of monthly data (or 3 days of hourly data) is needed.
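The split described above can be sketched as follows. The function name `split_train_validation` and its parameters are hypothetical, and the choice to use all earlier observations for training when more than the minimum is available is an assumption of this sketch.

```python
def split_train_validation(values, season_length, training_cycles=2):
    """Hold out the most recent seasonal cycle for validation.

    Raises ValueError when there is not enough data for the requested
    number of training cycles plus one validation cycle.
    """
    needed = (training_cycles + 1) * season_length
    if len(values) < needed:
        raise ValueError(
            f"need at least {needed} observations, got {len(values)}"
        )
    validation = values[-season_length:]  # most recent full cycle
    training = values[:-season_length]    # everything before it (assumption)
    return training, validation


# Illustrative placeholder data: 36 monthly observations (3 years).
monthly_sales = list(range(36))
training, validation = split_train_validation(monthly_sales, season_length=12)
print(len(training), len(validation))
```

For hourly data with a daily cycle, the same call with `season_length=24` would require at least 72 observations.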
3. Does the seasonal pattern make sense? Does the trend component match intuitive understanding?
Data that tend to grow or shrink in magnitude, or whose values tend to repeat periodically, are called non-stationary. If your time series has a trend (rising or falling) and/or a seasonal pattern, then it is non-stationary and can (and should) be decomposed into those constituent parts to build a clearer understanding. Note that passing the random walk test does not by itself show which of these components are present, since a random walk is itself non-stationary; the decomposition is what reveals them. At this point, ask your analytics team to provide a decomposition of the series and to identify the seasonal peaks and lows along with any trend.
As a domain expert, you will know if the predicted sales peaks in January, April and October, for example, agree with your knowledge about your product. If the patterns do not make sense, you must reexamine the decomposition methods used.
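To make the idea of a decomposition concrete, here is a minimal sketch of a classical additive decomposition: the trend is estimated with a centered moving average, and the seasonal effect for each position in the cycle is the average of what remains after removing the trend. This is only one of several decomposition methods and may not be the one your team uses. The function name `seasonal_indices` and the example series are made up for illustration.

```python
from statistics import mean


def seasonal_indices(values, period):
    """Additive seasonal effect for each position in the cycle.

    Uses a centered moving average as the trend estimate and needs
    at least two full cycles of data.
    """
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    half = period // 2
    detrended = {}
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points.
            trend = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend = sum(window) / period
        detrended.setdefault(t % period, []).append(values[t] - trend)
    raw = [mean(detrended[pos]) for pos in range(period)]
    offset = mean(raw)
    return [r - offset for r in raw]  # centered so the effects sum to zero


# Made-up monthly series: a linear trend plus a fixed seasonal pattern,
# with peaks in January, April and October (index 0 is January).
pattern = [5, 0, -3, 4, -2, -4, -5, -3, 0, 6, 1, 1]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

indices = seasonal_indices(series, period=12)
peak_month = indices.index(max(indices))  # 9, i.e. October
print([round(x, 2) for x in indices], peak_month)
```

The resulting list is what a domain expert would review: do the months with the largest positive effects match the months when sales are known to peak?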
4. What are the MAD and MAPE values for the validation period?
As a last check to ensure you have good models, you need to verify performance metrics. While a number of performance indicators with a wide range of sophistication are available, these are the minimum two you should ask your team for: Mean Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). These should be reported for the validation period identified in check #2 earlier. What should the ranges of these indicators be? MAD depends on the business problem: if you are trying to forecast product volumes, then the units of MAD are the same as those of your product volume. MAPE is scale independent because it is a percentage, and it is usually easier to grasp from a business standpoint. As rules of thumb, MAPE values less than 10% are good and values less than 5% are excellent.
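Both metrics are simple averages over the validation period, as the sketch below shows. The function names and the three-point example are illustrative assumptions, and note that MAPE is undefined whenever an actual value is zero.

```python
def mean_absolute_deviation(actual, forecast):
    """Average absolute forecast error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """Average absolute error as a percentage of the actual value.

    Raises ZeroDivisionError if any actual value is zero.
    """
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical validation-period values (units sold).
actual = [100, 200, 400]
forecast = [110, 190, 400]

mad = mean_absolute_deviation(actual, forecast)          # about 6.67 units
mape = mean_absolute_percentage_error(actual, forecast)  # about 5 percent
print(f"MAD = {mad:.2f} units, MAPE = {mape:.1f}%")
```

In this made-up example the forecast misses by about 6.67 units on average, which is 5% of actual volume on average.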
Originally posted on Fri, May 30, 2014 @ 07:53 AM