Time Series Forecasting: choosing between data-driven and model-based methods
In time series forecasting and analysis, we are concerned with forecasting a specific variable, given that we know how this variable has changed over time in the past. In all other predictive models, the time component of the data is either ignored or absent. Such data are known as "cross-sectional" data.
Also, we may not be interested in (or might not even have) data for other attributes which could potentially influence this target variable. In other words, independent or predictor variables are not required for time series forecasting!
Approaches that forecast a variable from its own history alone are called data-driven forecasting methods, where there is no difference between a predictor and a target. The predictor is also the target! Techniques such as averaging or smoothing are considered data-driven approaches to time series forecasting. The most commonly employed data-driven time series forecasting methods are exponential smoothing and Holt-Winters methods.
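To make the "predictor is also the target" idea concrete, here is a minimal sketch of simple exponential smoothing in plain Python. The function name, the sample values and the choice of alpha = 0.5 are illustrative assumptions, not taken from the datasets discussed in this article. This basic form tracks only the level of the series; Holt-Winters methods extend the same recursion with trend and seasonal terms.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The final level serves as the one-step-ahead forecast.
    alpha close to 1 reacts quickly to new data; close to 0 smooths heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]          # initialise the level with the first observation
    levels = [level]
    for y in series[1:]:
        # New level = weighted blend of the latest observation and the old level
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical history, for illustration only
history = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(history, alpha=0.5)
forecast = levels[-1]
print(levels)    # [10.0, 11.0, 11.0, 12.0]
print(forecast)  # 12.0
```

Note that the only input is the past of the series itself: no other attribute is needed to produce the forecast.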
However, there is another class of time series forecasting techniques, known as model-based forecasting methods. Model-based techniques are similar to “conventional” predictive models which have independent and dependent variables, but with a twist: the independent variable is now time. The simplest of such methods is of course linear regression with time as the independent variable. Given a training set, we estimate the values of the regression coefficients and use them to forecast future values of the target variable. Model-based techniques can get pretty complicated when it comes to selecting the type of function. Commonly used functions are exponential, polynomial and power law functions.
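As a rough illustration of the simplest model-based case, the sketch below fits a straight line to a series by ordinary least squares, using the time index 0, 1, 2, ... as the independent variable, and then extrapolates it. The function names and the sample values are assumptions made for this example.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    # Closed-form simple linear regression coefficients
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_linear(intercept, slope, start, steps):
    """Extrapolate the fitted line for `steps` periods beginning at time index `start`."""
    return [intercept + slope * (start + k) for k in range(steps)]


# Hypothetical training data, for illustration only
training = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_linear_trend(training)
future = forecast_linear(intercept, slope, start=len(training), steps=2)
print(intercept, slope)  # 3.0 2.0
print(future)            # [11.0, 13.0]
```

Swapping the straight line for an exponential, polynomial or power law function changes the fitting step but not the overall structure: estimate coefficients on the training set, then evaluate the function at future time values.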
Given this basic background, the natural question to ask is: how do we decide which type works best for a given time series dataset?
The first dataset shown above consists of several quarters of revenue for a business. As you can see, the data follows a predictable pattern of seasonal highs and lows. Additionally, there is a distinct upward trend. Such a dataset can be easily decomposed into trend, seasonal and random components. Data of this type are best suited for data-driven forecasting methods. As seen in the overlay below, exponential smoothing does an excellent job of capturing the trend and seasonality.
Forecasts made using data-driven approaches are more credible when the underlying data has distinct patterns in trend and seasonality.
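The decomposition into trend, seasonal and random components mentioned above can be sketched with a classical additive decomposition. The version below is a simplified illustration that assumes an even seasonal period (4 for quarterly data) and an additive structure; the synthetic series and all function names are assumptions for this example, not the revenue data from the article.

```python
def centered_moving_average(series, period):
    """Trend estimate via a 2 x period moving average (assumes an even period).

    Positions where the window does not fit are left as None.
    """
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        # The two end points get half weight so the window spans exactly one cycle
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[i] = total / period
    return trend


def seasonal_indices(series, trend, period):
    """Average detrended value for each position in the cycle, centred on zero."""
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    return [r - mean for r in raw]


# Synthetic quarterly series: upward trend plus a repeating seasonal pattern
pattern = [2.0, -1.0, -3.0, 2.0]
series = [10.0 + i + pattern[i % 4] for i in range(12)]

trend = centered_moving_average(series, period=4)
seasonal = seasonal_indices(series, trend, period=4)
# Whatever is left after removing trend and seasonality is the random component
remainder = [
    y - t - seasonal[i % 4]
    for i, (y, t) in enumerate(zip(series, trend))
    if t is not None
]
```

When the remainder is small relative to the trend and seasonal parts, as it is for this synthetic series, the data has the kind of distinct structure that smoothing methods capture well.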
On the other hand, consider the data shown below, which has no discernible seasonality and an oscillating "trend".
Blindly applying data-driven smoothing techniques will produce a highly questionable fit and, as shown below, the resulting forecasts do not appear believable.
In fact, the mean absolute percentage error (MAPE) for the above example is 711%! There are dozens of data-driven techniques that can be applied, and possibly some of them may produce a better MAPE than the one shown here. But the main point of this article is that when you have highly variable data with weak or non-existent seasonal patterns, it is better to use model-based learners to capture trends over short forecasting horizons. For example, when we use a polynomial regression or support vector machine based learner (a tool such as RapidMiner allows you to select scores of different learners), we can capture trends very accurately. The second dataset (WHI_combined) was split into a training and a testing set, with the last six data points used as test examples. As seen from the comparison below, the model-based learner still has a high MAPE at 46%, but short-term trends are replicated with great accuracy.
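For readers who want to reproduce this kind of evaluation, the sketch below shows one way to hold out the last six points and compute MAPE on them. The series and the "predicted" values are made-up placeholders standing in for the output of whichever learner is used; they are not the WHI_combined data, and the function names are assumptions for this example.

```python
def split_last_n(series, n_test=6):
    """Chronological split: everything except the last n_test points is training data."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, and it can become very large
    when actual values are close to zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Placeholder series, for illustration only
series = [120.0, 80.0, 150.0, 60.0, 130.0, 90.0,
          100.0, 200.0, 50.0, 125.0, 80.0, 160.0]
train, test = split_last_n(series, n_test=6)

# Placeholder forecasts; in practice these come from a model fitted on `train` only
predicted = [110.0, 180.0, 55.0, 100.0, 88.0, 144.0]
score = mape(test, predicted)
print(f"MAPE on the held-out points: {score:.1f}%")
```

Splitting chronologically rather than at random matters here: the test points must lie after the training points, otherwise the model is evaluated on periods it has effectively already seen.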
This type of trend prediction is quite valuable to many small businesses which deal with highly variable data. The data above could be the demand forecast trend for the next six months. While we may not be able to predict the exact sales volumes with great accuracy, at least knowing that the demand will spike in March and drop in May is much better than not having any idea where the sales might trend.
Typically, several different products are manufactured using the same limited resources (labor and machinery). If similar forecasts are made for all the other products, and another product turns out to have a demand trend which is the mirror image (drop in March and spike in May), a general manager may be able to use this information to balance production schedules. Coupled with an easy-to-understand dashboard, this type of predictive analytics can become quite valuable.
All of these activities can be performed affordably today, and even small manufacturers can benefit from this type of advanced analytics. How can you implement it in your business?
- Read our free cost modeling and forecasting whitepaper to get a better understanding.
- Contact us to set up a custom dashboard similar to the one shown below.