Time Series Forecasting: from windowing to predicting in RapidMiner
Previously we discussed the difference between RapidMiner and traditional tools for time series forecasting. There are two types of approaches to time series analysis: data-driven and model-driven. RapidMiner follows neither of these and instead takes a sort of "hybrid" approach. In this article we explain the mechanics of using RapidMiner for a practical time series forecast.
Central to time series forecasting in RapidMiner is, of course, the concept of windowing. Windowing lets you take any time series and transform it into a "cross-sectional" format. What does this mean? The graphic below illustrates the idea. In that example, we used a window size of 6, a step size of 1 and a horizon of 1: each row of the windowed data holds six consecutive values as attributes, and the value that follows them serves as the label.
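A minimal Python sketch of this windowing, assuming that a horizon of 1 means the label is the value immediately after the window and that the step size is the distance between the starting points of consecutive rows. The function name `window_series` is hypothetical; it is not RapidMiner's Windowing operator, it only mirrors the three parameters described above.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def window_series(series: Sequence[T], size: int = 6, step: int = 1,
                  horizon: int = 1) -> List[Tuple[List[T], T]]:
    """Turn a series into (features, label) rows.

    Each row holds `size` consecutive values as features; the label is the
    value `horizon` positions after the last feature. Consecutive rows start
    `step` positions apart.
    """
    rows = []
    last_start = len(series) - size - horizon
    for start in range(0, last_start + 1, step):
        features = list(series[start:start + size])
        label = series[start + size + horizon - 1]
        rows.append((features, label))
    return rows

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for features, label in window_series(months):
    print(features, "->", label)
# The last line printed is: ['Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov'] -> Dec
```

Twelve months with a window of 6 and a horizon of 1 yield six rows, the last of which uses June through November to predict December.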
Once we have "windowed" a series, as shown above, we have essentially converted time values into cross-sectional attributes, and we can apply any predictive modeling algorithm to predict future values. Referring back to the above example, we can use the data from June, July, August, September, October and November to "predict" the December value using our algorithm of choice (linear regression, a neural network, a support vector machine or any other relevant learner).
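To make the "any algorithm" point concrete, here is a sketch that fits ordinary least squares (linear regression, the first option listed above) to windowed rows with NumPy and then predicts the next value. The series is a hypothetical, perfectly linear toy example, and the helper names are ours, not RapidMiner's.

```python
import numpy as np

def windowed_matrix(series, size=6):
    """Windowed rows with step 1 and horizon 1, as a feature matrix and labels."""
    X = np.array([series[i:i + size] for i in range(len(series) - size)])
    y = np.array(series[size:])
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept column.

    Lagged values of a smooth series can be (here exactly) collinear, so
    singular values below 1e-10 of the largest are treated as zero and the
    minimum-norm solution is returned.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=1e-10)
    return coef

def predict_linear(coef, features):
    """Apply the fitted intercept and weights to one window of values."""
    return float(coef[0] + np.dot(coef[1:], features))

# Hypothetical monthly values, Jan..Dec, following an exact linear trend.
series = [10.0 + 2.0 * t for t in range(12)]
X, y = windowed_matrix(series)
coef = fit_linear(X, y)

# The last six known values (Jul..Dec) predict January.
next_value = predict_linear(coef, series[-6:])
print(round(next_value, 6))  # 34.0 for this toy trend
```

Swapping `fit_linear` for a neural network or support vector machine would change only the model, not the windowed data it is trained on.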
However, if our original time series ended in December and we wanted to forecast the next 6 months, what can we do? Keep in mind that so far the windowing process only lets us predict the next unknown value, January. We do this by advancing the window one step and using the values from July to December to predict January. But beyond that? We can use the predicted January value in a new window containing the six values for August, September, October, November, December and January (predicted) to forecast February. Likewise, there is no reason why we cannot use the now-predicted February value in a new six-value window that contains September through December plus the predicted January and February values to forecast March. And so on. This is shown below. As you can see, each new forecast replaces one more actual value with a predicted one: by the sixth forecast (June), five of the six inputs are previously predicted values, and from the seventh forecast onward all of them are.
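The feedback loop can be sketched in Python as follows. The function `recursive_forecast` and the input values are hypothetical, and `mean` is a deliberately trivial stand-in for whatever model was trained on the windowed data, so that the bookkeeping is easy to follow. Each value in the window carries a flag recording whether it was predicted, which makes it visible how the window fills up with forecasts.

```python
from collections import deque
from statistics import mean
from typing import Callable, List, Sequence, Tuple

def recursive_forecast(history: Sequence[float], steps: int,
                       model: Callable[[List[float]], float],
                       size: int = 6) -> List[Tuple[float, int]]:
    """Forecast `steps` values, feeding each prediction back into the window.

    Returns (forecast, number of predicted values among the inputs) per step.
    """
    # Each window entry is (value, was_predicted); start with actual values.
    window = deque(((v, False) for v in history[-size:]), maxlen=size)
    results = []
    for _ in range(steps):
        inputs = [v for v, _ in window]
        n_predicted = sum(1 for _, was_predicted in window if was_predicted)
        forecast = model(inputs)
        results.append((forecast, n_predicted))
        # Appending to a full deque with maxlen drops the oldest entry.
        window.append((forecast, True))
    return results

# Hypothetical actual values for Jan..Dec.
actual = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0, 21.0, 20.0]
# Toy stand-in for a trained model: predict the mean of the window.
results = recursive_forecast(actual, steps=7, model=mean)
for month, (value, n_pred) in zip(["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"], results):
    print(f"{month}: {value:.2f} ({n_pred} of 6 inputs predicted)")
```

A `deque` with `maxlen` does the "slide the window" step automatically, which is exactly the bookkeeping that has to be done by hand in a RapidMiner process.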
Implementing this in RapidMiner is not straightforward and requires advanced familiarity with the tool. In particular, you must be comfortable using the Loop operator, and you must be able to combine the various operators RapidMiner provides to accomplish the following two steps:
Step 1: Given a dataset of n time series values V1, V2 ... Vn, use the model built on the windowed data to predict the (n+1)th value. As shown above, the values Vn-5, Vn-4, ... Vn are used to predict Vn+1.
Step 2: Take these values inside a loop and rename them as follows:
Vn+1 (the predicted label) --> Vn
Vn --> Vn-1
Vn-1 --> Vn-2
... and so on, down to Vn-4 --> Vn-5.
The old Vn-5 falls out of the window and is dropped. The resulting row is fed to the model built in Step 1 to predict Vn+2, and so on.
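Outside RapidMiner, the two steps amount to a loop over a single row of named attributes. The sketch below uses a plain dictionary in place of a RapidMiner example set; the attribute names follow the Vn-5 ... Vn convention above, while the helper names and the toy extrapolating model are hypothetical. It is not the Loop operator itself, only the renaming the loop has to perform on each pass.

```python
from typing import Callable, Dict, List

# Window attribute names, oldest first: Vn-5, Vn-4, Vn-3, Vn-2, Vn-1, Vn.
NAMES = [f"Vn-{k}" for k in range(5, 0, -1)] + ["Vn"]

Row = Dict[str, float]

def shift_row(row: Row, predicted: float) -> Row:
    """Step 2: rename Vn -> Vn-1, ..., Vn-4 -> Vn-5 (dropping the old Vn-5)
    and store the predicted label Vn+1 as the new Vn."""
    shifted = {older: row[newer] for older, newer in zip(NAMES[:-1], NAMES[1:])}
    shifted["Vn"] = predicted
    return shifted

def forecast_loop(row: Row, model: Callable[[Row], float], steps: int) -> List[float]:
    """Alternate Step 1 (predict) and Step 2 (rename) for `steps` iterations."""
    forecasts = []
    for _ in range(steps):
        label = model(row)            # Step 1: predict Vn+1 from Vn-5 .. Vn
        forecasts.append(label)
        row = shift_row(row, label)   # Step 2: shift the window by one value
    return forecasts

# Hypothetical toy model: extend the most recent change (linear extrapolation).
def extrapolate(row: Row) -> float:
    return row["Vn"] + (row["Vn"] - row["Vn-1"])

start = dict(zip(NAMES, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
print(forecast_loop(start, extrapolate, steps=3))  # [7.0, 8.0, 9.0]
```

Keeping the attribute names fixed and moving the values through them means the model from Step 1 always sees the same attribute names it was trained on.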
The actual implementation can be quite complicated because of all the bookkeeping and data sorting this method requires. Unfortunately, that is the price to pay for being able to use a wide range of predictive analytics algorithms for time series forecasting!