Using RapidMiner for time series forecasting in cost modeling: 1 of 2
The process for product or transportation cost forecasting involves the following steps:
1. identifying the most relevant factors for the target variable, which is usually an overall cost such as weekly spend or per-unit production cost,
2. building a regression-based cost model that functionally relates the input factors to the target variable,
3. developing time-series-based forecasts for each of the factors identified in step 1,
4. using the regression model from step 2 and the time series forecasts from step 3 to make the final cost forecasts.
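To make the last three steps concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the single factor (`fuel_price`), the made-up numbers, the helper names, and the simple drift forecast that stands in for a real time series model. It is not the model used in the articles mentioned here.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (one input factor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def drift_forecast(series, steps):
    """Placeholder factor forecast: extend the average historical change."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * k for k in range(1, steps + 1)]

# Hypothetical weekly data: one factor (fuel price) and the target (weekly spend).
fuel_price = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5]
weekly_cost = [100 + 20 * p for p in fuel_price]  # made-up, exactly linear

# Step 2: regression model relating the factor to the cost.
intercept, slope = fit_simple_regression(fuel_price, weekly_cost)

# Step 3: forecast the factor itself for the next two weeks.
future_fuel = drift_forecast(fuel_price, steps=2)

# Step 4: feed the factor forecasts into the regression model.
cost_forecast = [intercept + slope * p for p in future_fuel]
print(cost_forecast)
```

With several factors, step 2 becomes a multiple regression and step 3 is repeated once per factor, but the way the pieces fit together stays the same.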
The process has been described in detail in this article. In another article we focused on step 3 and showed how to use the open source package R to build time series forecasts for cost modeling applications. In this first part of two, we discuss RapidMiner's approach to building time series forecasts, which is fundamentally different from standard techniques. In the second part we will apply the process to cost modeling data.
RapidMiner's approach to time series is based on two main data transformation processes:
- Windowing to transform the time series data into a generic data set: this step will convert the last row of a window within the time series into a label or target variable
- Applying any of the "learners" or algorithms to predict the target variable and thus predict the next time step in the series
A "typical" time series data set and its transformed structure (after windowing) are shown conceptually below.
The parameters of the "Windowing" operator allow changing the size of the windows (shown as colored vertical boxes on the left), the step size (how far each window is shifted relative to the previous one, which determines how much consecutive windows overlap) and the prediction horizon used for forecasting. The series data is thus converted into a generic data set which can be processed by any of the available RapidMiner operators.
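The windowing transformation can be sketched in a few lines of Python. This is an illustration of the idea only, not RapidMiner's implementation: the function name `window_series` and its parameter names are hypothetical, and the sketch assumes the label is the value `horizon` steps after the last input value of each window.

```python
def window_series(series, window_size, step_size=1, horizon=1):
    """Convert a series into (inputs, label) rows.

    Each row holds `window_size` consecutive values as inputs and, as its
    label, the value `horizon` steps after the last input (an assumption
    of this sketch).
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        inputs = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((inputs, label))
    return rows

# Hypothetical weekly cost series.
series = [10, 11, 12, 13, 14, 15, 16]

# A step size of 1 gives heavily overlapping windows.
for inputs, label in window_series(series, window_size=3):
    print(inputs, "->", label)

# A larger step size reduces the overlap and yields fewer rows.
sparse_rows = window_series(series, window_size=3, step_size=2)
```

Each printed row is one example in the "horizontal" data set: the window values become ordinary input attributes and the label becomes the target variable.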
The next main process required for running time series analyses using RapidMiner involves applying any of the available "learners" to "predict" the label variable shown in the green vertical box (see graphic above). The example set (or raw data) for this learner is the "horizontal" data set shown above, with the target or label variable in the green box. Most of the Performance operators can also be used to assess how well the learning scheme fits the data.
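As a rough illustration of this second step, the Python sketch below windows a series, trains a very simple learner on the earlier rows and scores it on the later ones. The k-nearest-neighbour learner, the RMSE measure, the toy series and all function names are assumptions chosen to keep the example self-contained. In RapidMiner any learner and most Performance operators could take their place.

```python
import math

def make_windows(series, window_size):
    """Turn a series into (inputs, label) rows; the label is the next value."""
    return [(series[i:i + window_size], series[i + window_size])
            for i in range(len(series) - window_size)]

def knn_predict(train_rows, query, k=2):
    """Average the labels of the k training windows closest to the query."""
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(label for _, label in nearest) / k

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted labels."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical, perfectly periodic cost series (20 points).
series = [10, 12, 14, 12] * 5

rows = make_windows(series, window_size=3)

# Chronological split: learn on the earlier rows, evaluate on the later ones.
train, test = rows[:12], rows[12:]

predictions = [knn_predict(train, inputs) for inputs, _ in test]
actuals = [label for _, label in test]
error = rmse(actuals, predictions)
print("RMSE on held-out rows:", error)
```

Because the toy series repeats exactly, every held-out window has identical neighbours in the training rows and the error comes out as zero. Real cost data will not behave this way, which is why a performance measure on held-out rows is worth computing.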
In part two of this series we will show a process that uses this logic to make time series forecasts and compare the results with those obtained from more traditional time series tools.
Sign up for our blog above to make sure you don't miss the next part.