How to perform feature selection for predictive analytics
A previous article discussed the need to reduce data dimensionality or perform feature selection. In this article we focus once again on prediction problems using numeric data and identify common techniques that can be used for this task.
For linear regression models, four classical techniques are commonly used for this purpose.
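1. Exhaustive Search: This is simply a fancy name for a brute-force search. An exhaustive search builds a regression model for every possible combination of independent variables and recommends the one with the best adjusted R^2 whose coefficients are the most statistically significant, as judged by their t-values. For example, a dataset with 5 independent variables, x1 to x5, can produce the following combinations: 5 single-variable models (x1, x2, x3, x4, x5), 10 two-variable models (x1 + x2, x1 + x3, ..., x4 + x5), 10 three-variable models, 5 four-variable models and 1 model containing all five variables.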
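For a dataset with k independent variables, there will be 2^k - 1 regression models to choose from (31 when k = 5). Because this number doubles with every added variable, exhaustive search quickly becomes impractical: k = 20 already gives over a million models.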
2. Stepwise regression: We start with a regression model containing the single independent variable with the largest absolute t-value, say x1 (one of the k single-variable models). In the next step, a second variable, say x2, is added and a new model is built. If the t-values of the new model are better than those of the first model, we keep the new model and go on to add a third variable, x3. If the new model performs worse than the first one (i.e. none of the absolute t-values are significant), we discard x1, keep x2 and build the next model with x2 and x3. This procedure repeats until all two-variable combinations have been tested; the best-performing two-variable combination is then kept before a third variable is added from the remaining k - 2 candidates. The process ends when all significant variables are included in the model.
3. Forward selection: The only difference between this and stepwise regression is that no variable is ever dropped. Once x1 enters the model, it is never removed, and new variables continue to be added as long as the decision criterion is met (i.e. an improved t-value).
4. Backward elimination: In backward elimination, we start with a "full" model that includes all variables. Independent variables whose t-values are not significant (at a significance level chosen a priori) are dropped, and a new model is built with the remaining variables. This repeats until all remaining variables have significant t-values, at which point the procedure stops.
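The four procedures can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation behind this article: all function names (ols, exhaustive_search, forward_selection, backward_elimination, stepwise, best_entrant) are hypothetical, X is assumed to be an n-by-k NumPy array whose columns 0..k-1 stand for x1..xk, and "significant" is approximated by a fixed rule-of-thumb cutoff of |t| >= 2.0 instead of exact p-values. The stepwise function follows a common textbook variant (add the strongest candidate, then drop any included variable whose |t| has fallen below a lower cutoff) rather than the exact sequence described in item 2.

```python
import numpy as np
from itertools import combinations

T_CRIT = 2.0  # assumed rule-of-thumb |t| cutoff, roughly the 5% level for moderate n


def ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (adjusted R^2, t-values of the slope coefficients).
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    se = np.sqrt(sse / (n - p - 1) * np.diag(np.linalg.inv(A.T @ A)))
    return adj_r2, (beta / se)[1:]                 # drop the intercept's t-value


def exhaustive_search(X, y, t_crit=T_CRIT):
    """Fit every non-empty subset (2^k - 1 models); return (best subset, model count)."""
    k = X.shape[1]
    fits = {c: ols(X[:, list(c)], y)
            for r in range(1, k + 1) for c in combinations(range(k), r)}
    # Keep models whose slopes are all significant, then pick the best adjusted R^2.
    ok = [c for c, (_, t) in fits.items() if np.all(np.abs(t) >= t_crit)]
    best = max(ok or list(fits), key=lambda c: fits[c][0])
    return list(best), len(fits)


def best_entrant(X, y, selected, t_in):
    """Return the unused column with the largest |t| on entry, if it exceeds t_in."""
    best_j, best_t = None, t_in
    for j in range(X.shape[1]):
        if j not in selected:
            t_new = abs(ols(X[:, selected + [j]], y)[1][-1])  # t of the new column
            if t_new > best_t:
                best_j, best_t = j, t_new
    return best_j


def forward_selection(X, y, t_in=T_CRIT):
    """Add one variable at a time; once in, a variable is never removed."""
    selected = []
    while True:
        j = best_entrant(X, y, selected, t_in)
        if j is None:
            return selected
        selected.append(j)


def backward_elimination(X, y, t_crit=T_CRIT):
    """Start from the full model; drop the least significant variable until all pass."""
    selected = list(range(X.shape[1]))
    while selected:
        t = np.abs(ols(X[:, selected], y)[1])
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break                                  # every remaining variable is significant
        selected.pop(worst)
    return selected


def stepwise(X, y, t_in=T_CRIT, t_out=1.5, max_iter=100):
    """Forward step, then drop any included variable whose |t| fell below t_out."""
    selected = []
    for _ in range(max_iter):
        j = best_entrant(X, y, selected, t_in)
        if j is not None:
            selected.append(j)
        removed = False
        if selected:
            t = np.abs(ols(X[:, selected], y)[1])
            worst = int(np.argmin(t))
            if t[worst] < t_out:
                selected.pop(worst)
                removed = True
        if j is None and not removed:
            break                                  # nothing entered or left: converged
    return sorted(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                  # candidate variables x1..x5 (columns 0..4)
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
    print("exhaustive:", exhaustive_search(X, y))
    print("forward:   ", forward_selection(X, y))
    print("backward:  ", backward_elimination(X, y))
    print("stepwise:  ", stepwise(X, y))
```

On the synthetic data in the `__main__` block, where only x1 and x3 (columns 0 and 2) drive y, all four procedures should retain those two columns; whether a pure-noise column also slips in depends on the random draw and on the cutoffs. Using an entry cutoff (t_in) larger than the removal cutoff (t_out) keeps stepwise from re-adding a variable it has just dropped, and max_iter is a safety guard against cycling.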
A more general dimensionality-reduction method, such as Principal Component Analysis (PCA), may also be applied before setting up regression models. New methods involving information exchange between variables are being developed and will be available for download to registered users of visTASC; these techniques promise faster computation and easier implementation.
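Unlike the four regression procedures, PCA does not pick a subset of the original variables; it replaces them with a smaller number of uncorrelated linear combinations (principal components). The following is a minimal NumPy sketch via the singular value decomposition, not a reference implementation: the function name `pca_scores` and the synthetic data are illustrative assumptions, and in practice variables measured on different scales are usually standardized before this step.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # PCA operates on mean-centered columns
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # share of total variance per component
    scores = Xc @ Vt[:n_components].T            # new, mutually uncorrelated features
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.normal(size=(100, 2))
    # Six columns that are noisy mixtures of two underlying signals.
    X = base @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(100, 6))
    scores, explained = pca_scores(X, n_components=2)
    print(scores.shape, explained.round(3))
```

The returned scores could then serve as the independent variables of an ordinary regression model (principal component regression); the trade-off is that each component mixes all original variables, so the coefficients are harder to interpret than those from the subset-selection methods.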