“Truth is a Pathless Land”

...but finding an effective solution to your business problem does not have to be. The business analytics landscape can certainly appear pathless, with a myriad of techniques and vendor tools on the market.

Simafore provides tools and expertise to:

  • Integrate data
  • Select and deploy appropriate analytics
  • Institutionalize processes

About this Blog

The Analytics Compass Blog is aimed at two types of readers:

  • individuals who want to build analytics expertise and 

  • small businesses who want to understand how analytics can help them improve their business performance. 

If you fall into one of these categories, join hundreds of others and subscribe now!


Blog - The Analytics Compass


How to perform feature selection for predictive analytics


A previous article discussed the need to reduce data dimensionality or perform feature selection. In this article we focus once again on prediction problems using numeric data and identify common techniques that can be used for this task.

For linear regression models, there are essentially four techniques that can be used for this purpose.

1. Exhaustive search: This is simply a fancy name for a brute-force search. An exhaustive search builds regression models with every possible combination of parameters and recommends the one with the best adjusted R^2 and the most statistically significant t-values. For example, a dataset with 5 parameters, x1 through x5, can have the following combinations of independent variables:

[Table: all variable combinations tested by an exhaustive search]

For a dataset with k independent parameters, there will be 2^k - 1 regression models to choose from.
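The search can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; for simplicity it ranks candidate models by adjusted R^2 alone rather than also checking t-value significance.

```python
import numpy as np
from itertools import combinations

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on X (X includes the intercept column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - p)

def exhaustive_search(X, y):
    """Fit a model for every non-empty subset of columns; return the best one."""
    n, k = X.shape
    best_score, best_subset = -np.inf, None
    for r in range(1, k + 1):                     # subsets of size 1 .. k
        for subset in combinations(range(k), r):  # 2^k - 1 subsets in total
            Xs = np.column_stack([np.ones(n), X[:, subset]])
            score = adjusted_r2(Xs, y)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset
```

Because the number of subsets doubles with every added parameter, this approach is only practical for small k.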

2. Stepwise regression: We start with a regression model containing the single independent variable that has the largest absolute t-value (one of the models from column 1 in the table above). In the next step a second variable, say x2, is added and a new model is built. If the t-values of the new model are better than those of the first, we keep the new model and add a third variable, x3. If the new model performs worse (i.e., none of the absolute t-values are significant), we discard x1, keep x2, and build the next model with x2 and x3. This procedure repeats until all two-variable combinations have been tested; the best-performing two-variable combination is selected before a third variable from the remaining k - 2 choices is added. The process ends when all significant variables are included in the model.

3. Forward selection: The only difference from stepwise regression is that no variables are ever dropped. Once x1 enters the model it is never deleted, and new variables continue to be added as long as the decision criterion is met (i.e., an improved t-value).
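A minimal sketch of forward selection, assuming NumPy. Since the article leaves the significance criterion to be set a priori, this sketch uses the rough rule of thumb |t| >= 2 as the threshold for adding a variable:

```python
import numpy as np

def ols_t_values(X, y):
    """OLS t-values for y = X beta (X includes the intercept column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta / se

def forward_selection(X, y, t_threshold=2.0):
    """Greedily add the variable with the largest |t|; never drop one once added."""
    n, k = X.shape
    selected, remaining = [], list(range(k))
    while remaining:
        best_var, best_t = None, 0.0
        for j in remaining:
            Xs = np.column_stack([np.ones(n), X[:, selected + [j]]])
            t_new = abs(ols_t_values(Xs, y)[-1])  # t-value of the candidate
            if t_new > best_t:
                best_var, best_t = j, t_new
        if best_t < t_threshold:
            break  # no remaining variable clears the threshold
        selected.append(best_var)
        remaining.remove(best_var)
    return selected
```

Stepwise regression would differ only in also re-checking, after each addition, whether any previously selected variable's |t| has fallen below the threshold and dropping it if so.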

4. Backward elimination: In backward elimination, we start with a "full" model that includes all the variables. Independent variables with nonsignificant t-values (with the significance threshold established a priori) are dropped, and a new model is built with the remaining variables. The procedure stops when all remaining variables have significant t-values.
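The mirror image of forward selection can be sketched the same way, again assuming NumPy and the rough |t| >= 2 rule as the a priori significance threshold:

```python
import numpy as np

def ols_t_values(X, y):
    """OLS t-values for y = X beta (X includes the intercept column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta / se

def backward_elimination(X, y, t_threshold=2.0):
    """Start from the full model; drop the least significant variable until all pass."""
    n, k = X.shape
    keep = list(range(k))
    while keep:
        Xs = np.column_stack([np.ones(n), X[:, keep]])
        t_slopes = np.abs(ols_t_values(Xs, y)[1:])  # skip the intercept
        worst = int(np.argmin(t_slopes))
        if t_slopes[worst] >= t_threshold:
            break  # every remaining variable is significant
        keep.pop(worst)
    return keep
```

Backward elimination fits at most k models, so it is far cheaper than an exhaustive search, at the cost of possibly missing the globally best subset.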

A more general dimensionality-reduction method such as Principal Component Analysis may be used before setting up regression models. New methods involving information exchange between variables are being developed and will be available for download to registered users of visTASC. If you are interested in trying out these new techniques, which promise faster computation and easier implementation, please consider signing up below.
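As a quick sketch of the PCA route (assuming NumPy; the number of components to keep is a user choice), the predictors can be compressed before any regression model is fitted:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components, via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                        # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # scores on the leading components
```

The regression is then run on the returned component scores instead of the original, possibly correlated, predictors. Unlike the four subset-selection techniques above, PCA transforms the variables rather than choosing among them, so the resulting coefficients are harder to interpret.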


