The Analytics Compass Blog

Twice-weekly articles to help SMB companies optimize business performance with data analytics and improve their analytics expertise.

When to use mutual information correlation based feature selection

  
  
  
Two main types of feature selection methods

There are two main types of feature selection or dimensionality reduction algorithms: filter type and wrapper type. The filter model does not require any learning algorithm, whereas the wrapper model is optimized for a particular learning algorithm. Examples of wrapper models include Forward Selection and Backward Elimination in multiple regression. In other words, filter models are unsupervised feature selection methods, whereas wrapper models are supervised.

The filter model works best when the following conditions are met:

  1. The number of features or attributes is very large
  2. Computational expense is a concern
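
As a purely illustrative sketch (using scikit-learn rather than the tools discussed in this article), the filter step below scores attributes by mutual information with a label without fitting any model, while the wrapper step runs forward selection around a logistic regression learner. Note that KeyConnect, discussed below, applies mutual information in an unsupervised way among the attributes themselves; the supervised filter here is only a stand-in to show the filter vs. wrapper contrast.

```python
# Illustrative sketch (scikit-learn, not KeyConnect or RapidMiner) of the
# filter vs. wrapper distinction described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a dataset with many attributes and a label.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

# Filter model: score each attribute by mutual information with the label.
# No learning algorithm is fitted, so this stays cheap even with many features.
filter_sel = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)
print("Filter picks:", np.flatnonzero(filter_sel.get_support()))

# Wrapper model: forward selection wrapped around a specific learner
# (logistic regression here), which repeatedly fits models and is therefore
# far more expensive, but tailored to that learner.
wrapper_sel = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                        n_features_to_select=10,
                                        direction="forward").fit(X, y)
print("Wrapper picks:", np.flatnonzero(wrapper_sel.get_support()))
```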

But there are also instances when filter-type, mutual-information-based feature selection methods must be used with caution. This article highlights two such scenarios for KeyConnect (to be launched soon), a mutual-information-based unsupervised feature selection tool.

Scenario 1: When there are outliers in the dataset

Outliers result in an artificially high value of mutual information. This causes KeyConnect to spuriously select the variables involved as important features. The fix is very simple: use a program like RapidMiner to detect and eliminate the offending samples.
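
For readers who script rather than use RapidMiner's operators, a rough stand-in for this step might look like the pandas sketch below. The file name, the 3-standard-deviation cutoff, and the restriction to numeric columns are all assumptions for illustration, not a prescription.

```python
# Illustrative stand-in for the outlier-removal step described above:
# drop rows whose numeric attributes lie far from their column means.
import pandas as pd

df = pd.read_csv("smb_data.csv")             # hypothetical input file
numeric = df.select_dtypes(include="number")

# Flag rows where any numeric attribute is more than 3 standard deviations
# from its column mean, then remove those rows before computing mutual information.
z_scores = (numeric - numeric.mean()) / numeric.std()
outlier_rows = (z_scores.abs() > 3).any(axis=1)
clean_df = df.loc[~outlier_rows]

print(f"Removed {int(outlier_rows.sum())} outlier rows out of {len(df)}")
```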

[Figure: Using RapidMiner to detect and eliminate outliers]

Scenario 2: When attributes contain known strong correlations

This can happen, for example, when one column (or attribute) in a data set is derived from another column, such as when you have both Gross Profit and % Gross Profit. This is of course a very simplistic case, and the redundant column can be manually eliminated before applying feature selection. However, when such correlations are not known beforehand, we can once again use RapidMiner to detect and remove correlated features.

[Figure: Using RapidMiner to remove correlated attributes]
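
Again purely as an illustrative alternative to the RapidMiner step above, a pandas version of the same idea could drop one attribute from every highly correlated pair. The 0.95 correlation threshold and the file name are assumptions for the example.

```python
# Illustrative stand-in: drop one attribute from each highly correlated pair
# before running a mutual-information analysis.
import numpy as np
import pandas as pd

df = pd.read_csv("smb_data.csv")                      # hypothetical input file
corr = df.corr(numeric_only=True).abs()

# Keep only the upper triangle so each pair of attributes is checked once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]

reduced_df = df.drop(columns=to_drop)
print("Dropped as redundant:", to_drop)
```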

The reason we need to eliminate highly correlated features before using mutual information is that such attributes will dominate the overall information exchange computed in the analysis. The strength of a program such as KeyConnect is detecting weak interactions that may be missed by linear correlation measures but still account for valuable information within the dataset. If two variables are collinear, keeping both adds no new information, and hence one of them may be removed.
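
To see this effect in miniature, here is a made-up numerical example; the column names, the fixed revenue figure, and the use of scikit-learn's mutual information estimator are all assumptions for illustration rather than part of the KeyConnect workflow.

```python
# Made-up illustration of why derived columns dominate: the mutual information
# between Gross Profit and a %Gross Profit column computed directly from it is
# far larger than with a genuinely unrelated attribute.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
gross_profit = rng.normal(100.0, 20.0, size=1000)
pct_gross_profit = gross_profit / 1_000.0 * 100.0          # derived: GP against a fixed revenue of 1,000
headcount = rng.integers(5, 50, size=1000).astype(float)   # unrelated attribute

# Pairwise mutual information of each column with Gross Profit.
X = np.column_stack([pct_gross_profit, headcount])
mi = mutual_info_regression(X, gross_profit, random_state=0)
print("MI(%Gross Profit, Gross Profit):", round(mi[0], 3))
print("MI(Headcount, Gross Profit):    ", round(mi[1], 3))
# The derived column's score dwarfs the other, so it would dominate the
# analysis; removing it lets weaker but still valuable interactions surface.
```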

We have used this process to reduce a dataset with 300+ attributes to a more manageable dozen or so, which can then be used, for example, to build usable predictive models.

[Figure: Using KeyConnect for key driver detection and analysis]

Sign up to become our beta tester and win a chance to use KeyConnect free!
