Feature selection with mutual information, Part 2: PCA disadvantages

This is the second and concluding part of a two-part article showing how one of the disadvantages of principal component analysis (PCA) for feature selection or dimension reduction can be addressed using mutual information based tools.

Just to recap: one disadvantage of PCA lies in interpreting the results of a dimension reduction analysis, and this challenge becomes particularly acute when the data needs to be normalized. Part 1 of this series explains this in detail.

One reason we need to normalize before applying PCA is to mitigate the effects of scale. For example, if one attribute is orders of magnitude larger than the others, PCA ascribes the largest share of variance to that attribute and skews the results of the analysis. Normalizing removes this effect. However, normalizing spreads the influence across many more principal components; in other words, more PCs are required to explain the same amount of variance in the data, and the interpretation of the analysis gets muddied. The short sketch below illustrates both effects.
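Here is a minimal sketch (not from the original article) using scikit-learn on synthetic data. One attribute is orders of magnitude larger than the rest, so raw PCA loads almost all variance on a single component, while standardizing spreads the variance across all components:

```python
# Minimal sketch of PCA scale effects; feature roles are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Three comparable attributes plus one whose units put it orders of magnitude higher
X = np.column_stack([
    rng.normal(0, 1, n),      # e.g. a rating-like score
    rng.normal(0, 1, n),      # e.g. grams of fiber
    rng.normal(0, 1, n),      # e.g. grams of protein
    rng.normal(0, 1000, n),   # e.g. milligrams of potassium
])

pca_raw = PCA().fit(X)
print("Raw data:", pca_raw.explained_variance_ratio_.round(3))
# Nearly all variance lands on one PC, driven by the large-scale attribute

pca_std = PCA().fit(StandardScaler().fit_transform(X))
print("Normalized:", pca_std.explained_variance_ratio_.round(3))
# Variance now spreads across all four PCs, so more of them are needed
# to explain the same share of variance
```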

Mutual information based feature selection overcomes both of these challenges. The advantages it offers for dimension reduction or feature selection are:

  1. It is easy to interpret
  2. It is not sensitive to scale effects (see the sketch after this list)
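A small sketch of the scale point, using scikit-learn's kNN-based mutual information estimator on synthetic data (an assumption on my part; this is not KeyConnect's implementation):

```python
# Rescaling an attribute leaves its mutual information with the target unchanged.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 1000)
y = x + rng.normal(0, 0.5, 1000)   # target depends on x

mi_original = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)
mi_rescaled = mutual_info_regression((1000 * x).reshape(-1, 1), y, random_state=0)
print(mi_original, mi_rescaled)    # essentially identical: MI ignores scale
```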

The data from the cereal example is analyzed in just a couple of simple process steps using KeyConnect, a mutual information based tool, as shown in the screenshots below.

STEP 1:

[Screenshot: mutual information feature selection in KeyConnect, step 1]

STEP 2:

[Screenshot: mutual information feature selection in KeyConnect, step 2]

STEP 3:

[Screenshot: mutual information feature selection in KeyConnect, step 3]

Instead of applying the cut-off after the analysis, one could have changed the filter threshold in Step 2 and achieved similar results. However, several runs of the tool with different settings may be helpful in building an understanding of the result.
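For readers who want to reproduce the idea outside KeyConnect, here is a rough Python approximation of the three steps using scikit-learn's estimator. The file name cereals.csv, the numeric-column handling, and the threshold value are illustrative assumptions, not values from the article:

```python
# A sketch approximating the workflow: load the cereal data, estimate mutual
# information of each attribute with Rating, filter by a threshold, and rank.
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

df = pd.read_csv("cereals.csv").select_dtypes("number")   # Step 1: load the data
X, y = df.drop(columns="Rating"), df["Rating"]

mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)

THRESHOLD = 0.1                                           # Step 2: filter threshold (assumed)
kept = mi[mi >= THRESHOLD]

print(kept.sort_values(ascending=False))                  # Step 3: ranked attributes
```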

For comparison, this was the result of the first PCA analysis (without normalization): Potass, Sodium, Vitamins, Calories and Rating. Mutual information ranks variables by the amount of useful information they contain, whereas PCA simply ranks attributes by the total amount of variance each variable contributes. A very noisy attribute can therefore overshadow more useful, better-structured data. For building predictive models, you want to gather variables that contain more information, not more noise. This article explains why PCA may not always be the best technique for predictive analytics.
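The difference is easy to demonstrate on synthetic data. In the sketch below (my own construction, not the article's analysis), a pure-noise attribute with large variance dominates the first principal component, yet mutual information correctly ranks the informative attribute far above it:

```python
# High-variance noise dominates PCA but carries no information about the target.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
n = 1000
signal = rng.normal(0, 1, n)               # informative attribute
noise = rng.normal(0, 50, n)               # noisy attribute with huge variance
y = 2 * signal + rng.normal(0, 0.3, n)     # target driven only by the signal

X = np.column_stack([signal, noise])
pca = PCA().fit(X)
print("PC1 loadings:", pca.components_[0].round(3))  # dominated by the noise column
print("MI:", mutual_info_regression(X, y, random_state=0).round(3))
# MI ranks the signal far above the noise
```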

Sign up to become a beta tester for KeyConnect, our mutual information based feature selection web application.
