“Truth is a Pathless Land”

...but finding an effective solution to your business problem does not have to be. The business analytics landscape can certainly appear pathless, with a myriad of techniques and vendor tools on the market.

Simafore provides tools and expertise to:

  • Integrate data
  • Select and deploy appropriate analytics
  • Institutionalize processes

About this Blog

The Analytics Compass Blog is aimed at two types of readers:

  • individuals who want to build analytics expertise and 

  • small businesses who want to understand how analytics can help them improve their business performance. 


Blog - The Analytics Compass


How to run Principal Component Analysis with RapidMiner - Part 1

  
  
  

In this three-part series, we explore how to use RapidMiner 5.0, the open source analytics package, to run a Principal Component Analysis (PCA). In part 1 we quickly review the background for PCA and explain the application logic. In part 2 we run a PCA on non-standardized data, and in part 3 we show how to standardize data before running a PCA (and why one should standardize).

Background - Why do a PCA?

In a previous article we discussed how PCA can add value in business analytics and also pointed out a couple of cautionary issues. To recap, PCA is a technique that reduces the dimension of a dataset by identifying the few most influential parameters (if they exist). This sort of variable screening or feature selection makes it easier to apply other predictive modeling techniques and to interpret their results.

PCA identifies the combinations of parameters that explain the greatest amount of variation in the dataset. It does this by transforming the existing variables into a set of "principal components", or new variables, which have the following properties:

  1. They are uncorrelated with each other
  2. Taken together, the first few components explain a large share of the variance within the data
  3. They can be related back to the original variables via weights (often called loadings). Original variables with very low weights in the retained principal components can be removed from the dataset.
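These three properties can be checked numerically outside RapidMiner. The following is a minimal sketch in Python with NumPy, not the RapidMiner process used in this series. The dataset is made up for illustration (four hypothetical variables, with x2 nearly a copy of x1), and the function name `pca` is our own.

```python
import numpy as np

def pca(X):
    """PCA via eigen-decomposition of the covariance matrix.

    X is an (n_samples, m_variables) array. Returns the variance of each
    component, the weights (one column per component) and the component
    scores, all ordered by decreasing variance.
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # the data expressed in the new variables
    return eigvals, eigvecs, scores

# Hypothetical data: 200 rows, 4 variables; x2 is nearly a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
x4 = 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3, x4])

eigvals, weights, scores = pca(X)
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Property 1: off-diagonal correlations between components are ~0
print(np.round(np.corrcoef(scores, rowvar=False), 3))
# Property 2: cumulative share of variance explained by the first k components
print(np.round(cumulative, 3))
# Property 3: weights (rows = original variables, columns = components)
print(np.round(weights, 3))
```

Note that this sketch works on the raw (non-standardized) covariance matrix, so variables with a larger numeric spread dominate the components.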

The following schematic illustrates how PCA can potentially help in reducing data dimensions with a hypothetical dataset of m variables.

[Figure: principal component analysis logic flow]
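One way to sketch this kind of reduction in code is shown below. It keeps the smallest number of components that reach a target share of the variance, then flags original variables whose weights are small in every retained component. This is an illustration only: the function name `reduce_dimensions`, the 95% variance target, the 0.1 weight cutoff, and the five-variable dataset are all assumptions of ours, not values taken from the schematic or from RapidMiner.

```python
import numpy as np

def reduce_dimensions(X, variance_target=0.95, weight_cutoff=0.1):
    """Pick the number of components to keep and flag low-weight variables.

    variance_target and weight_cutoff are illustrative defaults, not
    recommended values.
    """
    Xc = X - X.mean(axis=0)                                # center each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # descending variance
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative explained variance reaches the target
    k = min(int(np.searchsorted(cumulative, variance_target)) + 1, len(eigvals))
    retained = eigvecs[:, :k]                              # weights of kept components
    # Variables whose absolute weight is below the cutoff in every kept component
    low_weight = np.flatnonzero(np.all(np.abs(retained) < weight_cutoff, axis=1))
    return k, cumulative, low_weight

# Hypothetical dataset with m = 5 variables built from two underlying signals
rng = np.random.default_rng(42)
n = 300
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack([
    a,
    a + 0.1 * rng.normal(size=n),   # nearly redundant with the first variable
    b,
    b + 0.1 * rng.normal(size=n),   # nearly redundant with the third variable
    0.05 * rng.normal(size=n),      # low-variance noise
])

k, cumulative, low_weight = reduce_dimensions(X)
print("components kept:", k, "of", X.shape[1])
print("cumulative variance explained:", np.round(cumulative, 3))
print("candidate variables to drop (column indices):", low_weight)
```

A flagged variable is only a candidate for removal. Because the sketch uses non-standardized data, a variable can also be flagged simply because it is measured on a small scale.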

In part 2 we will apply this logic to a real, downloadable dataset. Using RapidMiner, we will explain how to set up the main process and interpret the results.
