The Analytics Compass Blog

Twice weekly articles to help SMB companies optimize business performance with data analytics and to improve their analytics expertise.

Subscribe via E-mail

Your email:

Search SimaFore

FREE SMB Survey Report

describe the image
  • Our report (pdf) is a survey of analytics usage and needs of more than 100 SMBs
  • Find out which analytics applications have
    • Highest value
    • Most demand
    • Best ROI

Affordable Analytics for SMB

 

Browse by Tag

Current Articles | RSS Feed RSS Feed

How to build Pivot Tables: differences between RapidMiner and Excel

  
  
  

This article introduces a very useful feature of RapidMiner for data aggregation: the "Pivot" operator. This works very similar to the standard XL pivot table and therefore it is helpful to know how to set the parameters of the operator to get similar results.

An important fact to keep in mind is that a pivot operation simply transforms a data table's structure and should not reduce the number of entries (or cells in a spreadsheet context). However, the pivot operator may increase the number of cells. As explained in the graphic below, the raw data has only 18 entries, however the pivot table will have 20 cells. 

rapidminer pivot table basics

A pivot table created using XL is shown below. Customer ID is selected as "row label" and Product ID is selected as "column label". CPI makes up "Values" of each cell. Observe that there are now 20 total data cells (ignoring the Grand Total column).

basic xl pivot table

Comparing the XL Pivot with RapidMiner Pivot, we will notice 2 main differences.

First difference is that non-numeric terms (called "missing values" in the example) are not replaced by 0's in RapidMiner as they are by default in XL. Second difference: column label has a prefix which corresponds to the header of the data ("Values") column. Finally, just like in XL, missing entries also become missing values or blank cells.

rapidminer pivot table results simple dataset


Setting up the RapidMiner Pivot operator process

We illustrate in detail below the process for setting up the pivot operator using a real world example. There may be additional steps needed for other cases. First figure below is a snapshot of the Health, Nutrition and Population (HNP) statistics data set which includes data from 240 countries for the last 50 years. This data is downloadable freely from this site

HNP raw data from World Bank
Our objective is to set up a subset of data for one year – in this case we selected 2006. Each row of this new reduced dataset will hold statistics for one of the 240 countries and each column holds one of the 253 HNP attributes for the year 2006. Obviously we will need a pivot table operation and the figures below show the sequence of steps followed to get to the final outcome.

STEP 1: Import the raw data from above and select attributes to pivot on

This is similar to choosing the fields in XL. If this process was done using XL, Country_Name would go into the "Row Labels" field and Indicator_Name into the "Column Labels". The 2006 column would go into the "Values" field.

using rapidminer for building pivot tables resized 600


STEP 2: Set up the Pivot operator parameters in RapidMiner

using rapidminer for building pivot tables pivot attributes resized 600

Therefore, the "group attribute" corresponds to "Row Label" in XL and "index attribute" corresponds to "Column Label" in XL. Since we already selected the 2006 column using the "Select Attributes" operator, we are ready to run the process. The graphic below shows the output from the above RapidMiner process.

rapidminer pivot table output resized 600

As seen above, each attribute in the resultant pivot table is prefixed by "2006_". We have 240 rows and 253 attributes - Rapidminer also considers the Country_Name as an attribute here, but for the purposes of further analysis, this column could be ignored.

Sign up for our blog to get regular "How to" articles such as these.

Comments

I have followed the above process, but when I go to select the group and index attributes in the Pivot operator, the drop down list is empty, even after I have selected attributes in the Attribute operator. Any idea why this is happening and what to do about it?
Posted @ Wednesday, April 24, 2013 5:34 AM by rick davies
Rick 
 
It is difficult to say without having seen your process/data. Do you have any special attributes like "id" or "Label" which you are trying to pivot on? If that is so, have you checked the "include special attributes" box in the Select Attributes operator?
Posted @ Wednesday, April 24, 2013 7:20 AM by Bala Deshpande
Hi, do you know how to do the reverse process? I need to do an unpivot table, can you help me? 
thanks!
Posted @ Thursday, August 29, 2013 3:20 PM by Thomas
Thomas, have you tried using the de-pivot operator? here is an article which explains the depivot ... 
http://www.simafore.com/blog/bid/118359/How-to-use-customer-segmentation-to-improve-time-series-forecasting
Posted @ Tuesday, October 22, 2013 5:06 PM by Bala Deshpande
wow very clearly. For those who have not understood it in this way, I have here an another informative site for you: http://www.excel-aid.com/excel-create-a-pivot-tablecreating-a-basic-pivot-table-from-excel-data.html
Posted @ Thursday, March 27, 2014 9:25 AM by Valeska
Kieran 
Hard to say without knowing which version you are using. I have tried in 5.3 and cannot recall running into this specific issue.
Posted @ Monday, September 01, 2014 4:08 PM by Bala Deshpande
Post Comment
Name
 *
Email
 *
Website (optional)
Comment
 *

Allowed tags: <a> link, <b> bold, <i> italics