Blog - The Analytics Compass


3 tips for setting up a Market Basket Analysis using RapidMiner

  
  
  

We have previously discussed how to apply a chi-squared calculator to run a simple variant of a market basket analysis. In this article, we will use RapidMiner to run a more sophisticated analysis.

A market basket database typically consists of a large number of transaction records. Each record lists all items purchased during a single customer transaction. The objective of this data mining exercise is to identify whether certain groups of items are usually purchased together. The result is a set of rules, called association rules, which summarize item associations as follows:

if [A] is purchased --> then [B] is also purchased, [x%] of time.

These association rules can be applied in an old-fashioned brick-and-mortar setting as well as in an online setting for real-time cross-selling or ad placement. In this article, we will cover the main aspects of applying association rules to a market basket analysis and show how this can be set up using RapidMiner.

Two essential concepts - Support and Confidence:

A key idea to get comfortable with is that of frequent item sets. An item set can consist of one or more items. In our earlier example, which consisted of customer transactions involving purchases of typical cosmetics items, one frequent item set could be [brushes, nail polish].

[Figure: market basket analysis transaction data]

Frequent item sets are quantified by support, which is the ratio of the number of transactions in which [brushes, nail polish] appeared together to the total number of transactions.

Support = occurrences of [brushes, nail polish]/total # of transactions
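As a rough illustration of the support formula, here is a minimal Python sketch. The transactions are made up for this example and do not come from the cosmetics data set used in the article; the `support` helper is likewise a hypothetical name.

```python
# Hypothetical transactions: each set holds the items bought in one transaction.
transactions = [
    {"brushes", "nail polish", "blush"},
    {"brushes", "nail polish"},
    {"blush", "concealer"},
    {"concealer", "eyeliner"},
]


def support(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    item_set = set(item_set)
    hits = sum(1 for t in transactions if item_set <= t)
    return hits / len(transactions)


# [brushes, nail polish] appears together in 2 of the 4 transactions.
print(support({"brushes", "nail polish"}, transactions))
```

With this toy data the item set appears in half of the transactions, so its support is 0.5.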

The next important metric that you will need to run a market basket analysis is confidence. Extending the above example, the confidence of the rule [brushes] --> [nail polish] is the fraction of transactions containing [brushes] that also contain [nail polish]:

confidence [brushes] --> [nail polish] = occurrences of [brushes, nail polish]/total # of transactions containing [brushes]
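The confidence calculation can be sketched in a few lines of Python. The transactions and the helper names (`count`, `confidence`) are hypothetical and chosen only for illustration. Note that confidence depends on the direction of the rule, which the example tries to show.

```python
# Hypothetical transactions: each set holds the items bought in one transaction.
transactions = [
    {"brushes", "nail polish", "blush"},
    {"brushes", "nail polish"},
    {"brushes", "concealer"},
    {"blush", "concealer"},
]


def count(item_set, transactions):
    """Number of transactions that contain every item in item_set."""
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    base = count(antecedent, transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base if base else 0.0


# 2 of the 3 transactions with brushes also contain nail polish.
print(confidence({"brushes"}, {"nail polish"}, transactions))
# Both transactions with nail polish also contain brushes.
print(confidence({"nail polish"}, {"brushes"}, transactions))
```

In this toy data, [nail polish] --> [brushes] holds every time, while [brushes] --> [nail polish] holds only two times out of three.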

Setting up a market basket analysis using RapidMiner

In RapidMiner, association rules are extracted using two operators in a sequence. The first operator, called FP Growth, is required to generate frequent item sets. The second operator, Create Association Rules, then produces the IF-THEN rules based on the confidence requirement. 

Before that, however, you may need some pre-processing steps to select the attributes you want and, more importantly, to convert the input data to the binomial (true/false) format required by the FP Growth operator.
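To make the binomial format concrete, the sketch below shows in plain Python what such a true/false table looks like. It is not how RapidMiner performs the conversion internally; the input layout (one transaction id and item per row) and the function name `to_binomial_table` are assumptions made for this example.

```python
# Hypothetical raw data: one (transaction id, item) pair per row.
# Transaction 1 lists "brushes" twice on purpose.
rows = [
    (1, "brushes"), (1, "nail polish"), (1, "brushes"),
    (2, "blush"), (2, "concealer"),
    (3, "brushes"), (3, "blush"),
]


def to_binomial_table(rows):
    """Return (sorted item names, {transaction id: {item: True/False}})."""
    baskets = {}
    for t_id, item in rows:
        # A set keeps each item once, so repeated rows collapse to a single True.
        baskets.setdefault(t_id, set()).add(item)
    items = sorted({item for _, item in rows})
    table = {
        t_id: {item: item in basket for item in items}
        for t_id, basket in baskets.items()
    }
    return items, table


items, table = to_binomial_table(rows)
for t_id, flags in table.items():
    print(t_id, flags)
```

Each transaction becomes one row and each item one true/false column. Because only presence is recorded, an item repeated within a transaction is simply marked true once.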

[Figure: RapidMiner process for market basket analysis]

Tip 1: When using the FP Growth operator, the important parameter is "min support". RapidMiner will find only those item sets which exceed this minimum support value. However, if you check the box for "find min number of item sets", priority is given to the "min number of item sets" field: RapidMiner will keep reducing the support threshold until it finds at least that many item sets.
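The interplay between the two settings in Tip 1 can be sketched in Python. This is only an illustration of the idea, with several assumptions: item sets are found by brute-force enumeration rather than by the FP-Growth algorithm, the threshold is lowered by a made-up factor of 0.9 per round (the schedule RapidMiner actually uses is not stated here), an item set counts as frequent when its support is at least the threshold, and all names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: each set holds the items bought in one transaction.
transactions = [
    {"brushes", "nail polish", "blush"},
    {"brushes", "nail polish"},
    {"blush", "concealer"},
    {"concealer", "eyeliner"},
]


def frequent_item_sets(transactions, min_support, max_size=2):
    """Brute-force search (not FP-Growth) for item sets meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = sum(1 for t in transactions if set(combo) <= t) / n
            if s >= min_support:
                found[combo] = s
    return found


def find_with_min_count(transactions, min_support, min_count, step=0.9, floor=0.01):
    """Lower the support threshold until at least min_count item sets are found.

    The step factor and the floor are assumptions made for this sketch.
    """
    threshold = min_support
    found = frequent_item_sets(transactions, threshold)
    while len(found) < min_count and threshold > floor:
        threshold *= step
        found = frequent_item_sets(transactions, threshold)
    return threshold, found


threshold, found = find_with_min_count(transactions, min_support=0.9, min_count=5)
print(round(threshold, 3), found)
```

Starting from a deliberately high threshold of 0.9, no item set qualifies, so the threshold is relaxed until at least five item sets are found.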

Tip 2: After finding the frequent item sets, the next step is to extract the rules which meet the confidence requirement. You can set this in the "min confidence" field under the parameter options for the Create Association Rules operator.
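The two-stage idea of first filtering by support and then by confidence can be sketched as follows. This is a simplified illustration, not the Create Association Rules operator itself: it only considers rules with one item on each side, and the data, function names and threshold values are assumptions made for this example.

```python
from itertools import permutations

# Hypothetical transactions: each set holds the items bought in one transaction.
transactions = [
    {"brushes", "nail polish", "blush"},
    {"brushes", "nail polish"},
    {"brushes", "concealer"},
    {"blush", "concealer"},
]


def support(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    item_set = set(item_set)
    return sum(1 for t in transactions if item_set <= t) / len(transactions)


def rules_from_pairs(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, confidence) for single-item rules."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not a frequent item set
        conf = pair_support / support({a}, transactions)
        if conf >= min_confidence:
            rules.append((a, b, round(conf, 3)))
    return rules


rules = rules_from_pairs(transactions, min_support=0.5, min_confidence=0.8)
for a, b, conf in rules:
    print(f"[{a}] --> [{b}] (confidence: {conf:.3f})")
```

With these toy thresholds only [nail polish] --> [brushes] survives: the reverse rule comes from the same frequent item set but its confidence of about 0.667 falls below the minimum.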

Tip 3: When the above process is run, RapidMiner will generate outputs for both the FP Growth and Create Association Rules operators. The FP Growth output is a table of the item sets found in Tip 1, with their support values. The association rules output consists of a text view, a table view and graphical views of the extracted rules. Surprisingly, the simplest and most intuitive view is the text view, which shows rules such as these:

Association Rules
[Blush] --> [Concealer] (confidence: 0.738)
[Brushes] --> [Nail Polish] (confidence: 1.000)


Comments

This is a very good article. I liked it a lot, mainly because there is so little information on basket analysis with RapidMiner. I wish I had some extra information on how to configure these tasks. I'm doing basically the same thing you are doing (as far as I know) and it's not working for me. I would greatly appreciate it if you gave me some help with this or some links pointing to places where I can find more information.
Posted @ Friday, October 19, 2012 4:46 PM by Amilkar
Appreciate your comments. We are working on a book which should be released soon which will answer this and hopefully many more questions. Please subscribe to our blog to stay tuned.
Posted @ Wednesday, October 24, 2012 1:22 PM by Bala Deshpande
If the information provided could be extended with what Amilkar suggested, along with how to interpret the results, it would be perfect!
Posted @ Monday, October 29, 2012 3:54 AM by sandra lim
Sandra - thanks for the feedback. We will certainly keep this in mind.
Posted @ Tuesday, October 30, 2012 1:31 PM by Bala Deshpande
This is a really good topic, covered in real detail. Thank you!
I would also like to know how to change the data from a transaction format to the table format in the picture above. Another question: would this still work if there are repeated values, for example, if under one T_ID there are the items: 1, bread; 2, beer; 3, bread; 4, ham; 5 ...
Thank you! I am not sure whether I described this clearly; if not, tell me and I will clarify.
Thank you again!
Posted @ Monday, April 22, 2013 3:44 AM by Lvane
Hi Lvane 
To convert raw transaction data into table form, you need to use the "Nominal to Binomial" operator in RapidMiner.
Posted @ Thursday, August 08, 2013 1:51 PM by Bala Deshpande