Imagine you own a drugstore and want to arrange your shelf space to improve cross-selling. Suppose you have customer transaction data from the sale of cosmetics, which typically records the items that different customers purchased together. Specifically, suppose we have a dataset that records what each cosmetics customer purchases in a single transaction. (Such a data set is described in this data mining book.) The drugstore sells seven different cosmetics-related items, and the manager wants to use this transaction data to make the best use of the shelf space.

How can such data be used to make business decisions?

The idea is to find out how the sales of some items relate to the sales of others. Is the purchase of an item such as nail polish always accompanied by others such as brushes? Does the purchase of a bronzer always result in an accompanying purchase of a blush? Such analyses are typically called affinity analysis or market basket analysis. The results help managers decide how and where to position items on shelves to maximize sales, by generating clear and simple rules such as "IF bronzer is purchased THEN blush is also likely to be purchased".
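To make the "IF bronzer THEN blush" idea concrete, here is a minimal Python sketch of how the strength of such a rule is commonly measured, using support (how often the items appear together) and confidence (how often the THEN part holds when the IF part does). The transactions are invented for illustration and the function names are hypothetical; this is not taken from the dataset or from KeyConnect.

```python
# Toy transactions, invented for illustration: each set is one basket.
transactions = [
    {"bronzer", "blush", "brushes"},
    {"bronzer", "blush"},
    {"nail polish", "brushes"},
    {"bronzer", "concealer"},
    {"blush"},
    {"bronzer", "blush", "concealer"},
]


def count(items, baskets):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(items, baskets):
    """Fraction of all baskets that contain every item in `items`."""
    return count(items, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with the IF items that also hold the THEN items."""
    base = count(antecedent, baskets)
    joint = count(set(antecedent) | set(consequent), baskets)
    return joint / base if base else 0.0


# Rule: IF bronzer THEN blush
rule_support = support({"bronzer", "blush"}, transactions)
rule_confidence = confidence({"bronzer"}, {"blush"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high confidence but very low support rests on only a handful of baskets, so both numbers are worth reading together.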

How to use a chi squared analysis to draw some actionable insights?

As seen in the table below, transaction data yields a collection of factors. Each row is a separate transaction and each column refers to an item that may have been purchased during that sale, which results in thousands of rows of data. Each column is a categorical variable, and our job is to find out not only which of these columns are interdependent but also how strong each dependency is.
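For readers who want to see the calculation behind a chi-squared test of independence, the sketch below builds a 2x2 contingency table for two item columns and computes the Pearson chi-squared statistic by hand. It assumes each column is coded 1 (purchased) or 0 (not purchased); the counts are invented for illustration, and the function names are hypothetical.

```python
# Toy rows, invented for illustration: 1 = purchased, 0 = not purchased.
rows = (
    [{"bronzer": 1, "blush": 1}] * 30
    + [{"bronzer": 1, "blush": 0}] * 10
    + [{"bronzer": 0, "blush": 1}] * 10
    + [{"bronzer": 0, "blush": 0}] * 50
)


def contingency_table(rows, a, b):
    """2x2 counts: table[i][j] = number of rows with a == i and b == j."""
    table = [[0, 0], [0, 0]]
    for row in rows:
        table[row[a]][row[b]] += 1
    return table


def chi_squared(table):
    """Pearson chi-squared statistic, without continuity correction.

    Assumes every row and column total is non-zero.
    """
    total = sum(sum(r) for r in table)
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            # Count expected in this cell if the two items were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


table = contingency_table(rows, "bronzer", "blush")
stat = chi_squared(table)
print(table, round(stat, 2))
```

For a 2x2 table there is one degree of freedom, and the critical value at the 95% confidence level is 3.841. A statistic above that value suggests the two columns are not independent.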

How to use KeyConnect for such an analysis?

Clearly, this is the kind of task for which a tool such as KeyConnect was designed. Since we are dealing with categorical data, the chi-squared calculator is the right choice for extracting key relationships between the factors. KeyConnect also provides several confidence levels at which we can test the strength of each extracted relationship. Once you have the data, the analysis should take no more than a few minutes.

How to interpret the results of such an analysis?

Start by checking the ranking bar chart on the left. It shows that, for this example, "bronzer" sales are the most important, followed by brushes, concealer and blush. Next, the circle chart shows which items the sale of bronzer is most strongly associated with. In this case, it is connected to the next three top items in the list, but the strongest connection is between bronzer and brush sales, which suggests that these two should be displayed together. We can then examine the table of chi-squared values, which gives an overview of all the items and their strongest connections.

This is a square matrix similar to a correlation table: each cell describes the relationship between the factors in its row and column. Focus on the green cells and notice the difference between the two numbers posted in each cell, the observed chi-squared value and the critical chi-squared value: the higher the difference, the stronger the association between the two factors. Clicking on an individual cell takes you to the contingency table for those two factors. Looking at the Eyebrow pencil column, we see that the sale of this item is independent of all the others, so it has no strong associations.
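The comparison of observed and critical values across every pair of items can be sketched in a few lines of Python. This is only an illustration of the idea, not KeyConnect's implementation: the counts are invented (with eyebrow pencil made independent of the other two items by construction), the names are hypothetical, and the critical value 3.841 assumes one degree of freedom at the 95% confidence level.

```python
from itertools import combinations

# Chi-squared critical value for 1 degree of freedom at the 95% level.
CRITICAL_95_DF1 = 3.841

items = ["bronzer", "brushes", "eyebrow pencil"]

# Toy counts, invented for illustration.
# Key: (bronzer, brushes, eyebrow pencil) purchase flags; value: transactions.
pattern_counts = {
    (1, 1, 1): 20, (1, 1, 0): 20,
    (1, 0, 1): 5, (1, 0, 0): 5,
    (0, 1, 1): 5, (0, 1, 0): 5,
    (0, 0, 1): 20, (0, 0, 0): 20,
}


def chi_squared_for_pair(i, j):
    """Pearson chi-squared statistic for the items at positions i and j."""
    table = [[0, 0], [0, 0]]
    for pattern, n in pattern_counts.items():
        table[pattern[i]][pattern[j]] += n
    total = sum(sum(r) for r in table)
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    return stat


# Margin = observed minus critical; a positive margin flags an association.
margins = {}
for i, j in combinations(range(len(items)), 2):
    margins[(items[i], items[j])] = chi_squared_for_pair(i, j) - CRITICAL_95_DF1

# Associated pairs, largest margin first.
associated = sorted(
    (pair for pair, margin in margins.items() if margin > 0),
    key=lambda pair: -margins[pair],
)
print(associated)
```

With these toy counts only the bronzer and brushes pair clears the critical value, while both pairs involving eyebrow pencil fall below it, mirroring the independent column described above.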

In the end, this simple analysis helps managers quickly map out rules and optimize their shelf arrangements.

