5 simple steps to apply the chi-square test for business analytics
In a previous article, we talked about the background of the chi-square test of independence and how it can address common business analytics problems. In this article, we demonstrate its actual use with an example. Remember that the chi-square test of independence helps determine whether two business parameters, X and Y, are related or independent of each other.
A specialty retail chain wants to determine whether its strategy of changing the product mix has resulted in increased revenues. Its products are categorized into eight types according to price range, from $30 per item to $120+ per item. To increase sales, management decided to cut the chain's higher-priced inventory (the $120+ range) by 50%.
Based on the data shown below, has the strategy worked?
The 5-step solution process:
Step 1: Identify the X's and Y's
This is the most important step, because the steps that follow are simply an algorithm that any tool can run. By convention, the X's are the parameters that can be changed or controlled. In this case, the X is the strategy, and its data are the columns, which represent all sales before and after the strategy change. The Y is then sales by category, whose data are the rows, one for each price category.
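To make the X/Y orientation concrete, here is a minimal Python sketch that arranges counts with one row per price category (the Y) and one column per strategy period (the X). The labels "before" and "after", the function name build_table, and the numbers are hypothetical, not the article's data. Note that the chi-square test assumes the cells hold counts (for example, units sold) rather than dollar amounts.

```python
from typing import Dict, List

# Hypothetical column order for the X (strategy period); the article does not
# name its columns, so "before" and "after" are illustrative labels.
PERIODS = ["before", "after"]


def build_table(sales: Dict[str, Dict[str, int]]) -> List[List[int]]:
    """Arrange counts as rows = price categories (Y), columns = periods (X)."""
    return [[sales[category][period] for period in PERIODS] for category in sales]


# Made-up counts for two categories, for illustration only.
example = {
    "$30": {"before": 120, "after": 150},
    "$120+": {"before": 80, "after": 40},
}
print(build_table(example))  # [[120, 150], [80, 40]]
```

With the real data, the table would have eight rows (one per price category) and two columns.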
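Step 2: Calculate the marginal totals.
Sum each row and each column of the data table and enter these totals in the "margins". The sum of all row totals (which equals the sum of all column totals) is the grand total.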
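The margin calculation can be sketched in a few lines of Python. The function name margins and the toy 2 x 2 table are illustrative assumptions; the table is a list of rows, each row a list of counts.

```python
from typing import List, Tuple


def margins(table: List[List[float]]) -> Tuple[List[float], List[float], float]:
    """Return (row totals, column totals, grand total) for a rows-by-columns table."""
    row_totals = [sum(row) for row in table]          # one total per category (row)
    col_totals = [sum(col) for col in zip(*table)]    # one total per period (column)
    grand_total = sum(row_totals)                     # corner cell of the margins
    return row_totals, col_totals, grand_total


print(margins([[10, 20], [30, 40]]))  # ([30, 70], [40, 60], 100)
```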
Step 3: Complete the contingency table.
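The contingency table of expected counts has the same dimensions as the data table. Each entry is the count we would expect if X and Y were independent: (row total * column total) / grand total, as shown in the diagram.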
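As a sketch of Step 3, the Python function below (expected_counts is our own illustrative name) builds the table of expected counts from the margins using E[i][j] = row total i * column total j / grand total. The 2 x 2 input is a toy example, not the article's data.

```python
from typing import List


def expected_counts(table: List[List[float]]) -> List[List[float]]:
    """Expected count for each cell if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E[i][j] = row total i * column total j / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


print(expected_counts([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table has the same row and column totals as the observed one; only the distribution inside the table changes.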
Step 4: Calculate the observed chi-square value from the data table and the contingency table of expected counts.
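The observed chi-square value is the Pearson statistic: the sum over all cells of (observed - expected)^2 / expected. A minimal Python sketch follows; chi_square_statistic is an illustrative name, and the toy table is not the article's data. A common rule of thumb is that the expected counts should not be too small (often at least 5 per cell) for the table-based comparison in Step 5 to be reliable.

```python
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count for this cell under independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


print(round(chi_square_statistic([[10, 20], [30, 40]]), 4))  # 0.7937
```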
Step 5: Use standard tables to compare the observed chi-square value with the critical chi-square value for the problem's degrees of freedom and chosen significance level, alpha (alpha = 1 - confidence level).
The degrees of freedom are simply (number of rows - 1) * (number of columns - 1) in the original data table.
df = (8-1)*(2-1) = 7
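The degrees-of-freedom formula can be read straight off the table's shape. In this Python sketch, degrees_of_freedom is an illustrative name, and the placeholder table only mimics the article's 8 x 2 shape (the cell values do not matter for df).

```python
from typing import List


def degrees_of_freedom(table: List[List[float]]) -> int:
    """(rows - 1) * (columns - 1) for a rectangular rows-by-columns table."""
    n_rows, n_cols = len(table), len(table[0])
    if any(len(row) != n_cols for row in table):
        raise ValueError("every row must have the same number of columns")
    return (n_rows - 1) * (n_cols - 1)


# Eight price categories x two periods, as in the article (placeholder zeros).
print(degrees_of_freedom([[0, 0]] * 8))  # 7
```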
Let us use a 90% confidence level, which means alpha = 0.1. From a standard chi-square table, the critical value for df = 7 and alpha = 0.1 is 12.017.
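The article reads the critical value from a printed table. As a standard-library-only sketch (so it runs without SciPy), the code below computes the chi-square upper-tail probability for an integer df using the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), starting from df = 1 or df = 2, and then bisects for the value whose upper-tail probability equals alpha. The names chi2_sf and chi2_critical are our own; if SciPy is installed, scipy.stats.chi2.ppf(1 - alpha, df) should return the same number.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 uses erfc, df = 2 is a plain exponential.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Step up two degrees of freedom at a time (computed in log space for stability):
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_critical(df: int, alpha: float) -> float:
    """Value c with P(X > c) = alpha, found by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy: c lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(7, 0.1), 3))  # 12.017
```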
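- If observed chi-square < critical chi-square, we cannot reject independence: the data show no significant evidence that the variables are related.
- If observed chi-square > critical chi-square, we reject independence at the chosen alpha: the variables are not independent (here, the sales mix across price categories differs before and after the strategy change).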
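Putting Steps 2 to 5 together, here is a self-contained Python sketch of the decision rule above. The function name chi_square_test is illustrative, the critical value is passed in (for example, looked up in a standard table), and the two toy 2 x 2 tables are made up; 2.706 is the tabled critical value for df = 1 and alpha = 0.1. For the article's 8 x 2 table at alpha = 0.1, the critical value to pass would be 12.017.

```python
from typing import List, Tuple


def chi_square_test(table: List[List[float]], critical_value: float) -> Tuple[float, int, bool]:
    """Return (observed chi-square, degrees of freedom, related?) for a count table.

    critical_value is looked up separately, e.g. from a standard chi-square table
    for this table's degrees of freedom and the chosen alpha.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    observed_chi2 = sum(
        (table[i][j] - r * c / grand_total) ** 2 / (r * c / grand_total)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Reject independence only when the observed value exceeds the critical value.
    return observed_chi2, df, observed_chi2 > critical_value


# Toy 2 x 2 tables (df = 1), not the article's data.
print(chi_square_test([[10, 20], [30, 40]], 2.706)[2])  # False
print(chi_square_test([[50, 0], [0, 50]], 2.706)[2])    # True
```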