How to automate key driver analysis using a chi squared calculator
There are many business reasons to run a key driver analysis. To develop a budget forecast, for example, one needs to know which cost factors are the most influential. A government agency may want to understand which policy changes have the most effect on reaching its economic growth targets. A manufacturing company may want to know which operational factors influence production throughput the most. In each of these scenarios, a simple and robust key driver analysis is a good first step in analytics.
The data to be analyzed may be mostly nominal (categorical), mostly numeric, or a mixture of the two. For numeric data, several well-known techniques can do the job: correlation analysis, principal component analysis, factor analysis, linear discriminant analysis and so on. We have written about using mutual information as another viable technique. For categorical data, chi squared based techniques are the most prevalent.
In this article we will describe one way to extend the chi squared calculator to come up with an automated means for identifying key drivers of a categorical data set.
Step 1: Run a pairwise chi squared test of independence
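This requires building a contingency table for each pair of variables, computing the observed chi squared value, and comparing it to the critical value for the chosen significance level and the table's degrees of freedom. If the observed value exceeds the critical value, the two variables are treated as not independent; otherwise they are treated as independent.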
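As a rough illustration of this step, the sketch below builds the contingency table for one pair of categorical columns and computes the observed chi squared value and degrees of freedom using only the Python standard library. The function name `chi2_statistic` and the `region` and `churn` data are made up for the example; the critical value 3.841 is the standard table value for a 5% significance level and 1 degree of freedom.

```python
from collections import Counter

def chi2_statistic(xs, ys):
    """Return (observed chi squared value, degrees of freedom) for two categorical columns."""
    n = len(xs)
    table = Counter(zip(xs, ys))   # contingency table: (x, y) -> observed count
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n   # count expected under independence
            stat += (table[(x, y)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical data: 80 customers, two categorical variables.
region = ["north"] * 40 + ["south"] * 40
churn = ["yes"] * 30 + ["no"] * 10 + ["yes"] * 10 + ["no"] * 30

CRITICAL_05_DF1 = 3.841  # table value: alpha = 0.05, 1 degree of freedom
stat, dof = chi2_statistic(region, churn)
dependent = stat > CRITICAL_05_DF1   # True means "not independent"
```

For this invented data every expected cell count is 20, so the observed value is 20.0 with 1 degree of freedom, well above the critical value. In practice a statistics library would normally handle this calculation; the hand-rolled version is only meant to show what the test computes.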
Step 2: Use the difference between the observed and critical chi squared values as an indication of the strength of the relationship between the two variables.
A reasonable assumption to make would be: the more the observed chi squared value exceeds the critical chi squared value, the stronger the relationship or connection.
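One possible way to turn this into a number is sketched below. It computes the critical value by bisection on the chi squared upper-tail probability, then reports the excess of the observed value over it. The helper names (`chi2_sf`, `chi2_critical`, `connection_strength`) are hypothetical, and returning 0.0 when the observed value does not exceed the critical value is an assumption made here so that independent pairs contribute nothing.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) for a chi squared variable; dof is a positive integer."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting points, then the recurrence Q(a + 1, h) = Q(a, h) + h**a * e**-h / Gamma(a + 1).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < dof / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q

def chi2_critical(dof, alpha=0.05):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > alpha:   # grow the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def connection_strength(observed, dof, alpha=0.05):
    """Observed minus critical value; 0.0 for pairs judged independent (an assumption)."""
    return max(0.0, observed - chi2_critical(dof, alpha))

strong = connection_strength(20.0, dof=1)   # observed value far above the critical value
none = connection_strength(2.0, dof=1)      # observed value below the critical value
```

With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, dof)` would give the same critical value without the hand-written bisection.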
Step 3: Build a matrix of dependencies
For each variable, list all the non-independent (or related) variables and the corresponding strengths (the difference between the observed chi squared value and the critical chi squared value).
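A minimal sketch of such a matrix, stored as a dictionary of dictionaries, might look like the following. The variable names and the observed values in `pairwise` are invented for illustration, and `build_dependency_matrix` is a hypothetical helper; the critical values are the standard 5% table values for 1 and 2 degrees of freedom.

```python
# Hypothetical pairwise test results: (variable_a, variable_b) -> (observed, critical)
pairwise = {
    ("region", "churn"): (20.0, 3.841),
    ("region", "plan"): (2.1, 5.991),
    ("plan", "churn"): (9.5, 5.991),
}

def build_dependency_matrix(pairwise):
    """Map each variable to {related variable: strength} for non-independent pairs only."""
    matrix = {}
    for (a, b), (observed, critical) in pairwise.items():
        matrix.setdefault(a, {})   # keep every variable, even with no relations
        matrix.setdefault(b, {})
        if observed > critical:    # the pair is judged not independent
            strength = observed - critical
            matrix[a][b] = strength   # the relationship is symmetric,
            matrix[b][a] = strength   # so record it in both directions
    return matrix

matrix = build_dependency_matrix(pairwise)
```

Here `region` and `plan` are not linked, because their observed value falls below the critical value, while `churn` ends up connected to both of the other variables.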
Step 4: For each variable, sum these "connection" strengths across all the variables it is related to.
Step 5: Rank the variables by their summed connection strengths.
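Steps 4 and 5 can be sketched in a few lines, assuming the dependency matrix is held as a dictionary of dictionaries. The matrix contents and the `rank_key_drivers` name are hypothetical and only serve to show the summing and ranking.

```python
# Hypothetical dependency matrix: variable -> {related variable: connection strength}
matrix = {
    "region": {"churn": 16.159},
    "churn": {"region": 16.159, "plan": 3.509},
    "plan": {"churn": 3.509},
    "gender": {},   # related to nothing, so its total is zero
}

def rank_key_drivers(matrix):
    """Sum each variable's connection strengths and sort from strongest to weakest."""
    totals = {var: sum(links.values()) for var, links in matrix.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = rank_key_drivers(matrix)
for var, total in ranking:
    print(f"{var}: {total:.3f}")
```

In this invented example `churn` ranks first because it is connected to two other variables, which makes it the leading candidate for a key driver under this scheme.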
The KeyConnect chi squared calculator does all of this automatically. Sign up here to beta test it.