Key driver analysis: a common business analytics challenge
A typical business problem is identifying the factors that drive a performance metric against which a business division is judged. For example, one of our customers runs a department responsible for the performance of its tech support engineers, and the metric used to rate that performance is "issue closure rate". Another customer is responsible for the transportation costs of their products and is judged against the metric of "cost per unit shipped". Yet another customer is responsible for energy usage in plant operations and is measured by the monthly utility bill! For them, reusing waste heat from the plant can potentially reduce energy consumption, so they are interested in identifying activities that can help with this.
Before you can convert data into decisions ...
In all of these cases, analytics plays a major part in helping you transform data into decisions. But a key step in this process is first identifying which factors or parameters to focus on to derive the most benefit. In most of the cases highlighted above, the data is a numeric quantity such as number of calls handled, time to close a call, weekly trucking miles, average diesel fuel cost, production throughput, kilowatt hours of electricity and so on. But in some cases the data you need to handle is not a simple number. It could be nominal as well, such as a "Yes/No" response or a fixed range (such as "less than 30" or "between 0.5 and 1.0"). The most general case combines both data types.
How to use RapidMiner for this task?
The point is, no matter what your data format is, you will still need to perform a key driver analysis before you can effectively build models or dashboards. Programs such as RapidMiner let you handle nominal and numeric data types, but separately. RapidMiner's "Attribute Weighting" operators provide a slew of options for determining the key drivers, or Key Performance Indicators (KPIs), against a specified objective (called a "label" variable in technical terms). For example, the process below performs a key driver analysis on nominal data using the Chi square technique.
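To make the idea of weighting nominal attributes against a label concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: it simply computes the Pearson Chi square statistic between each attribute and the label and sorts by it. The column names, the toy rows and the helper names chi_square and rank_drivers are all made up for illustration.

```python
from collections import Counter

def chi_square(xs, ys):
    """Pearson Chi square statistic for two equal-length lists of nominal values."""
    n = len(xs)
    observed = Counter(zip(xs, ys))   # joint counts per (x, y) pair
    x_totals = Counter(xs)            # marginal counts for x
    y_totals = Counter(ys)            # marginal counts for y
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n   # count expected under independence
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat

def rank_drivers(rows, label):
    """Rank every non-label attribute by its Chi square statistic against the label."""
    label_values = [row[label] for row in rows]
    weights = {
        name: chi_square([row[name] for row in rows], label_values)
        for name in rows[0]
        if name != label
    }
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)

# Hypothetical tech support data: does priority or shift drive on-time closure?
rows = [
    {"priority": "high", "shift": "day",   "closed_on_time": "yes"},
    {"priority": "high", "shift": "night", "closed_on_time": "yes"},
    {"priority": "high", "shift": "day",   "closed_on_time": "yes"},
    {"priority": "high", "shift": "night", "closed_on_time": "yes"},
    {"priority": "low",  "shift": "day",   "closed_on_time": "no"},
    {"priority": "low",  "shift": "night", "closed_on_time": "no"},
    {"priority": "low",  "shift": "day",   "closed_on_time": "no"},
    {"priority": "low",  "shift": "night", "closed_on_time": "no"},
]

ranking = rank_drivers(rows, "closed_on_time")
for name, weight in ranking:
    print(name, weight)
```

In this toy data, priority lines up perfectly with the label and shift does not, so priority ranks first. A larger statistic only suggests a stronger association; a real analysis would also look at sample size and a significance test.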
But there are a couple of challenges here. First, RapidMiner cannot handle missing values in the Chi square application. Second, you will need two separate processes to handle mixed data. Additionally, in some instances you may not have a specific target or label variable and may simply want to rank all available variables in terms of how they impact your business or process. RapidMiner's attribute weighting will not work here, and you may need to resort to some other technique, such as Principal Component Analysis (PCA), for that.
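One common workaround for the first two challenges, shown here only as a sketch and not as what RapidMiner or KeyConnect does, is to convert every column to nominal values before running a Chi square analysis: missing entries become their own category, and numeric values are grouped into equal-width ranges. The function name to_nominal, the bin labels and the sample values are assumptions made for this example.

```python
def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def to_nominal(values, bins=3, missing_label="missing"):
    """Convert one column to nominal values.

    None becomes missing_label, numbers become equal-width bin names
    ("bin0", "bin1", ...), and anything else is kept unchanged.
    """
    numbers = [v for v in values if is_number(v)]
    low = min(numbers) if numbers else 0.0
    high = max(numbers) if numbers else 0.0
    width = (high - low) / bins or 1.0   # fall back to 1.0 when all numbers are equal
    result = []
    for v in values:
        if v is None:
            result.append(missing_label)
        elif is_number(v):
            index = min(int((v - low) / width), bins - 1)   # keep the maximum in the last bin
            result.append("bin" + str(index))
        else:
            result.append(v)
    return result

# Hypothetical columns: weekly trucking miles (numeric) and a Yes/No response (nominal)
miles = to_nominal([120.0, 480.0, None, 300.0, 510.0], bins=3)
response = to_nominal(["Yes", None, "No", "Yes", "No"])
print(miles)
print(response)
```

The trade-off is that binning discards detail within each range, and the number of bins is a choice you have to make, so the resulting ranking can shift with different settings.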
We have previously discussed some issues with PCA and mentioned how KeyConnect overcomes these challenges. KeyConnect takes this analysis one step further and also provides a visual dashboard to see how the different nominal values are connected to each other. Is there a strong connection, or are the variables truly independent (which is what the Chi square test actually measures)?
A visual and easy-to-use approach
The process of using it is very simple:
1. Load a data set (it can be all numeric, all nominal, or mixed).
2. Run the analysis to rank the variables by their overall influence and verify how they are related to one another.
For purely nominal variables, KeyConnect shows a table of Chi square values and color-codes the cells to indicate whether each pair of variables is independent or NOT independent, in addition to the bar chart and circle chart shown above.
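For readers who want to see what such a table contains, the following is a rough sketch of a pairwise Chi square table in plain Python. It is not KeyConnect's code: the 5% significance level, the small lookup of critical values, the function names and the toy columns are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Standard Chi square critical values at the 5% level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_with_df(xs, ys):
    """Return (Chi square statistic, degrees of freedom) for two nominal lists."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    stat = sum(
        (observed[(x, y)] - xc * yc / n) ** 2 / (xc * yc / n)
        for x, xc in x_totals.items()
        for y, yc in y_totals.items()
    )
    df = (len(x_totals) - 1) * (len(y_totals) - 1)
    return stat, df

def independence_table(columns):
    """Map each pair of column names to (statistic, verdict)."""
    table = {}
    for a, b in combinations(columns, 2):
        stat, df = chi_square_with_df(columns[a], columns[b])
        critical = CRITICAL_05.get(df)
        if critical is None:
            verdict = "not assessed"      # df outside the small lookup above
        elif stat > critical:
            verdict = "NOT independent"
        else:
            verdict = "independent"
        table[(a, b)] = (round(stat, 3), verdict)
    return table

# Hypothetical nominal columns
columns = {
    "priority":       ["high"] * 4 + ["low"] * 4,
    "shift":          ["day", "night"] * 4,
    "closed_on_time": ["yes"] * 4 + ["no"] * 4,
}

table = independence_table(columns)
for pair, (stat, verdict) in table.items():
    print(pair, stat, verdict)
```

Each entry corresponds to one cell of the table, and the verdict plays the role of the color code. With only eight rows the expected counts are too small for the test to be trusted, so treat the output as a demonstration of the mechanics only.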
Sign up to beta test KeyConnect and win unlimited access.