Model building is a cornerstone of the analytics process. However, generating business value is the real reason we use models to analyze data. Using models effectively requires that we don't simply throw the kitchen sink of data at the modeling algorithm and sit back to enjoy the return on investment.
There are three details to pay attention to in order to ensure that model building gives you full ROI. The first is to make sure that your data is pertinent. Even if you don't have the highest "quality" data, at least make sure that you have the correct data to answer the business questions. Having the wrong data can quickly derail your key performance indicator analysis, particularly if you want to address more sophisticated decision-making issues.
The next detail is to try several different types of models and see which one best supports decision making. If your business users need to understand how a particular combination of inputs affects overall product costs, then an easy-to-understand linear regression model, which combines the inputs to predict the output costs, is the way to go. Do not waste the analytics consumers' time with sophisticated neural network models, which can essentially be black boxes.
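To make the point about interpretability concrete, here is a minimal sketch of a linear cost model in plain Python. The input names (labor hours, material weight) and all of the numbers are made up for illustration, and the small least-squares solver is only a stand-in for whatever regression tool you actually use. What matters is the result: one coefficient per input, which a business user can read as "cost added per unit of this input".

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, coef_2, ...]. Assumes the inputs are not
    perfectly collinear, otherwise the system has no unique solution.
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical data: (labor_hours, material_kg) -> product cost.
inputs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
costs = [130, 145, 185, 195, 240]

intercept, per_labor_hour, per_material_kg = fit_linear(inputs, costs)
print(f"fixed cost: {intercept:.2f}")
print(f"cost per labor hour: {per_labor_hour:.2f}")
print(f"cost per kg of material: {per_material_kg:.2f}")
```

Each printed number answers a question a business user might actually ask, which is exactly what a black-box model cannot do as directly.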
The last detail is to make sure that your model is a "living, breathing entity". Using a model built on 5-year-old data is not only pointless, it is dangerous. The business dynamics may have shifted substantially in the meantime, rendering the earlier model useless. In a recent discussion with a practitioner in the area of debt collection, we learned that the landscape of bad debts has changed significantly since the meltdown of 2008. Prior to that year, lending was extremely lax and loans were made to anyone with a pulse. But today, even a creditworthy mortgage refinance can take several months. If you are building models to predict which debts are recoverable based on information about the borrowers, make sure that the information reflects current trends and not 5-year-old history.
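One simple way to act on this is to drop stale records before every model rebuild. The sketch below assumes each record carries a date under a hypothetical `observed_on` key, and the one-year window is an arbitrary choice for illustration. The right window depends on how fast your business changes.

```python
from datetime import date, timedelta


def recent_records(records, as_of, max_age_days=365):
    """Keep only records observed within max_age_days before as_of."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r for r in records if r["observed_on"] >= cutoff]


# Hypothetical borrower records; field names are illustrative only.
records = [
    {"borrower": "A", "observed_on": date(2008, 3, 1)},
    {"borrower": "B", "observed_on": date(2012, 9, 15)},
    {"borrower": "C", "observed_on": date(2013, 1, 10)},
]

training_set = recent_records(records, as_of=date(2013, 6, 1))
print([r["borrower"] for r in training_set])
```

Running the filter at every rebuild keeps pre-2008 lending behavior, for example, from quietly shaping a model that is meant to describe today's borrowers.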
In this article, we discuss a way to make models less complicated and more business-user friendly by removing unimportant factors from the data. By properly identifying the key performance indicators of a model, we can greatly simplify its interpretation and make it easily consumable.
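To give a feel for what removing unimportant factors can look like, here is a minimal sketch that ranks factors by information gain and keeps the top few. This is an assumption made for illustration, not a description of how any particular KPI tool works. The factor names and the eight records are invented, and real credit data would have many more rows and factors.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, factor):
    """Reduction in label entropy after splitting the rows on one factor."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[factor], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_factors(rows, labels, k):
    """Return the k factors with the highest information gain."""
    ranked = sorted(
        rows[0].keys(),
        key=lambda f: information_gain(rows, labels, f),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical credit records; factor names and values are illustrative.
rows = [
    {"balance": "high", "employment": "long", "phone": "yes"},
    {"balance": "high", "employment": "long", "phone": "no"},
    {"balance": "high", "employment": "long", "phone": "yes"},
    {"balance": "high", "employment": "short", "phone": "no"},
    {"balance": "low", "employment": "short", "phone": "yes"},
    {"balance": "low", "employment": "short", "phone": "no"},
    {"balance": "low", "employment": "short", "phone": "yes"},
    {"balance": "low", "employment": "long", "phone": "no"},
]
labels = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]

selected = top_factors(rows, labels, k=2)
print(selected)
```

In this toy data, the `phone` factor splits good and bad credit evenly, so it carries no information and falls out of the selection. A model built on the remaining factors has less to explain to its consumers.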
The classic example of using decision trees to identify good and bad credit can be simplified by first running a KPI detection. The data set originally consists of the 14 factors shown below.
When you build a decision tree using this data, it is important to make sure that you and your business users understand the branching and splitting. For example, if the root node is Balance of Current Account, it must make sense to you and your business users that this factor is indeed a critical enough variable to make it to the top of the tree. Additionally, if the tree has too many splits, not only does it become harder to interpret, but technically it may have "overfit" your data. Here is the tree before KPI extraction.
When we run a key performance indicator extraction on the same data before we build a decision tree (using a tool such as KeyConnect), we see that the initial 14 factors are reduced to the 6 most critical ones. The graphic below shows the results of this:
This is a reduction of roughly 57% in the number of factors (from 14 to 6), and it also reduces the depth of the decision tree built on only these 6 factors. The overall performance of the model is still around 70% (which clearly can be improved!) even with the reduced set of factors. Finally, the tree itself has fewer branches, and its depth has dropped from 6 to 4. This optimized tree is shown below.
Models are essential to the way people think and do business. But models can quickly become complex and hit a point of diminishing returns if the modeler is not careful. Models themselves do not cause problems, but their inappropriate use can certainly result in major ones! Does that mean models are useless and "outdated"? By no means. Consider an analogy: gasoline can cause fires and explosions, but no one suggests we abandon it immediately (the global warming issue notwithstanding)! On the contrary, the controlled and engineered use of gasoline powers the vast majority of surface transportation today and makes life as we know it possible. Good model building practice does the same thing for business. Model building encompasses not just "simulation", but also pattern identification, measuring risk, quantifying the relevance of parameters (or KPI ranking), and in general converting data into usable knowledge in order to power decision making.