Decision tree models can be used effectively to determine the most important attributes in a dataset. The figure below shows an example of using a decision tree (in this case built with the C4.5 algorithm) to identify the most important parameters influencing whether a banking customer would accept a personal loan offer during a campaign. The dataset contains 12 customer attributes, including income, education (in years), mortgage, average credit card balance, family size and geographic data. The response (or target) variable is binary: whether the customer accepts the loan offer ("Yes" or "No").
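C4.5 picks its splitting attributes using the gain ratio, an entropy-based score. The sketch below shows that scoring step for categorical attributes only. It uses a few hypothetical toy rows with made-up attribute names (`income`, `family`, `accept`), not the bank dataset from the figure. It also leaves out C4.5's handling of numeric thresholds, missing values and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """C4.5-style gain ratio of a categorical attribute with respect to the target."""
    n = len(rows)
    labels = [r[target] for r in rows]
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many values.
    split_info = sum(len(g) / n * math.log2(n / len(g)) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy rows (NOT the bank dataset shown in the figure).
rows = [
    {"income": "high", "family": "small", "accept": "Yes"},
    {"income": "high", "family": "large", "accept": "Yes"},
    {"income": "low",  "family": "small", "accept": "No"},
    {"income": "low",  "family": "large", "accept": "No"},
    {"income": "high", "family": "small", "accept": "Yes"},
    {"income": "low",  "family": "small", "accept": "No"},
]

# Rank attributes from most to least informative about the target.
ranking = sorted(["income", "family"],
                 key=lambda a: gain_ratio(rows, a, "accept"), reverse=True)
print(ranking)
```

In this toy data `income` separates "Yes" from "No" perfectly, while `family` tells us nothing about the target. The tree would therefore split on `income` first, which is the sense in which a tree "identifies" important attributes.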

## A simple explanation of how entropy fuels a decision tree model

Posted by Bala Deshpande on Wed, Jan 11, 2012 @ 09:10 AM

Tags: data mining with rapidminer, decision tree technique, entropy, classification tree

## The perils of complete absence of systems thinking in city planning

Posted by Bala Deshpande on Thu, Jun 02, 2011 @ 08:07 AM

G. K. Chesterton once wrote that "life looks just a little more mathematical and regular than it [really] is". Clearly he had not visited the streets of India's Silicon Plateau, Bangalore. Traffic (or life) here looks nowhere near mathematical! If there is any place on the planet that embodies the chaos arising from uncontrolled **entropy**, this would be it. In fact, the city provides a textbook example of how not to design a system. Forget systems thinking. That would be asking for too much.

## When Principal Component Analysis makes sense in business analytics

Posted by Bala Deshpande on Fri, May 06, 2011 @ 11:01 AM

According to Wikipedia, principal component analysis (PCA) is a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called **principal components**."
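To make the definition concrete, here is a minimal NumPy sketch of one common way to compute PCA: an eigendecomposition of the covariance matrix. SVD of the centered data is an equally common alternative. The two correlated input variables are synthetic and purely illustrative, and the helper name `pca` is hypothetical. The point to check is that the projected scores (the principal components) are uncorrelated even though the inputs are not.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, hypothetical data: two strongly correlated variables.
x = rng.normal(size=500)
data = np.column_stack([x, 0.8 * x + 0.2 * rng.normal(size=500)])

def pca(X):
    """Return (eigenvalues, components, scores), ordered by explained variance."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix -> orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # the orthogonal transformation of the observations
    return eigvals, eigvecs, scores

eigvals, components, scores = pca(data)
print(np.round(np.corrcoef(data, rowvar=False)[0, 1], 3))    # correlation of the inputs
print(np.round(np.corrcoef(scores, rowvar=False)[0, 1], 3))  # correlation of the components
print(np.round(eigvals / eigvals.sum(), 3))                  # share of variance per component
```

When the first component carries almost all of the variance, as it does here, a business analyst can often work with that single component in place of the two original variables.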

Tags: advanced business analytics, correlations, data mining, entropy, principal component analysis

## Variable reduction with chi-square and entropy based methods

Posted by Bala Deshpande on Wed, Apr 20, 2011 @ 11:10 AM

One of the first steps in solving a data mining or business analytics problem is eliminating variables that are not significant. There are two main reasons for this step. The most obvious is that going from a few hundred variables to a handful makes the results much easier to interpret. The second, and probably more critical, reason is that many modeling techniques break down as the number of parameters grows, because the available data becomes increasingly sparse relative to the space it has to cover. This is known as the **curse of dimensionality**.
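As an illustration of the two families of methods named in the title, the sketch below scores each categorical candidate variable against the target in two ways. The first is mutual information, an entropy-based measure in bits. The second is the Pearson chi-square statistic of the contingency table. It then keeps only the variables whose mutual information exceeds a threshold. The data, the variable names (`income_band`, `region`) and the threshold of 0.1 are all hypothetical, chosen only to show the mechanics.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of categorical values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(Y) - H(Y|X) in bits, for two equal-length categorical sequences."""
    n = len(y)
    cond = 0.0
    for v, count in Counter(x).items():
        ys = [yi for xi, yi in zip(x, y) if xi == v]
        cond += count / n * entropy(ys)
    return entropy(y) - cond

def chi_square(x, y):
    """Pearson chi-square statistic of the x-by-y contingency table."""
    n = len(y)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: 'region' is independent of the target, 'income_band' is not.
target = ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"]
candidates = {
    "income_band": ["hi", "hi", "lo", "lo", "hi", "lo", "hi", "lo"],
    "region":      ["N", "S", "N", "S", "N", "S", "S", "N"],
}

threshold = 0.1  # hypothetical cut-off, not a recommended value
for name, column in candidates.items():
    print(name, round(mutual_information(column, target), 3),
          round(chi_square(column, target), 3))
keep = [name for name, column in candidates.items()
        if mutual_information(column, target) > threshold]
print(keep)
```

Both scores agree here: `region` gets zero on each measure and is dropped. On real data the two rankings can differ, and a chi-square cut-off would normally be set from a significance level rather than a fixed number.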

Tags: chi square test, data mining tools, entropy, mutual information, information theory

## Building decision trees using information theory and shannon entropy

Posted by Bala Deshpande on Mon, Mar 21, 2011 @ 10:30 AM

There is something really hypnotic about this picture. It is hard to explain, but the patterns in this image evoke a strange feeling of peace and equity (or maybe it's just me!).

Tags: decision tree technique, decision trees, business analytics, entropy, information theory

## 3 examples to show why correlations can fail

Posted by Bala Deshpande on Wed, Mar 09, 2011 @ 07:54 AM

Scientists and engineers (not to mention economists, statisticians and people in general) **look for correlations while searching for answers**. While correlations do give us important information, it is dangerous to assume that they are also indicators of causality. **Correlation does not necessarily imply causation.**
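One well-known way correlation can fail is a nonlinear relationship. Correlation can also miss a relationship that does exist, not only suggest one that doesn't. In the minimal sketch below, `y` is completely determined by `x` (y = x²), yet the Pearson correlation over a symmetric range of `x` comes out as zero. Mutual information, estimated here from the empirical frequencies of the toy values, does pick up the dependence. The data and helper names are illustrative assumptions, and this is not presented as one of the post's three examples.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_information(x, y):
    """Mutual information in bits, from the empirical joint frequencies of (x, y)."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # y depends on x completely, but not linearly

print(round(pearson(x, y), 6))             # no linear correlation at all
print(round(mutual_information(x, y), 3))  # yet clearly positive: x tells us y
```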

Tags: business analytics, correlations, entropy, mutual information

## Risk management in 60 seconds: Insights from Entropy

Posted by Bala Deshpande on Thu, Mar 03, 2011 @ 10:49 AM

This **flash video** explains in a minute how entropy can be used to measure risk and uncertainty in business analytics problems. You can continue reading below or simply watch the video.

Imagine a box that contains one ball, which can be red, yellow or blue. If you had to guess the color of the ball without opening the box, you would be dealing with uncertainty. What is the largest number of "yes"/"no" questions you would have to ask to remove this uncertainty?

Is it red? No.

Is it yellow? No.

Then it must be blue. That is *two* questions. If there were a fourth color, green, asking about one color at a time could take up to *three* questions. A better question, one that splits the remaining colors in half ("Is it red or yellow?"), needs only two. Extending this reasoning, it can be shown mathematically that when each question halves the remaining possibilities, the number of binary questions needed to remove the uncertainty is essentially **log (T)** (rounded up to a whole number), where the log is taken to base 2 and T is the number of possible outcomes. For example, if there is only one outcome, log (1) = 0, which means there is no uncertainty. If the T outcomes are all equally likely, each has probability P = 1/T, so T = 1/P.
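The question-counting argument can be checked directly. The small sketch below compares two strategies. The first asks about one color at a time, which needs T - 1 questions in the worst case. The second asks questions that split the remaining outcomes in half, which needs about log2(T) questions, rounded up. Both function names are hypothetical and exist only for this illustration.

```python
import math

def questions_one_at_a_time(t):
    """Worst case when each question names a single outcome ('Is it red?')."""
    return t - 1

def questions_halving(t):
    """Worst case when each question splits the remaining outcomes in half."""
    count = 0
    remaining = t
    while remaining > 1:
        # Whatever the answer, at most half of the outcomes (rounded up) remain.
        remaining = math.ceil(remaining / 2)
        count += 1
    return count

for t in [1, 2, 3, 4, 8, 16]:
    print(t, questions_one_at_a_time(t), questions_halving(t), round(math.log2(t), 3))
```

The gap between the two columns grows quickly with T, which is why the halving strategy, and not one-at-a-time guessing, is the one that leads to the logarithm.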

Claude Shannon used this idea to define the information carried by an event as *log (1/P)*, or **-log P**, where P is the probability of the event occurring. If the outcomes are not all equally likely, we need a probability-weighted average of this quantity over all outcomes, which gives the entropy, H:

$$H = -\sum_{i} p_i \log_2 p_i$$
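The formula translates almost line for line into code. This minimal sketch writes each term as p·log2(1/p), which is the same quantity as -p·log2(p) and matches the log (1/P) form above. Zero-probability outcomes contribute nothing. The probability lists reuse the colored-ball example, and the skewed distribution is an invented illustration.

```python
import math

def shannon_entropy(probs):
    """H = sum(p_i * log2(1 / p_i)) in bits; zero-probability outcomes contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(shannon_entropy([1/3, 1/3, 1/3]))  # three equally likely colors: log2(3) bits
print(shannon_entropy([0.25] * 4))       # four equally likely colors: 2 bits
print(shannon_entropy([0.8, 0.1, 0.1]))  # red is much more likely: less uncertainty
print(shannon_entropy([1.0]))            # a single certain outcome: no uncertainty
```

Equal probabilities give the largest entropy for a given number of outcomes. As soon as one outcome becomes more likely, there is less uncertainty left to resolve.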

Tags: advanced business analytics, entropy, uncertainty, information theory

## A simple way to measure Information in Risk and Business Analytics

Posted by Bala Deshpande on Tue, Feb 22, 2011 @ 12:29 PM

Tags: data mining tools, risk management, entropy, risk

## Using Entropy for business analytics and risk management

Posted by Bala Deshpande on Mon, Oct 25, 2010 @ 06:30 PM

We recently encountered a fairly experienced risk manager who, unlike the vast majority of "check-box" risk managers, was very well-versed in quantitative methods. His question was: what new information does *entropy* give him that his sophisticated Bayesian models and analyses do not? Business analytics professionals may have the same question.

Tags: risk management, entropy, risk

## Bubbles, Busts and the Economy versus Entropy

Posted by Bala Deshpande on Tue, Sep 21, 2010 @ 03:06 PM

In his brilliant book "The Origin of Wealth", Eric Beinhocker of the McKinsey Global Institute makes a powerful and dramatic statement that attempts to explain wealth creation: "All wealth is created by thermodynamically irreversible, entropy-lowering processes". We could not agree more! If everything in the universe must abide by the laws of thermodynamics, so must the economy.

Tags: data mining tools, business intelligence tools, economic crisis, economic recessions, entropy