Decision tree models can be effectively used to determine the most important attributes in a dataset. The figure below shows an example of using a decision tree (in this case the C4.5 algorithm) to identify the most important parameters that influence whether a banking customer would accept a personal loan offer during a campaign. The dataset contains 12 customer attributes, including income, education in years, mortgage, average credit card balance, family size and geographic data, among others. The response (or target) variable is the binary condition of whether the customer would accept the loan offer: a "Yes" or a "No".

G. K. Chesterton once wrote that "life looks just a little more mathematical and regular than it [really] is". Clearly he had not visited the streets of Bangalore, the silicon plateau of India. Traffic (or life) here looks nowhere near mathematical! If there is any place on the planet that embodies the chaos which arises from uncontrolled entropy, this would be it. In fact, the city provides a textbook example of how not to design a system. Forget systems thinking. That would be asking for too much.

Principal component analysis (PCA) is, according to Wikipedia, a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."

One of the first steps in data mining or business analytics problem solving is the process of eliminating variables which are not significant. There are a couple of reasons for taking this step. The most obvious reason is that going from a few hundred variables to a handful makes the interpretation of the results easier. The second and probably more critical reason is that many modeling techniques become useless as the number of parameters increases. This is known as the curse of dimensionality.
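One rough way to picture the curse of dimensionality is to count how quickly the space outgrows the data. The sketch below assumes each variable's range is split into 10 bins and that 1,000 records are available (both numbers are arbitrary choices for illustration, not taken from any dataset discussed here). It computes the largest fraction of cells that the records could possibly occupy.

```python
def occupied_fraction_upper_bound(n_samples, n_dims, bins_per_axis=10):
    """Upper bound on the fraction of grid cells that can contain at least one sample.

    The grid has bins_per_axis ** n_dims cells, and each sample falls in exactly
    one cell, so at most n_samples cells can be non-empty.
    """
    cells = bins_per_axis ** n_dims
    return min(1.0, n_samples / cells)


if __name__ == "__main__":
    # Hypothetical setting: 1,000 records, 10 bins per variable.
    for dims in (1, 2, 3, 6, 12):
        fraction = occupied_fraction_upper_bound(1000, dims)
        print(f"{dims:>2} variables: at most {fraction:.10f} of the cells are occupied")
```

With three variables the records can still cover every cell, but by six variables at least 99.9% of the cells must be empty, which is why models that rely on nearby examples struggle as variables are added.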

There is something really hypnotic about this picture. It is hard to explain, but the patterns we see in this image evoke a strange feeling of peace and equity (or maybe it's just me!)

Scientists and engineers - not to mention economists, statisticians and people in general - look for correlations while searching for answers. While correlations do give us important information, it is dangerous to assume that they are also indicators of causality. Correlation does not necessarily imply causation.

This flash video explains in a minute how entropy can work for measuring risk and uncertainty for business analytics problems. You can continue reading below or simply watch the video.

Imagine a box that can contain one of three colored balls inside - red, yellow and blue. Without opening the box, if you were to guess what colored ball is inside, you are basically dealing with uncertainty. Now what is the highest number of "yes"/"no" questions that can be asked to reduce this uncertainty?

Is it red? No.

Is it yellow? No.

Then it must be blue. That is two questions. If there were a fourth color, green, asking about one color at a time would take up to three questions, but questions that split the remaining possibilities in half (for example, "Is it red or yellow?") need only two. If you extend this reasoning, it can be shown mathematically that the maximum number of binary questions needed to remove the uncertainty is essentially log (T), rounded up to a whole number, where the log is taken to base 2 and T is the number of possible outcomes. (For example, if there is only 1 outcome, then log (1) = 0, which means there is no uncertainty!) If there are T events with equal probability of occurrence P, then T = 1/P.
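The counting argument can be checked with a short sketch. It assumes that a question may ask about a group of colors at once ("Is it one of these?") rather than a single color, so each answer discards about half of the remaining candidates. The function name `questions_needed` is made up for this illustration.

```python
import math


def questions_needed(outcomes):
    """Worst-case number of yes/no questions when each question halves the candidates."""
    candidates = list(outcomes)
    count = 0
    while len(candidates) > 1:
        # Ask "is it in the first half?"; in the worst case the larger half remains.
        half = math.ceil(len(candidates) / 2)
        candidates = candidates[:half]
        count += 1
    return count


if __name__ == "__main__":
    for colors in (["red"],
                   ["red", "yellow", "blue"],
                   ["red", "yellow", "blue", "green"]):
        t = len(colors)
        print(t, "outcomes:", questions_needed(colors), "questions, log2(T) =", math.log2(t))
```

Under this halving assumption the worst-case count equals log (T) to base 2, rounded up to the next whole number.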

Claude Shannon used this idea to define entropy as log (1/P) or -log P, where P is the probability of an event occurring. If the probabilities of the events are not all identical, we need a weighted expression, and thus the entropy H is

$$H = -\sum_{i} p_i \log_2 p_i$$
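As a minimal sketch, the weighted expression can be computed directly in Python. The function name `entropy` and the example probabilities are illustrative assumptions. Each term is written as p log (1/p), which is the same as -p log p, and events with zero probability are skipped because they contribute nothing.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: the sum of p * log2(1/p) over all events."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy([1.0]))               # a single certain outcome: no uncertainty
    print(entropy([1/3, 1/3, 1/3]))     # three equally likely colors
    print(entropy([0.5, 0.25, 0.25]))   # unequal probabilities
```

For T equally likely outcomes the result reduces to log (T) to base 2, matching the question-counting argument, and any unequal distribution over the same outcomes gives a smaller value.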