The Analytics Compass Blog is aimed at two types of readers: individuals who want to build analytics expertise, and small businesses that want to understand how analytics can help them improve their business performance.
Decision tree models can be used effectively to determine the most important attributes in a dataset. The figure below shows an example of using a decision tree (in this case built with the C4.5 algorithm) to identify the parameters that most influence whether a banking customer accepts a personal loan offer during a campaign. The dataset contains 12 customer attributes, including income, education in years, mortgage, average credit card balance, family size and geographic data. The response (or target) variable is binary: whether the customer accepts the loan offer, a "Yes" or a "No".
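As a rough illustration of how a C4.5-style tree decides which attribute matters most, the sketch below ranks attributes by gain ratio, the split criterion C4.5 uses. The records, attribute names and values are invented for illustration and are not the banking dataset described above. The sketch handles only categorical attributes and only the first (root) split, whereas the full algorithm also handles numeric thresholds, recursion and pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio obtained by splitting `rows` on a categorical `attribute`."""
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    split_info = entropy([r[attribute] for r in rows])
    return (base - remainder) / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption: not the real campaign data).
customers = [
    {"income": "high", "education": "grad", "family": "small", "accepts": "Yes"},
    {"income": "high", "education": "grad", "family": "large", "accepts": "Yes"},
    {"income": "high", "education": "undergrad", "family": "small", "accepts": "Yes"},
    {"income": "high", "education": "undergrad", "family": "large", "accepts": "Yes"},
    {"income": "low", "education": "grad", "family": "small", "accepts": "No"},
    {"income": "low", "education": "undergrad", "family": "large", "accepts": "No"},
    {"income": "low", "education": "undergrad", "family": "small", "accepts": "No"},
    {"income": "low", "education": "undergrad", "family": "large", "accepts": "No"},
]

attributes = ["income", "education", "family"]
ranking = sorted(
    attributes,
    key=lambda a: gain_ratio(customers, a, "accepts"),
    reverse=True,
)
for name in ranking:
    print(name, round(gain_ratio(customers, name, "accepts"), 3))
```

In this toy data income separates the two classes perfectly, so it ranks first. On real data the ranking is rarely this clean, and the attribute chosen at the root of the tree is only a first indication of importance.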
G. K. Chesterton once wrote that "life looks just a little more mathematical and regular than it [really] is". Clearly he had not visited the streets of Bangalore, the Silicon Plateau of India. Traffic (or life) here looks nowhere near mathematical! If there is any place on the planet that embodies the chaos that arises from uncontrolled entropy, this would be it. In fact, the city provides a textbook example of how not to design a system. Forget systems thinking. That would be asking for too much.
According to Wikipedia, principal component analysis (PCA) is a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."
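To make the definition concrete, here is a minimal sketch of PCA for just two variables, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data values and the function name `pca_2d` are illustrative assumptions. A real analysis would normally rely on a numerical library and handle any number of variables.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(u, v):
    """Sample covariance of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def pca_2d(x, y):
    """Return the scores of (x, y) on the first and second principal axes."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    mx, my = mean(x), mean(y)
    if b == 0:
        # Already uncorrelated: the original axes are the principal axes.
        return [xi - mx for xi in x], [yi - my for yi in y]
    # Largest eigenvalue of the covariance matrix [[a, b], [b, c]].
    lam1 = (a + c) / 2 + sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Its eigenvector is proportional to (b, lam1 - a); normalise to unit length.
    vx, vy = b, lam1 - a
    norm = sqrt(vx * vx + vy * vy)
    vx, vy = vx / norm, vy / norm
    # Orthogonal transformation: rotate the centred data onto the two axes.
    pc1 = [(xi - mx) * vx + (yi - my) * vy for xi, yi in zip(x, y)]
    pc2 = [-(xi - mx) * vy + (yi - my) * vx for xi, yi in zip(x, y)]
    return pc1, pc2


# Illustrative, clearly correlated toy observations (assumption).
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

pc1, pc2 = pca_2d(x, y)
print("covariance before:", covariance(x, y))
print("covariance after: ", covariance(pc1, pc2))
print("variance along PC1 and PC2:", covariance(pc1, pc1), covariance(pc2, pc2))
```

The original variables are correlated, while the two component scores have essentially zero covariance, with most of the variance concentrated in the first component.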
One of the first steps in solving a data mining or business analytics problem is eliminating variables that are not significant. There are two main reasons for taking this step. The most obvious is that going from a few hundred variables to a handful makes the results much easier to interpret. The second, and probably more critical, reason is that many modeling techniques break down as the number of parameters increases. This is known as the curse of dimensionality.
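One common way to see the curse of dimensionality is that distances between random points become nearly indistinguishable as dimensions are added, which undermines any technique that relies on "nearness". The sketch below is a small simulation under assumed settings (uniform random points, 200 samples, a fixed seed). The function name `relative_contrast` is hypothetical.

```python
import random
from math import dist


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


contrasts = {d: relative_contrast(d) for d in (2, 20, 200)}
for d, value in contrasts.items():
    print(f"{d:>3} dimensions: relative contrast = {value:.2f}")
```

As the number of dimensions grows, the nearest and farthest neighbours end up at almost the same distance, so the contrast shrinks toward zero.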
There is something really hypnotic about this picture. It is hard to explain, but the patterns we see in this image evoke a strange feeling of peace and equity (or maybe it's just me!).
Scientists and engineers - not to mention economists, statisticians and people in general - look for correlations while searching for answers. While correlations do give us important information, it is dangerous to assume that they are also indicators of causality. Correlation does not necessarily imply causation.
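A small simulation can illustrate the point. In the sketch below, a hidden variable `z` drives both `x` and `y`, so the two are strongly correlated even though neither causes the other. All variable names, noise levels and the seed are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


rng = random.Random(42)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # hidden common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only

raw = pearson(x, y)
# Remove the known contribution of z from both variables, then re-check.
adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)
print(f"correlation of x and y:           {raw:.2f}")
print(f"after removing the common cause:  {adjusted:.2f}")
```

The raw correlation is strong, yet it all but disappears once the common cause is accounted for. In practice the confounder is usually unknown, which is exactly why a correlation on its own cannot establish causality.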
This post is about measuring the information content of data, and how business analytics and risk professionals can adopt this simple and intuitive technique.
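Assuming the measure in question is Shannon entropy (the surrounding posts discuss entropy, but this teaser does not name a formula), a minimal sketch of measuring the information content of a data column could look like this. The function name and the sample columns are illustrative.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Average information content, in bits, of the values in a column."""
    total = len(values)
    return sum((n / total) * log2(total / n) for n in Counter(values).values())


constant = ["A"] * 8                 # no surprise at all
balanced = ["A", "B"] * 4            # two equally likely values
four_way = ["A", "B", "C", "D"] * 2  # four equally likely values

print(shannon_entropy(constant))   # 0.0 bits
print(shannon_entropy(balanced))   # 1.0 bit
print(shannon_entropy(four_way))   # 2.0 bits
```

A column that never changes carries no information, while a column whose values are spread evenly over more categories carries more bits per observation.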
We recently encountered a fairly experienced risk manager who was very well-versed in quantitative methods, unlike a vast majority of "check-box" risk managers. His question was, what new information does entropy give him that his current sophisticated Bayesian models and analyses don’t? Business analytics professionals may have the same question.
In his brilliant book "The Origin of Wealth", Eric Beinhocker of the McKinsey Global Institute makes a powerful and dramatic statement that attempts to explain wealth creation: "All wealth is created by thermodynamically irreversible, entropy-lowering processes". We could not agree more! If everything in this universe must abide by the laws of thermodynamics, so must the economy.