Building decision trees using information theory and Shannon entropy

Posted by Bala Deshpande on Mon, Mar 21, 2011 @ 10:30 AM

There is something really hypnotic about this picture. It is hard to explain, but the patterns we see in this image evoke a strange feeling of peace and equity (or maybe it's just me!).

We make a lot of decisions rapidly by looking at and matching patterns. Patterns quickly reveal similarities and differences. In the above picture, the two patterns, although unique, are still extremely similar and help us instantly identify the two animals as belonging to the same species.

While patterns can tell a great story, the key is to convert a pattern into a number or a set of numbers that distinguish one pattern from another. Such techniques are available in information theory. Formulations such as entropy and mutual information encapsulate these ideas.
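As an illustration of how a pattern becomes a number, Shannon entropy can be computed in a few lines. This is a generic sketch (the article itself shows no code, and the "stripe"/"gap" symbols are hypothetical stand-ins for a pattern):

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# An evenly mixed pattern has maximum entropy per symbol (1 bit for two
# equally likely symbols); a perfectly homogeneous pattern has zero entropy.
mixed = shannon_entropy(["stripe", "gap", "stripe", "gap"])
uniform = shannon_entropy(["stripe", "stripe", "stripe"])
```

Two patterns that look different to the eye can then be compared as two numbers.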

A very common application of information entropy is in building decision trees. Here are some examples of decision-tree-based analytics techniques applied to sports predictions.

The main ideas behind using entropy for building a decision tree are these:

1. Using Shannon entropy, sort the variables in the dataset into homogeneous and non-homogeneous ones. Homogeneous variables have low entropy; non-homogeneous variables have high entropy.

2. Weigh the influence of each independent variable on the target or dependent variable by computing the entropy of the target within each split of that variable, weighted by the size of the split (the conditional entropy).

3. Compute the information gain, which is essentially the reduction in the entropy of the target variable due to its relationship with each independent variable. This is simply the target entropy found in step 1 minus the conditional entropy calculated in step 2.

4. The independent variable with the highest information gain becomes the "root", the first node on which the dataset is divided.

5. Repeat this process on each resulting subset of the data. When a subset's Shannon entropy reaches zero, it becomes a "leaf" node.
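The steps above can be sketched in Python. The weather-style rows and column names below are hypothetical, chosen only to make the root selection easy to verify by hand:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels (step 1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Target entropy minus conditional entropy after splitting on attr (steps 2-3)."""
    total = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return total - weighted

# Hypothetical toy dataset: "windy" perfectly predicts "play",
# while "outlook" tells us nothing.
data = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]

# Step 4: the attribute with the highest information gain becomes the root.
root = max(["outlook", "windy"], key=lambda a: information_gain(data, a, "play"))
```

Here splitting on "windy" yields subsets with zero entropy, so both branches immediately become leaf nodes (step 5).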

A more detailed description of this algorithm, using a simple example, can be found here.

A fundamental question in every business analytics situation is whether a given technique is a good fit for the business problem at hand. How can you become better informed without getting lost in technical jargon?

Zebra Image Courtesy: Chandru

Topics: entropy, business analytics, information theory, decision trees, decision tree technique