Naive Bayes is one of the most robust classification techniques and frequently trumps more sophisticated predictive analytics tools. In this article, we will show how the Naive Bayes technique can be applied using RapidMiner with a simple and classic example - the "golf" dataset.
In a previous article, we explained the core statistical basis behind the Naive Bayes classification algorithm. There, an example showed how to apply the Bayes rule for predicting if a defective part is produced by a given machine. Let us recap the facts (the figure is shown below again):
- We know that since machine A makes 70% of all parts, we can naively estimate there is a 70% chance that a defective part was made by machine A.
- With additional knowledge of the defect rates (5% for A and 10% for B), we update this naive estimate to a 54% chance that a defective part was made by machine A.
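To make the update from 70% to 54% concrete, here is a minimal Python sketch of the Bayes rule calculation for the machine example. It assumes that A and B are the only two machines, so that B makes the remaining 30% of parts; the variable names are ours, chosen only for illustration.

```python
# Bayes rule for the machine example, using the figures quoted above.
# Assumption: A and B are the only machines, so B makes the remaining 30%.
prior = {"A": 0.70, "B": 0.30}          # share of all parts made by each machine
defect_rate = {"A": 0.05, "B": 0.10}    # P(defective | machine)

# Joint probability of "made by machine m AND defective".
joint = {m: prior[m] * defect_rate[m] for m in prior}

# Total probability that a part is defective (the evidence term).
p_defective = sum(joint.values())

# Posterior: P(machine | defective).
posterior = {m: joint[m] / p_defective for m in joint}

for m in sorted(posterior):
    print(f"P(machine {m} | defective) = {posterior[m]:.3f}")
```

The posterior for machine A works out to 0.035 / 0.065, or roughly 54%, which matches the updated estimate above.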
Now let us extend this to our classic golf scenario. The objective here is to estimate the likelihood of playing golf (Yes or No) given information about the weather conditions. There are 14 records, with 9 instances where Play=Yes and 5 instances where Play=No. Without knowing any of the other data, the naive estimate is Play=Yes 64% of the time (=9/14) and Play=No 36% of the time (=5/14). However, we do have additional information that can be put to better use. Suppose we had Outlook data as follows:
Now the question "if Outlook is Overcast, what is the likelihood of play?" can be answered by applying the Bayes rule just as we did in the machine example. Let us recap the facts just like before:
- We know that since "Play=Yes" 64% of the time, we can naively estimate there is a 64% chance golf will be played!
- However, we also have the following additional knowledge about Outlook:
Outlook=overcast 44.4% of the time (=4/9, 4 out of 9 cases) when "Play=Yes", and Outlook=overcast 0% of the time (=0/5, 0 out of 5 cases) when "Play=No". Because overcast never occurs when "Play=No", we can now update the original naive estimate to a 100% chance that "Play=Yes" when "Outlook=overcast". See the graphic below.
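The same calculation can be sketched in a few lines of Python, using only the counts quoted in the text (9 Yes, 5 No, and 4 versus 0 overcast days). The names `class_counts`, `overcast_counts` and `posterior_given_overcast` are ours and purely illustrative; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

# Counts taken from the text: 14 records, 9 with Play=Yes and 5 with Play=No.
class_counts = {"Yes": 9, "No": 5}
# Number of records with Outlook=overcast within each class.
overcast_counts = {"Yes": 4, "No": 0}

total = sum(class_counts.values())


def posterior_given_overcast():
    """Return P(Play=c | Outlook=overcast) for each class c via the Bayes rule."""
    # prior P(c) times likelihood P(overcast | c) for each class
    joint = {
        c: Fraction(class_counts[c], total)
        * Fraction(overcast_counts[c], class_counts[c])
        for c in class_counts
    }
    evidence = sum(joint.values())  # P(Outlook=overcast)
    return {c: joint[c] / evidence for c in joint}


post = posterior_given_overcast()
for c, p in post.items():
    print(f"P(Play={c} | Outlook=overcast) = {float(p):.3f}")
```

Since the likelihood of overcast under "Play=No" is exactly zero, the whole posterior mass falls on "Play=Yes".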
How is this applied and interpreted using RapidMiner?
The Naive Bayes operator is one of the simplest ones available. There is only one option to set in the parameter settings: the Laplace correction. Do not select it for this example. The short video below illustrates the process.
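To see what a Laplace correction would do to the 0 out of 5 count above, here is a rough Python sketch of the textbook add-one form of the correction. This is an illustration under our own assumptions, not necessarily the exact formula RapidMiner uses internally: we assume Outlook takes three distinct values, we smooth only the likelihoods and leave the class priors untouched, and the helper `smoothed_likelihood` is a hypothetical name.

```python
def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """Add-alpha (Laplace) estimate of P(attribute value | class)."""
    return (count + alpha) / (class_total + alpha * n_values)


# Counts quoted in the text.
class_counts = {"Yes": 9, "No": 5}
overcast_counts = {"Yes": 4, "No": 0}

# Assumption: Outlook has three distinct values (e.g. sunny, overcast, rain).
N_OUTLOOK_VALUES = 3

total = sum(class_counts.values())

# Unsmoothed prior times smoothed likelihood for each class.
joint = {
    c: (class_counts[c] / total)
    * smoothed_likelihood(overcast_counts[c], class_counts[c], N_OUTLOOK_VALUES)
    for c in class_counts
}
evidence = sum(joint.values())
smoothed_posterior = {c: joint[c] / evidence for c in joint}

for c, p in smoothed_posterior.items():
    print(f"P(Play={c} | Outlook=overcast), with correction = {p:.3f}")
```

Under these assumptions the corrected estimate for "Play=Yes" drops from 100% to 6/7, or about 86%, because the zero count no longer wipes out the "Play=No" class entirely. Leaving the correction off, as we do in this example, keeps the result identical to the hand calculation.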
You can see that for all the cases in the test data set, every time outlook=overcast, the model predicts Play=Yes with 100% confidence, as we explained using the Bayes rule from first principles above. Obviously this example was highly simplified to illustrate the gradual progression from a statistical Bayes theorem implementation to a predictive analytics modeling scenario. As more attributes are added to the dataset, applying and interpreting the original Bayesian principles by hand will clearly become cumbersome. But tools like RapidMiner make it easy to apply Naive Bayes to really complex examples by following a set process.
As indicated in the earlier article, one of the two caveats which must be kept in mind while applying Naive Bayes is the rule of attribute independence. How do you test if two attributes are independent? You can apply the chi-square test of independence if you have nominal attributes, or use mutual information if you have numeric attributes. Technically, Naive Bayes classification works only with categorical predictors. However, RapidMiner still works with numeric ones! How? Read this other article on how to apply Naive Bayes for numeric attributes where this question is answered in detail.
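As a rough illustration of the chi-square test of independence mentioned above, the following Python sketch computes the Pearson chi-square statistic for a contingency table of two nominal attributes. The counts in `observed` are hypothetical and are not taken from the golf dataset; the function name `chi_square_statistic` is ours, and the sketch assumes every row and column total is non-zero and applies no continuity correction.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. Assumes all row and
    column totals are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts for two nominal attributes (NOT from the golf dataset).
observed = [
    [10, 20],
    [20, 10],
]

stat, dof = chi_square_statistic(observed)

# Standard chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_95_DOF1 = 3.841

print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
print("independence rejected at 5% level:", stat > CRITICAL_95_DOF1)
```

A statistic above the critical value suggests the two attributes are not independent, which is a warning sign for the Naive Bayes assumption. For tables with more than one degree of freedom, the critical value has to be looked up for the corresponding `dof`.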