The process of developing classification models for business use cases follows a sequence that is represented in this diagram – also known as the CRISP-DM process.
The data preparation portion may be considered “pre-processing”, while the evaluation portion may be considered “post-processing”. Before deployment, however, we need to ensure model validity. This article, part 1 of a two-part series, describes common techniques used to verify that a classification model is fit for deployment. Techniques for evaluating regression models will be described in a separate set of articles.
1. Confusion Matrix
Before training the model, the available data is typically split into two components, a training set and a validation set, in a 70-30 or 80-20 ratio. A confusion matrix is a table that compares the model’s predicted classes with the actual classes from the labeled data in the validation set.
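A 70-30 split like the one described above can be sketched in a few lines of Python. The function name and the use of a fixed seed are illustrative choices, not part of the original article:

```python
import random

def train_validation_split(data, train_frac=0.7, seed=42):
    """Shuffle the data and split it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))      # stand-in for 100 labeled examples
train, valid = train_validation_split(records, train_frac=0.7)
print(len(train), len(valid))   # 70 30
```

In practice a library routine (for example, scikit-learn’s `train_test_split`) would be used instead, but the idea is the same: shuffle once, then cut at the chosen ratio.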
The table above could be an example of a credit scoring exercise using a decision tree model. The entries a, b, c, and d are interpreted as follows:
a – number of cases for which the model predicted credit score was good and the actual credit rating (based on labeled validation data set) was also good. This is a measure of True Positives.
b – number of cases for which the model predicted “good” but the actual rating was “bad”. This is a measure of False Positives.
c – number of cases for which the model predicted “bad” but the actual rating was “good”. This is a measure of False Negatives.
d – number of cases for which the model predicted “bad” and the actual rating was also “bad”. This is a measure of True Negatives.
The main evaluation criteria for classification models are given in the bottom row of the confusion matrix. Sensitivity is the true positive rate. Specificity is the true negative rate (one minus the false positive rate). Accuracy is an indicator of the overall effectiveness of the model across the entire data set.
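These three metrics follow directly from the counts a, b, c, and d. A minimal sketch (the function name and the example counts are illustrative):

```python
def confusion_metrics(a, b, c, d):
    """Compute evaluation metrics from confusion-matrix counts:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)            # true positive rate
    specificity = d / (b + d)            # true negative rate
    accuracy = (a + d) / (a + b + c + d) # fraction of all cases classified correctly
    return sensitivity, specificity, accuracy

# Example: 100 credit-scoring cases from a validation set
sens, spec, acc = confusion_metrics(a=40, b=10, c=5, d=45)
print(round(sens, 3), round(spec, 3), acc)  # 0.889 0.818 0.85
```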
2. Gain and Lift charts
Often, a single measure of overall effectiveness is not enough for a classification model. It may be important to know whether the model does increasingly better with more data: is there any marginal improvement in the model’s predictive ability if, for example, we consider 70% of the data versus only 50%?
Gain (and Lift) charts were developed to answer this question. The focus is on the true positives, so it can be argued that they indicate the sensitivity of the model. These charts are common in direct-marketing analytics, where the problem is to identify whether a particular prospect is worth calling.
Basis for building Gains charts: Randomly selecting x% of data (prospects) would yield x% of targets (to call or not). Gain is the improvement over this random selection that a predictive model can potentially yield.
- Generate scores for all the data points (prospects) in the validation set using the trained model
- Rank the prospects by decreasing score
- Count the labels (whether a particular prospect is a good target or not) in the first 10% (decile) of the validation data set, and then first 20% (add the next decile) of the validation set and so on
- Gain at a given decile level is the ratio of cumulative number of targets up to that decile to the total number of targets in the entire data set
- Lift is the ratio of gain % to the random expectation % at a given decile level. Remember that random expectation at the x-th decile is x%.
- Gain is shown as a percent value
- Lift is shown as a simple ratio >=1
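The steps above can be sketched in plain Python. The function name and the toy scores and labels are illustrative; a real exercise would use the scores produced by the trained model on the validation set:

```python
def gain_and_lift(scores, labels, n_deciles=10):
    """Cumulative gain (%) and lift per decile.
    scores: model scores for each prospect in the validation set
    labels: 1 if the prospect is a true target, else 0"""
    # Rank prospects by decreasing model score
    ranked = [lbl for _, lbl in sorted(zip(scores, labels),
                                       key=lambda p: p[0], reverse=True)]
    total_targets = sum(ranked)
    n = len(ranked)
    results = []
    for d in range(1, n_deciles + 1):
        cutoff = int(round(n * d / n_deciles))
        # Gain: cumulative targets up to this decile / total targets, as a percent
        gain = 100.0 * sum(ranked[:cutoff]) / total_targets
        # Lift: gain % over the random expectation % (d-th decile -> d*10%)
        lift = gain / (100.0 * d / n_deciles)
        results.append((gain, lift))
    return results

# Toy validation set: 20 prospects, 10 of them true targets,
# with the model scoring most of the targets near the top.
scores = list(range(20, 0, -1))
labels = [1]*8 + [0]*2 + [1]*2 + [0]*8
results = gain_and_lift(scores, labels)
print(results[0])   # top decile: gain 20.0%, lift 2.0
```

At the last decile the gain is always 100% and the lift is 1.0, since all prospects (and therefore all targets) have been counted.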
In the next article we will introduce another very commonly used evaluation tool for classification models.
Originally posted on Wed, Apr 27, 2011 @ 10:51 AM