In this article we briefly describe how to apply the results of a logistic regression analysis in RapidMiner (more detailed information on how to build logistic regression models is available separately). Let us start by recapping the basic elements of logistic regression.
1. Logistic regression is the counterpart of linear regression that is used when the response variable, or label, is binary (binomial). A binary response variable has two categories: Yes/No, Accept/Not Accept, Default/Not Default and so on.
2. The logarithm of the odds of the response, Y, being a "Yes" is expressed as a function of the independent (predictor) variables, X, and a constant term. For example, with a single predictor:
log (odds of Y = "Yes") = mX + c
This expression is also called the logit.
3. The logit gives the log of the odds of the "Yes" event, not a probability. To obtain the probability, we need to use the transformed equation below:
p (of Y = "Yes") = 1 / [1 + exp(-(mX + c))]
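The step from logit to probability can be sketched in a few lines of Python. This is only an illustration of the formulas above: the values of m and c are hypothetical, chosen for readability, and are not the coefficients of the loan model discussed in this article.

```python
import math


def logit_to_probability(log_odds: float) -> float:
    """Convert a logit (log of the odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_yes(x: float, m: float, c: float) -> float:
    """Probability that Y = "Yes" for predictor value x, slope m, constant c."""
    return logit_to_probability(m * x + c)


# Hypothetical slope and constant, for illustration only.
m = 0.05
c = -4.0

for income in (40, 80, 120):
    log_odds = m * income + c        # the logit
    odds = math.exp(log_odds)        # odds of "Yes"
    p = probability_yes(income, m, c)
    print(f"income={income}  logit={log_odds:+.2f}  odds={odds:.3f}  p(Yes)={p:.3f}")
```

Note that a logit of zero corresponds to odds of 1 and a probability of 0.5, which is why the logit itself should not be read as a probability.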
A simple example
Let us use a simple example: predicting whether a customer will accept a bank's personal loan offer as a function of their income.
When we build a logistic regression model on this simple dataset, we see the following results.
RapidMiner's implementation of logistic regression differs from many other (more conventional) approaches. The table on the left, which shows the kernel model, should not be confused with the logit model described above. In other words, w[Income] does not directly correspond to the slope "m", and Bias (offset) does not directly correspond to "c".
The easiest way to put the results of the analysis to use is the process below, which applies the model produced by the logistic regression learner to the example data set.
Once the analysis has run, simply click on the "Example Set" tab and the "Data View" radio button. You will see that each case has a predicted result, Prediction (Personal loan), together with the confidence, or probability, that the loan acceptance is "No" and the complementary probability that it is "Yes" (the two sum to 1).
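To make the structure of that scored output concrete, here is a minimal Python sketch that produces the same kind of columns: a predicted label and two complementary confidences. It is not RapidMiner's kernel model. It uses the plain logit formula as a stand-in, and the coefficients, the 0.5 threshold, the income values and the column names are all assumptions made for illustration.

```python
import math


def score(income: float, m: float, c: float, threshold: float = 0.5) -> dict:
    """Score one case with a plain logit model (a stand-in, not RapidMiner's model)."""
    p_yes = 1.0 / (1.0 + math.exp(-(m * income + c)))
    p_no = 1.0 - p_yes  # complementary probability
    return {
        "Income": income,
        "prediction": "Yes" if p_yes >= threshold else "No",
        "confidence(No)": p_no,
        "confidence(Yes)": p_yes,
    }


# Hypothetical coefficients and incomes, for illustration only.
m, c = 0.05, -4.0
for income in (40, 120):
    row = score(income, m, c)
    print(
        f"{row['Income']:>6}  {row['prediction']:<3}  "
        f"{row['confidence(No)']:.3f}  {row['confidence(Yes)']:.3f}"
    )
```

Whatever model produces them, the two confidence columns always sum to 1, and the predicted label is simply the category with the larger confidence when the threshold is 0.5.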
The main takeaway from this article is that, with RapidMiner, it is easier to apply the developed model to new data and obtain the probability of the response variable being in one of the two categories than to interpret the model parameters in the light of traditional formulas such as the logit.