How support vector machines use kernel functions to classify data

In a previous article we discussed some of the advantages of support vector machines over other classification methods and introduced some of the defining terms an analyst needs to become familiar with: hyperplane, margin, support vectors, and the concept of linear separability. One key term was deliberately left undefined: the so-called kernel function, or basis function. In this article we will demonstrate how choosing the proper kernel function can significantly improve SVM performance.

In the picture below, the data points belong to two classes: an inner ring and an outer ring. Your intuition will tell you, correctly, that these two classes are not "linearly separable". In other words, we cannot draw a straight line that splits the two classes. However, it is also intuitively clear that an elliptical or circular "hyperplane" could easily separate them.

[Figure: a two-ring dataset that is not linearly separable]

In fact, if we were to run a simple linear SVM on this data, we would get a classification accuracy of around 46%. As seen in the result below, a linear SVM classifies only about half of the inner ring and half of the outer ring correctly.

[Figure: a linear SVM applied to the two-ring data, misclassifying roughly half of each ring]
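For readers who want to reproduce this experiment in code rather than in a point-and-click tool, here is a minimal Python/scikit-learn sketch. The average radii of about 5 and 8 match the figures; the sample size and noise level are our assumptions.

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic two-ring data; radii (~5 inner, ~8 outer) match the figures,
    # while the noise level and sample size are assumptions
    rng = np.random.default_rng(42)
    n = 200
    theta = rng.uniform(0, 2 * np.pi, 2 * n)
    radius = np.concatenate([rng.normal(5.0, 0.3, n), rng.normal(8.0, 0.3, n)])
    X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    y = np.repeat([0, 1], n)  # 0 = inner ring, 1 = outer ring

    # No straight line separates concentric rings, so accuracy stays near 50%
    linear_svm = SVC(kernel="linear").fit(X, y)
    print(f"linear SVM accuracy: {linear_svm.score(X, y):.2f}")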

How can we classify such feature spaces? A simple trick is to transform the two variables x and y into a new feature space involving x (or y) and a new variable z defined as z = sqrt(x^2 + y^2). The definition of z is nothing more than the equation of a circle: every point at distance z from the origin lies on a circle of radius z. When the data is transformed in this way, the resulting feature space involving x and z appears as shown below. The two clusters of data correspond to the two rings: the inner one with an average radius of around 5.0 and the outer one with an average radius of around 8.0.

[Figure: the transformed feature space in x and z, where the two rings become two linearly separable clusters]

Clearly this new problem in the x and z dimensions is linearly separable, and we can apply a standard SVM to do the classification. When we run a linear SVM on this transformed data, we get a classification accuracy of 100%. After classifying the transformed feature space, we can invert the transformation to get back to our original feature space.
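Reusing X and y from the sketch above, the manual transformation is a single line, and the same linear SVM now separates the two clusters perfectly:

    # Map (x, y) -> (x, z), where z = sqrt(x^2 + y^2) is the distance from the origin
    z = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)
    X_xz = np.column_stack([X[:, 0], z])

    # The rings become two horizontal bands near z = 5 and z = 8,
    # so a straight line such as z = 6.5 separates them
    linear_svm_xz = SVC(kernel="linear").fit(X_xz, y)
    print(f"linear SVM accuracy in (x, z) space: {linear_svm_xz.score(X_xz, y):.2f}")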

Kernel functions offer the user this option of transforming nonlinear spaces into linear ones. Most packages that offer SVMs include several nonlinear kernels, ranging from simple polynomial basis functions to sigmoid functions. The user does not have to do the transformation beforehand, but simply selects the appropriate kernel function; the software takes care of transforming the data, classifying it, and mapping the results back into the original space.
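In scikit-learn, for instance, selecting a kernel is a one-argument change: the raw (x, y) data is passed in untransformed and the kernel trick is applied internally. A quadratic polynomial kernel suits the circular boundary here (again reusing X and y from above):

    # A degree-2 polynomial kernel lets the SVM learn the circular boundary
    # directly in the original (x, y) space, with no manual feature engineering
    poly_svm = SVC(kernel="poly", degree=2).fit(X, y)
    print(f"polynomial-kernel SVM accuracy: {poly_svm.score(X, y):.2f}")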

Unfortunately, with a large number of attributes in a dataset, it is difficult to know in advance which kernel will work best. The most commonly used kernels are polynomial and radial basis functions. From a practical standpoint, it is a good idea to start with a quadratic polynomial and work your way up to some of the more exotic kernel functions until you reach the desired accuracy level. This flexibility of support vector machines does come at the price of increased computational cost.
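One simple way to follow this advice, sketched below on the same toy data, is to sweep kernels from simple to flexible and time each fit so the accuracy/computation trade-off is visible. (On a real problem, score on held-out data or via cross-validation rather than on the training set, as done here to keep the sketch short.)

    import time

    # Try progressively more flexible kernels until accuracy is acceptable
    for name, model in [
        ("linear", SVC(kernel="linear")),
        ("poly, degree 2", SVC(kernel="poly", degree=2)),
        ("poly, degree 3", SVC(kernel="poly", degree=3)),
        ("rbf", SVC(kernel="rbf")),
    ]:
        start = time.perf_counter()
        accuracy = model.fit(X, y).score(X, y)
        print(f"{name}: accuracy = {accuracy:.2f}, time = {time.perf_counter() - start:.3f}s")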

This article is a brief excerpt from our upcoming book on Data Mining and Predictive Analytics using RapidMiner. Take this survey and give us feedback on what else you would like to see in the book.
