The term "predictive analytics" is often used interchangeably with "machine learning" today. If one technique within predictive analytics captures this learning aspect more than any other, it is artificial neural network modeling.

Where does the "learning" come from?

To understand where the learning happens, we need to first look at a simple artificial neural network structure:

The network shown above performs a very simple operation: multiplying the two inputs, x1 and x2, to give a desired output, d. In standard predictive analytics terminology, d is the target value. The problem is to identify the correct coefficients, w1 and w2, for x1 and x2 respectively, which will give the desired value, d. Artificial neural networks arose from the need to model more complicated functions, but the basic idea is the same. The starting values of w1 and w2 may be random, but after each computation, w1 and w2 are adjusted to give closer approximations (better predictions) of d.

The learning is about iteratively finding the coefficients, or weights, given the inputs and the output. This leads directly into the first application of neural nets for predictive analytics.
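To make the "adjust after each computation" idea concrete, here is a minimal sketch in Python. It assumes a single node that predicts d as the weighted sum w1*x1 + w2*x2 and corrects the weights in proportion to the prediction error; the pictured network may combine its inputs differently, and the function name, learning rate, and data are all made up for illustration.

```python
import random

def train_weights(samples, learning_rate=0.05, epochs=500, seed=0):
    """Iteratively adjust w1 and w2 so that w1*x1 + w2*x2 approaches d."""
    rng = random.Random(seed)
    # Random starting points for the two weights.
    w1, w2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        for x1, x2, d in samples:
            prediction = w1 * x1 + w2 * x2
            error = d - prediction
            # Nudge each weight in the direction that shrinks the error.
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
    return w1, w2

# Hypothetical training records (x1, x2, d), generated from d = 2*x1 - 3*x2.
samples = [(1.0, 0.0, 2.0), (0.0, 1.0, -3.0), (1.0, 1.0, -1.0), (2.0, 1.0, 1.0)]
w1, w2 = train_weights(samples)
print(round(w1, 3), round(w2, 3))
```

Starting from random weights, the loop should settle close to the coefficients that generated the data, which is all that "learning" means in this setting.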

1. Function approximation:

The example described above is nothing more than a function approximation. Finding the correct weights or coefficients is essentially fitting a function. To relate it to standard statistical techniques, think of an artificial neural network as an advanced form of multiple linear regression modeling. In a standard linear regression model, we try to fit a straight-line function of the independent variables to the observed data. In an artificial neural network model, there is no restriction that the line be straight: any arbitrary curve or shape can be "fitted". The simple example pictured above is actually a nonlinear function in itself. (Most software packages, such as RapidMiner, however, offer a standard sigmoid function with artificial neural network models. More about sigmoid functions can be found in this article on logistic regression modeling.)
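As an illustration of fitting a curve rather than a straight line, the sketch below trains a tiny network with one hidden layer of sigmoid nodes on the curve y = x^2. The layer size, learning rate, and training loop are assumptions chosen for brevity, not the settings of RapidMiner or any other package.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_curve(xs, ys, hidden=4, learning_rate=0.1, epochs=2000, seed=1):
    """Fit y = f(x) with one hidden layer of sigmoid nodes and a linear output."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1, 1) for _ in range(hidden)]   # input -> hidden weights
    b_in = [rng.uniform(-1, 1) for _ in range(hidden)]   # hidden biases
    w_out = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b_out = 0.0                                          # output bias

    def predict(x):
        h = [sigmoid(w_in[j] * x + b_in[j]) for j in range(hidden)]
        return sum(w_out[j] * h[j] for j in range(hidden)) + b_out, h

    def mse():
        return sum((predict(x)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    loss_before = mse()
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            out, h = predict(x)
            err = out - y
            for j in range(hidden):
                # Error passed back through the sigmoid of hidden node j.
                grad_hidden = err * w_out[j] * h[j] * (1.0 - h[j])
                w_out[j] -= learning_rate * err * h[j]
                w_in[j] -= learning_rate * grad_hidden * x
                b_in[j] -= learning_rate * grad_hidden
            b_out -= learning_rate * err
    return loss_before, mse(), lambda x: predict(x)[0]

# Hypothetical target: a curve that no straight line can follow.
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]
loss_before, loss_after, model = fit_curve(xs, ys)
print(loss_before, loss_after)
```

The mean squared error after training should be well below the error of the randomly initialized network, because the sigmoid nodes let the fitted function bend.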

2. Forecasting:

Forecasting time series data is an extension of function approximation. The main difference is that in this case, the target variable we aim to predict is the value of the same measure at a future point in time. The independent variables are the historic values of that measure. At each step in the training phase, a set of historic data is presented to the neural network (for example, the last 50 weeks of commodity price data) and the model is asked to predict the next value in the time series (the 51st week). In this way, a forecasting problem simply reduces to a function approximation. Here is a video for a very good introduction to using neural networks to forecast stock prices.
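The reduction from forecasting to function approximation comes down to how the training records are built. A minimal sketch, assuming a plain list of values and a hypothetical helper name, slides a fixed-length window over the series and pairs each window with the value that follows it:

```python
def make_training_windows(series, window):
    """Turn a time series into (inputs, target) pairs for supervised training."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # `window` consecutive values
        targets.append(series[start + window])       # the value that follows them
    return inputs, targets

# Hypothetical weekly prices; a real series would use window=50 as in the text.
prices = [10, 11, 13, 12, 14, 15]
X, y = make_training_windows(prices, window=3)
# X == [[10, 11, 13], [11, 13, 12], [13, 12, 14]]
# y == [12, 14, 15]
```

Each row of X plays the role of the independent variables and the matching entry of y is the target, so any function-approximation model can be trained on them.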

3. Classification:

Neural networks can be used effectively to classify samples, i.e., to map input data to different classes or categories. In this case, each output node can stand for one class or category (the example above had only one output node). A sample is assigned to class A if the output node for class A computes a higher value than all other output nodes for that record.

Typical examples of using neural networks for classification include classifying loan applications into credit-worthy or non-credit-worthy groups.
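The decision rule described above (pick the class whose output node has the highest value) can be sketched in a few lines. The function name and the output values are hypothetical; in practice the values would come from a trained network with one output node per class.

```python
def classify(output_values, class_labels):
    """Return the label of the output node that produced the largest value."""
    winner = max(range(len(output_values)), key=lambda i: output_values[i])
    return class_labels[winner]

labels = ["credit-worthy", "non-credit-worthy"]
# Hypothetical output-node values for two loan applications.
first = classify([0.82, 0.31], labels)
second = classify([0.10, 0.70], labels)
print(first, second)
```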

4. Clustering:

Clustering is another form of classification, where the number of classes is not known beforehand. The working of neural networks for clustering is therefore similar to classifying records. Nodes that produce higher outputs for an input sample react more strongly to that sample (i.e., are assigned higher weights) and to other samples "geographically" near that sample.
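The behavior described here resembles competitive learning as used in self-organizing maps, although the article does not name a specific algorithm, so treat that reading as an assumption. Under it, one training step finds the node that best matches a sample and pulls that node, along with its neighbors on the node grid, toward the sample. The sketch below uses a one-dimensional row of nodes and treats the closest weight vector as the strongest response; the names and parameters are hypothetical.

```python
def som_update(nodes, sample, learning_rate=0.5, radius=1):
    """One competitive-learning step on a 1-D row of nodes (updates in place)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # The winning node is the one whose weight vector is closest to the sample.
    winner = min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], sample))
    for i, weights in enumerate(nodes):
        if abs(i - winner) <= radius:
            # Pull the winner and its grid neighbors toward the sample.
            nodes[i] = [w + learning_rate * (s - w) for w, s in zip(weights, sample)]
    return winner

# Hypothetical weight vectors for four nodes and one two-dimensional sample.
nodes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
winner = som_update(nodes, [1.2, 0.8])
print(winner, nodes)
```

Repeating this step over many samples lets nearby nodes specialize in nearby regions of the data, which is how clusters emerge without the classes being specified in advance.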

In the next article in this series, we will describe how to use artificial neural network models in RapidMiner for each of the four cases mentioned above.