Machine learning optimization is becoming increasingly relevant in practice. Machine learning models, ranging from decision trees to support vector machines, need specific parameter settings that sometimes leave an analyst wondering whether the best model was fitted to the available data. A common difficulty one encounters while building models for predictive analytics is selecting these parameters. Even a relatively “simple” decision tree model has half a dozen settings that must be adjusted or tuned to get a well-performing model: terminal leaf size, minimum size to create a split, minimal gain, maximal tree depth, pre-pruning choices, and of course the type of splitting criterion.

One way of implicitly verifying whether the selected parameters were appropriate is, of course, to test the performance of the model on the portion of data reserved for testing. But this is an open-ended check: there is no way of knowing whether other parameter choices would have resulted in the same or better performance in less time. To check whether we have indeed put together the best possible model, many tools recommend optimizing these parameters.

Optimization simply involves building the machine learning models iteratively, systematically cycling through various combinations of these parameters and measuring the performance. RapidMiner, for example, offers three main optimization schemes to do this iterative model fitting: a grid search, a greedy search, and an evolutionary search.

Grid search is an exhaustive scan of the entire “design landscape”, sometimes also referred to as brute force optimization. If there are six modeling parameters which we need to optimize, we simply cycle through all combinations of these six parameters. Clearly, unless the parameter ranges (or choices) are severely limited, this will turn out to be a combinatorial nightmare!
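The exhaustive scan can be sketched in a few lines of Python. The parameter names and the scoring function below are purely illustrative (they are not RapidMiner's actual parameter keys), but the loop shows why the cost is combinatorial: every value of every parameter is crossed with every other.

```python
from itertools import product

# Hypothetical decision-tree parameter grid (names are illustrative only).
param_grid = {
    "max_depth": [4, 8, 16],
    "min_leaf_size": [1, 5, 10],
    "criterion": ["gini", "gain_ratio"],
}

def evaluate(params):
    # Stand-in for "build the model, measure accuracy on the test data".
    # This toy score just favors deeper trees with smaller leaves.
    return params["max_depth"] / 16 - params["min_leaf_size"] / 100

best_params, best_score = None, float("-inf")
keys = list(param_grid)
for values in product(*(param_grid[k] for k in keys)):  # every combination
    params = dict(zip(keys, values))
    score = evaluate(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # 3 * 3 * 2 = 18 model evaluations in total
```

Even this tiny grid needs 18 model builds; six parameters with, say, ten values each would need a million.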

Greedy search is a method for finding the best solution at a given step without looking too far afield. If the objective is to maximize, say, accuracy, a greedy search picks the nearest solution (the first combination of the six parameters) that yields a maximum. Unlike a grid search, it does not look through all available choices. Thus, a greedy search has a tendency to get stuck in a local optimum.
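A minimal sketch of this idea is hill climbing: accept the first neighboring setting that improves the objective and stop when none does. The one-dimensional objective below is contrived (two peaks, one local and one global) to show the getting-stuck behavior described above.

```python
def greedy_search(evaluate, start, neighbors):
    """Hill-climb: move to the first neighboring setting that improves the
    objective; stop when no neighbor does (possibly a local optimum)."""
    current, score = start, evaluate(start)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            if evaluate(candidate) > score:
                current, score = candidate, evaluate(candidate)
                improved = True
                break  # greedy: take the first improvement, don't scan all
    return current, score

# Toy "accuracy" with two peaks: x = 3 (local) and x = 10 (global, value 50).
def accuracy(x):
    return -(x - 3) ** 2 if x < 6 else 50 - (x - 10) ** 2

step = lambda x: [x - 1, x + 1]
print(greedy_search(accuracy, 0, step))  # → (3, 0): stuck at the local peak
```

Started from 0, the search climbs to the local peak at x = 3 and stops, never discovering the much better peak at x = 10.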

An evolutionary search is smarter than a grid search and a more focused goal seeker. It starts with several random combinations (an initial population) of the six parameters, evaluates the objective for each candidate within this initial population, and chooses the one with the best performance (highest accuracy). It then creates a new population from the best combinations of the initial population, applying random changes (mutation) and selective recombination (crossover), similar to biological evolution. It evaluates the objective (accuracy) for the new population. If the maximum accuracy improves by less than a tolerance level specified by the user, the iteration stops; otherwise it continues.
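The loop above can be sketched as a bare-bones genetic algorithm. This is not RapidMiner's implementation; the population size, mutation rate, and the toy accuracy function are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def evolutionary_search(evaluate, bounds, pop_size=20, generations=50,
                        mutation_rate=0.2, tolerance=1e-3):
    """Random initial population, crossover between the best candidates,
    occasional mutation; stop when the best score improves by less than
    `tolerance` from one generation to the next."""
    dims = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    best_score = float("-inf")
    for _ in range(generations):
        scored = sorted(pop, key=evaluate, reverse=True)
        top_score = evaluate(scored[0])
        if top_score - best_score < tolerance and best_score > float("-inf"):
            return scored[0], top_score          # converged within tolerance
        best_score = top_score
        parents = scored[: pop_size // 2]        # selection: keep the best half
        pop = parents[:]
        while len(pop) < pop_size:
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # crossover
            if random.random() < mutation_rate:                  # mutation
                i = random.randrange(dims)
                child[i] = random.uniform(*bounds[i])
            pop.append(child)
    best = max(pop, key=evaluate)
    return best, evaluate(best)

# Toy "accuracy" surface peaking at (8, 0.1), on continuous parameter scales.
def accuracy(p):
    return 1 - (p[0] - 8) ** 2 / 100 - (p[1] - 0.1) ** 2

params, score = evolutionary_search(accuracy, [(1, 20), (0.0, 1.0)])
print(round(score, 2))
```

Note that the search evaluates only a fraction of the combinations a grid search would, yet it homes in on the neighborhood of the peak by recombining the best candidates found so far.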

RapidMiner’s optimizers (look for “Optimize Parameters” under the Process Control operators) work as “wrappers” for the model build-evaluate processes. We first set up the model as usual and then nest the entire process inside one of the optimizers. We then select, in the Optimize operator’s settings, which parameters need to be optimized.

*Originally posted on Mon, Oct 28, 2013 @ 09:30 AM*