We indicated that there are two main types of feature selection algorithms: wrapper type and filter type. A wrapper algorithm works within another machine learning program, such as multiple linear regression; good examples are backward elimination and forward selection. Each iteration of the regression model either removes or introduces a variable that improves model performance, and the iterations stop when a preset performance criterion (such as adjusted R-squared or RMS error) is reached or exceeded. An inherent advantage of wrapper-type methods is that multicollinearity issues are handled automatically. However, they provide no insight into the actual relationships between the variables.
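To make the wrapper idea concrete, here is a minimal sketch of backward elimination using statsmodels, with adjusted R-squared as the stopping criterion; the function name and the threshold parameter are our own inventions for this example, not part of any standard library.

```python
# A minimal sketch of wrapper-type backward elimination, assuming a
# pandas DataFrame X of candidate predictors and a target series y.
import statsmodels.api as sm

def backward_eliminate(X, y, min_adj_r2_gain=0.0):
    """Drop predictors one at a time while adjusted R-squared does not worsen."""
    features = list(X.columns)
    best_adj_r2 = sm.OLS(y, sm.add_constant(X[features])).fit().rsquared_adj
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            trial = [c for c in features if c != f]
            adj_r2 = sm.OLS(y, sm.add_constant(X[trial])).fit().rsquared_adj
            # Removing f helps (or at least does not hurt) the model
            if adj_r2 >= best_adj_r2 + min_adj_r2_gain:
                best_adj_r2, features, improved = adj_r2, trial, True
                break
    return features, best_adj_r2
```

Each pass refits the regression without one candidate variable and keeps the removal only if the performance criterion holds, which is exactly the iterate-and-prune loop described above.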
We have described how cost models can be developed using regression, and how to use these models for forecasting. Thus the model is used not only to develop a better understanding of the data (explanatory statistics) but also to make forecasts (predictive analytics).
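As a brief illustration of this dual use, the sketch below fits a simple cost model with statsmodels and then reuses the same fitted model for a forecast; the column names and figures are invented for the example.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical cost history: the "volume" and "cost" columns are made up
history = pd.DataFrame({"volume": [100, 150, 200, 250, 300],
                        "cost":   [510, 620, 740, 845, 960]})

model = sm.OLS(history["cost"], sm.add_constant(history["volume"])).fit()

# Explanatory use: the intercept estimates fixed cost, the slope the
# variable cost per unit of volume
print(model.params)

# Predictive use: forecast cost at volumes not yet observed
future = pd.DataFrame({"volume": [350, 400]})
print(model.predict(sm.add_constant(future)))
```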
Suppose you are building a multiple linear regression model with many factors; a first step is reducing the number of factors or predictors. This process is known variously as feature selection or dimension reduction. Two key points need to be kept in mind:
In part 1, we described a basic three-workstation assembly line, which is the "business end" of a supply chain. The key management challenges in this supply chain are that product assembly lead times are increasing and revenues are dropping (see chart below). Much of this complexity is driven by variability in the inventory levels of raw material parts coming into the assembly process.
Feature selection, also called data dimension reduction or variable screening, in predictive analytics refers to the process of identifying the few most important variables or parameters that help in predicting the outcome. In today's charged-up world of high-speed computing, one might be forgiven for asking: why bother? The most important reasons all come from practicality.
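In contrast to the wrapper approach sketched earlier, a filter-type screen ranks predictors by a univariate score against the outcome before any model is built. A minimal sketch with scikit-learn, using synthetic data:

```python
# Filter-type variable screening: rank predictors by a univariate
# F-score against the outcome, independent of any downstream model.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       random_state=0)

# Keep the 5 predictors with the strongest univariate relationship to y
selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)
print(selector.get_support(indices=True))   # indices of retained columns
```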
When Dirk Nannes was forced out by injury, Royal Challengers Bangalore struck a gold mine by signing the ever-explosive Chris Gayle to replace him. Gayle's base price was $400,000, and according to IPL rules, he could not be paid more than $650,000 because that was the value of the player he was replacing. Let us assume that is what he was finally signed for.
Principal component analysis (PCA) is, according to Wikipedia, a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."
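A short sketch of this idea with scikit-learn, using synthetic correlated data (the variable names are arbitrary):

```python
# PCA rotates correlated inputs into uncorrelated principal components,
# ordered by how much variance each one explains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=500)   # deliberately correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)                     # the uncorrelated components
print(pca.explained_variance_ratio_)          # most variance on component 1
print(np.corrcoef(scores.T)[0, 1])            # ~0: components are uncorrelated
```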
An illuminating recent survey by Rexer Analytics showed that among the many business analytics techniques available, the three most commonly used were decision trees, regression models, and clustering. Given the expected growth of analytics driven by the data boom, and the shortage of expertise, it is safe to say that the usage of these three techniques will continue to grow, particularly among newly trained professionals. In this article we focus on some key points to watch out for when using linear regression models.
In the last couple of articles we discussed what types of business analytics problems can be addressed by the chi-square test for independence, and how to implement the test with an actual example. In this article we discuss the mechanics of the technique itself and try to understand why and how the chi-square test works.
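To make the mechanics concrete, here is a small worked sketch using SciPy's chi2_contingency; the contingency table is invented purely for illustration.

```python
# Chi-square test of independence on a hypothetical 2x2 table.
# Rows: responded to campaign yes/no; columns: customer segment A/B.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)   # a small p-value suggests the two factors are related
print(expected)       # counts we would expect if the factors were independent
```

The statistic compares the observed counts against the counts expected under independence, which is the core of why the test works.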
While acknowledging the general risk in using models, it is important to know how to mitigate some of these risks. In this article, we focus specifically on six checkpoints to ensure that bivariate analyses used to develop models (such as simple regression models), or to verify whether two parameters are related, are valid. Finally, we briefly mention some advantages of using mutual information over simple regression for bivariate analysis.
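As a hedged illustration of that last point, the sketch below uses synthetic data in which a plain linear correlation misses a strong nonlinear relationship that mutual information detects.

```python
# Mutual information vs. linear correlation for a bivariate check.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=1000)
y = x ** 2 + 0.1 * rng.normal(size=1000)      # strong but non-monotonic link

print(np.corrcoef(x, y)[0, 1])                          # near zero
print(mutual_info_regression(x.reshape(-1, 1), y)[0])   # clearly positive
```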