“Truth is a Pathless Land”

...but finding an effective solution to your business problem does not have to be. The business analytics landscape can certainly look like one, with myriad techniques and vendor tools on the market.

Simafore provides tools and expertise to:

  • Integrate data
  • Select and deploy appropriate analytics
  • Institutionalize processes

About this Blog

The Analytics Compass Blog is aimed at two types of readers:

  • individuals who want to build analytics expertise and 

  • small businesses who want to understand how analytics can help them improve their business performance. 

FREE SMB Survey Report

  • Our report (pdf) is a survey of analytics usage and needs of more than 100 SMBs
  • Find out which analytics applications have
    • Highest value
    • Most demand
    • Best ROI

Blog - The Analytics Compass

Mutual information based filter vs. wrapper type feature selection

  
  
  
We indicated earlier that there are two main types of feature selection algorithms: wrapper type and filter type. A wrapper algorithm works within another machine learning program such as multiple linear regression. Good examples are backward elimination and forward selection. In each iteration, a regression model is fitted and a variable is either removed or introduced, whichever improves model performance. The iterations stop when a preset performance criterion (such as adjusted R-squared or RMS error) is reached or exceeded. The inherent advantage of wrapper type methods is that multicollinearity issues are handled automatically. However, a wrapper gives you no insight, either beforehand or afterwards, into the actual relationships between the variables.
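To make the wrapper idea concrete, here is a minimal sketch of forward selection wrapped around an ordinary least squares fit, scored with adjusted R-squared. The data is synthetic, and the function names and the `min_gain` stopping threshold are assumptions made for illustration; they do not come from the article or from any particular tool.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit least squares with an intercept and return adjusted R-squared."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_selection(X, y, min_gain=1e-3):
    """Add one variable per iteration; stop when the gain falls below min_gain."""
    remaining = list(range(X.shape[1]))
    selected = []
    best = -np.inf
    while remaining:
        # Score every candidate by refitting the model with it included.
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break                                  # no worthwhile improvement
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic example: only columns 0 and 2 actually drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, best = forward_selection(X, y)
print("selected columns:", selected)
print("adjusted R-squared:", round(best, 3))
```

Backward elimination follows the same pattern in reverse: start with every variable and drop the one whose removal hurts the score least.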

Cost modeling with multiple linear regression: 2 rules to ensure quality

  
  
  
We have described how cost models can be developed using regression, and how to use these models for forecasting. The model thus serves two purposes: it builds a better understanding of the data (explanatory statistics), and it is then used to make forecasts (predictive analytics).

2 ways to select predictors for regression models using RapidMiner

  
  
  
Suppose you are building a multiple linear regression model with many factors. A first step is to reduce the number of factors, or predictors. This process is known as feature selection or dimension reduction, among other names. Two key points need to be kept in mind:

Supply chain analytics: applying systems thinking in 4 steps part 2/2

  
  
  
In part 1, we described a basic three-workstation assembly line, which is the "business end" of a supply chain. Here are the key management challenges in this supply chain: product assembly lead times are increasing and revenues are dropping (see chart below). Much of this complexity is driven by variability in the inventory levels of raw material parts coming into the assembly process.

Reasons why feature selection is important in predictive analytics

  
  
  
Feature selection (also called data dimension reduction or variable screening) in predictive analytics refers to the process of identifying the few most important variables or parameters that help in predicting the outcome. In today's charged-up world of high-speed computing, one might be forgiven for asking: why bother? The most important reasons are all practical ones.

Do IPL cricket statistics reveal logic behind player valuations?

  
  
  
When Dirk Nannes was forced out by injury, Royal Challengers Bangalore struck a gold mine by signing the ever-explosive Chris Gayle to replace him. Gayle's base price was $400,000, and under IPL rules he could not be paid more than $650,000, because that was the value of the player he was replacing. Let us assume that this is what he was finally signed for.

When Principal Component Analysis makes sense in business analytics

  
  
  
According to Wikipedia, principal component analysis (PCA) is a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."
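The definition is easier to picture with a small example. The sketch below runs PCA by hand with NumPy on two deliberately correlated, made-up variables (the data and variable names are assumptions for illustration only). It checks the two properties the definition names: the transformation is orthogonal, and the resulting components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: the second variable is mostly a scaled copy of the first.
x = rng.normal(size=500)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])

# Center the data, then take the eigenvectors of the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues

# Reorder so the component with the largest variance comes first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the observations onto the principal axes.
components = centered @ eigvecs

explained = eigvals / eigvals.sum()           # share of variance per component
corr_before = np.corrcoef(data, rowvar=False)[0, 1]
corr_after = np.corrcoef(components, rowvar=False)[0, 1]

print("correlation between original variables:", round(corr_before, 3))
print("correlation between components:", round(abs(corr_after), 6))
print("share of variance per component:", np.round(explained, 3))
```

When the first component carries most of the variance, as it should here by construction, the later components can often be dropped with little loss of information.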

3 checks to prevent abuse of regression models

  
  
  
An illuminating recent survey by Rexer Analytics showed that, among the many business analytics techniques available, the three most commonly used were decision trees, regression models and clustering. With the expected growth of analytics given the data boom, and the shortage of expertise, it would be safe to say that the usage of these three techniques will continue to grow, in particular among newly trained professionals. In this article we focus on some of the key points to watch out for while using linear regression models.

3 basic concepts which underpin the chi-square test

  
  
  
In the last couple of articles we discussed what type of business analytics problems can be addressed by chi-square test for independence and also how to implement the test with an actual example. In this article we will discuss the mechanics of the technique itself and try to understand why and how the chi-square technique works.
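As a rough preview of those mechanics, the sketch below computes the chi-square statistic for independence from a contingency table by comparing observed counts with the counts expected if the two variables were unrelated. The 2x2 table is hypothetical and is not the example from the earlier articles. The p-value shortcut through `math.erfc` is only valid for one degree of freedom, which is what a 2x2 table has.

```python
import math
import numpy as np

# Hypothetical 2x2 contingency table (rows: two groups, columns: two outcomes).
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Counts expected under independence: (row total * column total) / grand total.
expected = row_totals @ col_totals / grand_total

# Chi-square statistic: sum of squared deviations, scaled by the expected counts.
chi2 = float(((observed - expected) ** 2 / expected).sum())
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# For one degree of freedom only, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0)) if dof == 1 else None

print("expected counts:\n", expected)
print("chi-square statistic:", round(chi2, 3), "with", dof, "degree(s) of freedom")
print("p-value:", p_value)
```

A large statistic relative to the degrees of freedom means the observed counts sit far from what independence would predict. For larger tables, a statistics library should be used to obtain the p-value.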

6 checkpoints to ensure regression model validity for analytics

  
  
  
While acknowledging the general overall risk in using models, it is important to know how to mitigate some of these risks. In this article, we will specifically focus on 6 checkpoints to ensure that bivariate analyses used to develop models (such as simple regression models), or to verify if two parameters are related, are valid. Finally, we will briefly mention some advantages of using mutual information over simple regression models for bivariate analysis.
