The Analytics Compass Blog

Twice-weekly articles to help SMBs optimize business performance with data analytics and build their analytics expertise.

Keeping analytics simple and intuitive makes cost modeling effective

When executives do not clearly understand how a forecast or prediction works, they naturally tend to be suspicious of it. This suspicion gets stronger if the forecasts or predictions are very good! A common challenge with certain "black box" techniques, such as artificial neural networks, is that they are difficult to explain to non-analytics people, which spreads doubt and confusion about the real benefits businesses can derive from analytics.

Integrating Tableau and R for data analytics in four simple steps

Tableau, the prom queen of data, is finally going out with R, the alpha-geek of analytics. This is a moment a lot of us have been waiting for. Tableau will soon release version 8.1, which allows super easy integration with R. I had the opportunity to test drive the beta version of 8.1, with really cool results. Below are a few initial impressions along with a simple workbook you can download and play with (if you have the beta version).

Ranking KPIs: a critical first step for small or big data analytics

When the website was launched in 2009, it had a measly 47 datasets. Four years later it has exploded to nearly 100,000 datasets in more than 50 formats. This is merely the public-facing data which the government makes available to the taxpaying citizenry. The "other" government data (still funded by taxes), which is not openly available to all for security and other reasons, is clearly significantly larger. EMC Corporation recently released a report indicating that only about a quarter of this data is currently tagged and analyzed by the government. Officials have been quoted as saying that in the next 5 years the feds will spend about $13 billion (16% of the total IT budget) to improve big data infrastructure and develop data mining best practices for this data. The report also summarized the top three areas where large government agencies can best leverage big data and analytics: improving process and efficiency, enhancing security, and predicting trends.

Mutual information based filter vs. wrapper type feature selection

We indicated that there are two main types of feature selection algorithms: wrapper type and filter type. A wrapper algorithm works within another machine learning method such as multiple linear regression. Good examples are backward elimination and forward selection. Each iteration fits a regression model and either removes or introduces the variable that most improves model performance. The iterations stop when a preset performance criterion (such as adjusted R-squared or RMS error) is reached or exceeded. The inherent advantage of wrapper type methods is that multicollinearity issues are automatically handled. However, you gain no knowledge, before or after the selection, about the actual relationships between the variables.
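To make the wrapper idea concrete, here is a minimal sketch of forward selection wrapped around ordinary least squares, in plain Python. The data, the `min_gain` stopping threshold, and all function names are illustrative assumptions, not taken from the original post; the stopping rule used here (stop when adjusted R-squared improves by less than `min_gain`) is one simple choice among several.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def adjusted_r2(X, y, cols):
    """Fit OLS (with intercept) on the chosen columns; return adjusted R-squared."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1  # number of fitted coefficients, intercept included
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p)


def forward_selection(X, y, min_gain=0.01):
    """Add one predictor per iteration until the gain drops below min_gain."""
    selected, best = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {c: adjusted_r2(X, y, selected + [c]) for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] - best < min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected, best


# Hypothetical data: y depends on columns 0 and 2; column 1 is unrelated.
X = [[float(i), float((i * 7) % 5), float((i * i) % 7)] for i in range(12)]
y = [3 * r[0] + 2 * r[2] + 0.1 * ((i % 3) - 1) for i, r in enumerate(X)]

selected, best = forward_selection(X, y)
print("selected columns:", selected, "adjusted R-squared:", round(best, 4))
```

Note that the loop only ever reports which columns survived and the final score, which illustrates the drawback mentioned above: the procedure says nothing about how the variables relate to one another.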

Cost modeling with multiple linear regression: 2 rules to ensure quality

We have described how cost models can be developed using regression. We have also described how to use these models for forecasting. Thus the model is used not only to develop a better understanding of the data (explanatory statistics) but also to make forecasts (predictive analytics).

2 ways to select predictors for regression models using RapidMiner

Suppose you are building a multiple linear regression model using many factors. A first step is reducing the number of factors or predictors. This process is known as feature selection or dimension reduction, among other names. Two key points need to be kept in mind:

Supply chain analytics: applying systems thinking in 4 steps part 2/2

In part 1, we described a basic three-workstation assembly line, which is the "business end" of a supply chain. Here are the key management challenges in this supply chain: product assembly lead times are increasing and revenues are dropping (see chart below). Much of this complexity is driven by variability in the inventory levels of raw material parts coming into the assembly process.

Reasons why feature selection is important in predictive analytics

In predictive analytics, feature selection (also called data dimension reduction or variable screening) refers to the process of identifying the few most important variables or parameters which help in predicting the outcome. In today's charged-up world of high-speed computing, one might be forgiven for asking, why bother? The most important reasons all come from practicality.

Do IPL cricket statistics reveal logic behind player valuations?

When Dirk Nannes was forced out by injury, Royal Challengers Bangalore actually struck a gold mine by signing up the ever-explosive Chris Gayle to replace him. Gayle's base price was $400,000 and, according to IPL rules, he could not be paid more than $650,000 because that was the value of the player he was replacing. Let us assume that was what he was finally signed for.

When Principal Component Analysis makes sense in business analytics

According to Wikipedia, principal component analysis (PCA) is a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."
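The definition is easier to see on a tiny two-variable case. The sketch below, in plain Python, rotates two correlated columns onto their principal axes using the closed-form solution for a 2x2 covariance matrix. The data points and the function name `pca_2d` are hypothetical and chosen only for illustration; real problems with more variables would normally use a linear algebra library instead.

```python
import math


def pca_2d(points):
    """PCA for two variables via the closed-form 2x2 eigen-decomposition."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix (variance along each component)
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + spread, half_trace - spread)

    # Angle of the first principal axis; the rotation is the
    # orthogonal transformation mentioned in the definition.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return eigenvalues, scores


# Hypothetical, strongly correlated observations (e.g. two related KPIs)
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

eigenvalues, scores = pca_2d(points)
explained = eigenvalues[0] / sum(eigenvalues)
print("share of variance on first component:", round(explained, 4))
```

When most of the variance lands on the first component, as it does for data this strongly correlated, the second column of `scores` can be dropped with little loss of information, which is the practical appeal of PCA.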
