Keeping analytics simple and intuitive makes cost modeling effective

Posted by Bala Deshpande on Tue, Jan 28, 2014 @ 07:50 AM

When executives do not clearly understand how a forecast or prediction works, they naturally tend to be suspicious of it. This suspicion gets stronger if the forecasts or predictions are very good! A common challenge with certain "black box" techniques, such as artificial neural networks, is that they are difficult to explain to non-analytics people and can therefore spread doubt and confusion about the real benefits businesses can derive from analytics.

Read More

Tags: cost modeling, multicollinearity, multiple linear regression, correlations

Integrating Tableau and R for data analytics in four simple steps

Posted by Bala Deshpande on Fri, Oct 18, 2013 @ 07:14 AM

Tableau, the prom queen of data, is finally going out with R, the alpha-geek of analytics. This is a moment a lot of us have been waiting for. Tableau will soon release version 8.1, which allows super easy integration with R. I had the opportunity to test drive the beta version of 8.1, with really cool results. Below are a few initial impressions along with a simple workbook you can download and play with (if you have the beta version).

Read More

Tags: technology reviews, correlations, data mining with R

Ranking KPIs: a critical first step for small or big data analytics

Posted by Bala Deshpande on Wed, Sep 04, 2013 @ 06:45 AM

When the website was launched in 2009, it had a measly 47 datasets. Four years later it has exploded to nearly 100,000 datasets in more than 50 formats. This is merely the public-facing data which the government makes available to the taxpaying citizenry. The "other" government data (still funded by taxes), which is not openly available to all for security and other reasons, is clearly much larger. EMC Corporation recently released a report indicating that only about a quarter of this data is currently tagged and analyzed by the government. Officials have been quoted as saying that in the next 5 years, the feds will spend about $13 billion (16% of the total IT budget) to improve big data infrastructure and develop data mining best practices for this data. The report also summarized the top three areas where large government agencies can best leverage big data and analytics: improving process and efficiency, enhancing security, and predicting trends.

Read More

Tags: keyconnect, correlations, KPI

Mutual information based filter vs. wrapper type feature selection

Posted by Bala Deshpande on Fri, Apr 27, 2012 @ 08:10 AM

We indicated that there are two main types of feature selection algorithms: wrapper type and filter type. A wrapper algorithm works within another machine learning program such as multiple linear regression. Good examples are backward elimination and forward selection. In each iteration, a regression model either removes or introduces a variable so as to improve model performance. The iterations stop when a preset performance criterion (such as adjusted R-squared or RMS error) is reached or exceeded. The inherent advantage of wrapper type methods is that multicollinearity issues are automatically handled. However, you get no prior knowledge of the actual relationships between the variables, which you may still be interested in afterwards.
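To make the wrapper idea concrete, here is a minimal sketch of forward selection wrapped around ordinary least squares, scored by adjusted R-squared. It is not the procedure from the original post: the function names, the synthetic data, and the `min_gain` stopping rule (stop when the best candidate no longer improves the score by a small margin) are all assumptions chosen for illustration.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit least squares with an intercept and return adjusted R-squared."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_selection(X, y, min_gain=1e-3):
    """Greedily add the column that most improves adjusted R-squared.

    min_gain is a hypothetical stopping threshold, not a standard value.
    """
    remaining = list(range(X.shape[1]))
    selected = []
    best = -np.inf
    while remaining:
        # Score every candidate model that adds one more column.
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score - best < min_gain:
            break  # no candidate improves the model enough
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic data (an assumption): only columns 0 and 2 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, best = forward_selection(X, y)
print("selected columns:", selected)
print("adjusted R-squared:", round(best, 3))
```

Backward elimination follows the same pattern in reverse: start with every column and repeatedly drop the one whose removal hurts the score least.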

Read More

Tags: keyconnect, correlations, mutual information, feature selection

Cost modeling with multiple linear regression: 2 rules to ensure quality

Posted by Bala Deshpande on Wed, Mar 21, 2012 @ 09:16 AM

We have described how cost models can be developed using regression. We have also described how to use these models for forecasting. Thus the model is used not only to develop a better understanding of the data (explanatory statistics) but also to make forecasts (predictive analytics).

Read More

Tags: cost modeling, multiple linear regression, correlations, linear regression models

2 ways to select predictors for regression models using RapidMiner

Posted by Bala Deshpande on Thu, Dec 15, 2011 @ 09:09 AM

Suppose you are building a multiple linear regression model using many factors. A first step is to reduce the number of factors or predictors. This process is known as feature selection or dimension reduction, among other names. Two key points need to be kept in mind:

Read More

Tags: decision trees, correlations, linear regression models, feature selection

Supply chain analytics: applying systems thinking in 4 steps part 2/2

Posted by Bala Deshpande on Fri, Oct 28, 2011 @ 08:00 AM

In part 1, we described a basic three-workstation assembly line, which is the "business end" of a supply chain. Here are the key management challenges in this supply chain: product assembly lead times are increasing and revenues are dropping (see chart below). Much of this complexity is driven by variability in the inventory levels of raw material parts coming into the assembly process.

Read More

Tags: correlations, systems thinking, supply chain analytics

Reasons why feature selection is important in predictive analytics

Posted by Bala Deshpande on Tue, Jul 05, 2011 @ 10:30 AM

Feature selection (also called data dimension reduction or variable screening) in predictive analytics refers to the process of identifying the few most important variables or parameters which help in predicting the outcome. In today's charged-up world of high-speed computing, one might be forgiven for asking: why bother? The most important reasons all come from practicality.

Read More

Tags: predictive analytics, business analytics, correlations, linear regression models, feature selection

Do IPL cricket statistics reveal logic behind player valuations?

Posted by Bala Deshpande on Mon, May 09, 2011 @ 11:10 AM

When Dirk Nannes was forced out by injury, Royal Challengers Bangalore actually struck a gold mine by signing up the ever-explosive Chris Gayle to replace him. Gayle's base price was $400,000 and, according to IPL rules, he could not be paid more than $650,000 because that was the value of the player he was replacing. Let us assume that was what he was finally signed for.

Read More

Tags: predictive analytics in sport, cricket statistics, correlations

When Principal Component Analysis makes sense in business analytics

Posted by Bala Deshpande on Fri, May 06, 2011 @ 11:01 AM

Principal component analysis (PCA) is, according to Wikipedia, a technique that "uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components."
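The definition can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up data, that computes the principal components through a singular value decomposition of the centered data. The function name `pca` and the synthetic columns are illustrative assumptions, not code from the original post.

```python
import numpy as np

def pca(X):
    """Return component scores, the orthogonal component matrix, and variances."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                           # orthogonal transformation of the data
    explained_variance = S ** 2 / (len(X) - 1)   # variance along each component
    return scores, Vt, explained_variance

# Synthetic data (an assumption): the first two columns are strongly correlated.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([
    a,
    0.8 * a + 0.2 * rng.normal(size=500),
    rng.normal(size=500),
])

scores, components, explained_variance = pca(X)
cov = np.cov(scores, rowvar=False)

print("correlation of original columns 0 and 1:",
      round(float(np.corrcoef(X[:, 0], X[:, 1])[0, 1]), 3))
print("covariance of component scores:")
print(np.round(cov, 6))
```

The off-diagonal entries of the printed covariance matrix are zero up to floating-point error, which is the "uncorrelated" property in the definition, and the rows of `components` are mutually orthogonal unit vectors.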

Read More

Tags: advanced business analytics, correlations, data mining, entropy, principal component analysis