Critical model steps for practical multiple linear regression - Pt 3

Posted by Bala Deshpande on Thu, Nov 20, 2014 @ 08:51 AM


In the first article of this series, we described the choice of variables for starting a multiple linear regression model. In the second article, we discussed how to build the model and evaluate/explain it to a business user. In this final article, we will make sure that we can correctly interpret the coefficients of the model. But before that we will need to ensure that the coefficients are statistically valid or meaningful.
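As a rough illustration of what "statistically valid" can mean for a coefficient, here is a minimal sketch that computes the t-statistic of a slope. It makes two assumptions that the excerpt above does not state: that the check is a t-test, and that there is only a single predictor (to keep the arithmetic readable). The function name and the sample data are made up for the example.

```python
import math

def slope_t_statistic(x, y):
    """Fit y = intercept + slope * x by least squares and return
    (slope, standard_error_of_slope, t_statistic)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope + intercept).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)
    return slope, se_slope, slope / se_slope

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se_slope, t_stat = slope_t_statistic(x, y)
print(f"slope={slope:.3f} se={se_slope:.4f} t={t_stat:.1f}")
```

A large absolute t-statistic suggests the coefficient is unlikely to be zero; in a multiple regression the same idea applies to each coefficient, with the standard errors taken from the full model.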

Read More

Time for small business to embrace machine learning (not fear it)

Posted by Bala Deshpande on Tue, Nov 18, 2014 @ 09:27 AM

Steve Ballmer recently came out strongly in favor of machine learning, calling it the next era of computer science. Another business legend, Elon Musk, recently said that the rapid pace of developments in artificial intelligence signals the end of humanity. So who is right? Musk may have a point: humans struggle to balance the benefits and risks of any new path-breaking technology. For example, we are still struggling to balance the pros and cons of nuclear technology more than 70 years after its breakthrough. But in this case, my vote is for Ballmer, as much as I recognize the dangers posed by the rush of any new technology. The truth, as always, is in between the extremes, as the Buddha says. I think Musk's fears of a Terminator-style Skynet may be a tad overblown. Here is my reasoning with a realistic example.

Read More

Tags: machine data analytics, data science

Coming soon: IoT analytics - the biggest use case for big data

Posted by Bala Deshpande on Thu, Nov 13, 2014 @ 08:00 AM

The most successful companies today are the ones that have acted upon their data assets by leveraging advanced analytics and big data technologies. If you think of Apple, Netflix, Google or Facebook, the one thing they all have in common (in addition to being "tech" companies) is that they have highly evolved analytics strategies and practices. One can safely say that all of these companies are "data" companies, and not device or video rental or search or social media companies. Add to this list a company like Domino's Pizza. I had the opportunity to visit their campus yesterday and was amazed by how data-driven the business is. In fact, Domino's proudly announces in their lobby that, outside of Amazon and Google, they are the largest consumer of big data analytics. So who exactly are the actual users of all of this big data and analytics? Today the largest users of big data in business are the folks in marketing. They need to leverage Hadoop for everything from sentiment analysis to real-time product recommendations.

Read More

Tags: internet of things, big data, sensor data analytics

Critical model steps for practical multiple linear regression - Pt 2

Posted by Bala Deshpande on Tue, Nov 11, 2014 @ 09:44 AM


In the previous post, we described the first couple of steps required for setting up multiple linear regression models for prediction. These steps focused mainly on exploring the predictors or variables in the data set that would influence the outcome. It was also mentioned that wrapper-type feature selection methods such as forward selection or backward elimination are usually used to select the variables which will go into the model. In this article, we will look at how one of these methods can be employed to build a model and, once the model is built, how to quantify its performance. In particular, we will explain the differences between using the adjusted R² and the standard error of the regression estimate to evaluate model performance.
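The two performance measures named above can be sketched in a few lines. This is a minimal illustration using the standard textbook formulas, not the code from the article; the function name, the sample values, and the choice of k = 1 predictor are assumptions made for the example.

```python
import math

def regression_metrics(y, y_hat, k):
    """Return (r2, adjusted_r2, standard_error) for a fitted model.

    y      -- observed values
    y_hat  -- values predicted by the model
    k      -- number of predictors in the model (excluding the intercept)
    """
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean_y = sum(y) / n
    sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))   # unexplained variation
    sst = sum((a - mean_y) ** 2 for a in y)             # total variation
    r2 = 1.0 - sse / sst
    # Adjusted R-squared penalizes every extra predictor.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    # Standard error of the regression, in the units of y.
    standard_error = math.sqrt(sse / (n - k - 1))
    return r2, adjusted_r2, standard_error

# Hypothetical observed and predicted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adjusted_r2, standard_error = regression_metrics(y, y_hat, k=1)
print(f"R2={r2:.4f} adjusted R2={adjusted_r2:.4f} std error={standard_error:.4f}")
```

The adjusted R² is a unitless share of explained variation, while the standard error is expressed in the units of the outcome, which often makes it easier to explain to a business user.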

Read More

Tags: multiple linear regression

How to progress along the analytics maturity model

Posted by Bala Deshpande on Thu, Nov 06, 2014 @ 09:35 AM

Today’s business world is filled with opportunities to review the way that organizations collect, organize, and present information. If you were to evaluate your organization’s current ability to report and analyze information, what analytic stage would you say your organization is at? According to the now-classic book "Competing on Analytics" by Thomas Davenport and Jeanne Harris, there are 5 main stages in the analytics maturity model:

Read More

Tags: analytics maturity grader

Critical modeling steps for practical multiple linear regression: Pt 1

Posted by Bala Deshpande on Tue, Nov 04, 2014 @ 09:00 AM

Regression models have been around for a couple of centuries now, and yet the utility of the technique is unsurpassed because it is a good candidate for any present-day application that requires a numerical prediction or statistical abstraction. In analytics consulting practice, it is usually the go-to technique whenever there is a need for a tool to explain to customers what their data means.

Read More

Tags: predictive analytics, multiple linear regression

Analytics using SQL Server and RapidMiner: set up connections

Posted by Bala Deshpande on Fri, Oct 31, 2014 @ 07:00 AM


In this article we will cover the main steps required to connect a data mining tool such as RapidMiner to popular database management systems such as Microsoft SQL Server. We use RapidMiner 6.x and SQL Server 2012 for this how-to article, but the steps should work in earlier and newer versions of the tools, as the basics have not changed much.

Read More

Tags: data mining with rapidminer, databases

Cost modeling using commodity price forecasting

Posted by Bala Deshpande on Tue, Oct 14, 2014 @ 09:39 AM

Commodity price forecasting is an important activity for many different industry verticals. The underlying objective for commodity price forecasting - as with any forecasting activity - is quite simple: to predict future behavior of a variable quantity. The users of such analytics are typically operations and supply chain managers.

Read More

Tags: cost modeling, cost forecasting, advanced business analytics, time series analysis

Collaborative filtering using RapidMiner: item based recommenders

Posted by Bala Deshpande on Mon, Oct 06, 2014 @ 11:44 AM

Continuing our series on collaborative filtering, we now turn to item-based recommenders. In the second article of the series, we discussed user-based recommenders, and in the third article we explained the key differences between user-based and item-based recommendation systems. Remember that both types of recommendation system aim to rank different items for each user in the database. Item-based recommenders first create a matrix that measures the similarities between all pairs of items (which in our series are the different movies). For any given user, an unrated item is then matched with a short list (the k nearest neighbors) of the "most" similar items. The final predicted rating for that item is a weighted average of the ratings the user gave to those k similar items. We start with a depivoted utility matrix such as the one shown below; the item-based recommender then generates an item similarity matrix, from which it finally creates a table of predicted ratings for unrated items (which is the final product).


The Utility Matrix
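The item-based procedure described above can be sketched in plain Python. This is an illustration only, not the RapidMiner process from the article: cosine similarity is an assumed choice of similarity measure, and the function names and the small depivoted table of (user, item, rating) rows are hypothetical.

```python
import math

# Hypothetical depivoted utility matrix: one (user, item, rating) row per rating.
rows = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "B", 2), ("u2", "C", 5),
    ("u3", "A", 1), ("u3", "B", 5), ("u3", "C", 2),
]

def pivot(rows):
    """Turn depivoted rows into {user: {item: rating}}."""
    ratings = {}
    for user, item, rating in rows:
        ratings.setdefault(user, {})[item] = rating
    return ratings

def item_similarity(ratings, a, b):
    """Cosine similarity of items a and b over the users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item, k=2):
    """Weighted average of the user's ratings for the k most similar items."""
    rated = ratings[user]
    neighbors = sorted(
        ((item_similarity(ratings, item, other), other)
         for other in rated if other != item),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors for this user and item
    return sum(sim * rated[other] for sim, other in neighbors) / total

ratings = pivot(rows)
prediction = predict_rating(ratings, "u1", "C", k=2)
print(f"predicted rating of item C for user u1: {prediction:.2f}")
```

Because the weights are the similarities themselves, the prediction is pulled toward the user's rating of whichever rated item looks most like the unrated one.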

Read More

Tags: recommender systems, item based recommender system

Data and Analytics form 2 of the 4 key pieces in internet of things

Posted by Bala Deshpande on Tue, Sep 23, 2014 @ 06:33 AM

This article was contributed by Vaibhav Waghmare.

Read More

Tags: technology reviews, internet of things, big data