Commodity price forecasting is an important activity in many industry verticals. The underlying objective of commodity price forecasting, as with any forecasting activity, is quite simple: to predict the future behavior of a variable quantity. The users of such analytics are typically operations and supply chain managers.
Continuing our series on collaborative filtering, we now turn to item based recommenders. In the second article of the series we discussed user based recommenders, and in the third article we explained the key differences between user based and item based recommendation systems. Remember that either type of recommendation system aims to rank different items for each user in the database. Item based recommenders first create a matrix that measures the similarities between all pairs of items (which in our series are the different movies). Then, for a given user and an item that user has not yet rated, the recommender picks a short-list of the k items most similar to it (its k-nearest neighbors) from among the items the user has already rated. The predicted rating for the unrated item is a weighted average of the user's ratings for those k similar items. We start with a de-pivoted utility matrix such as the one shown below; the item based recommender then generates an item similarity matrix, from which it finally creates a table of predicted ratings for the unrated items (the final product).
The Utility Matrix
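The pipeline described above (de-pivoted ratings, then an item similarity matrix, then a table of predicted ratings for unrated items) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner process used in this series: the toy ratings and the IDs `u1`..`u3` and `m1`..`m4` are hypothetical, cosine similarity is assumed as the similarity measure, the "weighted average" is assumed to use similarities as weights, and k = 2 is an arbitrary choice.

```python
from math import sqrt

# Hypothetical de-pivoted utility matrix: one (userID, itemID, rating) row per rating.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3), ("u1", "m3", 4),
    ("u2", "m1", 4), ("u2", "m2", 2), ("u2", "m4", 5),
    ("u3", "m2", 4), ("u3", "m3", 5), ("u3", "m4", 2),
]

def by_item(triples):
    """Group ratings by item: {itemID: {userID: rating}}."""
    items = {}
    for user, item, rating in triples:
        items.setdefault(item, {})[user] = rating
    return items

def cosine(a, b):
    """Assumed similarity measure: cosine of two sparse item vectors (unrated = 0)."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_similarities(items):
    """Step 1: the item similarity matrix, as a dict keyed by (item, other_item)."""
    return {(i, j): cosine(items[i], items[j])
            for i in items for j in items if i != j}

def predict_unrated(triples, k=2):
    """Step 2: for each (user, unrated item), take the k items the user DID rate
    that are most similar to it, and return the similarity-weighted average
    of the user's ratings for them."""
    items = by_item(triples)
    sims = item_similarities(items)
    users = {}
    for user, item, rating in triples:
        users.setdefault(user, {})[item] = rating
    predictions = {}
    for user, rated in users.items():
        for item in items:
            if item in rated:
                continue  # only unrated items get a prediction
            top = sorted(((sims[(item, j)], r) for j, r in rated.items()),
                         key=lambda t: t[0], reverse=True)[:k]
            weight = sum(s for s, _ in top)
            if weight > 0:
                predictions[(user, item)] = sum(s * r for s, r in top) / weight
    return predictions

# The "final product": predicted ratings for every unrated (user, item) pair.
for (user, item), score in sorted(predict_unrated(ratings).items()):
    print(user, item, round(score, 2))
```

Because the weights are non-negative similarities, each prediction always falls between the lowest and highest of the neighbor ratings it averages.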
This article was contributed by Vaibhav Waghmare.
In a previous article we discussed building user based recommenders in RapidMiner. Before we move on to the next type, item based recommenders, let us try to understand the key differences between the two. From a descriptive standpoint, a user based recommender measures similarities between different users and uses the ratings from the top k users closest (in similarity) to a given user to arrive at a rating prediction or recommendation for that user. An item based recommender measures similarities between different items and picks the top k items closest (in similarity) to a given item to arrive at a rating prediction or recommendation for a given user on that item.
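One way to see the difference is that both recommenders run the same neighborhood calculation and differ only in which axis of the utility matrix supplies the neighbors. The sketch below assumes cosine similarity, a similarity-weighted average and k = 2; the utility matrix, the IDs and the helper names `predict` and `transpose` are hypothetical and for illustration only.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing entries treated as 0)."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(matrix, row, col, k=2):
    """Neighborhood prediction for matrix[row][col]: take the other rows that
    have a value in `col`, keep the k most similar to `row`, and return the
    similarity-weighted average of their values in `col`."""
    neighbors = [(cosine(matrix[row], vec), vec[col])
                 for other, vec in matrix.items()
                 if other != row and col in vec]
    top = sorted(neighbors, key=lambda t: t[0], reverse=True)[:k]
    weight = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / weight if weight > 0 else None

def transpose(matrix):
    """Swap rows and columns: {user: {item: r}} becomes {item: {user: r}}."""
    out = {}
    for row, vec in matrix.items():
        for col, value in vec.items():
            out.setdefault(col, {})[row] = value
    return out

# Hypothetical utility matrix: {userID: {itemID: rating}}
users = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m4": 5},
    "u3": {"m2": 4, "m3": 5, "m4": 2},
}

user_based = predict(users, "u1", "m4")             # neighbors are similar users
item_based = predict(transpose(users), "m4", "u1")  # neighbors are similar items
print(f"user-based: {user_based:.2f}, item-based: {item_based:.2f}")
```

Running it prints `user-based: 3.34, item-based: 3.97`: the same user and item receive different predictions depending on whether similar users or similar items are consulted.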
If you are simply collecting a lot of data, you do not have a big data use case. You do have a big data use case, however, if you need to process and analyze the collected data in order to generate greater business value. A commonly cited example: storing millions of records of your customers' personal info in a database is not a big data use case, but collecting web logs from millions of hits to your site from online transactions probably is.
In an earlier article we discussed the overall intuition behind collaborative filtering based recommender systems. In this article we describe the details of building one using a popular GUI based tool, RapidMiner. As mentioned in the introduction to recommender systems, the key piece of data you will need to start with is the utility matrix. For a practical implementation, however, the utility matrix has to be de-pivoted. Recall that a utility matrix in this context consists of rows, which represent different users, and columns, which represent the different items that have been rated. This is sometimes called a cross-tabulation. To read it into a tool like RapidMiner, we have to de-pivot it into a table with only 3 columns: a userID, an itemID and a rating. Each row then contains a unique piece of data: the rating provided by a particular user for a particular item. See image below.
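The de-pivot step itself is a simple reshape. A minimal sketch in plain Python, assuming a hypothetical cross-tabulated utility matrix in which `None` marks a cell the user has not rated (the item list, user IDs and the `depivot` helper are illustrative, not part of RapidMiner):

```python
# Hypothetical cross-tabulated utility matrix: one row per user, one column per
# item; None marks an item the user has not rated.
items = ["m1", "m2", "m3", "m4"]
utility = {
    "u1": [5, 3, 4, None],
    "u2": [4, 2, None, 5],
    "u3": [None, 4, 5, 2],
}

def depivot(utility, items):
    """Turn the user x item cross-tab into (userID, itemID, rating) rows,
    skipping blank cells so that each row holds exactly one actual rating."""
    rows = []
    for user, ratings in utility.items():
        for item, rating in zip(items, ratings):
            if rating is not None:
                rows.append((user, item, rating))
    return rows

# Emit the 3-column table as CSV lines, ready to be read into a tool.
print("userID,itemID,rating")
for row in depivot(utility, items):
    print(*row, sep=",")
```

Dropping the blank cells matters: a sparse utility matrix is mostly empty, and the de-pivoted table only needs to store ratings that actually exist. If you work in pandas instead, `DataFrame.melt` followed by `dropna()` performs the same reshape.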
You know that predictive analytics has really arrived when there are two articles about it in a single issue of Time magazine (Aug 18, 2014 issue). While it is a rapidly maturing field (according to Gartner's hype cycle, it is already at its plateau of productivity), there are still some areas where improvements are being worked on.
Thanks to Amazon, Netflix and the like, recommender engines have become synonymous with predictive analytics and are in many ways the benchmark indicator of predictive analytics maturity in an organization, particularly in retail and ecommerce. Their adoption will spread to more non-tech companies and even smaller businesses, thanks to the open source movement in analytics and big data infrastructure. Eventually, all content providers (including this website, via Add This!) will make use of recommender engines.
Recently I came across an interesting factoid that went mostly unnoticed by the media, because for most people it is not earth-shattering news: "... the operating cost of some robots is now less than the salary of an average Chinese worker". This has tremendous implications for manufacturing in the near future (10 years at most), when most mundane and low-level jobs will be taken over by machines, especially connected machines. Typical of the jobs that will move away from humans and toward machines are assembly line work (handled mostly by robots) and complex manufacturing (3D printers). In the non-manufacturing world, too, the impact of connected machines will be clear: taxi drivers and chauffeurs (replaced by self-driving cars), delivery men (by drones), pharmacists and even personal physicians (by smart Watson-type programs) and so on. This is as game-changing as the internet was barely 20 years ago.
This article was contributed by Vaibhav Waghmare.