Twice weekly articles to help SMB companies optimize business performance with data analytics and to improve their analytics expertise.
This article was contributed by Vaibhav Waghmare.
If you are simply collecting a lot of data, you do not have a big data use case. You do have one, however, if you need to process and analyze the collected data in order to generate greater business value. A commonly cited example: storing millions of records of your customers' personal information in a database is not a big data use case. However, if you are collecting web logs from millions of hits to your site from online transactions, then that probably is.
In an earlier article we discussed the overall intuition behind recommender systems based on collaborative filtering. In this article we will describe the details of building one using a popular GUI-based tool such as RapidMiner. As mentioned in the introduction to recommender systems, the key piece of data that you will need to start with is the utility matrix. Recall that a utility matrix in this context consists of rows which represent different users and columns which represent different items that have been rated. This is sometimes called a cross-tabulation. For practical implementation, however, the utility matrix has to be de-pivoted: to read it into a tool like RapidMiner, we have to convert it into a table which has only 3 columns: a userID, an itemID and a rating. Thus each row contains a unique piece of data: the rating provided by a particular user for a particular item. See image below.
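As a rough illustration of the de-pivoting step outside of RapidMiner, here is a minimal Python sketch. The user names, item names and ratings are made up for the example, and the choice to skip unrated cells (marked with `None`) is an assumption, not something prescribed by the tool.

```python
# Hypothetical utility matrix: one row per user, one column per item.
# None marks an item the user has not rated (an assumption for this sketch).
ITEMS = ["item_a", "item_b", "item_c"]
UTILITY_MATRIX = {
    "user_1": [5, None, 3],
    "user_2": [None, 4, None],
    "user_3": [2, 1, None],
}


def depivot(matrix, items):
    """Convert a user-by-item matrix into (userID, itemID, rating) rows."""
    rows = []
    for user_id, ratings in matrix.items():
        for item_id, rating in zip(items, ratings):
            # Keep only cells that actually hold a rating.
            if rating is not None:
                rows.append((user_id, item_id, rating))
    return rows


long_table = depivot(UTILITY_MATRIX, ITEMS)
for row in long_table:
    print(row)
```

Each tuple in `long_table` corresponds to one row of the 3-column table described above: a userID, an itemID and the rating that user gave that item.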
You know that predictive analytics has really arrived when there are two articles about it in a single issue of Time magazine (Aug 18, 2014). While it is a rapidly maturing field (according to Gartner's hype cycle, it is already at its plateau of productivity), there are still some areas where improvements are being worked on.
Thanks to Amazon, Netflix and the like, recommender engines have become synonymous with predictive analytics and in many ways are the benchmark indicators of predictive analytics maturity in an organization, particularly in retail and ecommerce. Their adoption is going to spread to more non-tech companies and even smaller businesses, thanks to the open source movement in analytics and big data infrastructure. At the end of the day, all content providers (including this website, via Add This!) will make use of recommender engines.
Recently I came across an interesting factoid which went mostly unnoticed by the media, because for most people it is not earth-shattering news: "... the operating cost of some robots is now less than the salary of an average Chinese worker". This has tremendous implications for manufacturing in the near future (10 years at the maximum), when most mundane and low-level jobs will be taken over by machines, especially connected machines. Typical of the jobs which will move away from humans and towards machines are assembly line work (to be handled mostly by robots) and complex manufacturing (3D printers). In the non-manufacturing world too, the impact of connected machines will be clear: taxi drivers and chauffeurs (replaced by self-driving cars), delivery men (by drones), pharmacists and even personal physicians (by smart Watson-type programs) and so on. This is as game-changing as the internet was barely 20 years ago.
Portions of this article were contributed by Vaibhav Waghmare.
Supplier companies which manufacture commoditized products face constant cost pressures. As a way to separate themselves from the herd, many savvy suppliers are realizing that what may set them apart is the ability to help their customers' buyers make informed decisions. One manufacturer of paper designs which supplies national retailers has realized that, in addition to providing the buyer with designs and sourcing information, it can also act as a trusted advisor to its customers by identifying and recommending trendy designs. This is something predictive analytics can provide, starting with some basic time series forecasting.
There are many issues that impact customer retention for businesses. The cost of neglecting these can be enormous and in many ways is directly proportional to the size of a business. For example, for large automotive companies which sell millions of vehicles per year, a single percentage point decrease in customer retention can mean millions of dollars of lost business and untold loss in brand image. But this does not mean that small and medium manufacturers do not face this problem. While the absolute scale of the problem (in dollar figures) may not suggest a massive issue, relatively speaking, these same problems could devastate a smaller company.