There is a clear sense in the marketplace today that for the internet of things (IoT) to realize its true potential as the next big thing, analytics is going to be critical. After all, what is the purpose of connecting all these devices and gathering the data if we are not going to do anything with it?
Unfortunately, as it stands today, less than 1% of all the data that is being collected is acted upon. And most of this action is still in the descriptive analytics phase, where attention is focused on "what happened". As any data scientist can attest, this is just the tip of the iceberg. Next in complexity and sophistication is diagnostic analytics, followed by predictive and eventually prescriptive analytics.
Businesses today seem to be focused (somewhat correctly) on the phase that actually precedes the descriptive analytics effort. Most companies we are engaged with are still battling with how to effectively gather, store and transform the data so that some analytics may be done on it. However, this battle has already been won in many verticals, most notably in the technology space. Companies such as Facebook and Google have figured out how to do this cost-effectively, and their lessons translate readily to manufacturing and IoT. Their answer is big data technologies such as Hadoop.
Similarly, technology companies have also figured out (for themselves) how to convert this massive data into actionable business insight. But is this learning also translatable to IoT? Can manufacturers - who are going to be the key stakeholders here - simply take what the tech companies have done and apply it to their own data?
This is not as straightforward as the big data infrastructure piece. While Hadoop may work for all generators of big data, the analytics will not carry over so easily. There are many challenges, and the three below are the most important technical ones.
Challenge 1: Data structures
Most sensors send out data with a time stamp: some event or phenomenon was captured at a specific point in time. Most of the data is "boring", with nothing happening for much of the time. However, once in a while something serious happens and needs to be attended to. While static alerts based on thresholds are a good starting point for analyzing this data, they cannot help us advance to the diagnostic, predictive or prescriptive phases. There may be relationships between data points collected at specific time intervals. In other words, these are classic time series challenges. This is something that most technology companies have not had to handle much in their big data experiences.
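To make the limitation of static thresholds concrete, here is a minimal sketch in Python. The readings, the threshold of 80.0, the window size and the z-score limit are all made-up values for illustration, and the rolling check is only one simple way to look at a reading relative to its recent history.

```python
from statistics import mean, stdev

def static_alerts(readings, threshold):
    """Return indices whose value exceeds a fixed threshold."""
    return [i for i, value in enumerate(readings) if value > threshold]

def rolling_anomalies(readings, window=5, z_limit=3.0):
    """Return indices that deviate strongly from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Compare the current reading with its own recent past.
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# Hypothetical temperature readings with one brief excursion at index 6.
readings = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 74.5, 70.2, 70.0, 70.1]

print(static_alerts(readings, threshold=80.0))  # the fixed threshold sees nothing
print(rolling_anomalies(readings))              # the rolling check flags index 6
```

The excursion never crosses the fixed threshold, yet it stands out clearly against the preceding readings, which is the kind of time-dependent relationship a static alert cannot capture.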
Challenge 2: Combining multiple data formats
While there are established techniques and processes for handling time series data, the insights that really matter cannot come from sensor data alone. There are usually strong correlations between sensor data and other unstructured data. For example, a series of control unit fault codes may result in a specific service action that is recorded by a mechanic. Similarly, a set of temperature readings may be accompanied by a sudden change in the macroscopic shape of a part, which can be captured in an image, or by a change in the audible frequency of a spinning shaft. We need to develop techniques that effectively combine structured data with unstructured text, image or audio data.
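As a rough illustration of the fault code example, the sketch below links structured fault codes to a mechanic's free-text note by time proximity and pulls a few keywords out of the text. The fault codes, the note, the keyword list and the 24-hour window are all hypothetical, and real text would need far more than keyword matching.

```python
from datetime import datetime, timedelta

# Structured data: (timestamp, fault code). Codes are invented for illustration.
fault_codes = [
    (datetime(2024, 1, 1, 8, 0), "F101"),
    (datetime(2024, 1, 1, 8, 5), "F102"),
    (datetime(2024, 1, 3, 9, 0), "F207"),
]

# Unstructured data: (timestamp, free-text service note).
service_notes = [
    (datetime(2024, 1, 1, 14, 0),
     "Replaced ignition coil on cylinder 1 after misfire."),
]

# Assumed vocabulary of terms worth extracting from the notes.
KEYWORDS = ("replaced", "misfire", "leak")

def link_faults_to_notes(faults, notes, max_gap=timedelta(hours=24)):
    """For each note, collect fault codes seen within max_gap before it."""
    linked = []
    for note_time, text in notes:
        codes = [code for fault_time, code in faults
                 if timedelta(0) <= note_time - fault_time <= max_gap]
        lowered = text.lower()
        keywords = [kw for kw in KEYWORDS if kw in lowered]
        linked.append({"note_time": note_time,
                       "codes": codes,
                       "keywords": keywords})
    return linked

linked = link_faults_to_notes(fault_codes, service_notes)
print(linked[0]["codes"], linked[0]["keywords"])
```

Each resulting row pairs a sequence of fault codes with the service action that followed, which is the combined record an analysis would start from.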
Challenge 3: Predictive analytics needs cross-sectional data
While time series analysis is good for forecasting sales or identifying seasonal patterns, most standard machine learning algorithms do not work directly on raw time series. They require what is considered cross-sectional data, where each row is an independent observation. Cross-sectioning time data requires special aggregation techniques, and this can cause data imbalance: the rare failure events end up in only a handful of rows. Data imbalance is not a new challenge - fraud detection algorithms have had to deal with it for a while - so some of this learning is translatable. However, to get the most out of machine learning, we need to develop ways to convert time series that do not lose critical information.
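One simple way to picture cross-sectioning is sketched below: the series is cut into fixed windows, each window is summarized into a row of features, and the row is labeled by whether a failure occurs in the following window. The readings, the failure index, the window size and the chosen features are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean

def to_cross_section(readings, failures, window=4):
    """Build one row per non-overlapping window.

    The label is 1 if any failure index falls in the next window.
    """
    rows = []
    for start in range(0, len(readings) - 2 * window + 1, window):
        chunk = readings[start:start + window]
        next_window = range(start + window, start + 2 * window)
        rows.append({
            "mean": mean(chunk),
            "max": max(chunk),
            # Keeping a trend feature preserves some of the ordering
            # that plain averages would throw away.
            "trend": chunk[-1] - chunk[0],
            "label": int(any(i in failures for i in next_window)),
        })
    return rows

# Hypothetical sensor readings; a failure is recorded at index 13.
readings = [50, 51, 50, 52, 51, 50, 52, 51, 53, 56, 60, 66, 70, 71, 69, 70]
failures = {13}

rows = to_cross_section(readings, failures)
print(Counter(row["label"] for row in rows))
```

Even in this tiny example only one of the three rows carries a positive label, which hints at how quickly the imbalance grows on real data where failures are far rarer.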
There are also a few non-technical (or market-based) challenges such as talent, culture and data security, but they are not unique to manufacturing or IoT, so we have left them out here. Your thoughts?