There is a lot of expectant talk about the potential of analytics applied to sensor- or machine-collected data. This is the other big data. Consider this: in a single flight from New York to Los Angeles, the sensors mounted on the engines of a commercial airliner generate up to 240 terabytes of data. This data can reveal patterns in energy usage and mechanical performance, which can help reduce fuel and maintenance costs and improve safety.
Similarly, a lot of data is generated by building usage. It is estimated that improperly maintained or controlled equipment in buildings can account for energy wastage of up to 30%. The potential savings are huge, provided we can measure and control energy consumption accurately. To do that, we also need to be able to predict usage patterns. The scale of this data can be monstrous, depending on how frequently it is sampled. For example, building electrical meters are typically read once a month; if they are instead monitored every hour or every minute, the data becomes enormous in both volume and velocity. So once again, we have big (no, enormous) data coming from sensors or machines.
In this article we describe the "baby" steps one can take to monitor and predict energy consumption in buildings. Many organizations use energy management systems for their buildings. Such a system can generate energy consumption data at 1-minute, 15-minute, or hourly intervals, collected from multiple channels for a specific location. Consolidating this data for a year or two results in millions of rows of energy usage values at different times of the day and for different channels. (Multiply this by hundreds or thousands of buildings and you get the big data picture.)
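To make the "millions of rows" claim concrete, here is a rough back-of-the-envelope sketch. The channel count (20) and the two-year window (730 days) are illustrative assumptions, not figures from any real site; the function name is hypothetical.

```python
# Rough row-count estimate for interval meter data from one building.
# All inputs below are illustrative assumptions, not measured values.

MINUTES_PER_DAY = 24 * 60


def estimate_rows(interval_minutes, channels, days):
    """One row per channel per sampling interval."""
    readings_per_day = MINUTES_PER_DAY // interval_minutes
    return readings_per_day * channels * days


if __name__ == "__main__":
    # Compare hourly, 15-minute and 1-minute sampling over two years.
    for interval in (60, 15, 1):
        rows = estimate_rows(interval, channels=20, days=730)
        print(f"{interval:>2}-minute interval: {rows:,} rows")
```

Under these assumptions, 15-minute sampling already gives about 1.4 million rows for a single building, and 1-minute sampling about 21 million.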
This data clearly has a lot of valuable information. For example, we can derive from this data the following useful information:
- What period during the year was the energy consumption highest?
- Which usage areas can yield the most savings on utility bills?
- What are the energy consumption trends for various channels?
- Can we predict energy consumption based on HDD and CDD for a particular period? (Note: HDD stands for heating degree days and CDD stands for cooling degree days. HDD measures how far, and for how many days, the average outside temperature was below a certain baseline in a given period, and therefore how much heating the building needed. Similarly, CDD measures how far, and for how many days, the average outside temperature was above the baseline, and therefore how much cooling the building needed.)
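As a minimal sketch of how degree days are commonly computed from daily mean temperatures: for each day, take the difference between the baseline and the mean temperature, keep only the positive part, and sum over the period. The 18.0 °C baseline and the sample temperatures below are assumptions for illustration; real baselines vary by region and building, and the function name is hypothetical.

```python
# Degree-day sketch. The baseline (18.0 degrees C) and sample data are
# illustrative assumptions, not values from the project described here.

def degree_days(daily_mean_temps, baseline=18.0):
    """Return (hdd, cdd) for a sequence of daily mean outside temperatures."""
    # Heating degree days: degrees below the baseline, summed over days.
    hdd = sum(max(0.0, baseline - t) for t in daily_mean_temps)
    # Cooling degree days: degrees above the baseline, summed over days.
    cdd = sum(max(0.0, t - baseline) for t in daily_mean_temps)
    return hdd, cdd


if __name__ == "__main__":
    week = [10.0, 12.5, 18.0, 21.0, 25.5, 16.0, 19.0]
    hdd, cdd = degree_days(week)
    print(f"HDD = {hdd:.1f}, CDD = {cdd:.1f}")
```

A day exactly at the baseline contributes to neither total, and a single day never contributes to both.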
To answer these questions, we first need to collect all the data from the various sources in a central location, called the staging area. This includes:
- Weather data from various weather stations. (Note: to relate consumption to external temperature, we need weather data. Once we have it, we can calculate HDD and CDD values and use them to predict expected consumption.)
- Channel data from various sites
- Administrative data for all those sites
- Dimension data related to the sites, channels etc.
Once the data has arrived in the staging area through suitable ETL techniques, it is time to move it to a warehouse. The design of the warehouse depends on the reporting and analysis requirements: develop the warehouse to meet those requirements, then move the data from staging into it. Once the initial load is complete for the last couple of years or so, the ETL activity can continue on a daily basis. With the data available in the warehouse, there are numerous reporting tools that can be employed. For example, MicroStrategy provides a free Mobile and Reporting suite with a limited number of users and features. One can also use more self-service BI tools such as Tableau.
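The staging-to-warehouse step can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names (`staging_readings`, `fact_monthly_usage`), the columns, the site name, and the sample rows are all hypothetical, and a real warehouse would be designed around the actual reporting requirements.

```python
import sqlite3


def build_warehouse(readings):
    """Load raw readings into a staging table, then aggregate them
    into a monthly fact table. Schema names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging_readings "
        "(site TEXT, channel TEXT, read_at TEXT, kwh REAL)"
    )
    conn.executemany(
        "INSERT INTO staging_readings VALUES (?, ?, ?, ?)", readings
    )
    # read_at is an ISO timestamp, so its first 7 characters are YYYY-MM.
    conn.execute(
        """
        CREATE TABLE fact_monthly_usage AS
        SELECT site, channel,
               substr(read_at, 1, 7) AS month,
               SUM(kwh) AS total_kwh
        FROM staging_readings
        GROUP BY site, channel, substr(read_at, 1, 7)
        """
    )
    return conn


def peak_month(conn, site):
    """Return (month, total_kwh) for the site's highest-consumption month."""
    return conn.execute(
        """
        SELECT month, SUM(total_kwh) AS kwh
        FROM fact_monthly_usage
        WHERE site = ?
        GROUP BY month
        ORDER BY kwh DESC
        LIMIT 1
        """,
        (site,),
    ).fetchone()


if __name__ == "__main__":
    # Made-up sample rows: (site, channel, timestamp, kWh).
    sample = [
        ("HQ", "lighting", "2013-01-15T10:00", 120.0),
        ("HQ", "hvac", "2013-01-15T10:00", 300.0),
        ("HQ", "lighting", "2013-07-15T10:00", 100.0),
        ("HQ", "hvac", "2013-07-15T10:00", 450.0),
    ]
    conn = build_warehouse(sample)
    print(peak_month(conn, "HQ"))
```

A query like `peak_month` corresponds to the question "What period during the year was the energy consumption highest?"; a reporting tool would issue similar aggregations against the warehouse tables.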
Similarly, for predictive analytics, RapidMiner, R, or even Excel can be used for further analysis of the data available in the warehouse. In a recently completed project, the following activities were performed:
- Collecting data from diverse sources at the staging location
- Moving this staging data to the warehouse
- Creating MicroStrategy objects for reporting needs
- Developing reports using those objects
- Developing a linear regression model for usage prediction
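The last step can be illustrated with a small least-squares fit of usage against HDD and CDD. This is a sketch, not the model built in the project: the form usage = b0 + b1·HDD + b2·CDD is an assumption, the monthly figures are synthetic (generated from known coefficients so the fit can be checked), and all function names are hypothetical. It uses only the standard library; in practice a tool such as R or RapidMiner would do the fitting.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular."""
    n = len(b)
    # Build the augmented matrix [a | b].
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_usage_model(hdd, cdd, usage):
    """Least-squares fit of usage = b0 + b1*HDD + b2*CDD
    via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, h, c] for h, c in zip(hdd, cdd)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, usage)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, hdd, cdd):
    """Predicted usage for a period with the given HDD and CDD."""
    b0, b1, b2 = coeffs
    return b0 + b1 * hdd + b2 * cdd


if __name__ == "__main__":
    # Synthetic monthly data generated from usage = 1000 + 2*HDD + 3*CDD.
    hdd = [400.0, 300.0, 150.0, 20.0, 0.0, 0.0]
    cdd = [0.0, 0.0, 10.0, 80.0, 200.0, 250.0]
    usage = [1800.0, 1600.0, 1330.0, 1280.0, 1600.0, 1750.0]
    coeffs = fit_usage_model(hdd, cdd, usage)
    print("coefficients:", [round(c, 3) for c in coeffs])
    print("predicted usage:", round(predict(coeffs, 100.0, 50.0), 1))
```

Here the intercept b0 plays the role of base load that does not depend on weather, while b1 and b2 estimate the extra consumption per heating and cooling degree day. Real data will not fit exactly, so the residuals should be inspected before trusting any prediction.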
Top image courtesy: http://www.flickr.com/photos/araswami/986293551/