: Big Data Analytics for Renewable Energy Mark J. Embrechts Dept. Industrial and Systems Engineering Rensselaer Polytechnic Institute, Troy, NY, USA What is Data Mining? Data Mining Big Data Analytics Data-Driven Science Data-Driven Engineering Renewable Energy Challenges Promise of Big Data Analytics for Renewable Energy Science: what is possible Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, ) CFES 2012-2013 Annual Conference January 25, 2013
Data Mining as a Structured Process data prospecting and surveying database select selected data preprocess & transform transformed data make model Interpretation& rule formulation
Early Motivating Applications for Data Mining Database Marketing Algo-Trading Market Basket Analysis www.information-drivers.com/market_basket_analysis.php
How is data mining different? Different from what? (statistics) Data can be unstructured (e.g., text) Data can come from different sources and have conflicts/missing data/outliers Usually a data information step is required Data pre-processing requires most of the time Data might not fit in memory Data mining is the process of the automated discovery of new, interesting, potential useful information from large amounts of data
Data Mining Pyramid Wisdom Understanding Knowledge Information Data Information: reduction and structuring of data without loss of domain-specific knowledge
Data Mining Challenges Large/huge/humongous data sets - Data sets can be rich in the number of data - Data sets can be rich in the number of attributes - Unlabeled data (data labeling might be expensive) - Data quality an data uncertainty Data preprocessing and feature definition for structuring data - Data representation - Attribute/Feature selection - Transforms and scaling Scientific data mining - Classification, multiple classes, regression - Continuous and binary attributes - Large datasets - Nonlinear Problems Erroneous data, outliers, novelty, and rare events - Erroneous & conflicting data - Outliers - Rare events - Novelty detection Smart visualization techniques Feature selection & Rule formulation Special outcomes: Associations (e.g., NETFLIX), nuggets, enrichment Recent challenges: causality, active learning, multi-classes, big data
Docking ligands is a nonlinear problem: Example crowdsourcing for anthrax drug
Data-Driven Science Data-Driven Engineering Example of data-driven science: Cholera epidemic root causes Examples of data-driven engineering - Denoising of windmill sensor data - Sensors for the early detection of levee failure - Sensing network instability - Load forecasting - Forecasting reactive power - Histology and electrophysiology for early failure detection of brain-implant devices Science: what is possible Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, )
Design of Capacitor Batteries with QSPR
Data Mining Big Data Analytics NYT May 5, 2012 Mystery of Big Data s Parallel Universe Brings Fear, and a Thrill
A brief history of Big Data http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/ 4-19-2010 - Danah Boyd, Privacy and Publicity in the context of Big Data. Keynote WWW 2010 http://www.danah.org/papers/talks/2010/www2010.html May 2011 - James Manyika et al. Big Data: The next frontier for innovation, competition, and productivity. (McKinsey Global Institute Report). http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation 6-6-2011 - A very short history of Big Data. http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/ March 27, 2012 WIRED Magazine: Webcast: Obama goes big on big data. http://www.wired.com/cloudline/2012/03/obama-big-data/
Big Data Analytics Pyramid Wisdom Understanding Knowledge Information Data Data Fusion/Sensor Processing/ Crowd Sourcing/Open Source
How is Big Data Analytics Different? Different from what? (data mining) Data is unstructured Data comes from different sources and has conflicts/missing data/outliers Usually a data fusion step is required Data are dynamic Often has a crowdsourcing component (e.g., Twitter) Often sensor processing steps are required (domain-specific) Because of the size of processed data things have to be done differently Involves high-performance computing and specialized algorithms Big data analytics is the process of the automated discovery of potentially actionable/auctionable knowledge from diverse large data sources, where some of these data sources often have a crowdsourcing aspect
Orders of Magnitude of Data (source: Wikipedia) Typical for Big Data Capacity human memory Size internet archive 2004. All Climate data Entire Library of congress Google server farm 2004 1 petabyte = 1000 terabytes = 1000 x
Specific algorithms for big data analytics Modified algorithms Neural network infrastructure for regression/classification models Deep belief networks for ICA/feature detection Labeling of unlabeled data Semi-supervised learning and active learning New Algorithms Outlier detection Data cleansing with ICA and deep belief networks Data completion algorithms for partially filled data Data anonymization algorithms
Denoising wind turbine sensor data with stacked auto-encoders Calculate ICA Components Make filter by identifying & removing noise ICA s Original Data + Noise ICA Components Filter to Remove Noise Cleansed Data Do inverse with noise ICAs removed with rectangular pseudo-inverse
Smart grid Can we find wasteful customers? Can we detect early indications for grid instability? http://emileglorieux.blogspot.de/2011/04/smart-grid-report-1.html
Big Data and Renewable Energy Challenges Dynamic integration of renewable energy sources Net stability, reactive sources and control predictive optimization Wind farm/turbine allocation and control Solar panel direction control Energy storage (e.g., capacitor batteries, flywheels, ) Impact of unusual weather patterns The weather is unpredictable Energy pricing Environmental impact (noise, earthquakes, microclimates, ) Data anonymization and data privacy Identifying wasteful customers Improving energy efficiency Load/wind/weather forecasting New energy source paradigms - cogeneration superheated steam from solar energy - windmill/flywheel combo units - smart transformers for PDAs, cell phones - charging of gadgets from body-powered energy
Thank you for your attention!!! embrem@rpi.edu
History of data mining in a nutshell Data mining is the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from very large databases