Science: what is possible. Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, ...)




Big Data Analytics for Renewable Energy
Mark J. Embrechts
Dept. of Industrial and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
CFES 2012-2013 Annual Conference, January 25, 2013

Outline
- What is Data Mining?
- Data Mining, Big Data Analytics
- Data-Driven Science, Data-Driven Engineering
- Renewable Energy Challenges
- Promise of Big Data Analytics for Renewable Energy

Science: what is possible
Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, ...)

Data Mining as a Structured Process
- Data prospecting and surveying
- Database -> select -> selected data
- Selected data -> preprocess & transform -> transformed data
- Transformed data -> make model
- Interpretation & rule formulation
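The stages named on this slide can be read as a chain of small functions. The sketch below is only an illustration under stated assumptions: the record fields (`wind_speed`, `power`), the site filter, the rule for dropping incomplete rows, and the one-variable least-squares model are all hypothetical stand-ins and do not come from the talk.

```python
from statistics import mean

# Hypothetical raw "database": wind speed (m/s) and turbine power (kW) records.
DATABASE = [
    {"site": "A", "wind_speed": 4.0, "power": 80.0},
    {"site": "A", "wind_speed": 6.0, "power": 120.0},
    {"site": "B", "wind_speed": 5.0, "power": 100.0},
    {"site": "A", "wind_speed": None, "power": 90.0},  # missing value
    {"site": "A", "wind_speed": 8.0, "power": 160.0},
]


def select(db, site):
    """Select: keep only the records relevant to the question."""
    return [r for r in db if r["site"] == site]


def preprocess(records):
    """Preprocess & transform: drop incomplete rows, return (x, y) pairs."""
    return [
        (r["wind_speed"], r["power"])
        for r in records
        if r["wind_speed"] is not None and r["power"] is not None
    ]


def make_model(pairs):
    """Make model: least-squares line y = slope * x + intercept."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def interpret(model):
    """Interpretation & rule formulation: state the model as a readable rule."""
    slope, intercept = model
    return f"power ~ {slope:.1f} * wind_speed + {intercept:.1f}"


model = make_model(preprocess(select(DATABASE, "A")))
rule = interpret(model)
print(rule)
```

Keeping each stage as a separate function mirrors the slide: intermediate results (selected data, transformed data) stay inspectable, which matters because preprocessing usually dominates the effort.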

Early Motivating Applications for Data Mining
- Database marketing
- Algo-trading
- Market basket analysis (www.information-drivers.com/market_basket_analysis.php)
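Market basket analysis is usually summarized with three standard quantities for a rule "antecedent -> consequent": support, confidence, and lift. The sketch below computes them on a tiny made-up basket list; the item names and baskets are hypothetical and only serve to show the arithmetic.

```python
# Hypothetical shopping baskets; item names are illustrative only.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's base rate; above 1 means positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(support({"bread", "milk"}, BASKETS))
print(confidence({"bread"}, {"milk"}, BASKETS))
print(lift({"bread"}, {"milk"}, BASKETS))
```

In this toy data the lift of "bread -> milk" is below 1, so buying bread makes milk slightly less likely than its base rate; a rule is normally considered interesting only when lift is clearly above 1 and support is not negligible.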

How is data mining different?
- Different from what? (statistics)
- Data can be unstructured (e.g., text)
- Data can come from different sources and have conflicts/missing data/outliers
- Usually a data information step is required
- Data preprocessing takes most of the time
- Data might not fit in memory
Data mining is the process of the automated discovery of new, interesting, potentially useful information from large amounts of data.

Data Mining Pyramid
Levels, top to bottom: Wisdom, Understanding, Knowledge, Information, Data
Information: reduction and structuring of data without loss of domain-specific knowledge

Data Mining Challenges
- Large/huge/humongous data sets
  - Data sets can be rich in the number of data points
  - Data sets can be rich in the number of attributes
  - Unlabeled data (data labeling might be expensive)
  - Data quality and data uncertainty
- Data preprocessing and feature definition for structuring data
  - Data representation
  - Attribute/feature selection
  - Transforms and scaling
- Scientific data mining
  - Classification, multiple classes, regression
  - Continuous and binary attributes
  - Large datasets
  - Nonlinear problems
- Erroneous data, outliers, novelty, and rare events
  - Erroneous & conflicting data
  - Outliers
  - Rare events
  - Novelty detection
- Smart visualization techniques
- Feature selection & rule formulation
- Special outcomes: associations (e.g., NETFLIX), nuggets, enrichment
- Recent challenges: causality, active learning, multi-classes, big data
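Two of the challenges listed above, scaling and outliers, interact: a single erroneous reading distorts the mean and standard deviation used for ordinary scaling. One common workaround, shown here as a sketch and not as the method used in the talk, is to scale with the median and the median absolute deviation (MAD) and flag points with a large robust score. The sensor readings and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Scale values by the median and the MAD instead of mean and standard deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust scaling is undefined for this data")
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # when the data are normally distributed.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value; True marks a suspected outlier."""
    return [abs(z) > threshold for z in robust_z_scores(values)]


# Hypothetical sensor readings with one erroneous spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
flags = flag_outliers(readings)
print(flags)
```

A flagged point is only a candidate: whether it is an error, a rare event, or genuine novelty still needs domain knowledge, which is why the slide lists these as separate challenges.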

Docking ligands is a nonlinear problem
Example: crowdsourcing for an anthrax drug

Data-Driven Science, Data-Driven Engineering
- Example of data-driven science: cholera epidemic root causes
- Examples of data-driven engineering
  - Denoising of windmill sensor data
  - Sensors for the early detection of levee failure
  - Sensing network instability
  - Load forecasting
  - Forecasting reactive power
  - Histology and electrophysiology for early failure detection of brain-implant devices
Science: what is possible
Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, ...)

Design of Capacitor Batteries with QSPR

Data Mining, Big Data Analytics
NYT, May 5, 2012: "Mystery of Big Data's Parallel Universe Brings Fear, and a Thrill"

A brief history of Big Data
http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
- 4-19-2010: Danah Boyd, Privacy and Publicity in the context of Big Data. Keynote, WWW 2010. http://www.danah.org/papers/talks/2010/www2010.html
- May 2011: James Manyika et al., Big Data: The next frontier for innovation, competition, and productivity (McKinsey Global Institute Report). http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
- 6-6-2011: A very short history of Big Data. http://whatsthebigdata.com/2012/06/06/a-very-short-history-of-big-data/
- March 27, 2012: WIRED Magazine webcast, Obama goes big on big data. http://www.wired.com/cloudline/2012/03/obama-big-data/

Big Data Analytics Pyramid
Levels, top to bottom: Wisdom, Understanding, Knowledge, Information, Data
Data Fusion/Sensor Processing/Crowd Sourcing/Open Source

How is Big Data Analytics Different?
- Different from what? (data mining)
- Data is unstructured
- Data comes from different sources and has conflicts/missing data/outliers
- Usually a data fusion step is required
- Data are dynamic
- Often has a crowdsourcing component (e.g., Twitter)
- Often sensor processing steps are required (domain-specific)
- Because of the size of processed data things have to be done differently
- Involves high-performance computing and specialized algorithms
Big data analytics is the process of the automated discovery of potentially actionable/auctionable knowledge from diverse large data sources, where some of these data sources often have a crowdsourcing aspect.

Orders of Magnitude of Data (source: Wikipedia)
Chart labels: typical for Big Data; capacity of human memory; size of Internet Archive, 2004; all climate data; entire Library of Congress; Google server farm, 2004
1 petabyte = 1000 terabytes = 1000 x

Specific algorithms for big data analytics
- Modified algorithms
  - Neural network infrastructure for regression/classification models
  - Deep belief networks for ICA/feature detection
  - Labeling of unlabeled data
  - Semi-supervised learning and active learning
- New algorithms
  - Outlier detection
  - Data cleansing with ICA and deep belief networks
  - Data completion algorithms for partially filled data
  - Data anonymization algorithms
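Labeling of unlabeled data via semi-supervised learning can be illustrated with self-training, one simple scheme among many: fit a model on the few labeled points, pseudo-label only the unlabeled points it is confident about, and repeat. The sketch below uses a one-dimensional nearest-centroid classifier; the classifier, the `margin` confidence rule, and the data are assumptions made for illustration and are not the algorithms referred to in the talk.

```python
def centroids(labeled):
    """Mean feature value per class, from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Self-training: pseudo-label confident points, refit, and repeat.

    Assumes at least two classes are present in `labeled`.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            # Rank classes by distance from x to the class centroid.
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, second = ranked[0], ranked[1]
            gap = abs(x - cents[second]) - abs(x - cents[best])
            if gap >= margin:
                confident.append((x, best))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new was labeled; stop early
        labeled.extend(confident)
        unlabeled = remaining
    return labeled, unlabeled


# Hypothetical data: two labeled points and four unlabeled ones.
seed = [(1.0, "low"), (9.0, "high")]
pool = [2.0, 8.0, 5.2, 3.5]
labeled, still_unlabeled = self_train(seed, pool)
print(labeled)
print(still_unlabeled)
```

The point that remains unlabeled (5.2, roughly midway between the classes) is exactly the kind of case an active learning step would pass to a human annotator, which is how the two ideas on the slide fit together.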

Denoising wind turbine sensor data with stacked auto-encoders
- Calculate ICA components
- Make filter by identifying & removing noise ICAs
- Do inverse with noise ICAs removed with rectangular pseudo-inverse
Diagram: Original Data + Noise -> ICA Components -> Filter to Remove Noise -> Cleansed Data
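The filter-and-invert steps above can be sketched with plain matrix operations. This is one reading of the slide, not the author's implementation: the ICA fit itself is not performed, the unmixing matrix `W` is a hypothetical orthogonal matrix standing in for a fitted one, the source signals are made up, and the index of the noise component is assumed to be known. The sketch removes the noise rows from `W`, which leaves a rectangular matrix, and maps the retained components back to sensor space with its pseudo-inverse.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pseudo_inverse(w):
    """Right pseudo-inverse W^T (W W^T)^-1 of a wide matrix with full row rank."""
    wt = transpose(w)
    return matmul(wt, inverse(matmul(w, wt)))


def denoise(x, unmixing, noise_rows):
    """Drop the noise components and map the remaining ones back to sensor space."""
    kept = [row for i, row in enumerate(unmixing) if i not in noise_rows]
    components = matmul(kept, x)  # retained independent components
    return matmul(pseudo_inverse(kept), components)


# Hypothetical orthogonal unmixing matrix, standing in for the result of an ICA fit.
W = [[2 / 3, -1 / 3, 2 / 3], [2 / 3, 2 / 3, -1 / 3], [-1 / 3, 2 / 3, 2 / 3]]
A = transpose(W)  # mixing matrix; equals the inverse of W because W is orthogonal

signal = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5]]  # two made-up sources
noise = [[0.3, -0.2, 0.1, -0.4]]                         # one made-up noise source
x_noisy = matmul(A, signal + noise)              # 3 sensors x 4 samples
x_true = matmul([row[:2] for row in A], signal)  # sensor readings without the noise source
x_clean = denoise(x_noisy, W, noise_rows={2})
```

Because the rows of `W` are orthogonal here, the pseudo-inverse reconstruction recovers the noise-free readings exactly; for a general unmixing matrix it returns the minimum-norm signal consistent with the retained components instead.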

Smart grid
- Can we find wasteful customers?
- Can we detect early indications for grid instability?
http://emileglorieux.blogspot.de/2011/04/smart-grid-report-1.html

Big Data and Renewable Energy Challenges
- Dynamic integration of renewable energy sources
- Net stability, reactive sources and control predictive optimization
- Wind farm/turbine allocation and control
- Solar panel direction control
- Energy storage (e.g., capacitor batteries, flywheels, ...)
- Impact of unusual weather patterns
- The weather is unpredictable
- Energy pricing
- Environmental impact (noise, earthquakes, microclimates, ...)
- Data anonymization and data privacy
- Identifying wasteful customers
- Improving energy efficiency
- Load/wind/weather forecasting
- New energy source paradigms
  - cogeneration superheated steam from solar energy
  - windmill/flywheel combo units
  - smart transformers for PDAs, cell phones
  - charging of gadgets from body-powered energy

Thank you for your attention!
embrem@rpi.edu

History of data mining in a nutshell
Data mining is the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from very large databases.