System Informatics: Combine Engineering Knowledge and Big Data for Model Building and Decision Making




System Informatics: Combine Engineering Knowledge and Big Data for Model Building and Decision Making
Yu Ding, Barnes Professor of Industrial & Systems Engineering
Feb 13, 2015
Yu Ding (ISEN) BigData Workshop

ISEN Big Data related areas
- Advanced Manufacturing
- Human & Organizational Systems
- Operations Research
- System Informatics

ISEN System Informatics group
- Dr. Andy Banerjee, system-level simulation visualization for healthcare, security and manufacturing applications.
- Dr. Justin Yates, spatial and geodata analysis and social media networks for security and logistics applications.
- Dr. Yu Ding, applied statistics and quality engineering for energy, manufacturing, and security applications.
- Dr. Li Zeng, applied Bayesian statistics and statistical process control for bio-manufacturing and nano-manufacturing.
- Dr. Andy Johnson, production economics and nonparametric estimation for energy, manufacturing, and healthcare applications.
Data: Manufacturing & production data, material characterization data, economics data and logistic data.
Engineering Applications: Manufacturing, healthcare, energy, and security and logistics.

Nano informatics
Data: Static and dynamic (video) nano imaging data, exceeding 1.2 GB in total.
Data Providers: TAMU Nanomaterials Processing and Atomic Imaging Lab; Chinese National Nano Science Center; KAUST; Georgia Tech Manufacturing Institute; FSU's High Performance Material Institute.
Collaborators: Huang (STAT), Mallick (STAT), Liang (MEEN), Ji (ECEN), Bukkapatnam (ISEN).

Energy informatics
[Figure: anatomy of a wind turbine; the gearbox is particularly prone to failure.]
[Figure: construction of wind turbines at a remote site; labels: ground vehicle, wind turbine component, crew member.]
[Figure labels: high production efficiency; low production efficiency.]
Data: Real data totaling 120 GB and simulated data (by NWTC) totaling 1.2 TB.
Data Providers: Risø National Lab, National Wind Technology Center (NWTC), commercial wind farms, and ABB (pending).
Collaborators: Huang (STAT), Mallick (STAT), Genton (STAT), Xie (ECEN), Kumar (ECEN), Kezunovic (ECEN), Johnson (ISEN), Moreno-Centeno (ISEN), Ntaimo (ISEN).

System informatics theme
System Informatics: fusing systems and information
Systems:
- Energy production systems
- Health and human systems
- Manufacturing systems
- Security and logistics systems
Sensors, Sensor Networks, and Metrology Devices
Methodologies:
- Diagnosis, predictive analytics, and maintenance
- Fast computation and inference of big data
- Fault tolerance analysis and uncertainty quantification
- Optimal sensor deployment
System informatics promotes an effective and seamless integration of physical knowledge and first-principle models with data-driven methodologies.

A case study: multiple-dependency model for wind power curve
Based on Lee, Genton, Xie, Ding. 2015, "Power curve estimation with multivariate environmental factors for inland and offshore wind farms," Journal of the American Statistical Association, forthcoming.
Technical objective: build a probabilistic predictive model $p(y \mid x)$, in which $y$ is the power output of a turbine and $x$ is a vector comprising, at least:
1. wind speed $v$;
2. wind direction $d$;
3. air density $\rho$ (incorporating temperature and air pressure);
4. turbulence intensity $t_b$;
5. humidity $h_m$;
6. above-hub wind shear $w_a$;
7. below-hub wind shear $w_b$.
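As a small illustration of how air density can incorporate temperature and air pressure, the sketch below uses the dry-air ideal gas law, rho = P / (R T). This is an assumption made for illustration: the slide does not say which formula the authors use, and the function name `air_density` is hypothetical.

```python
# Specific gas constant of dry air, in J/(kg*K).
R_DRY_AIR = 287.05


def air_density(temperature_c: float, pressure_pa: float) -> float:
    """Dry-air density in kg/m^3 from the ideal gas law: rho = P / (R * T).

    Assumption: humidity is ignored here; it enters the model as a
    separate covariate in the list above.
    """
    temperature_k = temperature_c + 273.15  # Celsius to Kelvin
    return pressure_pa / (R_DRY_AIR * temperature_k)


# Standard sea-level conditions: 15 degrees C and 101325 Pa.
rho_standard = air_density(15.0, 101325.0)
print(f"{rho_standard:.4f} kg/m^3")
```

At standard sea-level conditions this gives a density close to the familiar reference value of about 1.225 kg/m^3.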

Current practice
Industrial standard (recommended by IEC): Binning
[Figure: binning illustration, labeled with cut-in speed = 4 m/s and the values 9.75 and 10.25.]
Other data-driven methods:
- Multivariate product kernels
- Smoothing spline method (SSANOVA)
- Bayesian Additive Regression Trees (BART): a Bayesian version of the classification and regression tree method
Challenge: balance the capability to capture system complexity against scalability.
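The binning approach can be sketched in a few lines: group observations into fixed-width wind speed bins and average the power within each bin. The code below is a minimal illustration, not the IEC procedure itself. The 0.5 m/s bin width, the function names, and the toy data are assumptions, and the normalization and data-filtering steps of the standard are omitted.

```python
import numpy as np


def binned_power_curve(wind_speed, power, bin_width=0.5):
    """Average power within fixed-width wind speed bins.

    Bins are centered on integer multiples of bin_width, so with the
    default width the bin centered at 10 m/s covers [9.75, 10.25).
    Returns (bin_centers, mean_power) for non-empty bins only.
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    power = np.asarray(power, dtype=float)
    # Integer index of the bin that each observation falls into.
    idx = np.floor(wind_speed / bin_width + 0.5).astype(int)
    centers, means = [], []
    for k in np.unique(idx):
        centers.append(k * bin_width)
        means.append(power[idx == k].mean())
    return np.array(centers), np.array(means)


def predict_binned(centers, means, v):
    """Predict power at speed(s) v by interpolating between bin centers."""
    return np.interp(v, centers, means)


# Toy data (hypothetical values, for illustration only).
speeds = [9.8, 10.2, 10.3, 10.6]
powers = [100.0, 120.0, 130.0, 150.0]
centers, means = binned_power_curve(speeds, powers)
print(centers, means)
```

Because the estimate depends on wind speed alone, any effect of direction, air density, or turbulence is averaged out within each bin, which is the limitation that motivates the multivariate methods listed above.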

Our approach
Physical law provides us some clues.
Physics behind: $\text{Power} = \frac{1}{2} \rho A C_p v^3$
- At least three important factors affect wind power generation.
- The functional relationship is nonlinear, with the functional form unknown.
- Interactions exist among the factors.
These insights motivate a hybrid structure that we name the additive multivariate kernel (AMK) model, in which $p(y \mid x)$ is modeled as a sum of the form
$\text{kernel}(v, d, \rho) + \text{kernel}(v, d, h_m) + \text{kernel}(v, d, t_b) + \dots$
- A three-component multivariate kernel captures the critical factor interactions (with wind speed and wind direction).
- An additive structure ensures the scalability of the final model.
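The additive structure can be illustrated with a deliberately simplified sketch. It is not the authors' AMK estimator: it predicts a conditional mean rather than a full density, it uses Gaussian kernels for every variable (a circular kernel would suit wind direction better), and the bandwidths are hand-picked. All function names, bandwidth values, and the synthetic data are assumptions.

```python
import numpy as np


def nw_estimate(X_train, y_train, x_query, bandwidths):
    """Nadaraya-Watson estimate of y at x_query with a product Gaussian kernel."""
    z = (X_train - x_query) / bandwidths
    w = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    total = w.sum()
    if total == 0.0:
        return float(np.mean(y_train))  # fall back to the global mean
    return float(np.dot(w, y_train) / total)


def additive_kernel_predict(X_train, y_train, x_query, bandwidths, extra_cols):
    """Average of trivariate kernel estimates, one per extra covariate.

    Columns 0 and 1 are assumed to hold wind speed v and wind direction d.
    Each term uses (v, d, one extra covariate), mirroring
    kernel(v, d, rho) + kernel(v, d, h_m) + ...
    """
    estimates = []
    for j in extra_cols:
        cols = [0, 1, j]
        estimates.append(
            nw_estimate(X_train[:, cols], y_train, x_query[cols], bandwidths[cols])
        )
    return float(np.mean(estimates))


# Synthetic data (hypothetical): columns are v, d, rho, humidity.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(4.0, 12.0, n),    # wind speed, m/s
    rng.uniform(0.0, 360.0, n),   # wind direction, degrees
    rng.uniform(1.1, 1.3, n),     # air density, kg/m^3
    rng.uniform(0.2, 0.9, n),     # relative humidity
])
y = 0.5 * X[:, 2] * X[:, 0] ** 3  # power in arbitrary units

h = np.array([1.0, 30.0, 0.05, 0.1])     # hand-picked bandwidths
x0 = np.array([8.0, 180.0, 1.2, 0.5])    # query point
pred = additive_kernel_predict(X, y, x0, h, extra_cols=[2, 3])
print(pred)
```

Each term smooths over only three dimensions, so adding another environmental covariate adds one more trivariate term instead of raising the dimension of a single kernel. That is the scalability argument made above.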

What difference does this make?
Six sets of turbine data for comparison: WT1-WT4 are inland turbine datasets; WT5-WT6 are offshore turbine datasets.
Noise level: noise in WT1-WT3 is relatively low, WT4 sees a 50% increase, and noise in WT5 and WT6 is triple that of WT1-WT3.
Against binning (industry practice): AMK reduces predictive error by as much as 45%.
Against BART: AMK performs comparably on WT1-WT3 data but considerably better on WT4-WT6 data, beating BART by 18% on WT4 data and by over 30% on WT5-WT6 data.
Implication: AMK is competitive because of its special kernel model structure, which is informed by physical insights. BART tries to learn the intrinsic structure from the data. It does so well enough when data are of good quality, but becomes markedly less effective when data are noisy.