NC STATE UNIVERSITY Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids



Similar documents
Machine Learning Logistic Regression

How can we discover stocks that will

Environmental Remote Sensing GEOG 2021

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Linear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.

( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( )

Lecture 3: Linear methods for classification

Data Mining Part 5. Prediction

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Statistical Machine Learning

A semi-supervised Spam mail detector

Understanding Raster Data

Prediction of Stock Performance Using Analytical Techniques

Principles of Data Mining by Hand&Mannila&Smyth

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Data Mining mit der JMSL Numerical Library for Java Applications

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

Predictive Dynamix Inc

DATA MINING SPECIES DISTRIBUTION AND LANDCOVER. Dawn Magness Kenai National Wildife Refuge

Data Mining Part 5. Prediction

Classification of Bad Accounts in Credit Card Industry

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)

How To Use Neural Networks In Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS.

A Study on the Comparison of Electricity Forecasting Models: Korea and China

OUTLIER ANALYSIS. Data Mining 1

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

Course Syllabus. Purposes of Course:

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Weather-readiness assessment model for Utilities. Shy Muralidharan Global Product Manager Energy Solutions Schneider Electric

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Design & Analysis Tools for Inventory & Monitoring: DATIM

ANALYSIS OF INDIAN WEATHER DATA SETS USING DATA MINING TECHNIQUES

Design and Deployment of Specialized Visualizations for Weather-Sensitive Electric Distribution Operations

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

TIETS34 Seminar: Data Mining on Biometric identification

HT2015: SC4 Statistical Data Mining and Machine Learning

Predicting daily incoming solar energy from weather data

IBM Big Green Innovations Environmental R&D and Services

CHAPTER-24 Mining Spatial Databases

BIG DATA What it is and how to use?

An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

The Artificial Prediction Market

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

In this lesson, students will identify a local plant community and make a variety of

Design & Analysis of Ecological Data. Landscape of Statistical Methods...

Innovations and Value Creation in Predictive Modeling. David Cummings Vice President - Research

Predict Influencers in the Social Network

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Information Management course

Fire Weather Index: from high resolution climatology to Climate Change impact study

Objectives. Raster Data Discrete Classes. Spatial Information in Natural Resources FANR Review the raster data model

Piping Plover Distribution Modeling

Chapter 12 Discovering New Knowledge Data Mining

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Black Tern Distribution Modeling

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

MACHINE LEARNING BASICS WITH R

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Benchmarking of different classes of models used for credit scoring

Spatial Data Analysis Using GeoDa. Workshop Goals

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

Land Use/ Land Cover Mapping Initiative for Kansas and the Kansas River Watershed

Burrowing Owl Distribution Modeling

Data Mining and Neural Networks in Stata

In comparison, much less modeling has been done in Homeowners

Data quality in Accounting Information Systems

U.S. Geological Survey Earth Resources Operation Systems (EROS) Data Center

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Name Period 4 th Six Weeks Notes 2015 Weather

PREDICTIVE AND OPERATIONAL ANALYTICS, WHAT IS IT REALLY ALL ABOUT?

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Correlation Analysis of Factors Influencing Changes in Land Use in the Lower Songkhram River Basin, the Northeast of Thailand

Use of Artificial Neural Network in Data Mining For Weather Forecasting

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Cloud Model Verification at the Air Force Weather Agency

Discussion Section 4 ECON 139/ Summer Term II

Crime Hotspots Analysis in South Korea: A User-Oriented Approach

Data Mining. Nonlinear Classification

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics

Transcription:

Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids Yixin Cai, Mo-Yuen Chow Electrical and Computer Engineering, North Carolina State University July 2009

Outline Introduction Integrate data Evaluate a single feature Build a fault cause classifier Summary Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 2

Introduction Intelligent fault management detection, recording, location, diagnosis, restoration, Problem of interest Diagnosis: predict what the root cause is based on the available information before the engineers go on-site Help (not replace) the engineers to identify the root cause faster Challenges Stochastic nature of faults Noisy data with errors More and more incoming data in Smart Grids Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 3

Data Integration Data sources Utility OMS database Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 4

Data Integration Data sources Utility OMS database Public database on weather, environment, geographic features, etc. http://www.nc-climate.ncsu.edu/products http://landcover.usgs.gov/usgslandcover.php Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 5

Data Integration Data sources Utility OMS database Public database on weather, environment, geographic features, etc. Private venders http://www.mapmart.com/default.aspx Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 6

Data Integration Data integration under GIS framework Spatial relation Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 7

Data Integration Data integration under GIS framework Spatial relation Spatial-temporal relation Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 8

Evaluate a Single Feature Data preprocess Define the root cause of interest Clean errors and noises Extract features Categorical features Likelihood measure (Mosaic) plot Li, j = Po ( i X = xj) = N i, j N j Weather vs. Tree Faults Season vs. Tree Faults Land Use vs. Tree Faults 1 2 4 6 8 1 2 3 4 21 22 23 2431 41 42 4352 71 81 82 90 1 1 1 Tree Faults 0 Tree Faults Tree Faults 0 0 Weather Conditinos 1:Clear Weather, 2: Extreme Temperature, 4:Raining, 6:Thunderstorm, 8:Windy Season 1:Spring, 2:Summer, 3:Fall, 4:Winter Land Use 21~24:Developed, 41~43:Forest, 71:Grassland, 81~82:Cultivated, 90:Wetlands Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 9

Evaluate a Single Feature Continuous features Likelihood measure plot Li, j = Po ( i X xj) = N i, j N j Distance to Trees vs. Tree Faults Distance to Roads vs. Tree Faults Wind Speed vs. Tree Faults Tree Fault Likelihood 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Tree Fault Likelihood 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Tree Fault Likelihood 0.3 0.4 0.5 0.6 0.7 0.8 0 50 100 150 200 250 300 Distance to Trees (meters) 0 200 400 600 800 Distance to Roads (meters) 0 10 20 30 40 Hourly Max Wind Speed (miles/hour) Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 10

Build a Fault Cause Classifier Linear discriminant analysis (LDA) D = T N = i= 1 1 wf wf D = [ ( μ μ )] T f i i 1 0 Logistic regression (LR) Pc ( = 1) T logit( c = 1) = ln = α+ β f Pc ( = 0) 1 Pc ( = 1) = T 1 + e α β f Comparison LDA LR Model linear classifier non-linear classifier Data assumption normal distributed with equalvariance none Computation matrix manipulation maximum likelihood Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 11

Case Study Data sources Progress Energy Carolinas outage database NC Climate Office NC State Univ. GIS data service Fault causes of interest Tree-caused Animal-caused Other Features 7 categorical 5 continuous Classifiers LDA LR Tree fault Animal fault Tree fault Animal fault Classification Performance Using LDA on Sample Dataset 6 Features 12 Features training testing training testing ACC 0.75(0.01) 0.76(0.01) 0.77(0.02) 0.76(0.02) POD 0.32(0.03) 0.34(0.03) 0.41(0.03) 0.39(0.03) FAR 0.34(0.03) 0.32(0.03) 0.32(0.04) 0.33(0.04) ACC 0.84(0.02) 0.83(0.01) 0.84(0.02) 0.84(0.01) POD 0.31(0.04) 0.29(0.04) 0.35(0.03) 0.35(0.03) FAR 0.42(0.05) 0.43(0.05) 0.39(0.05) 0.41(0.05) Classification Performance Using LR on Sample Dataset 6 Features 12 Features training testing training testing ACC 0.76(0.02) 0.76 (0.02) 0.77 (0.01) 0.77(0.02) POD 0.32(0.03) 0.32(0.03) 0.44 (0.03) 0.44(0.03) FAR 0.30(0.04) 0.30(0.04) 0.32 (0.03) 0.34(0.03) ACC 0.83(0.02) 0.83 (0.02) 0.84(0.01) 0.84(0.01) POD 0.30(0.03) 0.31(0.03) 0.37 (0.04) 0.35(0.03) FAR 0.42(0.04) 0.41(0.06) 0.41 (0.06) 0.41(0.06) Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 12

Summary Methods for exploratory data analysis Integrate data from multiple sources under GIS framework Use likelihood measure to evaluate both categorical and continuous features Apply LDA and LR as fault cause classifiers Findings LDA and LR performs similar Adding new features helps fault diagnosis Future work Systematic feature selection methods Advanced fault diagnosis algorithms Novel sampling strategy Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 13