Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids Yixin Cai, Mo-Yuen Chow Electrical and Computer Engineering, North Carolina State University July 2009
Outline Introduction Integrate data Evaluate a single feature Build a fault cause classifier Summary Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 2
Introduction Intelligent fault management detection, recording, location, diagnosis, restoration, Problem of interest Diagnosis: predict what the root cause is based on the available information before the engineers go on-site Help (not replace) the engineers to identify the root cause faster Challenges Stochastic nature of faults Noisy data with errors More and more incoming data in Smart Grids Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 3
Data Integration Data sources Utility OMS database Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 4
Data Integration Data sources Utility OMS database Public database on weather, environment, geographic features, etc. http://www.nc-climate.ncsu.edu/products http://landcover.usgs.gov/usgslandcover.php Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 5
Data Integration Data sources Utility OMS database Public database on weather, environment, geographic features, etc. Private venders http://www.mapmart.com/default.aspx Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 6
Data Integration Data integration under GIS framework Spatial relation Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 7
Data Integration Data integration under GIS framework Spatial relation Spatial-temporal relation Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 8
Evaluate a Single Feature Data preprocess Define the root cause of interest Clean errors and noises Extract features Categorical features Likelihood measure (Mosaic) plot Li, j = Po ( i X = xj) = N i, j N j Weather vs. Tree Faults Season vs. Tree Faults Land Use vs. Tree Faults 1 2 4 6 8 1 2 3 4 21 22 23 2431 41 42 4352 71 81 82 90 1 1 1 Tree Faults 0 Tree Faults Tree Faults 0 0 Weather Conditinos 1:Clear Weather, 2: Extreme Temperature, 4:Raining, 6:Thunderstorm, 8:Windy Season 1:Spring, 2:Summer, 3:Fall, 4:Winter Land Use 21~24:Developed, 41~43:Forest, 71:Grassland, 81~82:Cultivated, 90:Wetlands Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 9
Evaluate a Single Feature Continuous features Likelihood measure plot Li, j = Po ( i X xj) = N i, j N j Distance to Trees vs. Tree Faults Distance to Roads vs. Tree Faults Wind Speed vs. Tree Faults Tree Fault Likelihood 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Tree Fault Likelihood 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Tree Fault Likelihood 0.3 0.4 0.5 0.6 0.7 0.8 0 50 100 150 200 250 300 Distance to Trees (meters) 0 200 400 600 800 Distance to Roads (meters) 0 10 20 30 40 Hourly Max Wind Speed (miles/hour) Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 10
Build a Fault Cause Classifier Linear discriminant analysis (LDA) D = T N = i= 1 1 wf wf D = [ ( μ μ )] T f i i 1 0 Logistic regression (LR) Pc ( = 1) T logit( c = 1) = ln = α+ β f Pc ( = 0) 1 Pc ( = 1) = T 1 + e α β f Comparison LDA LR Model linear classifier non-linear classifier Data assumption normal distributed with equalvariance none Computation matrix manipulation maximum likelihood Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 11
Case Study Data sources Progress Energy Carolinas outage database NC Climate Office NC State Univ. GIS data service Fault causes of interest Tree-caused Animal-caused Other Features 7 categorical 5 continuous Classifiers LDA LR Tree fault Animal fault Tree fault Animal fault Classification Performance Using LDA on Sample Dataset 6 Features 12 Features training testing training testing ACC 0.75(0.01) 0.76(0.01) 0.77(0.02) 0.76(0.02) POD 0.32(0.03) 0.34(0.03) 0.41(0.03) 0.39(0.03) FAR 0.34(0.03) 0.32(0.03) 0.32(0.04) 0.33(0.04) ACC 0.84(0.02) 0.83(0.01) 0.84(0.02) 0.84(0.01) POD 0.31(0.04) 0.29(0.04) 0.35(0.03) 0.35(0.03) FAR 0.42(0.05) 0.43(0.05) 0.39(0.05) 0.41(0.05) Classification Performance Using LR on Sample Dataset 6 Features 12 Features training testing training testing ACC 0.76(0.02) 0.76 (0.02) 0.77 (0.01) 0.77(0.02) POD 0.32(0.03) 0.32(0.03) 0.44 (0.03) 0.44(0.03) FAR 0.30(0.04) 0.30(0.04) 0.32 (0.03) 0.34(0.03) ACC 0.83(0.02) 0.83 (0.02) 0.84(0.01) 0.84(0.01) POD 0.30(0.03) 0.31(0.03) 0.37 (0.04) 0.35(0.03) FAR 0.42(0.04) 0.41(0.06) 0.41 (0.06) 0.41(0.06) Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 12
Summary Methods for exploratory data analysis Integrate data from multiple sources under GIS framework Use likelihood measure to evaluate both categorical and continuous features Apply LDA and LR as fault cause classifiers Findings LDA and LR performs similar Adding new features helps fault diagnosis Future work Systematic feature selection methods Advanced fault diagnosis algorithms Novel sampling strategy Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids 13