Towards Model Evaluation using Self-Organizing Maps

Towards Model Evaluation using Self-Organizing Maps. M. Herbst, M.C. Casper, Dept. of Physical Geography, University of Trier, Germany.

Model evaluation and model comparison. Can we process the information contained in model time series more efficiently? Standard performance measures (RMSE, R², ...) and model evaluation: assumptions about model errors are violated; loss of information. Approaches to model evaluation using Self-Organizing Maps. [Figure: discharge hydrograph, discharge in m³/s, 14/01/95 to 21/10/95]

The Self-Organizing Map. Some facts about Self-Organizing Maps (SOM): an unsupervised learning neural network model, developed since 1982 by T. Kohonen (Helsinki University of Technology); biologically inspired by the organization of the cerebral cortex. A Self-Organizing Map projects high-dimensional data onto a low-dimensional map and is topology-preserving: similar data is mapped to nearby locations. Common applications are in pattern recognition, clustering etc.

Self-Organizing Map of model time series. N = 4000, x(t) = {x_1, x_2, ..., x_17472}. Carry out a Monte Carlo simulation with a hydrological model; arrange and cluster the resulting discharge time series patterns by similarity. Similarity measure: Euclidean distance.

The Self-Organizing Map architecture. Prototype vectors m_i; input x(t) = {x_1, x_2, x_3}. Step 1: Initialization. The map consists of neurons ("nodes") located on a regular map grid. Data samples x(t) are considered as vectors with n (here 3) components; each neuron has an associated prototype vector m_i.

The SOM training (example). Euclidean distances d_i = ||x - m_i||. Input x(t) = {3, 4, 6}. Step 2: randomly pick an input vector from the training set; calculate the Euclidean distances d_i; find the neuron with the smallest d_i for the given x (the so-called best-matching unit, BMU).
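
The BMU search of step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the toolbox used in the paper; NumPy and the example prototype values (taken from the slide figure) are assumptions.

```python
import numpy as np

def find_bmu(x, prototypes):
    """Return the index of the best-matching unit, i.e. the neuron whose
    prototype vector m_i has the smallest Euclidean distance
    d_i = ||x - m_i|| to the input x."""
    d = np.linalg.norm(prototypes - x, axis=1)  # one distance per neuron
    return int(np.argmin(d))

# illustrative prototype vectors and the input from the slide, x = {3, 4, 6}
prototypes = np.array([[3.0, 3.2, 5.0],
                       [2.2, 4.8, 5.2],
                       [3.0, 4.5, 6.0]])
x = np.array([3.0, 4.0, 6.0])
print(find_bmu(x, prototypes))  # node (3, 4.5, 6) is closest here
```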

The SOM training (example). Updating of the reference vectors: m_i(t+1) = m_i(t) + α(t) h_ci(t) [x(t) - m_i(t)]. Input x(t) = {5, 9, 1} (the next data item after {3, 4, 6}). Step 3: update the reference vectors within the neighbourhood around the BMU ("Gaussian kernel"); pick another data item and repeat steps 2 and 3; cycle through the whole input data. [Figure: map grid with example prototype vectors (3; 3.2; 5), (2.2; 4.8; 5.2), (3; 4.5; 6)]
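
The update rule of step 3 can be written out as a short sketch. This is a hedged illustration only: the grid layout and the values of α and σ below are arbitrary, not taken from the paper.

```python
import numpy as np

def som_update_step(prototypes, grid, x, alpha, sigma):
    """One training step: m_i(t+1) = m_i(t) + alpha * h_ci * (x - m_i(t)),
    with a Gaussian neighbourhood kernel h_ci centred on the BMU c."""
    # step 2: find the best-matching unit for x
    bmu = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    # squared map-grid distance of every neuron to the BMU
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian kernel, h_c = 1 at the BMU
    # step 3: pull every prototype towards x, weighted by the kernel
    return prototypes + alpha * h[:, None] * (x - prototypes)
```

In full training, α(t) and the kernel width σ(t) decay over time while the algorithm cycles repeatedly through the whole input set.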

Adaptation to multidimensional distributions. A data item x(t) is projected onto a neuron on the map; the neighbouring neurons around the BMU adjust their weight vectors m_i by Δm_i; the degree of adjustment decreases with distance from the BMU. [Figure: mapping between the input space V and the SOM; modified after Ritter et al. (1990)]

Method. N = 4000, x(t) = {x_1, x_2, ..., x_17472}. Monte Carlo simulations (7 free parameters): 4000 data items (vectors) of 17,472 elements each. Normalize each time step using x' = (x - x̄)/σ_x. SOM training (15 x 22 = 330 nodes). Each node now represents some model realizations with similar time series patterns. Display the means of the 7 parameter values on each node; display the means of the performance measures on each node. Project the measured data onto the SOM (= find the BMU).
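
The per-time-step normalization x' = (x - x̄)/σ_x can be sketched as follows. Random data and reduced array sizes stand in for the actual Monte Carlo discharge series, which are not available here.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the 4000 Monte Carlo discharge series of 17,472 steps each
X = rng.random((4000, 100))

# standardize each time step (column) across all realizations:
# x' = (x - mean) / std
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# every time step now has zero mean and unit variance across realizations
print(np.allclose(X_norm.mean(axis=0), 0.0), np.allclose(X_norm.std(axis=0), 1.0))
```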

Monte Carlo simulation: model parameters (name: function, range).
RetBasis: storage coefficient baseflow [h], 0.5-3.5
RetInf: storage coefficient interflow [h], 2.0-6.0
RetOf: storage coefficient surface runoff [h], 2.0-6.0
StFFRet: storage coefficient runoff from impervious surfaces (urban areas) [h], 2.0-6.0
hl: horizontal hydraulic conductivity factor, 2.0-8.0
maxinf: maximum infiltration factor, 0.025-1.025
vl: vertical hydraulic conductivity factor, 0.005-0.105
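
A Monte Carlo sample over these ranges might be drawn as below. The uniform distribution is an assumption; the slides only state the ranges, not the sampling scheme.

```python
import numpy as np

# parameter ranges from the table: name -> (low, high)
ranges = {
    "RetBasis": (0.5, 3.5),
    "RetInf":   (2.0, 6.0),
    "RetOf":    (2.0, 6.0),
    "StFFRet":  (2.0, 6.0),
    "hl":       (2.0, 8.0),
    "maxinf":   (0.025, 1.025),
    "vl":       (0.005, 0.105),
}

rng = np.random.default_rng(1)
n = 4000  # number of realizations, as in the paper
params = {name: rng.uniform(lo, hi, n) for name, (lo, hi) in ranges.items()}
# each of the n parameter sets is then run through the hydrological model
```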

Parameter mean values on each node. [Figure: SOM component planes of the mean parameter values: storage coefficient baseflow (RetBasis), storage coefficient interflow (RetInf), storage coefficient surface runoff (RetOf), storage coefficient urban (StFFRet), horizontal hydraulic conductivity factor (hl), maximum infiltration factor (maxinf), vertical hydraulic conductivity factor (vl); annotations distinguish sensitive, partially sensitive(?) and insensitive/interacting(?) parameters]

Statistical performance measures used (name: description).
BIAS: mean error
RMSE: root mean squared error
CEFFlog: logarithmized Nash-Sutcliffe coefficient of efficiency
IAg: Willmott's index of agreement, 0 ≤ IAg ≤ 1
MAPE: mean absolute percentage error
VarMSE: variance part of the mean squared error
Rlin: coefficient of determination

Distribution of mean performance values. [Figure: SOM maps of the node-mean values of BIAS, RMSE, CEFFlog, IAg, MAPE, VARmse and Rlin]

Identification of the best-matching unit (BMU). Input: Q_observed(t). Identify the map unit which is most similar to the measured discharge time series; retrieve the model simulations attributed to this node.
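
Projecting the observed series onto the trained map can be sketched like this. The names are illustrative; `bmu_labels` is assumed to hold the BMU index previously assigned to each Monte Carlo realization during training.

```python
import numpy as np

def project_observed(obs_norm, prototypes, bmu_labels):
    """Find the map unit most similar to the (normalized) observed discharge
    series and return that unit plus the realizations attributed to it."""
    bmu = int(np.argmin(np.linalg.norm(prototypes - obs_norm, axis=1)))
    members = np.flatnonzero(bmu_labels == bmu)  # realizations on that node
    return bmu, members
```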

Position of the observed discharge time series. [Figure: SOM maps of BIAS, RMSE, CEFFlog, IAg, MAPE, VARmse and Rlin, with the position of Q_observed marked on the SOM]

Parameter ranges of the BMU model realizations. [Figure: normalized parameter values (0-1) of RetBasis, RetInf, RetOf, StFFRet, hl, maxinf and vl; annotation marks the most sensitive parameters] M. Herbst & M.C. Casper, iEMSs 2008, Barcelona.

Model results. [Figure: envelope of all Monte Carlo simulations; discharge in m³/s, 14/01/95 to 21/10/95]

Model results. [Figure: the BMU of the observed discharge time series (7 model realizations); discharge in m³/s, 14/01/95 to 21/10/95]

Model results. [Figure: observed discharge time series; discharge in m³/s, 14/01/95 to 21/10/95]

Model results. [Figure: result of manual expert model calibration; discharge in m³/s, 14/01/95 to 21/10/95]

Model results. [Figure: result of RMSE optimization using Shuffled Complex Evolution (SCE-UA, Duan et al. 1993); discharge in m³/s, 14/01/95 to 21/10/95]

Conclusions. Using a SOM, the model realizations can be ordered by similarity. This gives insights into the occurrence of insensitive or interacting parameters and into model optima with regard to different performance measures. The SOM allows a kind of parameter estimation. The computational cost is still considerably high.

Coming up next: adding contextual meaning to the map; data reduction; improved performance. [Figure: SOM with the BMU and the signature measures %BiasRR, %BiasFDCm, %BiasFHV, %BiasFLV, %BiasFMM]

Coming up next: adding contextual meaning to the map; data reduction; improved performance. Model comparison using SOM: model independence check; model ensemble control. [Figure: comparison of the models LARSIM and NASIM]

The question behind the question. Can we process the information contained in model time series more efficiently? What is the information contained in the time series? How detailed do we have to describe the model behaviour (i.e. how many descriptors do we need)?

Thank you very much for your attention! Correspondence to: herbstm@uni-trier.de. All experiments were conducted using the SOM Toolbox for MATLAB (Vesanto et al. 2000), © 1997-2000 by the SOM Toolbox programming team. http://www.cis.hut.fi/projects/somtoolbox/