Exploratory Data Analysis for Ecological Modelling and Decision Support




Gennady Andrienko & Natalia Andrienko
Fraunhofer Institute AIS, Sankt Augustin, Germany
http://www.ais.fraunhofer.de/and
5th ECEM conference, Pushchino, Russia, 19-23.9.2005

Outline
1. Geo-visualisation's view on ecological modelling: demanding problems and challenging tasks
2. Case study 1: pesticide accumulation
3. Case study 2: forest dynamics
4. A systematic approach to exploratory data analysis (EDA): elements of the general theory
5. Software issues

A View on Ecological Modelling
Input data: complex and multidimensional; may contain errors
Model: may have many parameters; may contain errors
Output data: need to be interpreted; to be used for informed decision making
Data complexity: 1) space, 2) time, 3) multiple attributes and dimensions, 4) variability of values, abrupt changes

Outputs of simulations
Multiple attributes referring to:
Simulation scenarios;
Spatial locations (objects);
Time moments;
(e.g. species, age groups etc.)

Complexities
Number of attributes
Length of time series
Number of spatial objects
High dimensionality => huge number of combinations (normally 10^5 to 10^8)!
Abrupt temporal changes
Great variability of values

Complexities: example 1

Complexities: example 2

Our goals
Support analysts and decision makers in:
Preparing and harmonizing input data;
Tuning models and their parameters;
Interpreting outputs of simulation;
Exploring alternatives for decision making;
Justifying and communicating the resulting decisions.
Instruments: interactive visualisation enhanced by intelligent aggregation tools and other tools for exploratory data analysis

Case study 1: pesticide accumulation

GIMMI project
Geographical Information and Mathematic Model Interoperability, IST-2001-34245, 2002-2004
TXT (Italy), EIG (Germany), AIS (Germany)
Multiple simulation scenarios (different crops and active ingredients)
About 1000 plots
Simulation depth: 10+ years
Several output variables that characterize various environmental aspects (pesticide accumulation etc.)

Decisions to be made
What crop? What active ingredient? In what concentration?
For individual plots and for the whole territory

Pesticide accumulation dynamics
In fact, we see the extreme values only! Zoom?

Zoomed pesticide accumulation
The extreme values and the overall view are lost
But the details are still not visible!

Log10 transformed values
Let's transform values to log10
Now we can see something!
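The log10 transformation mentioned above can be sketched in a few lines of Python. This is only an illustration, not the tool's own implementation: the function name `log10_transform`, the sample values, and the `floor` used to clamp zero or negative values are all assumptions.

```python
import math

def log10_transform(values, floor=1e-6):
    """Return log10 of each value.

    Values at or below zero have no logarithm, so they are clamped to
    `floor` first (an assumed choice, not taken from the slides).
    """
    return [math.log10(max(v, floor)) for v in values]

# Hypothetical concentrations spanning seven orders of magnitude
concentrations = [0.001, 0.01, 1.0, 100.0, 10000.0]
transformed = log10_transform(concentrations)
```

After the transformation the values spread evenly over a short range, so small and extreme concentrations can share one axis.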

Only means, medians, and envelopes
Remove individual lines: look only at the dynamics of the general characteristics

Details of the distribution
And finally add more details on the overall level!
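Replacing the individual lines by means, medians, and envelopes amounts to summarising all plots per time step. A minimal Python sketch follows, assuming the envelope is the per-year minimum and maximum (the slides do not define it); `summarise_by_year` and the sample data are hypothetical.

```python
from statistics import mean, median

def summarise_by_year(series):
    """series: list of equal-length time series, one per plot.

    Returns one dict per time step holding the mean, the median and the
    min/max envelope of all plots at that step.
    """
    summary = []
    for values_at_t in zip(*series):
        summary.append({
            "mean": mean(values_at_t),
            "median": median(values_at_t),
            "min": min(values_at_t),
            "max": max(values_at_t),
        })
    return summary

# Three hypothetical plots observed over three years
plots = [[1.0, 2.0, 3.0], [2.0, 2.0, 8.0], [3.0, 5.0, 4.0]]
envelope = summarise_by_year(plots)
```

Only the summary lines would then be drawn, which keeps the display readable however many plots there are.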

Count plots within intervals
1. Specify classes according to pesticide concentration
2. Count number of plots within each interval for each year
3. Draw the counts as stacked bars
4. Possible extension: use areas or other amounts instead of the counts

Compare the accumulation in all scenarios for the whole territory
Another view on the overall characteristics of all 5 scenarios
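Steps 1 and 2 of counting plots within intervals can be sketched as follows. The class breaks, the sample data, and the name `count_by_class` are assumptions for illustration; each returned row corresponds to one stacked bar.

```python
from bisect import bisect_right

def count_by_class(series, breaks):
    """For each year, count how many plots fall into each class.

    breaks: ascending class boundaries. Class 0 holds values below
    breaks[0], class i holds values in [breaks[i-1], breaks[i]), and the
    last class holds values at or above the last break.
    """
    n_classes = len(breaks) + 1
    counts = []
    for values_at_t in zip(*series):
        row = [0] * n_classes
        for v in values_at_t:
            row[bisect_right(breaks, v)] += 1
        counts.append(row)
    return counts

# Three hypothetical plots over three years, two breaks => three classes
plots = [[0.5, 1.5, 12.0], [0.2, 0.8, 3.0], [2.0, 11.0, 15.0]]
counts = count_by_class(plots, breaks=[1.0, 10.0])
```

For the extension in step 4, the increment `1` would be replaced by the area (or another amount) attached to each plot.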

Look at the individual plots
2nd scenario => 7th year
Aggregated diagram representation supports the evaluation of the scenarios for individual plots
Dynamic linking between displays supports selection of interesting plots

Case study 2: forest dynamics

Silvics project
SILVICS - Silvicultural Systems for Sustainable Forest Resources Management
Univ. Wageningen (NL), EFI (FI), AIS (DE), RAS (RU); INTAS, 2002-2005
104 forest compartments
4 scenarios of development: NATural, Selective CUtting, Legal RUssian, ILLegal
Simulation results for 200 years (41 time moments)
6 species
13 age groups
104 * 4 * 41 * 6 * 13 = 1,330,368 combinations (more than 1,000,000)
For these combinations: 20 attributes!

Compare biomass in two scenarios
SCU: rather stable number of forest compartments in all classes
LRU: high temporal variability

Look at biodiversity: Dominant Species/Age classification
Two interactively specified thresholds:
1. Presence
2. Dominance level
Oak, 7th age group dominates in some compartments
Tilia, 5th age group is present in some compartments

Dominant Species and Age Class (1)
Panels: natural, selective, Russian, illegal
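The slides do not spell out how the two thresholds are applied, so the following Python sketch rests on an assumed rule: a species/age combination counts as present when its share of the compartment total reaches the presence threshold, and as dominant when it has the largest share and that share reaches the dominance level. The function name, the threshold defaults, and the sample shares are hypothetical.

```python
def classify_compartment(shares, presence=0.05, dominance=0.5):
    """shares: dict mapping (species, age_group) to its fraction of the
    compartment total.

    Returns (dominant, present): the dominant key or None, and the sorted
    list of keys whose share reaches the presence threshold.
    """
    present = sorted(k for k, s in shares.items() if s >= presence)
    top_key = max(shares, key=shares.get)
    dominant = top_key if shares[top_key] >= dominance else None
    return dominant, present

# Hypothetical shares for one forest compartment
compartment = {("oak", 7): 0.62, ("tilia", 5): 0.35, ("birch", 2): 0.03}
dominant, present = classify_compartment(compartment)
strict_dominant, _ = classify_compartment(compartment, dominance=0.7)
```

Re-running the classification with different thresholds, as in the last line, mirrors the interactive adjustment described on the slide.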

Dominant Species and Age Class (2)
Panels: natural, selective, Russian, illegal

Species Structure (1)

Species Structure (2)

Age Structure (1)

Age Structure (2)

A systematic approach to exploratory data analysis (EDA): elements of the general theory

Recap: aggregation tools
1. Several variants of time series aggregation
2. Aggregation of multiple attributes via selection of the dominant attribute
Both work in a spatial context and are closely integrated with interactive visualisation and data transformation

Roles of aggregation tools in EDA
Aggregation supports grasping the overall characteristics of the processes / scenarios
To be instrumental, aggregation tools should be interactive and dynamic for:
1. Flexible and powerful data transformation
2. Immediate feedback on visual displays
3. Analysis of sensitivity to the aggregation parameters
4. Selection of interesting data instances, access to them
Intelligent aggregation is important for decision support as a tool for the exploration and evaluation of alternatives

What Is EDA?
Emerged in statistics in the 1970s; originator: John Tukey
A philosophy and discipline of unbiased looking at data: "What can data tell me?" rather than "Do they agree with my expectations?"
Similar to the work of a detective (J. Tukey)
Need to look at data => focus on visualisation and user interaction with data displays

Purposes of EDA
Uncover peculiarities of the data and, on this basis, understand how the data should be further processed (e.g. filtered, transformed, split into parts, fused, ...)
Generate hypotheses for further testing (e.g. using statistical methods)
Choose proper methods for in-depth analysis (possibly, domain-specific)

EDA vs. other analyses
EDA does not substitute for rigorous methods of numerical analysis, either general or domain-specific, but should give an understanding of what methods to apply and how
Workflow: original data -> 1. EDA (yields understanding of the data, a mental model) -> 2. data processing (yields processed data) -> 3. in-depth analysis (yields conclusions, theories, decisions, ...)

EDA vs. information presentation
EDA makes intensive use of graphics
However, nice presentation and reporting are not EDA purposes
Primary goal of presentation: convey a certain idea or set of ideas to others understandably, convincingly, and in an aesthetically attractive way
This requires different visual means than exploration

Case study 3: EDA for the exploration of forest defoliations
Data from NEFIS project
Large volume: 6169 spatially-referenced time series
Two dimensions: space and time (S&T)
Many missing values
No full compatibility across countries, species, time etc.

General procedure of the EDA
1. See the whole
   Space + Time: 2 complementary views
   1) Evolution of spatial patterns in time
   2) Distribution of temporal behaviours in space
2. Divide and focus
   Data are complex; they have to be explored by slices and subsets (species, age groups, countries, years, ...)
3. Attend to particulars
   Detect outliers, strange behaviours, unexpected patterns, ...

See the whole: Handle large data volumes
General approach: data aggregation
Task 1: Explore evolution of spatial patterns
Appropriate data transformation: aggregate by small space compartments (regular grid with 4025 cells); separately for different species; various aggregates (mean, max)
Gain: no symbol overlapping

Explore evolution of spatial patterns
a) Animated map
b) Map sequence
Observations:
Persistently high values in Poland
Improvement in Belarus
Mosaic distribution in most countries: great differences between close locations
Outliers
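The grid aggregation described under Task 1 (mean and max per cell) can be sketched in Python as below. The coordinates, the cell size, and the name `aggregate_to_grid` are assumptions; a real analysis would run this separately per species and per year.

```python
from collections import defaultdict
from statistics import mean

def aggregate_to_grid(points, cell_size):
    """points: iterable of (x, y, value) observations.

    Returns {(col, row): {"mean": ..., "max": ...}} for every cell of a
    regular square grid that contains at least one observation.
    """
    cells = defaultdict(list)
    for x, y, value in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    return {
        cell: {"mean": mean(vals), "max": max(vals)}
        for cell, vals in cells.items()
    }

# Four hypothetical observation sites with defoliation values
observations = [(0.2, 0.3, 10.0), (0.8, 0.1, 30.0), (1.5, 0.5, 25.0), (1.1, 1.9, 40.0)]
grid = aggregate_to_grid(observations, cell_size=1.0)
```

One symbol per cell is then drawn on the map, which is what removes the symbol overlapping.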

Divide and Focus: Exploration on country level
Recommendable due to inconsistencies between countries
Observation: abrupt changes between locations => spatial smoothing methods are not appropriate

Explore spatial distribution of temporal behaviours
Are behaviours in neighbouring places similar?
Step 1. Smoothing supports revealing general patterns and disregarding fluctuations and outliers (we shall look at outliers later)
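The slides do not say which smoothing method is used in Step 1. As one possible illustration, the sketch below applies a centred moving average to a single time series; the method, the window size, and the sample values are assumptions, and missing values are not handled.

```python
def moving_average(series, window=3):
    """Centred moving average over time.

    At the ends of the series the window shrinks to the values that
    actually exist, so the result has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical defoliation series with one outlier in the middle
defoliation = [10.0, 12.0, 50.0, 11.0, 9.0]
smooth = moving_average(defoliation)
```

The outlier of 50 is damped to about 24, so the general course of the series becomes easier to compare across places.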

Explore spatial distribution of temporal behaviours
Are behaviours in neighbouring places similar?
Step 2. Temporal comparison (e.g. with a particular year or the mean for a period) helps to disregard absolute differences in values and thus focus on behaviours
Observation: no strong similarity between neighbouring places

Compare behaviours in plots with different main species
Mosaic signs: 6 rows for species; 14 columns for years 1990-2003; colours encode defoliation values
Observation: behaviours differ for different main species
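The temporal comparison of Step 2 can be illustrated by subtracting a reference from each series, either the value in a chosen year or the mean over the period. This is a sketch under that assumption; `compare_to_reference` and the two sample sites are hypothetical.

```python
def compare_to_reference(series, reference_index=None):
    """Express each value as a difference from a reference.

    The reference is the value at `reference_index`, or the mean of the
    whole series when `reference_index` is None.
    """
    if reference_index is None:
        reference = sum(series) / len(series)
    else:
        reference = series[reference_index]
    return [v - reference for v in series]

# Two hypothetical sites with different levels but the same behaviour
site_a = [20.0, 25.0, 30.0]
site_b = [60.0, 65.0, 70.0]
relative_a = compare_to_reference(site_a, reference_index=0)
relative_b = compare_to_reference(site_b, reference_index=0)
```

Both sites yield the same relative series, which shows how the transformation hides absolute differences and exposes the behaviour.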

Explore overall temporal trends
Line overlapping obstructs data analysis => apply aggregation

Aggregation method 1: by quantiles
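Aggregation by quantiles can be sketched as computing, for each year, the cut points that split all series into equally populated groups. The example uses `statistics.quantiles` from the Python standard library (Python 3.8 or later); the choice of quartiles, the `inclusive` method, and the sample data are assumptions.

```python
from statistics import quantiles

def quantile_bands(series, n=4):
    """For each time step, return the n-1 cut points that divide the
    values of all series into n groups of (roughly) equal size."""
    return [
        quantiles(values_at_t, n=n, method="inclusive")
        for values_at_t in zip(*series)
    ]

# Five hypothetical time series observed in two years
plots = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
bands = quantile_bands(plots)
```

Drawing the cut points as bands over time replaces thousands of overlapping lines with a handful of curves.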

Aggregation method 2: by intervals

Divide and Focus: Germany

Divide and Focus: age groups 1, 3

Attend to particulars
Types of particulars (examples):
Extreme values
Extreme changes
High variability
Questions: When? Where? What is around? Why? (a question for further, in-depth analysis)
Domain knowledge is essential

Attend to particulars: extreme values
1. Click on a segment corresponding to extreme values
2. The behaviour(s) is (are) highlighted on the time graph
3. The location(s) is (are) highlighted on the map

Attend to particulars: what is around?
In some neighbouring places the behaviours during the period 2000-2003 are somewhat similar

Attend to particulars: extreme changes
1. Transform the time graph to show changes
2. Select extreme changes in a specific year (here 2003)

Attend to particulars: high variation
1. Aggregate time graph by quantiles
2. Save counts
3. Visualise e.g. on a scatter plot
4. Select items with high variation
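The two steps for extreme changes (transform to changes, then select by year) can be sketched as below. The threshold, the site identifiers, and the function names are hypothetical; the interactive tool selects on the display rather than by a fixed number.

```python
def yearly_changes(series):
    """Differences between consecutive values of one time series."""
    return [b - a for a, b in zip(series, series[1:])]

def extreme_changes(data, years, year, threshold):
    """data: {site_id: series aligned with `years`}.

    Returns the sorted ids of sites whose change from the previous year
    into `year` has a magnitude of at least `threshold`.
    """
    position = years.index(year)
    if position == 0:
        raise ValueError("the first year has no preceding value")
    return sorted(
        site for site, series in data.items()
        if abs(yearly_changes(series)[position - 1]) >= threshold
    )

# Three hypothetical sites observed in 2001-2003
years = [2001, 2002, 2003]
data = {"A": [10, 12, 40], "B": [20, 21, 22], "C": [50, 48, 15]}
selected = extreme_changes(data, years, year=2003, threshold=20)
```

Using the magnitude catches both sharp increases (site A) and sharp decreases (site C).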

Attend to particulars: high fluctuation
Select items with maximal number of jumps between quantiles

Attend to particulars: stable extremes
Select items being always in the topmost 10%
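Both selections above can be derived from the quantile class of each item in each year. The sketch below assumes rank-based classes of equal size with ties broken by input order; all function names and the sample data are hypothetical. It uses four classes to keep the example small, whereas the topmost 10% would correspond to ten classes.

```python
def rank_classes(values, n_classes):
    """Assign each value a class 0..n_classes-1 according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes

def classes_over_time(series, n_classes):
    """series: list of per-item time series. Returns, for each item, its
    class index in every year."""
    per_year = [rank_classes(list(vals), n_classes) for vals in zip(*series)]
    return [list(item) for item in zip(*per_year)]

def count_jumps(classes):
    """Number of consecutive years in which the class changes."""
    return sum(1 for a, b in zip(classes, classes[1:]) if a != b)

def always_in_top(classes, n_classes):
    """True if the item is in the highest class in every year."""
    return all(c == n_classes - 1 for c in classes)

# Four hypothetical items observed in three years
items = [[1, 1, 1], [2, 3, 2], [3, 2, 3], [9, 9, 9]]
by_item = classes_over_time(items, n_classes=4)
jumps = [count_jumps(c) for c in by_item]
stable_top = [always_in_top(c, 4) for c in by_item]
```

Items with the largest jump count are the highly fluctuating ones, and items flagged in `stable_top` are the stable extremes.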

Attend to particulars: stable increase
1. Turn the time graph into the segmentation mode
2. Choose "increase" and set minimum difference
3. Select a sequence of years by clicking
4. Check sensitivity to the time period!

Recap: Exploration procedure
See the whole: evolution of spatial patterns in time; distribution of temporal behaviours in space
Divide and focus: data were explored by slices and subsets (species, age groups, countries, years, ...)
Attend to particulars: extreme values, extreme changes, high variation, high fluctuations, stable growth
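The stable-increase selection from the "Attend to particulars" steps can be approximated by checking that every year-to-year step within the chosen period rises by at least the minimum difference. How the segmentation mode actually evaluates increase is not stated in the slides, so this rule, the function name, and the sample series are assumptions.

```python
def is_stable_increase(series, start, end, min_diff):
    """True if every step between index `start` and index `end`
    (inclusive) rises by at least `min_diff`."""
    window = series[start:end + 1]
    return all(b - a >= min_diff for a, b in zip(window, window[1:]))

# Hypothetical series: one steadily rising, one with a dip in the middle
steady = [10, 15, 21, 30]
dipping = [10, 15, 14, 30]

steady_ok = is_stable_increase(steady, 0, 3, min_diff=5)
dipping_whole = is_stable_increase(dipping, 0, 3, min_diff=5)
dipping_late = is_stable_increase(dipping, 2, 3, min_diff=5)
```

The second series fails over the whole period but passes for the last two years, which illustrates why the sensitivity to the selected time period should be checked.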

Recap: Tools
Visualisation on thematic maps, time graphs, other aspatial displays
Aggregation: reduce data volume & symbol overlapping
Filtering: divide and focus (select subsets)
Marking: see corresponding data on several displays
Data transformation: smoothing, computing changes, normalisation etc.
It is important to use the tools in combination

Elements of the theory of EDA
Data: have structure and properties
Tasks: Task = Target + Constraints; defined by properties of the data
Tools: are suitable for specific types of data and tasks
Principles
Observations, findings, conclusions, decisions

Notes for designers of new tools
Tool design (in particular, map design) should be based on task analysis!
Diagram elements: Data; Potential tasks; Assessment of existing tools; Data structure; Tool requirements; Combining existing tools and inventing new ones

Software issues

Requirements for EDA software
Space- and time-awareness
Work with complex multidimensional data
Support for uncertain and missing data
Scalability
Support and encouragement of several complementary views on the same data
Dynamic linking and coordination of several data displays
From the overall view to particulars of interest
From idea generation to hypothesis testing using statistical methods, followed by reporting

GIS for EDA: major problems
Time-awareness
Work with complex and multidimensional data
Processing uncertain and missing data
Scalability
Interactivity of visualisations
Dynamic linking of multiple displays
Idea processing

Potentially useful tools for EDA
Information visualisation tools, for example HCE & TimeSearcher from HCIL, Univ. Maryland
Geovisualisation tools, for example GeoVistaStudio (Penn State Univ.) and Descartes/CommonGIS (Fraunhofer Institute AIS)
Graphical statistics tools, for example Manet & Mondrian (Augsburg Univ.)
Usually such systems are research prototypes that implement innovative ideas, but provide restricted functionality and limited user support

CommonGIS (not a "common GIS")
A variety of well-integrated tools for EDA
Time-aware maps + statistical graphics; several mechanisms of display coordination
Designed to gain synergy of:
Visualisation
Display manipulation
Data manipulation
Querying
Computational techniques, including aggregation and data mining
Quick demo?

Still open issues (for all tools!)
Work with qualitative (non-numeric) data
Work with fuzzy, uncertain, and missing data
Continue scalability efforts
Intelligent guidance through the overall process of data analysis, avoiding cognitive complexity
Adaptability to user, data, tasks, and hardware
Support in processing and management of observations: recording, structuring, browsing, searching, checking, combining, interpreting
Help in visual communication of derived data, constructed knowledge, and recommended decisions

Conclusions
1. EDA is essential in ecological modelling for preparation of data, verification and tuning of models, interpretation of results, and evaluation of decision alternatives
2. Systematic application of EDA requires careful consideration of the characteristics of data, the relevant analytical tasks, and the properties of tools
3. EDA tools should combine interactive visualisation with data transformation, dynamic query, and sophisticated computations
4. There is still much to do for scientists and for software developers

To Learn More
Software: http://www.commongis.com
Papers, tutorials, on-line demos: http://www.ais.fraunhofer.de/and
Book to appear: Natalia and Gennady Andrienko, "Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach" (Springer-Verlag, in press, to appear end 2005)
A theoretical framework for linking tasks, tools, and principles of data analysis