Geography 4203 / 5203. GIS Modeling. Class (Block) 9: Variogram & Kriging

Similar documents

INTRODUCTION TO GEOSTATISTICS And VARIOGRAM ANALYSIS

Introduction to Modeling Spatial Processes Using Geostatistical Analyst

AMARILLO BY MORNING: DATA VISUALIZATION IN GEOSTATISTICS

EXPLORING SPATIAL PATTERNS IN YOUR DATA

Geostatistics Exploratory Analysis

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

An Interactive Tool for Residual Diagnostics for Fitting Spatial Dependencies (with Implementation in R)

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Data Exploration Data Visualization

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

ArcGIS Geostatistical Analyst: Statistical Tools for Data Exploration, Modeling, and Advanced Surface Generation

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, , , 4-9

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Pre-Algebra Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

Geostatistical Analyst Tutorial

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA

Annealing Techniques for Data Integration

Spatial Data Analysis

Review Jeopardy. Blue vs. Orange. Review Jeopardy

ArcGIS 9. Geostatistical Analyst

GRADES 7, 8, AND 9 BIG IDEAS

Statistics Graduate Courses

The Effect of Environmental Factors on Real Estate Value

THE USE OF THE VARIOGRAM CLOUD IN GEOSTATISTICAL MODELLING

Common Core Unit Summary Grades 6 to 8

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Geography 651 Spatial Statistics Fall 2014

Exploratory Data Analysis

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Studies on the spatial variability of rebound hammer test results recorded at in-situ testing

Prentice Hall Algebra Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

Introduction to time series analysis

Measurement with Ratios

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Exploratory Data Analysis

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

Introduction to Regression and Data Analysis

Grade 6 Mathematics Common Core State Standards

DELAWARE MATHEMATICS CONTENT STANDARDS GRADES PAGE(S) WHERE TAUGHT (If submission is not a book, cite appropriate location(s))

How To Understand The Theory Of Probability

Simple Linear Regression Inference

BayesX - Software for Bayesian Inference in Structured Additive Regression

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

RELEVANT TO ACCA QUALIFICATION PAPER P3. Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam

Data Preparation and Statistical Displays

Exploratory data analysis (Chapter 2) Fall 2011

Big Ideas in Mathematics

Non-Parametric Tests (I)

Introduction to Geostatistics

521493S Computer Graphics. Exercise 2 & course schedule change

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Chapter 10 Introduction to Time Series Analysis

Fast and Robust Normal Estimation for Point Clouds with Sharp Features

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

Prentice Hall Mathematics Courses 1-3 Common Core Edition 2013

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Time Series Analysis

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

Visualizing of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques

Chapter 6: Multivariate Cointegration Analysis

Exercise 1.12 (Pg )

STAT355 - Probability & Statistics

y = Xβ + ε B. Sub-pixel Classification

Tutorial - PEST. Visual MODFLOW Flex. Integrated Conceptual & Numerical Groundwater Modeling

Mathematics. What to expect Resources Study Strategies Helpful Preparation Tips Problem Solving Strategies and Hints Test taking strategies

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Digital Imaging and Multimedia. Filters. Ahmed Elgammal Dept. of Computer Science Rutgers University

This is Geospatial Analysis II: Raster Data, chapter 8 from the book Geographic Information System Basics (index.html) (v. 1.0).

Tutorial on Markov Chain Monte Carlo

Diffusione e perfusione in risonanza magnetica. E. Pagani, M. Filippi

Statistical Models in Data Mining

1 Short Introduction to Time Series

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Antenna Properties and their impact on Wireless System Performance. Dr. Steven R. Best. Cushcraft Corporation 48 Perimeter Road Manchester, NH 03013

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Mathematics. Mathematical Practices

Algebra 1 Course Information

Correlation key concepts:

Spatial Interpolation based Cellular Coverage Prediction with Crowdsourced Measurements

Confidence Intervals for the Difference Between Two Means

SQUARES AND SQUARE ROOTS

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

the points are called control points approximating curve

Variables. Exploratory Data Analysis

Chapter 111. Texas Essential Knowledge and Skills for Mathematics. Subchapter B. Middle School

MATH. ALGEBRA I HONORS 9 th Grade ALGEBRA I HONORS

UNCERT: GEOSTATISTICAL, GROUND WATER MODELING, AND VISUALIZATION SOFTWARE

KEANSBURG SCHOOL DISTRICT KEANSBURG HIGH SCHOOL Mathematics Department. HSPA 10 Curriculum. September 2007

Multivariate Normal Distribution

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics

Lecture 8: Signal Detection and Noise Assumption

Using Spatial Statistics In GIS

Kriging Interpolation Filter to Reduce High Density Salt and Pepper Noise

Cluster Analysis: Advanced Concepts

Transcription:

Geography 4203 / 5203 GIS Modeling Class (Block) 9: Variogram & Kriging

Some Updates Today class + one proposal presentation Feb 22 Proposal Presentations Feb 25 Readings discussion (Interpolation)

Last Lecture You have seen the first part of spatial estimation which was about interpolation techniques You hopefully obtained an idea of what the basic idea of interpolation is and when and why me are supposed to make use of it You had some insights into the concepts of different techniques such as NN, pycnophylactiv I., IDW, Splines, You hopefully understood the mathematical foundation of these approaches to better understand what you are doing in the lab

Today s Outline We will begin with Geostatistics which means we talk about Kriging It is important to understand the difference between Kriging and interpolation techniques we had so far We will talk about the conceptual basics, constraints and methods for Kriging without going too much into detail Here we need a deeper understanding of autocorrelation and regionalized variable theory as well as the incorporation of semivariogram models into estimations

Learning Objectives You will understand how Kriging works and why it is called an optimized local interpolator You will better understand terms like autocorrelation, semivariance, covariance You will understand the mathematical foundation of Kriging and how we make use of spatial autocorrelation and semivariance to estimate weights for estimating

Back to How to Improve Spatial Estimation Remember how we incorporated autocorrelation into interpolation so far Implicitly - we just assumed the data show spatial dependence, we knew this from data exploration Parameter settings e.g. in IDW are arbitrary (Radius?, # neighbors?, Weighting?) - we get more knowledge through testing different combinations How to make theory-based choices and parametersettings? This is where Geostatistical methods come into the picture to help us with this decision-making process

Predictions can be improved by incorporating the knowledge of autocorrelation (knowing the value at one location provides information of locations nearby)

Geostatistics Too many definitions to go into detail Basically, it is about the analysis and inference of continuously-distributed variables (temperature, concentrations, ) Analysis: describing the spatial variability of the phenomenon under study Inference: Estimation of the values at unknown locations (unknown values) Overall we need techniques for statistical estimation of spatial phenomena and this is what Geostatistics provides

Tobler and his Message - how to Theory-based According to Tobler we expect attribute values close together to be more similar than those more distant This expectation drives the idea of interpolation and Geostatistical methods A graphical representation of this is the variogram cloud Presentation of the square root of the difference between attribute pairs against the distance between them

Variogram Cloudiness Let s look at one example from O Sullivan and Unwin (2003) Might be we understand more about spatial autocorrelation and how to catch it Do we have a trend in here?

Variogram Cloudiness Can you see a trend? If so what does it mean? Remember the visual trend in the surface

Variogram Cloudiness This is the representation of N-S (open circles) and E-W (filled circles) directions Which one shows a clearer trend and why? How do we call this effect? Why are there so few sample points now?

Variogram Cloudiness For only few points we have lots of pairs to be represented and a variogram or semivariogram becomes hard to be interpreted How to summarize the variogram cloud? One way is to use box-plots showing means, medians, quartiles and extremes for individual lags

Regionalized Variable Theory (RVT) By putting together the last slides you can understand what RVT means The value of a variable in space (at location x) can be expressed by summing together three components: - A structural component (e.g., a spatial trend or just the mean, m(x) ) - Random, spatially autocorrelated ( regionalized ) variable (local spatial autocorrelation = (x)) - Random noise (stochastic variation, not dependent on location = ) Z(x) = m(x) + (x) +

Regionalized Variable Theory (RVT) Combination of these three components in a mathematical model to develop an estimation function (function is applied to measured data to estimate values across area) m(x) is our trend (global trend surface) is the random component that has nothing to do with location (x) is our regionalized component and describes the local variation of the variable in space This latter relationship can be represented by a dispersion measure

Experimental Semivariance here we call it the experimental semivariance for lag h: r N( h) 1 r ˆ( h) = r z( u ) z( u + h) 2 Nh ( ) = 1 N(h) is the number of pairs separated by lag h (+/- lag tolerance) h - lag width or the distance between pairs of points z - is the value of a point at u or u+h Remember what we did with the variogram cloud by binning the amount of data points 2

The Experimental Semiviogram Nugget: Initial semivariance when autocorrelation is highest (intercept); or just the uncertainty where distance is close to 0 ( ) Sill: Point where the curve levels off (inherent variation where there is little autocorrelation) Range: Lag distance where the sill is reached (over which differences are spatially dependent)

Lag and Lag Tolerance This has to do with binning Lag distance is the direct (planar) distance between points a and b Tolerances used to define sets of values that are similar distances apart (distances rarely repeat in reality)

Semivariance, Covariance and Correlation Spread vs. Similarity from Bohling (2005)

Semivariogram Model The empirical semivariogram allows us to derive a semivariogram model to represent semivariance as a function of separation distance Thus we can infer the characteristics of the underlying process using the model Compute the semivariance between points Interpolate between sample points using an optimal interpolator ( kriging )

Constraints for a Semivariogram Model Monotonically increasing Asymptotic max (sill) Non-negative intercept (nugget) Anisotropy

Semivariogram Models Sperical model Gaussian model Exponential model a - range c - (nugget + sill) however fitting the semivariogram is complex What can you think of would happen if there is a trend in the data? from http://www.asu.edu/sas/sasdoc/sashtml/stat/chap34/sect12.htm

Where is the Best Fit? from Bohling (2005)

Why is Anisotropy Important So far an isotropic structure of spatial correlation has been assumed The semivariogram depends on the lag distance not on direction (omnidirectional semivariogram) In reality we often have differences in different directions and need an anisotropic semivariogram One model is geometric anisotropy (same sill in all direction but within different ranges)

Geometric Anisotropy Find ranges in three orthogonal principal directions and create a three-dimensional lag vector h = (hx,hy,hz) Transform this 3D vector into an equivalent isotropic lag: Semivariance values computed for pairs that fall within directional bands and prescribed lag limits (searching for directional dependence)

Geometric Anisotropy These directional bands are defined by azimuthal direction, angular tolerance and bandwidth You did this already in the lab and hopefully knew what you were doing

Kriging Statistically-based optimal estimator of spatial variables Predictions based on regionalized variable theory - you know this now! Kriging first used by Matheron (1963) in honour of D.G. Krige - a south African mining engineer who laid the groundwork for geostatistics

Kriging Principles Similar to IDW, weights are used with measured variables to estimate values at unknown locations (a weighted average) Weights are given in a statistically optimal fashion (given a specific model, assumptions about the trend, autocorrelation and stochastic variation in the predicted variable) In other words: We use the semivariance model to formulate the weights

Ordinary Kriging For a basic understanding we look at Ordinary Kriging Weighting of data points according to distance Estimated values z are the sum of a regional mean m(x) and a spatially correlated random component (x) (m(x) is implicit in the system) Assumptions: no trend isotropy variogram can be defined with math model same semivariogram applies for the whole study area (variation is a function of distance, not location + constant mean - stationarity of the spatial process)

Ordinary Kriging You see it looks similar to IDW But: Weight computation is much more complex based on the inverted variogram Weights reach the value zero when we reach the range a Ordinary: no trend, regional mean is estimated from sample

How Semivariance Determines Kriging Weights Weighted average of the observed attribute values Weights are a function of the variogram model & sum to 1 (they should to be unbiased!) You see the sizes of the points proportional to their weights they get Points with distances > range will have a weight of zero, WHY?

Typical Working Steps Describing the data / identifying spatial autocorrelation and trends (experimental semivariogram) Building the semivariogram model by using the mathematical function Using the semivariogram model for defining the weights Evaluate interpolated surfaces

A First Summary Kriging is complex Here you obtained a first overview of the principles of this technique Weights are derived from the semivariogram model to intelligently infere unmeasured values You have seen the underlying rules of the semivariogram and hopefully understood why autocorrelation, directional trends and stationarity are important You also understood the difference to other local interpolators, don t you?