Introduction to spatial data analysis




Introduction to spatial data analysis 3
Scuola di Dottorato in Economia, La Sapienza, 2015/2016
Instructors: Filippo Celata, Federico Martellozzo and Luca Salvati
http://www.memotef.uniroma1.it/node/6524

Spatial statistics:
- f(location, distance, ...)
- to identify invisible geographical properties of data (e.g. distribution patterns)
- spatial association: to verify the degree of similarity of spatial events as a function of their distance

[Figure: John Snow's map of cholera, London, 1854]

Types of spatial association:
1. Association due to spatial dependence between geographical features (e.g. similar plants require similar soils)
2. Association due to spatial autocorrelation: the presence of a certain event increases the probability of finding similar events nearby, because of a reciprocal influence or «real contagion» (e.g. similar plants cluster because they are generated by other similar plants)

Methods:
A. Analysis of the spatial distribution of a pre-selected set of similar events (point patterns or point processes) (e.g. firms owned by foreign-born entrepreneurs)
B. Autocorrelation analysis: the degree to which nearby features are more similar than distant ones (to identify relations between proximity and intensity; polygons)

1. (Simple) spatial distribution measures

Spatial distribution: MEDIAN CENTER / MEAN CENTER
- Case field: to identify different centres for different categories of features (marked point pattern)
- Weight: absolute vs. relative centrality

Do: the distribution of firms owned by foreign-born entrepreneurs
- Identify and render the mean center (spatial statistics / measuring geographic distributions) for firms owned by Bangladeshis, Egyptians, Romanians, Chinese and Libyans (input: lez3/rm_immigdt.shp; weight field: ADD08; case field: ORIGINE)
- Make a kernel density map of firms owned by foreign-born entrepreneurs: spatial analyst / density / kernel density (input: rm_immigdt.shp; population field: CNT; cell size: 10 m, or the average distance between all points; search radius: 2,000 m; in Environments: extent and raster analysis / mask = zoneurbanistiche.shp)
- Mapping: modify the symbology of both output layers, then go to view / layout view to export the map (.tif, 300 dpi)

Discrete vs. surface statistical analysis
E.g. surface-based indicators (-> map algebra) -> measures of spatial segregation
- (Discrete) segregation index: the relation between two normalized or standardized density coefficients (e.g. normalized density of firms owned by Chinese / normalized density of all firms) (from -1 to +1)
- (Surface-based) segregation index (numerator) (O'Sullivan-Wong 2007): the local contribution to global spatial segregation = the difference between the max and min values at any point of the kernel density surfaces (e.g. Italians/Chinese = [max(p_ci, p_ii) - min(p_ci, p_ii)])
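The two calculations above can be sketched outside ArcGIS in a few lines of plain Python. This is only an illustrative sketch: the function names, the toy coordinates and the toy density grids are assumptions, not part of the course material, and the two density surfaces are assumed to be already normalized and aligned cell by cell.

```python
from collections import defaultdict


def weighted_mean_centers(points):
    """Weighted mean center per case value.

    points: iterable of (x, y, weight, case) tuples (hypothetical layout).
    Returns {case: (mean_x, mean_y)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum(w*x), sum(w*y), sum(w)
    for x, y, w, case in points:
        acc = sums[case]
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w
    return {c: (sx / sw, sy / sw) for c, (sx, sy, sw) in sums.items() if sw > 0}


def local_segregation(p_c, p_i):
    """Cell-by-cell max(p_c, p_i) - min(p_c, p_i) for two aligned density grids."""
    return [
        [max(a, b) - min(a, b) for a, b in zip(row_c, row_i)]
        for row_c, row_i in zip(p_c, p_i)
    ]


# Toy data (made up): x, y, weight, case
firms = [
    (0.0, 0.0, 1.0, "A"),
    (2.0, 0.0, 3.0, "A"),
    (10.0, 10.0, 2.0, "B"),
]
centers = weighted_mean_centers(firms)
print(centers)

# Toy 2x2 density surfaces (made up) for two groups
p_c = [[0.2, 0.0], [0.5, 0.3]]
p_i = [[0.1, 0.4], [0.5, 0.0]]
print(local_segregation(p_c, p_i))
```

The weight plays the role of the weight field and the case label the role of the case field; cells where the two densities coincide contribute zero to segregation.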

[Figure: degree of segregation between areas with a prevalence of Chinese entrepreneurs and areas with a prevalence of Italian entrepreneurs]
[Figure: local contribution to segregation between areas with a prevalent presence of units run by Chinese or Italian entrepreneurs]

2. POINT PROCESSES: spatial distribution of events in a point pattern (or scheme) -> Cluster analysis
- Spatial cluster: the spatial distribution of (similar) events (points) is more clustered than a completely random spatial distribution and/or than the general (global) distribution of the process (e.g. diseases due to local causes; business clusters)
- Clustering: a general tendency of (similar) events to co-locate
- Hot-spot: an area with an anomalous concentration of similar events

Point processes and cluster analysis: to verify whether the spatial distribution of (similar) events is clustered or dispersed (uniform or inhibitory), against the hypothesis of complete spatial randomness

Firms clustering and external economies of scale: empirical evidence
[Figure: three example point patterns: random; uniform / inhibitory; clustered (concentrated)]

Geographical concentration measures and problems
[Problems with standard (discrete, regional, a-spatial) concentration measures (e.g. Gini index)]
1) MAUP (modifiable areal unit problem): the measured degree of concentration is influenced by the spatial partition and the spatial resolution of the data
2) The degree of concentration is not a function of the degree of polarization of the densest regions (Arbia 2001)

Concentration vs. polarization
Concentration vs. co-agglomeration
- Ellison and Glaeser concentration index (1997): a measure of co-agglomeration which takes into account the average degree of industrial concentration (Herfindahl index) and is not influenced by the degree of spatial resolution of the data (MAUP)

[Figure: degree of concentration vs. degree of spatial autocorrelation]
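As a rough illustration of how the Herfindahl index enters the Ellison and Glaeser index, the sketch below uses the formula as it is commonly stated for the 1997 paper: gamma = (G - (1 - sum x_i^2) H) / ((1 - sum x_i^2)(1 - H)), where s_i is the industry's employment share in area i, x_i is the area's share of total employment, G = sum (s_i - x_i)^2 and H is the plant-size Herfindahl index. The formula is not given in the slides, and the function names and toy numbers are assumptions.

```python
def herfindahl(sizes):
    """Herfindahl index: sum of squared shares (1/n for n equal plants, 1 for one plant)."""
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)


def ellison_glaeser(industry_by_area, total_by_area, plant_sizes):
    """Ellison-Glaeser gamma for one industry (formula as commonly stated; see lead-in)."""
    s_total = sum(industry_by_area)
    x_total = sum(total_by_area)
    s = [v / s_total for v in industry_by_area]  # industry share per area
    x = [v / x_total for v in total_by_area]     # aggregate share per area
    g = sum((si - xi) ** 2 for si, xi in zip(s, x))  # raw geographic concentration
    h = herfindahl(plant_sizes)
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (g - scale * h) / (scale * (1.0 - h))


# Toy example (made up): two areas of equal size, four plants of equal size
plants = [5, 5, 5, 5]
gamma_spread = ellison_glaeser([10, 10], [100, 100], plants)  # industry mirrors the economy
gamma_packed = ellison_glaeser([20, 0], [100, 100], plants)   # industry in one area only
print(herfindahl(plants), gamma_spread, gamma_packed)
```

With the industry spread exactly like total employment, gamma falls below zero because the raw concentration G is smaller than what plant lumpiness alone would produce; packing all plants into one area pushes it up.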

2. POINT PROCESSES: spatial distribution of events in a point pattern -> Cluster analysis
Point processes: clustering of events

Complete spatial randomness (Diggle, 1983) = an event has the same probability of locating anywhere:
- The number of events in any subregion follows a Poisson distribution
- The location of an event does not depend on the location of similar events (independence)
- The numbers of events in two non-overlapping regions are independent
- The average number of events per unit area (intensity) is homogeneous throughout the area (spatial stationarity)

A random distribution implies a certain degree of concentration and/or clustering. A distribution is clustered whenever its degree of concentration is higher than what we would expect under complete spatial randomness. Different techniques imply different CSR hypotheses.

Problems with the analysis of spatial data #1:
- Extent of the study area: if it is too small, the analysis may leave out elements that are important for an exhaustive explanation. If it is too big, the spatial distribution pattern may be due to a variety of processes which have nothing to do with what we want to explain (example: suburban, scattered and low-density urban areas) -> reduce the size of the area
- Create a mask of the area within the GRA (ring road) by (manually) selecting the zone urbanistiche within the GRA and exporting the selection as mask_area.shp
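The CSR benchmark can be made concrete with a small simulation: scatter points uniformly over a rectangle, count them in a grid of quadrats, and compare the variance of the counts with their mean (for Poisson counts the two are equal, so the ratio is close to 1). This is a sketch under stated assumptions: the number of points is fixed in advance, so the counts are only approximately Poisson, and the function names, grid size and seed are illustrative choices.

```python
import random


def simulate_csr(n, width, height, rng):
    """n points placed independently and uniformly in a width x height rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def quadrat_counts(points, width, height, nx, ny):
    """Number of points falling in each cell of an nx x ny grid of quadrats."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)  # clamp points on the upper edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts


def variance_to_mean_ratio(counts):
    """About 1 under CSR, above 1 for clustered and below 1 for regular patterns."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return var / mean


rng = random.Random(42)
pts = simulate_csr(2000, 100.0, 100.0, rng)
counts = quadrat_counts(pts, 100.0, 100.0, 10, 10)
vmr = variance_to_mean_ratio(counts)
print(round(vmr, 2))
```

Changing the number of quadrats changes the result, which is a small-scale version of the MAUP: the answer depends on the partition as well as on the pattern.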

Clustering: global indexes (to measure the global degree of clustering for the whole set of events)
-> methods based on quadrats (joint count) vs. methods based on distances

AVERAGE NEAREST NEIGHBOUR: is the distance between events smaller (clustering) or larger (inhibitory pattern) than the distance expected under complete spatial randomness? (Clark-Evans, 1950s)

Nearest neighbour ratio = observed mean distance / expected mean distance (CSR)

-> Toolbox / Spatial statistics / Analyzing patterns
Input:
- Points: unweighted (= 1)
- Projected coordinate system!
- Polygons and lines: convert into points with x, y = centroids
Output:
- Observed Mean Distance
- Expected Mean Distance
- Nearest Neighbor Index
- Graphic report
- Test variables:
  p-value: the probability of observing a pattern at least this extreme if the spatial distribution were random
  z-score: the number of standard deviations separating the observed value from the expected value

Do: measure the ANN for firms within the GRA (selection of rm_immig.shp)
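The ratio and its test statistics can be reproduced by hand. The sketch below assumes the usual Clark-Evans expressions (expected mean distance 0.5 / sqrt(n / A), standard error 0.26136 / sqrt(n^2 / A)), a study-area size A supplied by the user, a normal approximation for the p-value, and no edge correction. The function name and toy patterns are illustrative, and the brute-force search is only suitable for small point sets.

```python
import math
import random


def average_nearest_neighbour(points, area):
    """Clark-Evans style nearest neighbour ratio with z-score and two-sided p-value."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute-force distance to the closest other point: O(n^2)
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)          # mean NN distance under CSR
    std_error = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided, normal approximation
    return {"observed": observed, "expected": expected,
            "ratio": observed / expected, "z": z, "p": p}


# Regular 10 x 10 lattice in a 10 x 10 area: an inhibitory pattern (ratio > 1)
grid = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
res_grid = average_nearest_neighbour(grid, area=100.0)

# 100 points squeezed into one corner of the same area: clustered (ratio < 1)
rng = random.Random(0)
blob = [(rng.random(), rng.random()) for _ in range(100)]
res_blob = average_nearest_neighbour(blob, area=100.0)

print(res_grid["ratio"], res_blob["ratio"])
```

Because the expected distance depends on A, the same points can look clustered or dispersed depending on how the study area is drawn, which is why the mask of the area within the GRA matters.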