Spatial Data Analysis Using GeoDa
9 Jan 2014
Frank Witmer
Computing and Research Services
Institute of Behavioral Science

Workshop Goals
- Enable participants to find and retrieve geographic data pertinent to their study and conduct spatial analysis using GeoDa
  - Geographic data sources and formats
  - Data joins in ArcGIS
  - Exploratory spatial data analysis (ESDA) in GeoDa
- Provide experience using ArcGIS and GeoDa software
- Provide the opportunity for you to work with your own data and/or find data relevant to your study area

Types of Geographic Data
1) Spatial data
   - helpful to conceptualize as maps
   - necessary for answering "Where" questions
   - used to establish spatial relations (e.g. distance, connectivity, containment)
   - used to support spatial analysis
2) Attribute data
   - helpful to conceptualize as tables
   - necessary for answering "What" questions
(and metadata too, typically in .xml format)

Geographic Data
Spatial data and the attribute table are linked together:

State Name      Population    Governor
New Jersey       7,730,188    C. Whitman
Pennsylvania    11,881,643    T. Ridge
etc.

IBS Data Links Page
http://www.colorado.edu/ibs/crs/geographic_data_sources.html
Some highlights:
- WONDER from CDC: http://wonder.cdc.gov/
- ESRI Data Products: http://www.esri.com/data/find-data
- Census data
  - American FactFinder: http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml
  - Census Explorer (no direct data download): http://www.census.gov/censusexplorer/
  - http://blogs.census.gov/

Census Maps & Data
http://www.census.gov/geo/maps-data/
- TIGER Products
  - Cartographic Boundary Files
    - County 500k, 5m, 20m reflect the scale of the data
    - Scale is the ratio of map distance to earth distance, so 1:500,000 has more detail than 1:20,000,000
  - TIGER/Line Shapefiles: 2010, Download, Web Interface
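
To make the scale ratio concrete, here is a minimal sketch that converts one centimeter on the map to ground distance for the three Cartographic Boundary File scales. The function name `ground_distance_km` is an illustration, not part of any Census tool.

```python
def ground_distance_km(map_distance_cm, scale_denominator):
    """Convert a distance measured on the map (cm) to ground distance (km)."""
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / 100_000  # 100,000 cm in one km


# 500k, 5m and 20m correspond to 1:500,000, 1:5,000,000 and 1:20,000,000.
for denominator in (500_000, 5_000_000, 20_000_000):
    km = ground_distance_km(1, denominator)
    print(f"1:{denominator:,} -> 1 cm on the map covers {km:g} km on the ground")
```

The smaller the denominator, the less ground each map centimeter has to cover, which is why the 500k files carry the most boundary detail.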

International Borders
- For individual countries, can sometimes find a gov't agency that provides geographic data
- ESRI borders, online or from the Data & Maps DVD
- Global Administrative Areas (GADM): http://www.gadm.org/

Joining Attribute Data to Geodata
- Will often find attribute data in tabular form
- So might need to obtain geodata separately and join the attribute data to it
- Challenge: construct a common field that matches exactly!
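
The "matches exactly" challenge can be sketched outside ArcGIS in plain Python. This is only an illustration of the idea, not the ArcMap join itself: the feature records, the `NAME` field, and the `normalize_key` helper are all hypothetical, and the populations are the ones from the workshop's example table.

```python
def normalize_key(name):
    """Build a join key: trim, collapse internal whitespace, and lowercase."""
    return " ".join(name.split()).lower()


# Attribute table rows (populations from the workshop's example table).
attributes = [
    {"State Name": "New Jersey", "Population": 7730188},
    {"State Name": "Pennsylvania", "Population": 11881643},
]

# Hypothetical geodata records whose name field is formatted differently.
features = [
    {"NAME": "NEW JERSEY ", "geometry": "<polygon 1>"},
    {"NAME": "Pennsylvania", "geometry": "<polygon 2>"},
    {"NAME": "Delaware", "geometry": "<polygon 3>"},
]

# Index the attribute rows by the normalized key.
lookup = {normalize_key(row["State Name"]): row for row in attributes}

joined, unmatched = [], []
for feature in features:
    match = lookup.get(normalize_key(feature["NAME"]))
    if match is None:
        unmatched.append(feature["NAME"])  # no attribute row for this feature
    else:
        joined.append({**feature, **match})

print(len(joined), "features joined; unmatched:", unmatched)
```

Listing the unmatched records after every join is a cheap way to catch keys that differ only by case, padding, or spelling.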

Joining Tables
- In ArcMap, right-click on the destination to begin a join!

Some GIS File Types
- ESRI Shapefiles
  - Very common since the file format is open
  - Multiple files with different extensions (.shp, .shx, .dbf)
  - Display quickly and are editable
  - But careful, polygons do not share boundary lines!
- ESRI SDC
  - Smart Data Compression; files are compressed for efficient storage
- ESRI Interchange files
  - Extension .e00
- ESRI GRID
  - Attribute table stores number of occurrences per value
- ESRI Geodatabases
  - Integrated approach for storing & managing all types of geographic data and their relationships
  - Consists of Feature Datasets, which contain Feature Classes
  - Better to use File geodatabases (instead of Personal)
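
Because a shapefile is really several files that must travel together, a quick check before copying or emailing one can save trouble. The sketch below assumes the three extensions listed above are the ones required; `missing_parts` and the path `counties.shp` are hypothetical names used for illustration.

```python
from pathlib import Path

# The three component files named above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# Hypothetical path; an empty list means all three components are present.
print(missing_parts("counties.shp"))
```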

Exploratory Spatial Data Analysis (ESDA)
- Actively find interesting patterns in the data
- Facilitated by dynamically linked views
- Use statistical measures of spatial association, such as global & local Moran's I, to explore spatial dependence
  - Global: one statistic to summarize the pattern
  - Local: location-specific statistics
- Moran's I is frequently used to test for spatial autocorrelation in regression residuals, but it is also of interest when exploring the spatial distribution of variables
  - Standardized cov/var
  - Significance tests
    - Normal distribution
    - Randomization/permutation
  - Spatial correlation
    - -1 -> neg. correlation (regularity)
    - 0 -> no correlation
    - +1 -> pos. correlation (clustering)

Moran's I Statistic

$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$$

- w_ij = weights matrix; for a contiguity matrix, w_ij = 1 if i and j are adjacent
- z_i = x_i - μ, where x is the attribute value
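
A minimal sketch of global Moran's I in plain Python, assuming the standard form I = (n / S0) * (sum of w_ij z_i z_j) / (sum of z_i^2), where S0 is the sum of all weights. The four-unit "chain" of neighbors and the attribute values are toy data invented for illustration; GeoDa computes this for you from a weights file.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]                     # deviations from the mean
    s0 = sum(sum(row) for row in weights)              # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))   # spatial cross-products
    sum_sq = sum(zi * zi for zi in z)
    return (n / s0) * (cross / sum_sq)


# Toy contiguity matrix: four areas in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], chain))  # smooth gradient: positive I
print(morans_i([1, 4, 1, 4], chain))  # alternating values: negative I
```

Similar neighbors push the cross-product term up (clustering); alternating neighbors push it down (regularity).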

Moran's I and testing for spatial independence
H_0: spatial independence
- Normal distribution:
  - Assume the X's are identically normally distributed (each value for each region has the same distribution)
  - Use E[I] and VAR[I] to calculate a Z statistic
  - If the Z statistic lies beyond the critical value, then reject the null
- Randomization/permutation:
  - Many times, randomly rearrange the data on the map and compute I each time
  - Create a histogram of the distribution of I
  - Then calculate the mean and variance of the distribution, and from them a Z statistic
  - If the Z statistic lies beyond the critical value, then reject the null

Moran's I for CO Mortality
E[I] = -1/(n - 1) = -1/63
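
The randomization/permutation procedure can be sketched in a few lines of plain Python. The ten-unit chain of neighbors, the values 1 to 10, the 999 permutations, and the fixed seed are all assumptions chosen for illustration. The pseudo p-value at the end is a common addition to the Z statistic, not something the slides specify.

```python
import random
import statistics


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_test(values, weights, permutations=999, seed=0):
    """Shuffle values across locations and compare observed I to the result."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(shuffled)                       # rearrange data on the map
        simulated.append(morans_i(shuffled, weights))
    sim_mean = statistics.mean(simulated)
    sim_sd = statistics.stdev(simulated)
    z_stat = (observed - sim_mean) / sim_sd
    # One-sided pseudo p-value: share of permutations at least as large.
    at_least = sum(1 for s in simulated if s >= observed)
    p_value = (at_least + 1) / (permutations + 1)
    return observed, sim_mean, z_stat, p_value


n = 10
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(1, n + 1))  # a smooth gradient along the chain

observed, sim_mean, z_stat, p_value = permutation_test(values, chain)
print("observed I:", round(observed, 3))
print("permutation mean:", round(sim_mean, 3), "vs E[I] =", round(-1 / (n - 1), 3))
print("Z:", round(z_stat, 2), "pseudo p-value:", p_value)
```

The mean of the permuted values should sit near the theoretical E[I] = -1/(n - 1), which is a useful sanity check on the procedure.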

[Figure: four quadrants labeled "Anti-correlated", "High value clustering", "Low value clustering", "Anti-correlated"]

Local Moran's I
- Provides a measure of spatial autocorrelation for every areal unit, I_i

$$\sum_{i=1}^{n} I_i = c \, I$$

- c = a constant of proportionality
- If we assume I_i is normally distributed, it can be transformed into a Z statistic to test for significance.
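
A minimal sketch of local Moran's I, assuming one common scaling: I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = (sum of z_i^2) / n. Under that assumption the constant of proportionality c equals the sum of all weights, S0. The quadrant labels mirror the figure's categories; the four-unit chain and its values are toy data.

```python
def local_morans_i(values, weights):
    """Return (local I values, quadrant labels) for each areal unit."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    m2 = sum(zi * zi for zi in z) / n
    # Spatial lag: weighted sum of the neighbors' deviations.
    lags = [sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]
    local = [z[i] / m2 * lags[i] for i in range(n)]

    labels = []
    for zi, lag in zip(z, lags):
        if zi * lag > 0:
            labels.append("High-High" if zi > 0 else "Low-Low")  # clustering
        elif zi * lag < 0:
            labels.append("Anti-correlated")
        else:
            labels.append("Neutral")  # unit or its neighbors sit at the mean
    return local, labels


chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
values = [1, 2, 3, 4]

local, labels = local_morans_i(values, chain)
s0 = sum(sum(row) for row in chain)
print(list(zip(labels, [round(v, 3) for v in local])))
print("sum of local I:", round(sum(local), 3), "implied global I:", round(sum(local) / s0, 3))
```

Dividing the sum of the local values by S0 recovers the global statistic, which is the proportionality relation stated above.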

GeoDa Software
- Open source, available for Windows, Mac & Linux
- http://geodacenter.asu.edu/ (project is led by Luc Anselin at ASU)
- Only supports shapefiles, so must use ArcGIS (or other GIS software) to convert to shapefiles
- Linking: selection in one view results in selection in all views (e.g. maps, tables, scatterplots)
- Brushing: dynamic version of linking; click & drag a rectangle over a map or scatterplot

Spatial Analysis
- Requires definition of a spatial weights file that defines neighbors
  - contiguity
  - distance (warning: be sure data are projected!)
- Moran's I: global measure of spatial autocorrelation
  - are neighboring values similar to self?
- Local Indicators of Spatial Association (LISA)
  - Local Moran's I
  - Local Getis G_i^*
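
To show what a distance-based weights definition contains, here is a minimal sketch that marks two units as neighbors when their centroids lie within a threshold distance. It is not GeoDa's weights-file format: the centroid coordinates, the 6,000 m threshold, and the function name are hypothetical. The coordinates are assumed to be in a projected system in meters, which is why the projection warning matters: straight-line distances computed on raw longitude/latitude degrees are not consistent ground distances.

```python
import math


def distance_band_weights(points, threshold):
    """Binary weights: w_ij = 1 if points i and j are within the threshold."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= threshold:
                weights[i][j] = weights[j][i] = 1  # neighbors are symmetric
    return weights


# Hypothetical centroids in projected coordinates (meters).
centroids = [(0, 0), (3000, 4000), (10000, 0), (10000, 1000)]

weights = distance_band_weights(centroids, threshold=6000)
for row in weights:
    print(row)
```

A unit with an all-zero row has no neighbors at that threshold, so it is worth checking for such "islands" before running Moran's I.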