Introduction to spatial data analysis 3
Scuola di Dottorato in Economia, La Sapienza, 2015/2016
Instructors: Filippo Celata, Federico Martellozzo and Luca Salvati
http://www.memotef.uniroma1.it/node/6524

Spatial statistics:
- f(location, distance, ...)
- to identify invisible geographical properties of data (e.g. distribution patterns)
- spatial association: to verify the degree of similarity of spatial events as a function of their distance

[Figure: John Snow's map of cholera, London, 1854]

Types of spatial association:
1. Those due to spatial dependence between geographical features (e.g. similar plants require similar soils)
2. Those due to spatial autocorrelation: the presence of a certain event increases the probability of finding similar events nearby, because of a reciprocal influence or «real contagion» (e.g. similar plants cluster because they are generated by other similar plants)

Methods:
A. Analysis of the spatial distribution of a pre-selected set of similar events (point patterns or point processes) (e.g. firms owned by foreign-born entrepreneurs)
B. Autocorrelation analysis: the degree to which nearby features are more similar than distant ones (to identify relations between proximity and intensity; polygons)
1. (Simple) spatial distribution measures

Spatial distribution: MEDIAN CENTER / MEAN CENTER
- Case field: to identify different centres for different categories of features (marked point pattern)
- Weight: absolute vs. relative centrality

Do: the distribution of firms owned by foreign-born entrepreneurs
- Identify and render the mean center (spatial statistics / measuring geog. distr. / ) for firms owned by Bangladeshis, Egyptians, Romanians, Chinese and Libyans (input: lez3/rm_immigdt.shp; weight field: ADD08; case field: ORIGINE)
- Do a kernel density map of firms owned by foreign-born entrepreneurs: spatial analyst / density / kernel density (input: rm_immigdt.shp; population field: CNT; cell size: 10 m (or the average distance between all points); search radius: 2,000 m; in Environments: extent and raster analysis / mask = zoneurbanistiche.shp)
- Mapping: modify the symbology of both output layers, then go to view / layout view to export the map (.tif, 300 dpi)

Discrete vs. surface statistical analysis
E.g. surface-based indicators (-> map algebra) -> measures of spatial segregation
- (Discrete) segregation index: relation between two normalized or standardized density coefficients (e.g. normalized density of firms owned by Chinese / normalized density of all firms) (from -1 to +1)
- S, (surface-based) segregation index (numerator) (O'Sullivan-Wong 2007): the local contribution to global spatial segregation = difference between the max and min values at any point of the kernel density surfaces (e.g. Italians/Chinese = $\max(p_{ci}, p_{ii}) - \min(p_{ci}, p_{ii})$, where $p_{ci}$ and $p_{ii}$ are the kernel density values for Chinese and Italians at point $i$)
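Going back to the mean center exercise above, the roles of the weight field and the case field can be illustrated with a minimal sketch. It is not the ArcGIS tool: the function name `mean_centers`, the tuple layout and the coordinates are hypothetical, and the data are made up. The weight pulls the center towards heavier points, and the case field yields one center per category.

```python
from collections import defaultdict


def mean_centers(points, use_weights=True):
    """Return {case: (x, y)}: one (weighted) mean center per case value.

    points: iterable of (x, y, weight, case) tuples, with x and y in a
    projected coordinate system. This layout is a hypothetical stand-in
    for the attribute table, not the actual shapefile schema.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum(w*x), sum(w*y), sum(w)
    for x, y, w, case in points:
        if not use_weights:
            w = 1.0  # unweighted: every point counts the same
        acc = sums[case]
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w
    return {
        case: (sx / sw, sy / sw)
        for case, (sx, sy, sw) in sums.items()
        if sw > 0
    }


# Made-up coordinates (metres), weights and categories, for illustration only.
firms = [
    (0.0, 0.0, 1.0, "A"),
    (10.0, 0.0, 3.0, "A"),
    (4.0, 4.0, 2.0, "B"),
    (8.0, 8.0, 2.0, "B"),
]

weighted = mean_centers(firms)
unweighted = mean_centers(firms, use_weights=False)
print(weighted)
print(unweighted)
```

For category "A" the weighted center moves towards the point with weight 3, while for "B" the two results coincide because both points carry the same weight.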
[Figure: Degree of segregation between areas with a prevalence of Chinese entrepreneurs and areas with a prevalence of Italian entrepreneurs]
[Figure: Local contribution to segregation between areas with a prevalent presence of units run by Chinese or Italian entrepreneurs]

2. POINT PROCESSES: spatial distribution of events in a point pattern (or scheme) -> Cluster analysis
- Spatial cluster: the spatial distribution of (similar) events (points) is (more) clustered than a completely random spatial distribution, and/or than the general/global distribution of the process (e.g. diseases due to local causes). E.g. business cluster
- Clustering: a general tendency of (similar) events to co-locate
- Hot-spot: areas with an anomalous concentration of similar events

Point processes and cluster analysis: to verify whether the spatial distribution of (similar) events is clustered or dispersed (uniform or inhibitory), against the complete spatial randomness hypothesis

Firms clustering and external economies of scale: empirical evidence
[Figure panels: random; uniform / inhibitory; (concentrated*); clustered]
Geographical concentration measures and problems

[Problems with standard (discrete, regional, a-spatial) concentration measures (e.g. Gini index)]
1) MAUP (modifiable areal unit problem): the degree of concentration is influenced by the spatial partition and spatial resolution of the data
2) The degree of concentration is not a function of the degree of polarization of the densest regions (Arbia 2001)

Concentration vs. polarization
Concentration vs. co-agglomeration
- Ellison and Glaeser concentration index (1997): a measure of co-agglomeration which takes into account the average degree of industrial concentration (Herfindahl index) and is not influenced by the spatial resolution of the data (MAUP)

[Figure: degree of concentration / degree of spatial autocorrelation]
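The MAUP point can be made concrete with a small sketch: the same set of points gives a different concentration value depending only on the grid used to aggregate them. As a stand-in concentration measure the sketch uses a Herfindahl-type sum of squared cell shares; this is an assumption made for brevity, not the Gini index and not the Ellison-Glaeser index. The function name and the coordinates are hypothetical.

```python
def herfindahl(points, cells_per_side, extent=1.0):
    """Sum of squared cell shares for points in a square study area.

    The area [0, extent] x [0, extent] is cut into a regular grid of
    cells_per_side x cells_per_side cells; empty cells contribute zero.
    """
    size = extent / cells_per_side
    counts = {}
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        col = min(int(x / size), cells_per_side - 1)
        row = min(int(y / size), cells_per_side - 1)
        counts[(row, col)] = counts.get((row, col), 0) + 1
    total = len(points)
    return sum((c / total) ** 2 for c in counts.values())


# Made-up point pattern in the unit square, for illustration only.
points = [
    (0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4),
    (0.6, 0.6), (0.9, 0.9), (0.1, 0.9), (0.9, 0.1),
]

h_coarse = herfindahl(points, 2)  # 2 x 2 grid
h_fine = herfindahl(points, 4)    # 4 x 4 grid
print(h_coarse, h_fine)
```

The points never move, yet the coarse grid reports a higher concentration than the fine one, which is the dependence on partition and resolution described in point 1.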
Point processes: clustering of events

Complete spatial randomness (CSR) (Diggle, 1983) = the event has the same probability of locating anywhere:
- The number of events in any subregion follows a Poisson distribution
- The location of an event does not depend on the location of similar events (independence)
- The numbers of events in two non-overlapping regions are independent
- The average number of events per unit area (intensity) is homogeneous throughout the area (spatial stationarity)

A random distribution still implies a certain degree of concentration and/or clustering. A distribution is clustered whenever its degree of concentration is higher than what we would expect under complete spatial randomness.
Different techniques imply different CSR hypotheses.

Problems with the analysis of spatial data #1:
- Extent of the study area: if it is too small, the analysis may exclude elements that are important for an exhaustive explanation. If it is too big, the spatial distribution pattern may be due to a variety of processes that have nothing to do with what we want to explain (example: suburban, scattered and low-density urban areas) -> reduce the size of the area

Do: create a mask of the area within the GRA (ring road) by (manually) selecting the zone urbanistiche within the GRA and exporting the selection as mask_area.shp
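The statement that a random distribution still shows some concentration, and that clustering means more concentration than CSR would produce, can be checked with a simulated benchmark. The sketch below is a minimal illustration under stated assumptions: it places a fixed number of points uniformly in the unit square (a binomial process, i.e. CSR conditional on the number of events), counts them in quadrats, and uses the variance-to-mean ratio of the counts, which is close to 1 for Poisson-like counts. The function names, the seed and the artificial clustered pattern are all hypothetical.

```python
import random


def quadrat_counts(points, k):
    """Count points of the unit square in a k x k grid of quadrats."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    return counts


def variance_to_mean_ratio(counts):
    """Sample variance of quadrat counts divided by their mean."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return var / mean


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n, k = 4000, 20

# CSR benchmark: every location in the unit square is equally likely.
csr_points = [(rng.random(), rng.random()) for _ in range(n)]
csr_counts = quadrat_counts(csr_points, k)
vmr_csr = variance_to_mean_ratio(csr_counts)

# Artificial clustered pattern: the same points squeezed into one quarter
# of the study area, leaving the rest empty.
clustered_points = [(0.5 * x, 0.5 * y) for x, y in csr_points]
vmr_clustered = variance_to_mean_ratio(quadrat_counts(clustered_points, k))

print(min(csr_counts), max(csr_counts))  # CSR counts are not all equal
print(vmr_csr, vmr_clustered)
```

Even under CSR the quadrat counts vary from cell to cell; the clustered pattern is flagged only because its ratio is far above the random benchmark.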
Clustering: global indexes (to measure the global degree of clustering for the whole set of events) -> methods based on quadrats (joint count) vs. methods based on distances

AVERAGE NEAREST NEIGHBOUR: is the distance between events smaller (clustering) or larger (inhibitory pattern) than the distance expected under complete spatial randomness? (Clark-Evans, 1950s)
Nearest neighbour ratio = observed mean distance / expected mean distance (CSR)

-> Toolbox / Spatial statistics / Analyzing patterns
Input:
- Points: unweighted (= 1) / projected coordinate system!
- (Polygons and lines: convert into points with x, y = centroids)
Output:
- Observed Mean Distance
- Expected Mean Distance
- Nearest Neighbor Index
- Graphic report
- Test variables:
  p-value: probability that the observed spatial distribution was generated by a random process
  z-score: number of standard deviations by which the observed value departs from the expected value

Do: measure the ANN for firms within the GRA (selection of rm_immig.shp)
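The nearest neighbour ratio and its test variables can be sketched as follows. This is an illustration, not the ArcGIS implementation: it uses the Clark-Evans expected distance 0.5 / sqrt(n / A) and the standard error 0.26136 / sqrt(n^2 / A), a brute-force neighbour search, and no edge correction. The study area A must be supplied by the user, and the function name and test patterns are hypothetical.

```python
import math


def average_nearest_neighbour(points, area):
    """Clark-Evans nearest neighbour ratio with z-score and p-value.

    points: list of (x, y) in a projected coordinate system.
    area:   size of the study area in the same (squared) units.
    """
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute-force search for the closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)          # mean distance under CSR
    std_error = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return {"observed": observed, "expected": expected,
            "ratio": observed / expected, "z": z, "p": p}


# Two made-up patterns of 25 points in a 10 x 10 study area.
grid = [(1 + 2 * i, 1 + 2 * j) for i in range(5) for j in range(5)]
clump = [(5 + 0.2 * i, 5 + 0.2 * j) for i in range(5) for j in range(5)]

regular = average_nearest_neighbour(grid, area=100.0)
clumped = average_nearest_neighbour(clump, area=100.0)
print(regular["ratio"], regular["z"])   # ratio > 1: dispersed / inhibitory
print(clumped["ratio"], clumped["z"])   # ratio < 1: clustered
```

Because the expected distance depends on the area, the same points give a different ratio if the study area is enlarged or reduced, which is one reason to restrict the analysis to the mask within the GRA.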