University of Thessaly, Department of Planning and Regional Development
Master Franco Hellenique POpulation, DEveloppement, PROspective
Volos, 2013

Data classification methods in GIS: the most common methods
Databases and Geographic Information Systems (Bases de Donnees - SIG)

Vassilis PAPPAS, Associate Professor, vpappas@upatras.gr
Laboratory of Urban and Regional Planning, Department of Architecture, School of Engineering, University of Patras

Data Classification methods

Drawing features, drawing categories of features, drawing quantities of features.
Categories in Urban Planning: land use, building construction type, building quality, etc.
Quantities in Urban Planning: population, land values, etc.

A feature layer is a reference to a feature class and has an associated drawing method.
A layer lets us assign any type of drawing method to a geographic dataset.

BLOCK_ID  AREA         PERIMETER   ADEQUACY  COVER (%)  F.A.R.  P. HEIGHT  P. LAND USE
1         3954.227000  254.311000  20-600    40         0.6     7.5        gk2
57        1896.508000  174.281100  20-600    40         0.6     7.5        gk1
167       2647.750000  212.230700  12-300    40         0.6     7.5        ak
168       316.265600   71.350790   15-500    70         0.8     8.5        gk
169       3702.945000  304.357800  15-500    70         0.8     8.5        ak
170       846.890600   129.475100  15-500    70         0.8     8.5        ak
171       1096.961000  132.914700  15-500    70         0.8     8.5        ta
172       343.187500   75.198450   15-500    70         0.8     8.5        gk
242       2578.617000  209.745900  25-1000   60         0.6     10.5       pk
243       4661.258000  348.084000  10-300    60         0.6     10.5       ak
244       3522.188000  341.614400  10-300    60         0.6     10.5       gk
245       385.757800   78.804500   15-500    70         0.8     8.5        gk
246       1125.063000  135.727400  15-500    70         0.8     8.5        gk
247       3006.898000  265.786300  15-500    70         0.8     8.5        xo
252       868.085900   146.842800  10-300    60         0.6     10.5       gk
253       343.929700   76.586040   10-300    60         0.6     10.5       ak
254       890.703100   136.971300  10-300    60         0.6     10.5       ak
257       2051.953000  219.292200  10-300    60         0.6     10.5       gk
258       1912.266000  192.092900  25-1000   60         0.6     10.5       ak
259       1174.656000  137.505000  25-1000   60         0.6     10.5       ak
264       1768.484000  170.005800  15-500    70         0.8     8.5        gk
265       1376.539000  150.555600  10-300    60         0.6     10.5       gk
266       1320.211000  150.161500  25-1000   60         0.6     10.5       ak
292       1157.344000  141.667200  10-300    60         0.6     10.5       gk
293       892.242200   121.613000  25-1000   60         0.6     10.5       ak
295       755.593800   124.848200  15-500    70         0.8     8.5        gk
296       730.953100   116.225200  10-300    60         0.6     10.5       ak
297       2548.492000  211.246600  25-1000   60         0.6     10.5       gk
300       2910.672000  222.256800  10-300    60         0.6     10.5       ak

Figure: part of a digital map and the attribute table for the official building census (2001), 41,738 buildings.

Geographic datasets do not contain the instructions for drawing the data.

Drawing features

Maps present descriptive information about geographic features using symbols and labels.
Points (marker symbols): Character, Simple, Arrow, Picture, Multilayer
Lines (line symbols): Cartographic, Hash, Marker, Multilayer
Areas (fill symbols): Simple, Line, Marker, Gradient, Picture, Multilayer

The simplest way to draw a feature layer is to draw all the features with the same symbol.
Drawing features: Single symbol

All road axes have the same line symbol. All buildings have the same fill symbol. All blocks have the same fill symbol.

Drawing features: Unique values

Each block has a different fill symbol according to its size. Each building has a different fill symbol according to its coverage. In practice, this map does not give us any useful information.
Drawing features: Group of Quantities (quantile)

All blocks have the same fill symbol as a background. All buildings are grouped in three classes according to their coverage. But how do we define what is small, medium or big?

Data in Urban Planning

Two main types of data:
Numerical data (quantitative data), grouped with a classification method: population density, building heights, building coverage, etc.
Textual data (qualitative data), grouped into categories based on their semantics: land use, building construction type, building condition, etc.
Categories in Urban Planning: Land use
Grouping categories

CODE  LAND USE
000   NOT URBAN LAND USE
010   VACANT PLOT
020   ABANDONED BUILDING
030   UNDER CONSTRUCTION
040   CULTIVATED LAND
050   TREE LANDS
060   FREE LANDS
070
080
090   OTHER NOT URBAN LAND USE
100   RESIDENCE
110   PRIMARY RESIDENCE
111   SINGLE FAMILY
112   SINGLE FAMILY (WITH GARDEN)
113   DUPLEX FAMILY
114   DUPLEX FAMILY (WITH GARDEN)
115   GROUP QUARTERS
116   MULTI-FAMILY
117
118
119   RESIDENCE, OTHER TYPE
120   SECONDARY RESIDENCE
121   COUNTRY HOUSE
122   COUNTRY MULTI-STOREY HOUSE
123

Tree-structured coding system

Patterns may be easier to see through generalization, that is, by reducing many categories to a few. The grouping of categories is based on their meaning (semantics) and on the coding system used. It is easier to read a thematic map with fewer than seven (7) categories (Mitchell A., 1999).
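As a rough illustration of grouping categories through the coding system, the sketch below assumes that each digit position of the three-digit land use code is one level of the tree, so that 112 rolls up to 110 and then to 100. The names `generalize` and `group_by_level` and the `level` parameter are hypothetical and do not belong to any GIS package.

```python
from collections import defaultdict

# A few codes copied from the land use table above.
LAND_USE = {
    "000": "NOT URBAN LAND USE",
    "010": "VACANT PLOT",
    "040": "CULTIVATED LAND",
    "100": "RESIDENCE",
    "110": "PRIMARY RESIDENCE",
    "111": "SINGLE FAMILY",
    "112": "SINGLE FAMILY (WITH GARDEN)",
    "120": "SECONDARY RESIDENCE",
    "121": "COUNTRY HOUSE",
}


def generalize(code: str, level: int = 1) -> str:
    """Return the ancestor of `code` that keeps only the first `level` digits.

    Assumption: every digit position is one level of the tree, so the
    remaining positions are filled with zeros (112 -> 110 -> 100).
    """
    return code[:level].ljust(len(code), "0")


def group_by_level(codes, level: int = 1):
    """Map each generalized code to the list of detailed codes below it."""
    groups = defaultdict(list)
    for code in codes:
        groups[generalize(code, level)].append(code)
    return dict(groups)


groups = group_by_level(LAND_USE, level=1)
for parent, members in sorted(groups.items()):
    print(parent, LAND_USE[parent], "<-", ", ".join(members))
```

With `level=1` the nine detailed codes collapse into two map categories (not urban land use, residence); `level=2` would keep primary and secondary residence apart.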
Classification methods for numerical data

Attribute table for the official building census (2001): 41,738 buildings.
How many categories for these records (cases)?
It is easier to read a thematic map with fewer than seven (7) categories (Mitchell A., 1999). Sturges' practical formula gives a number of classes (k) with good results: $k = 1 + 3.322 \log_{10} n$, where n = number of cases.

Classification methods for numerical data

A classification method subdivides a set of attribute values into classes according to the desired criteria. Classes group features with similar values by assigning them the same symbol. Each class has a lower and an upper numeric limit (class breaks: the minimum and maximum for the specific class). By changing the classes we create very different maps, which change the way we read and interpret the specific spatial unit (reference area). Apart from the technocratic approach of the classification methods that follow, a crucial factor in defining classes is a very good knowledge of the specific spatial variables, their behaviour, distribution and substantive meaning (thematic approach).
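Sturges' formula quoted above is easy to evaluate directly. The following minimal sketch uses a hypothetical helper name, `sturges_classes`, and assumes that the census figure "41.738 buildings" is written with a thousands separator, that is, 41738 cases.

```python
import math


def sturges_classes(n: int) -> float:
    """Number of classes k suggested by Sturges' formula for n cases."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 + 3.322 * math.log10(n)


# 29 is the number of rows in the attribute table excerpt; 41738 assumes
# the census figure "41.738 buildings" uses a thousands separator.
for n in (29, 41738):
    k = sturges_classes(n)
    print(f"n = {n}: k = {k:.2f}, rounded to {round(k)} classes")
```

For the 29 rows of the excerpt the formula suggests about 6 classes; for the full census it suggests about 16, so the result is best read as a starting point to be weighed against map readability.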
Classification method: Manual

Normally we use this method if we want to emphasize particular patterns by placing breaks at important threshold values, or if we need to comply with a particular standard that demands certain class breaks.

Classification method: Equal interval

This method divides the attribute range into equally sized classes, and is best applied to familiar data ranges such as percentages. Normally we use this method to emphasize the relative amount of attribute values compared to other values.
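The two methods above differ only in where the class breaks come from. The sketch below is an illustration under simple assumptions: breaks are stored as a sorted list of upper class limits, and the helper names `equal_interval_breaks` and `classify`, as well as the manual thresholds, are hypothetical.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper limits of k equally wide classes spanning min..max of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last limit is set to the maximum itself to avoid rounding drift.
    return [lo + width * i for i in range(1, k)] + [hi]


def classify(value, upper_limits):
    """Index of the first class whose upper limit is >= value."""
    return bisect_left(upper_limits, value)


# COVER (%) is a familiar 0-100 range, so equal intervals read naturally.
cover_breaks = equal_interval_breaks([0, 100], 4)

# Manual method: thresholds picked by the analyst (illustrative values only).
area_breaks = [500, 1000, 3000, 5000]

print(cover_breaks, classify(70, cover_breaks))
print(classify(3954.227, area_breaks))
```

A value above the last upper limit gets an index equal to the number of classes, which a caller can treat as out of range.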
Classification method: Quantile

Each class contains an equal number of features. This method is well suited to linearly distributed data.

Classification method: Natural Breaks

Classes are based on natural groupings of data values. In this method, data values are arranged in order. The class breaks are determined statistically by finding adjacent feature pairs between which there is a relatively large difference in data value, which minimizes the internal standard deviation of the data in each class. This is the default classification method in ArcGIS 9.2.
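To make the contrast between the two methods concrete, here is a small sketch. `quantile_breaks` places breaks by rank; `natural_breaks` is a plain dynamic-programming search for the partition of the sorted values with the smallest within-class sum of squared deviations, in the spirit of the natural breaks idea. It is not claimed to be the exact algorithm used by ArcGIS, and both function names and the sample values are made up for illustration.

```python
def quantile_breaks(values, k):
    """Upper limits of k classes holding (nearly) equal numbers of values."""
    xs = sorted(values)
    n = len(xs)
    # Class c ends at rank ceil(c * n / k); ties may unbalance the counts.
    return [xs[(c * n + k - 1) // k - 1] for c in range(1, k + 1)]


def natural_breaks(values, k):
    """Upper limits of the k classes with minimal within-class squared deviation."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any run xs[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting the first j values into c classes.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = best[c - 1][i] + ssd(i, j)
                if cost < best[c][j]:
                    best[c][j], cut[c][j] = cost, i
    # Walk back from the last class to recover each class's upper limit.
    limits, j = [], n
    for c in range(k, 0, -1):
        limits.append(xs[j - 1])
        j = cut[c][j]
    return limits[::-1]


sample = [1, 2, 3, 4, 5, 6, 50, 51, 100]  # made-up values with two clear gaps
print(quantile_breaks(sample, 3))
print(natural_breaks(sample, 3))
```

On this sample the quantile method splits the low cluster in half and lumps 50, 51 and 100 together, while the natural breaks search places the breaks in the two large gaps.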
Classification method: Geometrical interval

This method creates class ranges based on intervals that form a geometric sequence, built from a multiplier (and its inverse). It creates these intervals by minimizing the sum of squares of the number of elements in each class; this ensures that each interval has an appropriate number of values within it and that the intervals are fairly similar. The algorithm was specifically designed to accommodate continuous data. It produces a result that is visually appealing and cartographically comprehensive.

Classification method: Standard Deviation

Use this method to emphasize how much feature values vary from the mean. It is best used on normally distributed data.
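The standard deviation method can be sketched in a few lines. The version below assumes class breaks placed at whole multiples of the population standard deviation around the mean; GIS packages may offer other step sizes. The function names and the sample values are hypothetical.

```python
import statistics


def std_dev_breaks(values, n_steps=2):
    """Class breaks at mean + i * sd for i = -n_steps .. n_steps."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [mean + i * sd for i in range(-n_steps, n_steps + 1)]


def z_score(value, values):
    """How many standard deviations `value` lies above or below the mean."""
    mean = statistics.fmean(values)
    return (value - mean) / statistics.pstdev(values)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, standard deviation 2
print(std_dev_breaks(sample))
print(z_score(9, sample))
```

Features are then symbolized by the band their value falls in, which shows at a glance which features lie above or below the mean and by how much.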
References: bibliography for further reading

Mitchell A., The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns & Relationships, ESRI Press, Redlands, USA, 1999.
Zeiler M., Modeling Our World: The ESRI Guide to Geodatabase Design, ESRI Press, USA, 1999.
Online Help of ArcGIS 9.2 (and 10.1).
Online Help of ArcView 3.3.