GIS Data Quality and Evaluation Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University
The quality of GIS data gains in importance when considering some of the main characteristics of GIS data:
- GIS databases are expensive to create and update, so datasets have often last been updated five or ten years ago.
- GIS data are easily modifiable, scalable, and shareable.
- Historically, GIS data and GIS work were mostly in the domain of large organizations, governmental or private; however, recently (i.e., in the last 10 years or so) GIS and GIS data have started to be used by a wide range of users.
Thus, a GIS user can end up with GIS data of a broad spectrum of qualities in her hands, leading to a strong need for a system that describes and explains GIS data quality and instructs the user on how to kick the GIS data tires.
GIS Data Quality: The Theory
Data Quality in GIS
GIS is mostly used to depict and analyse the infinite complexity of Earth's surface through an abstract and simplified (finite) model. Errors are inevitable in the model, creating a need to describe the quality of data.
Internal Data Quality
- Representing the infinite complexity is addressed by designing the nominal ground, i.e., the desired spatial and attribute representations and their accuracies (e.g. NTDB: http://geogratis.gc.ca/api/en/nrcan-rncan/esssst/f3d83500-2564-d61e-4f37-fef860e6ddc0.html, http://ftp2.cits.rncan.gc.ca/pub/bndt/doc/stdntdb3_en.pdf, http://ftp2.cits.rncan.gc.ca/pub/bndt/doc/appentd3_en.pdf, http://ftp2.cits.rncan.gc.ca/pub/bndt/doc/ntdb_v3_shape_format_geogratis_en.pdf).
- The final product will differ from the nominal ground by the degree and extent of errors introduced in the process of data capturing.
Internal data quality criteria:
- Completeness
- Logical consistency
- Positional accuracy
- Temporal accuracy
- Thematic accuracy
Quality is evaluated internally or against a reference dataset with greater accuracy.
[Diagram labels: Nominal Ground, procedure, Data Product, End User?]
Internal Data Quality
Type of Quality* | Description
Completeness | Presence and absence of features, their attributes and relationships.
Logical consistency | Degree of adherence to logical rules of data structure, attribution, and relationships.
Positional accuracy | Accuracy of the position of features.
Temporal accuracy | Accuracy of temporal attributes and temporal relationships of features.
Thematic accuracy | Accuracy of quantitative and non-quantitative attributes.
Example: http://ftp2.cits.rncan.gc.ca/pub/canvec+/doc/canvec+_product_specifications.pdf
* The terminology surrounding spatial data quality is subject to numerous variations, and different terms are sometimes used to describe the same concept (Servigne et al. in Devillers and Jeansoulin 2006, p 181).
External Data Quality
- The difference between the data product and the end user needs.
- The end user might have data expectations and needs that differ from the produced data, e.g., roads classified based on the surface material, wooded area classified by tree species, recent clearcuts captured, etc.
External data quality criteria:
- Definition
- Coverage
- Lineage
- Precision
- Legitimacy
- Accessibility
[Diagram labels: Internal Quality, External Data Quality]
External Data Quality
Type of Quality | Description
Definition | Is this what I need?
Coverage | Does this have the needed territory and time coverage?
Lineage | Where, how, and why was this created?
Precision | Are the existing semantic, spatial and temporal precisions of the objects and the attributes what I need?
Legitimacy | Does it meet the existing legal standards?
Accessibility | How easily can the data be accessed?
Metadata
Descriptions and explanations of data quality are part of metadata. Metadata (data about data) are used to describe things such as:
o When the dataset was created
o Who created it
o For what purpose and for how long it is intended to be used
o What the spatial accuracy and precision are
o What the meanings of attributes and their holders (e.g. fields) are
o What spatial and attribute relationships exist between the features
o Map projection
o Etc.
Metadata are often enclosed through supporting documentation (e.g. pdf files, websites). New ArcGIS file formats allow for metadata to be inserted in the dataset package.
Metadata Standards
The variability in the potential type and volume of collected and expressed metadata on one hand, and the importance of metadata on the other, call for standardization. Recently, the North American Profile of the International Standard on Geographic Information Metadata was published through collaboration between Canada and the USA. http://www.fgdc.gov/metadata/geospatial-metadata-standards#nap
[Diagram labels, grouped:]
- US: ANSI (American National Standards Institute); INCITS (International Committee for Information Technology Standards); FGDC (Federal Geographic Data Committee)
- International: ISO (International Organization for Standardization), Geomatics Committee
- Canada: SCC (Standards Council of Canada); CGSB (Canadian General Standards Board), Geomatics Committee; LIO (Land Information Ontario)
- Software: ESRI ArcGIS
Spatial Accuracy
Horizontal Accuracy
NMAS (American National Map Accuracy Standards, 1947), threshold based
90% of the well-defined points that are tested must fall within a specified tolerance:
- for map scales larger than 1:20,000, the horizontal tolerance is 1/30 inch at map scale
- for map scales of 1:20,000 or smaller, the horizontal tolerance is 1/50 inch at map scale (12.2 m on the ground at 1:24,000; 25.4 m at 1:50,000)
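The NMAS threshold can be worked through numerically. The following is a minimal sketch, not an official implementation: the function names are hypothetical, and the errors passed to `passes_nmas` are assumed to be horizontal distances in metres between tested well-defined points and their reference positions. The arithmetic gives about 12.2 m on the ground at 1:24,000 and 25.4 m at 1:50,000.

```python
INCH_TO_M = 0.0254  # one international inch in metres (exact by definition)


def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground tolerance in metres for a map at scale 1:scale_denominator."""
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    # A scale larger than 1:20,000 has a denominator smaller than 20,000.
    map_tolerance_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_tolerance_in * scale_denominator * INCH_TO_M


def passes_nmas(errors_m, scale_denominator):
    """True if at least 90% of the tested point errors are within tolerance."""
    if not errors_m:
        raise ValueError("at least one tested point is required")
    tolerance = nmas_horizontal_tolerance_m(scale_denominator)
    within = sum(1 for e in errors_m if e <= tolerance)
    return within / len(errors_m) >= 0.90


if __name__ == "__main__":
    for denominator in (10_000, 24_000, 50_000):
        tol = nmas_horizontal_tolerance_m(denominator)
        print(f"1:{denominator}: {tol:.1f} m")
```

Because the standard is threshold based, the size of the errors beyond the tolerance does not matter, only the share of points that exceed it.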
Spatial Accuracy
Horizontal Accuracy
CMAS (Circular Map Accuracy Standard), confidence level based; assumes a normal distribution of errors around a mean equal to zero.
NTDB CMAS:
$\sigma_c = 0.7071\,(\sigma_x^2 + \sigma_y^2)^{1/2}$, where $\sigma_x$ is the standard deviation in the X-axis and $\sigma_y$ is the standard deviation in the Y-axis
$\mathrm{CMAS} = 2.1460\,\sigma_c$ at the 90% confidence level
- Urban area: CMAS = 10 m
- Rural area: CMAS = 25 m
- Isolated area (not urban or rural): CMAS = 125 m
OBM: Horizontal Accuracy = +/- 10 m; Vertical Accuracy (contours) = +/- 5 m. http://lioapp.lrc.gov.on.ca/edwin/edwincgi.exe?ihid=2493&agencyid=1&theme=all_themes
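The CMAS formulas can be applied to a set of measured X and Y errors. This is a rough sketch under stated assumptions: the function names and the `NTDB_CMAS_LIMITS_M` lookup are hypothetical, and the standard deviation is computed about zero (the root mean square of the errors), following the zero-mean assumption on the slide rather than any documented NTDB procedure.

```python
import math

# NTDB CMAS values quoted on the slide, in metres.
NTDB_CMAS_LIMITS_M = {"urban": 10.0, "rural": 25.0, "isolated": 125.0}


def sigma_about_zero(errors):
    """Standard deviation of errors, assuming their mean is zero."""
    if not errors:
        raise ValueError("at least one error value is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def cmas(sigma_x, sigma_y):
    """CMAS at the 90% confidence level from the axis standard deviations."""
    sigma_c = 0.7071 * math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return 2.1460 * sigma_c


def within_ntdb_limit(cmas_value, area_class):
    """Compare a computed CMAS with the NTDB value for the area class."""
    return cmas_value <= NTDB_CMAS_LIMITS_M[area_class]


if __name__ == "__main__":
    dx = [3.0, -3.0, 3.0, -3.0]  # made-up X errors in metres
    dy = [4.0, -4.0, 4.0, -4.0]  # made-up Y errors in metres
    value = cmas(sigma_about_zero(dx), sigma_about_zero(dy))
    print(f"CMAS = {value:.2f} m, urban limit met: {within_ntdb_limit(value, 'urban')}")
```

Unlike NMAS, this measure summarizes the whole error distribution, so a few large errors raise the result even when most points are close.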
ArcGIS Precision
Precision is dealt with in ArcGIS through the coordinate system resolution and feature tolerance.
Coordinate System Resolution
The default coordinate system resolution is 0.0001 m, or its equivalent in the units of the particular coordinate system (i.e., whether it is meter, decimal degree, or foot based). (ArcGIS Help File)
ArcGIS Precision
Feature Tolerance
The tolerance value is the minimum distance between coordinates. If two coordinates are found within the minimum distance, they are deemed to be in the same location. The default tolerance is set to 0.001 m, ten times the default resolution. Tolerances can also be set by the user but should never be below twice the resolution. (ArcGIS Help File)
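The relationship between resolution and tolerance can be illustrated with a small sketch. It does not call ArcGIS and does not reproduce its internal algorithm: the function names are hypothetical, and the use of plain Euclidean distance to decide whether two coordinates coincide is an assumption made for illustration.

```python
import math

DEFAULT_RESOLUTION_M = 0.0001  # default resolution quoted from the ArcGIS help
DEFAULT_TOLERANCE_M = 0.001    # default tolerance, ten times the resolution


def validate_tolerance(tolerance, resolution=DEFAULT_RESOLUTION_M):
    """Reject a user-set tolerance below twice the resolution."""
    if tolerance < 2 * resolution:
        raise ValueError("tolerance should not be below twice the resolution")
    return tolerance


def deemed_same_location(p, q, tolerance=DEFAULT_TOLERANCE_M):
    """True if two (x, y) coordinates lie within the tolerance of each other.

    Euclidean distance is an illustrative assumption, not ArcGIS's own test.
    """
    return math.dist(p, q) <= tolerance


if __name__ == "__main__":
    print(deemed_same_location((500000.0, 5360000.0), (500000.0005, 5360000.0)))
    print(deemed_same_location((500000.0, 5360000.0), (500000.002, 5360000.0)))
```

The practical consequence is that vertices closer together than the tolerance may be treated as one location during processing, so a tolerance that is too large can shift or merge features.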
MNR FIM Precision OMNR, FIM Base and Values Technical Specifications 2009:
Attribute Accuracy
Rarely even described or declared. An example from the Ontario Land Cover metadata's "Perspective on Accuracy":
- Waterbodies are classified with a higher confidence than any other class.
- Forest types are classified with a high level of confidence. Some degree of unavoidable confusion exists, however, between treed wetlands and sparse forest classes.
- Regional variations in canopy closure and ground vegetation suggest that forest classification derived from spectral evidence is most effectively interpreted from an in-depth knowledge of local forest conditions.
- Forest clearcuts and forest burns (both recent and old) are classified with a high level of confidence; however, bedrock outcrops may be confused with recent clearcuts if the two classes occur in close proximity.
Metadata Examples
- Ontario, Land Information Ontario datasets: https://www.javacoeapp.lrc.gov.on.ca/geonetwork/srv/en/main.home (e.g., search for "ohn waterbody")
- CanVec+: http://ftp2.cits.rncan.gc.ca/pub/canvec+/doc/canvec+_product_specifications.pdf
- NRVIS.pdf
- NRVIS Values in Forest Management: FIM Base and Values Technical Specifications 2009.pdf
- Forest Resource Inventory: FIM Forest Resources Inventory Technical Specifications June 01_2007.pdf
- Metadata enclosed as part of a GIS dataset: ESRI's Ontario_major_lakes shapefile.
How to Access a Dataset's Metadata in ArcCatalog
If working with, e.g., FGDC CSDGM Metadata:
1. Go to Customize > ArcCatalog Options.
2. Switch to FGDC CSDGM Metadata in the Metadata tab.
How to Access a Dataset's Metadata in ArcCatalog (continued)
3. Select the dataset.
4. Go to the Description tab.
5. If wanting to edit, click on Edit. Metadata categories open for editing.
If the metadata categories for a particular dataset are still not fully accessible, go to a different folder and return, i.e., refresh the folder.
How to Evaluate Data Quality?
There are no specific tools in ArcGIS that deal with data quality. One should start with a general inspection of the dataset:
o Determining the extent of metadata, either embedded in or associated with the dataset.
o Visually comparing the dataset with a known, reliable reference dataset (e.g. basic databases such as OBM and NTDB).
o Using Select by Location to learn about the presence of, and spatial relations between, the features in the same layer and between layers.
o Inspecting the table and using Select by Attributes and Field Calculator to check for completeness and logical consistency. Tip: the Select by Attributes and Symbology windows give access to unique values.
o Raster datasets can be compared with other rasters, or with vector datasets through raster-to-vector or vector-to-raster conversion. The Conditional, Raster Calculator and Math tools from ArcToolbox can be used for a cell-by-cell inspection.
o Map projection is very important.
A more detailed inspection can be done through topology.
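The table and raster checks above can also be sketched outside ArcGIS. The example below is plain Python, not arcpy, and only stands in for what Select by Attributes or a cell-by-cell raster tool would show; the function names, the `SURFACE` field, and the sample records are all hypothetical.

```python
from collections import Counter


def missing_values(records, field):
    """Completeness: indices of records whose field is absent, None, or blank."""
    return [i for i, r in enumerate(records) if r.get(field) in (None, "")]


def unique_values(records, field):
    """Counts of each distinct value, similar to a unique values list."""
    return Counter(r.get(field) for r in records)


def domain_violations(records, field, allowed):
    """Logical consistency: indices of non-missing values outside the allowed set."""
    return [
        i for i, r in enumerate(records)
        if r.get(field) not in (None, "") and r.get(field) not in allowed
    ]


def cell_agreement(raster_a, raster_b):
    """Fraction of cells holding the same value in two equally sized grids."""
    if len(raster_a) != len(raster_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(raster_a, raster_b)
    ):
        raise ValueError("rasters must have the same number of rows and columns")
    pairs = [(a, b) for row_a, row_b in zip(raster_a, raster_b)
             for a, b in zip(row_a, row_b)]
    if not pairs:
        raise ValueError("rasters must contain at least one cell")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


if __name__ == "__main__":
    roads = [  # made-up attribute table rows
        {"ID": 1, "SURFACE": "paved"},
        {"ID": 2, "SURFACE": "gravel"},
        {"ID": 3, "SURFACE": None},
        {"ID": 4, "SURFACE": "Paved"},
    ]
    print(missing_values(roads, "SURFACE"))
    print(domain_violations(roads, "SURFACE", {"paved", "gravel"}))
    print(cell_agreement([[1, 1], [2, 2]], [[1, 2], [2, 2]]))
```

Listing unique values first is often the quickest way to spot inconsistent spellings or codes before deciding which values belong in the allowed set.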
Resources:
Devillers, R. and R. Jeansoulin (eds.). 2006. Fundamentals of Spatial Data Quality. ISTE Ltd. 309 pp.
Geomatics Canada. 2010. Standards and Specifications of the National Topographic Data Base. http://ftp2.cits.rncan.gc.ca/pub/bndt/doc/stdntdb3_en.pdf. Viewed October 2010.
National Spatial Data Infrastructure. 2010. Geospatial Positioning Accuracy Standards Part 3: National Standard for Spatial Data Accuracy. http://www.fgdc.gov/standards/projects/fgdc-standardsprojects/accuracy/part3/chapter3. Viewed October 2010.
Spectranalysis Inc. 2004. Introduction to the Ontario Land Cover Data Base, Second Edition (2000): Outline of Production Methodology and Description of 27 Land Cover Classes. Report to Ontario Ministry of Natural Resources. Unpublished.