EEOS 381 - Spatial Databases and GIS Applications
Lecture 3: GIS Data Models and Data Formats
Overview
- GIS Data Models
- Common GIS Data Formats
EEOS 381 - Spring 2015: Lecture 3
Overview
Key points:
- It is important to understand which model to use, based on the application
- The model determines which specific format you use
- The format may determine which types of analysis you can perform
Data Model
General definition: an abstraction or representation of objects and processes in the real world, incorporating the properties relevant to the application at hand
GIS Data Model
Definition: a digital representation of geographic objects (spatial data) in GIS software
- includes relationships between, and attributes of, objects
- doesn't include all of reality
- exists in the context of a digital environment
[Figure: The role of a data model in GIS - levels of GIS data model abstraction]
Levels of abstraction:
- Reality: real-world phenomena, e.g. wells, streets, lakes
- Conceptual Model: decide which objects are applicable, what relationships exist among them, and what processes they participate in
- Logical Model: list the objects, with names, descriptions, behavior, interaction, location, and what the GIS will do
- Physical Model: specific file and table names, attributes, object relationships, processes (commands)
Example implementation:
- Reality: wells, dry cleaners, streets
- Conceptual Model: ask how pollution from dry cleaners and major roads affects public water supplies (wells and reservoirs)
- Logical Model: use ArcGIS to compare wells (points), reservoirs (polygons), dry cleaners (points) and streets (lines) with buffer and proximity operations; focus on wells with a yield of 100+ gallons per minute and on major roads, in eastern Massachusetts
- Physical Model: BUFFER shapefile WELLS_PT, join to GPM table on YIELD field; determine how many dry cleaners are within 1 mile of large wells, and the proximity of reservoirs and wells to major roads; store in Oracle-based ArcSDE geodatabase
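Sketch (not from the lecture): the "dry cleaners within 1 mile of large wells" step can be approximated without ArcGIS as a plain distance test. The coordinates, IDs, and yields below are hypothetical sample data, assumed to be in a projected coordinate system measured in metres; the function name is also an assumption made for illustration.

```python
import math

MILE_M = 1609.344  # metres in one statute mile

# Hypothetical sample data in a projected coordinate system (metres).
wells = [
    {"id": "W1", "x": 0.0, "y": 0.0, "gpm": 250},
    {"id": "W2", "x": 5000.0, "y": 0.0, "gpm": 40},    # below the 100 GPM cutoff
    {"id": "W3", "x": 0.0, "y": 8000.0, "gpm": 120},
]
dry_cleaners = [
    {"id": "D1", "x": 500.0, "y": 500.0},
    {"id": "D2", "x": 5100.0, "y": 0.0},
    {"id": "D3", "x": 0.0, "y": 9700.0},
]

def cleaners_near_large_wells(wells, cleaners, min_gpm=100, radius=MILE_M):
    """Map each large well's id to the ids of dry cleaners within `radius`."""
    result = {}
    for well in wells:
        if well["gpm"] < min_gpm:
            continue  # only wells with a yield of min_gpm or more are considered
        result[well["id"]] = [
            c["id"] for c in cleaners
            if math.hypot(c["x"] - well["x"], c["y"] - well["y"]) <= radius
        ]
    return result

near = cleaners_near_large_wells(wells, dry_cleaners)
```

A real buffer operation produces polygon geometry that can be overlaid with other layers; the distance test above only answers the counting question.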
2 Conceptual Views
- Discrete objects: the world is empty except where occupied by objects with well-defined locations and/or boundaries, e.g. wells, streets, lakes
- Fields: measurements may be made at any location over a continuous surface, e.g. elevation, temperature, population density
Raster
- is a data model
- space is divided into an array (rows and columns) of cells
- each cell (pixel, or picture element) in a layer is the same size and has a homogeneous value
- cell size determines resolution (10 m, 1 foot, etc.)
- usually associated with the field view
- includes images, elevation models, surfaces
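Sketch (not from the lecture): a raster layer can be pictured as a list of rows plus an origin and a cell size. The grid values, origin, and 10 m cell size below are hypothetical; the point is that every location inside a cell returns the same single value.

```python
# A tiny raster layer: rows and columns of equally sized cells.
# Hypothetical values: elevation in metres, 10 m cell size.
CELL_SIZE = 10.0
ORIGIN_X, ORIGIN_Y = 1000.0, 2000.0   # upper-left corner of the grid (map units)
grid = [
    [12, 14, 15, 17],
    [11, 13, 16, 18],
    [10, 12, 14, 19],
]

def cell_at(x, y):
    """Return (row, col) of the cell containing map coordinate (x, y), or None."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)   # rows count downward from the top edge
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return row, col
    return None

def value_at(x, y):
    """Every location inside a cell reports that cell's single value."""
    rc = cell_at(x, y)
    return None if rc is None else grid[rc[0]][rc[1]]
```

Two nearby points such as (1021, 1989) and (1025, 1985) fall in the same cell and so return the same value, which is the source of the spatial imprecision at coarse cell sizes.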
GIS Data Models
Raster - examples
[Figures: aerial (ortho) photograph; land use types]
Raster
- Cells may belong to zones (groups of cells with the same value, usually representing the same feature)
- Can include NODATA - null values (out of range of the dataset, or no information available for that cell)
- Some image formats can include attributes (value attribute table)
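Sketch (not from the lecture): zones and NODATA can be illustrated by totalling the area covered by each cell value while skipping null cells. The NODATA marker, cell size, and land-use codes below are hypothetical.

```python
from collections import Counter

NODATA = -9999          # hypothetical NODATA marker
CELL_SIZE = 30.0        # hypothetical cell size in metres

# Hypothetical land-use codes: 1 = forest, 2 = water, 3 = urban.
landuse = [
    [1, 1, 2, NODATA],
    [1, 3, 2, 2],
    [3, 3, NODATA, 2],
]

def zone_areas(raster, cell_size, nodata):
    """Return {zone value: area in square map units}, ignoring NODATA cells."""
    counts = Counter(v for row in raster for v in row if v != nodata)
    return {zone: n * cell_size ** 2 for zone, n in counts.items()}

areas = zone_areas(landuse, CELL_SIZE, NODATA)
```

Here a zone is simply every cell sharing a value; the cells need not touch. The resulting table of value and area is similar in spirit to a value attribute table with a computed area column.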
Raster
Advantages:
- A simple data structure: a matrix of cells with values, each representing a coordinate, sometimes linked to an attribute table
- A powerful format for intensive statistical and spatial analysis; overlays of complex data run faster than with vector data (Spatial Analyst extension in ArcGIS)
- The ability to represent continuous surfaces and perform surface analysis
- The ability to uniformly store points, lines, polygons, and surfaces
- Compression
Raster
Disadvantages:
- Inherent spatial inaccuracies due to the cell-based feature representation, especially at low resolution
- Datasets can be very large
Vector
- is a data model
- points - single coordinate values
- lines (arcs) - strings of connected points
- polygons (areas) - enclosed lines
- usually associated with the discrete object view
- stores geography and attributes
Vector - the basics
- POINT - location with a set of coordinates (0-D)
- LINE - connected string of points (1-D); a line segment is a direct line between two points
- POLYGON - area defined by a line (2-D)
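Sketch (not from the lecture): the three basic vector primitives can be written as coordinate tuples and lists, with length and area derived from the coordinates. The sample coordinates and function names are hypothetical; the area uses the standard shoelace formula.

```python
import math

point = (3.0, 4.0)                                           # 0-D: one coordinate pair
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                 # 1-D: connected string of points
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # 2-D: ring, closure implied

def line_length(coords):
    """Sum the lengths of the segments between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Shoelace formula; the first vertex is not repeated at the end of `ring`."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Unlike a raster cell, each value here is computed from exact coordinates, which is why vector data is described as precise.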
Vector (other objects/definitions)
[Figure: vector object types; the term labels are not recoverable, only their definitions:]
- (curve string)
- (topological junction, or endpoint of line)
- (direct connection between two nodes)
- (sequence of any line segments with closure)
- (a link between two nodes, with one direction designated)
- (sequence of line segments)
- (directed sequence of nonintersecting line segments with nodes at each end)
- (an area defined by an outer ring without inner rings)
- (an area defined by an outer ring with inner rings)
Vector
Advantages:
- Precise values
- Efficient storage
- Topological relationships
- High-quality cartographic output
- Useful for a variety of spatial analysis operations
Vector
Disadvantages:
- Poor for storing continuous surfaces (e.g. elevation models)
- Overlay operations can be time-consuming and computer intensive (need lots of RAM)
Vector
Simple vs. topologic features:
- Simple (a.k.a. spaghetti model) - no inherent connectivity relationships
- Topologic - simple features with defined spatial relationships
[Figure: spaghetti data (4 linear features) compared with topologic data (14 linear features, 13 nodes)]
Spaghetti Data Model
- No details of logical relationships between objects
- The line shared by two adjacent polygons is stored separately (twice) in the computer
- Spatial relationships are only implied
- Efficient for cartographic display but not for data storage
- At first, GIS used vector data and cartographic spaghetti structures
Topology
- Connectivity: which chains are connected at which nodes?
- Direction: defined by a from-node and a to-node of a chain
- Example analysis: modeling flow through the connecting lines in a network
Topology
- Adjacency: which polygons are on the left and which are on the right side of a chain?
- Example analysis: identifying adjacent features; combining adjacent polygons with similar characteristics
Topology
- Inclusion: simple spatial objects (node, chain, smaller polygon) are within a polygon
- Example analysis: overlaying geographic features
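Sketch (not from the lecture): connectivity, direction, and adjacency can all be read from a table of chains that records a from-node, a to-node, and the polygons on the left and right. The records below are hypothetical: two polygons, A and B, share one chain (id 3), and "OUT" is an assumed marker for the area outside all polygons.

```python
from collections import defaultdict

# Hypothetical chain records: (chain id, from-node, to-node, left polygon, right polygon).
# Polygons A and B share chain 3; "OUT" marks the area outside all polygons.
chains = [
    (1, "n1", "n2", "A", "OUT"),   # outer boundary of A
    (2, "n2", "n1", "B", "OUT"),   # outer boundary of B
    (3, "n2", "n1", "A", "B"),     # shared boundary, stored only once
]

def connectivity(chains):
    """Connectivity: which chains meet at each node?"""
    at_node = defaultdict(list)
    for chain_id, fnode, tnode, _left, _right in chains:
        at_node[fnode].append(chain_id)
        at_node[tnode].append(chain_id)
    return dict(at_node)

def adjacency(chains, outside="OUT"):
    """Adjacency: pairs of polygons lying on opposite sides of a chain."""
    pairs = set()
    for _id, _fnode, _tnode, left, right in chains:
        if outside not in (left, right) and left != right:
            pairs.add(frozenset((left, right)))
    return pairs

nodes = connectivity(chains)
neighbors = adjacency(chains)
```

Because the shared boundary is a single record, adjacency is a table lookup rather than a geometric comparison, which is the practical difference from the spaghetti model.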
Network
- Type of topologic vector data model (see pgs 218-219 in book)
- Models flow of goods and services (e.g. routes of roads, rivers, utility lines)
- Radial - flow in one direction (e.g. upstream, downstream)
- Looped - intersections allowed, choices for flow allowed
- Network Analyst extension in ArcGIS contains tools for this type of analysis
Regions
- Type of topologic vector data model
- Groups of polygons in coverages
- Multi-part polygons
Routes
- Composite line features
- Created from sections (whole or partial arcs)
- Contain M values (measures along the route)
- Ex.: all the arc segments in ALL_ROADS that make up Interstate 90, treated as one feature in MAJOR_ROUTES
Linear Referencing System (LRS)
- Uses a relative position along an already existing linear feature, without explicit x,y coordinates
- Location is given as a position, or measure, along the feature (distance, or percent along)
- Has a base layer of lines, plus a series of related event tables (Address, Speed Limit, Route Number tables, etc.)
- Used for highways/city streets (MassDOT), railroads, rivers, pipelines, and water and sewer networks
- Dynamic segmentation / flat file
- See pages 219-221 in textbook
Linear Referencing System (LRS)
1. Base arc (ID = 1, measures 0 to 100)
   - Speed limit: 55 mph up to 30 mi., 45 mph beyond
   - # of lanes: 3 lanes up to 60 mi., 2 lanes beyond
2. LRS tables
   Base arcs feature class attribute table:
     ID
     1
   SPEEDLIMIT table:
     ID  F_MEAS  T_MEAS  SPEEDLIMIT
     1   0       30      55
     1   30      100     45
   NUMLANES table:
     ID  F_MEAS  T_MEAS  NUMLANES
     1   0       60      3
     1   60      100     2
3. Flat file arcs (ID = 1, 2, 3)
   Flat file arcs feature class attribute table:
     ID  SPEEDLIMIT  NUMLANES
     1   55          3
     2   45          3
     3   45          2
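Sketch (not from the lecture): the relationship between the two event tables and the flat-file arcs can be reproduced by intersecting the measure ranges. The table values are the ones on the slide; the function name and tuple layout are assumptions made for illustration.

```python
# Event tables from the slide: (route ID, from-measure, to-measure, value).
SPEEDLIMIT = [(1, 0, 30, 55), (1, 30, 100, 45)]
NUMLANES = [(1, 0, 60, 3), (1, 60, 100, 2)]

def overlay_events(events_a, events_b):
    """Split a route wherever either attribute changes.

    Returns (route ID, from-measure, to-measure, value_a, value_b) rows.
    """
    rows = []
    for rid_a, f_a, t_a, val_a in events_a:
        for rid_b, f_b, t_b, val_b in events_b:
            if rid_a != rid_b:
                continue  # events on different routes never overlap
            start, end = max(f_a, f_b), min(t_a, t_b)
            if start < end:  # the two events overlap along the route
                rows.append((rid_a, start, end, val_a, val_b))
    return sorted(rows)

flat = overlay_events(SPEEDLIMIT, NUMLANES)
for new_id, (_route, start, end, speed, lanes) in enumerate(flat, start=1):
    print(new_id, start, end, speed, lanes)
```

The three resulting rows carry the same SPEEDLIMIT and NUMLANES values as the flat-file arcs: the flat file stores three pre-split arcs, while the LRS keeps one base arc and derives the pieces on demand.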
TIN (Triangulated Irregular Network)
- Topologic data model for surfaces (e.g. elevation)
- Made up of connected triangles (faces)
- Triangle nodes have X, Y, Z values
- Triangles may be sized differently, based on original data density
- See pages 219-221 in textbook
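Sketch (not from the lecture): because each triangle node carries X, Y, Z values, the surface height at any point inside a face can be estimated by linear (barycentric) interpolation. This is one common way to query a TIN, shown under that assumption; the function name and sample triangle are hypothetical.

```python
def interpolate_z(p, a, b, c):
    """Linearly interpolate Z at p = (x, y) inside triangle a, b, c (each (x, y, z)).

    Returns None when p falls outside the triangle.
    """
    (x, y), (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    # Barycentric weights of p with respect to nodes a, b and c.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # a negative weight means p is outside this face
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical face: three surveyed nodes with elevations 10, 20 and 30.
face = ((0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0))
z_mid = interpolate_z((5.0, 5.0), *face)
```

A full TIN query would first locate which face contains the point; this sketch covers only the interpolation within one face.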
TIN
[Figure: a TIN as viewed in ArcScene]
Terrain Dataset
- A multiresolution, TIN-based surface built from measurements stored as features in a geodatabase
- Terrains are typically made from LiDAR, sonar, and photogrammetric sources
- Terrains reside in the geodatabase, inside feature datasets with the features used to construct them
Annotation
- text labels (vector features)
- fixed position, size, orientation
- annotation does NOT reposition as you pan and zoom
- not available for shapefiles (only in geodatabases and coverages)
Object-Relational
- Everything is stored in database tables: attributes and geometry in an RDBMS
- Defined relationships between objects
- Can store topology
- Can design with CASE (Computer-Aided Software Engineering) tools (like MS Visio) to produce UML (Unified Modeling Language) diagrams (see pages 221-226 in textbook)
- Download models from esri.com for various industries
- Geodatabases (ArcSDE, Personal and File)
Object-Relational UML Diagram
[Figure: an example of a CASE tool (Microsoft Visio); the UML model is for a utility water system]
Object-Relational Diagram
[Figure: a water-facility object model]
Definition
- Format - the pattern into which data (coordinates, attributes, indexes, spatial reference, etc.) are systematically arranged for use on a computer
- A file format is the specific design of how information is organized in the file (at the most basic level, all GIS data is a file on disk)
- For example, ArcInfo has specific, proprietary formats used to store coverages
- DLG, DEM, and TIGER are geographic datasets with different file formats
- ESRI has also developed Shapefiles and Geodatabases
GIS Data Formats
Common raster formats:
- GeoTIFF, TIFF, BIL, BIP
- MrSID (.SID), JPG, JPEG 2000
- GRID, DEM
- ERDAS IMAGINE (.IMG)
- Intergraph - CIT, COT
- ER Mapper
- ADRG
- NITF - National Imagery Transmission Format
- Geodatabase raster datasets
GIS Data Formats
Raster - file components:
- Image file (.tif, .sid, ...)
- Header ("world") file (.tfw, .sdw, ...), six lines:
    1.000000000000000         cell size (x-scale)
    0.000000000000000         rotation term
    0.000000000000000         rotation term
    -1.000000000000000        cell size (y-scale)
    237000.500000000000000    x coordinate of center of upper-left pixel
    897999.500000000000000    y coordinate of center of upper-left pixel
- Auxiliary file (.aux) - stores spatial reference
- Reduced-resolution raster (.rrd or .ovr) - stores pyramid levels
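Sketch (not from the lecture): the six world-file values define an affine transform from pixel (column, row) to map coordinates. The text below reuses the values shown on the slide; the function names are assumptions made for illustration. The file order is A, D, B, E, C, F, where x = A*col + B*row + C and y = D*col + E*row + F.

```python
WORLD_FILE_TEXT = """\
1.000000000000000
0.000000000000000
0.000000000000000
-1.000000000000000
237000.500000000000000
897999.500000000000000
"""

def parse_world_file(text):
    """Return the six coefficients in file order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(token) for token in text.split())
    return a, d, b, e, c, f

def pixel_to_map(col, row, coeffs):
    """Map coordinates of the center of pixel (col, row); (0, 0) is upper left."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f   # e is negative: rows run downward, y runs upward
    return x, y

coeffs = parse_world_file(WORLD_FILE_TEXT)
upper_left = pixel_to_map(0, 0, coeffs)
moved = pixel_to_map(10, 20, coeffs)
```

With the slide's values, moving 10 columns right and 20 rows down adds 10 to x and subtracts 20 from y, since the cell size is 1 map unit and there is no rotation.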
GIS Data Formats
Common vector formats:
- Shapefile, Coverage, Geodatabase feature classes
- DXF, DWG - CAD-based
- MapInfo - MIF
- DLG
- TIGER, VPF
- ASCII, DBF
- SDTS - Spatial Data Transfer Standard
- SDC - Smart Data Compression
- XML, GML
Definitions
- A feature is a point, line, or polygon in a dataset that represents a real-world object
- A feature class is a collection of features, categorized by the type of geometry used to define them (i.e. how the coordinates are stored: as a point, line, or polygon)
  - polygon feature class, arc feature class, point feature class, etc.
  - should represent similar objects
Common ArcGIS Formats
- File-based data model (vector): Coverage, Shapefile
- DBMS-based data model, a.k.a. object data model (vector & raster): Geodatabase ("geographic database")
  - Personal, File
  - Spatial Database Engine (SDE)
GIS Data Formats - Shapefile
- Developed by ESRI (ArcView 2)
- Stored on disk in folders
- Consists of a set of files
  - always present:
    - .shp - spatial geometry
    - .shx - spatial geometry index
    - .dbf - dBASE file (feature attributes)
  - optional others (.prj, .sbn, .sbx, .ain, .aih, .aig, ...)
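Sketch (not from the lecture): since a shapefile is only usable when its three required files travel together, a quick check for missing components can be useful. The helper name is hypothetical, and the demonstration creates empty placeholder files in a temporary folder rather than real shapefile data.

```python
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # the components that are always present

def missing_components(shp_path):
    """Return the required extensions with no matching file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]

# Demonstration with empty placeholder files (not real shapefile contents).
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "WELLS_PT.shp"
    shp.touch()
    shp.with_suffix(".dbf").touch()
    before = missing_components(shp)   # the .shx index has not been created yet
    shp.with_suffix(".shx").touch()
    after = missing_components(shp)
```

The check matches extensions exactly as written, so on a case-sensitive file system a file named WELLS_PT.SHX would be reported as missing.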
GIS Data Formats - Shapefile
- Simpler than coverages; useful for mapmaking and some kinds of analysis
- Fast display (especially when local)
- Single feature class (geometry) per shapefile:
  - Point (points and multipoints), or
  - Line (simple lines and multipart polylines), or
  - Polygon (simple and multipart)
- No topology or annotation
- 10-character max. field names (dbf limitation)
- May be edited in ArcGIS and ArcView GIS 2.x+
- Open format (specs available); may be produced from other applications
GIS Data Formats - Coverage
- Developed by ESRI, c. 1981
- Traditional (Arc/Info) format for complex geoprocessing, high-quality geographic data, and sophisticated spatial analysis
- Stores features and attributes for thematically associated data
- Can explicitly store topology (features stored only once, vs. the spaghetti data model); use BUILD or CLEAN commands
GIS Data Formats - Coverage
- Stored on disk as a directory (folder) of files, with more files in an associated info directory
- Attributes in INFO format (tables)
- Coverage folder is stored in a workspace, a special name for a folder containing a coverage (or Grid or TIN)
[Figures: a workspace and its coverages as viewed in Windows Explorer and in ArcCatalog]
GIS Data Formats - Coverage
- Multiple feature classes can be grouped and stored in one coverage
  - Primary (label point, arc, polygon, node)
  - Secondary (tics, links, annotation)
  - Compound (routes/sections, regions; built from primary features), like multi-part features
- Edit in ArcInfo Workstation only
- You cannot have points and polygons in the same coverage
- Polygons can't have holes (because of the universal polygon, i.e. the background)
GIS Data Formats - Coverage
Attribute tables:
- <cover>.pat (point attribute table)
- <cover>.aat (arc attribute table)
- <cover>.nat (node attribute table)
- <cover>.rat<route> (route attribute table)
- <cover>.pat (polygon attribute table)
GIS Data Formats - Coverage
Explicit topology
- Connectivity (arc-node topology) - arcs connect to each other at nodes
GIS Data Formats - Coverage
Explicit topology
- Area definition (polygon-arc topology) - arcs that connect to surround an area define a polygon
GIS Data Formats - Coverage
Explicit topology
- Contiguity (adjacency) - arcs have direction and left and right sides
GIS Data Formats - Coverage
- Coverage attribute tables have "Sacred Items"
  - Point/Polygon: AREA, PERIMETER, <COVER>#, <COVER>-ID
  - Arc: <COVER>#, <COVER>-ID, FNODE#, TNODE#, LPOLY#, RPOLY#, LENGTH
- Topology between feature classes is managed with sacred items
  - Ex.: <cover># in .pat (polygon attribute table) relates to LPOLY# and RPOLY# in .aat (arc attribute table)
- <cover># = 1 in polygon coverages is the universal polygon (hidden in ArcGIS Desktop)
Data Format Conversion
- In ArcMap, right-click the layer in the Table of Contents, choose Data > Export Data, and select a format
- Workflow may dictate that data need to be in another format
Data Format Conversion
- Right-click layer(s) in ArcCatalog
Data Format Conversion
- Use ArcToolbox Conversion Tools
- ArcInfo license and installation of ArcInfo Workstation required for Coverage conversion tools
Distribution
- Process of moving data from one location to another
- Copy/paste in ArcCatalog if source and destination are both accessible; otherwise:
  - Coverage - export to an Arc/Info Export File (a.k.a. "interchange file") in ArcToolbox
    - ASCII file with .e00 extension
    - User then imports the file with ArcToolbox (ArcInfo)
  - Shapefile - send all components, or use WinZip, PKZIP, StuffIt, etc., to send all in one file
  - Geodatabase - export to XML, plus other options
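Sketch (not from the lecture): sending a shapefile "all in one file" can be scripted by zipping every file that shares its base name. The helper name is hypothetical, and the demonstration uses empty placeholder files in a temporary folder rather than real data.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_shapefile(shp_path, zip_path):
    """Bundle every file that shares the shapefile's base name into one archive."""
    shp = Path(shp_path)
    prefix = shp.stem + "."           # e.g. "WELLS_PT." matches .shp, .shx, .dbf, .prj
    members = sorted(
        p for p in shp.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix.lower() != ".zip"
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            archive.write(member, arcname=member.name)   # store without folder path
    return [m.name for m in members]

# Demonstration with empty placeholder files (not real shapefile contents).
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (Path(folder) / f"WELLS_PT{ext}").touch()
    (Path(folder) / "OTHER.shp").touch()                 # unrelated file, left out
    target = Path(folder) / "WELLS_PT.zip"
    bundled = zip_shapefile(Path(folder) / "WELLS_PT.shp", target)
    with zipfile.ZipFile(target) as archive:
        names_in_zip = sorted(archive.namelist())
```

Matching on the base name plus a dot also picks up optional sidecar files, so the recipient receives the projection and index files along with the three required components.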