Raster Data Structures

Tessellation of Geographical Space

Geographical space can be tessellated into sets of connected discrete units that completely cover a flat surface. The units can take any reasonable geometric shape, either regular or irregular. Regular tessellations include squares, rectangles, hexagons, and equilateral triangles.

Grid Cell/Square-based Raster Data Model

- Space is divided into discrete, uniform units (square cells): a cellular model of geometry.
- Squares have valuable properties: equality of sides, decomposability, and stability with respect to orientation and aggregation. Squares therefore dominate the regular tessellations used to represent spatial information.
- Location is inherent in the storage structure: it is implied by the row and column number of the grid cell rather than stored as explicit spatial coordinates.
- Because location is known only to the nearest cell, a raster data structure does not provide precise locational information.
- A two-dimensional array of grid cells is called a layer, a grid, or an image, depending on the context.
- Each layer of raster data usually represents one particular topic (theme). Examples are the multiple layers of a GIS database and multi-spectral (multi-band) data in image analysis.

Resolution and Volume of Raster Data

General rules:
- The cell must be small enough to capture the required detail.
- A larger cell size can be used for a more homogeneous region, while a smaller cell size is required for a heterogeneous region.
- Halving the cell size increases the data volume four times.
- The quantification precision (accuracy) of the cell values also influences the data volume.

Whittaker-Shannon Sampling Theorem: the cell size must be smaller than half the size of the smallest feature (minimum map unit) that you intend to represent. The commonly suggested cell size is 1/5 to 1/7 of the smallest feature to be captured.
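The rules above on cell size and data volume can be checked with a little arithmetic. The sketch below is only an illustration: the function names, the 10 km by 10 km extent, and the uncompressed one-value-per-cell storage are assumptions, not part of any particular GIS package.

```python
import math

def grid_shape(width_m, height_m, cell_size_m):
    """Rows and columns needed to cover an extent with square cells."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows, cols

def data_volume_bytes(width_m, height_m, cell_size_m, bytes_per_cell=1):
    """Uncompressed size of one layer (assumes one stored value per cell)."""
    rows, cols = grid_shape(width_m, height_m, cell_size_m)
    return rows * cols * bytes_per_cell

def suggested_cell_sizes(min_feature_m):
    """Sampling-rule upper bound (1/2) and the 1/5 to 1/7 rule of thumb."""
    return {
        "upper_bound": min_feature_m / 2,
        "one_fifth": min_feature_m / 5,
        "one_seventh": min_feature_m / 7,
    }

# Hypothetical 10 km x 10 km study area, one byte per cell.
coarse = data_volume_bytes(10_000, 10_000, 100)   # 100 x 100 cells
fine = data_volume_bytes(10_000, 10_000, 50)      # 200 x 200 cells
print(coarse, fine, fine / coarse)                # 10000 40000 4.0

# Two-byte cell values double the volume at the same resolution.
print(data_volume_bytes(10_000, 10_000, 100, bytes_per_cell=2))  # 20000

# Cell sizes for a smallest feature of 100 m.
print(suggested_cell_sizes(100))
```

Halving the cell size doubles both the row count and the column count, which is why the volume grows by a factor of four rather than two.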
Encoding Grid Cell Value

- Centroid method: each cell is assigned the value of the feature that passes through the center of the cell.
- Predominant type: each cell is assigned the value of the feature that fills the majority of the cell.
- Most important type: each cell is assigned the value of the feature that has been specified as more important to the study.
- Percentage breakdown: a cell is assigned several values according to the percentage of the cell that each feature occupies.

Raster Data Storage and Compression

Data Format
- Cell values are stored as a two-dimensional array.
- Data file type: ASCII vs. binary.
- Depth of cell values: integer vs. real (float); one byte vs. two bytes.
- In addition to the cell values, the number of rows and columns and the depth of each cell must be specified so that the raster data can be retrieved. General-purpose image files carry this information in header records, often together with a magic number that identifies the file format.
- For a geo-referenced raster file, the cell size, the x,y coordinates of the lower-left or upper-left cell, and the map projection are also recorded, either in an independent header file, in a metadata file, or in the header records.
- A false-color image file has an associated color look-up table. A true-color image consists of three bands (channels): red, green, and blue (RGB).

Header file
- Metadata: a set of summary information about the data ("data about data").

Cell-by-cell Array
- If every cell has a unique value, the information cannot be compressed. This is usually the case for floating-point continuous surface data.

Data Compression
- Invertible (lossless): the original data are reproduced exactly upon decompression.
- Lossy: some algorithms offer greater compression factors but cannot exactly reproduce the original data when decompressed.

Run-length Codes
- When adjacent cells in a row have the same value, the data are compacted by storing the value together with the length of its run.
- Run-length coding stores data row by row.

Quad-tree Model

Two-dimensional geographical space is systematically and recursively split (decomposed) into finer and finer units by a rule of four. The result is a hierarchical data structure termed a quad-tree. The entire region is successively divided into four quadrants, and the subdivision continues until every quadrant is homogeneous. A quadtree has variable resolution, because it can operate at any level of subdivision.
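The run-length codes described earlier in this section can be sketched in a few lines. This is a minimal illustration, assuming each row is a Python list and each run is stored as a (value, run length) pair; real raster formats lay out their runs differently, and the function names here are hypothetical.

```python
def rle_encode_row(row):
    """Encode one raster row as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            # Same value as the previous cell: extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Value changed (or first cell): start a new run.
            runs.append((value, 1))
    return runs

def rle_decode_row(runs):
    """Rebuild the original row from its (value, run_length) pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row

def rle_encode_grid(grid):
    """Run-length coding works row by row."""
    return [rle_encode_row(row) for row in grid]

grid = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 2, 2, 2, 2],
]
encoded = rle_encode_grid(grid)
print(encoded[0])  # [(0, 3), (1, 4), (0, 1)]
print(encoded[2])  # [(2, 8)]

# Decoding reproduces the grid exactly, so the scheme is lossless.
decoded = [rle_decode_row(runs) for runs in encoded]
print(decoded == grid)  # True
```

A row in which every cell differs from its neighbor, such as `[1, 2, 3]`, produces one run per cell, so run-length coding gains nothing on that kind of data.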

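The recursive subdivision of the quad-tree model can be sketched as follows. The sketch assumes a square grid whose side is a power of two, stores a homogeneous quadrant as a bare cell value, and stores a subdivided quadrant as a list of four children in NW, NE, SW, SE order. These representation choices and the function names are assumptions made for illustration, not how any particular system stores its quadtrees.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively split a 2^n x 2^n grid until each quadrant is homogeneous.

    Returns the cell value for a homogeneous block, otherwise a list of
    four subtrees in NW, NE, SW, SE order. Cell values must not be lists.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous:  # a single cell is always homogeneous, so recursion ends
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

def count_leaves(node):
    """Number of homogeneous blocks stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
print(tree)                # [1, 0, 0, [0, 1, 1, 1]]
print(count_leaves(tree))  # 7
```

Only the mixed south-east quadrant is split down to single cells, so the 16 cells are stored as 7 leaves. This is the variable resolution mentioned above: large homogeneous areas stay coarse while detailed areas are refined.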
The hierarchical structure of a quadtree can improve the speed of searches through the database. Quadtrees are used in SPANS (Spatial Analysis System) from Tydac.

Multi-layer/Multi-band Data Structure

- Multiple layers in a GIS database; multi-spectral, multi-band data in image analysis.
- In ArcInfo, individual grids can be grouped into a multi-layer data file called a stack. Many ArcInfo commands and functions are available for working with stacks (multi-layer raster data).
- Most remote sensing software packages can display and process multi-band imagery.

Sources of Raster Data

- Cell-by-cell entry
- Drum scanner
- Aerial photographs (CCD camera or scanned), scanned maps, pictures, and photos
- Digital Elevation Models (DEMs)
- Satellite images: LANDSAT, SPOT, AVHRR, ERS-1 SAR, JERS-1 SAR, Radarsat-1 SAR, IRS, MODIS, etc.

Comparisons between Vector and Raster Data Structures

Advantages and Limitations of Vector Model

Advantages of Vector Model
- Spatial objects are represented by precise x, y coordinates, so measurements of area, perimeter, and distance, as well as graphic representation, are more accurate and precise.
- The data structure is more compact and less redundant, and thus less demanding of data storage.
- Besides geometric properties, topological relationships between spatial objects can be explicitly encoded and stored.
- Supports a wide variety of advanced, topology-based analyses and is well suited to representing and modeling linear features and networks, for example address geocoding, path tracing, pavement management, bus routing, emergency response planning, pipeline planning, sales analysis, and wildlife management.
- Encoded topological relationships facilitate error checking in a vector database.
- Visual overlay analysis is easy: multiple vector layers can be overlaid together or draped on top of raster data.

Limitations of Vector Model
- Complex data structure, with time-intensive data acquisition and input.
- Computationally intensive and complicated for some spatial operations, such as overlay, calculation of area, and neighborhood analysis.
- Not suitable for representing a gradual change (transition zone) between adjacent units. Many physical characteristics, such as soil and vegetation types, vary gradually and have fuzzy borders.
- Not suitable for representing continuous surfaces such as terrain. Surface metric properties such as slope, aspect, and curvature cannot be easily calculated from a contour representation.
- Incompatible with digital image data. Manipulation and enhancement of remote sensing data are difficult in a vector-based GIS.

Advantages and Limitations of Raster Model

Advantages of Raster Model
- Simple and straightforward data structure: a matrix-like 2D array, the easiest format to handle in Fortran, C, and other programming languages.
- Supports not only discrete (categorical) objects but also continuous geographical features. Highly varying surfaces such as terrain can be represented effectively and efficiently in raster format.
- Computationally efficient for some types of quantitative analysis: map overlay, map algebra, and surface modeling and simulation, such as cut-fill analysis, visibility and siting, watershed modeling, slope and aspect calculation, and three-dimensional display.
- Compatible with remotely sensed and photogrammetric data. Traditional digital image processing techniques can be applied to cell-based raster data.
- Compatible with modern high-speed graphic input and output devices.

Disadvantages of Raster Model
- Cannot explicitly represent topological relations, and therefore does NOT support network-type analysis.
- Data redundancy in homogeneous areas, and a correspondingly large volume of data.
- Limited accuracy of location and of the corresponding area and distance measurements. Resolution and accuracy depend on the size of the grid cells.
- Graphic output is less aesthetically pleasing, because lines and boundaries take on a blocky, jagged, stair-case appearance rather than appearing as smooth lines.
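Two of the points above, that map overlay is computationally simple on rasters and that area measurements are limited by cell size, can be seen in a small sketch. The layer names, the 30 m cell size, and the helper functions are hypothetical, and plain nested lists stand in for a real grid format.

```python
def overlay(layer_a, layer_b, combine):
    """Cell-by-cell overlay of two layers that share the same grid."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have the same rows and columns")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]

def area_of(layer, value, cell_size_m):
    """Area covered by a value: cell count times the area of one cell."""
    cells = sum(row.count(value) for row in layer)
    return cells * cell_size_m ** 2

# Hypothetical binary layers: 1 = present, 0 = absent.
forest = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
steep = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
]

# Map algebra: cells that are both forested and steep.
both = overlay(forest, steep, lambda a, b: int(a == 1 and b == 1))
print(both)                  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(area_of(both, 1, 30))  # 1800 square metres with 30 m cells
```

The overlay needs no geometric intersection, only a pass over matching cells. The reported area is always a whole multiple of the cell area (900 square metres here), which is the limited measurement accuracy listed among the disadvantages.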
Integration of Vector and Raster Data

- Most full-featured GIS systems allow a mix of raster and vector data structures. ArcInfo fully supports both vector (point, line, polygon, region, and route coverages; TIN; CAD drawings) and raster (grids, lattices, images) data.
- By adopting a common map projection and scale, and adjusting coordinates, each data model can be georeferenced onto a common coordinate system, leading to a single consistent geographic database.
- Integration provides greater flexibility for analyzing and displaying data, and allows the optimum data model to be selected for representing a particular aspect of the Earth.
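For a raster layer, georeferencing onto a common coordinate system amounts to a mapping between row and column numbers and x, y coordinates. The sketch below assumes a north-up grid of square cells whose header gives the upper-left corner and the cell size; the function names and the coordinate values are hypothetical.

```python
import math

def cell_center(row, col, x_min, y_max, cell_size):
    """Map coordinates of the center of cell (row, col).

    Assumes row 0 is the northernmost row and (x_min, y_max) is the
    upper-left corner of the grid.
    """
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y

def cell_index(x, y, x_min, y_max, cell_size):
    """Row and column of the cell that contains the point (x, y)."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    return row, col

# Hypothetical header values for a grid with 30 m cells.
X_MIN, Y_MAX, CELL = 500_000.0, 4_200_000.0, 30.0

# A vector point placed on the raster grid.
row, col = cell_index(500_100.0, 4_199_950.0, X_MIN, Y_MAX, CELL)
print(row, col)  # 1 3

# Going back from the cell gives the cell center, not the original point.
print(cell_center(row, col, X_MIN, Y_MAX, CELL))  # (500105.0, 4199955.0)
```

The round trip moves the point by 5 m in each direction, because a cell can only report its own center. This is the same loss of locational precision that the raster model accepts in exchange for its simple structure.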

Conversion between Vector and Raster Data

- It is often necessary to be able to convert from one basic data structure to the other.
- Some information may be lost during conversion, and converted data will not be more accurate than the original data.
- Vector to raster (rasterization): rasterizing points, lines, or polygons is relatively easy.
- Raster to vector (vectorization): vectorizing scanned map sheets and classified map imagery into vector data is much more complicated and difficult.

Reading assignment: Longley, P.A., M.F. Goodchild, D.J. Maguire, and D.W. Rhind (2001). Geographic Information Systems and Science. John Wiley & Sons, 454 p.