Data storage and data structures. This is lecture 4





Main points in today's lecture: quantification; digital storage; structuring devices; data structures; and data models.

Quantification Information to data (discretization); data to data structures; data structures to data models; analysis. Start with the problem of continuous data and discretization. The process of discretization is a fundamental requirement of using GIS data.

Question What is discretization?

Problems with discretization Difficult/impossible to determine where boundaries should be drawn. Especially with respect to natural phenomena. Human-made entities such as buildings, roads, bridges, dams, and sports fields are easier to define using crisp boundaries.

Selection After discretization, need to select objects that will be included in the database. Selection is a necessary step in acquiring and storing data. Including all features would require an infinitely large database.

Context Context and goals of analysis determine what is included. If you are analyzing the spread of measles among pre-schoolers, then the ratio of conifers to deciduous trees in the area doesn't concern you.

After discretization and selection Georeferencing provides us with a method to encode discrete objects. Georeferencing systems include addresses and postal codes; administrative zones (town of Burnaby); grids and map sheets. Georeferencing involves quantification.

Quantification Once data is selected and referenced, need to quantify it in order to use it in a computer. Quantification uses numerical representation. Sometimes, it is easy to quantify. The width of a highway is a case of simple quantification. Likewise, determining the square mileage of a lake is straightforward.

Points about quantification A computer system stores unique or discrete values. These may or may not faithfully represent the continuum of values that exist in the real world. The nature of the data is important, as different types of mathematical operations can be performed on different data. Numerical values can be defined with respect to nominal, ordinal, interval or ratio scales of measurement.

Simple vs complex quantification Quantification may be simple, or require considerable abstraction. Example 1 (simple): The maximum height of a mountain can easily be included in a GIS. Example 2 (complex): There are many options for coding the characteristics of a forest in a GIS, including numeric codes for forest categories such as rainforest or woodland; the canopy closure expressed as a percentage; or a numeric code for the dominant species of tree in the forest.
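A minimal sketch of the forest example, to show what the three coding options could look like once quantified. The code numbers, field names, and the sample stand are all hypothetical; the lecture does not prescribe any particular coding scheme.

```python
# Hypothetical numeric codes for forest categories (nominal scale).
FOREST_CODES = {"rainforest": 1, "woodland": 2}

# One forest stand, quantified three ways (all values are illustrative).
stand = {
    "category_code": FOREST_CODES["woodland"],  # nominal code, not a quantity
    "canopy_closure_pct": 65.0,                 # ratio scale, 0 to 100
    "dominant_species_code": 12,                # hypothetical species code
}

# Reverse lookup: turn the stored code back into a readable category.
CATEGORY_NAMES = {code: name for name, code in FOREST_CODES.items()}
print(CATEGORY_NAMES[stand["category_code"]])
```

Note that arithmetic on the category code would be meaningless, whereas averaging canopy closure across stands is a sensible operation. This is the point about scales of measurement made earlier.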

Complex quantification Spatial science is riddled with complex instances of quantification. How do you quantify the zoning areas of the city of Vancouver? How do you encode areas with high, low or medium risk of avalanches? QUESTION: Which of these problems first requires discretization of continuous data?

Summary: to convert info to data 1. Select data. 2. Classify it (make categories). 3. Discretize. 4. Georeference. 5. Quantify.

Structuring digital data Once a numerical designation has been determined, we have to input the data in a way which is acceptable to the computer. Need to review how digital data is stored inside a computer.

Bits and bytes The basic unit of storage is a single binary digit, called a bit for short. A bit can only have two states: on or off. Eight bits make up a byte and groups of bytes make up words.

Bytes Bits are rarely seen alone in computers. They are almost always bundled together into 8-bit collections, and these collections are called bytes. The 8-bit byte is something that people settled on through trial and error over the past 50 years. It is as accidental as 12 eggs in a dozen. With 8 bits in a byte, you can represent 256 values ranging from 0 to 255, as shown here: 0 = 00000000, 1 = 00000001, 2 = 00000010, ..., 254 = 11111110, 255 = 11111111.
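The bit patterns on this slide can be reproduced with a short sketch. The helper name to_byte_string is made up for illustration; format() with the "08b" specification is standard Python.

```python
def to_byte_string(value):
    """Return the 8-bit binary pattern for an integer in the range 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("a single byte holds only the values 0..255")
    return format(value, "08b")  # zero-padded, 8 digits, base 2

for n in (0, 1, 2, 254, 255):
    print(n, "=", to_byte_string(n))

# Eight two-state bits give 2**8 distinct combinations.
print(2 ** 8)
```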

Words The number of bits that the computer uses as the basic unit to store data is called the word size. For example, the following sizes are commonly used: 16-bit (2-bytes) "personal computers" (previous generation) 32-bit (4-bytes) "personal computers" (current generation) 64-bit (8-bytes) mainframes

Save as .txt The establishment of ASCII as a standard revolutionized data transfer, as it allows us to use the same character coding between systems. ASCII stands for American Standard Code for Information Interchange. It assigns each letter and symbol on the keyboard a standardized numerical code. Note that when preparing files for exchange, you will often be asked to store them as ASCII or in .txt format. For plain English text these usually amount to the same thing.
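A small sketch of the idea that every character has a standardized numerical code. The sample string is arbitrary; ord(), chr() and str.encode() are built into Python.

```python
text = "GIS"

# Character to ASCII code.
codes = [ord(ch) for ch in text]
print(codes)

# ASCII code back to character.
restored = "".join(chr(code) for code in codes)
print(restored)

# Encoding as ASCII stores one byte per character.
as_bytes = text.encode("ascii")
print(len(as_bytes))
```

Because both systems agree on the code table, the receiving system can turn the stored numbers back into the same letters.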

Data storage Rely on structuring principles which are themselves based on computer architecture. Structuring principles include: arrays, matrices, lists, stacks, queues and deques, records, sets, trees, tables and networks.

Structuring devices Structuring devices are ways of storing information that directly conform to and reflect computer architecture. The lowest-order structuring devices are lists, stacks, arrays, queues and deques. Records, sets, trees, tables and networks are higher-order structuring devices, and are dependent on lower-order devices.

Lists Lists are a slightly lower level of structuring devices but closely related to arrays. A list or linear list is a dynamic data structure (meaning it can shrink or grow depending on how many items it includes). A list is quite literally a list and it usually contains like data such as integers or real numbers or text strings rather than a mix. However, this is not a strict rule. One characteristic of lists is that they are ordered.

Ordering of lists Each element or data item is in a specific order whether alphabetical or numerical or other. Lists can be implemented by using arrays. In such a case, the list is "held" by the array.

Stacks, queues, and deques Stacks, queues and deques are all instances of the linear list. They are transitory data structures, as elements leave the structure as soon as they are retrieved. In a stack, all the additions and deletions are made at one end, the top of the stack: last in, first out (LIFO). In a queue, input is at one end and output is at the other: first in, first out (FIFO). The much more flexible deque (double-ended queue) allows insertions and deletions at either end.
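The three access disciplines can be compared side by side in a minimal sketch. It uses a plain Python list as the stack and collections.deque for the queue and the deque; the items "a", "b", "c" are arbitrary.

```python
from collections import deque

items = ("a", "b", "c")

# Stack: add and remove at the same end (LIFO).
stack = []
for item in items:
    stack.append(item)
first_off_stack = stack.pop()       # the last item added comes off first

# Queue: add at one end, remove at the other (FIFO).
queue = deque()
for item in items:
    queue.append(item)
first_out_of_queue = queue.popleft()  # the first item added comes out first

# Deque: insert and delete at either end.
d = deque(["b"])
d.appendleft("a")
d.append("c")
left_end = d.popleft()
right_end = d.pop()

print(first_off_stack, first_out_of_queue, left_end, right_end)
```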

Registers The final step in memory is the registers. These are memory cells built right into the CPU that contain specific data needed by the CPU, particularly the arithmetic and logic unit (ALU). An integral part of the CPU itself, registers are controlled directly by the compiler that sends information for the CPU to process.

Relations of registers to lists etc. Lists and arrays may seem esoteric concepts but they refer directly to the computer architecture. If you think of the registers of a computer, lists and arrays directly address positions in the register. They constitute the base map for how information is stored in the computer. Their terminology must be precise because the computer meaning is computationally precise. Computers store data items in literal addresses. The entire system has a unique architecture.

Arrays An array is a structure which accommodates the inherent row and column nature of much data. It comprises a block of contiguous memory in the computer in which data elements are stored. It can have one or many dimensions, and programming languages allow the user to DIMension arrays. In BASIC, the syntax for dimensioning an array is: dim array_1(20), which translates to "make space for a one-dimensional array with 20 elements".

Matrices A matrix is like an array but it is not necessarily computer compatible. A matrix is a good way to imagine an array. Once a matrix is encoded in the computer, it becomes an array. A typical matrix looks like this:
 55  65  73  93
 34  98  23  87
225   9  12  65
 94 356   7 983
Question: How many dimensions does this matrix have?

Difference between arrays and matrices A matrix is a higher-level data structure (like an array), but one which could be expressed on paper. An array is, by contrast, a computer data structure. Arrays specify how the table information is stored and accessed by the computer, while a matrix is just a table of numbers.
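One way to picture a matrix becoming an array is to store its values in a single contiguous sequence and compute each position from a row and a column. The sketch below uses the sixteen values from the matrix slide. The 4-column layout and the row-by-row (row-major) storage order are assumptions made for illustration, and the function name element is hypothetical.

```python
# The sixteen matrix values, stored one after another as a flat array.
values = [55, 65, 73, 93, 34, 98, 23, 87, 225, 9, 12, 65, 94, 356, 7, 983]

N_COLS = 4  # assumption: the matrix is 4 rows by 4 columns


def element(flat, n_cols, row, col):
    """Look up (row, col), counting from 0, in a row-major flat array."""
    return flat[row * n_cols + col]


print(element(values, N_COLS, 0, 0))  # top-left value
print(element(values, N_COLS, 2, 0))  # first value of the third row
print(element(values, N_COLS, 3, 3))  # bottom-right value
```

The matrix is the table you would draw on paper; the flat list plus the index rule is the array the computer actually stores.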

Records (of a database) A record is a common organizing concept for grouping data items together. Records are organized in arrays. If you think of the rows in ArcGIS datatables, each row constitutes one record. In precise computer terminology, a record is a "linear sequence of variable items which have a collective identity" (Bracken and Webster, 1990, 159). In many computing environments, records constitute a built-in data structure.
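A record can be sketched as a fixed group of named items with a collective identity, and a table as an array of such records. The ParcelRecord type, its fields, and the sample rows are all hypothetical; they only stand in for a row of a GIS attribute table.

```python
from dataclasses import dataclass


@dataclass
class ParcelRecord:
    """One row of a hypothetical attribute table."""
    parcel_id: int
    land_use: str
    area_ha: float


# Records organized in an array (here, a Python list): one element per row.
table = [
    ParcelRecord(1, "residential", 0.8),
    ParcelRecord(2, "commercial", 1.5),
    ParcelRecord(3, "park", 4.2),
]

second_row = table[1]       # position in the array selects the record
print(second_row.land_use)  # the field name selects the item within it
```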

What is a database? A database is a collection of persistent data which is formally defined and centrally controlled for use in a computer.

Advantages to storing data in a database There are several advantages to using databases to store information: data is easily shared; data in a database is permanent and usually remains in the database for long periods; data is easily accessible through search, intersect and overlap functions; and databases can easily be used by the computer.

Data Structures The flat file data structure is just a simple list. The indexed file data structure finds objects based on their attributes. The conventional data structures in GIS are the relational, network and hierarchical. Relational data structures are organized by records which resemble tables. Hierarchical data structures are based on the tree structure with parent-child relationships. Network data structures are classified according to record types, with pointers linking associated records. Increasingly, the object-oriented data structure is emerging as an alternative in GIS.

Simple data structures Since the days of early computers, computer scientists have evolved more sophisticated ways to keep track of the 0s and 1s that represent information in the computer. The simplest way to order information in the computer is to put it in files, like a Rolodex file on a desk.

Flat file data structure (simple lists) The simplest data structure is a simple list of all items. Each new item is inserted at the end of the list in no particular order. It is easy to add data, but hard to retrieve it. Looking for something in an unstructured list is like looking for a needle in a haystack, especially when the list is large.

Indexed data structure Sorting was difficult using the flat-file data structure (e.g. bubble sorts). Indexed data structures allow a search on the attributes of an entity rather than for the entity itself. For example, we might search for all census districts containing high-income populations, or look for postal codes containing people with the attribute "home owner". The attributes act like an index in a book; they point to the real thing.

March of progress Computer science continued to develop more sophisticated data structures. Today there are three main data structures used in GIS: 1. hierarchical data structures; 2. network systems; and 3. relational database structures.

Hierarchical data structures Hierarchical data structures are a familiar concept in that they use a family-tree type structure to organize data. This will also be familiar to those of you who used DOS on PCs before Windows came out. There was a root directory, with sub-directories and then files within those. The hierarchical data structure is basically a tree structure with parent-child relationships. It is also the basis of biological taxonomy, with species, genus, phylum etc.
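The difference between scanning a flat file and using an index can be sketched as follows. The district names and income classes are invented sample data, and the function name scan is hypothetical.

```python
from collections import defaultdict

# Flat file: items appended in no particular order.
districts = [("D1", "high"), ("D2", "low"), ("D3", "high"), ("D4", "medium")]


def scan(flat_file, income):
    """Flat-file search: inspect every item, one after another."""
    return [name for name, level in flat_file if level == income]


# Index: built once, it maps each attribute value to the matching items.
index = defaultdict(list)
for name, level in districts:
    index[level].append(name)

print(scan(districts, "high"))  # walks the whole list on every query
print(index["high"])            # jumps straight to the matching entries
```

Both give the same answer; the index simply avoids re-reading the whole list for each query, at the cost of storing and maintaining the extra lookup structure.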

Trees Trees represent data relationships that are hierarchical. For example, if a database stores data related to a genus, then at the top or root, we might have the genus, followed by species nodes, followed in turn by subspecies.

Trees in the database concept Example of a tree: different levels of government with federal at the root, followed by state, followed in turn by county and municipal governments. The terminal nodes are called leaves, while the points that connect one level to the next are called nodes. Each of these structures and principles is involved to some extent in the database concept.
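The government example can be sketched as a nested structure in which each parent holds its children. The place names are invented, and the helper names leaves and path_to are hypothetical.

```python
# Each key is a node; its value holds that node's children.
tree = {
    "federal": {
        "state A": {
            "county 1": {"town X": {}, "town Y": {}},
            "county 2": {},
        },
        "state B": {},
    }
}


def leaves(children):
    """Collect every node that has no children of its own."""
    found = []
    for name, sub in children.items():
        if sub:
            found.extend(leaves(sub))
        else:
            found.append(name)
    return found


def path_to(children, target, path=()):
    """Return the chain of parents from the root down to target, or None."""
    for name, sub in children.items():
        here = path + (name,)
        if name == target:
            return here
        deeper = path_to(sub, target, here)
        if deeper:
            return deeper
    return None


print(leaves(tree))
print(path_to(tree, "town Y"))
```

Finding "town Y" means walking down from the root through every level above it, which hints at why navigating a hierarchy can be cumbersome.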

Navigating hierarchies The problem with the HDS is that it is cumbersome to navigate up and down it when looking for information.

Problems with hierarchies Its rigid structure makes it less than perfect for GIS. Your textbook offers an example of how the HDS's inflexible format makes it difficult to use at times. The director of the Royal Botanical Gardens in London wanted to query the botanical database before a trip to Mexico to find all plants that were native to the area he was going to visit.

But despite the entire inventory having been computerized using an HDS, the geographical location of the plants had not been included in the database, so it was impossible to select those indigenous to Mexico. It is also difficult to add new detail later (i.e. to update).

Many to many relationships HDS uses one to many relationships. In GIS, we often have many to one or many to many relationships. For instance, if we have an urban database, one polygon might have many point locations (intersections, for instance) with several types of convenience stores and gas stations associated with them. But the same brand of convenience stores and gas stations will be located in other polygons.

Network data structures In a network database structure, entities have pointers which point to related entities. So any piece of data can point to any other piece of data in the database. The pointers indicate relationships between data. This is a much less rigid system than HDS. It is used a lot in transportation databases where the really important relationships are between routes and nodes.
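A minimal sketch of pointer-style links between routes and nodes. The route and node names are invented; each route explicitly lists the nodes it points to, and the reverse pointers are built so that a node can also point back to its routes.

```python
from collections import defaultdict

# Forward pointers: each route points to the nodes it passes through.
nodes_on = {
    "route 1": ["node A", "node B"],
    "route 7": ["node B", "node C"],
}

# Reverse pointers: each node points back to every route that uses it.
routes_at = defaultdict(list)
for route, nodes in nodes_on.items():
    for node in nodes:
        routes_at[node].append(route)

print(routes_at["node B"])  # one node linked to many routes

# Every relationship is stored explicitly, once in each direction.
pointer_count = sum(len(v) for v in nodes_on.values()) + sum(
    len(v) for v in routes_at.values()
)
print(pointer_count)
```

Unlike the tree, a node here can belong to several parents, so many-to-many relationships are expressible. The pointer count grows with every relationship added.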

Problems with networks A drawback, however, is that the number of pointers (and relationships) can get out of hand and require too much storage space. Each relationship needs to be explicitly defined with the use of pointers. These numerous relationships can become a tangled web and lead to incorrect linkages and general confusion. The network database structure is only appropriate for certain types of GIS.

Relational data structures The relationships between data tables are based on primary keys. This is the most common database structure in GIS.

Benefits of relational systems Relational systems are useful because: (i) they're simple; (ii) most accounting and other non-spatial databases are relational, so it is easy to transfer such data to GIS; (iii) there is a well-established query language developed for relational database management systems (RDBMS), called SQL (Structured Query Language).
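A small sketch of two tables related through a primary key and queried with SQL, using Python's built-in sqlite3 module and an in-memory database. The table names, column names, and rows are hypothetical, loosely following the earlier example of polygons and convenience stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each table has a primary key; store rows refer to a polygon by its key.
conn.execute(
    "CREATE TABLE polygon (polygon_id INTEGER PRIMARY KEY, zoning TEXT)"
)
conn.execute(
    "CREATE TABLE store ("
    "store_id INTEGER PRIMARY KEY, brand TEXT, "
    "polygon_id INTEGER REFERENCES polygon(polygon_id))"
)

conn.executemany(
    "INSERT INTO polygon VALUES (?, ?)",
    [(1, "commercial"), (2, "residential")],
)
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [(10, "BrandX", 1), (11, "BrandY", 1), (12, "BrandX", 2)],
)

# Join the tables on the key to find where one brand is located.
rows = conn.execute(
    "SELECT store.brand, polygon.zoning "
    "FROM store JOIN polygon ON store.polygon_id = polygon.polygon_id "
    "WHERE store.brand = ? ORDER BY polygon.polygon_id",
    ("BrandX",),
).fetchall()
print(rows)
conn.close()
```

The same brand appears in several polygons and one polygon holds several stores, yet no pointers are stored: the relationship is recovered at query time by matching key values.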

Next week Data models and the resurrection of the HDS for portraying objects.