Relational Databases and SQL
- Basic Concepts
- The 3rd normal form
- Structured Query Language (SQL)
Graça Abrantes

The process of database development
1. Reality (Universe of Discourse)
2. Conceptual model (e.g. Entity-Relationship model)
3. Logical model (e.g. Relational model)
4. Implementation

Logical model: relational DBMS
Relational DBMS are based on a set of theoretical concepts introduced in 1970 by E. F. Codd.
Advantages of relational DBMS:
- simplicity of concepts
- the concepts have formal definitions
- quick to learn
- generally adopted by different software development companies
- they adequately support the representation of most characteristics of reality
  o entities, objects, phenomena
  o relationships among them

Relation
- Relation is the basic concept of relational databases
- A relation is defined by
  o one schema and
  o one table
- A schema is defined by:
  o the relation name
  o the name of each attribute
  o the datatype of each attribute
  o ...
- A relational database is a set of relations
  o the database schema is the set of the relation schemas of all database tables
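The split between schema and table can be made concrete with a short sketch. It uses Python's standard `sqlite3` module and a Camping relation; the datatypes and the two sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: relation name, attribute names and (assumed) datatypes.
conn.execute(
    """
    CREATE TABLE Camping (
        code     INTEGER,
        name     TEXT,
        capacity INTEGER,
        owner    TEXT
    )
    """
)

# Table: the instances currently stored under that schema (hypothetical rows).
conn.executemany(
    "INSERT INTO Camping VALUES (?, ?, ?, ?)",
    [(1, "Pine Park", 120, "Municipality"), (2, "River Camp", 80, "Private")],
)

# PRAGMA table_info yields one row per attribute:
# (cid, name, type, notnull, dflt_value, pk)
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Camping)")]
rows = conn.execute("SELECT * FROM Camping ORDER BY code").fetchall()
```

Here `schema` describes the relation independently of its content, while `rows` changes whenever instances are inserted or deleted.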
Table and schema - examples
Table example:
[Figure: example table]
Relation schema:
  soil (FID, Shape, AREA, PERIMETER, CODSOLO, NOME, SUBNOME, ESPECIFI)
Other examples of relation schema:
  Camping (code, name, capacity, owner)
  River (FID, Shape, length, code, name, type)

Attributes
- An attribute defines a property of an object, entity or phenomenon
- An attribute $A_i$ takes values in a set $D_i$, called the attribute domain
  o the domain specifies the set of values that the attribute can take
- Given $U = \{A_1, A_2, \ldots, A_n\}$, a relation $R$ over $U$ is a subset of the cartesian product $D_1 \times D_2 \times \cdots \times D_n$.
- Every tuple in this subset is called an instance of relation $R$.

Table
The set of instances of relation R is a table in which
- the instances are also called rows or records
  o or geographic objects in GIS
- the attributes are also called columns or fields

Remarks
- all the values of one attribute belong to the same domain;
- each attribute value must be atomic;
- a relation cannot contain two equal instances;
- the order of the instances in a table is meaningless;
- instances may have some attributes without a value; in this case the attribute is called optional;
- when an optional attribute has no value for a given instance, we say that its value is null;
- the names (or identifiers) of the attributes belonging to the same relation schema must be unique within this schema
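The definition of a relation as a subset of a cartesian product of domains can be illustrated with plain Python sets. The two domains below, and the value "minor" in particular, are hypothetical and chosen only to keep the product small.

```python
from itertools import product

# Hypothetical attribute domains (illustrative values only).
D_name = {"Tejo", "Mondego"}
D_type = {"major", "minor"}

# The cartesian product holds every possible (name, type) combination.
cartesian = set(product(D_name, D_type))

# A relation is a subset of that product: only the tuples that actually occur.
river = {("Tejo", "major"), ("Mondego", "major")}

# True when every instance takes its values from the attribute domains.
is_relation = river <= cartesian
```

Modelling the relation as a set also mirrors two of the remarks above: a set cannot hold two equal instances, and its elements have no meaningful order.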
Key(s) of a relation
- A set of attributes that takes different values on every instance is a key of the relation
  o the value of a primary key uniquely specifies an instance within a table
  o a composite key is a key made up of two or more attributes
- Within a relation, the primary key is a minimal subset of attributes by which every instance is uniquely specified
  o natural attributes are sometimes good primary keys
  o often, an artificial attribute is assigned to an object in order to identify it uniquely; keys of this kind have no intrinsic meaning, but they are useful to uniquely identify every instance of the relation
  o for instance, in a table of data about students at a school, each student might be assigned a student ID
- one table may have several candidate keys, but it must have one and only one primary key

Primary keys and Foreign Keys
In relational DBMS, the relationships among entities are represented in tables using common attributes.
A foreign key is an attribute (or a set of attributes) in a relation that matches the primary key of another relation.
Example:
[Figure: tables soil and soildiss$, with labels "Primary key of table soil", "Foreign key in table soil" and "Primary key of table soildiss$"]

Relational model: 1st normal form
The value of each attribute is atomic.
Example: instead of

  name     type   border
  Tejo     major  maritime, land
  Mondego  major  maritime

use the 1NF

  name     type   maritime border  land border
  Tejo     major  Yes              Yes
  Mondego  major  Yes              No

Relational model: Functional dependency
Given a relation R, an attribute $A_i$ in R is said to functionally determine another attribute $A_j$, also in R (written $A_i \rightarrow A_j$), if and only if each $A_i$ value is associated with precisely one $A_j$ value.
[Figure: functional dependency]
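The definition of functional dependency can be checked mechanically on a concrete table. The sketch below is a minimal illustration; the helper name `determines` and the sample `parcels` data are hypothetical and do not come from the tables shown in the slides.

```python
def determines(rows, a, b):
    """Return True if attribute a functionally determines attribute b in rows."""
    seen = {}
    for row in rows:
        # setdefault stores the first b value met for this a value and returns it
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False  # the same a value is paired with two different b values
    return True


# Hypothetical sample data.
parcels = [
    {"parcel": 1, "soil_code": 10, "soil_name": "Cambisol"},
    {"parcel": 2, "soil_code": 20, "soil_name": "Luvisol"},
    {"parcel": 3, "soil_code": 10, "soil_name": "Cambisol"},
]

code_to_name = determines(parcels, "soil_code", "soil_name")
name_to_parcel = determines(parcels, "soil_name", "parcel")
```

A check of this kind can only refute a dependency on the data at hand. Whether a dependency holds in general is a property of the reality being modelled, not of one sample.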
Relational model: 2nd normal form
- A table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table.
  o a non-prime attribute of a table is an attribute that is not a part of any candidate key of the table
- Equivalently, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.
- When a 1NF table has no composite candidate keys, the table is automatically in 2NF
  o this is the general case for GIS tables

Relational model: 3rd normal form
- A table is in 3NF if and only if it is in 2NF and no non-prime attribute is dependent on any other non-prime attribute.
- For instance, table conc_1998 is not in 3NF
[Figure: table conc_1998, with a functional dependency highlighted]

Normalization
- A table is in 3NF if every non-key attribute provides a fact about the key, the whole key, and nothing but the key.
- Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.
- Database normalization is the process of organizing the attributes and tables of a relational database to minimize redundancy and dependency
  o the objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships

Example: 2NF to 3NF
Given the following table, which satisfies 2NF, reaching 3NF requires replacing it with 2 tables:
[Figure: the 2NF table and the two tables that replace it]
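As a rough sketch of such a decomposition, the following uses Python's standard `sqlite3` module. The tables `parcel_flat`, `parcel` and `soil_type`, and all their values, are hypothetical; they are not the tables shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- 2NF but not 3NF: soil_name depends on soil_code, a non-prime attribute.
    CREATE TABLE parcel_flat (
        parcel_id INTEGER PRIMARY KEY,
        soil_code INTEGER,
        soil_name TEXT
    );
    INSERT INTO parcel_flat VALUES
        (1, 10, 'Cambisol'), (2, 20, 'Luvisol'), (3, 10, 'Cambisol');

    -- 3NF decomposition: each soil name is stored once per soil code.
    CREATE TABLE soil_type (soil_code INTEGER PRIMARY KEY, soil_name TEXT);
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        soil_code INTEGER REFERENCES soil_type(soil_code)
    );
    INSERT INTO soil_type SELECT DISTINCT soil_code, soil_name FROM parcel_flat;
    INSERT INTO parcel SELECT parcel_id, soil_code FROM parcel_flat;

    -- A rename now touches a single row.
    UPDATE soil_type SET soil_name = 'Eutric Cambisol' WHERE soil_code = 10;
    """
)

soil_rows = conn.execute("SELECT COUNT(*) FROM soil_type").fetchone()[0]
joined = conn.execute(
    """
    SELECT parcel.parcel_id, soil_type.soil_name
    FROM parcel, soil_type
    WHERE parcel.soil_code = soil_type.soil_code
    ORDER BY parcel.parcel_id
    """
).fetchall()
```

The single `UPDATE` reaches every parcel with soil code 10 through the join, which is the "modify in just one table" objective stated above.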
Example:
Suppose that you have to design a database schema for remarkable trees, to be stored as vector data in a GIS. For each tree it is required to record its common name, scientific name, family, height and trunk diameter. In the study area there are around one thousand remarkable trees, and several species are represented by more than one tree.
What relation schema best fits these requirements?

Structured Query Language (SQL)
SQL is a standard programming language (ANSI, 1986) for managing data held in a relational DBMS:
- data insert, query, update and delete
- schema creation and modification
- data access control

The SELECT statement
- The most common operation in SQL is the query, which is performed with the SELECT statement.
- SELECT retrieves data from one (or more) tables.
- Standard SELECT statements have no persistent effects on the database
  o the SELECT statement can have persistent effects when an output table is also specified to store its result
- The SELECT statement retrieves a subset of a given table according to a given logical condition
  o a logical condition is an expression that returns either TRUE or FALSE
  o the SELECT statement retrieves the instances for which the given logical condition returns TRUE

Select By Attributes (ArcMap)
[Figure: ArcMap screenshot]
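The difference between a plain SELECT and a SELECT whose result is stored in an output table can be sketched with Python's standard `sqlite3` module. The `CREATE TABLE ... AS SELECT` form is SQLite's syntax for naming an output table; the contents of `soil` and the name `soil_high` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE soil (FID INTEGER, CODE INTEGER)")
conn.executemany("INSERT INTO soil VALUES (?, ?)", [(0, 300), (1, 650), (2, 900)])


def table_names():
    """List the tables currently stored in the database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


# A plain SELECT only reads: it returns the rows for which the condition is TRUE.
selected = conn.execute("SELECT FID FROM soil WHERE CODE >= 800").fetchall()
tables_before = table_names()

# Naming an output table makes the result persistent.
conn.execute("CREATE TABLE soil_high AS SELECT * FROM soil WHERE CODE >= 800")
tables_after = table_names()
```

After the first query the database still holds only `soil`; after the second it also holds `soil_high`.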
The WHERE clause
  SELECT attribute1, attribute2, ... (or *)
  FROM table1, table2, ...
  WHERE condition;
- this statement retrieves the instances that satisfy the condition
- elementary conditions are built using relational operators (<, <=, >, >=, =, <>) and other operators (IN, LIKE, BETWEEN)
- conditions may also be built using elementary conditions and logical operators (NOT, AND, OR).

The WHERE clause (examples)
  select * from soil where CODE >= 800
  select * from soil where CODE > 400 and CODE <= 700
  select * from soil where not (CODE > 400 and CODE <= 700)
  select * from soil where CODE <= 400 or CODE > 700 (1)
  select * from NUTSII where NAME like 'A%'
  select * from rivers where "TYPE" = 'major' or "description" in ('land border', 'maritime border')
  select * from NUTSII where name =
(1) Remark: this statement is equivalent to the previous statement

The WHERE clause in GIS
In GIS the condition of a SELECT statement may also use spatial operators such as intersect, are within a distance of, contain, are within, touch the boundary of, ...

Joining tables
- The FROM clause specifies the names of the tables containing the records to select.
- When the FROM clause refers to more than one table, the SELECT statement computes the cartesian product (joining) of the referred tables
  o each tuple of the cartesian product is composed of one row from each of these tables.
- The WHERE clause of a SELECT statement that joins tables is used to restrict the result to a subset of the cartesian product: those tuples in which a foreign key value is equal to a primary key value.
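The equivalence noted in remark (1) can be checked on a small table. The sketch below uses Python's standard `sqlite3` module; the rows of `soil` are hypothetical boundary values, and the helper `fids` builds the query text directly, which is acceptable here only because the conditions are fixed literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE soil (FID INTEGER, CODE INTEGER)")
conn.executemany(
    "INSERT INTO soil VALUES (?, ?)",
    [(0, 300), (1, 400), (2, 401), (3, 700), (4, 701), (5, 900)],
)


def fids(condition):
    """Return the FIDs of the soil rows for which the condition holds."""
    query = f"SELECT FID FROM soil WHERE {condition} ORDER BY FID"
    return [row[0] for row in conn.execute(query)]


inside = fids("CODE > 400 AND CODE <= 700")
negated = fids("NOT (CODE > 400 AND CODE <= 700)")
rewritten = fids("CODE <= 400 OR CODE > 700")
# BETWEEN is inclusive at both ends, so for integer codes it matches `inside`.
between = fids("CODE BETWEEN 401 AND 700")
```

`negated` and `rewritten` select the same rows because negating a conjunction gives the disjunction of the negated conditions.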
Example (joining tables):
  SELECT * FROM NUTS_1998, AddedValue WHERE DTCC = code;
where code is the primary key of table AddedValue and DTCC is a foreign key of table NUTS_1998

Another example:
Suppose that you want to create a GIS vector theme concerning agricultural crops in a given region. For each land parcel, it is required to record (i) the species common name, (ii) the species scientific name, (iii) the average yield of each species in the region, (iv) the seeding or planting dates and (v) the parcel area. The study area is a region with a high level of parcel disaggregation.
1. What GIS data structure best suits these requirements?
2. Design a suitable database schema.
3. Explain the main advantages of your schema.
4. Suppose that it is necessary to get information concerning the first seeding or planting date of each species in the region. How would you do it?
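Returning to the join of NUTS_1998 and AddedValue, the sketch below contrasts the full cartesian product with the rows kept by the condition DTCC = code. It uses Python's standard `sqlite3` module; only the names NUTS_1998, AddedValue, DTCC and code come from the example, while the columns `name` and `amount` and all values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE AddedValue (code INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE NUTS_1998 (
        name TEXT,
        DTCC INTEGER REFERENCES AddedValue(code)
    );
    INSERT INTO AddedValue VALUES (101, 12.5), (102, 7.0);
    INSERT INTO NUTS_1998 VALUES ('A', 101), ('B', 102), ('C', 101);
    """
)

# Without WHERE: every row of NUTS_1998 paired with every row of AddedValue.
product_rows = conn.execute("SELECT * FROM NUTS_1998, AddedValue").fetchall()

# With WHERE: only the pairs whose foreign key equals the primary key.
joined_rows = conn.execute(
    "SELECT * FROM NUTS_1998, AddedValue WHERE DTCC = code ORDER BY name"
).fetchall()
```

With three rows in one table and two in the other, the product has six tuples, of which the join condition keeps one per row of NUTS_1998.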