AS LEVEL Computer Application Databases

Similar documents

IT2305 Database Systems I (Compulsory)

Unit 2.1. Data Analysis 1 - V Data Analysis 1. Dr Gordon Russell, Napier University

IT2304: Database Systems 1 (DBS 1)

Data Analysis 1. SET08104 Database Systems. Napier University

Lesson 8: Introduction to Databases E-R Data Modeling

Modern Systems Analysis and Design

Foundations of Information Management

2. Conceptual Modeling using the Entity-Relationship Model

Data Modeling Basics

Database Design Overview. Conceptual Design ER Model. Entities and Entity Sets. Entity Set Representation. Keys

A brief overview of developing a conceptual data model as the first step in creating a relational database.

Tutorial on Relational Database Design

Database Design. Marta Jakubowska-Sobczak IT/ADC based on slides prepared by Paula Figueiredo, IT/DB

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

The Entity-Relationship Model

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

Relational Database Basics Review

DATABASE INTRODUCTION

2. Basic Relational Data Model

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

DATABASE MANAGEMENT SYSTEMS. Question Bank:

Foundations of Information Management

Designing Databases. Introduction

1 File Processing Systems

Database Design. Adrienne Watt. Port Moody

Entity-Relationship Model

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

Fundamentals of Database Design

- Suresh Khanal. Microsoft Excel Short Questions and Answers 1

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Introduction to Computing. Lectured by: Dr. Pham Tran Vu

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

SAMPLE FINAL EXAMINATION SPRING SESSION 2015

Chapter 1. Database Systems. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

Chapter 2. Data Model. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

Data Modeling: Part 1. Entity Relationship (ER) Model

How To Create A Table In Sql (Ahem)

TIM 50 - Business Information Systems

INFO Koffka Khan. Tutorial 6

Introduction to Databases

Chapter 2: Entity-Relationship Model

Foundations of Business Intelligence: Databases and Information Management

Once the schema has been designed, it can be implemented in the RDBMS.

Foundations of Business Intelligence: Databases and Information Management

The structure of accounting systems: how to store accounts Lincoln Stoller, Ph.D.

Conceptual Design Using the Entity-Relationship (ER) Model

THE OPEN UNIVERSITY OF TANZANIA FACULTY OF SCIENCE TECHNOLOGY AND ENVIRONMENTAL STUDIES BACHELOR OF SIENCE IN INFORMATION AND COMMUNICATION TECHNOLOGY

Study Notes for DB Design and Management Exam 1 (Chapters 1-2-3) record A collection of related (logically connected) fields.

Chapter 2: Entity-Relationship Model. Entity Sets. " Example: specific person, company, event, plant

SQL Server Table Design - Best Practices

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Outline. Data Modeling. Conceptual Design. ER Model Basics: Entities. ER Model Basics: Relationships. Ternary Relationships. Yanlei Diao UMass Amherst

Databases Model the Real World. The Entity- Relationship Model. Conceptual Design. Steps in Database Design. ER Model Basics. ER Model Basics (Contd.

Exercise 1: Relational Model

Chapter 2: Entity-Relationship Model. E-R R Diagrams

Designing a Database Schema

not necessarily strictly sequential feedback loops exist, i.e. may need to revisit earlier stages during a later stage

THE ENTITY- RELATIONSHIP (ER) MODEL CHAPTER 7 (6/E) CHAPTER 3 (5/E)

1. INTRODUCTION TO RDBMS

Database Design Methodology

Relational Database Concepts

Why & How: Business Data Modelling. It should be a requirement of the job that business analysts document process AND data requirements

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

We know how to query a database using SQL. A set of tables and their schemas are given Data are properly loaded

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

ICAB4136B Use structured query language to create database structures and manipulate data

Literature for the 1 st part

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Lecture 6. SQL, Logical DB Design

Data Hierarchy. Traditional File based Approach. Hierarchy of Data for a Computer-Based File

There are five fields or columns, with names and types as shown above.

Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model

How To Write A Diagram

LOGICAL DATABASE DESIGN

Lecture Notes INFORMATION RESOURCES

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Introduction to Database Systems

Databases and BigData

3. Relational Model and Relational Algebra

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

Information Systems Modelling Information Systems I: Databases

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 1: Introduction

MySQL for Beginners Ed 3

CSC 742 Database Management Systems

Basic Concepts of Database Systems

7. Databases and Database Management Systems

Foundations of Business Intelligence: Databases and Information Management

Answers to Review Questions

Chapter 3. Data Modeling Using the Entity-Relationship (ER) Model

Concepts of Database Management Seventh Edition. Chapter 6 Database Design 2: Design Method

Conceptual Design: Entity Relationship Models. Objectives. Overview

Transcription:

AS LEVEL Computer Application Databases YLLSS

In the syllabus, we have Applications of databases in society Students should be aware of the uses and applications of databases in everyday life (e.g. the library system, inventory system in a supermarket, credit card system, etc.). Students should be given opportunities to discuss the importance of databases in business environments and how they are related to the success of a business. Concepts and terminology Students should understand the following terminology and concepts: data and information data, fields, records, tables, files and databases common data types such as integer, real, character, string, boolean, date, etc. indexes and keys database management systems (e.g. data definition language, data manipulation language, data dictionary, transaction processing and access control, etc.) program-data independence data redundancy and data integrity Basic concepts of a relational database Students should know the basic concepts underpinning relational databases such as entity, relation, attribute, domain, primary key, foreign key, candidate key, entity integrity, referential integrity, domain integrity, etc. Students should be able to identify these basic elements in examples taken from everyday applications. Students should know how to organise data differently but sensibly in a relational database and be able to establish the required relationships to link up the tables. Creating a relational database Students should be able to create a simple relational database2 based on specified requirements using SQL. Database maintenance and manipulation Students should be able to use SQL to maintain a simple relational database, manipulate its data or retrieve the required information. They should be able to: modify the structure of the tables add, delete and modify the data in the tables view, sort and select the contents by filtering 1

use appropriate operators and expressions such as the in, between and like operators, arithmetic operators and expressions, comparison operators and logical operators etc. to perform specific operations use simple built-in functions such as aggregate and string functions, etc. perform multiple field indexing and multi-level ordering perform queries on multiple tables including the use of equi-join, natural join and outer join perform sub-queries (for 1 sub-level only) export query results to, for example, text, html or spreadsheet format, etc. The conceptual data model Students should understand the importance of good database design in effective database management. They should be aware of the three levels of data abstraction; namely conceptual level, physical level and view level. Entity-Relationship modeling Students should be aware of the three types of relationship (one-to-one, one-to-many, many-to-many) among entities in a relational database. Students should be able to create simple entity-relationship (ER) diagrams involving binary relationships only in designing databases for simple business scenarios. This includes the resolution of many-to-many relationships into multiple one-to-many relationships in order to implement the database. Students should be able to transform the ER diagrams to tables in relational databases and be able to create a database schema for a given set of data to describe the characteristics of the database. Introduction to Normalisation Students should be able to briefly explain the meaning and purpose of normalisation. They should be aware of the methods or measures used to reduce data redundancy 2

Introduction to Databases Data vs. Information Numbers, text, images or any recording in a form that is accessible to human beings are classified as data. Data themselves have no meaning. It is only when data is interpreted then the data content will become meaningful. Interpreted data are referred to as information. For example, the number 33.5 tells us almost nothing. However when readers are told that the number stands for the temperature in centigrade, the number makes sense to us. In this example, 33.5 is a piece of data whereas 33.5 as a temperature in centigrade is a piece of information. Information is stored in computers such that both its data value and interpretation will be recorded. In most cases, interpretation of computer data is typically given by the corresponding data name. In the context of databases (which will be elaborated in the next section) as well as in daily use, the terms information and data are often used interchangeably although such a kind of confusion is not desirable. In most cases, the interpretation of the term data should be clear from the context of discussion. In the context of databases, data usually means information. The Data Hierarchy Each information system has a hierarchy of data organization, and each succeeding level in the hierarchy is the result of combining the elements of the preceding level. The six levels are bits, characters (bytes), fields (data elements), records, files, and data base (see Figure 1). A bit is a binary digital which has a value of either 0 or 1. A byte is a composed of 8 ordered bits. Figure 1. Hierarchy of data organization. 3

Question to ponder Are byte and character types the same? (Answer: Not necessarily. This depends on the underlying encoding scheme being adopted. Even an ASCII character may need more than one byte to store in certain implementation of Unicode.) Data Field/Element A (data) field or data element is the lowest level logical unit in the data hierarchy that can be interpreted in a meaningful way, e.g., David for a name, 23469345 for a phone number. The maximum number of characters (not bytes) that a field can have is called field length. A field may consist of a single character only, e.g., M(ale) and F(emale) for representing sex. How fine is the granularity of a field is a user s decision, e.g., we can treat an address as a single field or as an aggregate of several fields such as flat-and-floor-number, street-number-and-street-name, district, city and country, etc. The key concern is the application needs. If certain processing is required to handle an address at city level, we of course need to divide the address field into its components. Record A record is a logical group of related data fields describing an event or an item, e.g., a student enrollment record consists of fields such as student-id, student-name, programme-code, module-code, date-of-enrollment, etc. A record is the lowest level logical unit that can be accessed from a file. In other words, if one would like to access a data field within a record, the whole record has to be retrieved first before the required data field is identified. File A logical file is composed of occurrences of records. A physical file is used to refer to a named area on a secondary storage device that contains a program, a textual material, or even an image. One logical file is not necessary mapped to one physical file and vice versa. For example, a logical file may consist of an index area and a record area such that each of the areas is associated with a separate physical file. End-users are usually concerned with logical files instead of physical files. Questions to ponder Give an example that a physical file may contain more than one logical file. 4

Give an example that a logical file may be stored in multiple physical files. Data Base A data base is a collection of files that are logically related and integrated to one another so that data redundancy is minimized or reduced. Data redundancy exists when a data field is stored in more than one logical file. Data redundancy often cannot be eliminated entirely for various reasons but it should be kept under control. Database management system is devised to control the data redundancy problem by ideally storing every data item once and/or by propagating data changes to all related record occurrences probably among a number of files so that data integrity (which concerns the validity, accuracy and correctness of data) can be maintained. Database management system is often referred to as DBMS, database or database system. Teaching remark In many books or online learning resources, the term data base is often incorrectly referred to as database. It comes to a stage that people begin to use the two terms interchangeably. In fact, the ASCA and ALCS Curriculum and Assessment Guides also use the terms interchangeably. Need for Storing Persistent Data Almost all computer applications require some data be kept for describing some inherently stable properties or up-to-date status of certain items or events. Let us think about the information kept by a bank for its saving account holders. For each saving account, the bank must store its unique account number, name(s) of account holder(s), contact address(es) of account holder(s), account balance, etc., to say a few. Those data are considered to be persistent data as they are not changed frequently. However some data are more persistent than the others. For example, an account number should never been changed whereas there is a slim chance that changes would be required for the name(s) of the account holder(s). Account balance is most susceptible to change among the pieces of listed data as transactions like money deposit or withdrawal will affect its value. Obviously the correctness of all recorded persistent data is important to the functioning of the associated computer applications. Whether or not a piece of data is persistent varies from application to application. Age may not be considered as a piece of persistent data as it changes every year for most people. However the age field is definitely persistent if it appears on a death certificate. Problems of File Systems 5

Persistent data can be stored in file(s). However there are potential problems with that. 1. Since files are designed to fit individual application needs, a data element may appear in several files if that piece of data is needed in several applications. For example, a bank customer may open a saving account and a stock account at the same time. For the stock account, the account balance is composed of the quantity of each stock purchased. Obviously at least two different files are needed to keep data for the two types of accounts but data elements such as name(s) of account holder(s) and contact address(es) of account holder(s), are common. When the customer moves to a new address, both file are required to be updated. This is caused by the data redundancy problem. Data redundancy can cause a number of problems during data modifications; those problems are referred to as data anomalies (which will be detailed later). 2. A consequence of data redundancy problem is integrity problem or data consistency problem. Data become inconsistent if copies of data are not updated simultaneously. 3. Traditional file systems suffer from sharing problem and security problem. If a new report which needs to use some but not all data from two files is required, should one be allowed to access both files? As access control on a file system can only be made at the file level, allowing someone to read both files implies unnecessary exposure of data. If a new file is created to store all data needed to produce the new report, data redundancy problem emerges. 4. Structural dependence (also known as program-data dependence) exhibits in file systems. In order to use a file, a program needs to know the file structure, i.e., details of all data stored in the file. A change in any file s structure requires the modification of all programs using that file. Aims of Database Systems The aims of database systems are as follows: Reduce data redundancy and inconsistency Separate user data view from physical file structure (see next session for details) Impose data integrity constraints, e.g., for data validation Tackle atomicity problem, i.e., all activities in a transaction is either completely performed or undone. For example, if money is transferred from a saving account to a stock account, the saving account will be debited whereas the stock account will be credited with the same amount. The corresponding transaction has to ensure that both data changes are done as one single unit. Allow concurrent data access Offer secured data access Help make data management more efficient and effective Allow quick answers to ad hoc queries using some query language Provide end users better access to more and better-managed data 6

Some databases may not be able to achieve all the above aims. Early databases may not support transaction processing or offer secured data access for concurrent users. Separating User Data View from Physical File Structure A key advantage of database is that the end-users and application programmers do not have to know how data files are organized and stored in the database. This is referred to as the structural independence (or program-data independence). Thus changing the structure of a file does not necessarily require computer programs that access the file be modified. Databases achieve structural independence by organizing data through advanced data structures in which the data fields and records are related to each other. Computer programs do not access files for data. Instead, computer programs that need accessing data have to direct their requests to the DBMS which in turn processes the requests against the data base; in other words, all operations on the data base are coordinated by the DBMS. Figure 2 describes the interactions between different parties in a database environment. Figure 2. Interactions between various parties in a database environment. Applications of Databases in Society Almost all computer applications need to use database to store persistent data. In a library system, at least the following data need to be kept. Library user ID number Library user name Library user contact address 7

Maximum number of books that a library user can borrow Library user ID number, book s call number and due date of the loan period for each book which is on loan Author name(s), publisher, year of publication, and status (e.g, on-loan, on-hold, on-request, and missing, etc.) of each book. The above information must be kept in order to support basic library operations like book search, borrowing and return, etc. In a supermarket, inventory information needs to be stored so as to facilitate the inventory, purchasing, marketing and other business functions of the company. Some of the information to be kept is given below. Item ID number Item name (e.g., ABC dental cream) Item category (e.g., oral hygienic) Unit price Stock level Reorder level (below which an order needs to be placed for replacement) Reorder amount (i.e., the number of items to be ordered) In a credit card system, the following information should be recorded. Card number Card owner s name, contact address and phone number Credit limit Credit amount used Card s expiry date Card s date of issue First issued date Number of times that the card was reported missing Number of times of late payment Databases not only support day-to-day operations of organizations only. Applications can be built to analyze historical data in databases for planning purpose. Banks use various types of customer information such as account balances, salary information, saving patterns, credit card repayment patterns, mortgage repayment patterns to create their customers profiles. Customer details like occupation, age and marital status are recorded too. Such information is stored in databases and would be analyzed so as to enable the banks to identify potential customers for specific products, e.g., fund investment and insurance. Such a kind of database applications is known as data mining which analyzes data in databases to look for data trends or anomalies without the knowledge of the meaning of the data. 8

The amount of operational data would be too much for management staff to digest. Besides, the data would be too raw for them to make management decisions. In practice, operational data are typically summarized (and stored in a data warehouse sometimes) before they are presented to the management. All mentioned data, no matter in a raw or digested form, are stored in some form of database. Types of Databases There are many ways to classify databases and two of them are listed below. Number of Users Many databases designed to run on personal computers are expected to be used by one user at a time. We usually referred them as single-user databases. Earlier versions of Microsoft FoxPro and Access belong to such a type. More sophisticated databases like MySQL, Microsoft SQL Server, IBM DB2 and Oracle are called multi-user databases as they have built-in facilities for secured and concurrent data access. Location A database may be either centralized or distributed. In a centralized database, all database functions run entirely on a single computer. A distributed database is composed of a set of partially independent databases running on a group of networked computers that share a common schema (i.e., an overall design of data base), and coordinate processing of transactions that access non-local data (Silberschatz et. al., 1997). Reference Silberschatz, A., Korth, H.F., & Sudarshan, S. (1997). Database System Concepts (3 rd McGraw Hill. ed.). Another form of distributed implementation of databases, more commonly known as client-server databases, focuses on the distribution of various database functions over multiple computers. In particular, the database front-end functionality such as input validation is typically done by the client machines (which are usually personal computers) whereas the back-end functionality like transaction handling and data base update is provided by server systems, which are typically either data servers or transaction servers. 9

Data Models A data model is a collection of logical constructs used to represent data structure, data semantics and data relationships found within the database. Database models can be conceptual or implementation oriented. Conceptual data models are used to describe data at the logical and (user) view levels. It offers no description about the implementation issues. Conceptual models are often used as a communication tool between database designers and end-users so as to help the designers understand the data requirements of the end-users correctly. The entity-relationship model is an instance of conceptual data model. Another type of data model provides a high-level description of the implementation. Three popular implementation models are hierarchical, network and relational models. Note that the problem of structural dependence in both hierarchical and network models is resolved in the relational model. The key advantages of relational model are as follows: Structural independence Improved conceptual simplicity as data are structured in simple-to-understand tables Easier database design, implementation, management, and use Ad hoc query capability with the use of the structured query language Powerful database management system can be built with the system s complexity being hidden from the user view 10

Relational Database Concepts Introduction In this section, basic relational database terminology and concepts will be introduced. The definitions and characteristics of entity, relation, attribute, domain and key, etc., are detailed. In particular, the difference between keys and indexes, and three concepts about data integrity, namely entity integrity, referential integrity and domain integrity, are explained. In order to help explain the above terminology and concepts, a problem scenario about a school library is introduced as below: The library of XYZ School has decided to computerize its services so as to make them more efficient and effective. Since computerization is relatively new to the school, the library aims to provide only basic library functions to the users initially through the implementation of a simple computerized library system. The system is expected to offer a computerized catalogue of all library items, e.g., books and past examination papers, and basic circulation functions such as item borrowing, returning and reserving. Obviously the system needs to keep library user information such as the number of library items that s/he is allowed to borrow, dates and call numbers of those library items that s/he has borrowed, or requested, etc. Library item details such as its call number, author(s), ISBN, year of publication and status (e.g., available, on loan, requested and damaged), etc., are also kept. As a teacher librarian of the school, you are asked to design a suitable database schema to support the mentioned library operations. Whenever applicable, examples will be provided in relation to the above problem scenario so as to provide a clear context for illustrating the database terminology and concepts. Entity and Entity Set/Type An entity is a distinguishable object to be described. It can be any object such as a person, a place, an event or a thing, etc. Entities that share the same properties or attributes are collectively referred to as an entity set (or entity type). Example entity sets that can be found in a school environment are students (person), classrooms (place), examinations (event), and subjects (thing), etc. 11

Entity sets in the XYZ School library example: o Suppose Linus and Jeff are students, they are entities (library users) because they share properties of a student and are distinguishable objects in a school library system. o Library users who may be teachers or students (person), library items (things), circulation transactions such as a book request (event), and user privilege (things) etc. Teaching remark (out of syllabus) An entity set (type) may be further divided into supertype and subtypes if required. In the XYZ School library example, both teachers and students are classified as library users. However, it is possible that we need to further divide teachers and students into separate subtypes for meeting certain application needs. For example, a student library user is required to be associated with his/her class if the school would like to research into the number of books borrowed per student from each class. Such an association also helps the teacher librarian to learn more about the reading habit of various classes of students. Obviously such a sort of class association does not exist for teachers. Conversely, the library may want to know the number of times that a teacher does not return borrowed items to the library on time and the cumulative number of days overdue (as students are fined for late return of library items but it is not always easy to implement a similar system on teachers). Such a function can help the teacher librarian to identify those colleagues who do not fully respect the library regulations. Storing those pieces of information is obviously not necessary for student library users. The similarity and differences in the application need for various library users imply a need for a finer classification among them. For example, common attributes of library users such as library user ID, name, address, etc., are kept in the supertype whereas non-common attributes of teacher and student are kept in the corresponding refined entities. The supertype and corresponding subtypes are structured to form a generalization hierarchy. In relational database, an entity set is typically represented in terms of one or more relations (a mathematical term for tables), with each of which being composed of rows and columns. Each tuple (a mathematical term for rows in a table) in a relation represents an entity of the associated entity set. Each column, which is uniquely named within the table that it is associated with, represents a category of information that corresponds to an attribute. A relational database is typically composed of a number of related tables. Note that the order of the rows and columns within a table is immaterial to the database. As shown in the table below, the user privilege entity set of XYZ School library example is composed of 6 rows with each row defining the privilege of a user type for a given material type. 12

column (attribute) row (tuple) Table 1. The user privilege table of XYZ School library. Attributes Attribute and Domain Each entity has certain descriptive properties known as attributes (or fields). Some potential attributes for the student entity are student-name, student-number, and sex, etc. Attributes in the XYZ School library example: o student ID, class name (in the library usesr entity set) o call number, material type (e.g., CD-ROM, book), item name (e.g., book title) The set of all possible values for an attribute is called its domain. For the student entity set, the domain of the attribute sex should be {female, male} whereas the domain of the attribute age should be any positive integer (although it may make more sense by setting an upper bound for the domain). Attribute domains in XYZ School library example: o Domain of class name : all valid class names found in XYZ school. o Domain of maximum number of library items that a user can borrow : any non-zero integer not greater than 10. The relational database theory does not restrict what data type that an attribute can associate with. However, some commonly supported data types in relational database are: Number (integer or real number) Text (fixed length or variable length) Boolean type 13

Date and time Simple vs. Composite Attributes Attributes that cannot be divided into subparts are known as simple attributes (e.g., age); otherwise they are composite attributes (e.g., address). Whether there is a need to re-structure an attribute to finer attributes depends on the application needs. In the XYZ School library example, the library user name is represented as a composite attribute as it is not further divided into simpler attributes such as first-name and surname. Such a representation does not cause any problem as the library does not have any need of processing library information in accordance with its user s first-name or surname. To facilitate detailed queries (for the future), many database designers prefer to change a composite attribute into a series of simple attributes. Null Attributes It is possible to use a null as the value of an attribute of an entity. For example, the value of the ISBN field will be set to null for past examination papers but a valid ISBN is needed for most books. Derived Attributes In some occasions, the value of an attribute can be derived from other related attributes or entities. Such a kind of attributes is referred to as derived attribute. Suppose a database keeps an employee table to store employee information like employee-number, employee-name and number-of-dependents, and a dependent table to record information of each employee s dependent in a separate row. In this case, the number-of-dependents attribute in the employee table is a derived attribute as its value is equal to the number of associated rows in the dependent table. In a good database design, integrity constraint (which will be detailed later) should be defined between derived attributes and their base attributes in order to ensure that an update of the value of any base attribute will trigger a corresponding update of any associated derived attributes. Otherwise, data inconsistency will occur. Intuitively, we should eliminate all derived attributes of a database because their values, if required, can be computed in real-time. However the use of derived attributes can improve the efficiency of a database. In the XYZ School library example, it is better to have (derived) attributes to record the number of times that a teacher does not return borrowed items to the library on time and the cumulative number of days overdue although those pieces of information can be derived from the teacher s circulation records history. The use of derived attributes in this example can greater 14

enhances the database efficiency when compared to rescanning all past circulation records of a teacher for computing the required information. In this example, the computational effort for maintaining the integrity of the values of the derived attributes and their base attribute values is small. 15

Keys A key is a value of one or more selected attributes used to identify an entity in an entity set. The concerned attribute(s) is/are known as the key field(s). A potential key field of the library user entity set of the XYZ School library example is the library user ID which is unique for each library user. A superkey is a set of one or more attributes that, taken collectively, uniquely identify an entity in an entity set. However, a superkey may contain extraneous attributes. In the user privilege table of the XYZ School library example, all of the following combinations of attributes are superkeys o User type and Type of material o User type, Type of material and Loan period o User type, Type of material, Loan period, and Total number of items that can be borrowed o Description and Type of material o Description, Type of material and Loan period. Once the values of any of the above attribute combinations are given, we can always uniquely identify an entity (row) in an entity set (table). The following attribute combinations are NOT superkeys: o User type and Description o Type of material and Loan period because giving the values of any of the above attribute combinations, more than one entity (row) may be identified. Teaching remarks The identification of superkeys for a table must be based on the semantics of the attributes of the table instead of the table content. In the user privilege table of the XYZ School library example (see Table 2), it appears that giving the values of the Loan Period and Total number of items that can be borrowed, a unique entity (row) can be identified and thus the two attributes, when combined, can be taken as a superkey. However this is misleading. Suppose school alumni are allowed to use the library and they are allowed to borrow up to 3 books for a maximum of 14 days. This obviously makes the Loan Period and Total number of items that can be borrowed no longer a superkey as a junior student is also allowed to borrow the same number of books for the same loan period. In reality, teachers as well as textbooks often use table contents to explain the concept of key (and normalization, which will be covered later). Teachers must indicate to students their assumption that the table contents give an exhaustive illustration of the table semantics. 16

Minimal superkeys are called candidate keys. Removal of any attribute in a candidate key will render the remaining attribute(s) no longer a key. In the user privilege table of the XYZ School library example, all of the following combinations of attributes are candidate keys o User type and Type of material o Description and Type of material In the above example, it clearly shows that it is okay for a table to have more than one candidate key. However multiple candidate keys in a table might imply the existence of transitive dependency in the table. Transitive dependency is an indicator of poor database design and should be avoided. The notion of transitive dependency will be introduced when introducing the notion of database normalization. Teaching remark Like superkeys, the identification of candidate keys for a table must NOT base on the table content, but the semantics of the attributes of the table. A primary key is a candidate key chosen by the database designer as the major means of identifying an entity (row) within an entity set (table). No part of a primary key can be null. Unlike the candidate key, a table can only have one primary key. Teaching remark Some textbooks in the market may have given an imprecise definition of candidate key and primary key. In one textbook, a primary key is defined as a field or combination of fields that uniquely and minimally identify a particular record in a table. According to this definition, it is possible that a table would have more than one primary key but this is obviously incorrect. The definition given in the book in fact describes a candidate key rather than a primary key. Any attribute which is not a part of any candidate key is known as a non-key attribute. In the XYZ School library example, the loan-period is a non-key attribute. A foreign key is either null or not a superkey in its own table but a candidate key in another table. Suppose we have two tables, namely student-subject and subject which store the subjects that a student has enrolled and the subject description respectively. The student-subject table records student-id (a part of the primary key) and subject-id (another part of the primary key) whereas the subject table stores subject-id (primary key) and subject-descriptor. The subject-id in the student-subject table is a foreign key to the subject table. 17

student-subject subject student-id subject-id subject-id subject-descriptor 200425642 CS1132 CS1132 Databases 200425654 CS1132 CS1145 Programming 200425854 CS1145 foreign key to the subject table Teaching remarks It is wrong to say the subject-id in the student-subject table is a foreign key. The notion of foreign key is defined on two tables. Many textbooks do not explicitly state that the value of a foreign key can be null. Indexes One or more indexes can be defined for a table for efficient data retrieval. Unlike primary key, an index does not have to be unique. Whether or not an index is required for a table depends on the application needs. Inclusion or omission of an index in a table definition may affect the efficiency, but not the functionality, of any data retrieval. An index is an implementation structure such that given one or more attribute values, relevant rows can be efficiently retrieved. It is typically implemented through the use of sophisticated data structures like ISAM and B+ trees. Common mistakes Some people may use the terms index and secondary key interchangeably but this should be avoided. Keys are logical concepts whereas indexes are implementation concepts. In fact, there is no notion of secondary key or index in relational database theory. Teaching remarks Most relational databases create an index for the primary key of each table for efficient data retrieval. Although indexing can facilitate efficient data retrieval, it should not be overused. Index creation and maintenance may involve a lot of computations that take time to finish. 18

Data Integrity As mentioned before, data integrity is concerned with the validity, accuracy and correctness of data. In relational database, three type of data integrity are of particular concerns. They are entity integrity, domain integrity and referential integrity. Entity Integrity Entity integrity is a property that ensures that 1. no rows are duplicated, and 2. no attributes that make up the primary key have a null value. Note that condition 1 must be enforced or a primary key will not be able to uniquely identify an entity (a row) in an entity set (a table). As an example, the user privilege table does meet the criteria of entity integrity. Domain Integrity Domain integrity is a property that ensures that whenever a new data item is entered into the database, it must be within the domain of the corresponding attribute. For instance, the enforcement of domain constraint can stop one from entering a value other than female or male to the sex attribute. Referential Integrity Referential integrity is concerned with the data consistency between coupled tables. In particular, we may want to ensure that an attribute value that appears in one table also appears for a certain set of attributes in another table. For example, the XYZ School library database may keep one table to store library user personal information like user-id, user-name, and contact-address, etc. and another table to keep information about loaned books like user-id, book-call-number, and due date, etc. The user-id is the primary key of the library-user-details table whereas the concatenation of user-id and book-call-number forms the primary key of the loaned-book table. The user-id attribute of the loaned-book table is a foregin key to the library-user-details table (as user-id is not a superkey in the loaned-book table but a candidate key in the library-user-details table). Obviously, it is important to ensure that any value appeared in the user-id attribute of the loaned-book table also appears in the user-id attribute of the library-user-details table. In relational databases, referential integrity is typically enforced by defining a referential constraint between a primary key and a foreign key. For referential integrity to hold, any attribute(s) in a table that is declared a foreign key can contain only values from the primary key attribute(s) of 19

the table that the foreign key relationship is referred to. Thus, deleting a row that contains a value referred to by a foreign key in another table would break referential integrity. In the XYZ School library example, this is equivalent to removing a library user from the library-user-details table without demanding the user to return all books that s/he has borrowed. More examples about referential integrity can be found here. It is important to note that a referential constraint may not enable us to avoid errors at the database design level. The following example illustrates such a problem. The table on the left stores ID numbers and names of all library users whereas the table on the right keeps all loaned books. ID and call number are the primary keys of the library user and loan event tables respectively. user ID in the loan event table is a foreign key to the library user table. According to the definition of foreign key, it is acceptable to assign a null value to user ID as found in third record of the loan event table. This obviously does not make sense from a user perspective to allow a book being loaned to an unknown person but the referential constraint setting between the two tables does not stop the assignment of null to user ID. To avoid the problem, we need to make user ID in the loan event table a mandatory attribute. Teaching remark SQL92 and SQL99 provides standard features to define constraints for modeling various data integrity constraints but many commercial database management systems such as Microsoft Access tend to provide non-standard customized features to serve the purpose. Such details are not within the curriculum and will not be further discussed here. 20

Introduction to Database Design Methodology Three Levels Database Architecture Database can be viewed at three levels of abstraction, namely conceptual level, physical (or internal) level and view (or external) level. The key concerns of the three levels are as follows: View level or external level is concerned with how individual users see the data. Note that a user is may range from application programmers to casual users who interact with the database with ad-hoc query facilities. For example, a library user may be interested in the library collection but not the library user statistics. The librarian would not be expected to have any interest in the information about individual library user s reading habit. Conceptual level is concerned with a community user view of the entire information content of the database that is of interest to the organization. In this level, no physical consideration is considered. A change in the internal view to improve performance may not involve any change in the conceptual view of the database. Physical level or internal level is concerned with how data is actually stored. Efficiency is the prime concern at this level. The following aspects, among others, are considered at this level: 1. Data structures chosen, e.g. B-trees, hashing, etc. 2. Access paths, e.g. specification of primary and secondary keys, indexes and pointers and sequencing. 3. Miscellaneous issues, e.g. data encryption and compression techniques. Figure 1 outlines the three levels database architecture. Figure 1. The three-level database architecture. The key advantage of the three-level database architecture is that it separates (1) the conceptual view from the physical view, and (2) the external views from the conceptual view. The former enables a database designer to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. The latter enables a 21

database designer to change the conceptual view without affecting the external views in most cases. This separation is sometimes called logical data independence. Readers may click here for a more detailed discussion of the three levels. Logical Data Modeling In order to identify the data need of an organization, logical data modeling is usually applied. Logical data modeling explores the domain concepts, and their relationships, of a problem domain. In databases, logical data modeling typically exhibits in the form of entity relationship modeling. The basic idea is to identify data objects called (logical) entity sets, which are described by their (data) attributes, and their relationships that meet all data requirements of the concerned organization, typically expressed in a type of diagram called entity relationship diagrams (ERD). Logical data modeling, so does entity relationship modeling, may be performed for the scope of a single project or for the entire enterprise. Teaching remarks Different variants of ERD come with different notations and it is important to tell students to describe any potentially ambiguous ERD notations when answering a question. For a comprehensive description on data modeling, the Information Technology Services of the University of Texas has produced an online practical guide to data modeling which is definitely worth reading. Note that the ERD notations used there are not always consistent with the ERD notations adopted in this package. Terminology and Notation of Entity Relationship Modeling Some of the terminology, e.g., such as entity and attributes, of entity relationship modeling that readers need to be familiar with have already been covered in the section entitled Basic Terminology. The description below offers some additional information about those mentioned terms as well as details of those terms that have not given previously. Corresponding ERD notion used in this package is also shown. Entity An entity is a representation of any composite information of a real object (e.g., a bank customer) or an abstract object (e.g., a money withdrawal transaction of a bank). o Entities encapsulate data only, i.e., an entity is described only by its associated attributes. How its attributes will be manipulated is out of the scope of the entity. For example, an entity about a money withdrawal transaction of a bank is concerned with what amount of money being taken out from which account on a particular date. How those recorded data may be used for various purposes are immaterial from the logical data modeling perspective. Entities may be related to one another, e.g., a bank customer may perform a number of money withdrawal transactions over a given period. The ERD notation for entities is a rectangle. A STUDENT entity is represented below. 22

Figure 2. Notation for entities (rectangle). Attribute Attributes define the properties of an entity so as to o name an instance of an entity o describe the instance o make reference to another instance Example: A school subject is an entity which is characterized by the subject code or name; a subject also has other attributes such as subject description; a subject may not be taken unless a student has completed its prerequisite subjects which are objects themselves The ERD notation for attributes is an oval. An attribute is linked to the associated entity by a line or two lines depending whether or not the attribute is a multi-valued attribute. Suppose the previously mentioned STUDENT entity has two attributes only name and address. The corresponding ERD representation is given below. Figure 3. Notation for attributes (oval connected to a rectangle with a line). The above example assumes that every student has exactly one name and one address. For a student that has more than one address, the corresponding ERD representation is as follows: Figure 4. Notation for multi-value attributes (oval connected to a rectangle with double lines). If an attribute is the (primary) key or a component of the primary key of an entity, the attribute name may be underlined. Assuming each student has a unique name, the corresponding ERD representation is changed as below. 23

Figure 5. Notation for multi-valued attributes (oval connected to a rectangle with double lines). Teaching remarks Apparently the A/AS Level curricula do not require students to be familiar with how multi-valued attribute be drawn in an ERD. In reality, it would be tedious to show attributes of entities in an ERD due to space limitation. Besides, the attributes associated with a selected entity are usually clear from the context. Thus attributes of entities are typically omitted in an ERD. Relationship Relationships are links connecting to entities that define the relationships of the entities. There may be more than one relationship between two (or more) entities, e.g., customers open accounts, customers close accounts in which open and close are relationships between the customer and account entity sets. Note that an entity may have a reflexive relationship with itself, e.g., the work supervisor of an employee of a company is also an employee of the company. Although a relationship can be classified by its degree, cardinality, connectivity, direction, type, and existence, etc., not all modeling methodologies use all these classifications. This package will only focus the discussion in degree, cardinality, and existence. The ERD notation for relationship is a diamond with the name of the relationship as the label of the shape. The following ERD says a teacher would mark assignment. Figure 6. Notation for relationships (diamond shape connected to associated entities). Another occasionally used notation for relationship is to get rid of the diamond and simply put the relationship name as a label of the line that represents the relationship. The previous example is now depicted as follows: 24

Figure 7. An alternative notation for relationships (line directly connected to associated entities). Although in most cases relationships are not associated with any attributes, it is possible that attributes may be required to describe some relationships. Suppose we have a relationship called borrow which relates the Student and Book. Obviously we need to keep the due date for return for each book on loan. The information can only be attached to the borrow relationship as it is not an attribute of Student or Book. In some literature, such a type of relationship is referred to as associative entity. Teaching remark In the Curriculum and Assessment Guide (C&A guide), no associative entity is mentioned. However, associating attributes to relationship is very common in practice and the concept should be covered. Although the literature usually introduces a separate notation for associative entity, the C&A guide does not provide any for it. Having said that, we may simply associate attribute(s) to a relationship to capture the essence of an associative entity. So far, all the above examples do not offer us any information to answer the following questions. Would an assignment be marked by more than one teacher? What are the minimum and maximum numbers of teachers to mark an assignment? Would a teacher mark more than one assignment? What are the minimum and maximum numbers of assignments that a teacher needs to mark? Would there be any teacher who does not need to mark any assignment? Is it acceptable to have any unmarked assignment? In order to answer the above questions, we need to know additional properties of the relationship. Degree The degree of a relationship is the number of entity sets associated with the relationship. Most relationships are binary relationship where the degree is two but ternary relationship that involves three entity sets can be found occasionally, e.g., teachers teach subjects to students. An n-ary relationship is a relationship with degree n. Many modeling approaches typically deal with binary relationships only. Ternary or n-ary relationships are typically decomposed into two or more binary relationships. Thus this e-learning package focuses its discussion on binary relationship only. 25

Cardinality and Existence (or Modality) Cardinality defines the actual number of entities that must be included in a relationship. Cardinality information can be divided into two types minimum cardinality and maximum cardinality. Data modeling concerns whether or not the minimum cardinality is zero and whether or not the maximum cardinality is greater than one, i.e., one (1) or many (n or m), as such information will affect how a data model is translated into a data schema, i.e., database design. Existence or modality denotes whether the existence of an entity instance is dependent upon the existence of another related entity instance. The existence of an entity in a relationship is defined as either mandatory if every instance of the entity involves in that relationship. Otherwise, the existence of an entity in a relationship is defined as optional. It is clear that the minimum cardinality of an entity that has an optional existence must be zero. Conversely, a mandatory existence of an entity in a relationship implies that the minimum cardinality of the entity in the relationship is a positive integer. The following examples are devised to illustrate the above concepts. Example 1 - Man is-married-to Woman The minimum cardinality and maximum cardinality of the relationship are 0 and 1 respectively as a man (or woman) may be married to no woman (or man) and the maximum number of women (or men) that a man (or woman) can be married to is one. Obviously, both entities (Man and Woman) optionally participate in the marry relationship. Thus the existence of both entities in the relationship is optional. In ERD, a small circle is added on the line that joins an entity and a related relationship if the existence of the entity in the relationship is optional. An ERD that represents the connectivity and existence information of the marry relationship is given below. Figure 8. The Man is-married-to Woman scenario. 26