SQA Higher Information Systems Unit 2: Relational Database Systems

Transcription

1 SCHOLAR Study Guide SQA Higher Information Systems Unit 2: Relational Database Systems David Bethune Heriot-Watt University Tom Kelly Heriot-Watt University Tom Liversidge Heriot-Watt University Heriot-Watt University Edinburgh EH14 4AS, United Kingdom.

2 First published 2007 by Heriot-Watt University This edition published in 2007 by Heriot-Watt University Copyright c 2007 Heriot-Watt University Members of the SCHOLAR Forum may reproduce this publication in whole or in part for educational purposes within their establishment providing that no profit accrues at any stage, Any other use of the materials is governed by the general copyright statement that follows. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without written permission from the publisher. Heriot-Watt University accepts no responsibility or liability whatsoever with regard to the information contained in this study guide. SCHOLAR Study Guide Unit 2: Information Systems 1. Information Systems Printed and bound in Great Britain by Graphic and Printing Services, Heriot-Watt University, Edinburgh.

3 Acknowledgements Thanks are due to the members of Heriot-Watt University s SCHOLAR team who planned and created these materials, and to the many colleagues who reviewed the content. We would like to acknowledge the assistance of the education authorities, colleges, teachers and students who contributed to the SCHOLAR programme and who evaluated these materials. Grateful acknowledgement is made for permission to use the following material in the SCHOLAR programme: The Scottish Qualifications Authority for permission to use Past Papers assessments. The Scottish Executive for financial support. All brand names, product names, logos and related devices are used for identification purposes only and are trademarks, registered trademarks or service marks of their respective holders.

4 i Contents 1 Database Fundamentals Prior Knowledge and Revision Introduction Traditional Databases Database Management Systems (DBMS) Data, Entities and Relationships Summary Data Modelling Prior Knowledge and Revision Introduction Early Data Models The Relational Model Data Analysis and Specification Entity Occurrence Modelling Summary The Relational Model Prior Knowledge and Revision Introduction Data Structure Data Operators From Data Model to Database Keys Translating an Entity-Relationship Diagram into Tables Data Integrity Summary Representing the Relational Data Model Prior Knowledge and Revision Introduction Properties of Tables Relational Data Modeling Database Integrity: Primary Keys and Entity Integrity Summary Normalisation Prior Knowledge and Revision Introduction The Normalisation Process

5 ii CONTENTS 5.4 Why Normalise? Summary Glossary 47

6

7 1 Topic 1 Database Fundamentals Contents 1.1 Prior Knowledge and Revision Introduction Database Terminology Traditional Databases Problems with file-based systems The Database Solution Database Management Systems (DBMS) Advantages of Using a Database System Data, Entities and Relationships Relationships Summary Learning Objectives By the end of this topic you should be able to: define a database management system describe the problems associated with traditional paper-based databases define a database describe the advantages of using computerised databases identify data entities and relationships

8 2 TOPIC 1. DATABASE FUNDAMENTALS 1.1 Prior Knowledge and Revision The topic assumes that you have previously considered some of the basic principles of database design including the benefits of electronic systems in comparison with manual databases. This may have been done through the study of the corresponding Unit at Intermediate 2 or in Standard Grade Computing Studies, or any other course on Computing or ICT. Check your understanding by trying some revision questions. 1.2 Introduction This topic takes an introductory look at how the storage, processing and retrieval of information have developed from paper based systems to computerised databases. The problems encountered in such systems are covered and what needs to be considered to overcome them. Included in this topic are: 1. the terminology of databases 2. a look at tradition forms of databases 3. problems associated with traditional methods 4. advantages in using database systems Database Terminology A database is a collection of related information about a set of persons, objects or events. Databases have been around for longer than computers! examples of databases which are stored on paper are: Some common everyday an address book the Phone Book TM a recipe book which stores names, addresses and other contact details about people which stores names, addresses and telephone numbers about people or companies which stores ingredients and instructions about dishes Nowadays, most databases are stored electronically on computers or other electronic equipment. Examples include: your contacts list in your or instant messaging application names and numbers in your mobile phone playlists in an MP3 player which stores contact information about people which stores contact information about people which stores information about tracks of music A database management system (DBMS) is a software package which is used to create, manipulate and present data from electronic databases. Example include Microsoft Access, Filemaker Pro, MySQL, Oracle, Ingres, Postgres, SQL Server and many others.

9 1.3. TRADITIONAL DATABASES Traditional Databases Why do database systems exist? In order to understand the answer to this important question, it is useful to consider what was used before database systems were created. The first databases were manual paper-based systems. Often paper records were organised into folders, and stored in filing cabinets. There were several problems associated with paper-based databases: the storage of paper records was very bulky, often requiring several large filing cabinets it was very easy to mis-file a paper record, or for records to be lost or damaged data was often duplicated in several records keeping records up-to-date was difficult and time consuming, and often resulted in data inconsistency, where duplicated values were updated in one record but not in others many people were employed to maintain the records, which was costly searching for records was time consuming producing reports, such as sorted lists or data collated from several sources, was extremely time consuming, if not impossible The first computerised data processing systems were file-based systems with data stored in files, and application programs operating on that data. Although this system was an improvement on previous approaches where data was stored within the application itself, there were still various difficulties with file-based systems Problems with file-based systems If a system is developed following an evolutionary approach, with the systems being created and modified, then the structure of the data may need to change. Changes to this data means that all programs that refer to this data must also be changed. This problem is often known as data dependence.

10 4 TOPIC 1. DATABASE FUNDAMENTALS Data dependence Please visit the online course to view an interactive demonstration of this concept. In a file-based systems, the same information may well be duplicated in multiple files. For example, the full name and address of a customer may appear in several files, depending on what sort of orders they placed (small or large) and when they placed them. This is known as duplication of data, sometimes known as data redundancy because the same information does not need to be repeated. This duplication of data often leads to another problem, that of inconsistency of the data where different copies of the data do not agree. Data Redundancy Please visit the online course to view an interactive demonstration of this concept The Database Solution As a result of these problems a more effective solution was required. This solution is having a collection of related data in a database. This database is then controlled and accessed by means of a database management system (DBMS). It should be noted, however, that although a DBMS can help to address the problem, it still requires certain rules to be followed. In particular, if a database is not designed correctly, issues such as duplication leading to inconsistency can still occur. Good design of the system is therefore essential to avoid these potential problems. 1.4 Database Management Systems (DBMS) A database management system (DBMS, or database system) is a collection of interrelated data and a set of programs to manage and access that data. It provides the ability to present many different views of the database to support many different types of user. In fact, most database systems have a variety of people who use it, all for different purposes. Common categories of user include database designers, database administrators, application programmers as well as general users. These users all use the database system in different contexts with differing requirements. Here are some examples of different users of a database system: Database designers will be involved in tasks including storage and structure of the database. Database administrators will determine access authorisation for users. Application programmers will work on the development of application programs that operate on the data. General database users manipulate the data in the database system by means of application programs, or by using queries to interact with the database more directly.

11 1.5. DATA, ENTITIES AND RELATIONSHIPS Advantages of Using a Database System Computerised databases have several advantages over manual databases: The ability to store large amounts of data Searching and sorting operations can be performed much more quickly and easily than manually Information is more easily available to users, due to improved methods of data retrieval. Databases are usually stored and accessed using computers in a local area network (LAN). This means that any user using a computer on the LAN can access the data when required. (Please note, this is not a requirement of all database management systems as some smaller "personal" database management systems exist that cater only for single user applications. However, it is still a desirable feature, and can be essential for some applications.) Security and integrity of data is improved, due to the methods of data storage and updating There is no data redundancy: that is, no unnecessary duplication of data. In a relational database, data is never stored more often than is necessary Data consistency: this means that when data is updated, this is only done once, so removing the possibility of data inconsistency. As soon as the data is updated, all users have access to it. Data independence: the software which processes the data is separate from the data itself. This means that several programs can be developed to perform operations on the data without affecting the data itself. 1.5 Data, Entities and Relationships A data entity is something that information will be recorded about in a database. This could be a person, an object, or something abstract like a holiday booking. Examples of entities about persons include a customer, airline passenger, a library borrower, an employee, a school pupil, a college student. Example of objects include a product, a flight, a book, a job, a course at school or college. Examples of abstract entities include a borrowing, a booking, or a rental. An entity set is a collection of entities of the same type and is represented as a table in a database system. Each entity has a set of attributes which describe examples or instances of that entity. For example, the attributes of a school pupil might include the following: forename, surname, address, date of birth, year group, tutor group. An instance of a school pupil might be Joanna Hamilton, whose attributes are: Joanna, Hamilton, 10 Main Road, Kinglass, 01/04/1990, 5, SW. The attributes of a DVD Rental entity are code, title, cost, date out, date due and

12 6 TOPIC 1. DATABASE FUNDAMENTALS member number, and the attributes of the Member entity are member number, name and telephone number. Attributes will have constraints on what is permitted. For example, an attribute representing a person s name will contain characters only. This is called the domain of an attribute and will define the type and value of data that the attribute can hold. Examples of data types that attribute may use are: Text - e.g. a person s name Integer - e.g. the number of books borrowed from a library Real - e.g. the conversion rate for foreign currency Object - a photographic image Boolean - a True or False value Date - someone s date-of-birth Time - the departure of a train Entities Please visit the online course to view an interactive demonstration of this concept Relationships Entities can be linked, or related to one-another. When a link exists, it is called a. A school will have a headteacher, and that headteacher will have a school. This is a one-to-one relationship between the entities School and Headteacher. In the same school, a pupil will study many subject and each subject will be studied by many pupils. This is a many-to-many relationship between the entities Pupil and Subject. A one-to-many relationship in a school exists between the entities School and Pupil. Here a school has many pupils and each pupil attends one school. These relationships form the basis of how relational database systems are modelled. There is a table on the online course shows how some entities can be linked by a relationship. Some of these will be further exemplified in the next Topic 1.6 Summary By now you should be able to: describe a database in terms of information storage, processing and retrieval describe the problems associated with traditional paper-based databases

13 1.6. SUMMARY 7 describe the problems associated with early file-based systems describe the advantages of using a database management system identify data entities and relationships End of Topic Test An online assessment is provided to help you review this topic.

14 8 TOPIC 1. DATABASE FUNDAMENTALS

15 9 Topic 2 Data Modelling Contents 2.1 Prior Knowledge and Revision Introduction Early Data Models The Relational Model Data Analysis and Specification Entity Occurrence Modelling Relationships Keys Summary Learning Objectives By the end of this topic you should be able to: understand the historical motivation for the development of relational database systems identify other database models in addition to the relational model correctly interpret an Entity Relationship Diagram (ERD) design basic ERD from an application scenario

16 10 TOPIC 2. DATA MODELLING By the end of this topic you should be able to: understand the historical motivation for the development of relational database systems identify other database models in addition to the relational model correctly interpret an Entity Relationship Diagram (ERD) design basic ERD from an application scenario 2.1 Prior Knowledge and Revision Please test your Prior Knowledge using the online revision questions. 2.2 Introduction This topic builds on the fundamentals of databases and introduces the idea of modelling data. This is done by considering the early methods of data modeling that led to the development of the relational model. The relational model is introduced with some of the basic ideas covered. At this stage this extends to the use of ER diagrams as a representation of a relational data model. 2.3 Early Data Models The first database systems (in the 1960s) were all hierarchical, with data being stored as trees of records and links as shown below:

17 2.3. EARLY DATA MODELS 11 Department Project Employee Manager Worker Figure 2.1: Hierarchical StorageModel This type of model is good for one-to-many relationships between entities e.g. a Department having many Employees, with each Employee assigned to just one Department. However, this model would be poor for many-to-many relationships as it is difficult to structure a hierarchy with multiple parent nodes for each child. For example within a Department, many Projects could be underway with many Workers involved. This would be difficult to show using a hierarchical model. Another type of model is a network DBMS (developed from the 1960s), with data again being stored as records and links as shown below:

18 12 TOPIC 2. DATA MODELLING Employee 1 Employee 2 Project A Department Employee 3 Project B Employee 4 Figure 2.2: Networked DBMS Storage Model Here the many-to-many relationship between Project(s) and Employee(s) can be represented. This logical data model is better than the hierarchical model for manyto-many relationships as the top-down hierarchy no longer needs to be enforced. This allows nodes to have multiple relationships with other nodes. Both network and hierarchical systems provide an efficient means of storing large amounts of data and were a a significant improvement on file-based systems. However, there are some problems associated with network and hierarchical data models. Both models are difficult to navigate and require complex programs to be written to answer a query. Other problems include data dependence where changing the data can mean that application programs have to be changed. 2.4 The Relational Model The hierarchical and network models do not offer data independence and therefore a better solution was needed. A model that was designed to solve some of these problems is the relational data model. The relational model was designed to include: data independence a sound theory querying without having to traverse a tree or graph to find information The relational data model was developed in the 1970s and is currently the most common database model. Example database management systems based on this model include

19 2.5. DATA ANALYSIS AND SPECIFICATION 13 Oracle, Ingres, IBM DB2, mysql and Microsoft Access. In the relational model, data is stored as records in tables and is interrogated using the Structured Query Language (SQL). 2.5 Data Analysis and Specification Before a designer can create a database using any model, an understanding of the requirements of the system is necessary. This analysis of the data will lead to the choice of tables, identification of keys, and an understanding of the requirements of users of the database. At this early stage of the design process the system is considered in terms of concepts and ideas produce an abstract model of the required system. In the relational model, this design makes use of Entity-Relationship (E-R) modelling. This modelling is independent of the choice of relational database used to implement the database so it does not matter at this stage whether the system is to be implemented on a large scale using something like Oracle, or on a small scale using a system such as mysql, Microsoft Access or Filemaker. The modelling of the requirements is separate from the final implementation so that important design decisions can be taken independently of a particular product. 2.6 Entity Occurrence Modelling Entity occurrence modelling was developed to help design databases by modelling with objects (entities) and relationships between these objects. The properties associated with an entity are known as the attributes of the entity. Entity Relationship Diagrams (ERDs) use a limited number of components and are easily understood as illustrated below: Attribute Attribute Attribute Entity Entity Figure 2.3: An example of an Entity Relationship Diagrams (ERDs) ERDs can include entities (represented by rectangles), attributes of these entities (ellipses) and finally relationships (lines with a trapezoid label).

20 14 TOPIC 2. DATA MODELLING Relationships Entities can be associated with each other by relationships. The number of entities that another entity relates to is known as the mapping cardinality, or the degree of the relationship, and can be one of the following: One-to-one An entity is related to at most one other entity, written as a 1:1 relationship One-to-many An entity is related to more than one other entity, written as a 1:M relationship Many-to-one A number of entities are related to one entity, written as a M:1 relationship Many-to-many Many entities relate to many other entities, written as a M:N relationship. One to One Examples Bowmore High School Martin Greer St. Cuthbert's Academy Scott Smyth St. Joseph's Secondary Sarah Kelly Lochmaben High School Paul Fryer Tyree High School Christopher Onions Figure 2.4: An example of various One to One relationships Each School has one Head teacher and each head teacher has one school. This can be represented in an ERD as 1:1 School Headteacher One to Many Examples Figure 2.5: One to One (1:1)

21 2.6. ENTITY OCCURRENCE MODELLING 15 Standard Grade English Tom Dickson Higher English Intermediate English English for Speakers of Other Languages Standard Grade Computing Elaine Rankin Higher Information Systems PC Passport Figure 2.6: A few examples of One to Many relationships Here a teacher can teach many courses and each course is tought by one teacher. This can be represented in an ERD as a one to many relationship Teacher Course Figure 2.7: One to Many (1:M) Many to Many Example On the online course you can see an interactive example of a Many to Many relationship This can be represented in an ERD as a many-to-many relationship. Pupil Course Figure 2.8: Many to Many (M:N) Keys It is important to be able to distinguish between different entities. This is achieved by using one or more attributes of that entity as the key that identifies an entity. A key is a set of attributes that uniquely identifies an entity. For example, the order number attribute of an order entity is a unique identifier thus order number is a key. When looking for the key to an entity, there may be more than one attribute or

22 16 TOPIC 2. DATA MODELLING combination of attributes that could uniquely identify the entities. These attributes under consideration are referred to as candidate keys. Once a key is chosen from the candidate keys as the principal identifier for an entity set, this becomes the primary key. 2.7 Summary By the end of this topic you should be able to: undertand the historical motivation for the development of relational database systems identify other database models in addition to the relational model correctly interpret an ERDs design basic ERDs from an application scenario End of Topic Test An online assessment is provided to help you review this topic.

23 17 Topic 3 The Relational Model Contents 3.1 Prior Knowledge and Revision Introduction Data Structure Data Operators From Data Model to Database Keys Translating an Entity-Relationship Diagram into Tables Data Integrity Entity Integrity Referential Integrity Data Dictionary Data Validation Summary Learning Objectives By the end of this topic you should be able to: describe the relational data model describe the correspondence between entity-relationship diagrams and relational data models. explain how data is stored in tables within a database define a key field define atomic and compound keys identify candidate keys for an entity select a primary key for an entity describe a surrogate key use foreign keys in a data model understand the term entity and referential integrity

24 18 TOPIC 3. THE RELATIONAL MODEL By the end of this topic you should be able to: describe the relational data model describe the correspondence between entity-relationship diagrams and relational data models. explain how data is stored in tables within a database define a key field define atomic and compound keys identify candidate keys for an entity select a primary key for an entity describe a surrogate key use foreign keys in a data model understand the term entity and referential integrity 3.1 Prior Knowledge and Revision This topic takes a closer look at the relational database model and builds on the introductory work already covered in previous Topics. In order to check your understanding of previous work, take the revision questions on the website. 3.2 Introduction This topic looks at the relational data model which is the most common data model in use today. This model is defined by the following features: data structure data operators data integrity 3.3 Data Structure The structure of data within the relational data must comply with the following rules: Each table in the database is identified by a unique name Each column in a table is identified by a unique name within that table The data in each column of a table is same domain and type

25 3.4. DATA OPERATORS 19 Each row in a table is unique Each cell in a table contains one value only Rows and columns in a table can be in any order Columns Columns within a table must be identified with a unique name. However it is acceptable to have the same name of a column used in separate tables in the same database. The example below shows two tables from the same database, each with columns called Name and Subject StudentID Name Subject TeacherID Name Subject The data in each column must be from the same domain and type. In the table above, example of the data in the Subject column would include English, Maths, Geography etc. An entry of 101, for example, in this column would be incorrect. Rows Each row within a table must be unique i.e. no row should appear more than once. Data Items Client No. Client Contact Date Facility 01/09/05 Pool Eastdyke YC Alan Donnelly 03/09/05 15/09/05 Hall Pool Data items should also be atomic to help search and maintain the database. However, non-atomic data items are still found in many implemented database. In the example above the Contact name is a multiple data item with two components. This would be better represented by decomposing Contact as Forename and Surname 3.4 Data Operators Operations on data include: enter delete amend retrieve In a relational database these operations are carried out by data operators to allow data to be inserted, deleted, amended, and retrieved respectively.

26 20 TOPIC 3. THE RELATIONAL MODEL 3.5 From Data Model to Database As has been already mentioned, the relational model uses the following structure: Data is stored as records in tables; The model is interrogated using a query; This topic will describe the relational data model in detail. Most DBMS use the terminology tables and records but these are sometimes referred to as relations and tuples respectively. Each column in a table is known as an attribute. Each table has a heading, and each row in the table is a record. Table Attributes Rows Name Street Town Joe Smith 1 Brown Road Newton Omar Singh 3 Straight Street Littletown Eng Min Lee 22 Station Road Bigtown Once the entities, attributes and relationships in the data model have been defined, a database can be created. Each entity in the data model becomes a table in the database. Each attribute of an entity becomes a field in a table in the database. It is not always easy to work out exactly which entities or tables are required in a relational database. This is particularly so because many-to-many relationships between entities are not desirable. A procedure called normalisation is used to help determine which entities are required. The list of possible values for an attribute is known as its domain. Each attribute must have a distinct name within the table as is shown below:

27 3.6. KEYS 21 Name Owner Species Sex Birth Death Fluffy Harold Cat F NULL Claws Gwen Cat M NULL Buffy Harold Dog F NULL Fang Benny Dog M NULL Bowser Diane Dog M Chirpy Gwen Bird F NULL Whistler Gwen Bird NULL NULL Slim Benny Snake M NULL Puffball Diane Hamster F NULL The ordering of rows and columns is not significant. Each item must be atomic, i.e. only single items are allowed. A database may contain multiple tables. Both the relational database itself and each table within it must have unique names assigned to them. 3.6 Keys In order to establish the relationships between the tables in the database, each record in a table must be able to be uniquely identified. A key is a field, or set of fields, whose values uniquely identify a record. In any table, there may be more than one field, or set of fields, which can uniquely identify each record-these are called candidate keys. The candidate key which is chosen to be used is called the primary key. Once a primary key is selected, and implemented in a database, it is wise to avoid changing anything about it as these keys act like the foundation stones of the database. Occasions may arise when you may want to change the data in primary key. For example, the contact name of someone in an organisation could be a candidate the primary key in a table called company. However, it is likely that the contact person for a company will change over time which would require significant updating if the contact name was the primary key. It is better to select a primary key that will not change in the course of time. In this example, company name would be better, but that too could change. As far as the database is concerned, arbitrary values are best. So, in this case, the best name for primary key of the company table would be something like company code with numbers identifying the company i.e. company 1, 2, 3 etc.. A key consisting of only one field is called an atomic key. A key which is made up of a set of fields is called a compound key. Sometimes it is necessary to "make up" a key to uniquely identify each record. This is called a surrogate key. A surrogate key is usually a number or similar code, which has

28 22 TOPIC 3. THE RELATIONAL MODEL no meaning or purpose other than to uniquely identify each record. Surrogate keys are used when there is no candidate key available, or when the only candidate key is a large compound key. What makes a good primary key? In general, the key should be as short and simple as possible: an atomic key is a better choice than a compound key a numeric field is usually a better choice of key than a text field. This is because they can be stored more compactly in the database than text fields. A key must also have the following properties: a key must have a value-it can never be blank (or "null"). a key should not change over time. You can now take the online Activities "Choosing a Key" and "Foreign Keys" 3.7 Translating an Entity-Relationship Diagram into Tables Databases can be designed and specified by means of Entity-Relationship diagrams. However, it has not been shown how to take the information described in an E-R diagram and form tables for a database. In fact, any database that is specified by an E-R diagram can also be represented by a collection of tables. In other words, an E-R design can be converted into a relational design. Converting an E-R design to a relational design makes use of the following steps: Convert each entity set to a table Convert many-to-many and one-to-many relationships Convert each relationship set to a table of relationships Each strong entity set is converted to a table, with each attribute of the entity becoming a column in the table. The primary key for each table should be identified along with other candidate keys as appropriate. In a similar way, each weak entity set is converted to a table consisting of those attributes of weak set combined with the corresponding attributes (i.e. primary key) from the strong set that owns the current set. Each relationship set becomes a table of relationships with one column for each attribute of the set. Only one-to-one and one-to-many relationships need to be represented as many-to-many can be split into two one-to-many relationships. For a 1:1 relationship a foreign key is added to one or both tables that are related. For example, consider the case that a customer and an address are related by a 1:1

29 3.8. DATA INTEGRITY 23 relationship (each customer has an address, and each address is associated with at most 1 customer). In this case the Customer table might have a primary key of customer id, and the Address table might have a primary key of address id. In this case, the customer table would also include a foreign key containing the address id of the Address table, and the Address table would include a foreign key containing the customer id. Foreign keys are also used to represent one-to-many relationships. A single foreign key represents a one-to-many relation, where the "one" portion of the relation is the reference from the foreign key, and the "many" portion has the foreign key. For example, consider the case of a customer who can place multiple orders. The relationship between customer and order is one-to-many (where one customer can place multiple orders, and each order matches only 1 customer). In this case, the Order table might have a primary key of order number, and the Customer table might have a primary key of customer id. The Order table would therefore contain a foreign key that references the customer id field in the Customer table. 3.8 Data Integrity A user of a database must have confidence and trust in the data that is stored. The data must accurately reflect what is reality. To achieve this a database uses the rules of entity integrity and referential integrity Entity Integrity For an entity to be stored in a relational database, it must possess two characteristics: It must have a primary key which is "non-null", i.e. its value will never be blank It must have no multi-valued attributes Together, these two characteristics define what is called "entity integrity" Referential Integrity The rules for referential integrity ensure that data linking tables together is present, and accurate. Tables are linked by using foreign keys, therefore referential integrity is concerned with foreign keys and how they link to a primary key in a related table. To ensure referential integrity between linked tables, a foreign key should be linked to the primary key of a related table. This would assume that a foreign key should always be present in order to establish the link. However, there are circumstances when a foreign key can have a null value. The following example illustrates this. A school football club three teams U13, U15 and U16s. Each team has pool of 15 players, and one player from each team is the Captain. Extracts from two tables from the football database are shown below: Captain

30 24 TOPIC 3. THE RELATIONAL MODEL Captain name Greig McNeil Charlton Player Team U13 U15 U16 Player name Team Captain name Simpson U13 McNeil Henderson U15 Greig Gemmell U13 McNeil Ferguson U15 Greig McNeil U13 Law U18 Charlton The primary key for the Captain table is Captain name. This attribute appears in the related table Player where it is used as a foreign key to link the two tables. To ensure referential integrity each player should have a captain. However, McNeil is a player and a captain, so Captain name in that row is null Data Dictionary A data dictionary is a formal way of defining what is in a database. It could be considered as data about data! This is often called metadata. The data dictionary is a flat table describing and naming the tables that form a database together with the attributes of the data held in each table. This will include: the names of each table; the data type, size and format of each attribute; the identification of keys and index; and a range of validation checks. The data dictionary is an integral part of a relational database and is used to define, control and validate the data and operations in the database. Here is an example of how table attributes are defined in a data dictionary: Table Name Type Size Validation Key/Index Player Player name Text 50 Required Yes, Primary Player Team Text 3 Required No Player Captain nametext 50 Required Yes, Foreign Data Validation Data validation is carried out by defining, in the data dictionary, the constraints on data being accepted into the database. Validation checks will include:

31 3.9. SUMMARY 25 domain constraints e.g. a surname is not numeric presence e.g. data that is part of a primary key must be present range e.g. values for a temperature will not be less than -273C permitted values e.g. American shoe sizes must be integers between 17 and 48 but not 22, 25, 27, 30, 32, 35 or 37 format e.g. a post code 3.9 Summary By now you should be able to: describe the relational data model describe the correspondence between entity-relationship diagrams and relational data models explain how data is stored in tables within a database define a key field define atomic and compound keys identify candidate keys for an entity select a primary key for an entity describe a surrogate key use foreign keys in a data model End of topic Test End of topic Test An online assessment is provided to help you review this topic.

32 26 TOPIC 3. THE RELATIONAL MODEL

33 27 Topic 4 Representing the Relational Data Model Contents 4.1 Prior Knowledge and Revision Introduction Properties of Tables Relational Data Modeling Anytown High School ACTIVITIES Database Integrity: Primary Keys and Entity Integrity Database Integrity: Foreign Keys and Referential Integrity Summary Learning Objectives Prior Knowledge and Revision Learning Objectives Introduction Properties of Tables Relational Data Modeling Database Integrity Primary Keys and Entity Integrity Foreign Keys and Referential Integrity

34 28 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL By the end of this topic you will be able to: Prior Knowledge and Revision Learning Objectives Introduction Properties of Tables Relational Data Modeling Database Integrity Primary Keys and Entity Integrity Foreign Keys and Referential Integrity 4.1 Prior Knowledge and Revision In order to check your understanding of previous work, take the revision questions on the website. Revision Questions 10 min There is an Online Assesment at this point 4.2 Introduction This topic builds on the previous one and develops a method for representing a relational database model. To do this successfully the relational database modeler must understand full how the properties of tables in the model affect the final model. Only then can the model be developed to show the relationships between the different entities. Finally, the integrity of the database has to be ensured and this is done by careful consideration of primary and foreign keys and an understanding of the these keys role determine that integrity. 4.3 Properties of Tables You have seen how tables are used to show how data can be represented in a database. The use of columns and rows show how data is structured and how values of data items can be represented. However, in relational data modeling a table has to adhere to a set of rules: i ii Every table in the database must have a unique name. Each value of a data item in a table must be atomic i.e. where a row and column

35 4.4. RELATIONAL DATA MODELING 29 intersect, there is only one value. For example, this table could not be used in a relational database model: Candidate Number Exam Subject English English, History History, French, Spanish English History iii The values that are held under a column, i.e. the values of an attribute, must be in the same domain. Why would the table below not obey this rule? Candidate Number SName Course Anderson Anwar Spanish Scanlon Weir Woodrow Spanish iv v vi Each column has a unique name, i.e. each attribute in the table is distinct. In a relational database the name of an attribute (i.e. the column name) is used for reference e.g. when searching the table, so each attribute must be unique. Each row in the table is unique, i.e. no row is duplicated anywhere else in the table. This is a fundamental principle of databases. Each row will contain a primary key that is used to identify the existence of a row in a table. Since the primary key must be unique, then we can say that each row must be unique. The order of the rows and the columns are not significant. In (iv) and (v) above we have seen how reference to columns is by name (e.g. when searching) and reference to rows is by primary key, so as far as the database model is concerned, it is not necessary to have the columns or rows in any particular order. Reference to column 3 or row 5 has no meaning. Compare the table below to the table in (iii) above. What can you deduce? Candidate Number SName Course Anderson Scanlon Anwar Spanish Woodrow Spanish Weir Relational Data Modeling This section builds on previous study of entities, attributes and relationships and the modeling techniques used i.e. ER Diagrams

36 30 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL Anytown High School Consider some of the entities found in a school: There are pupils, there are courses, there are teachers (to teach the course) and there are teachers to advise pupils (lets call them guidance teachers). Of course, we could identify lots of other entities (like classrooms, lockers, cleaners, cooks, library books, timetables, appointments, homework, exam grades, etc. etc. etc), but like most data models you will study on this course it will be simplified to help understanding. Firstly, consider the entities of pupil, and course: This cannot be represented as two tables in a relational database since several Pupil identifiers(providing a foreign key to link the table) would be needed in each entry of the Course table because the same pupil can study many courses. Similarly several Course identifiers would be needed in each entry of Pupil. That would mean that in the Course table there would be multi values identifying all the pupils doing a particular course and in the Pupil table, multiple values identifying all the course each pupils studied. This breaks the rule (in both tables) of data items requiring to be atomic. 1:M & M:N Relationships This activity takes the Anytown High Scool example and shows how relationships can be refined to overcome the problems with M:N relationships. For example, another table can be introduced that will represent unique instances of pupils studying courses. By taking the primary keys of the existing entities we can combine them to form a new entity: A Pupil can be identified by an unique number called pupilid and a course by courseid. Combine these to form the primary key of the new table (called Enrollment) so that every instance of a pupil on a course is uniquely identified. Separately, these attributes become foreign keys in the new table and provide the link to the two existing tables showing the 1:m relationships ACTIVITIES Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model. ER Diagram 1 10 min Redraw the ER Diagram in the stages described above and include verbs to describe the relationships between the tables in the model. From the following scenario identify the entities and relationships and construct an E-R Diagram: ER Diagram 2 30 min Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the Club records details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and

37 4.5. DATABASE INTEGRITY: PRIMARY KEYS AND ENTITY INTEGRITY 31 members are identified by their membership number. The vegetables that are grown are identified by the vegetable name (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots. Three unique identifiers are mentioned in the scenario and you will have to introduce a new entity to deal with the m:n relationship Task - Identify the entities and relationships and draw an E-R Diagram that would enable this scenario to be modelled for a relational database. 4.5 Database Integrity: Primary Keys and Entity Integrity Identifying and declaring a primary key in a table introduces an inherent restriction on the table i.e. that two rows in the table cannot be the same. This allows each row to be identified. For example, the details of pupil in a school database can be found by using a unique identifier i.e. the primary key. If the database is relational then the details of that pupil will be recorded once to avoid data redundancy. There may be several attributes of a pupil that are unique e.g. Scottish Candidate Number (SCN), School Enrollment Number, and possibly National Insurance Number. Only one of these candidate keys can be used as the primary key. If the SCN is used, and no two SCNs are the same, then the uniqueness of each pupil record is guaranteed. Also, if the primary key is the SCN, then the primary key must have a value for the pupil to exist on the database. This is the same for any primary key in any table and ensures entity integrity i.e. the primary key ensure the existence of a record in a table. In the pupil example, the value for National Insurance may not exist, but since NI Number is not the primary key, then entity integrity is maintained i.e. a pupil can exist on the school s database without necessarily having a National Insurance Number Database Integrity: Foreign Keys and Referential Integrity A foreign key in one table has the same values of the primary key in another table. For example the attribute used to identify a pupil (SCN) in the Enrolment table for the Anytown High School model, will be a foreign key because the same attribute is used as the primary key in the Pupil table. It follows that both must have the same value in order to identify a pupil. This is known as referential integrity. The same applies to the attribute identifying the Course that is been studied and both together i.e. the pupil identifier and the course identifier, form the composite primary key of the enrolment table. (The same entity integrity applies and for each enrolment to exist it must have a pupil and a course identified.) Referential integrity ensures that a link (or relationship) exists between tables. Not only does it ensure the link, it defines the relationship as 1:m i.e a primary key in one table will exist once, but may appear many times as a foreign key in other tables.

38 32 TOPIC 4. REPRESENTING THE RELATIONAL DATA MODEL 4.6 Summary By the end of this topic you will be able to: Prior Knowledge and Revision Learning Objectives Introduction Properties of Tables Relational Data Modeling Database Integrity Primary Keys and Entity Integrity Foreign Keys and Referential Integrity End of topic Test End of topic Test An online assessment is provided to help you review this topic.

39 33 Topic 5 Normalisation Contents 5.1 Prior Knowledge and Revision Introduction The Normalisation Process Unormalised Form (UNF) First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Normalisation in a Nutshell Beyond 3NF Why Normalise? Insertion Anomolies Deletion Anomolies Update Anomalies Summary Learning Objectives By the end of this topic you will be able to; analyse and extract data from up to two source documents to produce a model in un-normalised form (UNF) identify and remove repeating groups of data from UNF to produce a model in first normal form(1nf) identify and remove partial dependencies from 1NF to produce a model in second normal form (2NF) identify and remove transitive dependences from 2NF to produce a model in third normal form (3NF)

40 34 TOPIC 5. NORMALISATION By the end of this topic you will be able to; analyse and extract data from up to two source documents to produce a model in un-normalised form (UNF) identify and remove repeating groups of data from UNF to produce a model in first normal form (1NF) identify and remove partial dependencies from 1NF to produce a model in second normal form (2NF) identify and remove transitive dependences from 2NF to produce a model in third normal form (3NF) 5.1 Prior Knowledge and Revision The topic assumes that you have previously worked with databases that use linked tables with a single one-to-many relationship and that you can identify and remove multi-valued fields found in records. This may have been done through the study of the corresponding Unit at Intermediate 2, or another course on Computing or ICT. In order to check your understanding of previous work, take the revision questions on the website. Revision Questions 10 min 5.2 Introduction So far we have dealt with entities and the relationships between them and considered the table structure of how these entities can be represented in a relational database model. This topic considers how entities, tables and records can be structured into a form that makes up a relational model. This form is called normal form and this topic will describe a methodical process that takes entities, relations and records to that required form. This process is called normalisation and can be considered in stages from an unnormalised model (UNF), to first normal form (1NF), to second normal form (2NF), and finally to third normal form (3NF). Data can be modelled to higher levels of normalisation but that is beyond the scope of this course. When data is in 3NF it is in a form that can be implemented and used effectively within a relational database software package. The process of normalisation takes account of the relationships between entities that we have been considering in previous topics and the importance of keys within the tables considered. We know that we have a normalised model when every attribute in a table that is not part of the primary key is determined by the primary key and nothing else. We will use the example of the vegetable club to go through the process of normalisation and then consider the problems that arise if the model is not normalised.

41 5.3. THE NORMALISATION PROCESS The Normalisation Process The best way to consider the normalisation process is by working through an example and then practising on other scenarios. The example we will use to take you through the process will be the vegetable club model that was considered earlier. We will look at this methodically to enable you to apply the process to any example you may encounter. We start by analysing an existing scenario and data recording methods (if any exist). From that we can produce an UNF model of the data, progressing to 1NF, 2NF and finally 3NF Unormalised Form (UNF) Case Study: Kirklee Vegetable Club Kirklee Vegetable Club is a club of gardeners keen on growing vegetables in the allotment plots that are leased from the Council. Each member can rent more than one plot to grow vegetables. Each year the ClubRecords details of the amount vegetables that are grown on each plot (i.e. its produce) as there are prizes awarded to the plot that produces the largest yield of each type of vegetable. Each plot has a unique number and members are identified by their membership number. The vegetables that are grown are identified by the VegetableName (e.g. carrot, potato etc). Any vegetable type may be grown in any of the plots. For every vegetable season the Club secretary keeps card records for each plot. An example is shown below: Kirklee Vegetable Club 2006 PlotNumber 5 PlotSize 1.2 ha Member No 102 Member Name James MacArthur VegetableName Yield ClubRecord Carrot 5.5 Kg 10.4 Kg Potato 23.2Kg 36.8 Kg Onion 12.3Kg 16.4Kg At the end of the season, a card similar to this will be produced for all 64 plots on the allotment site. Notes: This is a simplified example that would have limited use in reality e.g. variety types of the same vegetable are not recorded and it is assumed that the model is used only for the one season of growing: only the ClubRecord for yields are carried over from previous seasons. However, there is enough in the model to take you through the normalisation process which is why it has been simplified. Do not be too concerned about the model not reflecting reality, as it is the process of normalising that you are more concerned with at this stage. As you get more comfortable with the process, then you will be able to normalise more complex situations, but at this level of course, all examples will be simplified in order to keep you focussed on the process. (Details of levels of treatment of examples are outlined in the Course Arrangements e.g. in the examination you will only ever be presented with two source document like the card above, and you will end up with a maximum of four tables when

42 36 TOPIC 5. NORMALISATION your model is normalised.) UNF - stage (i) The first stage in producing an unormalised model is to create a single entity that lists all the attributes that are recorded on the original source documents and in the scenario: PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord Note: if the same attribute appears twice in the source documents, you only need to have it in the list once UNF - stage (ii) Choose an attribute that will identify each record. (In data modelling terms we are selecting a primary key). The scenario tells us that cards are kept for each plot, so the PlotNumber should uniquely identify each card. Indicate this by underlining the attribute: PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord UNF - stage (iii) From the source document(s) identify what attributes have more than one entry. From the record cards we see that VegetableName, Yield, and ClubRecord have more than one value. This is called a repeating group of attributes and should not be confused with a multi-valued attribute. There is one multi-valued attribute in this scenario - can you identify it? Repeating groups of attributes should be identified in the UNF by indenting the list (or by some other means)

43 5.3. THE NORMALISATION PROCESS 37 PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord UNF - stage (iv) Finally, the entity should be named and the attributes enclosed in brackets: Plot (PlotNumber PlotSize MemberNumber MemberName VegetableName Yield ClubRecord) We now have the data model in UNF UNF Interactivity The following activity takes you through the UNF process that you have just followed. UNF Interactivity First Normal Form (1NF) We could consider a formal definition of when entities are in 1NF, but that would just confuse you. The simplest way to get to 1NF is to remove repeating groups and form a new entity with them. As it stands, Plot has repeating values for VegetableName, Yield and ClubRecord. An entity in this state would be impossible to implement in a database and to match the correct VegetableName with its associated values for Yield and ClubRecord. By splitting the entity into two separate entities then it would be possible to link the two entities using a foreign key but 1NF models would have some important anomalies limiting its use. These anomalies are considered later on in this topic NF stage (i) Remove the identified repeating group and give it a name:

44 38 TOPIC 5. NORMALISATION Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord) These two tables are now separated but we need to show how they are linked NF stage (ii) We know that here is a link between them as a plot can produce many vegetables. The primary key (i.e. PlotNumber) provides the link between the entities and should be included in both: Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord PlotNumber) The attribute PlotNumber is the primary in Plot and is a foreign key in Vegetable. Foreign keys should be indicated with an asterix: Plot (PlotNumber PlotSize MemberNumber MemberName) Vegetable (VegetableName Yield ClubRecord *PlotNumber) NF Stage (iii) All that is left to do before the model is in 1NF is to identify the primary key in the second entity. A primary key must be able to identify a unique occurrence of the entity. The VegetableName alone would identify vegetable with the same name of which there would be many and therefore not unique. Identifying the primary key can be help by consider what it is that the entity represents. In

45 5.3. THE NORMALISATION PROCESS 39 its present form the Vegetable entity records details of a vegetable grown on a particular plot. So each occurrence of this entity could be identified by the VegetableName and the PlotNumber. This is called a composite key and indicated by underlining all the attributes in it and listing them together: Plot Vegetable (PlotNumber PlotSize MemberNumber MemberName) (VegetableName *PlotNumber Yield ClubRecord) The model is now in 1NF NF Activity The following activity takes you through the 1NF process you have just followed. 1NF Activity Second Normal Form (2NF) Tables are in 2NF when every non-primary key attribute is determined by the entire primary key. In the first table Plot, all attributes other than PlotNumber, which is the primary key, can be determined by the primary key. What this means is that as it stands the non-primary key attributes are dependant on the primary key. You may be thinking that Member Name is dependent on Member Number, but in this form, Member Number is not the primary key and so Member Name is indirectly dependent on the primary key i.e. Member Name is dependent on Member Number which in turn is dependent on PlotNumber, so we can conclude that Member Name is dependent on PlotNumber. This is called transitive dependency which we shall return to in the next normalisation form. The table Vegetable has a composite key consisting of VegetableName and PlotNumber. Vegetable (VegetableName *PlotNumber Yield ClubRecord) Each non-primary key is dependent on the primary key, but are they each dependent on the entire primary key? If not, the table is not in 2NF. The Yeild of a vegetable is determined by the the type of vegetable it is (i.e. the VegetableName) and where it is grown (i.e. the PlotNumber). There will be many values of Yield for the same VegetableName that can be stored in this form, so in order to find a unique occurrence of a particular vegetable yield we must know both the PlotNumber and the VegetableName. We can conclude that Yield is entirely dependent on the primary key and this part of the model is in 2NF.

46 40 TOPIC 5. NORMALISATION What about ClubRecord? This is a value for the largest amount of a vegetable grown in the Club in any one year. This vegetable could have been grown in any plot i.e. it is not determined by the PlotNumber; it is determined by the VegetableName. This means that ClubRecord is not fully dependent on the entire Primary Key and therefore not in 2NF. Since it is dependent on part of the primary key we call this a partial dependency and in order to get to 2NF we must split the entity to remove this condition. (VegetableName (VegetableName *PlotNumber ClubRecord) Yield) When splitting entities to remove partial dependencies, we must ensure that existing primary key is not split. VegetableName and PlotNumber is still the primary key on the left hand entity, but VegetableName alone becomes the primary key of the entity on the right. For this example it would be appropriate to assign the name of Vegetable to the entity on the right and assign a new name to the entity on the left. So we would have: Produce (VegetableName Vegetable (VegetableName *PlotNumber ClubRecord) Yield) The primary key in the entity we are now calling Vegetable is VegetableName. This attribute is left in the entity we are now calling Produce as a foreign key and determining the relationship between the entities. So indicating the keys would give: Produce (*VegetableName Vegetable (VegetableName *PlotNumber Yield) These entities are now represented in 2NF The three entities together would be: ClubRecord) Plot Produce (PlotNumber PlotSize MemberNumber MemberName) (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord) The relationship between these entities would be represented by an E-R Diagram. The attribute PlotNumber now appears in the entities Plot (as the primary key) and in Produce (as a foreign key), so that must be where the relationship exists. Furthermore, we can deduce from the keys that the relationship is 1:n. Notice that by normalisation to 2NF we have removed the m:n relationship between Plot

47 5.3. THE NORMALISATION PROCESS 41 and Vegetable. 2NF Activity Third Normal Form (3NF) Entities in third normal form ensure that every attribute is dependent on the primary key and nothing else. When we consider the Plot entity we discovered that although PlotNumber was the primary key identifying each occurrence of the entity, MemberName is also determined by MemberNumber i.e. MemberName is transitively dependent on PlotNumber. Plot (PlotNumber PlotSize MemberNumber MemberName) In 3NF it is not permissible to have attributes transitively dependent on the primary key. To remove transitive dependencies we must create a new entity with MemberNumber the primary key which determines MemberName: Plot (PlotNumber Member (MemberNumber MemberNumber MemberName) MemberName) MemberName becomes the linking attribute and therefore the foreign key in Plot indicating a m:1 relationship: Plot (PlotNumber Member (MemberNumber *MemberNumber) MemberName) The model would now appear to be in 3NF but if we check the Rules for normalisation we discover that the attribute MemberName is not atomic i.e. it is made up of two character strings representing the members first name and surname. So by splitting this attribute into atomic values we complete the normalisation process: Plot (PlotNumber PlotSize *MemberNumber) Member (MemberNumber MemberFName MemberSName) Produce (*VegetableName *PlotNumber Yield) Vegetable (VegetableName ClubRecord)

48 42 TOPIC 5. NORMALISATION The normalisation process has only produced 1:m relationships which is the intended outcome. 3NF Activity Normalisation in a Nutshell When presented with a scenario to normalise the following stages are required: 1. To get to UNF a) From the scenario (and source documents) list all the attributes you can identify b) Identify any repeating group of attribute and indent them on the list c) Choose an attribute to act as a primary key which determines the rest d) Close the list with brackets and give it a name 2. To get to 1NF a) Form a new entity from the repeating group b) Include the primary key from UNF to act as foreign key in the new entity c) Identify the primary key in the new entity d) Add brackets and give it a name 3. To get to 2NF a) Identify and remove partial dependences making sure that primary keys are present as foreign keys in any new entities b) Note: this will only apply where there is a composite key (i.e. more than one attribute) in a 1NF entity and where some non-key attributes are dependent only on part of the primary key 4. To get to 3NF a) Identify and remove transitive dependencies again ensuring the primary/foreign key link b) Note: transitive dependent attributes and dependent on the primary key via a non-key attribute 5. Finally check the normalisation rules for any other anomalies e.g. non-atomic attributes Beyond 3NF At 3NF a relational data model can be adequately implemented and further levels of normalisation are not required. However, they do exist but are beyond the scope of this course. In fact, most of the relational models that you will encounter will satisfy the rules of higher levels of normalisation than 3NF. You will come across a similar type of situation at the normalisation levels that you will be dealing with in this course. For example, it is not uncommon in some cases to find

49 5.4. WHY NORMALISE? 43 that when moving to 2NF, 3NF is also achieved. All this means is that no transitive dependencies where encountered at 2NF. So for the sake of meeting the requirements of this course, what you have considered so far in terms of normalisation will allow you to implement you data models later on in the course. 5.4 Why Normalise? This section examines the benefits brought about by having a normalised data model. Earlier on in the course we touch on problems associated with filing systems that were not modelled on any underpinning theory and how this could lead to a series of anomalies arising within the data. The anomalies identified were insertion, deletion and amendment and are relevant not only to databases held in filing systems, but also databases that have not been normalised to an adequate level i.e. 3NF. Normalisation of a relational database model is performed in order to reduce data redundancy. A database is said to contain redundant data if the same item is stored more than once in a single table. Normalisation generally works by splitting a table into smaller tables each with reduced data redundancy. For example, consider the Order table as shown: Order Order id Description quantity Customer id Fname SName Street Town DVD Player Joe Smyth 4 Red Road Sheildinch MP3 Player Omar Singh 3 Cloan Wetherfield Ave Headphones Joe Smyth 4 Red Road Sheildinch The details for Joe Smyth are repeated for orders and This extra information is not necessary, as all these details remain the same for Joe Smyth. In fact, the only item of information that is necessary is the Cust id as this identifier determines all other details. This link between values is known as functional dependency where the values of one attribute are functionally determined by the values of another attribute. The primary key of a table functionally determines the other attributes in that relation. In the above example the Cust id attribute functionally determines Fname, Sname, Street and Town. Redundancy can therefore be expressed in terms of functional dependencies, and removing these functionally determined values can reduce data redundancy. The Order relation is therefore split into 2 relations (Order and Customer), thus reducing the amount of redundant information as shown below: Order

50 44 TOPIC 5. NORMALISATION Order id Description Quantity Cust id DVD Player MP3 Player Headphones Customer Cust id Sname Fname Street Town Smyth Singh Joe Omar 4 Red Road 3 Cloan Ave Sheildinch Wetherfield Insertion Anomolies In a relational model a table must have atomic values (thus multiple values or repeating groups are not allowed). Any table that satisfies this condition is in first normal form (1NF). A table that does not have atomic values is called un-normalised (UNF). Converting a UNF table to a 1NF relation simply involves splitting up any non-atomic values. For example, consider the normalised below which contains multiple values: Owner Harold Gwen Benny Diane Name Fluffy, Buffy Claws, Chirpy, Whistler Fang, Slim Bowser, Puffball This can easily be converted to 1st normal form by splitting the multiple items as is shown: Owner Harold Harold Gwen Gwen Gwen Benny Benny Diane Diane Name Fluffy Buffy Claws Chirpy Whistler Fang Slim Bowser Puffball However, there are problems with the 1st normal form. insertion anomalies. The first problem is that of

51 5.5. SUMMARY 45 Insertion Anomalies Deletion Anomolies Another problem with tables in 1NF is Delete anomalies which mean that deleting one entry can mean that another entry is deleted. For example, consider the full pet relation as shown in the table below. If we delete the owner Harold, then both Buffy and Fluffy are deleted. Name Owner Species Sex Birth Death Fluffy Harold Cat F NULL Claws Gwen Cat M NULL Buffy Harold Dog F NULL Fang Benny Dog M NULL Bowser Diane Dog M Chirpy Gwen Bird F NULL Whistler Gwen Bird NULL NULL Slim Benny Snake M NULL Puffball Diane Hamster F NULL Update Anomalies Another problem with tables in 1NF is that known as Update anomalies. These anomalies mean that updating a single value can involve changing multiple entries. 5.5 Summary By now you should be able to: analyse and extract data from up to two source documents to produce a model in un-normalised form (UNF) identify and remove repeating groups of data from UNF to produce a model in first normal form(1nf) identify and remove partial dependencies from 1NF to produce a model in second normal form (2NF) identify and remove transitive dependences from 2NF to produce a model in third normal form (3NF) End of topic Test End of topic Test

52 46 TOPIC 5. NORMALISATION An online assessment is provided to help you review this topic.

53 GLOSSARY 47 Glossary atomic key a key made up of a single attribute attributes a property or characteristic of something candidate key the choice of key to use to uniquely identify an entity candidate keys a key that can be considered as a primary key cardinality the number of members in something compound key a key made up of two or more attribute database management system (DBMS) A program that lets one or more computer users create and access data in a database. Databases An organized collection of information data dependence A problem that occurs when changes made to data require programs referring to this data to be changed. data dictionary a file that describes the data and its organisation in a database data integrity trust in the accuracy of data data operators ways in which data are manipulated data redundancy The duplication of data in a database leads to redundant appearances of the same data. As well as serving no purpose redundant data can cause significant problems. data structure the organisation on data into tables Data validation guarantees that data is valid and accurate

54 48 GLOSSARY domain the permitted values of an attribute duplication of data The same information being duplicated in other parts of the database. entity Something that exists as a particular and discrete form entity integrity ensures that there is no duplication of a row in a table by using non null and unique primary key values entity set a collection of entities of similar form represented in a database as a table inconsistency Data that is up duplicated in a database will be prone to inaccuracies when it is modified leading to inconsistent data across the system. instances an occurrence or example of an entity key an attribute (or attributes) that can uniquely identify an entity keys In a database a key is one or more attributes that is selected to identify a record and used when searching and sorting. normalisation the process of organising a database in order to remove ambiguities primary key A primary key is a key that uniquely identifies each record in a table and is used to access that record referential integrity this keeps data consistent by ensuring links between table using foreign keys relationships a connection indicating how entities are linked surrogate key a key introduced to act as the primary key tables In a relational database, a table organises the information about a single topic into rows and columns