CG3.6 Databases and Distributed Systems

You need to understand:

Databases and distributed systems
+ Explain what is meant by data consistency, data redundancy and data independence.
+ Describe the relative advantages of the use of databases over flat files.
+ Explain what is meant by relational database organisation and data normalisation (first, second and third normal forms).
+ Explain entity relationship modelling and use it to analyse simple problems.
+ Restructure data into third normal form.
+ Describe the use of primary and foreign keys, indexes and links.
+ Describe the advantages of different users having different views of the data in a database.
+ Discuss different approaches to database security.
+ Recognise that the individual user of a database may be prevented from accessing particular elements of the information.
+ Explain what is meant by data warehousing and data mining, using examples from supermarkets and insurance companies.
+ Explain the purpose of a database management system (DBMS), query languages and data dictionaries.
+ Explain the role of the database administrator.

Distributed systems
+ Explain that distribution can apply to both data and processing.
+ Describe distributed databases and the advantages of such distribution.
+ Describe contingency planning to recover from disasters.

Databases

Databases offer advantages over flat files because they store data in tables made up of distinct fields. These fields can be controlled with validation rules, and powerful query languages allow users to interrogate the data easily.

These are the terms and concepts you'll need to know for the summer exam.
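To make the contrast with flat files concrete, here is a minimal sketch using Python's built-in sqlite3 module (an assumption: the notes only mention MS Access, but the ideas carry over). The flat file is just lines of text, so an invalid value is stored without complaint and every search means splitting lines by hand. The database table enforces a validation rule (a CHECK constraint) and is interrogated with one SQL query. The Product table, its fields and the sample values are hypothetical.

```python
import sqlite3

# Flat file: nothing checks the data, so an invalid negative price is stored,
# and every search means reading and splitting each line by hand.
flat_file_lines = ["Large tent,45.00", "Torch,-3.50"]
flat_cheap = [line.split(",")[0] for line in flat_file_lines
              if float(line.split(",")[1]) < 50]

# Database: a table with distinct fields and a validation rule on Price.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        Name  TEXT NOT NULL,
        Price REAL CHECK (Price >= 0)  -- validation rule: no negative prices
    )
""")
conn.execute("INSERT INTO Product VALUES (?, ?)", ("Large tent", 45.00))

rejected = False
try:
    conn.execute("INSERT INTO Product VALUES (?, ?)", ("Torch", -3.50))
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused data that broke the validation rule

# Query language: a single SQL statement answers the question.
cheap = conn.execute("SELECT Name FROM Product WHERE Price < 50").fetchall()
print(flat_cheap)  # ['Large tent', 'Torch']  (the bad record slips through)
print(cheap)       # [('Large tent',)]
```

The point is not the specific product: the database refuses the bad record at the moment it is entered, while the flat file carries it forward into every later search.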
A data dictionary provides descriptions of the structure of the data held in a database.

A query language is a programming language used to interrogate a database.

A primary key (or key field) is a field that uniquely identifies each record. You see this in MS Access when you use an AutoNumber field. Even if two records contain otherwise identical data (e.g. two customers both called John Smith), the primary key allows you to distinguish between them.

A relational database is one which contains multiple tables linked through related fields.

A foreign key is a primary key from one table which appears in another table to form a link between the two tables. In the example below, CustID is the primary key of the Customers table, and it is a foreign key in the Orders table.

Customers
CustID | FName | LName   | TelNo  | Etc
121    | Bill  | Clinton | 654321 |
122    | Ted   | Smith   | 123456 |
123    | Neo   | Reeves  | 999    |

Orders
OrderID | CustID | Date      | Product      | Etc
22      | 121    | 28/3/2008 | Hair dye     |
23      | 121    | 4/6/2009  | Phone        |
24      | 123    | 1/2/2010  | Board marker |

Because each order stores only the CustID rather than the customer's full details, the user avoids repeatedly duplicating the same data, saving large amounts of storage space.

An index (containing the key and the address of each record) is used in a database to:
+ improve (read) access times to records
+ sort the records (for viewing/output)

Relational databases that are poorly designed may suffer from two problems. Firstly, they may contain redundant data: data that is duplicated, causing unnecessary waste of storage space (e.g. repeatedly storing the same customer's name). Secondly, redundant data may become inconsistent: the same data is stored more than once and is meant to be the same, but the copies differ. E.g. one record may describe an item as a "Large tent" and another as a "Lrge tent".

Normalization is a process by which relational databases are made more efficient in their design. The aim of the process is to have a database structure which is in third normal form.
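The Customers/Orders example can be sketched in SQL through Python's sqlite3 module (an assumption; any relational DBMS would do). It shows a primary key in each table, CustID as a foreign key linking Orders to Customers, an index on that foreign key, and a join that follows the link. The index name idx_orders_custid is hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""
    CREATE TABLE Customers (
        CustID INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each customer
        FName  TEXT,
        LName  TEXT,
        TelNo  TEXT
    )
""")
conn.execute("""
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustID  INTEGER REFERENCES Customers(CustID),  -- foreign key: link to Customers
        Date    TEXT,
        Product TEXT
    )
""")

conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", [
    (121, "Bill", "Clinton", "654321"),
    (122, "Ted", "Smith", "123456"),
    (123, "Neo", "Reeves", "999"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (22, 121, "28/3/2008", "Hair dye"),
    (23, 121, "4/6/2009", "Phone"),
    (24, 123, "1/2/2010", "Board marker"),
])

# An index on the foreign key speeds up finding all orders for one customer.
conn.execute("CREATE INDEX idx_orders_custid ON Orders(CustID)")

# Follow the link: each customer's name is stored once, in Customers only.
rows = conn.execute("""
    SELECT c.FName, c.LName, o.Product
    FROM Orders AS o JOIN Customers AS c ON o.CustID = c.CustID
    ORDER BY o.OrderID
""").fetchall()
print(rows)

# An order for a customer that does not exist breaks the link and is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES (25, 999, '2/2/2010', 'Tent')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Bill Clinton's name appears in two result rows but is stored only once; the join rebuilds the combined view on demand.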
The definitions of the different forms are as follows:
Unnormalized: the table contains repeating groups of data.
First normal form: no repeating attributes or groups of attributes.
Second normal form: the table is in first normal form and every non-primary-key attribute is fully functionally dependent on the whole primary key (not just part of it).
Third normal form: the table is in second normal form and every data item depends on the key, the whole key and nothing but the key (no non-key attribute depends on another non-key attribute).

Databases are normalised for the following reasons:
+ Normalising data usually reduces data duplication/redundancy
+ Avoids the danger of inconsistency / maintains integrity
+ Avoids the danger of data being lost during an update
+ Avoids wasting storage space and processing time

An example of a third normal form database structure could be:

Driver (Driver Code, Driver Name, Driver Address, Driver TelNum)
Car (Car Number, Make, Model, Colour)
DriverInCar (Key Field, Driver Code, Car Number)

Notice that the table name is shown before the brackets. In this notation the primary keys are underlined (Driver Code, Car Number and Key Field respectively) and the foreign keys are overlined (Driver Code and Car Number in DriverInCar). Also notice that every data item depends only on the primary key of its own table. For example, Make and Model do not appear in the DriverInCar table: if they did, they would depend on Car Number rather than on the key, so they would need updating every time the Car Number in a row changed, and the same make and model would be duplicated for every driver of that car.

An index contains the key field and the address/location of each record, and so allows faster, direct access to records.

A database administrator (DBA) is the person in a company who is responsible for the structure and management of the database system and the data in it.

A DBMS (database management system) is a software system that allows users to create and manage databases, for example by defining fields, tables, relationships, access control and views. The DBMS may allow certain users read and/or write access to certain records or fields only.

When an online (multi-user) database is used, additional problems with database integrity can occur.
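A minimal sketch of the Driver/Car/DriverInCar structure in SQLite (via Python's sqlite3), compared with a single table that is not in third normal form. Spaces are removed from the field names (DriverCode, CarNumber and so on) because that is easier in SQL; the sample drivers and car are hypothetical. When the car is resprayed, the 3NF design changes one row, while the single table can be left with two conflicting colours for the same car, which is the inconsistency normalisation is meant to prevent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Driver (
        DriverCode    TEXT PRIMARY KEY,
        DriverName    TEXT,
        DriverAddress TEXT,
        DriverTelNum  TEXT
    );
    CREATE TABLE Car (
        CarNumber TEXT PRIMARY KEY,
        Make      TEXT,
        Model     TEXT,
        Colour    TEXT
    );
    -- Link table: holds only keys, so Make, Model and Colour are stored once, in Car.
    CREATE TABLE DriverInCar (
        KeyField   INTEGER PRIMARY KEY,
        DriverCode TEXT REFERENCES Driver(DriverCode),
        CarNumber  TEXT REFERENCES Car(CarNumber)
    );
    -- For comparison: one table NOT in 3NF, repeating car details for each driver.
    CREATE TABLE DriverCarFlat (DriverName TEXT, CarNumber TEXT, Make TEXT, Colour TEXT);
""")

# Hypothetical sample data: two drivers who share one car.
conn.executemany("INSERT INTO Driver VALUES (?, ?, ?, ?)",
                 [("D1", "Ann", "1 High St", "111"), ("D2", "Raj", "2 Low St", "222")])
conn.execute("INSERT INTO Car VALUES (?, ?, ?, ?)", ("AB12 CDE", "Ford", "Fiesta", "Red"))
conn.executemany("INSERT INTO DriverInCar (DriverCode, CarNumber) VALUES (?, ?)",
                 [("D1", "AB12 CDE"), ("D2", "AB12 CDE")])
conn.executemany("INSERT INTO DriverCarFlat VALUES (?, ?, ?, ?)",
                 [("Ann", "AB12 CDE", "Ford", "Red"), ("Raj", "AB12 CDE", "Ford", "Red")])

# The car is resprayed. In 3NF the colour is changed in exactly one row...
changed = conn.execute(
    "UPDATE Car SET Colour = 'Blue' WHERE CarNumber = 'AB12 CDE'").rowcount
colours_3nf = {row[0] for row in conn.execute(
    "SELECT c.Colour FROM DriverInCar AS d JOIN Car AS c ON d.CarNumber = c.CarNumber")}

# ...whereas in the flat table it is easy to update one copy and miss the other.
conn.execute("UPDATE DriverCarFlat SET Colour = 'Blue' WHERE DriverName = 'Ann'")
colours_flat = {row[0] for row in conn.execute(
    "SELECT Colour FROM DriverCarFlat WHERE CarNumber = 'AB12 CDE'")}

print(changed, colours_3nf, colours_flat)
```

Both drivers see the new colour in the 3NF version, while the flat table now holds both "Red" and "Blue" for one car.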
As multiple users may try to update the same record at the same time, locking is used to protect a record from becoming corrupted: while one user is updating a record it is locked, and other users cannot update it until the lock is released.

Database views are used to limit each user's access to a subset of the database. Users can only access (read, or read/write) what they are meant to access, which avoids mistakes, inconsistency and security issues. With views:
+ users can read, write to or change only part of a database
+ users can be given access to certain fields or records only
+ tables can be linked together so that users can view them as if they were one table

Data independence means that the data in a database is independent of the applications which use it. For example, it should be possible to add a field to a table without affecting existing applications that already use that table.
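A rough sketch of a view and of data independence, again in SQLite through Python's sqlite3. The Staff table, the SalesDirectory view and the directory_app function are hypothetical. The view exposes only some fields (no Salary) and some records (Sales staff only). SQLite itself has no user accounts, so restricting a particular user to the view is an assumption here; a multi-user DBMS would grant that user access to the view rather than to the table. Adding a field to the underlying table leaves the existing "application" working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (StaffID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Sales", 30000),
    (2, "Raj", "IT", 35000),
])

# A view showing only some fields (no Salary) and some records (Sales only).
conn.execute(
    "CREATE VIEW SalesDirectory AS SELECT StaffID, Name FROM Staff WHERE Dept = 'Sales'")
view_fields = [d[0] for d in conn.execute("SELECT * FROM SalesDirectory").description]

def directory_app(connection):
    """Hypothetical application that only reads the view, naming the fields it needs."""
    return connection.execute("SELECT StaffID, Name FROM SalesDirectory").fetchall()

before = directory_app(conn)

# Data independence: add a new field to the underlying table...
conn.execute("ALTER TABLE Staff ADD COLUMN Email TEXT")

# ...and the existing application still works without being changed.
after = directory_app(conn)
print(view_fields, before, after)
```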
Data can sometimes be stored in a data warehouse. This is a large collection of data stored together (logically) to allow efficient analysis (for instance, to analyse sales data). An example of this might be the results of the national census.

Data mining is the analysis of a large amount of data in a data warehouse to provide new information or to find new patterns in the existing data.

+ A supermarket could use the intelligence derived from data mining on loyalty card data to increase its profits by encouraging customers to make additional purchases via targeted special offers, etc.
+ An insurance company might use data warehousing to check claims histories together with other insurance companies' data to try to detect fraud.
+ An insurance company might use data warehousing to generate new business opportunities, for example age-related insurance, boat insurance, etc.

Distributed systems

A distributed system is a managed computer system that resides on different sites/computers/CPUs and provides both distributed data and distributed processing.

Distributed processing occurs when an application distributes its tasks among different computers in a network.

A distributed system offers efficiency and security (resilience: if one site fails, the others may still be available).

A distributed database is a database whose files reside on different sites/computers/CPUs to maximise performance. A distributed database can appear to applications as a single data source. However, it is difficult to keep all the data on all the computers synchronised and up to date.

Exam questions:

2010 Jan.8
1. In a certain college, students are able to attend evening courses which are taught by tutors.
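The supermarket example can be illustrated with one simple data mining technique: counting which pairs of products are bought together (the first step of market-basket, or association, analysis). This is a minimal sketch in plain Python, not how any particular supermarket does it; the baskets and the frequent_pairs function are hypothetical, standing in for loyalty card data extracted from a data warehouse.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card data: one set of products per shopping trip.
baskets = [
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "nappies", "beer"},
    {"bread", "milk", "butter", "jam"},
    {"nappies", "beer"},
]

def frequent_pairs(baskets, min_count):
    """Count how often each pair of products is bought together and
    keep the pairs that appear in at least min_count baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a fixed order, so ('a', 'b') and ('b', 'a') match.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

patterns = frequent_pairs(baskets, min_count=2)
for (a, b), n in sorted(patterns.items()):
    print(f"{a} + {b}: bought together in {n} baskets")
```

A pattern such as bread and butter being bought together in three of the five baskets is the kind of "new information" that could drive a targeted special offer.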
A student can attend any number of courses, and each course is taught by a single tutor, although a tutor may teach more than one course. It is required to design a database for this situation.
(a) Using an example, explain what is meant by a primary key in a database. [2]
(b) Explain what is meant by third normal form in a database. [1]
(c) Construct an entity relationship diagram to illustrate this situation. [3]
(d) Design a database for the above situation in third normal form. [6]
2011 June 19-21
Define the term data mining. Describe how a supermarket chain might use data mining. [3]
Outline the role of a database administrator. [1]
Security is very important in database applications, and in many cases, it is not desirable that every user should be able to access all the data in the database. Explain how a database management system can handle security in this situation. [2]

2005S 5
5. (a) A government office stores information about members of the public on a computerised system. Certain government employees are allowed to view and amend this information and several employees may access the same record simultaneously.
(i) Why should the system prevent two employees updating the same record simultaneously? [1]
(ii) Explain how the system prevents two employees updating the same record simultaneously. [1]
(b) Members of the public may be concerned about government employees accessing and amending their confidential information.
(i) Describe two methods of ensuring that only authorised employees can access the information. [2]
(ii) Describe in detail how an employee who had misused this confidential information could be identified. [2]