Files

What's it all about?
  Information being stored about anything important to the business or individual keeping the files.
  The simple concepts used in the operation of manual files are often a good guide to computerised data processing.
  The only difference is that computerised files are more structured.

March 06 Files 1

What's in a file?
  Data: a single fact about an entity.
  Entity: something we care about (a physical object or an event).
  Attribute: a property or characteristic of an entity.
  Field: each kind of attribute.

What's in a file?
  Record: a collection of attributes about an entity. It describes an entity.
  A file, then, is a collection of records.
  Simple (linear) file: records listed one after another in a line.

  File:
    Emp ID   Surname   First Name   Rate
    11478    Smith     John         10.50
    07850    Smith     Fred         09.35
    31718    Thomas    Susan        12.45
    45125    McCoy     Mary         11.25

  Each row is a record.
  Fred is an attribute (the First Name) of the record with key 07850.

Key
  Used to make records unique.
  Employee ID is the key above.
  Ensures that no two employees have the same ID.
  Used to distinguish absolutely between people.

A simple record structure consists of a fixed number of fields of fixed length, as in the previous example.
A file made up of records of this type is often referred to as a flat file.
It can be pictured as repeated rows of identically structured data.
This type of organisation has problems.
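As an illustration of a flat file, here is a minimal sketch of the employee table stored as fixed-length records. The field widths (5, 10, 10 and 5 characters) and the helper names are assumptions chosen for this example; they are not given in the notes.

```python
import struct

# Assumed layout: Emp ID (5), Surname (10), First Name (10), Rate as text (5).
# "<" disables alignment padding, so every record is exactly 30 bytes long.
RECORD = struct.Struct("<5s10s10s5s")

# Rows as listed in the employee table above.
EMPLOYEES = [
    ("11478", "Smith", "John", "10.50"),
    ("07850", "Smith", "Fred", "09.35"),
    ("31718", "Thomas", "Susan", "12.45"),
    ("45125", "McCoy", "Mary", "11.25"),
]


def pack_record(emp_id, surname, first_name, rate):
    """Encode one record, padding each value with blanks to its field width."""
    fields = (emp_id.ljust(5), surname.ljust(10), first_name.ljust(10), rate.ljust(5))
    return RECORD.pack(*(f.encode("ascii") for f in fields))


def unpack_record(raw):
    """Decode one 30-byte record back into a tuple of strings."""
    return tuple(f.decode("ascii").rstrip() for f in RECORD.unpack(raw))


def read_nth(flat_file, n):
    """Fixed-length records mean record n starts at byte n * record size."""
    start = n * RECORD.size
    return unpack_record(flat_file[start:start + RECORD.size])


# The flat file is simply the records laid end to end.
flat_file = b"".join(pack_record(*row) for row in EMPLOYEES)
```

Because every row has the same size, `read_nth(flat_file, 2)` can jump straight to the third record without reading the first two. The cost is that every record reserves the full width of every field, which is the kind of problem the next slide raises.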
What about holding someone's history in a file?
  Could assign a fixed number of history fields.
  This is not a good idea: it wastes space, and different people have different amounts of history.
  One solution is to allow repeating groups of particular fields.

Physical and Logical Views of Data
  Logical View
    Concerned with the nature of the data or information.
    Independent of the physical details of storage or presentation.
  Physical View
    Concerned with the physical aspects of storage and presentation.

Physical View
  We have already seen an example of the logical view. The physical view is:-
    EmpID consists of up to six characters preceded by leading blanks.
    Rate consists of a binary number 16 bits long.
  A physical record is the minimum chunk of data transferred between storage and the CPU in the course of data processing.

There are many ways of categorising files. In data processing, there is a division in terms of usage:-
  Master Files, Transaction Files, Backup Files.

Master Files
  Consist of records that contain standing data on entities that are of a permanent nature.
  Example: employee master file.

Transaction Files
  Consist of records which relate to a single, usually dated, event or fact.
  These files are source data which are used to amend or update master files.
  Example: timesheet transaction file on hours worked by employees.

Backup Files
  Backups of the transaction and master files.

Files can be stored in a number of ways:-

Linear File
  Stores a collection of records in one long list.

Sequential File
  Records are ordered by some field (the sequence field).
  To find a record, the records must be read one after another until a match is found.
Random
  Records are stored at a physical address computed by an algorithm working on a field value such as EmpID.
  This can lead to problems when two records are given the same position. This is called a collision, and is solved by moving to the next available spot.
  When retrieving records, a short sequential search is then required.

Indexed
  Records are physically stored randomly, with a sequentially ordered index on a field (EmpID) and a pointer to the physical location of each record.
  Used when there is a lot of data and sequential organisation would be time consuming.
  Similar to using an index in a book.
  The index file is held in main (working) memory for speed of processing.

Indexed Sequential
  Records are physically stored sequentially, ordered by some field, with an index which provides access by some, possibly different, field.
  Think of a dictionary: searching first for the letter and then for the word.
  There are problems with inserting and deleting records.

Operations on files
  sorting
  merging (updating)
  inserting
  deleting

With traditional file management, the problems are:-
  uncontrolled redundancy
  inconsistent data
  inflexibility
  limited data sharing
  poor enforcement of standards
  low programmer productivity and excessive program maintenance

So how can you improve?
  Recognise that data is an important resource.
  Then organise a store that is data centric: a database.
  A database is an integrated collection of data organised to meet the needs of multiple users in an organisation.
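Looking back at the file organisations described above, the following minimal sketch contrasts a sequential search with random (hashed) storage. The hashing algorithm (EmpID modulo the number of slots), the seven-slot file and the class name are assumptions made for illustration. Collisions are handled as the notes describe, by moving to the next available spot.

```python
# Employee IDs and surnames from the earlier example, ordered by EmpID.
RECORDS = [("07850", "Smith"), ("11478", "Smith"), ("31718", "Thomas"), ("45125", "McCoy")]


def sequential_search(records, key):
    """Sequential file: read from the start; return (record, records read)."""
    for count, record in enumerate(records, start=1):
        if record[0] == key:
            return record, count
    return None, len(records)


class RandomFile:
    """Random organisation: the storage address is computed from the key."""

    def __init__(self, slots):
        self.slots = [None] * slots

    def _address(self, key):
        # Assumed algorithm: numeric EmpID modulo the number of slots.
        return int(key) % len(self.slots)

    def store(self, record):
        """Place the record at its address, or at the next free slot on collision."""
        pos = self._address(record[0])
        for _ in range(len(self.slots)):
            if self.slots[pos] is None:
                self.slots[pos] = record
                return pos
            pos = (pos + 1) % len(self.slots)
        raise RuntimeError("file is full")

    def fetch(self, key):
        """Start at the computed address, then search sequentially from there."""
        pos = self._address(key)
        for _ in range(len(self.slots)):
            record = self.slots[pos]
            if record is None:
                return None
            if record[0] == key:
                return record
            pos = (pos + 1) % len(self.slots)
        return None


random_file = RandomFile(slots=7)
positions = {record[0]: random_file.store(record) for record in RECORDS}
```

With seven slots, EmpIDs 07850 and 45125 both compute to address 3, so the second one stored is moved on to slot 4, and fetching it needs the short sequential search mentioned above.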
A database is structured in a logically meaningful manner.
There is minimal redundancy. As far as possible, the same item of data will not be repeated in the database.
To manage databases, software was developed called Database Management Systems (DBMS).

Characteristics of DBMS
  It is software that handles all read and write access by users and applications.
  It is capable of presenting users with a view of the part of the database relevant to their needs.
  It ensures consistency of data and minimum redundancy.
  It allows authorisation of different users to access different parts of the database.
  It allows the person in control of the database to define its structure.
  It provides various facilities for monitoring and control of the database.
  It allows for data independence within an organisation.

Advantages of the database approach
  Data redundancy is reduced.
  Data consistency is maintained.
  Independence of data and programs is possible.
  A logical view is presented to the user or user programs.
  Applications development is enhanced because data sharing is possible.
  Standards can be enforced.
  Security of data is more easily implemented.

Disadvantages
  Design involves time and cost.
  Hardware and software costs can be considerable.
  Slower than direct file access.

A file based approach will be more appropriate if:-
  Different applications require different data.
  Fast, repetitive transaction processing in high volumes is to be undertaken.
  Application needs of the organisation are unlikely to change over time.
  Information production is according to standard formats, so little flexibility is required.
Database Users

Database Administrator (DBA)
  Experienced and senior user.
  Assists in the development of the database.
  Helps to achieve and maintain an acceptable level of technical performance of the database.
  Attains a satisfactory level of security.
  Monitors the use of the database from the perspective of accounting and efficient utilisation of the data.
  Reorganises the physical structure where necessary.
  Sets standards for documentation and data representation in the database.
  Ensures that user data requirements are met.

Applications Programmers
  Develop and maintain programs for the functions required by the organisation.
  Manipulate the data being stored.
  Are familiar with the use of data manipulation languages, which allow storing, retrieving, modifying and deleting of data.

Casual Users
  Everyone else.

DBMS Utilities
  Utilities which aid the DBA and the applications programmers in their jobs:-
    Data definition language
    Data manipulation language
    Data dictionary

Data Definition Language (DDL)
  Formal language used to specify content and structure.

Data Manipulation Language (DML)
  Most DBMS have a specialised language, used with a 3GL or 4GL, to manipulate data.
  Contains commands to extract and add data.
  The most common is SQL, or Structured Query Language.
  Many DBMS are compatible with more complex languages (like C), giving greater flexibility.

Data Dictionary (DD)
  File storing definitions of data elements and data characteristics:
  usage, physical representation, ownership, security information.
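To make the DDL and DML distinction concrete, here is a small sketch using Python's built-in sqlite3 module as the host language. The table layout and the updated rate of 11.75 are illustrative assumptions, not part of the notes. SQLite's `PRAGMA table_info` is used only as a rough analogue of looking something up in a data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: specify the content and structure of the data.
conn.execute("""
    CREATE TABLE employee (
        emp_id  TEXT PRIMARY KEY,
        surname TEXT NOT NULL,
        rate    REAL NOT NULL
    )
""")

# DML: add, modify, delete and extract data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("31718", "Thomas", 12.45), ("45125", "McCoy", 11.25)],
)
conn.execute("UPDATE employee SET rate = 11.75 WHERE emp_id = ?", ("45125",))  # hypothetical new rate
conn.execute("DELETE FROM employee WHERE emp_id = ?", ("31718",))
rows = conn.execute(
    "SELECT emp_id, surname, rate FROM employee ORDER BY emp_id"
).fetchall()

# The DBMS keeps the definitions it was given; here we read back the column names.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```

The `CREATE TABLE` statement is the only DDL in the sketch; everything after it works on the data rather than on its definition.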
Types of database
  Relational (RDB)
  Hierarchical and Network Data Models
  Object Oriented

RDB: Relational Data Model
  Represents all data as simple 2D tables called relations.
  Tables appear similar to flat files.
  Information in more than one table is easily extracted and combined.

Hierarchical Data Models
  Present data in a tree like structure.
  Use one to many relationships.

Network Data Models
  Use many to many relationships.
  Still use tree like structures, but a record may have more than one parent.

Hierarchical and Network models are considered outdated
  Less flexible than RDBMS.
  Do not support English language like inquiries for information.
  Paths for data access are specified in advance and require major effort to change.

Object Oriented
  Allows for the storing of conventional data and also of other data like pictures, video and voice.
  Everything is stored as objects.
  Can facilitate data models that must be easy to change, such as financial models.
  Relatively slow compared to RDB for large numbers of transactions.

Schema: the structure of the tables
  Each table has a number of fields.
  A field can be a key.
  Primary key: uniquely identifies the record.
  Foreign key: link to an occurrence in a different table. It must be a primary key in that other table.
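The primary key and foreign key ideas can be sketched with two small tables in SQLite, via Python's sqlite3 module. The table names, the timesheet layout and the sample hours are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    CREATE TABLE employee (
        emp_id  TEXT PRIMARY KEY,          -- primary key: unique per record
        surname TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        sheet_id INTEGER PRIMARY KEY,
        emp_id   TEXT NOT NULL REFERENCES employee(emp_id),  -- foreign key
        hours    REAL NOT NULL
    );
""")

conn.execute("INSERT INTO employee VALUES ('31718', 'Thomas')")
conn.execute("INSERT INTO timesheet (emp_id, hours) VALUES ('31718', 37.5)")  # hypothetical hours


def try_insert_unknown_employee():
    """A timesheet row must point at an existing employee primary key."""
    try:
        conn.execute("INSERT INTO timesheet (emp_id, hours) VALUES ('99999', 8.0)")
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


# Information held in two tables is combined through the shared key.
joined = conn.execute("""
    SELECT e.surname, t.hours
    FROM timesheet AS t
    JOIN employee AS e ON e.emp_id = t.emp_id
""").fetchall()
```

The join works because `timesheet.emp_id` always matches some `employee.emp_id`; the rejected insert shows the DBMS refusing a link to a record that does not exist.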
Relationship types
  One to one (1:1): unary relation.
    Employee has a computer account.
  Many to many (N:N): multiple associations.
    Employee can access Secure Room.
  One to many (1:N): unary association in one direction, multiple association in the other.
    Employee has an Identification badge.

Creating a database:-
  Decide what data is to be stored.
  Could start with a real world representation of the data.
  Then translate to a table like representation.
  Normalise the data.

What's normalisation?
  A series of steps that aid in the progressive normalisation of relations by thoroughly analysing each item of data.

What are the steps?
  1. Write the data of each source in unnormalised form (UNF notation) and select a key.
  2. Convert the UNF into 1st Normal Form (1NF).
  3. Convert the 1NF into 2NF.
  4. Convert the 2NF to 3NF.
  5. Apply the 3NF tests.
  6. Optimise the 3NF relations.
  7. Re-apply the 3NF tests.

Need an example:- data given below.

Consultant Details
  No.   Name      Grade   Salary Scale   Car Type
  019   Wheeler   D       S1             A
  Address: Renmore, Galway

  Skills
  Code   Description   Qualification
  SK01   Accounting    IMA
  SK15   SSADM         NCC Cert
  SK10   CAD/CAM       3 Years

Un-Normalised Form
  Consultant No.
  Surname
  Address
  Grade
  Salary Scale
  Car Type
  Skills (Code, Description, Qualification)
Choose a Key
  Should be unique for each occurrence of the group of data.
  Value cannot be NULL for any occurrence.
  Must not repeat within the group of data.
  Take Consultant No as an obvious choice.

Rewrite the UNF as follows:-
  Consultant = Consultant No + Surname + Address + Grade + Salary Scale + Car Type + {Skill Code + Skill Description + Qualification}
  {} enclose a repeatable data structure.

Step 2:- Convert to 1st Normal Form (1NF)
  Remove any repeating structure from the UNF.

  UNF                      1NF
  Consultant No +          Consultant No +
  Surname +                Surname +
  Address +                Address +
  Grade +                  Grade +
  Salary Scale +           Salary Scale +
  Car Type +               Car Type
  { Skill Code +
  Skill Description +      Consultant No +
  Qualification }          Skill Code +
                           Skill Description +
                           Qualification

  The key in the new relation is typically a compound or composite key.
  Here, we use Consultant No + Skill Code.

Step 3:- Convert to 2nd Normal Form (2NF)
  Remove part key dependencies.
  Only relations having a compound or composite key are examined.
  For each non key attribute in the relation, ask:-
    Does this attribute depend on the whole key or on only part of it?
  Looking at the key of Consultant No + Skill Code:
    a) Skill Description depends on Skill Code only.
    b) Qualification depends on the whole key.
  Many consultants may have the same skill.

  UNF                      1NF                      2NF
  Consultant No +          Consultant No +          Consultant No +
  Surname +                Surname +                Surname +
  Address +                Address +                Address +
  Grade +                  Grade +                  Grade +
  Salary Scale +           Salary Scale +           Salary Scale +
  Car Type +               Car Type                 Car Type
  { Skill Code +
  Skill Description +      Consultant No +          Consultant No +
  Qualification }          Skill Code +             Skill Code +
                           Skill Description +      Qualification
                           Qualification
                                                    Skill Code +
                                                    Skill Description
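Steps 2 and 3 can be traced on the consultant example with a short sketch. The dictionary layout and the function names `to_1nf` and `to_2nf` are assumptions made for illustration; the data values come from the Consultant Details form.

```python
# The consultant form written as one unnormalised (UNF) record.
unf = {
    "consultant_no": "019", "surname": "Wheeler", "address": "Renmore, Galway",
    "grade": "D", "salary_scale": "S1", "car_type": "A",
    "skills": [  # the repeating group, shown in {} in the UNF notation
        {"skill_code": "SK01", "description": "Accounting", "qualification": "IMA"},
        {"skill_code": "SK15", "description": "SSADM", "qualification": "NCC Cert"},
        {"skill_code": "SK10", "description": "CAD/CAM", "qualification": "3 Years"},
    ],
}


def to_1nf(record):
    """Step 2: move the repeating group into its own relation.

    Each new row carries the compound key Consultant No + Skill Code.
    """
    consultant = {k: v for k, v in record.items() if k != "skills"}
    consultant_skill = [
        {"consultant_no": record["consultant_no"], **skill}
        for skill in record["skills"]
    ]
    return consultant, consultant_skill


def to_2nf(consultant_skill):
    """Step 3: remove the part key dependency (description depends on skill code only)."""
    held = [
        {"consultant_no": r["consultant_no"], "skill_code": r["skill_code"],
         "qualification": r["qualification"]}
        for r in consultant_skill
    ]
    skill = {r["skill_code"]: r["description"] for r in consultant_skill}
    return held, skill


consultant, consultant_skill = to_1nf(unf)
held, skill = to_2nf(consultant_skill)
```

After `to_2nf`, each skill description is stored once in `skill`, however many consultants hold that skill, while `held` keeps only the attribute that depends on the whole key.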
Step 4:- Convert to 3rd Normal Form (3NF)
  Search for dependencies between attributes.
  The question to ask:- Is attribute A dependent on attribute B?
  In other words, for a given value of B, is there only one possible value for A? Then ask the same question the other way round.
  Look at Grade and Salary Scale.

  Presume that they are related as follows:-
    Salary Scale 1 has Grades A, B, C, D
    Salary Scale 2 has Grades E, F, G
  Given a salary scale, is there only one value for grade? No.
  Given a grade, is there only one value for salary scale? Yes.
  So Salary Scale is dependent on Grade.

  The dependent attribute is removed from the relation and placed in a new relation for 3NF.
  The new relation's key is the determinant attribute (Grade in this case).
  This key remains in the original relation as a foreign key (shown with an asterisk).

  1NF                      2NF                      3NF
  Consultant No +          Consultant No +          Consultant No +
  Surname +                Surname +                Surname +
  Address +                Address +                Address +
  Grade +                  Grade +                  *Grade +
  Salary Scale +           Salary Scale +           Car Type
  Car Type                 Car Type
                                                    Grade +
  Consultant No +          Consultant No +          Salary Scale
  Skill Code +             Skill Code +
  Skill Description +      Qualification            Consultant No +
  Qualification                                     Skill Code +
                           Skill Code +             Qualification
                           Skill Description
                                                    Skill Code +
                                                    Skill Description

  A final operation is required to arrive at strict 3NF relations.
  Check for inter key dependence in compound keys.
  Here, look at the relationship between Consultant No and Skill Code.
    Given Consultant No, is there only one possible value for Skill Code?
    Given Skill Code, is there only one possible value for Consultant No?
  If the answer to either is yes, then the relation should only have a simple key made up of one of the two fields.

Step 5:- Apply the 3NF tests
  Two conditions must be satisfied here.
    1) Given a value for a key of a 3NF relation, is there only one possible value for each of the associated attributes?
    2) Is each attribute directly dependent on the key?
  Errors in making 1NF or 2NF will show up with the first test.
  The second test is to check for indirect dependencies on the 3NF keys.
  The objective:- to hold data elements in as few occurrences as possible.

Step 6:- Optimise the 3NF relations
  Combine relations with the exact same key.
  Be careful that no synonyms exist in either keys or other attributes, e.g. Customer and Client may be different names for the same relation.
  Check for inter data dependencies: merged relations may only be in 2NF.

Step 7:- Re-apply the 3NF tests
  Perfectly normalised data should have the following characteristic:-
  each attribute in a relation must depend on the key, the whole key and nothing but the key.
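The dependency question from Step 4 ("for a given value of one attribute, is there only one possible value of the other?") can be checked against sample data. The sketch below is illustrative only. Consultant 019 comes from the example, while consultants 020 to 022 are hypothetical rows invented to follow the presumed rule that Scale 1 covers Grades A to D and Scale 2 covers Grades E to G. A check like this can show that a dependency does not hold in the data, but it cannot prove that one holds in general.

```python
from collections import defaultdict


def determines(rows, a, b):
    """True if, in these rows, each value of attribute a goes with exactly one value of b."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[a]].add(row[b])
    return all(len(values) == 1 for values in seen.values())


# 019 is from the example; 020-022 are hypothetical rows for illustration.
consultants = [
    {"consultant_no": "019", "grade": "D", "salary_scale": "S1"},
    {"consultant_no": "020", "grade": "A", "salary_scale": "S1"},
    {"consultant_no": "021", "grade": "E", "salary_scale": "S2"},
    {"consultant_no": "022", "grade": "D", "salary_scale": "S1"},
]


def split_3nf(rows):
    """Move the dependent attribute (salary scale) into a relation keyed by grade.

    Grade stays behind in the consultant relation as a foreign key.
    """
    grade_scale = {r["grade"]: r["salary_scale"] for r in rows}
    consultant = [{"consultant_no": r["consultant_no"], "grade": r["grade"]} for r in rows]
    return consultant, grade_scale


grade_determines_scale = determines(consultants, "grade", "salary_scale")
scale_determines_grade = determines(consultants, "salary_scale", "grade")
consultant_3nf, grade_scale = split_3nf(consultants)
```

Here Grade determines Salary Scale but not the other way round, so Salary Scale moves to the new `grade_scale` relation and is stored once per grade rather than once per consultant.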