Normalisation and Data Storage Devices


Unit 4 Normalisation and Data Storage Devices

Structure
4.1 Introduction
4.2 Functional Dependency
4.3 Normalisation
4.3.1 Why do we Normalize a Relation?
4.3.2 Second Normal Form Relation
4.3.3 Third Normal Form
4.3.4 Boyce-Codd Normal Form (BCNF)
4.3.5 Fourth and Fifth Normal Form
4.4 Data Storage Devices
4.5 File Systems
4.6 Summary
4.7 Self Understanding

4.1 Introduction
Normalisation is used to design and arrive at a good database schema. A database normally consists of table definitions and column definitions, together with some constraints to be enforced by the system. Although we had a glance at the process of normalisation earlier, let us now study it formally with an example.

A table containing an arbitrary collection of attributes can result in a number of problems, especially with regard to update operations. For example, suppose a table called SCD is defined to contain data about students. The table consists of roll number, name, course, credits for the course, grade obtained by the student and the student's study centre number. The key of this table is a composite key consisting of roll number and course.

Fundamentals of Database Management Page No.: 86

One may face the following problems while updating the data in the database.
- A new course cannot be added to the table until some student opts for it, since the roll number would have to be null for this course, which is not permitted because roll number is a key attribute.
- If any change has to be made to the data of a course, the modification has to be made in a number of tuples, since many students might have opted for this course.
- If some students have to be deleted, it may lead to the deletion of a course itself, if only those students had opted for that course.
- Enforcing integrity constraints puts more workload on the DBMS, since the DBMS may have to check many tuples.

To avoid these anomalies (problems in updating) one can keep the following principles for a good database design:
1. Unrelated data should be kept in different tables. For example, data regarding students and courses has to be kept in separate tables.
2. The design should try to represent constraints explicitly to the extent possible, and the table structure itself should reflect the database constraints.
3. A table should not contain any redundancy. For example, in the above table the data of a course repeats for every student, which leads to updating problems (anomalies).

These principles are considered in the theory of normalisation to arrive at a good database design.

Constraints: Basically there are two types of constraints: one which defines the permitted values an attribute can have, and another which defines a relationship between different attributes (generally known as a dependency). Let us look at dependency in detail, as it is a formal tool used to capture constraints that influence database design.

4.2 Functional Dependency
A functional dependency (FD) is denoted by an arrow. The FD X → Y means that X uniquely determines Y, where X and Y are simple or composite attributes. The dependency from X to Y is said to hold if the application guarantees the following: if T1 and T2 are two tuples with the same value of X, then the value of Y must also be the same in T1 and T2. In other words, the relationship between X and Y is independent of any other attributes which might be present in the table. In simple words, for a given X there is always a single value of Y.

For example:
- Street, City → Pin code (a street of a city has a unique pin code; however, the reverse need not be true).
- ISBN → TITLE, AUTHOR (given the ISBN of a book, one can find the title and author of the book).

From the example in hand, i.e. SCD, one can identify the following FDs:
ROLL NO → NAME
COURSE → CREDITS
ROLL NO → STUDY CENTER NO
ROLL NO, COURSE → GRADE

From the definition of a relation, we note that every row in a table is unique and no two rows can have exactly the same attribute values. The key plays an important role in the design of tables, since the key has a unique value in each row. A key may consist of one or more attributes, and it may be minimal or contain superfluous attributes. A table may have one or more minimal keys, called candidate keys, one of which may be designated as the primary key.

Implications and Covers: The application can call for some functional dependencies which may imply additional functional dependencies.
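The tuple-based definition of a functional dependency given earlier in this section can be checked mechanically against sample data. The following is a minimal sketch, not part of the original text: the function name `fd_holds` and the SCD rows are hypothetical, chosen only for illustration. Note that sample rows can only refute an FD; whether an FD really holds is decided by the application's rules, not by one snapshot of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first Y seen for each X is remembered; a later, different Y
        # for the same X violates the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical SCD rows (illustrative values only, not taken from the text).
scd = [
    {"roll_no": 1, "name": "Asha", "course": "DBMS", "credits": 4, "grade": "A", "centre": 10},
    {"roll_no": 1, "name": "Asha", "course": "OS", "credits": 3, "grade": "B", "centre": 10},
    {"roll_no": 2, "name": "Ravi", "course": "DBMS", "credits": 4, "grade": "B", "centre": 12},
]

print(fd_holds(scd, ["roll_no"], ["name"]))             # ROLL NO -> NAME
print(fd_holds(scd, ["course"], ["credits"]))           # COURSE -> CREDITS
print(fd_holds(scd, ["roll_no", "course"], ["grade"]))  # ROLL NO, COURSE -> GRADE
print(fd_holds(scd, ["roll_no"], ["grade"]))            # violated: roll 1 has two grades
```

In this sample the first three checks succeed and the last one fails, which matches the observation that GRADE depends on the whole composite key and not on ROLL NO alone.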

If F is a set of FDs, then we define the closure of F, denoted F+, to be the set of all possible FDs which are implied by F. To find F+, given F, we need inference rules for deriving the FDs which are implied by F. The inference rules are very important for good database design for the following reasons:
- Given F, one may like to determine whether X → Y is implied or not.
- They are needed for computing the closure F+ of F.
- Given F, we may want to remove those FDs which are redundant in F. An FD is redundant if it is implied by the other FDs in F.

While designing a database schema, one has to find a minimal cover G of F. The minimal cover G does not contain any redundant FDs, and G+ is the same as F+. By computing the minimal cover G of F, we can ensure that the DBMS needs to enforce only the constraints in G, which automatically enforces all the constraints implied by G.

Inference rules for FDs: The inference rules, also known as Armstrong's Axioms, were published by Armstrong. The first three rules are the axioms proper; the remaining three can be derived from them. The properties are as given below:
1. Reflexivity property: X → Y is true if Y is a subset of X.
2. Augmentation property: If X → Y is true, then XZ → YZ is also true.
3. Transitivity property: If X → Y and Y → Z, then X → Z is implied.
4. Union property: If X → Y and X → Z are true, then X → YZ is also true.
5. Decomposition property: If X → Y is implied and Z is a subset of Y, then X → Z is implied. This property is the reverse of the union property: if the right-hand side of an FD contains many attributes, then the FD holds for each of them.
6. Pseudotransitivity property: If X → Y and WY → Z are given, then XW → Z is true.
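Whether an FD X → Y is implied by F is usually decided without enumerating F+ in full: one computes the attribute closure of X under F and checks whether it contains Y. The sketch below illustrates this with the four SCD dependencies identified earlier; the function names (`closure`, `implies`, `is_redundant`) and the underscore-style attribute names are assumptions made for this example, not notation from the text.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each side a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, the right side
            # is determined too (reflexivity, augmentation and transitivity).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


def is_redundant(fds, index):
    """True if the FD at position index is implied by the remaining FDs."""
    lhs, rhs = fds[index]
    rest = fds[:index] + fds[index + 1:]
    return implies(rest, lhs, rhs)


# The FDs of the SCD table.
F = [
    (frozenset({"ROLL_NO"}), frozenset({"NAME"})),
    (frozenset({"COURSE"}), frozenset({"CREDITS"})),
    (frozenset({"ROLL_NO"}), frozenset({"STUDY_CENTER_NO"})),
    (frozenset({"ROLL_NO", "COURSE"}), frozenset({"GRADE"})),
]

print(sorted(closure({"ROLL_NO", "COURSE"}, F)))        # all six attributes
print(implies(F, {"ROLL_NO"}, {"NAME", "STUDY_CENTER_NO"}))  # follows by union
print(implies(F, {"ROLL_NO"}, {"GRADE"}))               # not implied
```

Because the closure of {ROLL_NO, COURSE} contains every attribute, the pair is a key of SCD. The helper `is_redundant` applies the same test to one FD against the rest of the set, which is the basic step in reducing F towards a minimal cover.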

To have a better understanding of these properties, let us consider an example. Consider a college having a table STUDY with course, teacher, room number and department as attributes:

STUDY (course, teacher, roomno, dept)

Here we can identify a few FDs, namely:
Course → teacher
Teacher → department
Course → room number

The following additional FDs can be derived from the above using the inference properties:
By reflexivity: course, teacher → teacher
By augmentation: course, room number → teacher, room number
By transitivity: course → department
By union: course → teacher, room number

The axioms proposed by Armstrong are sound and complete. These properties are defined as:
1. Soundness property: If X → Y can be inferred from F using the above axioms, then X → Y will be true in any relation in which F holds.
2. Completeness property: If X → Y cannot be inferred from F using the above axioms, then there is some relation in which F holds but X → Y does not. Equivalently, every FD that is true in all relations satisfying F can be inferred from F.

4.3 Normalization
Consider the relation shown in Table 4.1. In this relation, an order no. includes many items. The attribute Item lines is not a single attribute but is composed of many attributes.

Table 4.1 An Unnormalized Relation
Order no. | Order date | Item lines (Item code, Quantity, Price/unit, repeated once per item line)

Besides this, the number of item lines is variable. This form is not suitable for storage as a file in a computer. Further, retrieval of data based on a component of a composite attribute is difficult. For example, to find out how many items with a specified item code are ordered, one must break up the composite attribute first before attempting a search. Thus a relation with a format such as the one in Table 4.1 is not allowed. It is said to be unnormalized.

To normalize this relation, a composite attribute is converted to individual attributes. The normalization step consists of first identifying the fields within a composite attribute as individual attributes. After doing this, the attributes common to a composite attribute are duplicated as many times as there are lines in the composite attribute. The normalized relation corresponding to the relation given in Table 4.1 is shown in Table 4.2.
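The flattening step just described can be sketched in a few lines. The order data below is hypothetical (the values are invented for illustration), and the function name `to_first_normal_form` is an assumption of this sketch; only the attribute names follow Table 4.1.

```python
def to_first_normal_form(orders):
    """Flatten orders with a variable number of item lines into 1NF rows.

    Each order carries a list of (item_code, quantity, price_per_unit) lines;
    the common attributes (order no. and order date) are repeated per line.
    """
    flat = []
    for order in orders:
        for item_code, quantity, price in order["item_lines"]:
            flat.append({
                "order_no": order["order_no"],
                "order_date": order["order_date"],
                "item_code": item_code,
                "quantity": quantity,
                "price_per_unit": price,
            })
    return flat


# Hypothetical unnormalized orders (illustrative values only).
unnormalized = [
    {"order_no": 1, "order_date": "2024-01-05",
     "item_lines": [("A10", 5, 2.50), ("B20", 1, 9.00)]},
    {"order_no": 2, "order_date": "2024-01-06",
     "item_lines": [("A10", 3, 2.50)]},
]

flat_rows = to_first_normal_form(unnormalized)
for row in flat_rows:
    print(row)
```

Every resulting row has the same five single-valued attributes, so a search by item code becomes a plain scan over one column. The price of item A10 now appears in two rows, which is the kind of repetition the higher normal forms address.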

Table 4.2 Normalized Form of the Relation given in Table 4.1
Order no. | Order date | Item code | Quantity | Price/unit

The relation shown in Table 4.2 is said to be in First Normal Form, abbreviated as 1NF. This form is also called a flat file. There are no composite attributes, and every attribute is single and describes one property. Converting a relation to 1NF is the first essential step in normalization.

There are successively higher normal forms known as 2NF, 3NF, BCNF, 4NF and 5NF. Each form is an improvement over the earlier form. In other words, 2NF is an improvement on 1NF, 3NF is an improvement on 2NF, and so on. The relations in a higher normal form are a subset of the relations in a lower normal form, as shown in Fig. 4.1. The higher normalization steps are based on three important concepts.

Fig. 4.1 Illustration of successive normal forms of a relation (nested sets, from the outermost 1NF through 2NF, 3NF, BCNF and 4NF to the innermost 5NF)

1. Dependence among attributes in a relation.
2. Identification of an attribute or a set of attributes as the key of a relation.
3. Multivalued dependency between attributes.

(i) Functional dependency: As the concept of dependency is very important, it is essential that we first understand it well and then proceed to the idea of normalization. There is no fool-proof algorithmic method of identifying dependency. We have to use our common sense and judgement to specify dependencies.

Let X and Y be two attributes of a relation. Given the value of X, if there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by the notation:
X → Y

For example, given the value of item code, there is only one value of item name for it. Thus item name is functionally dependent on item code. This is shown as:
Item code → Item name

Similarly, in Table 4.2, given an order number, the date of the order is known. Thus:
Order no. → Order date

Functional dependency may also be based on a composite attribute. For example, if we write
X, Z → Y
it means that there is only one value of Y corresponding to given values of X and Z. In other words, Y is functionally dependent on the composite X, Z. In Table 4.2, for example, Order no. and Item code together determine Qty. and Price. Thus:
Order no., Item code → Qty., Price

As another example, consider the relation
Student (Roll no., Name, Address, Dept., Year of study)
In this relation, Name is functionally dependent on Roll no. In fact, given the value of Roll no., the values of all the other attributes can be uniquely determined. Name and Department are not functionally dependent, because given the name of a student one cannot find his department uniquely. This is due to the fact that there may be more than one student with the same name. Name in this case is not a key. Department and Year of study are not functionally dependent, as Year of study pertains to a student whereas Department is an independent attribute. The functional dependency in this

relation is shown in Fig. 4.2 as a dependency diagram. Such dependency diagrams are very useful in normalization.

Fig. 4.2 Dependency diagram for the relation "Student" (Roll no. determines Name, Address, Department and Year of study)

(ii) Relation key: Consider a relation "Vendor" with the attributes Vendor code, Vendor name and Address. Given the Vendor code, the Vendor name and Address are uniquely determined. Thus Vendor code is the relation key. Given a relation, if the value of an attribute X uniquely determines the values of all other attributes in a row, then X is said to be the key of that relation. Sometimes more than one attribute is needed to uniquely determine the other attributes in a relation row. In that case such a set of attributes is the key. In Table 4.2, Order no. and Item code together determine Order date, Qty. and Price. Thus Order no. and Item code together form the key. In the relation "Supplies" (Vendor code, Item code, Qty. supplied, Date of supply, Price/unit), Vendor code and Item code together form the key. This dependency is shown in the dependency diagram of Fig. 4.3.

Fig. 4.3 Dependency diagram for the relation "Supplies" (Vendor code and Item code together determine Quantity supplied, Date of supply and Price/unit)

Observe that in the figure the fact that Vendor code and Item code together form a composite key is clearly shown by enclosing them together in a rectangle.

4.3.1 Why do we Normalize a Relation?
Relations are normalized so that when relations in a database are altered during the lifetime of the database, we do not lose information or introduce inconsistencies. The types of alterations normally needed for relations are:
1. Insertion of new data values into a relation. This should be possible without being forced to leave blank fields for some attributes.
2. Deletion of a tuple, namely, a row of a relation. This should be possible without losing vital information unknowingly.
3. Updating or changing a value of an attribute in a tuple. This should be possible without exhaustively searching all the tuples in the relation.

Consider, for example, the relation shown in Table 4.2. If we wish to enter in our database a new item with item code 3945, whose price/unit is known but for which no order has been placed, we cannot do it unless we leave blank fields for order no. and order date. Order no. is a key field, and leaving a blank for it would make retrieval impossible.

If the only order for Item code 4629 in Table 4.2 is deleted, then we lose the information about what Item code 4629 costs. Such an accidental loss of information should not occur.

If the price of Item code 4627 is changed, then in the relation of Table 4.2 it is necessary to find all the tuples (rows) where Item code 4627 occurs and then change the Price/unit in all these places. In the table, three rows should be changed. If by mistake one row is missed, there will be an inconsistency in the database.

Ideal relations after normalization should have the following properties, so that the problems mentioned above do not occur for relations in the (ideal) normalized form:
1. No data value should be duplicated in different rows unnecessarily.
2. A value must be specified (and required) for every attribute in a row.
3. Each relation should be self-contained. In other words, if a row from a relation is deleted, important information should not be accidentally lost.
4. When a row is added to a relation, other relations in the database should not be affected.
5. A value of an attribute in a tuple may be changed independently of other tuples in the relation and of other relations.

The idea of normalizing relations to higher and higher normal forms is to attain the goal of having a set of ideal relations meeting the above criteria.

4.3.2 Second Normal Form Relation
We will now define a relation in the Second Normal Form (2NF). A relation is said to be in 2NF if it is in 1NF and the non-key attributes are functionally dependent on the key attribute(s). Further, if the key has more than one attribute, then no non-key attribute should be functionally dependent upon a part of the key attributes. Consider, for example, the relation given in Table 4.2. This relation is in 1NF. The key is (Order no., Item code). The

dependency diagram for attributes of this relation is shown in Fig. 4.4. The non-key attribute Price/unit is functionally dependent on Item code, which is a part of the relation key. Also, the non-key attribute Order date is functionally dependent on Order no., which is a part of the relation key. Thus the relation is not in 2NF. It can be transformed to 2NF by splitting it into three relations as shown in Table 4.3. In Table 4.3 the relation "Orders" has Order no. as the key. The relation "Order details" has the composite key Order no. and Item code. In both relations the non-key attributes are functionally dependent on the whole key.

Fig. 4.4 Dependency diagram for the relation given in Table 4.2 (Order no. determines Order date; Item code determines Price/unit; Order no. and Item code together determine Quantity)

Table 4.3 Splitting of Relation given in Table 4.2 into 2NF Relations
(a) Orders: Order no. | Order date
(b) Order details: Order no. | Item code | Qty.
(c) Prices: Item code | Price/unit

Observe that by transforming to 2NF relations the repetition of Order date (Table 4.2) has been removed. Further, if an order for an item is cancelled, the price of the item is not lost. For example, if Order no. "1886"

for Item code "4629" is cancelled in Table 4.2, then the fourth row will be removed and the price of the item is lost. In Table 4.3 only the fourth row of Table 4.3 (b) is omitted. The item price is not lost, as it is available in Table 4.3 (c). The date of the order is also not lost, as it is in Table 4.3 (a). These relations in 2NF meet all the "ideal" conditions specified. Observe that the three relations obtained are self-contained. There is no duplication of data within a relation.

4.3.3 Third Normal Form
Normalization to Third Normal Form is needed when the attributes in a relation tuple are not functionally dependent only on the key attribute. If two non-key attributes are functionally dependent, then there will be unnecessary duplication of data. Consider the relation given in Table 4.4. Here, Roll no. is the key and all other attributes are functionally dependent on it. Thus it is in 2NF.

Table 4.4 A 2NF Form Relation
Roll no. | Name | Department | Year | Hostel name
1784 | Raman | Physics | 1 | Ganga
1648 | Krishnan | Chemistry | 1 | Ganga
1768 | Gopalan | Mathematics | 2 | Kaveri
1848 | Raja | Botany | 2 | Kaveri
1682 | Maya | Geology | 3 | Krishna
1485 | Singh | Zoology | 4 | Godavari

If it is known that in the college all first year students are accommodated in Ganga hostel, all second year students in Kaveri, all third year students in Krishna, and all fourth year students in Godavari, then the non-key attribute Hostel name is dependent on the non-key attribute Year. This dependency is shown in Fig. 4.5. Observe that given the year of a student, his hostel is known and vice versa.

Fig. 4.5 Dependency diagram for the relation given in Table 4.4 (Roll no. determines Name, Department and Year; Year determines Hostel name)

The dependency of hostel on year leads to duplication of data, as is evident from Table 4.4. If it is decided to ask all first year students to move to Kaveri hostel and all second year students to Ganga hostel, this change has to be made in many places in Table 4.4. Also, when a student's year of study changes, his hostel change must also be noted in Table 4.4. This is undesirable.

A table is said to be in 3NF if it is in 2NF and no non-key attribute is functionally dependent on any other non-key attribute. Table 4.4 is thus not in 3NF. To transform it to 3NF, we should introduce another relation which includes the functionally related non-key attributes. This is shown in Table 4.5.

It should be stressed again that dependency between attributes is a semantic property and has to be stated in the problem specification. In this example the dependency between Year and Hostel is clearly stated. In case the hostels allocated to students do not depend on their year in college, then Table 4.4 is already in 3NF.
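The effect of this transformation can be tried out on the rows of Table 4.4. The sketch below splits the relation on the dependency Year → Hostel name and joins the two parts back together; the function names `decompose` and `rejoin` are assumptions made for this illustration.

```python
# Rows of Table 4.4: (roll_no, name, department, year, hostel_name)
table_4_4 = [
    (1784, "Raman", "Physics", 1, "Ganga"),
    (1648, "Krishnan", "Chemistry", 1, "Ganga"),
    (1768, "Gopalan", "Mathematics", 2, "Kaveri"),
    (1848, "Raja", "Botany", 2, "Kaveri"),
    (1682, "Maya", "Geology", 3, "Krishna"),
    (1485, "Singh", "Zoology", 4, "Godavari"),
]


def decompose(rows):
    """Split the 2NF relation into two 3NF relations."""
    students = [(roll, name, dept, year) for roll, name, dept, year, _ in rows]
    # Year -> Hostel name holds, so each year needs to be stored only once.
    year_hostel = {year: hostel for _, _, _, year, hostel in rows}
    return students, year_hostel


def rejoin(students, year_hostel):
    """Join the two relations on Year to recover the original rows."""
    return [(roll, name, dept, year, year_hostel[year])
            for roll, name, dept, year in students]


students, year_hostel = decompose(table_4_4)
print(rejoin(students, year_hostel) == table_4_4)  # no information is lost

# Swapping the first and second year hostels now needs two updates,
# however many students there are.
swapped = dict(year_hostel)
swapped[1], swapped[2] = year_hostel[2], year_hostel[1]
print(rejoin(students, swapped)[0])
```

The year-to-hostel relation holds four rows, one per year, instead of one hostel entry per student, and a student's change of year is recorded in the student relation alone.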

Table 4.5 Conversion of Table 4.4 into 3NF Relations
Roll no. | Name | Department | Year
1784 | Raman | Physics | 1
1648 | Krishnan | Chemistry | 1
1768 | Gopalan | Mathematics | 2
1848 | Raja | Botany | 2
1682 | Maya | Geology | 3
1485 | Singh | Zoology | 4

Year | Hostel name
1 | Ganga
2 | Kaveri
3 | Krishna
4 | Godavari

Let us consider another example of a relation. The relation "Employee" is given below and its dependency diagram in Fig. 4.6.
Employee (Employee code, Employee name, Dept., Salary, Project no., Termination date of project)
As can be seen from the figure, the termination date of a project is dependent on the Project no. Thus this relation is not in 3NF. The 3NF relations are:
Employee (Employee code, Employee name, Dept., Salary, Project no.)
Project (Project no., Termination date)

Fig. 4.6 Dependency diagram of Employee relation (Employee code determines Employee name, Department, Salary and Project no.; Project no. determines Termination date)

4.3.4 Boyce-Codd Normal Form (BCNF)
Assume that a relation has more than one possible key. Assume further that the composite keys have a common attribute. If an attribute of a composite key is dependent on an attribute of the other composite key, a normalization called BCNF is needed. Consider, as an example, the relation "Professor":
Professor (Professor code, Dept., Head of Dept., Percent time)
It is assumed that:
1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The dependency diagram for the above relation is given in Fig. 4.7. Table 4.6 gives the relation attributes. The two possible composite keys are (Professor code, Dept.) and (Professor code, Head of Dept.). Observe that Dept. as well as Head of Dept. are not non-key attributes. They are each a part of a composite key.

Fig. 4.7 Dependency diagram of Professor relation (attributes: Professor code, Department, Head of Department, Percent time)

Table 4.6 Normalization of Relation "Professor"
Professor code | Department | Head of Dept. | Percent time
P1 | Physics | Ghosh | 50
P1 | Mathematics | Krishnan | 50
P2 | Chemistry | Rao | 25
P2 | Physics | Ghosh | 75
P3 | Mathematics | Krishnan | 100

The relation given in Table 4.6 is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the Professor relation. The normalized relations are shown in Table 4.7 and the dependency diagrams for these new relations in Fig. 4.8. The dependency diagram gives the important clue to this normalization step, as is clear from Figs. 4.7 and 4.8.

Table 4.7 Normalized Professor Relation in BCNF
(a)
Professor code | Department | Percent time
P1 | Physics | 50
P1 | Mathematics | 50
P2 | Chemistry | 25
P2 | Physics | 75
P3 | Mathematics | 100

(b)
Department | Head of Dept.
Physics | Ghosh
Mathematics | Krishnan
Chemistry | Rao

Fig. 4.8 Dependency diagram of Professor relation (Professor code and Department together determine Percent time; Department determines Head of Department)
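The deletion problem described above, and the way the BCNF relations of Table 4.7 avoid it, can be reproduced with the rows of Table 4.6. This is a small sketch; the variable and function names are chosen for this example only.

```python
# Rows of Table 4.6: (professor_code, department, head_of_dept, percent_time)
professor = [
    ("P1", "Physics", "Ghosh", 50),
    ("P1", "Mathematics", "Krishnan", 50),
    ("P2", "Chemistry", "Rao", 25),
    ("P2", "Physics", "Ghosh", 75),
    ("P3", "Mathematics", "Krishnan", 100),
]


def known_heads(rows):
    """Heads of department that can still be read from the single relation."""
    return {dept: head for _, dept, head, _ in rows}


# Single relation: deleting P2's rows also deletes the only Chemistry row.
after_resignation = [row for row in professor if row[0] != "P2"]
print(known_heads(after_resignation))

# BCNF relations of Table 4.7: (a) professor/department/time, (b) department/head.
prof_dept = [(prof, dept, pct) for prof, dept, _, pct in professor]
dept_head = {dept: head for _, dept, head, _ in professor}

# Deleting P2 now touches only relation (a); relation (b) is unchanged.
prof_dept_after = [row for row in prof_dept if row[0] != "P2"]
print(dept_head["Chemistry"])
```

After the resignation the single relation knows only the heads of Physics and Mathematics, while the separate department relation still records Rao as the head of Chemistry.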

4.3.5 Fourth and Fifth Normal Form
When attributes in a relation have multivalued dependency, further normalization to 4NF and 5NF is required. We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:
1) A vendor is capable of supplying many items.
2) A project uses many items.
3) A vendor supplies to many projects.
4) An item may be supplied by many vendors.

Table 4.8 gives a relation for this problem and Fig. 4.9 the dependency diagram(s).

Table 4.8 Vendor-supply-projects Relation
Vendor code | Item code | Project no.
V1 | I1 | P1
V1 | I2 | P1
V1 | I1 | P3
V1 | I2 | P3
V2 | I2 | P1
V2 | I3 | P1
V3 | I1 | P1
V3 | I1 | P2

Fig. 4.9 Dependency diagrams of vendor-supply-project relation (the diagrams mark the multivalued dependencies among Vendor code, Item code and Project no.)

The relation given in Table 4.8 has a number of problems. For example:
- If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code has to be introduced.
- The information about item I1 is stored twice for vendor V3.

Observe that the relation given in Table 4.8 is in 3NF and also in BCNF. It still has the problems mentioned above. The problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multivalued dependency, or one independent multivalued dependency with a functional dependency.

Table 4.8 can be expressed as the two 4NF relations given in Table 4.9. The fact that vendors are capable of supplying certain items and that they are assigned to supply for some projects is independently specified in the 4NF relations.

Table 4.9 Vendor-supply-project Relations in 4NF
(a) Vendor Supply
Vendor code | Item code
V1 | I1
V1 | I2
V2 | I2
V2 | I3
V3 | I1

(b) Vendor Project
Vendor code | Project no.
V1 | P1
V1 | P3
V2 | P1
V3 | P1
V3 | P2

These relations still have a problem. Even though vendor V1's capability to supply items and his allotment to supply for specified projects are known, he may not be actually supplying them to a project, as the project may not need them. We thus need another relation which specifies this. This is called the 5NF form. The 5NF relations are the relations in Table 4.9 (a) and (b) together with the relation given in Table 4.10.

Table 4.10 5NF Additional Relation
Project no. | Item code
P1 | I1
P1 | I2
P2 | I1
P3 | I1
P3 | I3

In Table 4.11 we summarize the normalization steps already explained.

Table 4.11 Summary of Normalization Steps
Input relation | Transformation | Output relation
All relations | Eliminate variable length records. Remove multiattribute lines in table | 1NF
1NF relation | Remove dependency of non-key attribute on part of a multiattribute key | 2NF
2NF | Remove dependency of non-key attributes on other non-key attributes | 3NF
3NF | Remove dependency of an attribute of a multiattribute key on an attribute of another (overlapping) multiattribute key | BCNF
BCNF | Remove more than one independent multivalued dependency from relation by splitting relation | 4NF
4NF | Add one relation relating attributes with multivalued dependency to the two relations with multivalued dependency | 5NF
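Returning to the vendor-supply-project example of Section 4.3.5, the split of Table 4.8 into the two relations of Table 4.9 can be checked to lose no information: projecting Table 4.8 onto (Vendor code, Item code) and (Vendor code, Project no.) and joining the projections on Vendor code gives back exactly the original rows. The following sketch performs that check with the data of Table 4.8; the variable names are chosen for this illustration.

```python
# Rows of Table 4.8: (vendor_code, item_code, project_no)
table_4_8 = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"),
    ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P1"), ("V3", "I1", "P2"),
}

# Projections corresponding to Table 4.9 (a) and (b).
vendor_supply = {(vendor, item) for vendor, item, _ in table_4_8}
vendor_project = {(vendor, project) for vendor, _, project in table_4_8}

# Natural join of the two projections on vendor code.
rejoined = {
    (vendor, item, project)
    for vendor, item in vendor_supply
    for other_vendor, project in vendor_project
    if vendor == other_vendor
}

print(len(vendor_supply), len(vendor_project))  # rows in each 4NF relation
print(rejoined == table_4_8)                    # the join recreates Table 4.8
```

The two 4NF relations hold five rows each, and their join recreates the eight rows of Table 4.8 with no extra rows. A vendor's capability to supply an item is now stored once, independently of the projects the vendor is assigned to.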

4.4 Data Storage Devices
There are many ways of storing and accessing data in an application. Let us have a brief idea of the various physical methods of storing and accessing data. A comparison of different file organizations and the advanced storage techniques available will be the point of focus in this section.

Physical Data Organization: As we know, data in storage devices is stored in files. A file is a collection of records, and a record contains values for many fields. For each file, the contents of the records, such as the types of the fields, must be defined. Based on the instances of records, a file can be categorised as:
- Homogeneous file: the file holds instances of a single record type.
- Non-homogeneous file: the file holds instances of many different record types.

Also, the records in the file may be of fixed length or varying length. A field (like an attribute in a table) must have its domain and its internal representation. Normally fields are of fixed length, but some files permit varying length fields. Also, in a record, a field may occur once or many times (array).

In summary, a typical application will have multiple files, and the data within these files will be inter-related. To process inter-related records we have to capture inter-file relationships efficiently. Based on usage, a file can be categorised as:
- Master file: contains operational data.
- Transaction file: contains records of various business transactions.
- Reference file: contains semi-permanent data for use in the processing.
- History file: contains past data from either master or transaction files.

One may note at this point that, to process data stored in files, a programming language provides operations for writing a new record, deleting an existing record, and retrieving and modifying an existing record.

Access Method: Various access methods can be used to retrieve data from a file. In these methods one has to indicate the access path used to reach the record stored in the file. The nature of this path depends on how the data is organised and searched. We are mainly concerned with the length of the access path, which is measured in terms of the number of I/O operations required to bring a desired record into memory. An access method is software that searches through the access path to locate the record. Different types of access methods, corresponding to different ways of file organization, are typically provided as part of a data processing environment. The basic access methods are the following:
- In the sequential access mode, the next physical record is retrieved.
- In the random access mode, any record in the file can be accessed at random.
- A dynamic access mode permits both sequential and random access. Even in the random access mode, it may be necessary to sequentially access a few records in the file.

In most data processing situations, we will be interested in locating records given a value for one of their fields, called key fields. For example, if we wish to locate the record of a bank customer given his account no., then account no. is the search key. Keys may be broadly divided into two categories:
- The primary key has unique values in all records.
- The secondary key may be duplicated in many records of the file.
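The difference in access path length between the sequential and random access modes can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: each record examined is counted as one read, the index on the primary key is assumed to be free to consult, and the account file and all function names are hypothetical.

```python
def sequential_search(records, key_field, key):
    """Scan records in physical order; return (record, number of reads)."""
    reads = 0
    for record in records:
        reads += 1
        if record[key_field] == key:
            return record, reads
    return None, reads


def build_index(records, key_field):
    """Map each primary key value to the position of its record."""
    return {record[key_field]: pos for pos, record in enumerate(records)}


def random_access(records, index, key):
    """Fetch a record directly by position; return (record, number of reads)."""
    pos = index.get(key)
    if pos is None:
        return None, 0
    return records[pos], 1


# Hypothetical file of 100 bank accounts keyed on account_no.
accounts = [{"account_no": 1000 + i, "balance": 10 * i} for i in range(100)]
index = build_index(accounts, "account_no")

_, seq_reads = sequential_search(accounts, "account_no", 1099)
_, rnd_reads = random_access(accounts, index, 1099)
print(seq_reads, rnd_reads)

# Average sequential search length over all keys in the file.
average = sum(sequential_search(accounts, "account_no", a["account_no"])[1]
              for a in accounts) / len(accounts)
print(average)
```

In this model the last record costs 100 reads sequentially and one read through the index, and the average sequential search length is about half the file. A real access method would also pay I/O for reading the index itself, which the sketch ignores.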

Performance: The performance of a file organisation may be measured in absolute terms to find out how it will perform in a given situation. The overall performance depends not only on how the data is organised, but also on the types of file operations and their frequency (e.g. number of reads, number of updates, etc.). The important performance measures are:
- Response time: the time elapsed from the initiation to the completion of an operation. It includes:
  - time spent waiting for processor and device availability, which depends on the system and its load;
  - time required to locate the data on the device;
  - time required to transfer the data between the device and memory;
  - time to process the data.
- Search length: the length of the access path. It may vary over a range of values, as the records in a file may not all have the same access path. Hence the average length is used in evaluating performance.
- Expected I/O time: this is calculated for comparing files where different numbers of sequential and random accesses are required.

Application Parameters: The following application parameters are used in evaluating the performance of a file organization for an application:
- Hit ratio: what percentage of the file's records will be accessed in an operation (i.e., business function).
- Volatility: what percentage of records are added, modified and deleted (over a period of time).
- Access keys: which fields are used for accessing the records.
- Frequency: how frequently the operation is executed.

Hence, although an application may involve many kinds of accesses to different data types, only the dominant operations, which require efficient handling, are considered.

Storage Devices: There are many storage devices available in the market to store the data required for the DBMS. We shall discuss some of them in general.

Magnetic Tape: The salient features of this device are listed below:
1. It is a sequential access device, where data is recorded as a physical sequence of blocks separated by 'inter-record gaps'.
2. The tape is moved when a read/write operation is initiated.
3. Data can be recorded at high densities (measured in bytes per inch of tape).
4. The transfer rate between the device and memory is 50 to 300 KBytes per second.
5. A block on tape is the unit of I/O, and may be of fixed or varying length.
6. One or many files may fit on one reel of tape, whereas a large file may occupy many reels.
7. Since the device is sequential in nature, a file stored on tape can be either in input mode or in output mode. Hence a tape file cannot be updated in place; instead, a new copy has to be created.
8. A block on tape may store one or more file records; this is called 'blocking' of records. The blocking and 'unblocking' of records is carried out automatically by the file organization software. Blocking increases the effective utilisation of the tape and reduces the number of I/O operations on the file.
9. The tape includes certain labels to ensure identification of the stored data and its correct processing. The two types of labels, called volume label and file label, contain the following:

Typical Volume Label Contents
- label number (multiple labels permitted)

- volume serial number
- security code identification

Typical File Label Contents
- file identifier
- file and volume sequence numbers, for proper sequencing of a multivolume file
- generation and version numbers
- creation and expiry dates
- file security
- count of data blocks (in trailer label)

From the above it is clear that access to data stored on tape is highly restricted because the device is sequential. Hence in a DBMS environment, the use of tape files is minimal, or restricted to the following:
- for archiving of historical data
- for storing transaction logs (quite rare)
- for storing transaction data to help in disaster recovery (again, quite rare nowadays).

Magnetic Disks: The salient features of magnetic disks are as below:
1. Disk devices are the primary means of storing data for DBMSs, because they permit random access to the stored data and so offer flexibility in organising data in many ways for efficient access.
2. Capacities of disk devices vary from 1 gigabyte to hundreds of gigabytes, and they support a high rate of data transfer.
3. The data is stored in sectors, which are organised into multiple circular tracks on a magnetised (and rotating) disk medium.
4. Each disk surface has its own read/write head, which is positioned on the track and sector for reading or writing into that track.

5. The device permits direct access to stored data by specifying the address of the block or sector containing the data.
6. The following hardware parameters are considered for performance evaluation:
- Seek time: time to position on the required cylinder (min 4-5 msec, avg msec, max msec).
- Rotational delay: time to locate the required block on a track, also called latency time. On average it equals half of the rotation time (typically 8.33 msec at 3600 rpm).
- Transfer rate: 200 to 3000 KB/sec (very slow compared to CPU speeds in megabytes per second; hence disk I/O must be minimised when processing queries).
7. The disk volume contains a variety of control data for identification as well as quick positioning.

In summary, the disk is a fairly complex device, and the flexibility it offers is the backbone of modern DBMSs.

4.5 File Systems

A file system is an important component of the operating system of a computer. A DBMS uses the features offered by a file system, and builds its own facilities on top of it. The most important functions of a file system are:
1. Directory service: A hierarchical directory structure is commonly provided for grouping related files in multiple directory levels. It also stores control information (access rights, how and where the data are stored, date of creation, etc.).

2. Space allocation of files: Space is allocated to a file as it is created. A page of fixed size, which may store one or more file records, is used as the unit of allocation. Space may be allocated to a file in contiguous pages.

Blocking and buffering of records: One disk block/page can contain multiple file records, which improves disk space utilisation. A large chunk of memory, called 'buffer' space, is set aside to hold the disk pages that are read, which increases processing efficiency. The OS uses some policy (such as least-recently-used) for replacing pages in the buffers to make room for new pages.

Many operating systems provide a few built-in file organisations as part of their file system. A file organisation defines how data will be organised, what additional structures will be created to access data efficiently, and what the storage size and speed of access will be. Disk devices allow a lot of flexibility in organising data. The typical file organisations offered by many file systems include sequential, indexed and hash-based methods. However, a DBMS may offer only a select few.

Record pointers: A relationship between two record types is often implemented using pointers. A pointer is the address of a record on disk, which allows direct access to related records. Pointers (relative or absolute) are also used for implementing indexes and for linking records that have the same value for a field of interest.

Sequential Organization

A sequential file has its records stored in a physical sequence. To facilitate proper processing, the records are stored in sorted order based on the values of some field (e.g., account number). The file requires minimal storage, but offers limited operations, which limits its usefulness. The operations are:

- open the file: either in input or output mode, although update mode is possible for disk sequential files.
- read the next record or write the next record, or rewrite the last read record, but without changing its length.
- close the file.

Such files may be placed on a tape or disk device. To achieve independence from the device used, such files are not updated in place. A new version of the file is produced when updating a sequential file. For efficiency, updates are carried out for a batch of transactions instead of for each transaction.

Because of the limited options available with a sequential file, its use in a DBMS is limited to the following situations:
- when the file has a high hit ratio but infrequent updates
- for small files
- for intermediate results of operations.

Indexed-Sequential Files

An indexed-sequential file facilitates both sequential and direct access on one key field. The records are stored in ascending order of their key values. An index is also built on the key field to facilitate random access. The index is a small table whose entries are (key, record-address) pairs. In a sparse index, an entry is made for only one key per block of records; that key can be considered an 'anchor' for that block/sector/track/cylinder on the disk. By examining consecutive entries in the index table, the access method can determine the block in which a record with the desired key may be found.

For large data files, the index table itself becomes large. Note that the index table will also be stored externally on the disk, and may need to be read in parts for searching its entries. For improving searching in index, it is

possible to create multiple levels of indexes, where each index at a higher level is an index to the next level of index, as indicated below:
1. Track index: Contains the anchor entry of each track in that cylinder; it is stored on the same cylinder.
2. Cylinder index: Contains one anchor per cylinder; it is stored in a separate area.
3. Master index: An index to the cylinder index, created when the latter is itself large; it has one entry for each block of the cylinder index.

The procedure to locate a data record using these levels of indexes is:
1. read and search the master index, and locate the block of the cylinder index in which the key falls; read that cylinder index block;
2. read the track index stored on that cylinder, and search it to get the likely track on which the record may be present;
3. read the records on the track for the required key (this may be done in one or more I/O operations).

Thus, up to 3 reads are required for searching the index levels. The advantage of a sparse index is that it saves considerable storage space, as the index is small, and it facilitates efficient random access. The disadvantage is that there can be only one such index for a file, and the insert/delete operations may require periodic re-organisation of the file.

Insertion in indexed sequential file:
- First find the track to which the record logically belongs. If this track is full, make room for the new record.
- While inserting, one of the existing records on the track gets displaced. Store the displaced record in the overflow track on the same cylinder, to avoid unnecessary movement of the disk arm.

- Modify the format of the track index to include not only the highest key on the track, but also the highest key and its record address in the overflow area.
- Link the overflow records from the same track using pointers to facilitate search.
- If the overflow track itself is full, a separate overflow area, consisting of a few cylinders, is used to absorb the overflows.

From the above it is clear that when insertions take place, the access time suffers.

Deletion in indexed sequential file: A simple way to delete a record is to mark it with a flag. To remove overflows, and to physically delete flagged records, the indexed sequential file is periodically re-organised by re-creating it as a new file. This restores its performance to the initial level.

The following operations are normally supported for the indexed-sequential file:
1. OPEN (in Input, Output or I/O mode)
2. READ next, READ by key
3. WRITE (also means insert in I/O mode)
4. REWRITE (key can't be changed)
5. DELETE last read record or by key
6. Position by key for subsequent sequential processing.

Hashed Files

A hashed file permits random access on some field (usually the key field) in the record. It uses a mapping, called a 'hashing function', to convert a key value into a record position in the file. The hashing function should be such

that it produces record positions within the file space, and gives distinct positions for all keys. Meeting both of these requirements is difficult, so two keys may produce the same record position; this situation is called a 'collision'. Hashing methods are designed to handle collisions.

The ideal method to convert key values to record positions is a random distribution of records within the file, since it removes any 'bias' from keys to their positions. While designing, one also has to consider the type and range of key values for the given application. Experiments have shown that the following methods work satisfactorily in most cases:
- Multiplicative method: Multiply the given key by a factor, and take m lower significant bits of the product as the hash result. The recommended factor is (sqrt(5) - 1)/2, approximately 0.618, which is the reciprocal of the golden ratio.
- Division method: Take the remainder of a division (i = k mod p), where p is chosen as a prime number to remove any bias of the key from the result and to achieve good scattering.

When no collisions are present, the hashed file organisation gives the best performance. The desired record is retrieved in a single access, as its position can be obtained directly by hashing the specified key value. Insertions are also simple, as the new record can be stored at the place given by the hashing method. Deletion can be handled by a flag.

However, collisions are common, and the hashed file must use some method to handle them. We must detect that a collision has occurred and, for insertion, find a new free position where the record can be stored. The strategy for this should be such that all records, including those which collided, can be efficiently located during retrieval.
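The two hashing methods above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the prime 97 and the sample keys are assumptions made for the example. The multiplicative version uses one common reading of the method, in which the fractional part of key × factor is scaled to m bits.

```python
import math
from collections import defaultdict

# Factor recommended in the text: (sqrt(5) - 1) / 2, about 0.618.
GOLDEN_FACTOR = (math.sqrt(5) - 1) / 2


def division_hash(key: int, p: int) -> int:
    """Division method: i = k mod p, with p chosen as a prime."""
    return key % p


def multiplicative_hash(key: int, m: int) -> int:
    """Multiplicative method (assumed variant): scale the fractional
    part of key * factor to an m-bit position."""
    fraction = (key * GOLDEN_FACTOR) % 1.0
    return int(fraction * (1 << m))


def find_collisions(keys, p):
    """Group keys by hashed position and keep positions hit more than once."""
    positions = defaultdict(list)
    for k in keys:
        positions[division_hash(k, p)].append(k)
    return {pos: ks for pos, ks in positions.items() if len(ks) > 1}


# Hypothetical keys: 10 and 107 both map to position 10 when p = 97.
print(find_collisions([10, 107, 55], 97))
```

Detecting which keys share a position, as `find_collisions` does, is the first half of collision handling; finding an alternative position for the colliding record is the second.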

There are many methods available for collision handling:
1. Chaining Method: All records colliding at position i are linked by pointers (the colliding records are stored in a separate area).
2. Open Addressing Method: An alternate place is found with respect to the hashed value i by using an appropriate increment (for some constant c).

The record being deleted is on a chain (logical or physical) of the collided records. This chain needs to be adjusted (not an easy task) before deleting the record. An easier way is to flag the record as deleted.

Hashed files, in general, give good performance. However, they do not facilitate sequential access, and only one hashed access can be set up.

Indexed Files

Although an indexed sequential file uses a sparse index and stores the index efficiently, it does not handle insertions/deletions efficiently, and it allows only one sparse index. Indexed files use a dense index, where the index has an entry for every key value. Many independent indexes can be created, which facilitate both sequential and random access on many fields (e.g., on both account number and customer name for the accounts file of a bank). Here the file records need not be physically stored in key sequence, which simplifies insertion. The index itself is stored in ascending order of key values to facilitate sequential retrieval of records in ascending order of keys. Being in sorted order, it can also be searched efficiently for random access.

For large data, the index itself becomes large. Since it is stored on disk, it must be organised so that it permits efficient search and updates without incurring high I/O cost.
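A dense index of the kind described for indexed files can be sketched in Python as below. The sketch is in-memory only, and the account records, field positions and function names are hypothetical. It shows records kept in arrival order while sorted indexes on two different fields support both random and sequential access.

```python
import bisect

# Hypothetical account records, stored in arrival order (not key order).
# Field 0 is the account number, field 1 is the customer name.
records = [
    (307, "Rao"),
    (101, "Mehta"),
    (205, "Iyer"),
]


def build_dense_index(records, field):
    """One (value, record position) entry per record, sorted by value."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, value):
    """Random access: binary search in the sorted index."""
    i = bisect.bisect_left(index, (value, -1))
    if i < len(index) and index[i][0] == value:
        return records[index[i][1]]
    return None


def scan_in_order(index, records):
    """Sequential access: follow the index entries in ascending order."""
    return [records[pos] for _, pos in index]


by_account = build_dense_index(records, 0)
by_name = build_dense_index(records, 1)
print(lookup(by_account, records, 205))
print(scan_in_order(by_name, records))
```

Inserting a record only appends it to the file and adds one entry to each index, which is why the records themselves need not be kept in key sequence.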

B-Tree: The B-tree is a practical and efficient method for organising indexes on external storage devices. Each level in the B-tree is like a level in the index, leading to a multilevel index. It effectively provides indexes to indexes from one level to another, until we reach the node leading to the desired record.

The B-tree data structure is defined as follows:
- an order m is associated with a B-tree
- the root node has at least 1 key value and 2 pointers
- all leaf nodes are at the same level
- all nodes other than the root have at least m/2 keys and m/2 + 1 pointers (the maximum number of keys is m)

Searching B-Tree for a given key value k:
- Start with the root node. If the node contains k, the search ends here.
- Otherwise, look for the two consecutive keys in the node between which k falls, and take the pointer between them to the node on the next level.
- Repeat the above process until k is found.

From the above we conclude that the maximum length of a search is equal to the height of the B-tree.

Insertion of new key value:
- Starting from the root node, locate the leaf node B into which k must be placed.
- If B is not full (has fewer than m keys), then k is added to B (maintaining the order of keys).

- If B is already full, adding k to it will give it m+1 keys. We now need to split B. This is done as follows:
  - get a free node B';
  - redistribute the m+1 keys between B and B', each having m/2 keys;
  - the middle key and a pointer to B' are inserted in the parent of B using the same procedure.

One may also note that a B-tree always grows upwards.

Deletion of the key: First ensure that the definition of the B-tree is preserved. The deletion of k is simple if it is in a leaf node. Otherwise, we replace it by the next higher key k1, which would be in a leaf, and delete k1 instead. While deleting, a leaf may become critical when the number of keys in it falls below m/2. In that case, either borrow key values from its sibling nodes, or merge it with them. Merging reduces the tree by 1 node. Merging may also propagate upwards and reduce the height of the tree.

The advantages of the B-tree for organising an index are:
- Usually the order (m) is quite large ( ); hence, the height is usually small.
- With buffering, most of the action takes place in main memory. In one experiment with m = 120, a file was created with 1,00,000 keys; 10 buffers were used for buffering nodes; it required only 22 reads and 857 writes to create the index.
- Space utilisation is good, as nodes are required by definition to be at least half-full; it can be further improved by modifying the definition.
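The search procedure described earlier for the B-tree can be sketched in Python as follows. The `Node` class, the `search` function and the small three-leaf tree are hypothetical and serve only as an illustration. Nodes are held in memory here, so the count of nodes visited stands in for disk reads.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # sorted key values
    children: list = field(default_factory=list)   # empty for a leaf


def search(root, k):
    """Return (found, nodes_visited) for key k, starting at the root."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, k)          # first key >= k
        if i < len(node.keys) and node.keys[i] == k:
            return True, visited
        if not node.children:                  # reached a leaf without a match
            return False, visited
        node = node.children[i]                # pointer between the two keys
    return False, visited


# Hypothetical tree of height 2; all leaves are at the same level.
root = Node(
    keys=[20, 40],
    children=[Node([5, 10]), Node([25, 30]), Node([45, 50])],
)

print(search(root, 30))   # (True, 2)
print(search(root, 42))   # (False, 2)
```

No search visits more nodes than the height of the tree, which is why a large order m keeps the number of reads small.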

Secondary Indexes

The salient features of secondary indexes are listed below:
- A secondary index is an index on a non-key field, which may not have unique values in the records.
- A file may have many secondary indexes to provide efficient access paths on many attributes independently.
- The index may be exhaustive or selective. In the former case, index entries are made for all values of the attribute; in the latter case, entries exist only for selected values of the attribute.
- As a value may occur in many records, a typical index entry consists of a value and a set of pointers to records. The size of an index entry varies with the size of the set. One may choose an appropriate method for storing such varying-length entries.
- Insertion and deletion of records in a file require modifying the index too: insertion requires a pointer to be added to the set, and deletion requires a pointer to be removed from the set.

Varying Length Records

When file records are of fixed length, it is easy to calculate the offset (i.e., relative position) of a field within the record and access the field value. When the fields are of varying length, we need to store the field lengths within the record as well, which makes access to field values difficult.

A varying length record contains a varying length field (e.g., employee name), or a varying number of occurrences of a field. The actual length or number of occurrences must be stored within the record, as illustrated below:

E# | Length of name | Employee name | Salary

In order to interpret such a record, we must know how many varying-length fields are present and how the lengths are stored. The contents of a record need to be scanned in order to locate a field value (such as salary).
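The record layout illustrated above (E#, length of name, employee name, salary) can be sketched with Python's `struct` module. The field widths are assumptions made for the example: a 4-byte employee number, a 1-byte name length (so names of at most 255 bytes) and a 4-byte salary. The function names are hypothetical.

```python
import struct

HEADER = ">IB"   # assumed: 4-byte E#, 1-byte length of name (big-endian)
SALARY = ">I"    # assumed: 4-byte salary


def pack_employee(emp_no: int, name: str, salary: int) -> bytes:
    """Build one varying-length record: E# | length | name | salary."""
    name_bytes = name.encode("utf-8")
    return (struct.pack(HEADER, emp_no, len(name_bytes))
            + name_bytes
            + struct.pack(SALARY, salary))


def unpack_employee(buf: bytes, offset: int = 0):
    """Decode the record at offset; return (fields, offset of next record)."""
    emp_no, name_len = struct.unpack_from(HEADER, buf, offset)
    name_start = offset + struct.calcsize(HEADER)
    name = buf[name_start:name_start + name_len].decode("utf-8")
    # The salary can only be located after the stored name length is read.
    salary_start = name_start + name_len
    (salary,) = struct.unpack_from(SALARY, buf, salary_start)
    return (emp_no, name, salary), salary_start + struct.calcsize(SALARY)


# Two hypothetical records of different lengths stored back to back.
block = pack_employee(7, "Asha", 52000) + pack_employee(8, "Venkataraman", 61000)
first, next_offset = unpack_employee(block)
second, _ = unpack_employee(block, next_offset)
print(first, second)
```

The position of the salary field, and of the next record, is known only after the stored name length has been read, which is the scanning cost mentioned above.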

A varying length record may be stored using different methods, as discussed below:
1. Reserved space: Here we allocate the maximum length required by a field, and use spaces/nulls to pad shorter values (e.g., shorter names are padded with blanks). Essentially, this corresponds to fixed-length fields. The wastage of space may be high here.
2. Using pointers: The varying field/array is stored in a separate area. The record contains a pointer to where the value is stored (and the length of the value). The record now becomes a fixed length record, facilitating efficient access to fields.
3. Combined method: The record contains space for an average length or an average number of occurrences. Additional characters in the value are stored in a separate area, whose pointer is placed in the main record. Here also, the record is of fixed length. In most cases, it may not be necessary to access the separate area.

Multi-attribute Queries

A multi-attribute query requires data to be retrieved on multiple fields in the record. The query specifies values (or ranges of values) for the desired fields. For example, the following query specifies two fields:

select NAME from STUDENT where HOSTEL = 5 and GAME = 'cricket'

Queries can be divided into different categories based on their results:
1. Exact-match query: It specifies a value for each selected field and uses the '=' (equality) operation.
2. Range query: It specifies a range for the selected attributes, or uses non-equality (e.g., <, >, etc.) operations.
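As an illustration, the example query above could be answered with two secondary indexes of the kind described earlier, where each index entry is a value together with a set of record pointers. The sketch below is in Python and assumes record positions act as pointers. The student data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical STUDENT file; list positions act as record pointers.
students = [
    {"NAME": "Anil", "HOSTEL": 5, "GAME": "cricket"},
    {"NAME": "Bina", "HOSTEL": 5, "GAME": "hockey"},
    {"NAME": "Chetan", "HOSTEL": 3, "GAME": "cricket"},
    {"NAME": "Divya", "HOSTEL": 5, "GAME": "cricket"},
]


def build_secondary_index(records, field_name):
    """Map each value of the field to the set of pointers of matching records."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        index[rec[field_name]].add(pos)
    return index


def exact_match_names(records, indexes, **conditions):
    """AND the pointer sets from the indexes, then fetch only the matches."""
    pointer_sets = [indexes[f].get(v, set()) for f, v in conditions.items()]
    if not pointer_sets:
        return [rec["NAME"] for rec in records]
    matching = set.intersection(*pointer_sets)
    return [records[pos]["NAME"] for pos in sorted(matching)]


indexes = {
    "HOSTEL": build_secondary_index(students, "HOSTEL"),
    "GAME": build_secondary_index(students, "GAME"),
}

# select NAME from STUDENT where HOSTEL = 5 and GAME = 'cricket'
print(exact_match_names(students, indexes, HOSTEL=5, GAME="cricket"))
```

The 'and' is evaluated on the pointer sets alone, so records that satisfy only one of the two conditions are never read from the file.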

The file may have indexes on some or all of the attributes used in the query. When more than one index is available, it should be possible to do the boolean operations (e.g., 'and' above) on the indexes themselves without actually retrieving records from the file. Although a number of innovative file organisations have been proposed to handle multi-attribute queries, they have limitations; for example, they fix an ordering for the attributes.

Comparison of File Organizations

We have studied many types of file organisations in this module. They create additional data structures for the file records to provide efficient access. Thus, each file organisation is associated with a storage cost and an access cost. Further, they use the direct access capabilities of the disk device to fine-tune their performance.

The file organisation is chosen based on the characteristics of the data and how they will be processed by the applications. The following data about the application is required for performance evaluation:
1. Volume of data (number of records for each record type).
2. Growth and volatility of data (e.g., the employee data may grow by 20 records in this file get updated in a year; deletions may be 5).
3. Pattern of usage of data by the applications; for each application, we should obtain:
- frequency of execution
- fields using which the records are accessed
- fields which are retrieved/updated
- hit-ratio (number of records accessed in each execution)
- sequence, if any, in which the records must be retrieved

Using the above data, we can determine the overall performance of a file organisation based on storage, access and update costs. In a DBMS environment, first a logical database design is made which gives a set of

normalized tables, which may be modified for improving performance. This step is sometimes called 'de-normalization'. The typical modifications include the following:
1. split a table vertically, so that the attributes commonly required (accessed/updated) are bundled together in different tables,
2. split a table horizontally, so that tuples are placed in different tables based on their usage by different applications (for example, an employee file may be split into two files: permanent and temporary employees),
3. merge two or more tables (by taking their natural join),
4. merge tuples from the same table by grouping them on a field,
5. introduce aggregate fields for ease of processing.

The above modifications must be made in view of the usage of data by the applications. There must be a good justification for every 'de-normalisation' action.

4.6 Summary

In this unit we have learnt about functional dependency, how to decompose relations, and normalisation and its steps in detail. Designing schemas and different approaches to their design have been discussed. Different storage devices and file systems have been covered in detail.

4.7 Self Understanding

1. Discuss the advantages and shortcomings of various storage devices.
2. Discuss the functions of file systems.
3. Differentiate between sequential files, indexed sequential files, hashed files and indexed files.
4. What do you understand by functional dependency? Discuss the inference rules for functional dependencies.

5. What do you understand by decomposition of relations?
6. Explain normalisation in detail, with an example.
7. Derive the normal form for the case of a consultant dealing with a database of students/candidates.
8. Derive the normal form for the case of a manufacturing department in an industry.


More information

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS 464 Spring 2003 Topic 23 Database

More information

File System Management

File System Management Lecture 7: Storage Management File System Management Contents Non volatile memory Tape, HDD, SSD Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation

More information

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications

More information

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques Database Systems Session 8 Main Theme Physical Database Design, Query Execution Concepts and Database Programming Techniques Dr. Jean-Claude Franchitti New York University Computer Science Department Courant

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length

More information

Theory of Relational Database Design and Normalization

Theory of Relational Database Design and Normalization Theory of Relational Database Design and Normalization (Based on Chapter 14 and some part of Chapter 15 in Fundamentals of Database Systems by Elmasri and Navathe) 1 Informal Design Guidelines for Relational

More information

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design Physical Database Design Process Physical Database Design Process The last stage of the database design process. A process of mapping the logical database structure developed in previous stages into internal

More information

Chapter 10 Functional Dependencies and Normalization for Relational Databases

Chapter 10 Functional Dependencies and Normalization for Relational Databases Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright 2004 Pearson Education, Inc. Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of

More information

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7 Storing : Disks and Files Chapter 7 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet base Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Disks and

More information

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1 COSC344 Database Theory and Applications Lecture 9 Normalisation COSC344 Lecture 9 1 Overview Last Lecture Functional Dependencies This Lecture Normalisation Introduction 1NF 2NF 3NF BCNF Source: Section

More information

CIS 631 Database Management Systems Sample Final Exam

CIS 631 Database Management Systems Sample Final Exam CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files

More information

2) What is the structure of an organization? Explain how IT support at different organizational levels.

2) What is the structure of an organization? Explain how IT support at different organizational levels. (PGDIT 01) Paper - I : BASICS OF INFORMATION TECHNOLOGY 1) What is an information technology? Why you need to know about IT. 2) What is the structure of an organization? Explain how IT support at different

More information

Functional Dependency and Normalization for Relational Databases

Functional Dependency and Normalization for Relational Databases Functional Dependency and Normalization for Relational Databases Introduction: Relational database design ultimately produces a set of relations. The implicit goals of the design activity are: information

More information

Chapter 2 Data Storage

Chapter 2 Data Storage Chapter 2 22 CHAPTER 2. DATA STORAGE 2.1. THE MEMORY HIERARCHY 23 26 CHAPTER 2. DATA STORAGE main memory, yet is essentially random-access, with relatively small differences Figure 2.4: A typical

More information

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ...

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ... Outline Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Hardware: Disks Access Times Example -

More information

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to: 14 Databases 14.1 Source: Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define a database and a database management system (DBMS)

More information

Storage and File Structure

Storage and File Structure Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files

More information

Normalization in OODB Design

Normalization in OODB Design Normalization in OODB Design Byung S. Lee Graduate Programs in Software University of St. Thomas St. Paul, Minnesota [email protected] Abstract When we design an object-oriented database schema, we need

More information

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina Normalisation to 3NF Database Systems Lecture 11 Natasha Alechina In This Lecture Normalisation to 3NF Data redundancy Functional dependencies Normal forms First, Second, and Third Normal Forms For more

More information

6. Storage and File Structures

6. Storage and File Structures ECS-165A WQ 11 110 6. Storage and File Structures Goals Understand the basic concepts underlying different storage media, buffer management, files structures, and organization of records in files. Contents

More information

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES 1 Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES INFORMAL DESIGN GUIDELINES FOR RELATION SCHEMAS We discuss four informal measures of quality for relation schema design in

More information

Schema Refinement and Normalization

Schema Refinement and Normalization Schema Refinement and Normalization Module 5, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational

More information

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management COMP 242 Class Notes Section 6: File Management 1 File Management We shall now examine how an operating system provides file management. We shall define a file to be a collection of permanent data with

More information

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Schema Design and Normal Forms Sid Name Level Rating Wage Hours Entity-Relationship Diagram Schema Design and Sid Name Level Rating Wage Hours Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Database Management Systems, 2 nd Edition. R. Ramakrishnan

More information

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Chapter 13: Disk Storage, Basic File Structures, and Hashing 1 CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Answers to Selected Exercises 13.23 Consider a disk with the following characteristics

More information

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations Normalisation Why normalise? To improve (simplify) database design in order to Avoid update problems Avoid redundancy Simplify update operations 1 Example ( the practical difference between a first normal

More information

BCA. Database Management System

BCA. Database Management System BCA IV Sem Database Management System Multiple choice questions 1. A Database Management System (DBMS) is A. Collection of interrelated data B. Collection of programs to access data C. Collection of data

More information

DATABASE SYSTEMS. Chapter 7 Normalisation

DATABASE SYSTEMS. Chapter 7 Normalisation DATABASE SYSTEMS DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT Chapter 7 Normalisation 1 (Rob, Coronel & Crockett 978184480731) In this chapter, you will learn: What normalization

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files (From Chapter 9 of textbook) Storing and Retrieving Data Database Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve

More information

Chapter 12 File Management

Chapter 12 File Management Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access

More information

Chapter 12 File Management. Roadmap

Chapter 12 File Management. Roadmap Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Overview Roadmap File organisation and Access

More information

Relational Database Design Theory

Relational Database Design Theory Relational Database Design Theory Informal guidelines for good relational designs Functional dependencies Normal forms and normalization 1NF, 2NF, 3NF BCNF, 4NF, 5NF Inference rules on functional dependencies

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

www.gr8ambitionz.com

www.gr8ambitionz.com Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1

More information

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems 1 Any modern computer system will incorporate (at least) two levels of storage: primary storage: typical capacity cost per MB $3. typical access time burst transfer rate?? secondary storage: typical capacity

More information

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University CS 377 Database Systems Database Design Theory and Normalization Li Xiong Department of Mathematics and Computer Science Emory University 1 Relational database design So far Conceptual database design

More information

Normalisation 6 TABLE OF CONTENTS LEARNING OUTCOMES

Normalisation 6 TABLE OF CONTENTS LEARNING OUTCOMES Topic Normalisation 6 LEARNING OUTCOMES When you have completed this Topic you should be able to: 1. Discuss importance of the normalisation in the database design. 2. Discuss the problems related to data

More information

CS143 Notes: Normalization Theory

CS143 Notes: Normalization Theory CS143 Notes: Normalization Theory Book Chapters (4th) Chapters 7.1-6, 7.8, 7.10 (5th) Chapters 7.1-6, 7.8 (6th) Chapters 8.1-6, 8.8 INTRODUCTION Main question How do we design good tables for a relational

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Data storage Tree indexes

Data storage Tree indexes Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating

More information

Database 2 Lecture I. Alessandro Artale

Database 2 Lecture I. Alessandro Artale Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 [email protected] http://www.inf.unibz.it/

More information

DATABASE NORMALIZATION

DATABASE NORMALIZATION DATABASE NORMALIZATION Normalization: process of efficiently organizing data in the DB. RELATIONS (attributes grouped together) Accurate representation of data, relationships and constraints. Goal: - Eliminate

More information

CSCI-GA.2433-001 Database Systems Lecture 7: Schema Refinement and Normalization

CSCI-GA.2433-001 Database Systems Lecture 7: Schema Refinement and Normalization CSCI-GA.2433-001 Database Systems Lecture 7: Schema Refinement and Normalization Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com View 1 View 2 View 3 Conceptual Schema At that point we

More information

4.2: Multimedia File Systems Traditional File Systems. Multimedia File Systems. Multimedia File Systems. Disk Scheduling

4.2: Multimedia File Systems Traditional File Systems. Multimedia File Systems. Multimedia File Systems. Disk Scheduling Chapter 2: Representation of Multimedia Data Chapter 3: Multimedia Systems Communication Aspects and Services Chapter 4: Multimedia Systems Storage Aspects Optical Storage Media Multimedia File Systems

More information

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms.

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms. Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS

More information

2. Basic Relational Data Model

2. Basic Relational Data Model 2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that

More information

DATABASE MANAGEMENT SYSTEMS. Question Bank:

DATABASE MANAGEMENT SYSTEMS. Question Bank: DATABASE MANAGEMENT SYSTEMS Question Bank: UNIT 1 1. Define Database? 2. What is a DBMS? 3. What is the need for database systems? 4. Define tupule? 5. What are the responsibilities of DBA? 6. Define schema?

More information

Optimizing Performance. Training Division New Delhi

Optimizing Performance. Training Division New Delhi Optimizing Performance Training Division New Delhi Performance tuning : Goals Minimize the response time for each query Maximize the throughput of the entire database server by minimizing network traffic,

More information

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -

More information

Outline. mass storage hash functions. logical key values nested tables. storing information between executions using DBM files

Outline. mass storage hash functions. logical key values nested tables. storing information between executions using DBM files Outline 1 Files and Databases mass storage hash functions 2 Dictionaries logical key values nested tables 3 Persistent Data storing information between executions using DBM files 4 Rule Based Programming

More information

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè. CMPT-354-Han-95.3 Lecture Notes September 10, 1995 Chapter 1 Introduction 1.0 Database Management Systems 1. A database management system èdbmsè, or simply a database system èdbsè, consists of æ A collection

More information

Normal forms and normalization

Normal forms and normalization Normal forms and normalization An example of normalization using normal forms We assume we have an enterprise that buys products from different supplying companies, and we would like to keep track of our

More information

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design Chapter 7: Relational Database Design Database System Concepts, 5th Ed. See www.db book.com for conditions on re use Chapter 7: Relational Database Design Features of Good Relational Design Atomic Domains

More information

Physical Database Design and Tuning

Physical Database Design and Tuning Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

Physical DB design and tuning: outline

Physical DB design and tuning: outline Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning

More information

Chapter 8. Database Design II: Relational Normalization Theory

Chapter 8. Database Design II: Relational Normalization Theory Chapter 8 Database Design II: Relational Normalization Theory The E-R approach is a good way to start dealing with the complexity of modeling a real-world enterprise. However, it is only a set of guidelines

More information

SQL Server. 1. What is RDBMS?

SQL Server. 1. What is RDBMS? SQL Server 1. What is RDBMS? Relational Data Base Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained

More information

1. Physical Database Design in Relational Databases (1)

1. Physical Database Design in Relational Databases (1) Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel I/O (I) I/O basics Fall 2012 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network card 1 Network card

More information

Recovery System C H A P T E R16. Practice Exercises

Recovery System C H A P T E R16. Practice Exercises C H A P T E R16 Recovery System Practice Exercises 16.1 Explain why log records for transactions on the undo-list must be processed in reverse order, whereas redo is performed in a forward direction. Answer:

More information

Mass Storage Structure

Mass Storage Structure Mass Storage Structure 12 CHAPTER Practice Exercises 12.1 The accelerating seek described in Exercise 12.3 is typical of hard-disk drives. By contrast, floppy disks (and many hard disks manufactured before

More information

KNOWLEDGE FACTORING USING NORMALIZATION THEORY

KNOWLEDGE FACTORING USING NORMALIZATION THEORY KNOWLEDGE FACTORING USING NORMALIZATION THEORY J. VANTHIENEN M. SNOECK Katholieke Universiteit Leuven Department of Applied Economic Sciences Dekenstraat 2, 3000 Leuven (Belgium) tel. (+32) 16 28 58 09

More information