1.8 Database and data modeling
Show understanding of the limitations of using a file based approach for the storage and retrieval of large volumes of data
Information is vital to organizations. Often one of the most valuable resources in a business is its accumulated information. The problem is the storage, retrieval and manipulation of all this information.
Bikash Agrawal
The way in which computers manage information has come a long way over the last few decades. Today's users take for granted the many benefits found in a database system. However, it wasn't that long ago that computers relied on a much less elegant and more costly approach to data management called the file based system.
File based approach
One way to keep information on a computer is to store it in permanent files. A company system has a number of application programs, each of which is designed to manipulate data files. An organization's data was duplicated in separate files for the use of individual departments. For example, the personnel department would hold details of the name, address, qualifications etc. of each employee, while the payroll department would hold details of the name, address and salary of each employee. Each department had its own set of application programs to process the data in these files. The system just described is called the file based system.
Consider a traditional banking system that uses the file based system to manage the organization's data, shown in Figure 1.1. As we can see, there are different departments in the bank. Each has its own applications that manage and manipulate different data files. For banking systems, the programs may be used to debit or credit an account, find the balance of an account, add a new mortgage loan and generate monthly statements.
Figure 1.1. Example of a file based system used by banks to manage data.
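A single departmental program of the kind shown in Figure 1.1 might look like the Python sketch below. It is a minimal sketch, assuming the savings department keeps its accounts in a CSV file of its own; the file name, the field names and the functions `credit`, `debit` and `find_balance` are hypothetical. Because there is no DBMS, the program itself must know the exact file layout, and it rewrites the whole file on every change.

```python
import csv
import os
import tempfile

# Hypothetical record layout known only to the savings department's programs.
FIELDS = ["account_no", "name", "address", "balance"]


def load(path):
    """Read every account record from the department's own file."""
    with open(path, newline="") as f:
        return {row["account_no"]: row for row in csv.DictReader(f)}


def save(path, accounts):
    """Rewrite the whole file: there is no DBMS to update a single record."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(accounts.values())


def credit(path, account_no, amount):
    accounts = load(path)
    record = accounts[account_no]
    record["balance"] = str(int(record["balance"]) + amount)
    save(path, accounts)


def debit(path, account_no, amount):
    credit(path, account_no, -amount)


def find_balance(path, account_no):
    return int(load(path)[account_no]["balance"])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "savings.csv")
    save(path, {"S001": {"account_no": "S001", "name": "A. Khan",
                         "address": "12 Hill Rd", "balance": "500"}})
    credit(path, "S001", 200)
    debit(path, "S001", 50)
    print(find_balance(path, "S001"))  # 650
```

The mortgage department would need its own, separately written file and programs, which is where the duplication described next comes from.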
Disadvantages of the file based approach
Using the file based system to keep organizational information has a number of disadvantages. Listed below are five examples.
Data redundancy and inconsistency
Often, within an organization, files and applications are created by different programmers from various departments over long periods of time. This can lead to data redundancy and inconsistency. For example, a customer can have a savings account as well as a mortgage loan. Here, the customer details may be duplicated, since the programs for the two functions store their data in two different data files. This gives rise to redundancy in the customer's data. Since the same data is stored in two files, inconsistency arises if a change made to the data in one file is not reflected in the other.
Data isolation
In a file based system, data are scattered across many separate files, which may be in different formats, so the data in each file are isolated from the data in the others. This is a problem because it is difficult for new applications to retrieve the appropriate data, which might be stored in various files.
Integrity problems
Problems with data integrity are another disadvantage of using a file based system. Data integrity refers to the maintenance and assurance that the data in a database are correct and consistent. Factors to consider when addressing this issue are:
Data values must satisfy certain consistency constraints that are specified in the application programs.
It is difficult to make changes to the application programs in order to enforce new constraints.
For example, in the savings bank application, one such integrity rule could be 'Customer ID, which is the unique identifier for a customer record, should not be empty'. There can be several such integrity rules. In a file based system, all these rules need to be explicitly programmed in the application program.
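The redundancy, inconsistency and integrity problems above can be sketched in Python, with each department's data file modelled as an in-memory list of records. This is a minimal sketch; the record layout, the customer `C17` and all function names are hypothetical. Updating the address through the savings program leaves the mortgage copy unchanged, and the "Customer ID should not be empty" rule holds only in the program whose author remembered to code it.

```python
# Hypothetical departmental data files, modelled as in-memory lists of records.
# The savings program and the mortgage program each keep their own copy of
# the customer's name and address (data redundancy).
savings_file = [{"customer_id": "C17", "name": "R. Shah", "address": "4 Park St", "balance": 900}]
mortgage_file = [{"customer_id": "C17", "name": "R. Shah", "address": "4 Park St", "loan": 120000}]


def savings_change_address(customer_id, new_address):
    """The savings program can only update its own file."""
    for record in savings_file:
        if record["customer_id"] == customer_id:
            record["address"] = new_address


def savings_add_customer(record):
    """Integrity rule hand-coded into this one application program."""
    if not record.get("customer_id"):
        raise ValueError("Customer ID must not be empty")
    savings_file.append(record)


def mortgage_add_customer(record):
    """A different programmer wrote this program and left the rule out."""
    mortgage_file.append(record)


savings_change_address("C17", "9 River Rd")
print(savings_file[0]["address"], "|", mortgage_file[0]["address"])  # 9 River Rd | 4 Park St

try:
    savings_add_customer({"customer_id": "", "name": "No ID", "address": "?", "balance": 0})
except ValueError as err:
    print("savings program rejected:", err)

mortgage_add_customer({"customer_id": "", "name": "No ID", "address": "?", "loan": 5000})
print(len(mortgage_file))  # 2: the empty Customer ID slipped through
```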
Security problems
Security can be a problem with a file based approach because:
There are constraints regarding access privileges.
Application requirements are added to the system in an ad hoc manner, so it is difficult to enforce these constraints.
For example, in a banking system, payroll personnel need to view only the part of the database that holds information about the various bank employees. They do not need access to information about customer accounts. Since application programs are added to the system in an ad hoc manner, it is difficult to enforce such security constraints.
Concurrent access
Concurrency is the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing. In a file based system, concurrency must be managed, or prevented, by the application programs. Typically, in a file based system, when an application opens a file, that file is locked. This means that no one else has access to the file at the same time. In database systems, concurrency is managed by the DBMS, thus allowing multiple users access to the same record. This is an important difference between database and file based systems.
Describe the features of a relational database which address the limitations of a file based approach
The solution to many of these problems with flat files was the arrival of the relational database system. The data are stored in tables which have relationships between them. Each table stores data about an entity, i.e. something about which data are stored, for example a customer or a product. Each table has a primary key field, whose value uniquely identifies each record in that table. The table can be viewed just like a spreadsheet grid, so one row in the table is one record.
The practical design of relational databases is based on the theory developed around 1970 by Ted Codd. The theory calls the entities relations, and they are implemented as tables. Each record in the table is called a tuple (also known as a row). Each data item (field) is known as an attribute (or a column).
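A relational table with a primary key can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, assuming SQLite as the DBMS; the `Customer` table and its columns are hypothetical. It also shows the DBMS, rather than each application program, enforcing the rule "Customer ID should not be empty" from the integrity example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Customer entity: one table, CustomerID as primary key.
# NOT NULL and CHECK move the "Customer ID must not be empty" rule out of
# the application programs and into the database itself.
conn.execute("""
    CREATE TABLE Customer (
        CustomerID TEXT PRIMARY KEY NOT NULL CHECK (CustomerID <> ''),
        Name       TEXT NOT NULL,
        Address    TEXT
    )
""")
conn.execute("INSERT INTO Customer VALUES ('C17', 'R. Shah', '4 Park St')")

# Each row is a tuple; CustomerID, Name and Address are attributes (columns).
for row in conn.execute("SELECT CustomerID, Name, Address FROM Customer"):
    print(row)  # ('C17', 'R. Shah', '4 Park St')

# A duplicate key and an empty key are both refused by the DBMS.
for bad in [("C17", "Duplicate", "x"), ("", "Empty ID", "x")]:
    try:
        conn.execute("INSERT INTO Customer VALUES (?, ?, ?)", bad)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Because the constraint lives in the table definition, every program that inserts customers is bound by it, however it was written.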
Records in one table can be related to records in other tables by having common fields. So, the problem of duplicated customer details in the saving account and mortgage loan files can be solved by giving the mortgage loan table a field that simply contains the key of the saving account. The likely data design here would be:
The saving account table has the primary key AccHolderID.
The mortgage loan table also has an AccHolderID field.
The AccHolderID field in the mortgage loan table is a foreign key.
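The foreign key design above can be sketched in SQL through Python's built-in `sqlite3` module. This is a minimal sketch, assuming SQLite as the DBMS; only `AccHolderID` comes from the text, while `LoanID`, `Name`, `Address`, `Balance` and `Amount` are hypothetical columns added for illustration. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# AccHolderID is from the text; the other columns are hypothetical.
conn.executescript("""
    CREATE TABLE SavingAccount (
        AccHolderID TEXT PRIMARY KEY NOT NULL,
        Name        TEXT NOT NULL,
        Address     TEXT,
        Balance     INTEGER
    );
    CREATE TABLE MortgageLoan (
        LoanID      TEXT PRIMARY KEY NOT NULL,
        AccHolderID TEXT NOT NULL REFERENCES SavingAccount(AccHolderID),
        Amount      INTEGER
    );
""")
conn.execute("INSERT INTO SavingAccount VALUES ('A01', 'R. Shah', '4 Park St', 900)")
conn.execute("INSERT INTO MortgageLoan VALUES ('L500', 'A01', 120000)")

# Customer details are stored once; the loan reaches them through the foreign key.
query = """
    SELECT m.LoanID, s.Name, s.Address, m.Amount
    FROM MortgageLoan AS m
    JOIN SavingAccount AS s ON s.AccHolderID = m.AccHolderID
"""
print(conn.execute(query).fetchall())  # [('L500', 'R. Shah', '4 Park St', 120000)]

# Changing the address in one place is seen by every query that uses it.
conn.execute("UPDATE SavingAccount SET Address = '9 River Rd' WHERE AccHolderID = 'A01'")
print(conn.execute(query).fetchall())  # [('L500', 'R. Shah', '9 River Rd', 120000)]

# A loan for an account holder who does not exist is rejected.
try:
    conn.execute("INSERT INTO MortgageLoan VALUES ('L501', 'ZZZ', 5000)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```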
Here, the differing needs of the departments are met by the software that is used to control the data. As all the data are stored somewhere in the system, a department only needs software that can search for it. In this way, each department does not need its own set of data, simply its own view of the centralized database, to which all users have access.
Advantages of RDBMS over flat file approach
Data are contained in a single software application: the relational database software.
Duplication of data is minimized, and so the chance of data inconsistency is reduced. As long as there is a link to the table storing the data, the data can always be accessed via the link rather than being repeated.
Because data duplication is minimized, the volume of data is reduced, leading to faster searching and sorting of data.
Show understanding of the features provided by a DBMS to address the issues of: data management, including maintaining a data dictionary
The data dictionary contains information about the database itself (metadata) rather than the data stored in it. This information enables the DBA to keep tight control over all aspects of the database and facilitates maintenance. The dictionary could contain:
The detailed description of each data item
The relationships between data items
Access rights for users and groups
Validation rules
The map between the logical and the physical view for storage purposes
Data recovery procedures
A transaction log to monitor the users, programs and data.
data modeling
When large databases are designed, it has become common practice to use diagrams. These are often referred to as data models. Each model has a number of key elements:
Entities
Attributes
Relationships: these provide the links between the entities.
For example, Country, Capital, City, Student and Course are entities, and "Students take Courses" is a relationship between two of them.
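The data dictionary and the departmental views described above can be illustrated with SQLite, whose built-in catalog table `sqlite_master` and `PRAGMA table_info` expose part of this metadata. This is a minimal sketch, assuming SQLite as the DBMS; the `Employee` table and the `PersonnelView` and `PayrollView` names are hypothetical. A full data dictionary (access rights, recovery procedures, transaction log) holds far more than SQLite's catalog does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmpID          TEXT PRIMARY KEY NOT NULL,
        Name           TEXT NOT NULL,
        Address        TEXT,
        Qualifications TEXT,
        Salary         INTEGER CHECK (Salary >= 0)
    );
    -- Hypothetical views: each department sees only its part of the one table.
    CREATE VIEW PersonnelView AS SELECT EmpID, Name, Address, Qualifications FROM Employee;
    CREATE VIEW PayrollView   AS SELECT EmpID, Name, Address, Salary FROM Employee;
""")

# sqlite_master is SQLite's catalog: one row per table, view, index or trigger.
for obj_type, name in conn.execute(
        "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"):
    print(obj_type, name)

# PRAGMA table_info describes each data item: name, type, NOT NULL flag, primary key.
for cid, name, col_type, notnull, dflt, pk in conn.execute("PRAGMA table_info(Employee)"):
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```

Payroll staff given access only to `PayrollView` never see qualifications, and personnel staff given only `PersonnelView` never see salaries, while both read the same stored rows.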
Entity relationship diagrams (ER diagrams)
These diagrams are graphical representations of the structure of data. ER diagrams allow the analyst to think about and model general relationships. There are three possible types of relationship linking the entities:
One-to-one relationship: each country has one capital city.
One-to-many relationship: each country has many cities.
Many-to-many relationship: each student takes many courses, and the same course is taken by many students.
logical schema
Databases are characterized by a three-schema architecture because there are three different ways to look at them. Each schema is important to different groups in an organization. The graphic below illustrates this architecture and the groups most involved with each schema.
Logical Schema
A database's logical schema is its overall logical plan. This schema is developed with diagrams that define the content of database tables and describe how the tables are linked together for data access. Database designers are responsible for creating the logical schema. Application developers and database administrators may find the logical schema useful for performing certain tasks.
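The three relationship types in the ER diagram examples can be turned into a logical schema of linked tables. The sketch below uses Python's built-in `sqlite3` module and is only one reasonable design, assuming SQLite as the DBMS; all table and column names, including the link table `StudentCourse`, are hypothetical. One-to-one is enforced with a `UNIQUE` foreign key, one-to-many with a plain foreign key, and many-to-many by resolving it into a link table with a composite primary key, a common technique rather than the only possible one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Country (
        CountryID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    -- One-to-one: UNIQUE on the foreign key allows each country only one capital.
    CREATE TABLE Capital (
        CapitalID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        CountryID INTEGER NOT NULL UNIQUE REFERENCES Country(CountryID)
    );
    -- One-to-many: many City rows may hold the same CountryID.
    CREATE TABLE City (
        CityID    INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        CountryID INTEGER NOT NULL REFERENCES Country(CountryID)
    );
    -- Many-to-many: resolved by a hypothetical link table whose composite
    -- primary key is made of two foreign keys.
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
    CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    CREATE TABLE StudentCourse (
        StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
        CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );

    INSERT INTO Country VALUES (1, 'Nepal');
    INSERT INTO Capital VALUES (1, 'Kathmandu', 1);
    INSERT INTO City    VALUES (1, 'Pokhara', 1), (2, 'Biratnagar', 1);
    INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO Course  VALUES (10, 'Databases'), (11, 'Networks');
    INSERT INTO StudentCourse VALUES (1, 10), (1, 11), (2, 10);
""")

# A second capital for the same country breaks the one-to-one rule.
try:
    conn.execute("INSERT INTO Capital VALUES (2, 'Second Capital', 1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Student 1 takes two courses; course 10 is taken by two students.
print(conn.execute("SELECT COUNT(*) FROM StudentCourse WHERE StudentID = 1").fetchone()[0])  # 2
print(conn.execute("SELECT COUNT(*) FROM StudentCourse WHERE CourseID = 10").fetchone()[0])  # 2
```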