Topic 1.5: Problems with File System Data Management

Every task requires extensive programming in a third-generation language (3GL)
- The programmer must specify both what must be done and how it is to be done, which requires considerable programming effort. Figure 1.5.1 shows an excerpt from a sample 3GL COBOL program.

Modern databases use a fourth-generation language (4GL)
- A 4GL allows the user to specify what must be done without specifying how it is to be done, which requires less programming effort.

    SELECT CUSTOMER-DATA-FILE
        ASSIGN TO "C:\DATAFILES\CUSTOMER.DAT"
        ORGANIZATION IS RECORD SEQUENTIAL.
    DATA DIVISION.
    FILE SECTION.
    FD  CUSTOMER-DATA-FILE
        LABEL RECORDS ARE STANDARD.
    01  CUST-DATA-REC.
   *> Customer fields
        05  C_NAME      PIC X(20).
        05  C_PHONE     PIC X(10).
        05  C_ADDRESS   PIC X(30).
        05  C_ZIP       PIC X(5).
   *> Agent fields
        05  A_NAME      PIC X(20).
        05  A_PHONE     PIC X(10).
        05  TP          PIC X(2).
   *> Three integer digits and two implied decimal places
        05  AMT         PIC 9(3)V99.
        05  REN         PIC X(11).

Figure 1.5.1 Sample 3GL COBOL Program

Programming in a 3GL
- Time-consuming, high-level activity
- The programmer must be familiar with the physical file structure
- As the system becomes complex, access paths become difficult to manage and tend to produce malfunctions
- Complex coding establishes the precise location of files and system components, as well as the data characteristics
- Ad hoc queries are impossible
- Writing programs to design new reports is time-consuming
- As the number of files increases, system administration becomes difficult
- Making changes in an existing file structure is difficult
- File structure changes require modifications in all programs that use data in that file
- Modifications are likely to produce errors, requiring additional time to debug the program
- Security features are hard to program and are therefore often omitted
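The contrast between specifying how (3GL) and specifying what (4GL) can be sketched in a few lines. The example below is only an illustration, written in Python rather than COBOL: the field widths are taken from CUST-DATA-REC in Figure 1.5.1, but the sample rows, the function names, and the use of an in-memory SQLite table named CUSTOMER are assumptions made for this sketch and are not part of the original program.

```python
import sqlite3

# Field names and widths copied from CUST-DATA-REC in Figure 1.5.1
# (AMT is PIC 9(3)V99: five digits with an implied decimal point).
LAYOUT = [
    ("C_NAME", 20), ("C_PHONE", 10), ("C_ADDRESS", 30), ("C_ZIP", 5),
    ("A_NAME", 20), ("A_PHONE", 10), ("TP", 2), ("AMT", 5), ("REN", 11),
]
RECORD_LENGTH = sum(width for _, width in LAYOUT)

# Made-up sample rows, for illustration only.
SAMPLE = [
    dict(C_NAME="Customer One", C_PHONE="5550000001", C_ADDRESS="1 First St",
         C_ZIP="11111", A_NAME="Agent A", A_PHONE="5550000100",
         TP="T1", AMT="10000", REN="01-JAN-2025"),
    dict(C_NAME="Customer Two", C_PHONE="5550000002", C_ADDRESS="2 Second St",
         C_ZIP="22222", A_NAME="Agent A", A_PHONE="5550000100",
         TP="T2", AMT="25000", REN="01-FEB-2025"),
    dict(C_NAME="Customer Three", C_PHONE="5550000003", C_ADDRESS="3 Third St",
         C_ZIP="11111", A_NAME="Agent B", A_PHONE="5550000200",
         TP="T1", AMT="15000", REN="01-MAR-2025"),
]


def make_record(values):
    """Pad each value to its fixed width, as a flat-file writer would."""
    return "".join(str(values[name]).ljust(width)[:width] for name, width in LAYOUT)


def parse_record(line):
    """3GL style: the program itself walks the physical record layout."""
    record, pos = {}, 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


def customers_in_zip_procedural(lines, zip_code):
    """Says HOW: read every record, slice its fields, test, collect."""
    names = []
    for line in lines:
        rec = parse_record(line)
        if rec["C_ZIP"] == zip_code:
            names.append(rec["C_NAME"])
    return names


def build_db(rows):
    """Load the same rows into an in-memory SQLite table named CUSTOMER."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(name for name, _ in LAYOUT)
    marks = ", ".join("?" for _ in LAYOUT)
    conn.execute(f"CREATE TABLE CUSTOMER ({columns})")
    conn.executemany(
        f"INSERT INTO CUSTOMER VALUES ({marks})",
        [tuple(row[name] for name, _ in LAYOUT) for row in rows],
    )
    return conn


def customers_in_zip_declarative(conn, zip_code):
    """Says WHAT: the DBMS decides how to locate the matching rows."""
    rows = conn.execute(
        "SELECT C_NAME FROM CUSTOMER WHERE C_ZIP = ? ORDER BY C_NAME", (zip_code,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    flat_file = [make_record(row) for row in SAMPLE]
    print(customers_in_zip_procedural(flat_file, "11111"))
    print(customers_in_zip_declarative(build_db(SAMPLE), "11111"))
```

In the procedural version, the record layout is part of the program, so any change to a field width means editing and retesting the code. In the declarative version, the query names only the columns it needs, and a new ad hoc question is a new SELECT statement rather than a new program.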
Structural and Data Dependence

Structural dependence
- Access to a file depends on its structure. If the file structure changes, every program that accesses the file must be changed before it can access the file successfully. The reason is that the file structure is coded within the program; see the sample COBOL program in Figure 1.5.1. If the file structure were not coded in the body of the program, the program would be independent of file structure changes.

Data dependence
- Changes in a file's data characteristics (a field's data type or length) affect a program's ability to access data in that file. If any of a file's data characteristics change, every program that accesses the file's data must be changed before it can access the data successfully. The reason is that the file's data characteristics are coded within the program; see the sample COBOL program in Figure 1.5.1. If the data characteristics were not coded in the body of the program, the program would be independent of changes in the data characteristics.

Logical data format
- How a human being views the data

Physical data format
- How the computer sees the data

Field Definitions and Naming Conventions
- A flexible record definition anticipates reporting requirements by breaking up fields into their component parts. For example, if the city and country are embedded in the address field, it is harder to generate reports by city or by country. It is therefore better to separate the City and Country fields from the Address field; see Table 1.2 below.
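The benefit of splitting the address into component fields can be illustrated with a small sketch. Everything in it is hypothetical: the Customer class, the function names, and the place names are invented for the illustration and do not come from the customer file discussed in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Customer:
    """Flexible record: city and country are separate fields."""
    name: str
    street: str
    city: str
    country: str


def customers_per_city(customers):
    """With a separate City field, a report by city is a simple count."""
    return Counter(c.city for c in customers)


def city_from_combined(address):
    """Fragile guess for a single combined address field.

    Assumes the form 'street, city, country' with exactly two commas and
    returns None for anything else, because the city cannot be located
    reliably.
    """
    parts = [p.strip() for p in address.split(",")]
    return parts[1] if len(parts) == 3 else None


# Invented sample data, for illustration only.
CUSTOMERS = [
    Customer("Customer One", "1 First St", "Riverton", "Freedonia"),
    Customer("Customer Two", "Suite 4, 9 Oak Ave", "Riverton", "Freedonia"),
    Customer("Customer Three", "3 Third St", "Lakeside", "Freedonia"),
]

if __name__ == "__main__":
    print(customers_per_city(CUSTOMERS))
    # The same data held in one combined field defeats the simple parser
    # as soon as the street itself contains a comma.
    for c in CUSTOMERS:
        combined = f"{c.street}, {c.city}, {c.country}"
        print(combined, "->", city_from_combined(combined))
```

When the city is its own field, the report needs no knowledge of how addresses are written. With a combined field, every report has to embed parsing rules that break on unusual entries.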
Sample Customer File Fields

Data Redundancy
- Data redundancy results in data inconsistency
- Data inconsistency happens when different and conflicting versions of the same data appear in different places in the same file or in multiple files
- Errors are more likely to occur when complex entries are made in several different files and recur frequently in one or more files
- Data anomalies develop when required changes in redundant data are not made successfully

Data Anomalies

Modification anomalies
- Occur when changes must be made to existing records. For example (see Figure 1.3 below), if agent Leah F. Hahn has a new phone number, that number must be entered in each CUSTOMER file record in which Ms. Hahn's phone number appears. In this case only three records must be changed. In a large-scale system, however, such a change might affect hundreds or even thousands of records, so the potential for data inconsistencies in the file system is great.

Insertion anomalies
- Occur when entering new records. For example, to add a new customer record to the CUSTOMER file shown in Figure 1.3 below, we must also add the corresponding agent data. If we add several hundred new customer records, we must also enter several hundred agent names and telephone numbers.

Deletion anomalies
- Occur when deleting records. For example, if agent Alex B. Alby quits and is deleted from the payroll, all the customers associated with him in the CUSTOMER file shown in Figure 1.3 below will refer to a nonexistent agent. To resolve this problem, every customer record in which Alex B. Alby's name and phone number appear must be modified.

As these data anomalies show, the potential for data inconsistencies in the file system is great.

Database vs. File System
- The problems inherent in file systems, described above, make using a database system desirable.
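The modification anomaly described above can be made concrete with a short sketch. The agent names come from the text; the customer names, the phone numbers, and the function names are invented for the illustration, and the dictionary-based layout is only an assumption standing in for the CUSTOMER file and for a database that stores agent data once.

```python
import copy

# Flat CUSTOMER file: the agent's phone number is repeated in every record.
# Customer names and phone numbers are made up for this illustration.
FLAT_CUSTOMER_FILE = [
    {"customer": "Customer A", "agent_name": "Leah F. Hahn", "agent_phone": "555-0101"},
    {"customer": "Customer B", "agent_name": "Leah F. Hahn", "agent_phone": "555-0101"},
    {"customer": "Customer C", "agent_name": "Leah F. Hahn", "agent_phone": "555-0101"},
    {"customer": "Customer D", "agent_name": "Alex B. Alby", "agent_phone": "555-0102"},
]


def update_agent_phone_flat(records, agent, new_phone):
    """File-system style: every redundant copy must be rewritten."""
    changed = 0
    for rec in records:
        if rec["agent_name"] == agent:
            rec["agent_phone"] = new_phone
            changed += 1
    return changed


def phones_on_file(records, agent):
    """Distinct numbers stored for one agent; more than one means inconsistency."""
    return {rec["agent_phone"] for rec in records if rec["agent_name"] == agent}


def normalize(records):
    """Database style: agent data is stored once and customers refer to it."""
    agents = {rec["agent_name"]: rec["agent_phone"] for rec in records}
    customers = [
        {"customer": rec["customer"], "agent_name": rec["agent_name"]}
        for rec in records
    ]
    return agents, customers


if __name__ == "__main__":
    # A correct update touches three records for one real-world change.
    records = copy.deepcopy(FLAT_CUSTOMER_FILE)
    print(update_agent_phone_flat(records, "Leah F. Hahn", "555-0199"))

    # If even one copy is missed, the file holds conflicting versions.
    partial = copy.deepcopy(FLAT_CUSTOMER_FILE)
    partial[0]["agent_phone"] = "555-0199"
    print(phones_on_file(partial, "Leah F. Hahn"))

    # With agent data stored once, the same change is a single update.
    agents, customers = normalize(FLAT_CUSTOMER_FILE)
    agents["Leah F. Hahn"] = "555-0199"
    print(agents)
```

The same structure also hints at the other two anomalies: in the flat file a new customer record cannot be entered without retyping the agent's details, while in the normalized form the customer record carries only a reference to the agent.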
File system
- Many separate and unrelated files

Database
- Logically related data stored in a single logical data repository

Figure 1.6 compares the old file system with the database management system.

Figure 1.6 Contrasting Database and File Systems

Concept Check
- List and describe three problems in the file system data management environment.
- Explain the three data anomaly problems in file systems.
- Explain why the database environment is more desirable for managing data.