CSCI 275 Database Management Systems

Informally:
Database - Collection of related data
Database Management System (DBMS) - Software that manages and controls access to the database
Database Application - Program that interacts with the database
Database System - Collection of database applications, the DBMS, and a database

Examples
- Supermarket: scan bar code for price and name, reduce stock count, customer profile
- Credit card: credit check, lost/stolen list, store transaction, statements
- Travel agent: flight and hotel details, booking, concurrency control (avoid duplicate booking)
- Library: book details, patrons, holds, catalog, bar codes
- Insurance: cost based on user profile
- University: student info, enrollments, marks, staff, salary, etc.
- Internet: transactions are all database driven

[Figure: File Approach vs. Database Approach. Labels: Registrar, Business, Residence; Application Programs; Student Files (file approach); DBMS Software and Student Database (database approach).]

File-Based System
- A collection of application programs that perform services for the end-users, such as storage, retrieval, update, and the production of reports
- Each program defines and manages its own data

Limitations
1. Separation and isolation of data
   - must synchronize separate files to answer questions
2. Duplication of data
   - enter data multiple times and store in several files
   - costly
   - loss of data integrity: data can become inconsistent
3. Data dependence
   - changes to field structure may require changes to many programs
   - COBOL Y2K problem: changing 991234 to 20001234
   - any program using the file must change
   - must also change data in old files
   - program/data dependence
4. Incompatible file formats
   - structure of a file generated by one language may differ from that generated by another
   - customized programs must be written to translate
5. Fixed queries / proliferation of application programs
   - need a programmer, or the types of queries available are fixed
   - need ad hoc programs for new queries
   - too many programs and files

Two main limitations of File-Based Approach
- Definition of data is embedded in application programs, rather than being stored separately and independently
- No control over access and manipulation of data beyond that imposed by programs

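The data dependence limitation can be illustrated with a small sketch. It assumes a hypothetical fixed-width student record (a 6-character ID followed by a 20-character name) whose layout is hard-coded in the program, as in the file-based approach. The function names, the record layout, and the sample name are illustrative assumptions, not part of the course material; only the 991234 and 20001234 values come from the notes.

```python
# Hypothetical layout, hard-coded in the program itself:
# columns 0-5 hold a 6-character ID, columns 6-25 hold the name.
def parse_old_layout(record: str) -> dict:
    return {"id": record[0:6], "name": record[6:26].strip()}


# The same program after it has been rewritten for an 8-character ID.
def parse_new_layout(record: str) -> dict:
    return {"id": record[0:8], "name": record[8:28].strip()}


# Sample records (the name is made up for illustration).
old_record = "991234" + "Ada Lovelace".ljust(20)
new_record = "20001234" + "Ada Lovelace".ljust(20)

# The unchanged program reads an old-format record correctly...
old_ok = parse_old_layout(old_record)

# ...but silently misreads a record written in the widened format.
old_broken = parse_old_layout(new_record)

# Only a program that was updated to the new layout reads it correctly.
new_ok = parse_new_layout(new_record)

print(old_ok)
print(old_broken)
print(new_ok)
```

Because the field positions live inside each program, every program that touches the file has to be found and changed, and the old files have to be rewritten as well. Storing the data description separately from the programs is what removes this coupling.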
Database Approach

Database
- Shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization
- Single (possibly large) repository of data
- Defined once and used simultaneously by many
- No longer owned by one department: shared corporate resource
- Operational data + description of this data

System catalog (data dictionary or metadata: data about data)
- Provides description of the data to permit program-data independence
- Data abstraction permits change of internal definition (structure) of database without affecting external applications

Logically related data
- To analyze information needs of organization, identify
  o Entities: distinct objects (persons, places, things, concepts, events)
  o Attributes: properties describing aspects of an object
  o Relationships: associations between entities

Database Management System (DBMS)
- Software system used to define, create, maintain, and control access to a database

Facilities
- Allows users to define and update the structure of the database, usually through the use of a Data Definition Language (DDL)
- Allows users to insert, update, delete, and retrieve data from the database, usually through the use of a Data Manipulation Language (DML)
  o Query language, e.g. SQL (Structured Query Language)
- Provides controlled access to the database
  o Security system: authorizes access
  o Integrity system: maintains consistency
  o Concurrency control system: permits shared access
  o Recovery control system: restores to previous consistent state after failure
  o User-accessible catalog: provides data descriptions
  o View mechanism: reduces complexity by providing users with only the data desired/required/permitted

Database Application Programs
- Computer programs that interact with the database by issuing appropriate requests (typically SQL statements) to the DBMS

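The DDL, DML, view, and catalog facilities can be tried out in a few lines. The sketch below is one possible illustration using Python's standard sqlite3 module with an in-memory SQLite database; the student table, its columns, the view name, and the sample rows are assumptions made up for the example. The catalog query uses sqlite_master, which is SQLite's own catalog table; other DBMSs expose their catalogs differently.

```python
import sqlite3

# An application program talks to the DBMS only through SQL requests.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database (hypothetical student table).
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL)"
)

# DML: insert, update, and retrieve data.
conn.executemany(
    "INSERT INTO student (student_id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Grace", 3.7)],
)
conn.execute("UPDATE student SET gpa = 4.0 WHERE student_id = 1")
rows = conn.execute(
    "SELECT name, gpa FROM student ORDER BY student_id"
).fetchall()

# View mechanism: expose only the columns a given user needs.
conn.execute(
    "CREATE VIEW student_directory AS SELECT student_id, name FROM student"
)
directory = conn.execute(
    "SELECT * FROM student_directory ORDER BY student_id"
).fetchall()

# User-accessible catalog: the data description is itself queryable.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

print(rows)
print(directory)
print(catalog)
```

Note that the program never states where or how the rows are stored. It names tables and columns, and the DBMS resolves them through the catalog, which is what makes program-data independence possible.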
DBMS Components

[Figure: Hardware and Software (machine), Data (bridge), Procedures and People (humans)]

Hardware
- Can range from single PC to single mainframe to network of computers

Software
- The DBMS, application programs, operating system, network software (if necessary)

Data
- Operational data + metadata (data about data)
- Structure of data is called the schema

Procedures
- Instructions and rules applied to the design and use of the database and DBMS

People
- Data Administrators (DA)
  o Management of the data resource
  o Database planning and conceptual design
  o Development and maintenance of standards, policies, and procedures
- Database Administrators (DBA)
  o Physical realization of the database
  o Physical database design and implementation
  o Security and integrity control
  o Maintenance of the operational system
- Database Designers
  o Logical database design: entities, attributes, relationships, constraints
  o Physical database design: map logical design into specific storage structures
- Application developers
  o Graphical user interfaces to the database system
- End users
  o Naïve users: access database via application programs
  o Sophisticated users: interact directly with DBMS using query language

Advantages of DBMS
- Control of data redundancy: controlled duplication
- Data consistency: if redundancy exists, updates are propagated
- More information: derive additional information after integration of data
- Sharing of data: more users share data, and new applications are built on existing data
- Improved data integrity: validity checks via constraints (consistency rules)
- Improved security: restricted access to portions of data via DBMS
- Enforcement of standards: DBA establishes conventions and procedures for all
- Economy of scale: fewer applications, lower cost for storage, backup, recovery, etc.
- Balance conflicting requirements: DBA strikes balance for all in organization
- Improved data accessibility: ad hoc queries can be constructed via query language
- Increased productivity: common routines made available within DBMS
- Improved maintenance: data independence of the application programs
- Increased concurrency: DBMS concurrency controls available at multiple levels
- Improved backup and recovery: centralized and built-in features

Disadvantages of DBMS
- Complexity: developers and users must understand a more complex system
- Size: storage space and overhead of a complex software system
- Cost of DBMS: purchase and maintain a large software system
- Additional hardware costs: faster machine with more memory and larger disk
- Cost of conversion: convert from legacy system
- Performance: generalized software is typically slower than dedicated applications
- Higher impact of failure: failure of the system brings company operations to a standstill

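Two of the advantages above, integrity via constraints and built-in recovery, can be seen together in a short sketch. It uses Python's standard sqlite3 module; the account table, the transfer function, and the amounts are hypothetical and chosen only for illustration. The constraint is declared once in the database, so no application program has to repeat the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The consistency rule (no negative balance) is stated once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " account_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit; return True on success."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was undone too.
        return False


def balances(conn):
    return conn.execute(
        "SELECT account_id, balance FROM account ORDER BY account_id"
    ).fetchall()


too_much = transfer(conn, 1, 2, 500)   # would overdraw account 1
after_failure = balances(conn)

ok = transfer(conn, 1, 2, 30)
after_success = balances(conn)

print(too_much, after_failure)
print(ok, after_success)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected, and the rollback restores the previous consistent state. A file-based program would have to implement both the check and the undo logic itself.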
Data Model
- Integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data

Database Generations

1st Generation: relationships set up via pointers

Hierarchical Model
- Data organized in tree-like structure
- Parent-to-child is one-to-many relationship (1::n)
- To access child, must go through parent (start at root)
- E.g. IBM IMS (Information Management System) on tape drive

Network Model
- Generalize tree model to a graph
- Permit 1::n and m::n relationships
- Nodes may have more than one parent
- May have standalone nodes
- Many entry points into database

2nd Generation: relationships via common fields

Relational Model
- Based on set theory and operations
- 1970 E.F. Codd paper on relational model
- Implicit relationships via common fields
- Relationships made when accessing data
- Access gained via any table and relationships

3rd Generation: overcome limited modeling capabilities of relational model
- E.g. how to incorporate images, video, behavior, etc.
- Object-Relational DBMS (Extended Relational DBMS)
- Object-Oriented DBMS
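The difference between reaching data through a parent and reaching it through common fields can be sketched in a few lines. This is only an analogy: a nested Python dictionary stands in for a hierarchical database, and an in-memory SQLite database (via the standard sqlite3 module) stands in for a relational one. The department and course names, table names, and the find_path helper are hypothetical.

```python
import sqlite3

# Hierarchical style: each child sits under exactly one parent,
# and it is reached by walking down from the root.
hierarchy = {
    "University": {
        "Computer Science": {"CSCI 275": {}, "CSCI 160": {}},
        "Mathematics": {"MATH 121": {}},
    }
}


def find_path(tree, target, path=()):
    """Return the root-to-target path, or None if target is absent."""
    for name, children in tree.items():
        here = path + (name,)
        if name == target:
            return here
        found = find_path(children, target, here)
        if found:
            return found
    return None


path_to_course = find_path(hierarchy, "MATH 121")

# Relational style: two independent tables related only by a common
# field (dept_id); the relationship is made when the data is accessed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)"
)
conn.execute(
    "CREATE TABLE course (course_code TEXT PRIMARY KEY, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "Computer Science"), (2, "Mathematics")],
)
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("CSCI 275", 1), ("CSCI 160", 1), ("MATH 121", 2)],
)

# Entry via the course table: which department offers MATH 121?
dept_of_course = conn.execute(
    "SELECT d.dept_name FROM course c"
    " JOIN department d ON d.dept_id = c.dept_id"
    " WHERE c.course_code = ?",
    ("MATH 121",),
).fetchone()

# Entry via the department table: which courses does it offer?
courses_of_dept = conn.execute(
    "SELECT c.course_code FROM department d"
    " JOIN course c ON c.dept_id = d.dept_id"
    " WHERE d.dept_name = ? ORDER BY c.course_code",
    ("Computer Science",),
).fetchall()

print(path_to_course)
print(dept_of_course)
print(courses_of_dept)
```

In the tree, the only way to a course is through its department from the root. In the relational version either table can be the starting point, which matches the note that access is gained via any table and its relationships.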