Course Notes on A Short History of Database Technology



Similar documents
Course Notes on A Short History of Database Technology

Introduction to Object-Oriented and Object-Relational Database Systems

Overview RDBMS-ORDBMS- OODBMS

Introductory Concepts

Overview of Data Management

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

Introduction: Database management system

1 File Processing Systems

Overview of Database Management

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE CHAPTER 2

Chapter 2 Database System Concepts and Architecture

CSE 132A. Database Systems Principles

DATABASE MANAGEMENT SYSTEM

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar

Chapter 2. Data Model. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

Chapter 1 Databases and Database Users

Module 4 Creation and Management of Databases Using CDS/ISIS

History of Database Systems

The ObjectStore Database System. Charles Lamb Gordon Landis Jack Orenstein Dan Weinreb Slides based on those by Clint Morgan

Chapter 1: Introduction

Contents RELATIONAL DATABASES

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

ECS 165A: Introduction to Database Systems

Database Systems. Lecture 1: Introduction

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 1 Outline

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

COIS Databases

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

CSE 233. Database System Overview

Database Management. Chapter Objectives

Database Resources. Subject: Information Technology for Managers. Level: Formation 2. Author: Seamus Rispin, current examiner

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Database System. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

Database Management Systems

Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.

Chapter 14: Databases and Database Management Systems

Complex Data and Object-Oriented. Databases

Object Oriented Database Management System for Decision Support System.

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

Course Notes on Databases and Database Management Systems

Database Systems. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

1. INTRODUCTION TO RDBMS

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Object Oriented Databases (OODBs) Relational and OO data models. Advantages and Disadvantages of OO as compared with relational

Software: Systems and Application Software

Service Oriented Architectures

full file at

Database System Concepts

n Assignment 4 n Due Thursday 2/19 n Business paper draft n Due Tuesday 2/24 n Database Assignment 2 posted n Due Thursday 2/26

Core Syllabus. Version 2.6 B BUILD KNOWLEDGE AREA: DEVELOPMENT AND IMPLEMENTATION OF INFORMATION SYSTEMS. June 2006

Topics in basic DBMS course

Modern Databases. Database Systems Lecture 18 Natasha Alechina

Data Modeling for Big Data

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

COMP5138 Relational Database Management Systems. Databases are Everywhere!

Principles of Database. Management: Summary

DATA BASE.

Database Concepts. Database & Database Management System. Application examples. Application examples

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

White Paper: 5GL RAD Development

Databases in Organizations

2. Background on Data Management. Aspects of Data Management and an Overview of Solutions used in Engineering Applications

MIS Concepts & Design. Seema Sirpal Delhi University Computer Centre

Computer Information Systems

Study Notes for DB Design and Management Exam 1 (Chapters 1-2-3) record A collection of related (logically connected) fields.

Technologies for a CERIF XML based CRIS

B.Sc (Computer Science) Database Management Systems UNIT-V

THE OPEN UNIVERSITY OF TANZANIA FACULTY OF SCIENCE TECHNOLOGY AND ENVIRONMENTAL STUDIES BACHELOR OF SIENCE IN INFORMATION AND COMMUNICATION TECHNOLOGY

Introduction. Chapter 1. Introducing the Database. Data vs. Information

WEB APPLICATION FOR TIMETABLE PLANNING IN THE HIGHER TECHNICAL COLLEGE OF INDUSTRIAL AND TELECOMMUNICATIONS ENGINEERING

BUILDING OLAP TOOLS OVER LARGE DATABASES

Module 3: File and database organization

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY AUTUMN 2016 BACHELOR COURSES

Data Modeling Basics

Foundations of Business Intelligence: Databases and Information Management

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions

Introduction to Databases

Division of Mathematical Sciences

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

Chapter 1: Introduction. Database Management System (DBMS)

Tier Architectures. Kathleen Durant CS 3200

Principles of Distributed Database Systems

DATA WAREHOUSING AND OLAP TECHNOLOGY

Students who successfully complete the Health Science Informatics major will be able to:

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Foundations of Business Intelligence: Databases and Information Management

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

A COMPARATIVE STUDY BETWEEN THE PERFORMANCE OF RELATIONAL & OBJECT ORIENTED DATABASE IN DATA WAREHOUSING

Transcription:

Course Notes on A Short History of Database Technology Traditional File-Based Approach Three Eras of Database Technology (1) Prehistory file systems hierarchical and network systems (2) The revolution: relational database technology (3) Post-relational era new organizations of data, more complex data influence of object technology more complex applications (e.g., distributed and web-based) File technology was the first attempt to automate manual filing systems Collection of applications that each define and manage their own files File: collection of records with structure linked to a specific application and language (Pascal s files of records, C++ s struct or class declarations, COBOL s data division) Data storage and retrieval must be coded explicitly in each application 2 1 Example: management of student data in a university student data needed by several applications (application, registration, finances, library management, grade reporting,...) without database technology, each application separately maintains data files and programs to manipulate those files different formats can possibly coexist for the same data (e.g., length of names) separate files can cause multiple (i.e., redundant) updates of the same data (e.g., to change an address) History of Database Management, October 7, 2008 1 History of Database Management, October 7, 2008 2

Problems with the File-Based Approach Files versus Database A database is more self-descriptive on its content (schema) than files Data models capture more information and know about some inter-entity or inter-file relationships (e.g., by managing joins), while inter-file links must be coded in programs DBMSs offer a number of functions that need not be programmed in applications Tight integration between program and data files lack of data independence: file management is entirely application specific physical storage structures visible in application code (e.g., ISAM, VSAM) Labor intensive (lower productivity of programmers) no high-level query language run-time performance can (and must) be programmed DBMS functions must be programmed (consistency of redundant data in several files, backup, concurrency, etc) 3 4 Capabilities of a modern DBMS include maintain different views on the same data by different applications control of multi-user access distribution of data on different hardware and software platforms access control enforcement of integrity constraints query processing and optimization security and recovery File-based data management is an obsolete technology, except for applications that do not require DBMS functions. Even for those problems, when efficiency is not crucial and size is not too big, it is often worth considering simple database software (like, e.g., ACCESS) File technology is also relevant when efficiency is very important and the operations on data relatively simple (e.g., movies on demand) History of Database Management, October 7, 2008 3 History of Database Management, October 7, 2008 4

Hierarchical and Network Systems Software systems without underlying organized discipline Lack of data independence traditional view of data structures as records (on disk storage) + links: mixture of physical and logical worlds user view dependent on physical details irrelevant for information needs modern view: links = semantic relationships + physical access paths Navigational programming all database accesses imply procedural programming: no high-level query language programming = navigation governed by physical access paths long and complex programs oriented towards one-record-at-a-time navigational programming complex, necessarily-manual performance optimization Relational Model Defined in 1970 (T. Codd) About 10 years of hesitation Robust RDBMSs built in the 80 s Historical compromise between scientists, practitioners, vendors Most systems sold in the early 90 s are relational 5 6 The first general-purpose DBMSs were developed in the early 60 s based on a network (or graph-based) organisation, standardized as CODASYL ( Conference on Data Systems Languages ); examples of systems: IDS, IDMS, TOTAL, ADABAS, PHOLAS hierarchical systems or tree-based; examples: IMS, System 2000 At the time, they were unsystematic, adhoc; there was no real theoretical model about data organization (in particular, no notion of levels of data representation, of data independence, and no idea that nonprocedural data access was feasible) Early DBMSs encouraged users to view data much as it was stored example of data dependency: physical location of fields within records visible in programs other example of data dependency: pointers in the trees and graphs of hierarchies and networks confusion between physical pointers (that provide efficient acces) and conceptual (or logical) pointers that express meaninful links between related pieces of information the relational model established and demonstrated the virtues of data independence by suppressing pointers (an extreme solution!) cleaner pointers came back with object technology Novelty and Importance of the Relational Model Simplicity of basic concepts Simple theoretical framework Emergence of notion of data model Data independence: separate universe of application from that of implementation Emergence of disciplined database design Validation of a declarative, high-level approach to data management Construction of powerful DBMS software New architectures Standards 7 History of Database Management, October 7, 2008 5 History of Database Management, October 7, 2008 6

In short, the definition of the relational model and the advent of relational systems have founded modern database technology Simplicity of basic concepts: simple, abstract data structure, independent of physical considerations, sufficiently powerful and general as a realistic framework for numerous database applications in business data processing Simple basis for a theoretical framework that stimulated database research relational theory based on well-established theories: mathematical set theory and first-order predicate logic the DB field turns into a scientific discipline, the pre-relational era had drawn little interest from computer scientists many interesting questions were raised for researchers and the field attracted a lot of good minds; progress was quick in the 70 s remark about theory in general: theoretical basis lack of practicality theory = reliability, predictability a strong formal underlying theory is always interesting users need not master nor know about the whole theory Emergence of the notion of data model, quickly leading to semantic data models with more semantics than the relational model, and a clean a posteriori definition of network and hierarchic data models. Data models include the definition of integrity at the application-domain level, as a part of database design. Data independence: insulation of users from physical database design and access optimization; such a growing insulation of users from the lower layers of systems is a general trend in all informatics Emergence of a methodology for disciplined database design: a more semantic or conceptual view of information structures (e.g., entity-relationship) on top of the relational schema significantly simplifies the processes of databases analysis and design The design of high-level interactive database languages, based on set-at-a-time, nonprocedural operations, high-level operations (a first in the database world) demonstrated that a less procedural style (than that of programming with traditional programming languages) of data manipulation was not only practically feasible but also highly productive Construction of efficient DBMS software, in particular efficient query optimizers, transaction management for multiple users (concurrency control), reliability and failure recovery New architectures: database machines, client-server, distributed databases, client tools, 4GLs SQL standard = portability of applications Relational Market Relational systems still dominate the user DBMS landscape Leading systems : Oracle, Informix (now disappeared), IBM (DB2), Sybase, Microsoft (SQL Server), MySQL Products of similar functionality and performance System performance, robustness have consistently strengthened (e.g., recovery, concurrency support, distribution) Numerous client tools (4GLs), users want to manage interfaces as forms, pictures, graphs, sounds, etc. not tables of numbers PC systems, competing with the larger DBMSs for lower-end applications Tool vendors have lost importance through extensions of main DBMS products 8 Database market (Gartner, June 2003): Oracle (43 %), IBM (24 %), Microsoft (17 %) History of Database Management, October 7, 2008 7 History of Database Management, October 7, 2008 8

Economics Post-Relational Era New organizations of data entity-relationship model nonnormalized relations object technology and object-relational systems semi-structured and unstructured data (XML) New functions distribution heterogeneity (multidatabases, interoperability) active databases (triggers) and deduction ERP packages (application-oriented tasks common to many organizations) data analysis (data warehouses and data mining) More complex applications: data-management technology enters application domains (e.g., design, geography, molecular biology, electronic commerce) reduction of hardware costs (memory, computing cycles, soon network bandwidth are essentially free at the margin) 25 years ago, one hour of machine time one week of programmer time today, one week of programmer time a personal computer relational technology has provided huge overall productivity gains Evolution of DBMS software development of efficient DBMS functions (e.g., relational query optimizers) easier database programming powerful and user-friendly front-end software (traditional database software is progressively becoming middleware) If the automobile had followed the same development cycle as the computer, a Rolls- Royce would today cost 100 USD, get one million miles to the gallon, and explode once a year, killing everybody inside. 9 Weaknesses of Relational Technology Fixed collection of data types inadequate for complex data Poor conceptual modeling, e.g., of semantic relationships Economics Evolution = Economics + Methodology name of the game 30 years ago: optimize the use of expensive hardware analogous evolution in all domains of informatics (e.g., from machine language to high-level programming languages) human efficiency has become the crucial quality factor Methodological advances development of software engineering (after the diagnosis of software crisis ) object technology methods for software development (from structured methods to OO analysis and design) continuous migration of program functions into DBMS functions 10 Separation between data and operations Insufficient transaction model Mismatch between SQL and usual programming languages Insufficient performance for data-analysis applications 11 Weaknesses of Relational Technology The first applications of DBMSs (those that relational technology addressed) had data composed of many small items, with a simple structure, and with many simple queries and modifications (e.g., airline reservation systems, banking systems, personnel management systems, sales and inventory management systems) Fixed collection of (built-in) data types (essentially alphanumeric) are inadequate for applications with complex data (e.g., design, WWW, multimedia, geography, biology) History of Database Management, October 7, 2008 9 History of Database Management, October 7, 2008 10

Poor conceptual modeling (e.g., semantic relationships like generalization, aggregation, materialization must be explicitly managed by user programs) Separation between data and operations (stored procedures are not integrated in the data model, there is no encapsulation) RDBMS have not fully exploited the relational model the possibility of defining application-dependent domains has been reduced to using a fixed set of data types a wider variety of integrity constraints could have been supported The transaction model of RDBMSs is inappropriate for long-duration transactions for interactive and cooperative design environments Application programming with RDBMSs is a combination of programs in usual procedural (tuple-at-a-time) programming languages and set-oriented queries in SQL: mismatch between SQL and traditional programming languages 4GLs, OODBMSs The performance of RDBMS is insufficient for data-analysis applications ( OLAP technology) Information-Management Context Relational technology solved business data processing (administrative data) Success of database technology users demand it for domains other than business data processing Steady improvement in price/performance ratio of hardware and supporting architectures Software crisis : quality of application development is becoming a major concern maintenance dominates overall development costs reuse application-development methodology is improving Open systems variety of system architectures Contributions of OO Technology Objects can be used for both natural real-world modeling and software structuring 13 Applications are less stable than data Extensible type system (i.e., domain-specific types) simplify application development Object identity helps object chaining and sharing Data abstraction, encapsulation, and modularity some behavior naturally goes with data stable public interface for objects greater stability, easier evolution Generalization and inheritance, for reuse, easier adaptation Polymorphism, for simpler programming, easier evolution 12 History of Database Management, October 7, 2008 11 History of Database Management, October 7, 2008 12

Application Context Today s applications: objects + web + multimedia More complex data time series, historical data (data warehouses, data mining progress in decision-support systems) multimedia (streaming audio and video) weakly-structured data (XML) complex data (e.g., molecular biology, geographic information systems) Web interfaces have made 4GLs and business forms obsolete Internet, emergence of electronic commerce Object Orientation for Database-Management Systems 2 approaches pure 00: extend the concepts of object-oriented programming languages like Java or C++ to include persistence object-relational: extend the relational model to include several common object-oriented concepts They differ in their answer to the question how important is the relation? not very for the pure OO approach basically for the object-relational approach 14 15 Need for dynamic, interactive applications, with audio and video streams Convergence of data-modeling and document-management technologies: originally, most web documents were manually composed (HTML) DBMS increasingly include functions for making their contents displayable on the internet Object-Oriented Database-Management Systems (OODBMSs) Extension of object-oriented programming languages (OPLs) to management of persistent data Tight (seamless) integration between OPLs (C++, Smalltalk, Java) and DBMS OODBMSs manage (store, call, query) complex objects directly OQL language (extension of SQL) Performance thru caching large numbers of objects in address space of client program 16 History of Database Management, October 7, 2008 13 History of Database Management, October 7, 2008 14

Problems with OODBMSs Language-bound (closely tied to C++, Smalltalk, Java) Rediscover problems of tying database design close to application design Rediscover that declarative (SQL-like) approach is productive Lack of robustness for major DBMS functions for update-intensive OLTP applications (security, transaction processing, parallel processing) Vary widely in their syntax and functions (more than RDBMSs) in spite of ODMG standard Object-Relational DBMSs Main goal: provide object capabilities while safeguarding RDBMS investments Technical goals support of flexible and complex structures of the object world offer query capabilities and performance of RDBMSs Differences with RDBMSs extensible type system (application-dependent new types and functions) better integration with programming languages than SQL-92 Differences with OODBMSs compatibility with RDBMSs and SQL 17 19 RDBMS are evolving into object-relational systems RDBMS vendors have maintained their strong position (Oracle, Sybase, IBM, Microsoft) Roughly similar products Big business! OODBMS Market Appeared in the early 90 s, have now virtually disappeared Vendors: Object Design, Objectivity, Versant, Computer Associates (Jasmine) OODBMS have not displaced RDBMS and have not moved from niche markets Market development hindered by existing relational base and then by the development of object-relational systems 18 History of Database Management, October 7, 2008 15 History of Database Management, October 7, 2008 16

Problems with ORDBMS Hybrid approach Technological integration without new concepts Weak on inheritance and polymorphism No sound theoretical foundations Future of Database Management The quantity of available data is increasing enormously Networking drives multimedia databases, streaming audio and video, interactive video, huge digital libraries Applications with complex data structures (human genome, geographic databases) Exploit the available data (data warehousing and data mining for decision support) 20 22 Web Databases The first web sites stored their data in operating systems files Now web pages are dynamically built from data queried from DBMSs Semi-structured data model addresses the need to combine databases with other data sources (like web pages) relax the necessity of a fixed schema for database data 21 History of Database Management, October 7, 2008 17 History of Database Management, October 7, 2008 18