1 File Processing Systems



Similar documents
æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

Chapter 2 Database System Concepts and Architecture

Chapter 1: Introduction

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE CHAPTER 2

Chapter 1: Introduction. Database Management System (DBMS)

Database System Concepts

Introduction to database management systems

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

ECS 165A: Introduction to Database Systems

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Introduction: Database management system

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.

Chapter 1: Introduction

DATABASE MANAGEMENT SYSTEM

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

Core Syllabus. Version 2.6 B BUILD KNOWLEDGE AREA: DEVELOPMENT AND IMPLEMENTATION OF INFORMATION SYSTEMS. June 2006

Database Concepts. Database & Database Management System. Application examples. Application examples

Database Management. Chapter Objectives

Database Management Systems

Author: Abhishek Taneja

Introdução às Bases de Dados

Overview of Data Management

Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction

CSE 132A. Database Systems Principles

Database System. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

Principles of Database. Management: Summary

1. INTRODUCTION TO RDBMS

Overview of Database Management

BCA. Database Management System

Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

Lesson 8: Introduction to Databases E-R Data Modeling

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

COMP5138 Relational Database Management Systems. Databases are Everywhere!

Database Systems. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

DBMS Questions. 3.) For which two constraints are indexes created when the constraint is added?

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

Introduction to Database Systems


Chapter 1 Databases and Database Users

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 1 Outline

Overview of Database Management Systems

Module 3: File and database organization

B.Sc (Computer Science) Database Management Systems UNIT-V

CSE 233. Database System Overview

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

Oracle Database 10g: Introduction to SQL

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Introduction to Databases

CHAPTER 2 DATABASE MANAGEMENT SYSTEM AND SECURITY

Database Systems. Lecture 1: Introduction

Database 10g Edition: All possible 10g features, either bundled or available at additional cost.

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS. Learning Objectives

David Dye. Extract, Transform, Load

Chapter 2. Data Model. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

How To Improve Performance In A Database

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline. We ll learn: Faloutsos CMU SCS

Concepts of Database Management Seventh Edition. Chapter 7 DBMS Functions

Course: CSC 222 Database Design and Management I (3 credits Compulsory)

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

COMPONENTS in a database environment

Contents RELATIONAL DATABASES

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

The Right Database for Your Growing Business Ndo M. Osias

Foundations of Business Intelligence: Databases and Information Management

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Transactions and the Internet

Database Systems Introduction Dr P Sreenivasa Kumar

Software: Systems and Application Software

ORACLE DATABASE 11G: COMPLETE

Database Management System

Course MIS. Foundations of Business Intelligence

CSE 530A Database Management Systems. Introduction. Washington University Fall 2013

CHAPTER. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Database Planning and Database Architecture

D61830GC30. MySQL for Developers. Summary. Introduction. Prerequisites. At Course completion After completing this course, students will be able to:

COIS Databases

Basic Concepts of Database Systems

Database Management. Technology Briefing. Modern organizations are said to be drowning in data but starving for information p.

Data Hierarchy. Traditional File based Approach. Hierarchy of Data for a Computer-Based File

The Import & Export of Data from a Database

U III 5. networks & operating system o Several competing DOC standards OMG s CORBA, OpenDoc & Microsoft s ActiveX / DCOM. Object request broker (ORB)

Chapter 14: Databases and Database Management Systems

Introduction to Object-Oriented and Object-Relational Database Systems

FileMaker 11. ODBC and JDBC Guide

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives


What is Data Virtualization?

Foundations of Business Intelligence: Databases and Information Management

Study Notes for DB Design and Management Exam 1 (Chapters 1-2-3) record A collection of related (logically connected) fields.

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

SQL, PL/SQL FALL Semester 2013

INTRODUCTION DATABASE MANAGEMENT SYSTEMS

Transcription:

COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data. The collection of data is often referred to as the database. Example: Dickinson keeps information about each student: name, Banner ID number, standing, address... this data makes up the database programs (or parts of programs) are used to add new students, to change a student s address or standing, to retrieve information about a student... Goals of a DBMS: manage large bodies of information provide convenient and efficient ways to store and access information secure information against system failure or tampering permit data to be shared among multiple users 1 File Processing Systems Why are DBMSs needed? Alternative: store information in files, and write programs as needed to modify these files and retrieve information. A file processing system is a collection of files and programs that access/modify these files. Typically, new files and programs are added over time (by different programmers) as new information needs to be stored and new ways to access information are needed. Problems with file processing systems: data redundancy and inconsistency difficulty of accessing data problems with concurrent access Example: assume I m paying for groceries with my MAC card at the same time my pay check is being deposited (and my bank uses a file processing system): withdrawal program 1. read balance from checking account file as $51 3. subtract $50 (for groceries) 4. update checking account file (new balance: $1) deposit program 2. read balance from checking account file as $51 5. add $100 (my salary) 6. update checking account file (new balance: $151) It is difficult to prevent such problems unless programs (example: withdrawal and deposit) are coordinated or integrated. atomicity problems - ensuring that a system failure during a database update does not leave the database in an inconsistent state 1

security problems not all users should have access to all data example: bank payroll personnel shouldn t know my checking account balance difficult to enforce security in an ad hoc system integrity problems data may need to satisfy certain conditions, called consistency constraints example: account balances should never fall below $0 difficult to enforce/add/change consistency constraints in a file processing system DBMSs were developed to remedy these problems. 2 Data Abstraction abstraction: (Random House College Dictionary) the act of taking away or separating hiding details that aren t needed for a particular purpose data abstraction - hiding details of how data is stored and maintained Levels of data abstraction (from low to high): physical level describes how data is actually stored on physical media example: files of records or objects needed by implementors and maintainers of DBMSs logical level describes what data is stored and how data is interrelated example: data is seen as tables or a graph or tree structure needed by database administrators (and power users ) who must see all of the data view level describes some part or subset of the database simplifies the database for users who only need part useful for controlling access to data example: payroll personnel have a view that lets them see employee data, but not customer data can have as many views as needed. example: one for payroll personnel, one for tellers, one for officers.. views can overlap the logical level is like a view that encompasses the entire database 2

view 1 view 2... view N logical level physical level 3 Instances and Schemas The data stored in the database at any given time is an instance of the database. The overall design of the database is the database schema. Example: name account number balance Linda Null 102 8725.38 Tim Wahls 497 0.50 is an instance of a database with schema (name, account number, balance) Database systems have schemas at each level of abstraction: the physical schema is the design at the physical level, i.e. as a file of records of a particular type the logical schema is the design at the logical level. Example: (name, account number, balance) is a logical schema. a subschema is a design at the view level. For example, (name, account number) is a subschema of (name, account number, balance) As with levels of abstraction, there is: one physical schema one logical schema as many subschemas as needed 4 Data Models A data model is a collection of conceptual tools for describing: data data relationships data semantics consistency constraints Data models: 3

provide a way of thinking about data that isn t linked to the implementation of the database are used at the logical and view levels (roughly) Data models of current or historical interest include: the network and hierarchical data models (obsolete) the relational model the entity-relationship model the object-oriented and object-relational models the semistructured data model (XML) 4.1 The Entity-Relationship and Object-Oriented Models In the entity-relationship model: data is viewed as sets of entities that represent things in the real world, and sets of relationships among entities entities in the system are distinct and uniquely identifiable example: customer entities can be uniquely identified by social security number data constraints can be specified explicitly The object-oriented model is similar, except that: objects play the role of entities relationships among objects are not emphasized constructing new kinds of objects from old kinds (via inheritance) is emphasized 4.2 The Relational Model In the relational model: sets of entities and sets of relationships between entities are represented as tables each table has multiple (named) columns each data item is represented as a fixed format record stored in a row of such a table For example, a table storing customer information could have columns for the customer name, address and social security number. Each row in such a table is then a record representing one customer. Databases are often designed using the entity-relationship model, and then the design is translated to the relational model for implementation. The object-relational model is a combination of the object and relational models. Objects of a class can be stored in a table with one column for each field (data member) of the class. 4

5 Database Languages 5.1 Data Definition Language A data definition language (DDL) is the language used to define and modify the logical schema of the database. often used to define/modify subschemas (views) and to specify consistency constraints the definition of the logical schema (written in the DDL) is compiled into a file or set of tables called the data dictionary the data dictionary is consulted for schema information whenever data is read or modified a separate language called the data storage and definition language is used to access the physical schema of the database 5.2 Data Manipulation Language A data manipulation language (DML) is the language used at the logical and view levels to retrieve, insert, delete and modify information stored in the database. A query is a request for information retrieval. For example (in English): list all customers who have an account balance over $500. A query language is the part of the DML used for retrieving information. The terms query language and data manipulation language are often used interchangeably. Example: SQL is a DML (and also a DDL), but is usually called a query language. DMLs are classified by the kind of query language they include: procedural DMLs require the user to specify what data is needed and to provide an algorithm for getting the data. The algorithm is written in the query language. nonprocedural DMLs require only that the user specify what data is needed. DDLs manipulate schemas, while DMLs manipulate instances 6 Data Storage and Querying The implementation of a DBMS is typically divided into two functional components: the storage manager and the query processor. 6.1 The Storage Manager The storage manager is the component that provides the interface between the physical level and the logical level. Components of the storage manager: the file manager the buffer manager, which attempts to minimize file reads and writes. Note that disk access is much slower than main memory access. the authorization and integrity manager, which: checks that consistency constraints are satisfied, and takes some action when they aren t prevents unauthorized access to data, typically through a password and security classification system the transaction manager, which: logs database updates during normal operation 5

detects when information in the database or data dictionary is lost or corrupted due to disk crash, power failure, software errors... restores the database to a previous consistent state using the log The database manager for a small system typically does not implement all of these functions. 6.2 The Query Processor Components of the query processor: the DDL interpreter, which handles DDL statements and records the effect in the data dictionary the DML compiler, which translates DML statements into low-level commands to the query evaluation engine. The DML compiler also does query optimization - evaluating alternative query evaluation plans, and choosing the plan with the lowest cost (typically in terms of disk accesses). the query evaluation engine, which executes the query evaluation plan produced by the DML compiler. 7 Transaction Management A transaction is a sequence of DML commands that forms a logical unit of work. Example: transferring money from one bank account to another. Critical properties of transactions: atomicity - a transaction must execute completely or not at all (in terms of the final effect on the database state). consistency - once a transaction completes successfully, the database must be in a consistent state. The database may be in an inconsistent state while a transaction is executing. For example, during a transaction that transfers money from account A to account B, the total funds held by the bank may be inconsistent (after account A has been debited, but before account B has been credited). Note that such consistency is largely the responsibility of the application programmer. isolation - a transaction must not be affected by other transactions that are executing concurrently. durability - once a transaction completes successfully, its effect must persist even in the presence of system failures. These properties are known as the ACID properties. 8 Data Mining Data mining is the process of analyzing large volumes of data to find useful patterns (for example, that young males who purchase video game systems are also likely to purchase HD televisions). Roughly speaking, this process has three steps: 1. preprocessing data (often from multiple databases) to put it into the format expected by the data mining algorithm 2. running the data mining algorithm, which typically uses statistical techniques to find patterns in the data 3. manually inspecting the patterns found to determine which are useful The SQL:1999 standard specifies constructs to support data mining, and many commercial tools for data analysis and visualization are available. 6

9 Database Architecture Modern database systems typically employ a client-server architecture: the database system itself runs on a server machine users interact with client machines, which connect to the server via a network Application programs typically employ either a two-tier or a three-tier architecture: two-tier architecture the application program resides partially or entirely on the client machine the program communicates with the server using standards such as ODBC and JDBC three-tier architecture the application program communicates with an application server (Oracle WebLogic, IBM Websphere, JBoss), but contains no direct database calls the application server communicates with the database server As the application server contains reusable business logic and Web interface components, this is an effective strategy for developing large applications. 10 Database Users and Administrators 10.1 Database Users application programmers application programmers write application programs in some programming language (called the host language) which permits the use of DML commands application programs are used to ease common or complicated interaction with the DBMS examples: generating paychecks, doing account transfers,... techniques for executing DML commands from within host languages: using an application programming interface(api) that provides procedures/methods for sending DML commands to the database. Examples: Open Database Connectivity (ODBC) - an API defined by Microsoft for the C programming language Java Database Connectivity (JDBC) extending the host language syntax to embed DML calls within the program. special syntax is used to separate DML commands from host language commands a DML precompiler handles embedded DML commands before the program is compiled specialized languages that combine imperative control structures (loops, if statements, procedures) with DML statements often called rapid application development (RAD) tools or fourth-generation languages usually provided with the DBMS usually allow rapid GUI development with forms and reports examples: Visual Basic for Applications (from Microsoft) and PL/SQL (from Oracle) sophisticated users: interact with the DBMS using the DML directly. specialized users: interact with the DBMS in specialized ways: 7

CAD expert systems graphical or audio data temporal data... naive users: interact with the database through application programs (including Web interfaces). 10.2 Database Administrator The database administrator (DBA) is the person(s) in control of the DBMS. One goal of DBMSs is to permit centralized control. Typical responsibilities of the database administrator: schema definition storage structure and access method definition schema and physical organization modification granting authorization for data access integrity constraint specification maintenance and backups 8