Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction



Similar documents
1 File Processing Systems

Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Introduction to Databases

ECS 165A: Introduction to Database Systems

Chapter 2 Database System Concepts and Architecture

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

Introduction: Database management system

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE CHAPTER 2

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

COMP5138 Relational Database Management Systems. Databases are Everywhere!

Overview of Database Management Systems

Database Management Systems. Chapter 1

Relational Database Systems 2 1. System Architecture

Database Management. Chapter Objectives

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

B.Sc (Computer Science) Database Management Systems UNIT-V

1. INTRODUCTION TO RDBMS

Basic Concepts of Database Systems

CSE 132A. Database Systems Principles

DBMS Questions. 3.) For which two constraints are indexes created when the constraint is added?

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

Concepts of Database Management Seventh Edition. Chapter 7 DBMS Functions

Principles of Database. Management: Summary

1.264 Lecture 15. SQL transactions, security, indexes

DBMS / Business Intelligence, SQL Server

Chapter 1: Introduction

BCA. Database Management System


Database Management Systems

Database Systems Introduction Dr P Sreenivasa Kumar

Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

The Classical Architecture. Storage 1 / 36

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Chapter 1 Databases and Database Users

Introduction. Chapter 1. Introducing the Database. Data vs. Information

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 1 Outline

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

CHAPTER 2 DATABASE MANAGEMENT SYSTEM AND SECURITY

Chapter 1: Introduction. Database Management System (DBMS)

Author: Abhishek Taneja

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline. We ll learn: Faloutsos CMU SCS

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Database System. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

Introduction to Database Systems. Module 1, Lecture 1. Instructor: Raghu Ramakrishnan UW-Madison

Distributed Data Management

Introduction to Database Systems CS4320. Instructor: Christoph Koch CS

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Chapter 1: Introduction

Relational Database Basics Review

In-memory Tables Technology overview and solutions

Introduction to Database Systems

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

Transaction Management Overview

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions

High-Volume Data Warehousing in Centerprise. Product Datasheet

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

How To Create A Table In Sql (Ahem)

Introduction to database management systems

INTRODUCTION TO DATABASE SYSTEMS

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

File Management. Chapter 12

Triggers & Packages. {INSERT [OR] UPDATE [OR] DELETE}: This specifies the DML operation.

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Commercial Database Software Development- A review.

SQL Server Training Course Content

Patel Group of Institutions Advance Database Management System MCA SEM V UNIT -1

Database Management. Technology Briefing. Modern organizations are said to be drowning in data but starving for information p.

The Relational Model. Why Study the Relational Model?

Overview of Database Management

ORACLE DATABASE 11G: COMPLETE

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

Oracle Database: SQL and PL/SQL Fundamentals NEW

Chapter 1 - Database Systems

DATABASE MANAGEMENT SYSTEM

A Tutorial on SQL Server CMPT 354 Fall 2007

Database Design and Programming

Database Systems. Lecture 1: Introduction

Introdução às Bases de Dados

Overview of Data Management

Transactions and the Internet

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Database Concepts. Database & Database Management System. Application examples. Application examples

Distributed Databases

CSCI-UA: Database Design & Web Implementation. Professor Evan Sandhaus sandhaus@cs.nyu.edu evan@nytimes.com

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

LearnFromGuru Polish your knowledge

Intro to Databases. ACM Webmonkeys 2011

Chapter 1. Database Systems. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

CHAPTER. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Database Planning and Database Architecture

Transcription:

Introduction to Database Systems Winter term 2013/2014 Melanie Herschel melanie.herschel@lri.fr Université Paris Sud, LRI 1 Chapter 1 Introduction After completing this chapter, you should be able to: Explain basic notions: database state, schema, query, update, data model, DDL, DML, Explain the role of the DBMS, Explain data independence, declarativity, and the three-schema architecture, Name different classes of users of a database application system, Name some DBMS tools. 1. Introduction 2. ER-Modeling 3. Relational model(ing) 4. Relational algebra 5. SQL 2 Chapter 1 Introduction Basic Database Notions Database Management Systems (DBMS) Programmer s View, Data Independence Database Users and Database Tools 3

Task of a Database Query Answering What is a database? Difficult question. There is no precise and generally accepted definition. Naïve approach: The main task of a database system (DBS) is to answer certain questions (queries) about a subset of the real world, e.g., Questioning a DBS Which homeworks has Ann Smith completed? Database System 1 2 4 Task of a Database Data Storage The DBS acts only as a storage for information. The information must first be entered and kept current. In a sense, a DBS is a computerized version of a card-index box / filing cabinet (but more powerful and efficient). Keeping a DBS current Ann Smith has done Homework 3 and received 10 points for it. Database System OK 5 Task of a Database Efficient Data Retrieval and Combination Normal database systems do not perform particularly complicated computations on the stored data in order to answer questions. However, a DBS can retrieve the requested data quickly from a huge set of data (giga bytes, tera bytes, main memory). A DBS can also aggregate / combine several pieces of stored data to answer more complex questions ( Compute the average points for Homework 3 ). 6

Task of a Database SQL Above, the question Which homework has Ann Smith completed? was shown in natural (English) language. Making machines understand natural language is a tough task (and bears a large potential for misunderstanding). Therefore, questions (or queries) are normally written in a formal language, these days typically SQL. SQL SQL: Structured Query Language, developement started in 1986, current version SQL:2003. Pronounced S-Q-L or Sequel. 7 State, Query, Update The set of stored data is called the database state. State & query current state SQL query SELECT Homework FROM Solved WHERE Student = Ann Smith Answer Entering, modifying, or deleting information (i.e., making updates) changes the database state. Update current state Update INSERT INTO Solved VALUES ( Ann Smith, 3, 10) new state 8 Structured Information Each database can store only information of a predeclared structure (a limited domain of disclosure): Update Today s special in the cafeteria is pizza. Homework DBS Error Because the data are structured, not simply text, complex query formulations are possible, e.g., How many homeworks has each student done?. 9

Structured Information Actually, a database system stores only plain data (character, strings, numbers), and not information. Data becomes information by interpretation. Therefore, real-world concepts like students, homework, cafeterias, etc., need to be defined/declared before the database can be used. A pure text database? Which types of questions could we pose on a DBS storing text (character strings) only with no further structure provided? 10 State vs. Schema Database schema Formal definition of the structure of the database contents. Determines the possible database states. Defined only once (when DB is created). In a programming language, this corresponds to variable declarations (assigning a type to a variable) Variable declaration Example of a variable declaration in C: short int i; Possible states of variable i: -32768 i 32767 11 State vs. Schema Database state (instance of the schema) Contains the actual data, structured according to the schema. Changes often (whenever database information is updated). Corresponds to current contents/values of a programming language variable. Variable state change In state s, variable i has value 41. Now, perform state change (s to s ) via assignment i = i + 1. 12

State vs. Schema In the relational model, the data is structured in form of tables (relations). Each table has a name, a sequence of named columns (attributes), and a set of rows (tuples). A table SOLVED Student Homework Points Ann Smith 1 10 Ann Smith 2 8 Michael Jones 1 9 Michael Jones 2 9 DB schema DB state 13 Data Model Defines a formal language (syntax and semantics) for Declaring database schema Querying the current database state Changing the database state Data model is, regrettably, widely used for database schema Examples (Network model, hierarchical model,) relational model, entity relationship model, object-oriented models, UML, XML. 14 Chapter 1 Introduction Basic Database Notions Database Management Systems (DBMS) Programmer s View, Data Independence Database Users and Database Tools 15

DBMS A Database Management System (DBMS) is an application independent software system that implements a data model, i.e., allows for Definition of a DB schema for some concrete application, Storage of an instance of this schema on, e.g., a disk, Querying the current instance (database state), Changing the database state. 16 DBMS Normal users do not need to use SQL for their daily tasks of data entry or data lookup. These users use application programs that have been developed specifically for this task and offer a more accessible user interface. Internally, these application programs translate the user requests into SQL statements (queries, updates) in order to communicate with the DBMS. 17 DBMS Often, several different application programs are used to access the same centralized database. For example, the Homework DBS might provide: A read only web interface for students. A program used by the teaching assistant to load homework and exam points. A program that prints a report for the professor used to assign grades. The interactive SQL interface (SQL console) that comes with the DBMS is simply yet another way to access the DBMS. 18

DBMS Accessing a DBMS User 1 User 2 Application Program DBMS Tool (e.g., SQL console) Database Management System (DBMS) DB schema DB state 19 DB Application Systems Often, different users access the same database concurrently (i.e., at the same time, touching the same data). The DBMS is usually implemented as a background server process (or set of such processes) that is accessed over the network by application programs (clients). One can also view the DBMS as an extension of the operating system (a more powerful file system). 20 DB Application Systems Client-Server Architecture Server (DBMS) Network Client User 1 (Application) Client User 2 (SQL console)... 21

DB Application Systems Three-Tier Architecture Server (DBMS) Application Server (App. Web Server) Thin client (Browser) 22 DB Application Systems A Recap of Database Vocabulary A database (DB) consists of a DB schema and a DB state. A database management system (DBMS) is a software system that implements a data model (e.g., a relational DBMS (RDBMS) implements the relational model). A database system (DBS) consists of a DBMS and a database. A database application system consists of a DBS and a set of application programs. 23 Chapter 1 Introduction Basic Database Notions Database Management Systems (DBMS) Programmer s View, Data Independence Database Users and Database Tools 24

Persistent Storage When the output is a function of the input only, no persistent storage is necessary. Computing the factorial (n n!) Today 5 factorial 120 Tomorrow 5 factorial 120 25 Persistent Storage But the output can also be a function of the input and a persistent state. Determining how many points Ann obtained for all her homework. Today Ann Homework points 20 Tomorrow Ann Homework points 30 26 Persistent Storage A DBS provides persistent state Input Output Ann Homework points 30 Persistent state Persistent information Information that lives longer than a single process (program execution). Survives power outage and a reboot of the operating system. 27

Persistent Storage Which of the following processes/devices need persistent storage? If so, for which particular task? 1.Web browser 2.Pocket calculator 3.Mobile phone 4.Screen saver 5.DVD recorder 28 Typed Persistent Data Classical Way to Implement Persistence Typed Persistent Data (1) 37 Information needed in subsequent program invocations is saved Classical into a file. way to implement persistence: Information needed in subsequent program invocations is The operating system (OS) maintains the file on disk. saved into a file. Disks The provide operating persistent system memory: (OS) maintains the content the file is not onlost disk. if the machine Disks provide is switched persistent off or memory: the OS is rebooted. the contents is not lost if the machine is switched o or the OS is rebooted. File systems are predecessors of modern DBMS. OS files and persistence The above statement is basically true but care should be taken nevertheless. Why? File systems are predecessors of modern DBMS. 29 Typed Persistent Data (2) Typed Persistent Data Implementing Persistence with Files Implementing persistence with files: OS files are usually nothing but sequences of bytes. OS files are usually nothing but sequences of bytes. A A record structure must must be defined be defined on top on of topthis of (much this (much like in Assembler like in Assembler languages): languages): 0 40 42 44 A n n S m i t h... 0 3 1 0 The The record record and and file file structure structure is contained is contained only only in the in the programmers heads. programmers heads. The OS file system cannot prevent misinterpretation, The overflows, OS file system etc., because cannot itprevent is not aware misinterpretation, of the file structure overflows, etc., because it is not aware of the file structure. 38 30

Typed Persistent Data Implementing Persistence with a DBMS The structure of the information to be stored must be defined in a way the DBMS understands: SQL DDL command CREATE TABLE SOLVED ( STUDENT VARCHAR(40), HOMEWORK NUMERIC(2), POINTS! NUMERIC(2) ) The file structure is formally documented. The system can detect type errors in application programs. Simplified programming (higher abstraction level). 31 A Subprogram Library Most DBMSs use OS files to store the data. (Some use raw disk device access.) One can view a DBMS as a subprogram library that can be used for file access. Compared with the direct OS system calls for file access, the DBMS offers higher level operations. The DBMS offers a wide varietry of algorithms that one would otherwise have to program. 32 A Subprogram Library For instance, a typical Relational DBMS contains routines for sorting (e.g., external merge sort), searching (e.g., B-trees), file space management, buffer management, aggregation, statistical evaluation. The algorithms are optimized for large data sets (that do not fit into main memory). The DBMS also offers multi-user support (locking) and safety measures to protect data against system crashes. 33

Physical Data Independence The DBMS is a layer of software above the OS files. The files can be accessed only via the DBMS. The DBMS may change the file structure internally (reorder records, splits files, etc.) for performance reasons. This goes unnoticed by the application program. Compare with the idea of abstract data types: The implementation changes, the interface is kept stable. 34 Physical Data Independence Typical Example At the beginning, a professor used the homeworks DB only for his courses in the current term. Since the DB was small and there were relatively few accesses, it was sufficient to store the data as a heap file. Later, the entire university used the DB, and information of previous courses had to be kept for some time. DB size grows significantly, DB access much more frequently. An index file (e.g., a B-tree) is now needed to provide fast access. 35 Physical Data Independence Without DBMS Using the new B-tree index to access the file must be explicitly built into the lookup (query) commands. Thus, application programs need to be changed if the mode of file access is changed. If one forgets to change a seldolmly used application program, and this program does not update the index when the data has been updated, the DB becomes inconsistent. 36

Physical Data Independence With Relational DBMS Already at the interface, the system completely hides the(non-)existence of indexes on files. Queries and updates do not have to and cannot refer to indexes. The system automatically 1. Modifies the index in case of data updates, 2. Uses the index to evaluate queries against the indexed data when advantageous. 37 Physical Data Independence Conceptual Schema ( interface ): Only logical information content of the database, relevant to the subset of the real world modelled in the DB. Simplified view of the DB: physical storage details hidden. Internal/Physical Schema ( implementation ): Indexes, Division of tables among disks, Storage management if tables grow or shrink, Placement of new rows in a table (sort order, clustering). 38 Physical Data Independence The user enters a query (e.g., in SQL) that only refers to the conceptual schema. The DBMS translates this into a query/program (execution plan) which refers to the the internal schema. This is done by the query optimizer. The DBMS executes the translated query on the persistent instance of the internal schema. The DBMS translates the result back to the conceptual level. 39

Physical Data Independence Changing the internal schema Conceptual schema Same conceptual schema Old internal schema (no B-tree index) New internal schema (with B-tree index) 40 Declarative Languages Physical data independence requires that the query language (SQL) cannot refer to indexes. Declarative query languages go one step further: Queries should only describe what information is sought, but should not prescribe any particular method how to compute/ retrieve the desired information. Kowalski Algorithm = Logic + Control Imperative/procedural languages: explicit control, implicit logic Declarative/descriptive languages: implicit control, explicit logic 41 Declarative Languages SQL is a declarative language. The user describes conditions the requested data is required to fulfill: SQL Query SELECT S.POINTS FROM SOLVED S WHERE S.STUDENT = ANN SMITH AND S.HOMEWORK = 3 Often, simpler formulations of the same query are possible, but SQL users do not have to think about efficient execution. More concise than imperative programming: less expensive program development and maintenance. 42

Declarative Languages Declarative query languages allow powerful optimizers (no evaluation method is prescribed) need powerful optimizers (naïve evaluation is almost always too inefficient). Independence of current hardware technology and software quality: Today s queries will use tomorrow s DBMS setup and algorithms when a new version of the DBMS is released. 43 Logical Data Independence Logical data independence allows for changes to the logical information content of the database. Such changes are obviously restricted to additions to the logical information content. Example: add column SUBMISSION_DATE to table SOLVED. Such additions may be required for new applications. It should not be necessary to change old applications only because records now contain additional information. Logical data independence is important when there are application programs with distinct, but overlapping information needs. 44 Logical Data Independence External Schemas/Views Logical data independence requires a third level of database schemas, the external schemas or views. Each user (department,... ) may have an individual view of the data. An external view contains a subset of the information in the database, maybe slightly restructured. Views may also be vital because of security reasons. In contrast, the conceptual schema describes the complete information content of the database. 45

Three-Schema Architecture Three-schema architecture [ANSI/Sparc 1978] User 1 User n External schema 1... External schema n Conceptual schema Stored data Internal schema 46 More DBMS Functions Transactions Transactions are sequences of DB commands (queries and updates) that are executed as an atomic unit ( all or nothing ). DBMS may crash during/after a sequence of commands is/has been executed. The DBMS then performs undo /redo. Support for backup and recovery. Support of concurrent users. Each user is given the illusion to be the only DB user at any time. DBMS performs locking and conflict detection. 47 More DBMS Functions Security & Integrity Security Access rights: Who may perform which operations on which table? Auditing: DBMS remembers who did what/when. Integrity The DBMS checks that the entered data is plausible/complete (such checks may also span several tables). DBMS rejects updates (insertions and deletions) which would violate defined business rules. 48

More DBMS Functions Data Dictionary Metadata ( data about data, schema, user list, access rights) is availble in system tables, for instance: System tables SYS_TABLES Table_Name Owner SOLVED Melanie SYS_TABLES SYS SYS_COLUMNS SYS SYS_COLUMNS Table_Name Seq Col_Name SOLVED 1 Student SOLVED 2 Homework SOLVED 3 Points SYS_TABLES 1 Table_Name SYS_TABLES 2 Owner SYS_COLUMNS 1 Table_Name SYS_COLUMNS 2 Seq SYS_COLUMNS 3 Col_Name 49 Chapter 1 Introduction Basic Database Notions Database Management Systems (DBMS) Programmer s View, Data Independence Database Users and Database Tools 50 Database Users Database Administrator (DBA) Should know about all schemas, may change the conceptual and the internal schema (creates tables, creates/drops indexes). Can damage everything. Gives access rights to users. Ensures security. Monitors system performance (Transaction throughput #TX/s, #concurrent users, index sizes,...) Monitors available disk space and installs new disks. Ensures that backup copies of the data are made. Does recovery after disk failures, etc. 51

Database Users Application Programmer Writes programs for standard, all-day tasks, to be used by the naïve users (see below): safe data entry, report generation, data browsing. Knows SQL well, plus programming languages and development tools. Usually supervised by DBA. Might do conceptual schema design (knows which table the application will need to access/create). 52 Database Users End Users Sophisticated User (one kind of end user ) Knows SQL and/or some query tools, may use SQL console. Does non-standard aggregations/evaluations of the data without help from application programmers. May generate complex queries. Naïve User (the other kind of end user ) Uses DB only via application programs, often unaware of existence of DBMS back-end. Primarily data entry user, simple browsing-style queries against external views. 53 Database Tools Interactive SQL console Graphical/menu-based query tools Interface for DB access from standard programing lanugages (C, C++, Java) Tools for form-based DB application (4GL) Report generators Web interface Tools for data import/export, backup & recovery, performance monitoring,... 54

Summary Functions of database systems Persistence Integration/redundancy avoidance Physical and logical data independence Subprogram library: many algorithms built-in, especially tuned for external memory access (disks) Query and update evaluation 55 Summary Functions of database systems (continued) High data safety and availability (Backup & Recovery) Combinations of operations into atomic transactions Multi-user support: synchronization of concurrent accesses Integrity enforcement View management Security via data access control System catalog management (metadata) 56 Summary The main goal of the DBMS is to give the user a simplified view on the persistent storage, i.e., to hide any complications introduced by the DBMS physical layer. The user does not worry about physical storage details different information needs of other users efficient query formulation possibility of system crashes/disk failures presence of concurrent users accessing identical data subsets. 57