CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?



Similar documents
Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Database Management Systems. Chapter 1

The Relational Model. Why Study the Relational Model?

ECS 165A: Introduction to Database Systems

1 File Processing Systems

Database Systems. Lecture 1: Introduction

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

Introduction to Database Systems. Module 1, Lecture 1. Instructor: Raghu Ramakrishnan UW-Madison

COMP5138 Relational Database Management Systems. Databases are Everywhere!

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Introduction to Database Systems CS4320. Instructor: Christoph Koch CS

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Overview of Database Management

Introduction: Database management system

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

Overview of Database Management Systems

Introduction to Databases

Overview of Data Management

Overview. Introduction to Database Systems. Motivation... Motivation: how do we store lots of data?

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Relational Database Basics Review

CSE 530A Database Management Systems. Introduction. Washington University Fall 2013

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS. Learning Objectives

DATABASE MANAGEMENT SYSTEM

CSE 132A. Database Systems Principles

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE CHAPTER 2

Module 4 Creation and Management of Databases Using CDS/ISIS

Chapter 2 Database System Concepts and Architecture

Database Management Systems

Chapter 1: Introduction

Chapter 1: Introduction

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions

Basic Concepts of Database Systems

SQL Tables, Keys, Views, Indexes

Chapter 1: Introduction. Database Management System (DBMS)

1. INTRODUCTION TO RDBMS

Introduction to database management systems

Principles of Database. Management: Summary

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Lecture 6. SQL, Logical DB Design

COMPONENTS in a database environment

Database Concepts. Database & Database Management System. Application examples. Application examples

Foundations of Information Management

Database System Concepts

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

3. Relational Model and Relational Algebra

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction

Database Management. Technology Briefing. Modern organizations are said to be drowning in data but starving for information p.


Θεµελίωση Βάσεων εδοµένων

Contents RELATIONAL DATABASES

DBMS Questions. 3.) For which two constraints are indexes created when the constraint is added?

BCA. Database Management System

Introduction to Database Systems

DBMS / Business Intelligence, SQL Server

CSE 233. Database System Overview

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

Author: Abhishek Taneja

Database 10g Edition: All possible 10g features, either bundled or available at additional cost.

INTRODUCTION TO DATABASE SYSTEMS

Chapter 2. Data Model. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

Course: CSC 222 Database Design and Management I (3 credits Compulsory)

In This Lecture. Security and Integrity. Database Security. DBMS Security Support. Privileges in SQL. Permissions and Privilege.

Database Design and Programming

Introduction to Object-Oriented and Object-Relational Database Systems

EECS 647: Introduction to Database Systems

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

Review Entity-Relationship Diagrams and the Relational Model. Data Models. Review. Why Study the Relational Model? Steps in Database Design

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

Database Systems Introduction Dr P Sreenivasa Kumar

Introductory Concepts

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 1 Outline

Lesson 8: Introduction to Databases E-R Data Modeling

There are five fields or columns, with names and types as shown above.

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline. We ll learn: Faloutsos CMU SCS

The core theory of relational databases. Bibliography

Foundations of Information Management

Databases and BigData

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

Chapter 1 Databases and Database Users

History of Database Systems

DIRECTORATE OF OPEN & DISTANCE LEARNING DATABASE MANAGEMENT SYSTEMS

Core Syllabus. Version 2.6 B BUILD KNOWLEDGE AREA: DEVELOPMENT AND IMPLEMENTATION OF INFORMATION SYSTEMS. June 2006

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

How To Manage Data In A Database System

Database System. Session 1 Main Theme Introduction to Database Systems Dr. Jean-Claude Franchitti

Outline. Data Modeling. Conceptual Design. ER Model Basics: Entities. ER Model Basics: Relationships. Ternary Relationships. Yanlei Diao UMass Amherst

Commercial Database Software Development- A review.

Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

Transcription:

CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 Introduction to Databases CS2 Spring 2005 (LN5) 1 Why databases? Why not use XML? What is missing from XML: Consistency and integrity: XML representations are tree-like and often encourage redundant representations of data Efficient storage: XML is intended for data exchange, not data storage (arguable) Efficient query evaluation: current XML query technology does not scale, and type checking is difficult Concurrency control: updates, transactions and recovery are not well understood for XML many people want to work on the data simultaneously Security: security policies should be enforced such that different users have permissions to access different subsets of the data... The problems get worse when the data gets large CS2 Spring 2005 (LN5) 2

Example: a document for students and courses What is wrong with the following? <school> <student id = 011 > <name> G. W. Bush </name> <taking><course cno = Eng > <title> Spelling</title> </course> <course cno = CS2 > <title> XML </title> </course> </taking> <GPA> 1.5 </GPA> </student> <student id = 012 > <name> D. Cheney </name> <taking> <course cno = CS2 > <title>politics</title> </course> </taking> <GPA> 1.4 </GPA> </student>... </school> CS2 Spring 2005 (LN5) 3 Problem 1. Data organization Redundancy: It appears to be keeping our course records in student records and doing so redundantly efficiency: consider information about the courses that the students are taking, such as instructor, classroom, etc. consistency: the CS2 record is different in Bush and Cheney s record loss of information: what if we drop Bush and Cheney from the document? We may end up losing all the information about CS2 and Eng So maybe we want to separate our student records and course records, and link them. But how? CS2 Spring 2005 (LN5) 4

Problem 2. Efficiency Probably a student file would never contain more than a few thousand entries. But there are things we d like to do quickly and efficiently even with our simple files. Examples: Give me all the students whose gpa is greater than 3.0. Who are taking CS2 and Eng simultaneously? We would like to program these as quickly as possible. We would like these programs to be executed efficiently. What would happen if you were using a document of universities with hundreds of thousands of entries? CS2 Spring 2005 (LN5) 5 Problem 3. Concurrency control Suppose several people are allowed to modify the document of student records. How do we stop two people changing the file at the same time and leaving it in a physical (or logical) mess? Example. Suppose at the same time, Bush reads his gpa (1.5), adds 2.5 to it, and writes it back Cheney reads Bush s gpa, subtracts it by 1.5, and writes it back. What would happen if Cheney reads Bush s gpa before Bush writes it, and writes the result back after Bush writes it? Bush s gpa now becomes 0 instead of 2.5! If you don t care about your GPA, just imagine this might happen to your bank account. CS2 Spring 2005 (LN5) 6

Problem 4. Recovery and reliability The system may crash while we are changing the document Example. We are increasing everyone s gpa by a certain percentage in response to Blair s education reform (this may not happen), but in the middle of it, the system crashes. Then those who didn t get an increase would not be happy. CS2 Spring 2005 (LN5) 7 Problem 5. Consistency and security How to ensure each student has a unique sid? gpa is in the range from 0 to 4? How to prevent a student from increasing his/her own gpa? a student from checking other s gpa? Operating systems are not sufficiently flexible to enforce security policies. CS2 Spring 2005 (LN5) 8

Advantages of a DBMS Data independence: it provides an abstract view of data to insulate application code from details of data representations and storage. Uniform data administration and efficient data access: it utilizes a variety of techniques to store and retrieve data efficiently. Reduced development time: it supports many important functions that are common to many applications accessing data stored in the DBMS. CS2 Spring 2005 (LN5) 9 Advantages of a DBMS (cont d) Concurrency control: it schedules concurrent accesses to the data such that the users can pretend they are using a singleuser system. Recovery from crashes: it also protects users from the effects of system failure. Data integrity: it supports integrity constraints, that is, conditions that data must satisfy to ensure that the data is consistent and accurate. Security: it enforces access control on the data. CS2 Spring 2005 (LN5) 10

Three-level architecture The logical structure: what users see. The program or query language interface. The physical structure: how files are organized. What indexing mechanisms are used. Further the logical level is split into two: conceptual level and view level views views views Conceptual level Physical level CS2 Spring 2005 Disk (LN5) 11 Fix the problem of the example XML file (cont d) Separate students from courses <school> <student id = 011 > <name> G. W. Bush </name> <gpa> 1.5 </gpa> </student> <student id = 012 > <name> D. Cheney </name> <gpa> 1.4 </gpa> </student>... <course cno = Eng > <title> Spelling</title> </course> <course cno = CS2 > <title> XML </title> </course>... </school> CS2 Spring 2005 (LN5) 12

Fix the problem of the example XML file Add enrollment records to link the two <school> <student id = 011 > <name> G. W. Bush </name> <gpa> 1.5 </gpa> </student> <student id = 012 > <name> D. Cheney </name> <gpa> 1.4 </gpa> </student>... <course cno = Eng > <title> Spelling</title> </course> <course cno = CS2 > <title> XML </title> </course>... <enroll cno = Eng sid = 011 /> <enroll cno = CS2 sid = 011 /> <enroll cno = CS2 sid = 012 />... </school> CS2 Spring 2005 (LN5) 13 This ends up with three tables Conceptual level: Students(name: string, sid: string, email: string, gpa: real) Courses(title: string, cno: string) Enroll(sid: string, cno: string) This is an example of a relational database schema, which we will talk about soon. Physical level: Relations (a set of records) stored as unordered files Index on the second column of students View: Course-info(cno: string, enrollment: integer) CS2 Spring 2005 (LN5) 14

Data independence Logical data independence: protection from changes in logical structure of data. The ability to modify the conceptual level without causing the application programs (queries against views) to be rewritten. Physical data independence: protection from changes in physical structure of data. The ability to modify the physical level without causing changes at the conceptual level. This is one of the most important benefits of using a DBMS. After all, you don t worry about how numbers are stored when you use a computer-based calculator. This is the same principle. CS2 Spring 2005 (LN5) 15 Example SQL query One should be able to use SQL to query the university database, for example, SELECT name FROM Students WHERE gpa > 3.0 without knowing, nor caring about how precisely data is stored CS2 Spring 2005 (LN5) 16

That s the traditional view, but Three-level architecture is not always achievable. When databases get big, users still have to worry about efficiency. There are databases over which we have no control. For example, the Web is a giant, disorganized database. There are also well-organized databases on the Web, for example, http://www-db.stanford.edu/ which has a very nice organization, but for which the terminology does not quite apply. CS2 Spring 2005 (LN5) 17 Describing data in a DBMS data models and database design When we design a database we try to think logically, but we need some kind of framework in which to design the database. The problem is rather like designing a data structure in some programming language. You might use arrays, lists, etc. depending on what is available. A data model is a collection of concepts for describing data. Data model vs. type system in programming language. A schema is a description of a particular collection of data, using the given data model. Schemas vs. types in programming language. An instance of a schema (database) is the collection of data stored in the database at a particular moment in time. Instances vs. values of types. CS2 Spring 2005 (LN5) 18

The relational model an introduction (1) The relational data model is the most widely used model today. Vendors: IBM, Microsoft, Oracle, Sybase, etc. Recall the notions of sets and records. Relation: a table with rows and columns (or a set of tuples). Column: fields, attributes Rows: tuples, records name Sid email gpa John 0001 John@nimbus.ocis 3.0 Joe 0002 Joe@nimbus.ocis 3.6 Mary 0003 Mary@nimbus.ocis 2.8 Grace 0004 Grace@nimbus.ocis 4.0 CS2 Spring 2005 (LN5) 19 The relational model an introduction (2) (Relation) Schema: every relation has a schema, which describes the columns (fields, attributes). Example: Students(name: string, sid: string, email: string, gpa: real) Courses(title: string, cid: string, credits: integer) Enroll(sid: string, cid: string, grade: string) Relational database: a collection of relations with distinct relation names. Database schema: a collection of relation schemas for the relations in the database. Example: {Students, Courses, Enroll} CS2 Spring 2005 (LN5) 20

Why are relational databases so important? They are extremely simple to understand Query languages for them are well understood. There is an interesting and useful connection between relational query languages and first-order logic Query languages are optimizable. There is well-developed technology for this. Optimizing queries for hierarchical and XML data is much less well-understood. Updates and transaction processing are easier to understand and implement for relational databases. CS2 Spring 2005 (LN5) 21 Other data models Object-oriented data model. Object-oriented database systems: ObjectStore, O2, Ode, etc. Object-relational model. Object-relational database systems: UniSQL, Informix Universal Server, etc. Semantic data model, e.g., the ER model. Semistructured data models: OEM, XML trees (DOM), etc. Developed for Web databases and for data on the Web. Other data models: Network model Hierarchical model The functional data model CS2 Spring 2005 (LN5) 22

Database languages Data definition language (DDL): defining schemas. Data manipulation language (DML): retrieving and manipulating data. Query: a statement requesting the retrieval of information. Example: a SQL query SELECT name FROM Students WHERE gpa > 3.0 Query language: expressing queries. It is part of DML. Relational query languages: relational algebra and relational calculus Commercial languages: SQL (Structured Query Language) and QBE (Query-By-Example). Updates: insert, delete, modify. CS2 Spring 2005 (LN5) 23 Structure of a DBMS DDL and DML compilers, query processor. Query processor: the most important components of a DBMS: query execution plan query optimization File and access methods Buffer management Disk space management Concurrency control and recovery queries DBMS DB CS2 Spring 2005 (LN5) 24

Database folks -- where do you want to end up? Database implementers: build DBMS software. End-users: store and use data in a DBMS. Query languages, conceptual level design, ect. The basic skills have become a must for many jobs today. Database application programmers: develop packages that facilitate data access for a particular group of end-users. Query languages, conceptual and physical level design, system functions, etc. Database administrators: design and maintain databases. Conceptual and physical level design: security and authorization, recovery from failure, database tuning Others: the Web people (e.g., E-commerce, XML, workflows). CS2 Spring 2005 (LN5) 25 What we shall learn Relational data model Relational query languages Relational algebra SQL CS2 Spring 2005 (LN5) 26

Summary what you should remember! Database and DBMS. Why DBMS? What is the typical structure of a DBMS? Data independence. Levels of abstraction (logical, physical). What is logical independence? Physical independence? Why? Data models. Schemas. Instances (database). How to represent information about the real world in a database? Database languages: DDL, DML, query languages. Is DML a language just for database updates? Is query language part of DML? DBA, end-user, application programmers. What are the responsibilities of a DBA? CS2 Spring 2005 (LN5) 27