CSE 544 Principles of Database Management Systems


CSE 544 Principles of Database Management Systems
Alvin Cheung, Fall 2015
Lecture 1 - Introduction and the Relational Model

Outline
- Introduction
- Class overview
- Why database management systems (DBMS)?
- The relational model

Course Staff
- Instructor: Alvin Cheung
  - Office hours: Wednesdays 2:30pm-3:30pm (or by appointment)
  - Location: CSE 530
- TA Extraordinaire: Shumo Chu
  - Graduate student in the database group
  - 544 survivor veteran
  - Office hours and location: to be announced on course website
- Use cse544-staff@cs.washington.edu to reach us

Who I think I am
- Assistant professor (arrived 9 months ago)
- PhD from MIT
- Research interests: database management systems
  - Current research focus
    - New database architectures
    - Database applications
- Research interests: programming systems
  - Current research focus
    - Domain-specific languages and compilers
    - Program synthesis
  - Hint: Take CSE 501 next time it's offered!


My Teaching Style
- I speak fast (and sometimes in a weird accent)
  - Especially when I get excited
- But I care a ton about your progress in class
  - When you look bored, I speed up
  - If no one asks questions, I go even faster!
- If you are bored, feel free to sleep (at your peril)
- If you are bored, feel free to check your email / facebook / etc (at your peril)
- If you are lost, ask me a question!
  - Or just yell HELP!

Goals of the Class/Class Content
- Study principles of data management
  - Data models, data independence, declarative query language, etc.
- Study key Database Mgmt Systems (DBMS) design issues
  - Storage, query execution and optimization, transactions
  - Parallel data processing, data warehousing, streaming data, etc.
- Study some current, hot research topics
- Ensure that
  - You are comfortable using a DBMS locally and in the cloud
  - You have an idea about how to build a DBMS
  - You know a bit about current research topics in data management
  - You get to explore in depth a data management problem (project)

Class Format
- Two lectures per week: Tues & Thurs @ 10:30am
- Mix of lecture and discussion
  - Mostly based on papers
  - Must read papers before lecture and submit a paper review
  - Come prepared to discuss the papers assigned for the class
  - Class participation counts for a non-negligible part of your grade

Class Topics
- Fundamentals
  - Relational algebra
  - Physical design
  - Query optimization
- Transactions (aka how Amazon / airlines / banks work)
  - Concurrency control
  - Recovery
- Analytics (aka big data?)
  - Data warehousing
  - Parallel processing frameworks
- Advanced topics
  - New architectures
  - Database applications
The Originators: Magdalena Balazinska, Dan Suciu

Readings and Notes
- Readings are based on papers
  - Mix of old seminal papers and new papers
  - Papers available online on class website
  - Some come from the red book [no need to get it]
- Three types of readings
  - Mandatory, additional resources
  - Background readings from the following book:
    Database Management Systems. Third Ed. Ramakrishnan and Gehrke. McGraw-Hill. [recommended]
- Lecture notes (the slides)
  - Posted on class website after each lecture

Class Resources
- Website: lectures, assignments, projects
  - http://www.cs.washington.edu/544
  - List of all the deadlines
- Mailing list on course website: make sure you register
  - Your @uw.edu email address is already on the list
- Discussion board: discuss assignments, papers, weather, stock, etc
- HW: Introduce yourself to everyone by posting a new message on discussion board

Evaluation
- Class participation & paper reviews 20%
  - Paper reviews are due by the beginning of each lecture
  - Reading questions are posted on class website
  - You need to speak up in class from time to time
- Assignments 35%: **Individual**
  - HW1: Using a DBMS / data analysis pipeline
  - HW2: Building a simple DBMS
  - HW3: Analyzing a large dataset with a cloud DBMS
- Project 45%: Groups of up to three students
  - Small research or engineering. Start to think about it now!

Class Participation
- An important part of your grade
- Because
  - We want you to read & think about papers throughout the quarter
  - Important to learn to discuss papers
- Expectations
  - Ask questions, raise issues, think critically
  - Learn to express your opinion
  - Respect other people's opinions

Paper reviews
- Between 1/2 page and 1 page in length
  - Summary of the main points of the paper
  - Critical discussion of the paper
  - Guidelines on course website
- Reading questions
  - For some papers, we will post reading questions to help you figure out what to focus on when reading the paper
  - Please address these questions in your reviews
- Grading: credit/no-credit
- MUST submit review 12 HOURS BEFORE lecture
- Individual assignments (but feel free to discuss paper with others)

Assignments
- Goals: Hands-on experience using a DBMS, building a simple DBMS, and using a cloud DBMS for data analysis
  - HW1: Use a DBMS
  - HW2: Build a simple DBMS
  - HW3: Large-scale data analysis with a cloud DBMS
- See course calendar for deadlines
  - HW1 will be posted next week
- We will accept late assignments with a valid excuse

Project Overview
- Topic
  - Choose from a list of mini-research topics
  - Or come up with your own
  - Can be related to your ongoing research
  - Can be related to a project in another course
  - Must be related to databases / data management
  - Must involve either research or significant engineering
  - Open ended
- Amazon AWS credits available!
- Final deliverables
  - Short conference-style paper (6 pages)
  - Conference-style presentation or posters, depending on the number of groups

Project Goals
- Study in depth a data management problem that you find interesting
  - Understand and model the problem
  - Research and understand related work (no need to be exhaustive)
  - Propose some new approach or some interesting engineering problem
  - Implement some parts
  - Evaluate your solution
  - Write up and present your results
- Amount of work may vary widely between groups

Project Milestones
- Dates will be posted on course website
  - Project proposal
  - Milestone report
  - Final report
  - Final presentation
- More details will be on the website, including ideas & examples
- We will provide feedback throughout the quarter

Now onward to the world of databases!

Let's get started
- What is a database?
  - A collection of files storing related data
- Give examples of databases
  - Accounts database; payroll database; UW's students database; Amazon's products database; airline reservation database
  - Your ORCA card transactions, Facebook friends graph, past tweets, etc

Data Management
- Data is valuable but hard and costly to manage
- Example: database for a store
  - Entities: employees, positions (ceo, manager, cashier), stores, products, sells, customers.
  - Relationships: employee positions, staff of each store, inventory of each store.
- What operations do we want to perform on this data?
- What functionality do we need to manage this data?

Required Functionality
1. Describe real-world entities in terms of stored data
2. Create & persistently store large datasets
3. Efficiently query & update
   - Must handle complex questions about data
   - Must handle sophisticated updates
   - Performance matters
4. Change structure (e.g., add attributes)
5. Concurrency control: enable simultaneous updates
6. Crash recovery
7. Access control, security, integrity
Difficult and costly to implement all these features
We will cover all these topics this quarter

Database Management System
- A DBMS is a software system designed to provide data management services
- Examples of DBMS
  - Oracle, DB2 (IBM), SQL Server (Microsoft), PostgreSQL, MySQL, ...

Why should you care?
- From 2006 Gartner report:
  - IBM: 21% market with $3.2BN in sales
  - Oracle: 47% market with $7.1BN in sales
  - Microsoft: 17% market with $2.6BN in sales
- Rise of big data

Typical System Architecture
- Two tier system or client-server
- [Diagram: Applications, linked by a connection (ODBC, JDBC) to the Database server (someone else's C program), which manages the Data files]

Main DBMS Features
- Data independence
  - Data model
  - Data definition language
  - Data manipulation language
- Efficient data access
- Data integrity and security
- Data administration
- Concurrency control
- Crash recovery
- Reduced application development time
How to decide what features should go into the DBMS?

When not to use a DBMS?
- DBMS is optimized for a certain workload
- Some applications may need
  - A completely different data model
  - Completely different operations
  - A few time-critical operations
- Example
  - Highly optimized scientific simulations

Outline
- Introductions
- Class overview
- Why database management systems (DBMS)?
- The relational model

Relation Definition
- Database is a collection of relations
- Relation is a table with rows & columns
  - SQL uses the term "table" to refer to a relation
- Relation R is a subset of S_1 × S_2 × ... × S_n
  - Where S_i is the domain of attribute i
  - n is the number of attributes of the relation
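
To make the definition concrete, here is a minimal Python sketch that treats a relation as a subset of a cross product of domains. The domains `S1` and `S2` and the tuples in `R` are made-up values for illustration, not data from the lecture.

```python
from itertools import product

# Hypothetical finite domains for a two-attribute relation (illustrative only).
S1 = {1, 2, 3}        # domain of attribute 1, e.g. supplier numbers
S2 = {"WA", "MA"}     # domain of attribute 2, e.g. states

# The cross product S1 x S2 contains every tuple that could possibly exist.
all_tuples = set(product(S1, S2))

# A relation is any subset of that cross product.
R = {(1, "WA"), (2, "WA"), (3, "MA")}

# True when every tuple of R draws its values from the attribute domains.
is_relation = R <= all_tuples
```

Using a Python `set` also mirrors two properties of relations: rows have no order and no row appears twice.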

Properties of a Relation
- Each row represents an n-tuple of R
- Ordering of rows is immaterial
- All rows are distinct
- Ordering of columns is significant
  - Because two columns can have the same domain
  - But columns are labeled, so
    - Applications need not worry about order
    - They can simply use the names
- Domain of each column is a primitive type
- Relation consists of a relation schema and instance

More Definitions
- Relation schema: describes column names
  - Relation name
  - Name of each field (or column, or attribute)
  - Domain of each field
- Degree (or arity) of relation: number of attributes
- Database schema: set of all relation schemas

Even More Definitions
- Relation instance: concrete table content
  - Set of tuples (also called records) matching the schema
- Cardinality of relation instance: number of tuples
- Database instance: set of all relation instances

Example
Relation schema
  Supplier(sno: integer, sname: string, scity: string, sstate: string)
Relation instance
  sno  sname  scity   sstate
  1    s1     city 1  WA
  2    s2     city 1  WA
  3    s3     city 2  MA
  4    s4     city 2  MA
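
As a rough illustration of the terms degree and cardinality, the Supplier example can be written down in Python. The variable names are assumptions made for this sketch; the schema and rows are the ones on the slide.

```python
# Schema of the Supplier relation: one (field name, domain) pair per attribute.
supplier_schema = [("sno", int), ("sname", str), ("scity", str), ("sstate", str)]

# Instance of the relation, stored as a set so that rows are unordered and distinct.
supplier_instance = {
    (1, "s1", "city 1", "WA"),
    (2, "s2", "city 1", "WA"),
    (3, "s3", "city 2", "MA"),
    (4, "s4", "city 2", "MA"),
}

degree = len(supplier_schema)          # number of attributes in the schema
cardinality = len(supplier_instance)   # number of tuples in this instance
```

The degree is fixed by the schema, while the cardinality changes whenever tuples are inserted or deleted.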

Integrity Constraints
- Integrity constraint
  - Condition specified on a database schema
  - Restricts data that can be stored in db instance
- DBMS enforces integrity constraints
  - Ensures only legal database instances exist
- Simplest form of constraint is domain constraint
  - Attribute values must come from attribute domain
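
A domain constraint can be pictured as a per-field type check. The helper below is a hypothetical sketch of that idea in Python, not how any particular DBMS implements it; `satisfies_domains` is a name invented here.

```python
def satisfies_domains(row, schema):
    """Return True if row has one value per field and each value lies in its domain."""
    return len(row) == len(schema) and all(
        isinstance(value, domain) for value, (_, domain) in zip(row, schema)
    )

# Supplier schema from the example slide, with Python types standing in for domains.
supplier_schema = [("sno", int), ("sname", str), ("scity", str), ("sstate", str)]

ok = satisfies_domains((1, "s1", "city 1", "WA"), supplier_schema)
bad = satisfies_domains(("one", "s1", "city 1", "WA"), supplier_schema)  # sno is not an integer
```

A DBMS that enforces the constraint would reject the second tuple instead of storing it.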

Key Constraints
- Key constraint: certain minimal subset of fields is a unique identifier for a tuple
- Candidate key
  - Minimal set of fields
  - That uniquely identify each tuple in a relation
- Primary key
  - One candidate key can be selected as primary key
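
The "unique" and "minimal" parts of the definition can be checked mechanically on a given instance. The sketch below is an illustration only: the function names and sample rows are assumptions, and a key is really a property of the schema, so an instance can show that a set of fields is not a key but can never prove that it is one.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that are unique in this particular instance."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip supersets of an already found key: they are unique but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical rows: supplier number, part number, quantity.
rows = [
    {"sno": 1, "pno": 1, "qty": 10},
    {"sno": 1, "pno": 2, "qty": 10},
    {"sno": 2, "pno": 1, "qty": 10},
]
keys = candidate_keys(rows, ["sno", "pno", "qty"])
```

On these rows no single field is unique, and the only minimal unique combination is `("sno", "pno")`.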

Foreign Key Constraints
- A relation can refer to a tuple in another relation
- Foreign key
  - Field that refers to tuples in another relation
  - Typically, this field refers to the primary key of the other relation
  - Can pick another field as well

Key Constraint SQL Examples
  CREATE TABLE Part (
    pno integer,
    pname varchar(20),
    psize integer,
    pcolor varchar(20),
    PRIMARY KEY (pno)
  );

Key Constraint SQL Examples
  CREATE TABLE Supply(
    sno integer,
    pno integer,
    qty integer,
    price integer
  );

Key Constraint SQL Examples
  CREATE TABLE Supply(
    sno integer,
    pno integer,
    qty integer,
    price integer,
    PRIMARY KEY (sno,pno)
  );

Key Constraint SQL Examples
  CREATE TABLE Supply(
    sno integer,
    pno integer,
    qty integer,
    price integer,
    PRIMARY KEY (sno,pno),
    FOREIGN KEY (sno) REFERENCES Supplier,
    FOREIGN KEY (pno) REFERENCES Part
  );

Key Constraint SQL Examples
  CREATE TABLE Supply(
    sno integer,
    pno integer,
    qty integer,
    price integer,
    PRIMARY KEY (sno,pno),
    FOREIGN KEY (sno) REFERENCES Supplier ON DELETE NO ACTION,
    FOREIGN KEY (pno) REFERENCES Part ON DELETE CASCADE
  );
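
The difference between the two ON DELETE actions can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS, which is an assumption of this example rather than the system used in the course. The Supplier and Part tables are cut down to two columns, and the rows ('s1', 'bolt', and so on) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Supplier (sno integer PRIMARY KEY, sname varchar(20));
CREATE TABLE Part (pno integer PRIMARY KEY, pname varchar(20));
CREATE TABLE Supply (
  sno integer, pno integer, qty integer, price integer,
  PRIMARY KEY (sno, pno),
  FOREIGN KEY (sno) REFERENCES Supplier(sno) ON DELETE NO ACTION,
  FOREIGN KEY (pno) REFERENCES Part(pno) ON DELETE CASCADE
);
INSERT INTO Supplier VALUES (1, 's1');
INSERT INTO Part VALUES (10, 'bolt');
INSERT INTO Supply VALUES (1, 10, 5, 100);
""")

# NO ACTION: deleting a supplier that Supply still references violates the constraint.
try:
    conn.execute("DELETE FROM Supplier WHERE sno = 1")
    supplier_delete_blocked = False
except sqlite3.IntegrityError:
    supplier_delete_blocked = True

# CASCADE: deleting a part also deletes the Supply rows that reference it.
conn.execute("DELETE FROM Part WHERE pno = 10")
remaining_supply_rows = conn.execute("SELECT COUNT(*) FROM Supply").fetchone()[0]
```

In this run the supplier delete is rejected, while the part delete succeeds and takes the dependent Supply row with it.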

General Constraints
- Table constraints serve to express complex constraints over a single table
  CREATE TABLE Part (
    pno integer,
    pname varchar(20),
    psize integer,
    pcolor varchar(20),
    PRIMARY KEY (pno),
    CHECK ( psize > 0 )
  );
- It is also possible to create constraints over many tables
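
To see a CHECK constraint being enforced, the Part table can be created in an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a convenient stand-in here, and the inserted rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Part (
  pno integer,
  pname varchar(20),
  psize integer,
  pcolor varchar(20),
  PRIMARY KEY (pno),
  CHECK (psize > 0)
)
""")

# A row that satisfies the constraint is accepted.
conn.execute("INSERT INTO Part VALUES (1, 'bolt', 5, 'red')")

# A row with psize = 0 violates CHECK (psize > 0) and is rejected.
try:
    conn.execute("INSERT INTO Part VALUES (2, 'nut', 0, 'blue')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM Part").fetchone()[0]
```

Only the first row ends up in the table, so the instance stays legal with respect to the constraint.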

Relational Queries
- Query inputs and outputs are relations
- Query evaluation
  - Input: instances of input relations
  - Output: instance of output relation

Relational Algebra
- Query language associated with relational model
- Queries specified in an operational manner
  - A query gives a step-by-step procedure
- Relational operators
  - Take one or two relation instances as argument
  - Return one relation instance as result
  - Easy to compose into relational algebra expressions

Relational Operators
- Selection: σ_condition(S)
  - Condition is a Boolean combination (∧, ∨) of terms
  - Term is: attr. op constant, attr. op attr.
  - Op is: <, <=, =, ≠, >=, or >
- Projection: π_list-of-attributes(S)
- Union (∪), Intersection (∩), Set difference (−)
- Cross-product or cartesian product (×)
- Join: R ⋈_θ S = σ_θ(R × S)
- Division: R/S
- Rename: ρ(R(F), E)
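
A few of these operators are short enough to sketch directly. The Python below is a toy illustration under stated assumptions: a relation is modeled as a pair (attribute names, set of tuples), attribute names are assumed not to clash across inputs, and the function names and sample data are invented for this example.

```python
def select(rel, pred):
    """Selection: keep the rows for which pred(row_as_dict) is true."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(rel, keep):
    """Projection: keep only the listed attributes; the set removes duplicates."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}

def cross(r, s):
    """Cross product: every row of r concatenated with every row of s."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    return r_attrs + s_attrs, {x + y for x in r_rows for y in s_rows}

def theta_join(r, s, pred):
    """Theta join, written as a selection over the cross product."""
    return select(cross(r, s), pred)

# Hypothetical input relations.
supplier = (("sno", "sname"), {(1, "s1"), (2, "s2")})
supply = (("supplier_no", "pno"), {(1, 10), (1, 11)})

joined = theta_join(supplier, supply, lambda t: t["sno"] == t["supplier_no"])
names = project(joined, ["sname"])
```

Every function takes relations and returns a relation, which is what makes the operators easy to compose, as in `project(theta_join(...), ...)`.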

Selection & Projection Examples

Patient
  no  name  zip    disease
  1   p1    98125  flu
  2   p2    98125  heart
  3   p3    98120  lung
  4   p4    98120  heart

π_zip,disease(Patient)
  zip    disease
  98125  flu
  98125  heart
  98120  lung
  98120  heart

σ_disease='heart'(Patient)
  no  name  zip    disease
  2   p2    98125  heart
  4   p4    98120  heart

π_zip(σ_disease='heart'(Patient))
  zip
  98120
  98125
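
The same queries can be written in SQL. The sketch below loads the Patient table into an in-memory SQLite database through Python's `sqlite3` module, which is an assumption made for this example. Selection maps to WHERE and projection to the SELECT list. DISTINCT is added because SQL keeps duplicate rows by default, whereas relational algebra projection returns a set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "no" is quoted because NO is also an SQL keyword.
conn.execute("""
CREATE TABLE Patient ("no" integer PRIMARY KEY, name varchar(20), zip integer, disease varchar(20))
""")
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?, ?)",
    [(1, "p1", 98125, "flu"), (2, "p2", 98125, "heart"),
     (3, "p3", 98120, "lung"), (4, "p4", 98120, "heart")],
)

# Projection on zip and disease.
zip_disease = set(conn.execute("SELECT DISTINCT zip, disease FROM Patient"))

# Selection on disease = 'heart'.
heart_patients = set(conn.execute("SELECT * FROM Patient WHERE disease = 'heart'"))

# Projection applied to the result of the selection; ORDER BY only fixes the print order.
heart_zips = [row[0] for row in conn.execute(
    "SELECT DISTINCT zip FROM Patient WHERE disease = 'heart' ORDER BY zip")]
```

Here `heart_zips` holds the two zip codes from the last table above, and `heart_patients` holds the rows for p2 and p4.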