Introduction to Database Systems CS4320/CS5320




Introduction to Database Systems CS4320/CS5320
Instructor: Johannes Gehrke
http://www.cs.cornell.edu/johannes
johannes@cs.cornell.edu
CS4320/CS5320, Fall 2012

CS4320/4321: Introduction to Database Systems
Three main topics:
  Relational database systems
  Big Data
  Cloud data management
Another way of thinking about this: the infrastructure for data science!

CS4320/4321: Introduction to Database Systems
Underlying theme: How do I build a data management system?
CS4320 will deal with the underlying concepts
  No programming assignments
CS4321 will be the practicum
  Build components of a database system (C++ programming)
Note: the practicum will only start next week

CS4320 Course Information
Information is one of the most valuable resources in this information age
How do we effectively and efficiently manage this information?
  Relational database management systems
    Dominant data management paradigm today
  Big Data/NoSQL systems
  Big Data cloud systems
100+ billion dollar a year industry
  You will see this in the job market!

Topics
The relational model, SQL, normalization
Database internals (index structures, query processing, query optimization, transaction management, recovery)
MapReduce and Hadoop
NoSQL
Big Data in the cloud
Exercises using a real database system

Prerequisites
Courses:
  CS2110 (Computers and Programming)
  CS3110 (Structure and Interpretation of Computer Programs)

People
Instructor: Johannes Gehrke
TAs: TBD

Access to Instructor and TAs
Office hours: Fridays, 1:15-2:3pm.
TA mailing list: TBD
  Do not directly email TAs
All of this info will be on the course homepage.

Course Structure
Three components:
  Four assignments (50%)
  Two examinations (49%)
  Participation in course evaluation (1%)
No programming assignments in CS4320
  CS4321 will have all programming assignments

Class Lectures
Textbook: Database Management Systems (3rd Edition), by Raghu Ramakrishnan and Johannes Gehrke
  Required textbook
Syllabus
  Defined by class lectures, will be online in CMS
  Not defined by textbook

Grading
Three components:
  Assignments (50%)
  Exams (49%)
  Course evaluation (1%)

Assignments
Four assignments
Each assignment worth 12.5% of total grade

Assignment Policies
Assignments have to be done individually
  No collaboration with others
Academic integrity violations taken VERY seriously
  Read Cornell and CS academic integrity policies
  Available off course web page
  Need to sign and hand in form
Course management system used to post assignment grades

Assignment Policies (contd.)
Late submissions
  One day late: 15% penalty
  Two days late: 30% penalty
  No submissions more than two days late allowed. No exceptions (assignments handed out well in advance of deadline)
Regrade requests
  Within 7 days after assignments are graded
  Hard deadline

Course Structure
Three components:
  Assignments (50%)
  Exams (49%)
  Course evaluation (1%)
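
The late-submission rules and the per-assignment weight can be written out as a small calculation. This is only a sketch: the slides do not say whether the penalty is taken off the earned score or off the maximum score, so the code assumes a multiplicative reduction of the earned score, and the function names are made up for illustration.

```python
from typing import Optional

ASSIGNMENT_COUNT = 4
ASSIGNMENT_WEIGHT = 12.5  # percent of the total grade per assignment (from the slides)


def late_factor(days_late: int) -> Optional[float]:
    """Return the multiplier for a submission, or None if it is not accepted.

    Assumption: the penalty scales the earned score (15% off, then 30% off).
    """
    if days_late <= 0:
        return 1.0
    if days_late == 1:
        return 0.85
    if days_late == 2:
        return 0.70
    return None  # more than two days late: no submission allowed


def assignment_points(raw_percent: float, days_late: int) -> float:
    """Contribution of one assignment to the total grade, in percentage points."""
    factor = late_factor(days_late)
    if factor is None:
        return 0.0
    return raw_percent / 100 * factor * ASSIGNMENT_WEIGHT


# The four assignments together make up the 50% assignment component.
total_assignment_weight = ASSIGNMENT_COUNT * ASSIGNMENT_WEIGHT
```

Under these assumptions, an assignment scored at 80% and handed in one day late contributes 0.80 x 0.85 x 12.5 = 8.5 points to the total grade.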

Exams
Mid-term exam (21%)
  Thursday October 18, 7:30-9:30pm
  Closed book exam; one two-sided page of material
Final exam (28%)
  Thursday, December 13
  Closed book exam; one two-sided page of material
  Cumulative with emphasis on second half
Do not schedule other exams or events on these days

Relationship to CS4321
CS4320 is about concepts underlying Big Data
  No programming assignments
CS4321 is the practicum associated with CS4320
  Will actually build a realistic database system
  C++ programming
Complementary
  Suggest that you take both
  Can take CS4320 without taking CS4321
  Cannot take CS4321 without taking CS4320

Is CS4320/4321 a lot of work?
It depends!
  Much of the material in CS4320 is probably new to you
  CS4321 has substantial programming assignments
Then why should I take this course?
  Intellectual argument
    Big conceptual ideas
    Beautiful meeting of theory and practice
  Utilitarian argument
    Many, many real applications (data management, data-driven websites, search engines, large-scale data analytics)
    Job market!

CS5300: Architecture of Large-Scale Information Systems
How do you build e-commerce websites such as amazon.com?
How do you build a reliable web service that scales to millions of users?

CS5300: Architecture of Large-Scale Information Systems
Underlying theme: How do I build applications on top of a database system?
Will combine coverage of fundamental concepts with hands-on experience on Amazon EC2
Prerequisite: CS4320

CS5300: Material Covered
Three-tier architectures
Edge caches
Distributed transaction management
Web services
Content management

Instructor
Personal:
  Ph.D. from U of Wisconsin-Madison (CS, marketing) in 1999; joined Cornell right afterwards
  Chief Scientist at Fast Search and Transfer; acquired by Microsoft in 2008
  Technical advisor to Microsoft and other companies, consulting in Big Data
Research:
  Big Data Infrastructure
  Big Data Analytics

The Entity-Relationship Model

Entities

ER Model Basics
Entity: Real-world object distinguishable from other objects.
  An entity is described (in DB) using a set of attributes
Entity Set: A collection of similar entities, e.g., all employees.
  All entities in an entity set have the same set of attributes
  Each entity set has a key
  Each attribute has a domain

Relationships
[Figure: ER diagram of relationship set Works_In with attribute since; visible entity attributes: did, dname, budget]

ER Model Basics (Contd.)
Relationship: Association among two or more entities, e.g., Attishoo works in Pharmacy department.
Relationship Set: Collection of similar relationships.
  An n-ary relationship set R relates n entity sets E1, ..., En
  Each relationship in R involves entities e1 in E1, ..., en in En
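
The definitions above (entity set, key, domain, relationship set) can be made concrete with one possible relational rendering, sketched here with Python's built-in sqlite3 module. The Employees attributes (ssn, name, lot) and all sample values are assumptions for illustration; only Works_In, since, did, dname, budget, Attishoo, and Pharmacy come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.executescript("""
-- Entity set: every entity has the same attributes; ssn is the key.
-- Column types play the role of attribute domains.
CREATE TABLE Employees (
    ssn  TEXT PRIMARY KEY,   -- assumed attribute
    name TEXT,               -- assumed attribute
    lot  INTEGER             -- assumed attribute
);
CREATE TABLE Departments (
    did    INTEGER PRIMARY KEY,
    dname  TEXT,
    budget REAL
);
-- Relationship set: each row pairs one employee with one department.
-- 'since' is a descriptive attribute of the relationship itself.
CREATE TABLE Works_In (
    ssn   TEXT    REFERENCES Employees(ssn),
    did   INTEGER REFERENCES Departments(did),
    since TEXT,
    PRIMARY KEY (ssn, did)
);
""")

# Illustrative data: "Attishoo works in Pharmacy department."
conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48)")
conn.execute("INSERT INTO Departments VALUES (51, 'Pharmacy', 100000.0)")
conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '2012-01-01')")

rows = conn.execute("""
    SELECT e.name, d.dname
    FROM Works_In w
    JOIN Employees e ON e.ssn = w.ssn
    JOIN Departments d ON d.did = w.did
""").fetchall()
```

The composite primary key (ssn, did) reflects that a relationship is identified by the entities it connects, so the same employee-department pair cannot appear twice.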

Relationships (Contd.)
[Figure: ER diagram of relationship set Reports_To with roles supervisor and subordinate]
Want to capture supervisor-subordinate relationship

Relationships (Contd.)
[Figure: entity sets Parts and Suppliers and a third entity set, each with an id attribute]
Want to capture information that a Supplier s supplies Part p to Department d

Ternary Relationship
[Figure: ER diagram of ternary relationship set Contract relating Parts, Suppliers, and Departments, each with an id attribute]
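
Both ideas above, a relationship set that relates an entity set to itself through roles and a ternary relationship set, can be sketched as tables. This is an illustrative rendering only: the column names pid, sid, supervisor_ssn, and subordinate_ssn are assumptions (the diagrams label every key attribute id), as are the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employees   (ssn TEXT PRIMARY KEY);
CREATE TABLE Parts       (pid INTEGER PRIMARY KEY);
CREATE TABLE Suppliers   (sid INTEGER PRIMARY KEY);
CREATE TABLE Departments (did INTEGER PRIMARY KEY);

-- Reports_To relates Employees to itself; the two roles become two columns.
CREATE TABLE Reports_To (
    supervisor_ssn  TEXT REFERENCES Employees(ssn),
    subordinate_ssn TEXT REFERENCES Employees(ssn),
    PRIMARY KEY (supervisor_ssn, subordinate_ssn)
);

-- Contract is ternary: one row says supplier s supplies part p to department d.
CREATE TABLE Contract (
    pid INTEGER REFERENCES Parts(pid),
    sid INTEGER REFERENCES Suppliers(sid),
    did INTEGER REFERENCES Departments(did),
    PRIMARY KEY (pid, sid, did)
);
""")

conn.executemany("INSERT INTO Employees VALUES (?)", [("boss",), ("worker",)])
conn.execute("INSERT INTO Reports_To VALUES ('boss', 'worker')")

conn.execute("INSERT INTO Parts VALUES (1)")
conn.execute("INSERT INTO Suppliers VALUES (10)")
conn.executemany("INSERT INTO Departments VALUES (?)", [(51,), (52,)])
# The same supplier supplies the same part to two different departments.
conn.executemany("INSERT INTO Contract VALUES (?, ?, ?)", [(1, 10, 51), (1, 10, 52)])

subordinates = [row[0] for row in conn.execute(
    "SELECT subordinate_ssn FROM Reports_To WHERE supervisor_ssn = 'boss'")]
depts_for_pair = [row[0] for row in conn.execute(
    "SELECT did FROM Contract WHERE pid = 1 AND sid = 10 ORDER BY did")]
```

The two Contract rows differ only in the department, which is the information a ternary relationship set records directly: who supplies what to whom.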

How are these different?
[Figure: two ER diagrams. Works_In2 carries the attributes from and to directly; Works_In3 connects to a separate entity set Duration with attributes from and to. Visible entity attributes: did, dname, budget]

Key Constraints
An employee can work in many departments; a dept can have many employees
[Figure: ER diagram of Works_In with attribute since; visible entity attributes: did, dname, budget]
Each dept has at most one manager, according to the key constraint on Manages.
[Figure: ER diagram of Manages with attribute since; visible entity attributes: did, dname, budget]

Key Constraints: Examples
Example Scenario 1: An inventory database contains information about parts and manufacturers. Each part is constructed by exactly one manufacturer.
Example Scenario 2: A customer database contains information about customers and sales persons. Each customer has exactly one primary sales person.
What do the ER diagrams look like?
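
The key constraint on Manages (each dept has at most one manager) can be sketched by making did alone the key of the Manages table. This is one possible rendering under assumed names: the Employees key ssn, the helper assign_manager, and the sample values are illustrative and not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employees   (ssn TEXT PRIMARY KEY);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);

-- Key constraint: did alone is the key, so a department
-- can appear in at most one Manages row.
CREATE TABLE Manages (
    did   INTEGER PRIMARY KEY REFERENCES Departments(did),
    ssn   TEXT NOT NULL REFERENCES Employees(ssn),
    since TEXT
);
""")

with conn:
    conn.executemany("INSERT INTO Employees VALUES (?)", [("e1",), ("e2",)])
    conn.executemany("INSERT INTO Departments VALUES (?, ?, ?)",
                     [(51, "Pharmacy", 1000.0), (52, "Toys", 2000.0)])


def assign_manager(did: int, ssn: str, since: str) -> bool:
    """Try to record a manager; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Manages VALUES (?, ?, ?)", (did, ssn, since))
        return True
    except sqlite3.IntegrityError:
        return False


first = assign_manager(51, "e1", "2010")   # department 51 gets its manager
second = assign_manager(51, "e2", "2011")  # rejected: 51 already has a manager
third = assign_manager(52, "e1", "2012")   # allowed: one employee may manage several depts
```

The constraint is one-directional: a department has at most one manager, but nothing stops an employee from managing several departments.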

Participation Constraints
An employee can work in many departments; a dept can have many employees
[Figure: ER diagram of Works_In with attribute since; visible entity attributes: did, dname, budget]
Each employee works in at least one department, according to the participation constraint on Works_In
[Figure: ER diagram of Works_In with attribute since and a participation constraint; visible entity attributes: did, dname, budget]

Participation Constraints: Examples
Example Scenario 1 (Contd.): Each part is constructed by one or more manufacturers.
Example Scenario 2: Each customer has exactly one primary sales person.

What does this mean?
[Figure: ER diagram with relationship sets Manages and Works_In, each with attribute since; visible entity attributes: did, dname, budget]
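
Example Scenario 2 combines a key constraint with a participation constraint: each customer has exactly one primary sales person. One common way to capture this pair, sketched below, is to fold the relationship into the Customers table as a non-null foreign key. The table and column names (SalesPersons, spid, cid, primary_spid) and the helper function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE SalesPersons (spid INTEGER PRIMARY KEY, name TEXT);

-- One column per customer  -> at most one primary sales person (key constraint).
-- NOT NULL on that column  -> at least one (participation constraint).
CREATE TABLE Customers (
    cid          INTEGER PRIMARY KEY,
    name         TEXT,
    primary_spid INTEGER NOT NULL REFERENCES SalesPersons(spid)
);
""")

with conn:
    conn.execute("INSERT INTO SalesPersons VALUES (7, 'Sam')")


def try_insert_customer(cid, name, primary_spid) -> bool:
    """Insert a customer; return False if a constraint rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO Customers VALUES (?, ?, ?)",
                         (cid, name, primary_spid))
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert_customer(1, "Ann", 7)         # valid sales person
missing = try_insert_customer(2, "Bob", None)  # violates NOT NULL
dangling = try_insert_customer(3, "Cy", 99)    # no such sales person
```

This trick works only when the participating entity set also has a key constraint. A constraint such as "each employee works in at least one department" on a many-to-many relationship set like Works_In cannot be expressed with column constraints alone and generally needs a trigger or an application-level check.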

Weak Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
  Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
  Weak entity set must have total participation in this identifying relationship set.
[Figure: ER diagram of weak entity set Dependents (attributes pname, age) with identifying relationship set Policy (attribute cost)]

Exercise
Give two real-life examples where each of the following would occur:
  A key constraint
  A participation constraint
  A weak entity set

ER Modeling: Case Study
Drugwarehouse.com has offered you a free life-time supply of prescription drugs (no questions asked) if you design its database schema. Given the rising cost of health care, you agree. Here is the information that you gathered:
  Patients are identified by their SSN, and we also store their names and age.
  Doctors are identified by their SSN, and we also store their names and specialty.
  Each patient has one primary care physician, and we want to know since when the patient has been with her primary care physician.
  Each doctor has at least one patient.
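
The weak-entity rules stated above (identification through the owner's key, one owner per weak entity, total participation) can be sketched by merging Dependents and Policy into one table. This is an illustrative rendering: the owner entity set Employees with key ssn, the table name Dep_Policy, and the sample rows are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to fire

conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY);

-- Weak entity set and its identifying relationship in one table.
-- pname is only a partial key; the full key also needs the owner's ssn.
-- NOT NULL gives total participation; CASCADE removes dependents with the owner.
CREATE TABLE Dep_Policy (
    pname TEXT,
    age   INTEGER,
    cost  REAL,
    ssn   TEXT NOT NULL REFERENCES Employees(ssn) ON DELETE CASCADE,
    PRIMARY KEY (pname, ssn)
);
""")

with conn:
    conn.executemany("INSERT INTO Employees VALUES (?)", [("e1",), ("e2",)])
    # Two dependents share the name 'Joe' but belong to different owners.
    conn.executemany("INSERT INTO Dep_Policy VALUES (?, ?, ?, ?)",
                     [("Joe", 5, 100.0, "e1"), ("Joe", 7, 120.0, "e2")])

count_before = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

with conn:
    conn.execute("DELETE FROM Employees WHERE ssn = 'e1'")

count_after = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]
remaining_owner = conn.execute("SELECT ssn FROM Dep_Policy").fetchone()[0]
```

Deleting the owner removes its dependents as well, which matches the idea that a weak entity has no identity apart from its owner.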

Summary of Conceptual Design
Conceptual design follows requirements analysis
ER model popular for conceptual design
  Basic constructs: entities, relationships, and attributes
  Some additional constructs such as weak entities.
Note: There are many variations on ER model.

Summary of ER (Contd.)
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
  Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, etc.
Ensuring good database design: resulting relational schema should be analyzed and refined further: normalization.

Reminders
Complete academic integrity form (on the website) and bring it to the next class.
CS4321/CS5321 starts next week.