Outline 645: Database Design and Implementation Course topics and requirements Overview of databases and DBMS s Yanlei Diao University of Massachusetts Amherst Database Management Systems Fundamentals Data modeling Relational design Query language SQL Database theory Database implementation Storage, indexing Query processing, query optimization Transaction management Networked information systems XML, XQuery, XML query processing Data stream systems Probabilistic databases Prerequisites An undergraduate database or operating systems course Or consent of the instructor Data structures and algorithms Computer systems Sufficient programming experience Course Web Site http://avid.cs.umass.edu/courses/645/s2011/ Or Yanlei s web page Teaching 645 Spring 2011 Staff and Contact Information Instructor: Yanlei Diao Email: yanlei at cs.umass.edu Office hours: Tu Th 10:15-11:15 am, CS 232 TA: Liping Peng Email: lppeng at cs.umass.edu Office hour: TBD Mailing list cs645 at edlab-mail.cs.umass.edu course related information, relevant to the whole class All available on the home page 1
Textbook Grading Database Management Systems 3 rd Edition Ramakrishnan and Gehrke Homework: 35% Course Project: 30% Midterm: 30% Class participation: 5% Lecture notes will be posted on the schedule page before class. - no laptop in class Homework: 35% Exam (30%) 5 assignments Written problem sets Programming exercises Posted on the assignments page Watch the due date! Submission: hardcopy before class on due date Policy on late submissions Midterm exam In-class, closed-book exam April 5, location TBD No final exam! Project: 30% Participation (5%) Groups of 2 or work individually Research-oriented problem Nobody has solved the problem Scientific value Milestones & deliverables: see the projects page Submission: via email Proposal, status report: before class on due date Final report: 5 pm on the due date In-class presentation Attend every class Ask questions and answer questions 2
Academic Honesty All submitted work must be your own! Although students are encouraged to study together, each student much produce his or her own solution to each homework. Copying or using sections of someone else s program or assignment (even if it has been modified by you), or copying a solution from an external source, is not acceptable. The University guidelines for academic misconduct: http://www.umass.edu/dean_students/code_conduct/ acad_honest.htm The staff of CS 645 will be vigorous in enforcing them. Outline Course topics and requirements Overview of databases and DBMS s Databases and DBMS s A database is a large, integrated collection of data A database management system (DBMS) is a software system designed to store and manage a large amount of data Declarative interface to define data stored, add data, update data, and query data Efficient querying Concurrent users Reliable storage and crash recovery Access control Early DBMS s Early DBMS s (1960 s) evolved from file systems Many small data items, many queries and updates Banking Airline reservations 1960s Navigational DBMS Tree-based or graph-based data model Manual navigation to find what you want No support for search ( search program ) Relational DBMS Commercial DBMS s Relational model (E.F. Codd, 1970) Data independence: hides details of physical storage from users Declarative query language: say what you want, not how to compute it Mathematical foundation: what queries mean, possible implementations Query optimization (1970 s till now) Earliest: System R at IBM, INGRES at UC Berkeley Queries can be efficiently executed despite data independence and declarative queries! INGRES System R Informix Sybase IBM DB2 Oracle Postgres MS SQL Server MySQL Material in this slide based on wikipedia 3
Data warehouses Large amounts of data over years, complex queries, designed for analysis and reporting Sales data analysis, e.g., Walmart, Target, Fraud analysis, e.g., credit card use, insurance Call record analysis, e.g., AT&T Schema design, data cleaning and loading, indexing, aggregation, materialized views, data mining Electronic commerce E.g., amazon.com, ebay.com How do we integrate thousands of catalogs? How do we store data of sparse attributes How do we handle the scale? Social networking E.g., facebook.com, myspace.com, with 500 million users at a popular site Real-time analysis of user behavior Database Security Large-scale sensing applications Temperature, light, humidity Global Positioning System Radio Frequency Identification (RFID) Radar sensor networks Digital sky surveys How do we store, merge, and query sensing data? How do we support mission-critical applications? Bioinformatics Amino acid sequences: e.g., SWISS-PROT 3D molecular structure: e.g., PDB, CSD Data integration Pattern matching Approximate matching, ranking Automatic inference T V How does one build a database? Example: The Internet Shop* DBDudes Inc.: a well-known database consulting firm Barns and Nobble (B&N): a large bookstore specializing in books on horse racing B&N decides to go online but needs help Step 0: DBDudes makes B&N agree to pay steep fees and schedule a lunch meeting for requirements analysis * The example and all related material was taken from Database Management Systems Edition 3. 4
Step 1: Requirements Analysis I d like my customers to be able to browse my catalog of books and place orders online. Books: For each book, B&N s catalog contains its ISBN number, title, author, price, year of publication, Customers: Most customers are regulars with names and addresses registered with B&N. New customers must first call and establish an account. On the new website: Customers identify themselves before browsing and ordering. Each order contains the ISBN of a book and a quantity. Shipping: For each order, B&N ships all copies of a book together once they become available. Step 2: Conceptual Design A high level description of the data in terms of the Entity- Relationship (ER) model. ordernum author qty_in_stock order_date cname title price qty cardnum cid address isbn Year ship_date Books Customers Design review: What if a customer places two orders of the same book in one day? What if a customer wants to have different books in the same order? Step 3: Logical Design Step 4: Schema Refinement Mapping the ER diagram to the relational model CREATE TABLE Books (isbn CHAR(10), title CHAR(80), author CHAR(80), qty_in_stock INTEGER, price REAL, year INTEGER, PRIMARY KEY(isbn)) CREATE TABLE Customers (cid INTEGER, cname CHAR(80), address CHAR(200), PRIMARY KEY(cid)) Access control: use views to restrict the access of certain employees to customer sensitive information CREATE TABLE (ordernum INTEGER, isbn CHAR(10), cid INTEGER, cardnum CHAR(16), qty INTEGER, order_date DATE, ship_date DATE, PRIMARY KEY(ordernum, isbn), FOREIGN KEY (isbn) CREATE VIEW OrderInfo REFERENCES Books, (isbn, cid, qty, order_date, ship_date) FOREIGN KEY (cid) AS SELECT O.isbn, O.cid, O.qty, REFERENCES Customers) O.order_date, O.ship_date FROM O ordernum isbn cid cardnum qty order_date ship_date 0-07-11 123 40241160 2 Jan 3, 2006 Jan 6, 2006 1-12-23 123 40241160 1 Jan 3, 2006 Jan 11, 2006 0-07-24 123 40241160 3 Jan 3, 2006 Jan 26, 2006 ordernum cid 123 cardnum order_date 40241160 Redundant Storage! Orderlists Jan 3, 2006 ordernum isbn 0-07-11 1-12-23 0-07-24 qty 2 1 ship_date Jan 6, 2006 Jan 11, 2006 3 Jan 26, 2006 Step 5: Internet Application Development An Example Internet Store Presentation tier Client Program (Web Browser) HTML, Javascript, Cookies B&N Client: User input Display of output Application logic tier Application Server (Apache Tomcat) PHP, JSP, Servlets, XSLT B&N Business logic: Home page Login page Search page Cart page Confirm page Data management tier Database System (MySQL, DB2) Relational, XML B&N Data: Books Customers (User login) Orderlists 5
Example SQL Queries Search Page SELECT isbn, title, author, price FROM Books WHERE author = '<SearchString>' ORDER BY title Login Page SELECT cid, username, password FROM Customers WHERE username = '<SpecifiedUsername>' Step 6: Physical Design Auxiliary data structures, indexes, to speed up searches, e.g., hash index, B+tree, R-tree Books isbn title author price year qty Legacies of Edward L. 0-07-11 29.95 2003 10 the Turf Bowen 1-12-23 Seattle Slew Dan Mearns 24.95 2000 0 Spectacular Timothy 0-07-24 16.95 2001 3 Bid Capps Hash Index on Books.isbn H1 isbn number 1 2 3 N-1 What is inside DBMS? DBMS Architecture Query Parser DBMS Query Rewriter Query Optimizer Query Executor Query Processor Lock Manager Access Methods Disk Space Manager DB Log Manager Transactional Storage Manager Disk Manager Query Processor Transactional Storage Manager Syntax checking Internal representation Handling views CREATE Logical/semantic VIEW OrderInfo rewriting (ordernum, Flattening cid, order_date) subqueries AS SELECT O.ordernum, O.cid, Building O.order_date, a query execution plan FROM Efficient, if not optimal O Pull-based execution of a plan Iterator model Query Parser Query Rewriter Query Optimizer Query Executor SELECT C.cname, F.ordernum, F.order_date FROM SELECT Customers C.cname, F.ordernum, C, OrderInfo F WHERE C.cname F.order_date = John FROM C.cid Customers = F.cid C, OrderInfo F WHERE C.cname = John C.cid = F.cid SELECT C.cname, (On-the-fly) O.ordernum, O.order_date C.cname, FROM Customers O.ordernum, C, O WHERE C.cname O.order_date = John C.cid = O.cid (Indexed Join) cid=cid Customers cname= John Access Methods Lock Manager File, B+tree, R-tree, Hash Log Manager Concurrency: 2PL Customers cname= John (On-the-fly) C.cname, O.ordernum, O.order_date (Indexed Join) cid=cid Replacement policy Recovery: WAL 6
Disk Manager DBMS: Theory + Systems Query Parser Theory! Query Rewriter Disk Space Manager Allocate/Deallocate a page Read/Write a page Contiguous seq. of pages Query Optimizer Query Executor Database Lock Manager Access Methods Log Manager Data Log Indexes Catalog Disk Space Manager DB Systems! 7