Data Modeling and Databases I - Introduction Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
ADMINISTRATIVE ASPECTS D-INFK, ETH Zurich, Data Modeling and Databases 2
Basic Data
  Lectures
    Mondays, 10:00-12:00
    Wednesdays, 08:00-10:00
  Exercise Groups (start February 24th)
    Tuesdays, 08:00-10:00
    Fridays, 08:00-10:00
    Held in English and German
  Web Page: http://www.systems.ethz.ch/courses/spring2015/data_mod_db
Literature
  Kemper, Eickler: Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, 7th edition, 2009.
  or
  Garcia-Molina, Ullman, Widom: Database Systems: The Complete Book. Pearson, 2nd edition, 2008.
Course Contents
  Data modeling
    Data organization, data models
    ER model
    Relational model
    Other models
  Databases
    SQL
    Query processing
    Transaction management
  Data Management Systems
    Data processing in the era of Big Data
Course Objectives
  Data modeling with an emphasis on the relational model
  SQL
  Basic operation of a database
    Basics of query processing
    Basics of transaction management
  Understand data modeling, databases, and the impact of how data is modeled across the entire software stack
Exercises & Exam
  Exercise sheets
    Handed out in the week before they are discussed
    Not graded
  Session examination (Sessionsprüfung): written, closed book
Teaching & Learning
  This is a basic course. The material is well known: textbooks, Wikipedia, web sites, multiple lectures, videos, on-line courses, YouTube
  Lectures and exercises are designed to expand and build on all this material: they cover what is not in the textbooks (context, examples, answering questions)
  We will not overload the syllabus nor make constant demands for attention (no projects, no mid-term exams, no graded exercises)
  A moderate amount of constant effort during the semester will allow you to learn a lot and make it easy to follow the course and exercises
  Read in advance, attend classes, ask all the time
INTRODUCTION WHAT IS A DATABASE
A Database System (DBMS)
  A DBMS is a tool that helps develop and run data-intensive applications.
    Push the complexity of dealing with the data (storage, processing, consistency) to the database rather than to the program
    Share the database
  The database is a tool
    Many shapes and forms
    Many applications
What does a database look like?
  Until recently, a database often meant a relational database
  Today, there are many forms of data management engines (or databases)
  Principles and ideas behind relational databases apply to almost all forms of data management
Why use a DBMS?
  Avoid redundancy and inconsistency
  Rich (declarative) access to the data
  Synchronize concurrent data access
  Recovery after system failures
  Security and privacy
  Facilitate reuse of the data
  Reduce cost and pain of doing something useful
  There is always an alternative!!!
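Two of the points above, declarative access and protection against inconsistency, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the `account` table, its columns, and the sample rows are hypothetical and not part of the lecture.

```python
import sqlite3


def demo_dbms_benefits():
    """Return (owners with balance > 60, balances after a failed transfer)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the CHECK constraint lets the database, not the
    # application program, guarantee that no balance ever becomes negative.
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )
    conn.commit()

    # Declarative access: we state WHAT we want, not HOW to scan the data.
    rich = conn.execute(
        "SELECT owner FROM account WHERE balance > 60 ORDER BY owner"
    ).fetchall()

    # Consistency: the second UPDATE violates the CHECK constraint, so the
    # whole transaction (including the first UPDATE) is rolled back.
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE account SET balance = balance + 80 WHERE id = 1")
            conn.execute("UPDATE account SET balance = balance - 80 WHERE id = 2")
    except sqlite3.IntegrityError:
        pass

    balances = conn.execute(
        "SELECT id, balance FROM account ORDER BY id"
    ).fetchall()
    conn.close()
    return rich, balances
```

Without a DBMS, the program itself would have to scan the data, validate every update, and undo partial changes after a failure.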
In practice
  http://www.slideshare.net/linkedin/linkedinscommunication-architecture
  http://highscalability.com/scaling-digg-and-otherweb-applications
  http://muratbuffalo.blogspot.ch/2014/10/facebooks-software-architecture.html
A use case: Amadeus
  Passenger-Booking Database
    ~600 GB of raw data (two years of bookings)
    single table, denormalized
    ~50 attributes: flight-no, name, date, ..., many flags
  Query Workload
    up to 4000 queries per second
    latency guarantee: 2 seconds
    today: only pre-canned queries allowed
  Update Workload
    avg. 600 updates per second (1 update per GB per second)
    peak of 12000 updates per second
    data freshness guarantee: 2 seconds
  Problems with the State of the Art
    Simple queries work only because of materialized views
      multi-month project to implement a new query / process
    Complex queries do not work at all
INTRODUCTION DATA MODELING
A common need
  Data needs to be organized to be actually useful: presentation, processing, communication, calculations
  [Figure: List of offerings, Egypt, ca. 1900 BC]
Data Management Universe
Data Modeling
  Conceptual Model
    Captures the world (domain) to be represented
    Collection of entities and how they relate to each other (Entity-Relationship)
  Logical Model (schema)
    Mapping of the concepts to a concrete logical representation
  Physical Model
    Implementation in a concrete hardware architecture
Conceptual modeling
  [Figure: the mini world is turned into a conceptual schema (ER schema) by manual modeling; a semi-automatic transformation then maps the conceptual schema to a relational schema, a hierarchical schema, an object-oriented schema, or XML]
Example
  Real world: University (Student, Lecture, Professor)
  Conceptual modeling yields:
    Entity Student (MatrNr, Name)
    Entity Professor (PersNr, Name)
    Entity Lecture (Nr, Title)
    Relationship: Student enrolls in Lecture
    Relationship: Professor teaches Lecture
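One possible mapping of the university example to a relational schema is sketched below, using Python's built-in sqlite3 module. The cardinalities are assumptions, since the slide does not state them: each lecture is taught by exactly one professor, and enrollment is many-to-many. Table names, column names, and the sample rows are hypothetical.

```python
import sqlite3

# Assumed mapping: one table per entity, a foreign key for "teaches"
# (one professor per lecture), and a separate table for "enrolls" (n:m).
SCHEMA = """
CREATE TABLE student   (matr_nr INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE professor (pers_nr INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE lecture (
    nr        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    taught_by INTEGER NOT NULL REFERENCES professor(pers_nr)
);
CREATE TABLE enrolls (
    matr_nr    INTEGER NOT NULL REFERENCES student(matr_nr),
    lecture_nr INTEGER NOT NULL REFERENCES lecture(nr),
    PRIMARY KEY (matr_nr, lecture_nr)
);
"""


def build_university():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only if enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO professor VALUES (?, ?)",
                     [(1, "Professor X"), (2, "Professor Y")])
    conn.executemany("INSERT INTO lecture VALUES (?, ?, ?)",
                     [(10, "Databases", 1), (20, "Logic", 2)])
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(100, "Student A"), (200, "Student B")])
    conn.executemany("INSERT INTO enrolls VALUES (?, ?)",
                     [(100, 10), (100, 20), (200, 10)])
    conn.commit()
    return conn


def lectures_of(conn, student_name):
    """Return (lecture title, professor name) pairs for one student."""
    return conn.execute(
        """
        SELECT l.title, p.name
        FROM student s
        JOIN enrolls e   ON e.matr_nr = s.matr_nr
        JOIN lecture l   ON l.nr = e.lecture_nr
        JOIN professor p ON p.pers_nr = l.taught_by
        WHERE s.name = ?
        ORDER BY l.title
        """,
        (student_name,),
    ).fetchall()
```

Under different cardinality assumptions (for example, several professors per lecture) the "teaches" relationship would need its own table, just like "enrolls".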
Modeling not trivial
  How you model affects how easy it is to work with the data:
    Person 1 endorses Person 2 for skills in topic X
    Person 1 is a friend of Person 2 in social site Y
    Friends that Person 1 and Person 2 have in common
    Person 1 likes Photo #n from Person 2
    Person 1 recommends a book
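As a rough illustration of the "friends in common" case, the sketch below stores every friendship in both directions, which is one modeling choice among several and not something the slide prescribes. With that choice the query becomes a simple self-join. The `friendship` table and the sample names are hypothetical; the code uses Python's built-in sqlite3 module.

```python
import sqlite3


def build_social_graph(pairs):
    """Store each undirected friendship as two directed rows (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friendship ("
        " person TEXT NOT NULL,"
        " friend TEXT NOT NULL,"
        " PRIMARY KEY (person, friend))"
    )
    for a, b in pairs:
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (a, b))
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (b, a))
    conn.commit()
    return conn


def common_friends(conn, p1, p2):
    """Friends that p1 and p2 share, found with a self-join on the friend column."""
    rows = conn.execute(
        """
        SELECT f1.friend
        FROM friendship AS f1
        JOIN friendship AS f2 ON f1.friend = f2.friend
        WHERE f1.person = ? AND f2.person = ?
        ORDER BY f1.friend
        """,
        (p1, p2),
    ).fetchall()
    return [row[0] for row in rows]
```

Storing each friendship only once would halve the number of rows, but the same query would then have to check both columns for each person. This is the kind of trade-off the slide alludes to.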
Database Abstraction Layers
  View 1, View 2, ..., View 3
    (logical data independence)
  Logical Layer (schema)
    (physical data independence)
  Physical Layer
  Data independence: changes at one layer do not affect the layers above!
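Both kinds of data independence can be sketched with a view, using Python's built-in sqlite3 module. In this hypothetical setup the application only ever runs `APP_QUERY` against `student_view`. First an index is added (a physical change), then the base table is replaced by one with split name columns and the view is redefined (a logical change). All table, view, and column names are illustrative assumptions.

```python
import sqlite3

# The only statement the (hypothetical) application ever issues.
APP_QUERY = "SELECT name FROM student_view ORDER BY matr_nr"


def run_independence_demo():
    """Return the application's result before and after each kind of change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (matr_nr INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO student VALUES (1, 'Student A'), (2, 'Student B');
        CREATE VIEW student_view AS SELECT matr_nr, name FROM student;
    """)
    before = conn.execute(APP_QUERY).fetchall()

    # Physical change: a new index alters how data is accessed, not the schema.
    conn.execute("CREATE INDEX idx_student_name ON student(name)")
    after_physical = conn.execute(APP_QUERY).fetchall()

    # Logical change: the name is split into two columns in a new table;
    # the view is redefined so that it still exposes (matr_nr, name).
    conn.executescript("""
        DROP VIEW student_view;
        CREATE TABLE student_v2 (
            matr_nr    INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL
        );
        INSERT INTO student_v2 VALUES (1, 'Student', 'A'), (2, 'Student', 'B');
        DROP TABLE student;
        CREATE VIEW student_view AS
            SELECT matr_nr, first_name || ' ' || last_name AS name
            FROM student_v2;
    """)
    after_logical = conn.execute(APP_QUERY).fetchall()
    conn.close()
    return before, after_physical, after_logical
```

The application query is never edited, yet it keeps returning the same rows after both changes.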
Data Models
  Conceptual Data Models
    Entity Relationship
    UML
  Logical Data Models
    Flat file (e.g., SQLite)
    Network model (e.g., CODASYL/COBOL)
    Hierarchical model (IBM IMS/FastPath)
    Relational model (SQL)
    Object-oriented model (ODMG 2.0)
    Semi-structured model (XML Infoset)
    Deductive model (Datalog, Prolog)
Reading
  Chapter on Entity-Relationship modeling in either of the two textbooks
  If lazy and allergic to paper: