Eduardo Cunha de Almeida eduardo.almeida@uni.lu
Outline of the course
- Introduction
- Database Systems (E. Almeida)
- Distributed Hash Tables and P2P (C. Cassagnes)
- NewSQL (D. Kim and J. Meira)
- NoSQL (D. Kim)
- MapReduce (E. Almeida and E. Lucas)
- Data Streaming (C. Cassagnes)
- Machine Learning (C. Henard)
- Assignment (E. Almeida, J. Meira and E. Lucas)
Database Systems
Databases go BigData
Some numbers from 2012 and earlier
Buzzwords
Data Processing Solutions for BigData
- System-R (the traditional data processing): the 70's design, with relations and strong data consistency
- NewSQL: boosting System-R to handle the transactions onslaught
- NoSQL: eventual data consistency
- MapReduce: divide-and-conquer data processing for data analytics
To understand what is going on, let's get back to this over the next couple of lectures!
Definitions
What is a database?
A database is an organized collection of data [Navathe and Elmasri, 94] that:
- represents aspects of reality
- has coherence (no random sets)
- is built for specific projects
Database vs. File system
- Self-descriptive nature: a DB defines its own data structures and constraints
- Data abstraction: a FS requires programs to describe the data
- Multiple views: users may visualize the data from different perspectives
- Sharing: a DB allows concurrent access
Database Actors
- Users: work on top of databases
- Analyst: determines the users' requirements
- Designer: designs the DB for specific projects
- DBA: manages the database structures and resource consumption
Three Layer Client/Server [Valduriez, 92]
Database Abstraction
[Figure: three External Schemas on top of one Logical Schema, on top of the Physical Schema, on top of the DISK]
Today's lecture
- Conceptual Design
- Relational Model
- E-R Mapping
[Figure: the same abstraction layers (External Schemas, Logical Schema, Physical Schema, DISK), with a brace pointing to the three topics above]
Conceptual Design
Why do we need database design?
Agree on the structure of the database before going to implementation:
- Entities
- Relationships between entities
- Constraints of the domain
Conceptual Design
Entity/Relationship Model (E/R)
- Requirements
- Design
- Implementation
Different from UML, which was conceived to support OO design!
E/R Diagrams
- Entities (e.g., project): something from the real world with independent existence
- Attributes (e.g., Name): properties of an entity
- Relationships (e.g., manage): associations between entities
Entities, attributes and keys
Every entity has a minimal set of uniquely identifying attributes (i.e., a key)
Example: employee with attributes SSN, name, DateOfBirth
Types of attributes
- Simple or composite: SSN is simple; Address is composite (Street, Number, City)
- Multivalued: Degree (e.g., Ph.D., M.S., B.S.)
Types of attributes
- Derived: Age (requires some computation)
- Key: SSN (uniquely identifies an entity)
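A minimal sketch of a derived attribute, assuming Age is computed from the DateOfBirth attribute of employee rather than stored. The function name `age_on` is hypothetical and only serves the illustration:

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive the age in full years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


day_before_birthday = age_on(date(1990, 6, 15), date(2024, 6, 14))
on_birthday = age_on(date(1990, 6, 15), date(2024, 6, 15))
print(day_before_birthday, on_birthday)
```

Because the value changes with the current date, storing it would make it go stale, which is why it is modeled as derived.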
Relationships
Connect entities together (in general identified by a verb)
1:N relationship (the norm in the design): employee 1 -- manage -- N project
Relationship cardinality
- 1:1 relationship (rare, because both entities may belong to the same table): employee 1 -- chair -- 1 department
- N:M relationship (not so rare, but try to avoid): supplier N -- supply -- M project
Relationship's Attributes
An attribute of a relationship exists only because of the association
Example: employee 1 -- chair (term) -- 1 department
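Example
[E/R diagram]
- Entities: employee (SSN, nameofemp), project (PID, nameofproj, local), department (DID, nameofdep), supplier (SID, nameofsup)
- Relationships: employee 1 -- manage -- N project; employee 1 -- chair (term) -- 1 department; project N -- supply -- M supplier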
Multi-way Relationships
Ternary relationship: supply (quantity, date) connects supplier (N), project (M) and part (M)
Multi-way Relationships
Ternary to binary relationship: supply becomes an entity SP (quantity, date) with one binary relationship to each of the others: supplier 1 -- N SP, project 1 -- M SP, part 1 -- M SP
Attention, since information may be lost
Weak Entities
Example: employee (SSN, nameofemp) 1 -- has -- N dependents (SSND, nameofdpt)
- A weak entity cannot be identified by its own attributes alone
- Its key requires a foreign key to the owner entity in conjunction with its own attributes
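The sketch below illustrates the weak entity above with SQLite, under stated assumptions: the table layout, the sample rows and the `ON DELETE CASCADE` rule are illustrative choices, not part of the slides. It shows that SSND alone repeats across employees, while the pair (SSN, SSND) identifies a dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (ssn INT NOT NULL PRIMARY KEY, nameofemp TEXT);
    -- Weak entity: its key combines the owner's key (ssn) with its partial key (ssnd).
    CREATE TABLE dependent (
        ssn       INT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
        ssnd      INT NOT NULL,
        nameofdpt TEXT,
        PRIMARY KEY (ssn, ssnd)
    );
    INSERT INTO employee VALUES (123, 'John'), (124, 'Mary');
    INSERT INTO dependent VALUES (123, 1, 'James'), (123, 2, 'Diana'), (124, 1, 'Robert');
""")

# The partial key is not unique on its own: two dependents share ssnd = 1.
same_partial_key = conn.execute(
    "SELECT COUNT(*) FROM dependent WHERE ssnd = 1"
).fetchone()[0]

# Removing the owner removes its dependents (assumed cascade rule).
conn.execute("DELETE FROM employee WHERE ssn = 123")
left = conn.execute("SELECT ssn, ssnd, nameofdpt FROM dependent").fetchall()
print(same_partial_key, left)
```

The cascade is one common way to express that a dependent has no existence without its employee; other designs may prefer to reject the delete instead.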
Modeling Hierarchy
Data is naturally hierarchical (such as the world)
Example: employee is specialized into technical and admin
Obs.: not all database systems implement inheritance
Hands-on
1. Entities: Professor, Student, Course
2. Relationships: teach, register
3. Attributes: ???
Relational model
Relational Model
- Created by Edgar Codd in 1969
- Based on mathematical relations
- A database is a set of tuples grouped into relations
Each column is an attribute and each row is a tuple:

SSN  Name   Age  Email       Dept
123  John   30   john@mail   2
124  Mary   34   mary@mail   3
333  Peter  40   peter@mail  3

Attributes (Columns)
- Data type: integer, float, string, date/time, binary
- Domain: set of atomic (mono-valued) values, e.g., the domain of SSN is the set of 9-digit numbers
[Table: the employee table from the Relational Model slide, with one column highlighted as a domain]
Constraints
Domain: every element respects the type of its attribute
E.g., Dom(dept) = {1, 2, 3, 4, 5}
[Table: the employee table from the Relational Model slide, with one column highlighted as a domain]
Constraints
Entity integrity: no primary key value can be null

SSN (primary key)  Name   Age  Email       Dept
123                John   30   john@mail   2
NULL               Mary   34   mary@mail   3
NULL               Peter  40   peter@mail  3

The tuples with a NULL key cannot be identified
Constraints
Referential integrity: enforces consistency between two relations

Table: Employee
SSN  ename
123  John
321  Mary
333  Peter

Table: Dependent
SSN  ID  dname
123  1   James
123  2   Diana
444  3   Robert   <- broken link: no employee has SSN 444

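A minimal sketch of the broken link above, using SQLite through Python. The DDL is an assumption (the slides give only the rows), including the composite key (SSN, ID) on Dependent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE employee (ssn INT NOT NULL PRIMARY KEY, ename TEXT)")
conn.execute("""
    CREATE TABLE dependent (
        ssn   INT NOT NULL REFERENCES employee(ssn),
        id    INT NOT NULL,
        dname TEXT,
        PRIMARY KEY (ssn, id)
    )
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(123, "John"), (321, "Mary"), (333, "Peter")])
conn.executemany("INSERT INTO dependent VALUES (?, ?, ?)",
                 [(123, 1, "James"), (123, 2, "Diana")])

# Robert points to SSN 444, which does not exist in employee.
try:
    conn.execute("INSERT INTO dependent VALUES (444, 3, 'Robert')")
    broken_link_rejected = False
except sqlite3.IntegrityError:
    broken_link_rejected = True

dependents = conn.execute("SELECT dname FROM dependent ORDER BY id").fetchall()
print(broken_link_rejected, dependents)
```

With the foreign key declared, the system refuses the dangling row instead of leaving it for the application to detect.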
Examples of constraint violation

Table: Employee
SSN  Name   Age  Email       Dept
123  John   30   john@mail   2
124  Mary   34   mary@mail   3
333  Peter  40   peter@mail  3

Insert(null, 'Gail', 32, gail@mail, 3) into employee
Insert(123, 'Gail', 32, gail@mail, 3) into employee
Insert(125, 'Gail', 32, gail@mail, 'A') into employee
Insert(125, 'Gail', 32, gail@mail, 3) into employee
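The four inserts above can be tried out with the sketch below. It assumes a table definition the slides do not give: SSN as a non-null primary key and Dom(dept) = {1, 2, 3, 4, 5} expressed as a CHECK constraint. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed DDL. "INT NOT NULL PRIMARY KEY" is used on purpose: SQLite would
# silently auto-fill a NULL inserted into an "INTEGER PRIMARY KEY" column.
conn.execute("""
    CREATE TABLE employee (
        ssn   INT NOT NULL PRIMARY KEY,
        name  TEXT,
        age   INTEGER,
        email TEXT,
        dept  INTEGER CHECK (dept IN (1, 2, 3, 4, 5))
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [(123, "John", 30, "john@mail", 2),
     (124, "Mary", 34, "mary@mail", 3),
     (333, "Peter", 40, "peter@mail", 3)],
)


def try_insert(row):
    """Return 'ok' if the row is accepted, else the constraint error message."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = [
    try_insert((None, "Gail", 32, "gail@mail", 3)),   # entity integrity: null key
    try_insert((123, "Gail", 32, "gail@mail", 3)),    # key uniqueness: 123 already exists
    try_insert((125, "Gail", 32, "gail@mail", "A")),  # domain: 'A' is not in Dom(dept)
    try_insert((125, "Gail", 32, "gail@mail", 3)),    # respects all constraints
]
for outcome in results:
    print(outcome)
```

Under these assumptions the first three inserts are rejected and only the last one is accepted. The exact error text depends on the SQLite version.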
Structured Query Language (SQL)
SQL in a nutshell
- Programming language to manipulate data
- Declarative nature (what to do instead of how)
- Different versions: 86, 89, 92, 99, 03, ...
- Different aspects: Select, DDL, DCL, DML, triggers and constraints
SELECT
    Select [attributes]
    From [relation]
    Where [condition]
For instance:
    Select SSN, name From Employee Where age < 40;
    Select SSN, name From Employee Where age < 40 and dept = 2;
    Select SSN, name From Employee Where age = 40 and age = 50;   -- always empty: no age is both 40 and 50
    Select SSN, name From Employee Where age = 40 or age = 50;
    Select SSN, name From Employee Where age between 40 and 50;
Consider also the operators >, <, >=, <=
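As a rough sketch, the conditions above can be run against the three employee rows shown earlier in the deck. The helper `names` is hypothetical, and it interpolates the condition into the query only because the conditions here are fixed literals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INT, name TEXT, age INT, email TEXT, dept INT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [(123, "John", 30, "john@mail", 2),
     (124, "Mary", 34, "mary@mail", 3),
     (333, "Peter", 40, "peter@mail", 3)],
)


def names(where):
    """Run 'Select SSN, name From Employee Where <where>' and return the names."""
    sql = f"SELECT ssn, name FROM employee WHERE {where} ORDER BY ssn"
    return [name for _, name in conn.execute(sql)]


under_40 = names("age < 40")                     # ['John', 'Mary']
under_40_dept2 = names("age < 40 AND dept = 2")  # ['John']
both = names("age = 40 AND age = 50")            # [] since one age cannot equal two values
either = names("age = 40 OR age = 50")           # ['Peter']
between = names("age BETWEEN 40 AND 50")         # ['Peter'], the bounds are inclusive
print(under_40, under_40_dept2, both, either, between)
```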
Joining Relations (Join operator) [SQL92]
    Select p.nameofproj, s.quantity
    From Project p, Supply s
    Where p.pid = s.pid;

    Select p.nameofproj, s.quantity, r.nameofsup
    From Project p, Supply s, Supplier r
    Where p.pid = s.pid And s.sid = r.sid;
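The joins above can be sketched on a tiny data set. The rows (project names, supplier names, quantities) are made up for illustration, and the last query uses the explicit `JOIN ... ON` syntax as an equivalent formulation, which is an addition to the slide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project  (pid INT PRIMARY KEY, nameofproj TEXT);
    CREATE TABLE supplier (sid INT PRIMARY KEY, nameofsup TEXT);
    CREATE TABLE supply   (pid INT, sid INT, quantity INT);
    -- Hypothetical sample rows.
    INSERT INTO project  VALUES (1, 'Bridge'), (2, 'Tunnel');
    INSERT INTO supplier VALUES (10, 'Acme'), (20, 'Globex');
    INSERT INTO supply   VALUES (1, 10, 500), (1, 20, 200), (2, 20, 50);
""")

# Two relations: the join condition goes in the WHERE clause.
two_way = conn.execute("""
    SELECT p.nameofproj, s.quantity
    FROM project p, supply s
    WHERE p.pid = s.pid
    ORDER BY p.pid, s.sid
""").fetchall()

# Three relations: one join condition per pair of linked relations.
three_way = conn.execute("""
    SELECT p.nameofproj, s.quantity, r.nameofsup
    FROM project p, supply s, supplier r
    WHERE p.pid = s.pid AND s.sid = r.sid
    ORDER BY p.pid, s.sid
""").fetchall()

# Same result with explicit JOIN ... ON.
explicit = conn.execute("""
    SELECT p.nameofproj, s.quantity, r.nameofsup
    FROM project p
    JOIN supply s   ON p.pid = s.pid
    JOIN supplier r ON s.sid = r.sid
    ORDER BY p.pid, s.sid
""").fetchall()
print(two_way, three_way, explicit == three_way)
```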
INSERT
    Insert Into [relation] Values (...)
For instance:
    Insert into employee Values (111, 'Jane', 45);
    Insert into employee(ssn, name, age) Values (111, 'Jane', 45);
DELETE
    Delete From [relation]
    Where [condition]
For instance:
    Delete from employee Where name = 'Jane';
    Delete from employee Where age between 0 and 18;
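A small sketch of the INSERT and DELETE statements, assuming a three-column employee table (ssn, name, age) as implied by the examples. The rows for Tom and Ana are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn INT PRIMARY KEY, name TEXT, age INT)")

# Positional form: values must follow the column order of the table.
conn.execute("INSERT INTO employee VALUES (111, 'Jane', 45)")
# Named-column form, with parameters supplied from Python.
conn.execute("INSERT INTO employee (ssn, name, age) VALUES (?, ?, ?)", (112, "Tom", 17))
conn.execute("INSERT INTO employee (ssn, name, age) VALUES (?, ?, ?)", (113, "Ana", 30))

# rowcount reports how many tuples each DELETE removed.
deleted_by_name = conn.execute("DELETE FROM employee WHERE name = 'Jane'").rowcount
deleted_minors = conn.execute("DELETE FROM employee WHERE age BETWEEN 0 AND 18").rowcount

remaining = conn.execute("SELECT ssn, name, age FROM employee").fetchall()
print(deleted_by_name, deleted_minors, remaining)
```

Naming the columns in INSERT keeps the statement valid if the table later gains columns or changes their order.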
Hands-on
Schema:
    sailors(sid, sname, age)
    boats(bid, bname, color)
    bookings(sid, bid, day)
Please find through SQL:
1. The name and age of the sailors
2. The name of the sailor who booked boat 103
3. The name of the sailor who booked the green boat
4. The colors of the boats booked by sailor 'John'
5. The names of the boats booked between '01/01/13' and '31/01/13'
6. The names of the sailors with age between 30 and 40
Mapping E/R to Relational
General Algorithm
1. Every entity becomes a relation with a key
2. A 1:N relationship adds the key of the 1 side as a foreign key on the N side
3. An N:M relationship creates a new relation with the keys from both sides
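The three steps can be sketched as SQLite DDL for part of the deck's example (employee, project, supplier and the supply relationship). The column name `manager_ssn` and the composite key on supply are assumptions made for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Step 1: every entity becomes a relation with a key.
    CREATE TABLE employee (ssn INT NOT NULL PRIMARY KEY, nameofemp TEXT);
    CREATE TABLE supplier (sid INT NOT NULL PRIMARY KEY, nameofsup TEXT);

    -- Step 2: the 1:N relationship "manage" puts the employee key on the N side.
    CREATE TABLE project (
        pid         INT NOT NULL PRIMARY KEY,
        nameofproj  TEXT,
        "local"     TEXT,  -- quoted because LOCAL is reserved in standard SQL
        manager_ssn INT REFERENCES employee(ssn)
    );

    -- Step 3: the N:M relationship "supply" becomes its own relation,
    -- holding the keys from both sides plus the relationship's attributes.
    CREATE TABLE supply (
        pid      INT NOT NULL REFERENCES project(pid),
        sid      INT NOT NULL REFERENCES supplier(sid),
        quantity INT,
        PRIMARY KEY (pid, sid)
    );
""")

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
# foreign_key_list rows: (id, seq, referenced table, from column, to column, ...)
project_fks = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(project)")]
# table_info rows end with pk, the 1-based position in the primary key (0 if none).
supply_key = [r[1] for r in conn.execute("PRAGMA table_info(supply)") if r[5]]
print(tables, project_fks, supply_key)
```

A 1:1 relationship such as chair is not covered by the three steps. One option is to treat it like step 2 and place the foreign key on either side.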
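Example
[E/R diagram]
- Entities: employee (SSN, nameofemp), project (PID, nameofproj, local), department (DID, nameofdep), supplier (SID, nameofsup)
- Relationships: employee 1 -- manage -- N project; employee 1 -- chair (term) -- 1 department; project N -- supply -- M supplier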
Example

Table: project
Attr.  PID      nameofproject  local
Type   integer  string         string

Table: supplier
Attr.  SID      nameofsup
Type   integer  string

Table: supply
Attr.  quantity  product  PID      SID
Type   integer   string   integer  integer

Links: project 1 -- N supply M -- 1 supplier

Hands-on
Finish mapping the E/R from the above example to the relational model...