Database Internals (Overview) Eduardo Cunha de Almeida eduardo@inf.ufpr.br
Outline of the course
- Introduction
- Database Systems (E. Almeida)
- Distributed Hash Tables and P2P (C. Cassagne)
- NewSQL (D. Kim and J. Meira)
- NoSQL (D. Kim)
- MapReduce (E. Almeida and E. Lucas)
- Data Streaming (C. Cassagne)
- Machine Learning (C. Henard)
- Assignment (E. Almeida, J. Meira and E. Lucas)
Readings
- Joseph Hellerstein, Michael Stonebraker and James Hamilton. Architecture of a Database System. Material at: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
- Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom. Database Systems: The Complete Book. Material at: http://infolab.stanford.edu/~ullman/dscb.html
History
1970s
- INGRES Project at Berkeley -> INGRES corp, Sybase, MS SQL Server, PostgreSQL
- System-R Project at IBM -> DB2, NonStop SQL, HP Allbase
1980s
- DB2, Oracle, Informix, Teradata, Sybase
- SQL as the standard query language
1990s
- Postgres, MySQL, Illustra (from Postgres), BerkeleyDB (embedded K/V)
2000s
- MonetDB (as open-source), Vertica, Greenplum, EnterpriseDB, Infobright
2010s
- VoltDB (from H-Store), Facebook Presto, Google F1, Google Megastore
- NewSQL and NoSQL architectures
Database Genealogy [Felix Naumann, 13]
General DBMS Architecture
[Figure: block diagram of a DBMS sitting between its clients and the databases]
- Clients: SQL interface, apps
- Comm. manager
- Proc. manager: access control, scheduling
- Query processor: parser, optimizer, executor
- Trans. manager: access methods, buffer, lock, log
- Shared components: catalog, auth. manager, adm & monitoring
- Storage: databases
Database Layout
Database Layout
- Logical: DB -> Tablespace -> Segments -> Extents -> DBMS page
- Physical: File -> OS page
- Example tablespaces: table customers, table products, index on prodid
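Fixed size pages
- The page is divided into N slots of fixed size.
- Each slot holds one record (rid, attr1, attr2, ..., attr n) or is empty.
- A bitmap at the end of the page (e.g. 1 ... 0 1) marks which slots are in use, next to N, the number of slots.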
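The fixed-size layout above can be sketched in a few lines. This is a minimal illustration, not how any particular DBMS implements it: the class name `FixedSizePage`, its methods and the sample records are all assumptions made for the example.

```python
class FixedSizePage:
    """A page with N fixed-size slots and an occupancy bitmap (illustrative only)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots   # slot i holds one record or None
        self.bitmap = [0] * num_slots     # 1 = slot in use, 0 = slot empty

    def insert(self, record):
        """Store the record in the first free slot and return its slot number."""
        for i, used in enumerate(self.bitmap):
            if not used:
                self.slots[i] = record
                self.bitmap[i] = 1
                return i
        raise ValueError("page is full")

    def delete(self, slot_no):
        """Free a slot by clearing its bit; no other record has to move."""
        self.slots[slot_no] = None
        self.bitmap[slot_no] = 0


page = FixedSizePage(num_slots=4)
first = page.insert((1, "ANA", 30))      # made-up records
second = page.insert((2, "MARIA", 18))
page.delete(first)
third = page.insert((3, "JOAO", 25))     # reuses the slot freed by the delete
```

Because every slot has the same size, a delete only flips one bit and the next insert can reuse the hole, which is the main attraction of this layout.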
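Variable size pages: N-ary Storage Model (NSM)
- Variable record size; each record is addressed by rid=(i,...)
- Records (data) are stored from the start of the page, followed by the free space
- A slot directory at the end of the page stores the offset of each record (e.g. 16, 24), the number of entries (n) and a pointer (*) to the start of the free space
- For write intensive applications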
NSM Example
INSERT INTO CUSTOMER VALUES (2, 'MARIA', 18, 'CURITIBA', 'F');
SELECT * FROM CUSTOMER WHERE ID=2;
SELECT AVG(AGE) FROM CUSTOMER;
- Few write operations
- Read the entire tuple at once
- Aggregations may read the entire DB!!
Columnar page: Decomposition Storage Model (DSM) [Ailamaki, VLDB02]
- Page: Attr1 holds (rid1,attr1), (rid2,attr1), ..., (rid_n,attr1)
- Page: Attr2 holds (rid1,attr2), (rid2,attr2), ..., (rid_n,attr2)
- Page: Attr3 holds (rid1,attr3), (rid2,attr3), ..., (rid_n,attr3)
- Improves aggregation and compression
- For read intensive applications
DSM Example
INSERT INTO CUSTOMER VALUES (2, 'MARIA', 18, 'CURITIBA', 'F');
SELECT * FROM CUSTOMER WHERE ID=2;
SELECT AVG(AGE) FROM CUSTOMER;
- More write operations than NSM
- Read one attribute at a time
- Aggregations are fast: they do not read the entire DB!!
- Aggressive compression
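To make the NSM/DSM trade-off concrete, the sketch below counts how many pages each layout would read for a query. Everything numeric is an assumption chosen for illustration (8 KB pages, the attribute widths, one million tuples), and the DSM estimate ignores the rid stored next to each value as well as any compression.

```python
import math

PAGE_SIZE = 8192  # bytes per page (assumed)
# Assumed attribute widths in bytes for CUSTOMER(ID, NAME, AGE, CITY, SEX)
WIDTHS = {"ID": 4, "NAME": 40, "AGE": 4, "CITY": 40, "SEX": 1}


def nsm_pages(num_tuples):
    """Row layout: pages hold whole tuples, so a full pass reads every page."""
    tuples_per_page = PAGE_SIZE // sum(WIDTHS.values())
    return math.ceil(num_tuples / tuples_per_page)


def dsm_pages(num_tuples, needed_attrs):
    """Column layout: only the pages of the requested attributes are read."""
    total = 0
    for attr in needed_attrs:
        values_per_page = PAGE_SIZE // WIDTHS[attr]
        total += math.ceil(num_tuples / values_per_page)
    return total


n = 1_000_000
avg_age_nsm = nsm_pages(n)                 # SELECT AVG(AGE): all tuple pages
avg_age_dsm = dsm_pages(n, ["AGE"])        # SELECT AVG(AGE): only the AGE pages
select_star_dsm = dsm_pages(n, list(WIDTHS))  # SELECT *: every column is touched
print(avg_age_nsm, avg_age_dsm, select_star_dsm)
```

Under these assumptions the aggregation reads far fewer pages with DSM, while a query that needs every attribute gains nothing from the columnar layout.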
Quiz
Which page layout (NSM or DSM) is better suited to each of the following queries? Why?
Q1 SELECT * FROM CUSTOMERS;
Q2 SELECT A.ID, A.BALANCE, C.NAME FROM CUSTOMERS C, ACCOUNT A WHERE A.ID=C.ID;
Q3 SELECT SUM(BALANCE) FROM ACCOUNT;
Q4 SELECT C.CITY, SUM(A.BALANCE) FROM CUSTOMERS C, ACCOUNT A WHERE A.ID=C.ID GROUP BY C.CITY;
Database Index
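Access method: B-tree Index
- Example keys: Adam, Betty, Smith
- Index entry: P0 k1 P1 k2 P2 ... km Pm
- Non-leaf pages (only for directing the search): root 17 (<17 to the left, >17 to the right); second level 5 13 and 27
- Leaf pages (data entries): 2* 3* 5* 11* 14* ...
- File: the data records referenced by the leaf entries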
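A lookup in such a tree follows one pointer per level, from the root to a single leaf. The sketch below is a minimal illustration: the `Node` class and `search` function are hypothetical names, the left subtree uses the keys from the slide (root 17, inner page 5 13, leaves 2 3 5 11 14), and the leaves under 27 are made up because the slide does not list them.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted search keys (data entries in a leaf)
        self.children = children    # None marks a leaf page


def search(node, key):
    """Return (found, pages_read) after walking from the root down to one leaf."""
    pages_read = 1
    while node.children is not None:
        # child i covers the keys k with keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
        pages_read += 1
    return key in node.keys, pages_read


left = Node([5, 13], [Node([2, 3]), Node([5, 11]), Node([14])])
right = Node([27], [Node([22, 24]), Node([27, 29])])   # assumed leaf contents
root = Node([17], [left, right])

hit = search(root, 11)     # key present in a leaf
miss = search(root, 12)    # key absent, same number of pages read
```

Both calls read one page per level, which is why the cost of an index lookup is expressed in terms of the tree height.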
Access method: Hash Index
- Directory entries: 00, 01, 10, 11
- Buckets: 00 -> 4* 12* 32*; 01 -> 1* 21*; 10 -> 10*
- If insert 5*, then 5=101, bucket '01' (read the bits backwards, i.e. take the last two bits)
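The directory lookup above amounts to taking the low-order bits of the key. The following sketch assumes a two-bit directory as on the slide; the names `directory_slot` and `insert` are hypothetical, bucket 11 is left empty because the slide shows no entries for it, and bucket overflow and directory doubling are deliberately not handled.

```python
GLOBAL_DEPTH = 2  # the directory has 2**2 = 4 entries: 00, 01, 10, 11


def directory_slot(key, depth=GLOBAL_DEPTH):
    """Return the low-order `depth` bits of the key as a bit string."""
    return format(key & ((1 << depth) - 1), f"0{depth}b")


buckets = {"00": [4, 12, 32], "01": [1, 21], "10": [10], "11": []}


def insert(key):
    """Append the key to the bucket its low-order bits point to (no splitting)."""
    buckets[directory_slot(key)].append(key)


insert(5)   # 5 = 101 in binary, last two bits 01
```

Every key already stored on the slide lands in its bucket under the same rule, for example 12 = 1100 ends in 00 and 21 = 10101 ends in 01.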
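Clustered vs. Non-clustered index
- Clustered: the data records are physically sorted in the order of the index entries, so there can be only one clustered index per table
- Non-clustered: the index entries point to data records that are stored in a different order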
Quiz
Which type of index would you create for each of the following queries? Why?
Q1 SELECT NAME, AGE FROM CUSTOMER WHERE AGE BETWEEN 18 AND 25;
Q2 SELECT NAME, AGE FROM CUSTOMER WHERE CUST_ID= 1234;
Q3 SELECT NAME, CITY, AGE, SSN FROM CUSTOMER WHERE AGE BETWEEN 18 AND 25 AND CITY ;
Overview of Query Processing
Query processing
Query: SELECT BALANCE FROM ACCOUNT WHERE BALANCE > 2500;
Query expressions:
- π BALANCE (σ BALANCE > 2500 (ACCOUNT))
- σ BALANCE > 2500 (π BALANCE (ACCOUNT))
May have n plans!!! How does the DBMS rewrite a query?
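The two expressions for this query can be checked against each other on a toy relation. This is only a sketch: the rows of `ACCOUNT` are made up, `select` and `project` are hypothetical helpers, and projection keeps duplicates as SQL does.

```python
ACCOUNT = [
    {"ACC_ID": 1, "BALANCE": 1000},
    {"ACC_ID": 2, "BALANCE": 3000},
    {"ACC_ID": 3, "BALANCE": 5000},
]  # made-up rows


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    """Projection: keep only the listed attributes (duplicates are kept)."""
    return [{a: r[a] for a in attrs} for r in rows]


def over_2500(row):
    return row["BALANCE"] > 2500


plan_a = project(select(ACCOUNT, over_2500), ["BALANCE"])   # select, then project
plan_b = select(project(ACCOUNT, ["BALANCE"]), over_2500)   # project, then select
```

Both plans return the same rows; they differ only in the size of the intermediate result, which is what the optimizer compares when it picks a plan.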
Query Transformation Rules
Query rewrite
- Rule: Selection operations are commutative.
  σ θ1 (σ θ2 (T)) = σ θ2 (σ θ1 (T))
- Rule: Join operations are commutative if the order of the attributes is not taken into account.
  T1 ⋈ T2 = T2 ⋈ T1
- Rule: Natural-joins are associative.
  (T1 ⋈ T2) ⋈ T3 = T1 ⋈ (T2 ⋈ T3)
  For theta-joins, (T1 ⋈θ1 T2) ⋈θ2 T3 = T1 ⋈θ1 (T2 ⋈θ2 T3) only holds if the following join (θ2) involves attributes only from T2 and T3.
- Rule: Selection distributes over a join when all the attributes in the selection condition involve only the attributes of one of the expressions (say T1).
  σ θ (T1 ⋈ T2) = (σ θ (T1)) ⋈ T2
- Rule: Conjunctive selection operations can be split into a sequence of individual selections.
  σ θ1 AND θ2 (T) = σ θ1 (σ θ2 (T))
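The effect of the splitting and distribution rules can be observed on a small cross product. The sketch below is illustrative only: the rows of `ACCOUNT` and `BRANCH` are made up, and `cross` and `select` are hypothetical helpers standing in for the relational operators.

```python
from itertools import product

ACCOUNT = [{"ACC_ID": i, "BALANCE": 500 * i} for i in range(1, 11)]  # 10 made-up rows
BRANCH = [{"B_ID": b} for b in (1000, 2000, 3000)]                   # 3 made-up rows


def cross(left, right):
    """Cartesian product of two relations."""
    return [{**l, **r} for l, r in product(left, right)]


def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]


def rich(row):
    return row["BALANCE"] > 2500


def at_1000(row):
    return row["B_ID"] == 1000


# One conjunctive selection over the product
original = select(cross(ACCOUNT, BRANCH), lambda r: rich(r) and at_1000(r))
# Split into a sequence of individual selections
cascade = select(select(cross(ACCOUNT, BRANCH), at_1000), rich)
# Each selection distributed to the only input it refers to
pushed = cross(select(ACCOUNT, rich), select(BRANCH, at_1000))

sizes = (len(cross(ACCOUNT, BRANCH)), len(pushed))   # rows built before / after pushing
```

All three plans return the same rows, but the last one builds a much smaller intermediate product, which is why optimizers apply selections as early as possible.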
Further rules can be found in the readings.
Quiz
Please rewrite the following expressions using the presented rules and the following relations: ACCOUNT(BALANCE, ACC_ID, C_ID, B_ID), CUSTOMER(C_ID), BRANCH(B_ID, B_NAME)
Q1 σ BALANCE > 2500 AND C_ID > 1000 (ACCOUNT X CUSTOMER)
Q2 σ BALANCE > 2500 (σ B_ID = 1000 (ACCOUNT X BRANCH))
Q3 π BALANCE, C_ID (σ BALANCE > 2500 (ACCOUNT X CUSTOMER))
Q4 π BALANCE, ACC_ID (CUSTOMER X ACCOUNT)
Q5 π BALANCE, ACC_ID (CUSTOMER X (ACCOUNT X BRANCH))
Quiz (answers)
Please rewrite the following expressions using the presented rules and the following relations: ACCOUNT(BALANCE, ACC_ID, C_ID, B_ID), CUSTOMER(C_ID), BRANCH(B_ID, B_NAME)
Q1 σ BALANCE > 2500 (σ C_ID > 1000 (ACCOUNT X CUSTOMER))
Q2 σ B_ID = 1000 (σ BALANCE > 2500 (ACCOUNT X BRANCH))
Q3 σ BALANCE > 2500 (π BALANCE, C_ID (ACCOUNT X CUSTOMER))
Q4 π BALANCE, ACC_ID (ACCOUNT X CUSTOMER)
Q5 π BALANCE, ACC_ID (BRANCH X (CUSTOMER X ACCOUNT))
Database Catalog
The catalog is a meta-database that stores information about:
- DBMS and DB
  - Data structures
  - Buffer and page sizes
  - Indices
  - Views
Database Catalog
The catalog is a meta-database that stores information about:
- Statistics
  - Cardinality of tables and indices
  - Domain values
  - Page sizes for tables and indices
Database Catalog
[Figure: ALL_TABLES view from the Oracle catalog]
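The idea of a catalog that is itself queried like a database can be tried without an Oracle installation. The sketch below swaps in SQLite, which ships with Python, purely as an assumption for illustration: `sqlite_master` is SQLite's catalog table and `PRAGMA table_info` lists the columns of a table. The table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.execute("CREATE INDEX idx_customer_age ON customer(age)")

# sqlite_master holds one row per table, index, view or trigger
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

for entry in catalog:
    print(entry)
print(columns)
```

The same metadata that the optimizer relies on (which tables and indices exist, and on which columns) is therefore reachable with ordinary queries.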
Query optimization
Operators: I/O cost for Selection (σ)
- Scan: O(M), where M = number of pages
- Scan on sorted data: O(log M), where M = number of pages
- Index scan (Clustered index): O(h + 1), where h = height of the index tree
- Index scan (Non-Clustered index): O(h + n), where n = number of fetched tuples. n = M if the tuples are spread across all pages (worst case).
Quiz
Compute the I/O cost for a query selecting 10% of a table 'R' based on the following information from the catalog:
- Relation 'R': 10,000 tuples
- Tuples/page ratio: 100 tuples
Access paths: Scan, Scan on sorted, Clustered index scan, Non-clustered index scan
Quiz (answers)
Compute the I/O cost for a query selecting 10% of a table 'R' based on the following information from the catalog:
- Relation 'R': 10,000 tuples
- Tuples/page ratio: 100 tuples
Costs:
- Scan: 100 I/O
- Scan on sorted: 6.6 I/O
- Clustered index scan: 3 I/O
- Non-clustered index scan: 103 I/O
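The cost formulas for selection can be turned into a small calculator. This is a sketch under stated assumptions: the function name `selection_costs` is hypothetical, and the index height is a free parameter because the catalog excerpt in the quiz does not give it, so the value passed below is an assumption and only the two scan estimates depend solely on the quiz data.

```python
import math


def selection_costs(num_tuples, tuples_per_page, index_height, fetched):
    """I/O estimates for the four access paths, following the slide formulas."""
    m = math.ceil(num_tuples / tuples_per_page)    # M = number of pages
    return {
        "scan": m,                                  # read every page
        "scan_sorted": math.log2(m),                # binary search over sorted pages
        "clustered_index": index_height + 1,        # h + 1
        # h + n, with n capped at M as in the slide's worst case
        "non_clustered_index": index_height + min(fetched, m),
    }


costs = selection_costs(
    num_tuples=10_000,
    tuples_per_page=100,
    index_height=3,          # assumed, not stated in the quiz
    fetched=1_000,           # 10% of 10,000 tuples
)
for path, cost in costs.items():
    print(f"{path}: {cost:.1f} I/O")
```

Changing `index_height` only moves the two index estimates, which makes it easy to see how sensitive each access path is to the tree height.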
Transactions
Transactions
Definition: A transaction is an atomic unit of work that is either completed in its entirety or not done at all. [Elmasri and Navathe, 2005]
Example:
T1                 Database
r(balance)         balance = 200
balance += 100
w(balance)         balance = 300
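Transaction State Transition [Elmasri and Navathe, 2005]
- Begin -> Active (read, write)
- Active -> End -> Partially committed
- Partially committed -> Commit -> Committed
- Active or Partially committed -> Abort -> Failed
- Committed or Failed -> Terminated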
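The state diagram can be written down as a transition table. The sketch below is illustrative: the event names (in particular `terminate`) and the `run` helper are assumptions made for the example, and the states follow the diagram from Elmasri and Navathe.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("active", "read"): "active",
    ("active", "write"): "active",
    ("active", "end"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}


def run(events, state="active"):
    """Apply the events in order, starting right after 'begin'."""
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[(state, event)]
    return state


committed_path = run(["read", "write", "end", "commit", "terminate"])
aborted_path = run(["read", "write", "abort"])
```

Any event that has no arrow in the diagram, such as a write after commit, is rejected by the table lookup.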
ACID properties
- Atomicity: a transaction is either completed in its entirety or not done at all
- Consistency: a transaction is consistency preserving if its complete execution takes the database from one consistent state to another
- Isolation: the execution of a transaction should not be interfered with by other transactions executing concurrently
- Durability: the changes applied by a committed transaction must persist in the database
Schedules
A schedule orders the execution of concurrent transactions in an interleaving fashion.
T1        T2
r(x)
          r(x)
w(x)
          w(x)
Concurrency Conflicts
Concurrency problems: dirty read and lost update
Dirty read:
T1                 T2
r(balance)
balance += 100
w(balance)
                   r(balance)
abort
                   balance -= 100
                   w(balance)
Lost update:
T1                 T2
r(balance)
                   r(balance)
balance += 100
                   balance -= 100
w(balance)
                   w(balance)
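The lost update can be replayed step by step with plain variables. This is a minimal sketch: the starting balance of 200 and the function names are assumptions for illustration, and the local variables `t1` and `t2` stand for each transaction's private copy of the value it read.

```python
def lost_update(initial=200):
    """Interleaved execution: both transactions read before either writes."""
    db = {"balance": initial}
    t1 = db["balance"]       # T1: r(balance)
    t2 = db["balance"]       # T2: r(balance), sees the same value as T1
    t1 += 100                # T1: balance += 100
    t2 -= 100                # T2: balance -= 100
    db["balance"] = t1       # T1: w(balance)
    db["balance"] = t2       # T2: w(balance) overwrites T1's update
    return db["balance"]


def serial(initial=200):
    """Serial execution: T1 runs to completion, then T2."""
    db = {"balance": initial}
    db["balance"] += 100     # T1
    db["balance"] -= 100     # T2
    return db["balance"]


interleaved_result = lost_update()
serial_result = serial()
```

The interleaved run ends 100 below the serial result because T1's deposit is silently overwritten, which is exactly the anomaly the schedule illustrates.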
Algorithm: Testing Conflict Serializability
- Create a node for each transaction T in schedule S;
- Create an edge Ti -> Tj for each r(x) in Tj after w(x) in Ti;
- Create an edge Ti -> Tj for each w(x) in Tj after r(x) in Ti;
- Create an edge Ti -> Tj for each w(x) in Tj after w(x) in Ti;
- Schedule S is conflict serializable if and only if the precedence graph has no cycles.
For example (not serializable):
T1        T2
r(x)
          r(x)
w(x)
          w(x)
Precedence graph: T1 -> T2 and T2 -> T1 (cycle)
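The test above translates directly into code: build the precedence graph from conflicting pairs, then look for a cycle. The sketch below is one possible implementation, not a reference one; the tuple encoding of a schedule and all function names are assumptions made for the example.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, operation, item), e.g. ("T1", "r", "x")."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they touch the same item and one writes
            conflicting = item_i == item_j and "w" in (op_i, op_j)
            if ti != tj and conflicting:
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph if n not in done)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


interleaved = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
print(is_conflict_serializable(interleaved), is_conflict_serializable(serial))
```

For the interleaved example the graph contains both T1 -> T2 and T2 -> T1, so the check fails, while the serial schedule yields a single edge and passes.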
Hands-on
Which of the following schedules is conflict serializable?
S1 - T1:r(x), T3:r(x), T1:w(x), T2:r(x), T3:w(x)
S2 - T1:r(x), T2:r(x), T1:w(x), T2:w(x)
S3 - T1:r(x), T2:r(y), T3:w(x), T2:r(x), T1:r(x)