Distributed Database Design SL02

Size: px
Start display at page:

Download "Distributed Database Design SL02"

Transcription

1 Design Problem Distributed Database Systems Fall 2016 Distributed Database Design SL02 Design problem Design strategies (top-down, bottom-up) Fragmentation (horizontal, vertical) Allocation and replication of fragments, optimality, heuristics Design problem of distributed systems: Making decisions about the placement of data and programs across the sites of a computer network as well as possibly designing the network itself. In DDBMS, the distribution of applications involves Distribution of the DDBMS software Distribution of applications that run on the database Distribution of applications will not be considered in the following; instead the distribution of data is studied. DDBS16, SL02 1/60 M. Böhlen Framework of Distribution Dimension for the analysis of distributed systems Level of sharing: no sharing, data sharing, data + program sharing Behavior of access patterns: static, dynamic Level of knowledge on access pattern behavior: no information, partial information, complete information DDBS16, SL02 2/60 M. Böhlen Design Strategies/1 Top-down approach Designing systems from scratch Homogeneous systems Bottom-up approach The databases already exist at a number of sites The databases should be connected to solve common tasks Distributed database design should be considered within this general framework. DDBS16, SL02 3/60 M. Böhlen DDBS16, SL02 4/60 M. Böhlen

2 Design Strategies/2 Top-down design strategy Design Strategies/3 Distribution design is the central part of the design in DDBMSs (the other tasks are similar to traditional databases). Objective: Design the LCSs by distributing the data over the sites. Two main aspects have to be designed carefully: Fragmentation: Relation may be divided into a number of sub-relations, which are distributed Allocation and replication: Each fragment is stored at site(s) with "optimal" distribution Distribution design issues Why fragment at all? How to fragment? How much to fragment? How to test correctness? How to allocate? DDBS16, SL02 5/60 M. Böhlen Design Strategies/4 Bottom-up design strategy DDBS16, SL02 7/60 M. Böhlen DDBS16, SL02 6/60 M. Böhlen Fragmentation/1 What is a reasonable unit of distribution? Relation or fragment of relation? Relations as unit of distribution: If the relation is not replicated, we get a high volume of remote data accesses. If the relation is replicated, we get unnecessary replications, which cause problems in executing updates and waste disk space Might be an OK solution, if queries need all the data in the relation and data stays only at the sites that use the data Fragments of relation as unit of distribution: Application views are usually subsets of relations Thus, locality of accesses of applications is defined on subsets of relations Permits a number of transactions to execute concurrently, since they will access different portions of a relation Parallel execution of a single query (intra-query concurrency) However, semantic data control (especially integrity enforcement) is more difficult Fragments of relations are (usually) appropriate unit of distribution. DDBS16, SL02 8/60 M. Böhlen

3 Fragmentation/2 Fragmentation aims to improve: Reliability Performance Balanced storage capacity and costs Communication costs Security The following information is used to decide fragmentation: Quantitative information: cardinality of relations, frequency of queries, site where query is run, selectivity of the queries, etc. Qualitative information: predicates in queries, types of access of data, read/write, etc. Fragmentation/3 Types of Fragmentation Horizontal: partitions a relation along its tuples Vertical: partitions a relation along its attributes Mixed/hybrid: a combination of horizontal and vertical fragmentation (a) Horizontal Fragmentation (b) Vertical Fragmentation (c) Mixed Fragmentation DDBS16, SL02 9/60 M. Böhlen Fragmentation/4 Example of database instance DDBS16, SL02 10/60 M. Böhlen Fragmentation/5 Example (contd.): Horizontal fragmentation of PROJ relation PROJ1: projects with budgets less than PROJ2: projects with budgets greater than or equal to DDBS16, SL02 11/60 M. Böhlen DDBS16, SL02 12/60 M. Böhlen

4 Fragmentation/6 Example (contd.): Vertical fragmentation of PROJ relation PROJ1: information about project budgets PROJ2: information about project names and locations Correctness Rules of Fragmentation Completeness Decomposition of relation R into fragments R1, R 2,..., R n is complete iff each data item in R can also be found in some R i. Reconstruction If relation R is decomposed into fragments R1, R 2,..., R n, then there should exist some relational operator that reconstructs R from its fragments, i.e., R = R 1... R n Union to combine horizontal fragments Join to combine vertical fragments Disjointness If relation R is decomposed into fragments R1, R 2,..., R n and data item d i appears in fragment R j, then d i should not appear in any other fragment R k, k j (exception: primary key attribute for vertical fragmentation) For horizontal fragmentation, data item is a tuple For vertical fragmentation, data item is an attribute DDBS16, SL02 13/60 M. Böhlen Idea of Horizontal Fragmentation DDBS16, SL02 14/60 M. Böhlen Information Requirements/1 Intuition behind horizontal fragmentation Every site should hold all information that is used to query the site The information at the site should be fragmented so the queries of the site run faster Horizontal fragmentation is defined as selection operation, σ p (R) Example: σ BUDGET <200K (PROJ) σ BUDGET 200K (PROJ) Database information: Links between relations (a link models a 1:N relationship between relations that are related to each other by an equality join) Cardinality of relations: card(r) DDBS16, SL02 15/60 M. Böhlen DDBS16, SL02 16/60 M. Böhlen

5 Information Requirements/ Information Requirements/ Application information: Simple predicates used in queries Relation R[A 1, A 2,..., A n ] A i θv i a simple predicate (θ {=, <,, >,, }, v i D i, D i is the domain of A i ) For relation R we define Pr = {p1, p 2,..., p m } Example: PNAME = Maintenance, BUDGET < Minterm predicates minterm predicates are conjunctions of simple predicates Relation R, Pr = {p 1, p 2,..., p m } M = {m 1, m 2,..., m r } such that M = {m i m i = pj Pr p j }, 1 j m, 1 i z where p j = p j or p j = p j Application information (cnt d): Minterm selectivities The number of tuples of the relation that would be accessed by a user query, which is specified according to a given minterm predicate m i. Access frequencies The frequency with which a user application qi accesses data. DDBS16, SL02 17/60 M. Böhlen Horizontal Fragmentation/1 DDBS16, SL02 18/60 M. Böhlen Horizontal Fragmentation/2 We write Pr to denote the complete and minimal set of simple predicates generated from Pr: The set of predicates is complete if and only if any two tuples in the same fragment are referenced with the same probability by any application The set of predicates is minimal if and only if each simple predicate is relevant for determining the fragmentation and for each pair of fragments there is at least one query that accesses the fragments differently A horizontal fragment R i of relation R consists of all the tuples of R that satisfy a predicate F j : R j = σ Fj (R), 1 j w where F j is a selection formula, which is (preferably) a minterm predicate. Computing the horizontal fragmentation: 1. Determine Pr (80/20 rule) 2. Compute Pr 3. Determine M 4. Minimize M (eliminating impossible minterms) DDBS16, SL02 19/60 M. Böhlen DDBS16, SL02 20/60 M. Böhlen

6 Horizontal Fragmentation/3 Horizontal Fragmentation/4 Example: Fragmentation of the PROJ relation Consider the following query: Find the name and budget of projects given their location. The query is issued at all three locations Fragmentation based on LOC, using the set of predicates {LOC = Montreal, LOC = NewYork, LOC = Paris } PROJ 1 = σ LOC= Montreal (PROJ) PNO PNAME BUDGET LOC P1 Instrument Montreal PROJ 2 = σ LOC= NewYork (PROJ) PNO PNAME BUDGET LOC P2 DB Develop New York P3 CAD/CAM New York If access is only according to the location, the above set of predicates is complete i.e., in each fragment PROJ i each tuple has the same probability of being accessed If there is a second query/application that accesses only those project tuples where the budget is less than $200K, the set of predicates is not complete. P2 in PROJ 2 has higher probability to be accessed PROJ 3 = σ LOC= Paris (PROJ) PNO PNAME BUDGET LOC P4 Maintenance Paris DDBS16, SL02 21/60 M. Böhlen Horizontal Fragmentation/5 DDBS16, SL02 22/60 M. Böhlen Horizontal Fragmentation/ Example (contd.): Add BUDGET < 200K and BUDGET 200K to the set of predicates to make it complete. {LOC = Montreal, LOC = NewYork, LOC = Paris, BUDGET 200K, BUDGET < 200K} is a complete set Minterms to fragment the relation are given as follows: (LOC = Montreal ) (BUDGET 200K) (LOC = Montreal ) (BUDGET > 200K) (LOC = NewYork ) (BUDGET 200K) (LOC = NewYork ) (BUDGET > 200K) (LOC = Paris ) (BUDGET 200K) (LOC = Paris ) (BUDGET > 200K) Example (contd.): Now, PROJ i, i = 1, 2, 3 will be split in two fragments PROJ 1 = σ LOC= Montreal (PROJ) PNO PNAME BUDGET LOC P1 Instrument Montreal PROJ 3 = σ LOC= NewYork (PROJ) PNO PNAME BUDGET LOC P2 DB Develop New York Note that the following fragments are empty: σ LOC= Paris BDGT <200K (PROJ) σ LOC= Montreal BDGT 200K (PROJ) PROJ 4 = σ LOC= NewYork (PROJ) PNO PNAME BUDGET LOC P3 CAD/CAM New York PROJ 6 = σ LOC= Paris (PROJ) PNO PNAME BUDGET LOC P4 Maintenance Paris DDBS16, SL02 23/60 M. Böhlen DDBS16, SL02 24/60 M. Böhlen

7 Derived Horizontal Fragmentation/ Derived Horizontal Fragmentation/2 Defined on a member (target) relation of a link according to a selection operation specified on its owner (source). Each link is an equijoin. Fragments of member relation can be defined with a semijoin on fragments of owner relation. Given a link L where owner(l) = S and member(l) = R, the derived horizontal fragments of R are defined as Ri = R >< Si, 1 i w where w is the maximum number of fragments that will be defined on R and Si = σ Fi (S) where Fi is the formula according to which the primary horizontal fragment Si is defined. DDBS16, SL02 25/60 M. Böhlen Derived Horizontal Fragmentation/3 Given link L1 where owner(l1) = PAY and member(l1) = EMP and PAY 1 = σ SAL (PAY ) PAY 2 = σsal>30000 (PAY ) we get EMP1 = EMP >< PAY 1 EMP2 = EMP >< PAY DDBS16, SL02 26/60 M. Böhlen Vertical Fragmentation/1 Objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the applications will run on only one fragment. Vertical fragmentation of a relation R produces fragments R 1, R 2,..., each of which contains a subset of R s attributes. Vertical fragmentation is defined using the projection operation of the relational algebra: π A1,A 2,...,A n (R) Example: PROJ 1 = π PNO,BUDGET (PROJ) PROJ 2 = π PNO,PNAME,LOC (PROJ) Vertical fragmentation has also been studied for (centralized) DBMS Smaller relations, and hence less page accesses e.g., MONET system DDBS16, SL02 27/60 M. Böhlen DDBS16, SL02 28/60 M. Böhlen

8 Vertical Fragmentation/2 Vertical Fragmentation/3 Vertical fragmentation is more complicated than horizontal fragmentation In horizontal partitioning: for n simple predicates, the number of possible minterms is 2 n ; some of them can be ruled out by existing implications/constraints. In vertical partitioning: for m non-primary key attributes, the number of possible fragments is equal to B(m) (= the mth Bell number), i.e., the number of partitions of a set with m members. For large numbers, B(m) m m (e.g., B(15) = 10 9 ) Two types of heuristics for vertical fragmentation exist: Grouping: assign each attribute to one fragment, and at each step, join some of the fragments until some criteria is satisfied. Bottom-up approach Splitting: starts with a relation and decides on beneficial partitionings based on the access behaviour of applications to the attributes. Top-down approach Results in non-overlapping fragments Optimal solution is probably closer to the full relation than to a set of small relations with only one attribute DDBS16, SL02 29/60 M. Böhlen Vertical Fragmentation/4 DDBS16, SL02 30/60 M. Böhlen Vertical Fragmentation/5 Application information: The major information required as input for vertical fragmentation is related to applications (queries) Since vertical fragmentation places in one fragment those attributes usually accessed together, there is a need for some measure that would define more precisely the notion of togetherness, i.e., how closely related the attributes are. This information is obtained from queries and collected in the Attribute Usage Matrix and Attribute Affinity Matrix. Given are the user queries/applications Q = (q 1,..., q q ) that will run on relation R(A 1,..., A n ) Attribute Usage Matrix: Denotes which query uses which attribute: { 1 iff qi uses A use(q i, A j ) = j 0 otherwise The use(qi, ) vectors for each application are easy to define if the designer knows the applications that willl run on the DB (consider also the rule) DDBS16, SL02 31/60 M. Böhlen DDBS16, SL02 32/60 M. Böhlen

9 Vertical Fragmentation/6 Example: Consider relation PROJ(PNO, PNAME, BUDGET, LOC) and queries: q 1 = SELECT BUDGET FROM PROJ WHERE PNO=Value q 2 = SELECT PNAME,BUDGET FROM PROJ q 3 = SELECT PNAME FROM PROJ WHERE LOC=Value q 4 = SELECT SUM(BUDGET) FROM PROJ WHERE LOC =Value Abbreviations: A 1 = PNO, A 2 = PNAME, A 3 = BUDGET, A 4 = LOC Attribute Usage Matrix A 1 A 2 A 3 A 4 q q q q Vertical Fragmentation/7 Attribute Affinity Matrix: Denotes the frequency of two attributes A i and A j with respect to a set of queries Q = (q 1,..., q n ): aff (A i, A j ) = ( ref l (q k ) acc l (q k )) sites l k: use(q k,a i )=1, use(q k,a j )=1 where ref l (q k ) is the cost (= number of accesses to (A i, A j )) for each execution of query q K at site l acc l (q k ) is the frequency of query q k at site l DDBS16, SL02 33/60 M. Böhlen Vertical Fragmentation/ DDBS16, SL02 34/60 M. Böhlen Vertical Fragmentation/9 Example (contd.): Let the cost of each query be ref l (q k ) = 1, and the frequency acc l (q k ) of the queries be as follows: Site1 Site2 Site3 acc 1 (q 1 ) = 15 acc 2 (q 1 ) = 20 acc 3 (q 1 ) = 10 acc 1 (q 2 ) = 5 acc 2 (q 2 ) = 0 acc 3 (q 2 ) = 0 acc 1 (q 3 ) = 25 acc 2 (q 3 ) = 25 acc 3 (q 3 ) = 25 acc 1 (q 4 ) = 3 acc 2 (q 4 ) = 0 acc 3 (q 4 ) = 0 Attribute affinity matrix aff (A i, A j ) = A 1 A 2 A 3 A 4 A A A A e.g., aff (A1, A 3 ) = 1 k=1 3 l=1 acc l(q k ) = acc 1 (q 1 ) + acc 2 (q 1 ) + acc 3 (q 1 ) = 45 (q 1 is the only query to access both A 1 and A 3 ) DDBS16, SL02 35/60 M. Böhlen Idea: Take the attribute affinity matrix (AA) and reorganize the attribute orders to form clusters where the attributes in each cluster demonstrate high affinity to one another. Bond energy algorithm (BEA) has been suggested for that purpose for several reasons: It is designed specifically to determine groups of similar items as opposed to a linear ordering of the items. The final groupings are insensitive to the order in which items are presented. The computation time is reasonable (O(n 2 ), where n is the number of attributes) DDBS16, SL02 36/60 M. Böhlen

10 Vertical Fragmentation/10 BEA: Input: AA matrix Output: Clustered AA matrix (CA) Permutation is done in such a way to maximize the following global affinity mesaure (affinity of A i and A j with their neighbors): AM = n i=1 j=1 n aff(a i, A j )[aff(a i, A j 1 ) + aff(a i, A j+1 ) + aff(a i 1, A j ) + aff(a i+1, A j )] Vertical Fragmentation/11 Example (contd.): Cluster Affinity Matrix CA after running BEA A 1 A 3 A 2 A 4 A A A A Elements with similar values are grouped together, and two clusters can be identified An additional partitioning algorithm is needed to identify the clusters in CA Usually more clusters and more than one candidate partitioning, thus additional steps are needed to select the best clustering. The resulting fragmentation after partitioning (PNO is added in PROJ 2 explicilty as key): PROJ 1 = {PNO, BUDGET } PROJ 2 = {PNO, PNAME, LOC} DDBS16, SL02 37/60 M. Böhlen BEA Algorithm/1 Input: The AA matrix Output: Clustered affinity matrix CA which is a permutation of AA Initialization: Place and fix one of the columns of AA in CA (we choose column 1 in our example) Iteration: Place the remaining n-i columns in the remaining i+1 positions in the CA matrix. For each column choose the placement that makes the largest contribution to the global affinity measure. Row order: Order the rows according to the column ordering. DDBS16, SL02 38/60 M. Böhlen BEA Algorithm/2 In order to determine the best placement we define the contribution of a placement. Contribution of a placement: cont(a i, A k, A j ) = 2 bond(a i, A k )+ 2 bond(a k, A j ) 2 bond(a i, A j ) Bond between a pair of attributes is defined as: bond(a x, A y ) = n aff (A z, A x ) aff (A z, A y ) z=1 DDBS16, SL02 39/60 M. Böhlen DDBS16, SL02 40/60 M. Böhlen

11 BEA Algorithm/3 Consider the following AA matrix and the corresponding CA matrix where A 1 and A 2 have been placed. AA = A 1 A 2 A 3 A 4 A A A A CA = A 1 A 2 A A A A bond(a 3, A 1 ) = = 4410 bond(a 3, A 2 ) = = 890 The next step is to place A 3. BEA Algorithm/4 Ordering (0-3-1) : cont(a0, A3, A1) = 2 bond(a0, A3) + 2 bond(a3, A1) 2 bond(a0, A1) = = 8820 Ordering (1-3-2) : cont(a1, A3, A2) = 2 bond(a1, A3) + 2 bond(a3, A2) 2 bond(a1, A2) = = Ordering (2-3-4) : cont(a2, A3, A4) = 2 bond(a2, A3) + 2 bond(a3, A4) 2 bond(a2, A4) = 1780 DDBS16, SL02 41/60 M. Böhlen BEA Algorithm/5 After adding A 3 the CA matrix has the form CA = A 1 A 3 A After adding A 4 and ordering rows the CA matrix has the form CA = A 1 A 3 A 2 A 4 A A A A DDBS16, SL02 43/60 M. Böhlen 02.9 DDBS16, SL02 42/60 M. Böhlen BEA Algorithm/6 The final step is to divide a set of clustered attributes {A 1,..., A n } into two sets {A 1,..., A i } and {A i+1,..., A n } such that the costs of queries that access both sets are minimized. The appropriate split point on the diagonal must be determined: A 1 A 3 A 2 A 4 A A A A This is an optimization problem: the optimal point on the diagonal must be determined. Note: Generalizations are needed to deal with partitions located in the middle of the matrix multiple split points DDBS16, SL02 44/60 M. Böhlen

12 BEA Algorithm/7 Correctness of Vertical Fragmentation AQ(q i ) = {A j use(q i, Aj) = 1} TQ = {q i AQ(q i ) TA} BQ = {q i AQ(q i ) BA} OQ = Q {TQ BQ} CQ = q i Q S j ref j (q i ) acc j (q i ) CTQ = q i TQ S j ref j (q i ) acc j (q i ) CBQ = q i BQ S j ref j (q i ) acc j (q i ) COQ = q i OQ S j ref j (q i ) acc j (q i ) CTQ CBQ COQ 2 attributes accessed by q i queries that access TA only queries that access BA only queries that access TA and BA cost of all queries cost of TQ queries cost of BQ queries cost of other queries maximize Relation R is decomposed into fragments R 1, R 2,..., R n e.g., PROJ = {PNO, BUDGET, PNAME, LOC} into PROJ 1 = {PNO, BUDGET } and PROJ 2 = {PNO, PNAME, LOC} Completeness Guaranteed by the partitioning algortihm, which assigns each attribute in A to one partition Reconstruction Join to reconstruct vertical fragments R = R1 R n = PROJ 1 PROJ 2 Disjointness Attributes have to be disjoint in VF. Two cases are distinguished: If tuple IDs are used, the fragments are really disjoint Otherwise, key attributes are replicated automatically by the system e.g., PNO in the above example DDBS16, SL02 45/60 M. Böhlen Mixed Fragmentation DDBS16, SL02 46/60 M. Böhlen Replication and Allocation In most cases simple horizontal or vertical fragmentation of a DB schema will not be sufficient to satisfy the requirements of the applications. Mixed fragmentation (hybrid fragmentation): Consists of a horizontal fragment followed by a vertical fragmentation, or a vertical fragmentation followed by a horizontal fragmentation Fragmentation is defined using the selection and projection operations of relational algebra: σ p (π A1,...,A n (R)) π A1,...,A n (σ p (R)) Replication: Which fragements shall be stored as multiple copies? Complete Replication Complete copy of the database is maintained in each site Selective Replication Selected fragments are replicated in some sites Allocation: On which sites to store the various fragments? Centralized Consists of a single DB and DBMS stored at one site with users distributed across the network Partitioned Database is partitioned into disjoint fragments, each fragment assigned to one site DDBS16, SL02 47/60 M. Böhlen DDBS16, SL02 48/60 M. Böhlen

13 Replication/1 Replication/2 Comparison of replication alternatives Replicated DB fully replicated: each fragment at each site partially replicated: each fragment at some of the sites Non-replicated DB (= partitioned DB) partitioned: each fragment resides at only one site Rule of thumb: read only queries If 1, then replication is advantageous, otherwise update queries replication may cause problems DDBS16, SL02 49/60 M. Böhlen Fragment Allocation/1 DDBS16, SL02 50/60 M. Böhlen Fragment Allocation/2 Fragment allocation problem Given are: fragments F = {F 1, F 2,..., F n } network sites S = {S 1, S 2,..., S m } and applications Q = {q 1, q 2,..., q l } Find: the optimal distribution of F to S Optimality Minimal cost Communication + storage + processing (read and update) Cost in terms of time (usually) Performance Response time and/or throughput Constraints Per site constraints (storage and processing) Required information Database Information selectivity of fragments size of a fragment Application Information RR ij : number of read accesses of a query q i to a fragment F j UR ij : number of update accesses of query q i to a fragment F j u ij : a matrix indicating which queries updates which fragments, r ij : a similar matrix for retrievals originating site of each query Site Information USC k : unit cost of storing data at a site S k LPC k : cost of processing one unit of data at a site S k Network Information communication cost/frame between two sites frame size DDBS16, SL02 51/60 M. Böhlen DDBS16, SL02 52/60 M. Böhlen

14 Fragment Allocation/3 We discuss an allocation model which attempts to minimize the total cost of processing and storage meet certain response time restrictions General Form: min(total Cost) Fragment Allocation/4 The total cost function has two components: storage and query processing. TOC = S k S F j F STC jk + QPC i q i Q subject to response time constraint storage constraint processing constraint Storage cost of fragment Fj at site S k : STC jk = USC k size(f j ) x ij where USC k is the unit storage cost at site k Decision variable x ij { 1 if fragment Fi is stored at site S x ij = j 0 otherwise Query processing cost for a query q i is composed of two components: composed of processing cost (PC) and transmission cost (TC) QPC i = PC i + TC i DDBS16, SL02 53/60 M. Böhlen Fragment Allocation/5 Processing cost is a sum of three components: access cost (AC), integrity contraint cost (IE), concurency control cost (CC) DDBS16, SL02 54/60 M. Böhlen Fragment Allocation/6 The transmission cost is composed of two components: Cost of processing updates (TCU) and cost of processing retrievals (TCR) Access cost: AC i = s k S PC i = AC i + IE i + CC i (UR ij + RR ij ) x ij LPC k F j F where LPC k is the unit process cost at site k Integrity and concurrency costs: Can be similarly computed, though depends on the specific constraints Note: AC i assumes that processing a query involves decomposing it into a set of subqueries, each of which works on a fragment,..., This is a very simplistic model Does not take into consideration different query costs depending on the operator or different algorithms that are applied TC i = TCU i + TCR i Cost of updates: Inform all the sites that have replicas + a short confirmation message back TCU i = u ij (update message cost + acknowledgment cost) S k S F j F Retrieval cost: Send retrieval request to all sites that have a copy of fragments that are needed + sending back the results from these sites to the originating site. TCR i = F j F min S k S (x jk (cost retrieval request + cost sending back result)) DDBS16, SL02 55/60 M. Böhlen DDBS16, SL02 56/60 M. Böhlen

15 Fragment Allocation/7 Modeling the constraints Response time constraint for a query q i execution time of q i max. allowable response time for q i Storage constraints for a site Sk storage requirement of F j at S k storage capacity of S k F j F Processing constraints for a site Sk processing load of q i at site S k processing capacity ofs k q i Q Fragment Allocation/8 Solution Methods The complexity of this allocation model/problem is NP-complete Correspondence between the allocation problem and similar problems in other areas Plant location problem in operations research Knapsack problem Network flow problem Hence, solutions from these areas can be re-used Use different heuristics to reduce the search space Assume that all candidate partitionings have been determined together with their associated costs and benefits in terms of query processing. The problem is then reduced to find the optimal partitioning and placement for each relation Ignore replication at the first step and find an optimal non-replicated solution Replication is then handeled in a second step on top of the previous non-replicated solution. DDBS16, SL02 57/60 M. Böhlen Conclusion/1 DDBS16, SL02 58/60 M. Böhlen Conclusion/2 Distributed design decides on the placement of (parts of the) data and programs across the sites of a computer network There are two key technical questions: fragmentation and allocation/replication of data Horizontal fragmentation is defined via the selection operation σ p (R) Rewrites the queries of each site in the conjunctive normal form and finds a minimal and complete set of conjunctions to determine fragmentation Vertical fragmentation via the projection operation π A (R) Computes the attribute affinity matrix and groups similar attributes together Mixed fragmentation is a combination of both approaches Allocation/Replication of data Type of replication: no replication, partial replication, full replication Optimal allocation/replication modelled as a cost function under a set of constraints The complexity of the problem is NP-complete Use of different heuristics to reduce the complexity DDBS16, SL02 59/60 M. Böhlen DDBS16, SL02 60/60 M. Böhlen

Chapter 3: Distributed Database Design

Chapter 3: Distributed Database Design Chapter 3: Distributed Database Design Design problem Design strategies(top-down, bottom-up) Fragmentation Allocation and replication of fragments, optimality, heuristics Acknowledgements: I am indebted

More information

Distributed Database Design (Chapter 5)

Distributed Database Design (Chapter 5) Distributed Database Design (Chapter 5) Top-Down Approach: The database system is being designed from scratch. Issues: fragmentation & allocation Bottom-up Approach: Integration of existing databases (Chapter

More information

Distributed Databases in a Nutshell

Distributed Databases in a Nutshell Distributed Databases in a Nutshell Marc Pouly Marc.Pouly@unifr.ch Department of Informatics University of Fribourg, Switzerland Priciples of Distributed Database Systems M. T. Özsu, P. Valduriez Prentice

More information

Distributed Database Management Systems

Distributed Database Management Systems Page 1 Distributed Database Management Systems Outline Introduction Distributed DBMS Architecture Distributed Database Design Distributed Query Processing Distributed Concurrency Control Distributed Reliability

More information

Fragmentation and Data Allocation in the Distributed Environments

Fragmentation and Data Allocation in the Distributed Environments Annals of the University of Craiova, Mathematics and Computer Science Series Volume 38(3), 2011, Pages 76 83 ISSN: 1223-6934, Online 2246-9958 Fragmentation and Data Allocation in the Distributed Environments

More information

Advanced Database Management Systems

Advanced Database Management Systems Advanced Database Management Systems Distributed DBMS:Introduction and Architectures Alvaro A A Fernandes School of Computer Science, University of Manchester AAAF (School of CS, Manchester) Advanced DBMSs

More information

Distributed Databases. Fábio Porto LBD winter 2004/2005

Distributed Databases. Fábio Porto LBD winter 2004/2005 Distributed Databases LBD winter 2004/2005 1 Agenda Introduction Architecture Distributed database design Query processing on distributed database Data Integration 2 Outline Introduction to DDBMS Architecture

More information

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts Distributed Databases Basic Concepts Distributed Databases Concepts. Advantages and disadvantages of distributed databases. Functions and architecture for a DDBMS. Distributed database design. Levels of

More information

DISTRIBUTED AND PARALLELL DATABASE

DISTRIBUTED AND PARALLELL DATABASE DISTRIBUTED AND PARALLELL DATABASE SYSTEMS Tore Risch Uppsala Database Laboratory Department of Information Technology Uppsala University Sweden http://user.it.uu.se/~torer PAGE 1 What is a Distributed

More information

chapater 7 : Distributed Database Management Systems

chapater 7 : Distributed Database Management Systems chapater 7 : Distributed Database Management Systems Distributed Database Management System When an organization is geographically dispersed, it may choose to store its databases on a central database

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Chapter 5: Overview of Query Processing

Chapter 5: Overview of Query Processing Chapter 5: Overview of Query Processing Query Processing Overview Query Optimization Distributed Query Processing Steps Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of

More information

Distributed Database Systems. Prof. Dr. Carl-Christian Kanne

Distributed Database Systems. Prof. Dr. Carl-Christian Kanne Distributed Database Systems Prof. Dr. Carl-Christian Kanne 1 What is a Distributed Database System? A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed

More information

Distributed Information Systems - Exercise 4

Distributed Information Systems - Exercise 4 WS 2004/2005 Distributed Information Systems - Exercise 4 Schema Fragmentation Date: 23.11.2004 Return: 30.11.2004 1. Horizontal Fragmentation Given the following relational database: Employees EmpNo EmpName

More information

Distributed Data Management

Distributed Data Management Introduction Distributed Data Management Involves the distribution of data and work among more than one machine in the network. Distributed computing is more broad than canonical client/server, in that

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

More information

Physical Database Design and Tuning

Physical Database Design and Tuning Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

1. Physical Database Design in Relational Databases (1)

1. Physical Database Design in Relational Databases (1) Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach

More information

Distributed Databases

Distributed Databases Distributed Databases Chapter 1: Introduction Johann Gamper Syllabus Data Independence and Distributed Data Processing Definition of Distributed databases Promises of Distributed Databases Technical Problems

More information

Vertical Partitioning Algorithms for Database Design

Vertical Partitioning Algorithms for Database Design Vertical Partitioning Algorithms for Database Design SHAMKANT NAVATHE, STEFANO CERI, GIO WIEDERHOLD, AND JINGLIE DOU Stanford University This paper addresses the vertical partitioning of a set of logical

More information

Database Constraints and Design

Database Constraints and Design Database Constraints and Design We know that databases are often required to satisfy some integrity constraints. The most common ones are functional and inclusion dependencies. We ll study properties of

More information

TOP-DOWN APPROACH PROCESS BUILT ON CONCEPTUAL DESIGN TO PHYSICAL DESIGN USING LIS, GCS SCHEMA

TOP-DOWN APPROACH PROCESS BUILT ON CONCEPTUAL DESIGN TO PHYSICAL DESIGN USING LIS, GCS SCHEMA TOP-DOWN APPROACH PROCESS BUILT ON CONCEPTUAL DESIGN TO PHYSICAL DESIGN USING LIS, GCS SCHEMA Ajay B. Gadicha 1, A. S. Alvi 2, Vijay B. Gadicha 3, S. M. Zaki 4 1&4 Deptt. of Information Technology, P.

More information

Horizontal Fragmentation Technique in Distributed Database

Horizontal Fragmentation Technique in Distributed Database International Journal of Scientific and esearch Publications, Volume, Issue 5, May 0 Horizontal Fragmentation Technique in istributed atabase Ms P Bhuyar ME I st Year (CSE) Sipna College of Engineering

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

2. Basic Relational Data Model

2. Basic Relational Data Model 2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that

More information

Principles of Distributed Database Systems

Principles of Distributed Database Systems M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition

More information

SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison

SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89 by Joseph Collison Copyright 2000 by Joseph Collison All rights reserved Reproduction or translation of any part of this work beyond that permitted by Sections

More information

An Overview of Distributed Databases

An Overview of Distributed Databases International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 2 (2014), pp. 207-214 International Research Publications House http://www. irphouse.com /ijict.htm An Overview

More information

Chapter 2: DDBMS Architecture

Chapter 2: DDBMS Architecture Chapter 2: DDBMS Architecture Definition of the DDBMS Architecture ANSI/SPARC Standard Global, Local, External, and Internal Schemas, Example DDBMS Architectures Components of the DDBMS Acknowledgements:

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form

A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...

More information

3. Relational Model and Relational Algebra

3. Relational Model and Relational Algebra ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra

More information

Chapter 6: Episode discovery process

Chapter 6: Episode discovery process Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing

More information

Matrix Algebra. Some Basic Matrix Laws. Before reading the text or the following notes glance at the following list of basic matrix algebra laws.

Matrix Algebra. Some Basic Matrix Laws. Before reading the text or the following notes glance at the following list of basic matrix algebra laws. Matrix Algebra A. Doerr Before reading the text or the following notes glance at the following list of basic matrix algebra laws. Some Basic Matrix Laws Assume the orders of the matrices are such that

More information

virtual class local mappings semantically equivalent local classes ... Schema Integration

virtual class local mappings semantically equivalent local classes ... Schema Integration Data Integration Techniques based on Data Quality Aspects Michael Gertz Department of Computer Science University of California, Davis One Shields Avenue Davis, CA 95616, USA gertz@cs.ucdavis.edu Ingo

More information

On Recognizable Timed Languages FOSSACS 2004

On Recognizable Timed Languages FOSSACS 2004 On Recognizable Timed Languages Oded Maler VERIMAG Grenoble France Amir Pnueli NYU and Weizmann New York and Rehovot USA FOSSACS 2004 Nutrition Facts Classical (Untimed) Recognizability Timed Languages

More information

Vertical Partitioning for Query Processing over Raw Data

Vertical Partitioning for Query Processing over Raw Data Vertical Partitioning for Query Processing over Raw Data Weijie Zhao UC Merced wzhao23@ucmerced.edu Yu Cheng UC Merced ycheng4@ucmerced.edu Florin Rusu UC Merced frusu@ucmerced.edu ABSTRACT Traditional

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches Concepts of Database Management Seventh Edition Chapter 9 Database Management Approaches Objectives Describe distributed database management systems (DDBMSs) Discuss client/server systems Examine the ways

More information

Distributed Database Management Systems

Distributed Database Management Systems Distributed Database Management Systems (Distributed, Multi-database, Parallel, Networked and Replicated DBMSs) Terms of reference: Distributed Database: A logically interrelated collection of shared data

More information

Optimizing Performance. Training Division New Delhi

Optimizing Performance. Training Division New Delhi Optimizing Performance Training Division New Delhi Performance tuning : Goals Minimize the response time for each query Maximize the throughput of the entire database server by minimizing network traffic,

More information

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr.

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Dan Olteanu Submitted as part of Master of Computer Science Computing Laboratory

More information

DERIVATIVES AS MATRICES; CHAIN RULE

DERIVATIVES AS MATRICES; CHAIN RULE DERIVATIVES AS MATRICES; CHAIN RULE 1. Derivatives of Real-valued Functions Let s first consider functions f : R 2 R. Recall that if the partial derivatives of f exist at the point (x 0, y 0 ), then we

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31 Databases -Normalization III (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31 This lecture This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases -

More information

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries

More information

Efficiently Identifying Inclusion Dependencies in RDBMS

Efficiently Identifying Inclusion Dependencies in RDBMS Efficiently Identifying Inclusion Dependencies in RDBMS Jana Bauckmann Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany bauckmann@informatik.hu-berlin.de

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014 Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li Advised by: Dave Mount May 22, 2014 1 INTRODUCTION In this report we consider the implementation of an efficient

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Relational Database Design

Relational Database Design Relational Database Design To generate a set of relation schemas that allows - to store information without unnecessary redundancy - to retrieve desired information easily Approach - design schema in appropriate

More information

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3 The Relational Model Chapter 3 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase,

More information

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Schema Design and Normal Forms Sid Name Level Rating Wage Hours Entity-Relationship Diagram Schema Design and Sid Name Level Rating Wage Hours Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Database Management Systems, 2 nd Edition. R. Ramakrishnan

More information

13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions. 3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms

More information

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12 Theory behind Normalization & DB Design Lecture 12 Satisfiability: Does an FD hold? Satisfiability of FDs Given: FD X Y and relation R Output: Does R satisfy X Y? Algorithm: a.sort R on X b.do all the

More information

Introduction to Database Management Systems

Introduction to Database Management Systems Database Administration Transaction Processing Why Concurrency Control? Locking Database Recovery Query Optimization DB Administration 1 Transactions Transaction -- A sequence of operations that is regarded

More information

Physical Design in OODBMS

Physical Design in OODBMS 1 Physical Design in OODBMS Dieter Gluche and Marc H. Scholl University of Konstanz Department of Mathematics and Computer Science P.O.Box 5560/D188, D-78434 Konstanz, Germany E-mail: {Dieter.Gluche, Marc.Scholl}@Uni-Konstanz.de

More information

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.)

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.) Why Is This Important? Schema Refinement and Normal Forms Chapter 19 Many ways to model a given scenario in a database How do we find the best one? We will discuss objective criteria for evaluating database

More information

Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System

Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System D. A. Vidhate 1, Ige Pranita 2, Kothari Pooja 3, Kshatriya Pooja 4 (Information Technology,

More information

BBM467 Data Intensive ApplicaAons

BBM467 Data Intensive ApplicaAons Hace7epe Üniversitesi Bilgisayar Mühendisliği Bölümü BBM467 Data Intensive ApplicaAons Dr. Fuat Akal akal@hace7epe.edu.tr FoundaAons of Data[base] Clusters Database Clusters Hardware Architectures Data

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Outline. Distributed DBMSPage 4. 1

Outline. Distributed DBMSPage 4. 1 Outline Introduction Background Distributed DBMS Architecture Datalogical Architecture Implementation Alternatives Component Architecture Distributed DBMS Architecture Distributed Database Design Semantic

More information

University of Lille I PC first year list of exercises n 7. Review

University of Lille I PC first year list of exercises n 7. Review University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients

More information

Pigeonhole Principle Solutions

Pigeonhole Principle Solutions Pigeonhole Principle Solutions 1. Show that if we take n + 1 numbers from the set {1, 2,..., 2n}, then some pair of numbers will have no factors in common. Solution: Note that consecutive numbers (such

More information

Introduction. The Quine-McCluskey Method Handout 5 January 21, 2016. CSEE E6861y Prof. Steven Nowick

Introduction. The Quine-McCluskey Method Handout 5 January 21, 2016. CSEE E6861y Prof. Steven Nowick CSEE E6861y Prof. Steven Nowick The Quine-McCluskey Method Handout 5 January 21, 2016 Introduction The Quine-McCluskey method is an exact algorithm which finds a minimum-cost sum-of-products implementation

More information

Design of Relational Database Schemas

Design of Relational Database Schemas Design of Relational Database Schemas T. M. Murali October 27, November 1, 2010 Plan Till Thanksgiving What are the typical problems or anomalies in relational designs? Introduce the idea of decomposing

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza

Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213 Structure

More information

INCIDENCE-BETWEENNESS GEOMETRY

INCIDENCE-BETWEENNESS GEOMETRY INCIDENCE-BETWEENNESS GEOMETRY MATH 410, CSUSM. SPRING 2008. PROFESSOR AITKEN This document covers the geometry that can be developed with just the axioms related to incidence and betweenness. The full

More information

SQL: Queries, Programming, Triggers

SQL: Queries, Programming, Triggers SQL: Queries, Programming, Triggers Chapter 5 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 R1 Example Instances We will use these instances of the Sailors and Reserves relations in

More information

Example Instances. SQL: Queries, Programming, Triggers. Conceptual Evaluation Strategy. Basic SQL Query. A Note on Range Variables

Example Instances. SQL: Queries, Programming, Triggers. Conceptual Evaluation Strategy. Basic SQL Query. A Note on Range Variables SQL: Queries, Programming, Triggers Chapter 5 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Example Instances We will use these instances of the Sailors and Reserves relations in our

More information

SOLVING QUADRATIC EQUATIONS - COMPARE THE FACTORING ac METHOD AND THE NEW DIAGONAL SUM METHOD By Nghi H. Nguyen

SOLVING QUADRATIC EQUATIONS - COMPARE THE FACTORING ac METHOD AND THE NEW DIAGONAL SUM METHOD By Nghi H. Nguyen SOLVING QUADRATIC EQUATIONS - COMPARE THE FACTORING ac METHOD AND THE NEW DIAGONAL SUM METHOD By Nghi H. Nguyen A. GENERALITIES. When a given quadratic equation can be factored, there are 2 best methods

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer. DBMS Architecture INSTRUCTION OPTIMIZER Database Management Systems MANAGEMENT OF ACCESS METHODS BUFFER MANAGER CONCURRENCY CONTROL RELIABILITY MANAGEMENT Index Files Data Files System Catalog BASE It

More information

The Entity-Relationship Model

The Entity-Relationship Model The Entity-Relationship Model 221 After completing this chapter, you should be able to explain the three phases of database design, Why are multiple phases useful? evaluate the significance of the Entity-Relationship

More information

DATABASE NORMALIZATION

DATABASE NORMALIZATION DATABASE NORMALIZATION Normalization: process of efficiently organizing data in the DB. RELATIONS (attributes grouped together) Accurate representation of data, relationships and constraints. Goal: - Eliminate

More information

Introduction to Parallel Programming and MapReduce

Introduction to Parallel Programming and MapReduce Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant

More information

Data integration general setting

Data integration general setting Data integration general setting A source schema S: relational schema XML Schema (DTD), etc. A global schema G: could be of many different types too A mapping M between S and G: many ways to specify it,

More information

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams Chapter 10 Practical Database Design Methodology and Use of UML Diagrams Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Outline The Role of Information Systems in

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT

AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT BY AYSE YASEMIN SEYDIM CSE 8343 - DISTRIBUTED OPERATING SYSTEMS FALL 1998 TERM PROJECT TABLE OF CONTENTS INTRODUCTION...2 1. WHAT IS A DISTRIBUTED DATABASE

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system 1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables

More information

Design of LDPC codes

Design of LDPC codes Design of LDPC codes Codes from finite geometries Random codes: Determine the connections of the bipartite Tanner graph by using a (pseudo)random algorithm observing the degree distribution of the code

More information

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 The Relational Model CS2 Spring 2005 (LN6) 1 The relational model Proposed by Codd in 1970. It is the dominant data

More information

Two approaches to the integration of heterogeneous data warehouses

Two approaches to the integration of heterogeneous data warehouses Distrib Parallel Databases (2008) 23: 69 97 DOI 10.1007/s10619-007-7022-z Two approaches to the integration of heterogeneous data warehouses Riccardo Torlone Published online: 23 December 2007 Springer

More information

Relational Algebra and SQL

Relational Algebra and SQL Relational Algebra and SQL Johannes Gehrke johannes@cs.cornell.edu http://www.cs.cornell.edu/johannes Slides from Database Management Systems, 3 rd Edition, Ramakrishnan and Gehrke. Database Management

More information

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example Limitations of E-R Designs Relational Normalization Theory Chapter 6 Provides a set of guidelines, does not result in a unique database schema Does not provide a way of evaluating alternative schemas Normalization

More information

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES Constantin Brâncuşi University of Târgu Jiu ENGINEERING FACULTY SCIENTIFIC CONFERENCE 13 th edition with international participation November 07-08, 2008 Târgu Jiu TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED

More information

5.1 Bipartite Matching

5.1 Bipartite Matching CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson

More information