Fragmentation and Data Allocation in the Distributed Environments
- Gwendoline Oliver
- 9 years ago
Annals of the University of Craiova, Mathematics and Computer Science Series
Volume 38(3), 2011, Pages ISSN: , Online

Nicoleta - Magdalena Iacob (Ciobanu)

Abstract. Distributed data processing is an effective way to improve the reliability, availability and performance of a database system. In this paper we concentrate on the data allocation problem, with the aim of assuring an optimal distribution of data during distributed database design, in correlation with data fragmentation. Efficient allocation of fragments requires a balance between costs (storage, processing and transmission of data), performance (especially response time) and data distribution restrictions. The allocation of fragments is closely related to the replication of data in distributed databases. In addition, we analyze the cost of fragmentation and replication.

Mathematics Subject Classification. Primary 68P05; Secondary 68P15, 68N17.
Key words and phrases. distributed databases, fragmentation design, allocation design, strategies, methods, cost analysis.

Received July 05, Revision received August 25,
This work was partially supported by the strategic grant POSDRU/88/1.5/S/52826, Project ID52826 (2009), co-financed by the European Social Fund - Investing in People, within the Sectoral Operational Programme Human Resources Development

1. Distributed databases

Definition 1.1. Let $K_i = \{A_{i1}, \dots, A_{ik}\}$, $k = 1, 2, \dots$, be a collection of data that consists of a set of attributes (fields, columns) and has an associated set of data (lines, tuples, records). The database $BD = \{K_1, \dots, K_i, \dots\}$, $i \in I$, $I = 1, 2, \dots, n$, is a collection of structured data.

Definition 1.2. Let C be a set of computers, $C = \{C_1, \dots, C_j\}$, $j \in J$, $J = 1, 2, \dots, m$. Then DDB is a distributed database and $P_1, P_2, \dots, P_j$ are the parts that compose it. Therefore $DDB = \{P_1 \cup P_2 \cup \dots \cup P_j\}$, with $\{P_1 \cap \dots \cap P_j\} = \emptyset$, where $P_j$ is the part corresponding to computer $C_j$, $P_j \subset \{K_1, \dots, K_i, \dots\}$, $i \in I$, with $2 \le j \le m$; i.e. $P_j$ is a subset of the DDB database.

Definition 1.3. A distributed database system consists of a collection of sites, connected together via some kind of communications network, in which:
a. Each site is a full database system site in its own right, but
b. The sites have agreed to work together so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site.

Definition 1.4. It follows that a distributed database (DDB) is really a kind of virtual database, whose component parts are physically stored in a number of distinct real databases at a number of distinct sites (in effect, it is the logical union of those databases) [2].
Definition 1.5. A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user [3].

Distribution is normally discussed solely in terms of the fragmentation and replication of data. A data fragment constitutes some subset of the original database. A data replicate constitutes some copy of the whole or part of the original database [1].

2. The design of a distributed database

The design of a distributed computer system involves making decisions on the placement of data and programs across the sites of a computer network. In the case of distributed DBMSs, the distribution of applications involves two things: the distribution of the distributed DBMS software and the distribution of the application programs that run on it [5]. The distributed nature of the data raises further design issues: data fragmentation, partitioning, replication, and the allocation of fragments and replicas to different sites.

The design of a DDB can be done, as in the case of a centralized database, using the top-down and/or the bottom-up approach. Top-down design, shown schematically in Figure 1, aims to assure an optimal distribution of data and is used in the distributed database design phase.

Figure 1. Top-down design of distributed databases

The starting point in a top-down design approach is to identify and analyze the requirements of the system that will use the distributed database architecture, and to identify the data storage requirements and the data access mode. Based on these requirements, the global scheme of the DDB is designed first, independent of physical constraints regarding distribution and implementation. The next step is the data distribution design, which determines the fragmentation type of the DDB and the allocation of fragments to nodes. The design process ends with the local database design for each node.

The design techniques for the global scheme of the DDB and for the local databases are similar to those used in a centralized database scenario. Specific to a DDB are the aspects related to the fragmentation design and to the scheme for allocating fragments to network nodes. The two activities, fragmentation and data allocation, are addressed independently, with the fragmentation results serving as input to the data allocation phase. Both activities have the same inputs; the difference between them is that fragmentation starts from the global relations, while data allocation takes the fragmentation results into account. Both take the data access requirements into account, but each of them ignores the way in which the other design decision uses those requirements. The fragments and their allocation to sites aim to make data references local as far as possible.

Fragmentation design.

Definition 2.1. Fragmentation. The system partitions the relation into several fragments, and stores each fragment at a different site.

If a distributed database B is partitioned, the parts $P_1, P_2, \dots, P_j$, $i \in I$, $I = 1, 2, \dots, n$, that compose it form disjoint subsets: $B = \{P_1 \cup P_2 \cup \dots \cup P_j\}$, with $\{P_1 \cap \dots \cap P_j\} = \emptyset$.

Fragmentation is the partitioning of a global relation R into fragments $R_1, R_2, \dots, R_i$ that contain enough information to reconstruct the original relation R.
There are three basic rules that should be observed during fragmentation. They ensure that the database does not undergo semantic changes during fragmentation, i.e. they ensure the consistency of the database:
- Completeness. If relation R is decomposed into fragments $R_1, R_2, \dots, R_n$, each data item that can be found in R must appear in at least one fragment.
- Reconstruction. It must be possible to define a relational operation that will reconstruct R from the fragments. Reconstruction uses the Union operation for horizontal fragmentation and the Join operation for vertical fragmentation.
- Disjointness. If data item $d_i$ appears in fragment $R_i$, then it should not appear in any other fragment. Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
In the case of horizontal fragmentation, a data item is a tuple; for vertical fragmentation, a data item is an attribute.

Fragmentation strategies.

Consider a relation with scheme R. The fragmentation of R consists of determining a number of fragments (subschemes) $R_i$, obtained by applying relational algebra operations to R (operations on relations that express the logical properties of the data). In this context, the fragmentation of a data collection can be done in two basic ways:

a) Horizontal. The horizontal fragmentation of a relation R is the subdivision of its tuples into subsets called fragments; the fragmentation is correct if each tuple of R is mapped into at least one tuple of the fragments (completeness condition). An additional disjointness condition, requiring that each tuple of R be mapped into exactly one tuple of one of the fragments, is often introduced in distributed database systems in order to control the existence of duplication explicitly at the fragment level (by having multiple copies of the same fragment). The resulting fragments $R_i$ have the same schema as the collection R, but differ in the data they contain; they result from applying a selection to R.

Selection $\sigma_p(R)$ defines a relation that contains only those tuples of R that satisfy the specified condition (predicate p): $\sigma_p(\pi_{a_1, \dots, a_n}(R))$.

A horizontal fragment can be obtained by applying a restriction: $R_i = \sigma_{cond_i}(R)$. The original relation can therefore be rebuilt by union: $R = R_1 \cup R_2 \cup \dots \cup R_k$.

For example:
$R_1 = \sigma_{type='House'}(PropertyForSale)$
$R_2 = \sigma_{type='Flat'}(PropertyForSale)$

There are two versions of horizontal partitioning:
- primary horizontal fragmentation of a relation is achieved through the use of predicates defined on that relation, which restrict the tuples of the relation;
- derived horizontal fragmentation is realized by using predicates that are defined on other relations.

b) Vertical. Vertical fragmentation divides the relation by columns. The resulting fragments $R_i$ contain only part of the structure of the collection R. Each site keeps only certain attributes, and every fragment contains the primary key of the relation R so that the relation can be restored. The fragments result from applying the projection operation of relational algebra: $\pi_{a_1, \dots, a_n}(R)$, where $a_1, \dots, a_n$ are attributes of the relation R. The fragmentation is correct if each attribute of the relation is mapped into at least one attribute of the fragments; moreover, it must be possible to reconstruct the original relation by joining the fragments together, $R = R_1 \bowtie R_2 \bowtie \dots \bowtie R_n$; in other words, the fragments must be a lossless join decomposition of the relation.
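The correctness rules (completeness, reconstruction, disjointness) can be checked mechanically on small data. The following is a minimal Python sketch, not the author's implementation: relations are modelled as lists of dictionaries, the sample tuples and the attributes `propertyNo` and `city` are hypothetical, and the helper names (`select`, `project`, `join_on_key`, `check_horizontal`) are introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
Relation = List[Row]

# Hypothetical sample tuples for PropertyForSale (illustrative data only).
PROPERTY_FOR_SALE: Relation = [
    {"propertyNo": "PA14", "type": "House", "city": "Pitesti"},
    {"propertyNo": "PL94", "type": "Flat", "city": "Craiova"},
    {"propertyNo": "PG21", "type": "House", "city": "Craiova"},
]


def select(relation: Relation, predicate: Callable[[Row], bool]) -> Relation:
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation: Relation, attrs: Sequence[str]) -> Relation:
    """Projection (pi): keep the listed attributes, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def join_on_key(left: Relation, right: Relation, key: str) -> Relation:
    """Natural join of two fragments that share only the primary key."""
    by_key = {row[key]: row for row in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


def as_set(relation: Relation) -> frozenset:
    """Order-independent view of a relation, used to compare relations."""
    return frozenset(tuple(sorted(row.items())) for row in relation)


def check_horizontal(relation: Relation, fragments: List[Relation]) -> Dict[str, bool]:
    """Evaluate the three correctness rules for a horizontal fragmentation."""
    original = as_set(relation)
    parts = [as_set(f) for f in fragments]
    union = frozenset().union(*parts)
    return {
        "completeness": original <= union,
        "reconstruction": union == original,
        # Fragment sizes add up to the union size only if no tuple is shared.
        "disjointness": sum(len(p) for p in parts) == len(union),
    }


# Horizontal fragments, following the House/Flat selection predicates.
r1 = select(PROPERTY_FOR_SALE, lambda t: t["type"] == "House")
r2 = select(PROPERTY_FOR_SALE, lambda t: t["type"] == "Flat")
horizontal_ok = check_horizontal(PROPERTY_FOR_SALE, [r1, r2])

# Vertical fragments: both repeat the primary key so a join can rebuild R.
v1 = project(PROPERTY_FOR_SALE, ["propertyNo", "type"])
v2 = project(PROPERTY_FOR_SALE, ["propertyNo", "city"])
vertical_ok = as_set(join_on_key(v1, v2, "propertyNo")) == as_set(PROPERTY_FOR_SALE)
```

Comparing relations as sets keeps the check independent of tuple order. Repeating `propertyNo` in both vertical fragments is what makes the join lossless in this sketch.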
For example, a vertical fragmentation of the staff relation:
$R_1 = \pi_{staffno, position, salary}(staff)$
$R_2 = \pi_{staffno, firstname, lastname, branchno}(staff)$

c) Mixed. Sometimes vertical or horizontal fragmentation of a database scheme alone is insufficient to distribute the data adequately for some applications. Instead, it can be useful to implement mixed (hybrid) fragmentation. A mixed fragment of a relation consists of a horizontal fragment that is vertically fragmented, or a vertical fragment that is horizontally fragmented. A mixed fragmentation is defined using the selection and projection operations of relational algebra: $\sigma_p(\pi_{a_1, \dots, a_n}(R))$ or $\pi_{a_1, \dots, a_n}(\sigma_p(R))$.

The comparison of fragmentation strategies is shown in Table 1. For each fragment of a relation R: Condition C = True means that all tuples are selected; List (L = ATTRS(R)) = True means that all attributes are included in the list.

     Vertical fragmentation     Horizontal fragmentation    Mixed fragmentation
C    True (all tuples)          False (not all tuples)      False (not all tuples)
L    False (not all columns)    True (all columns)          False (not all columns)

Table 1: Comparison of fragmentation strategies

The advantages of fragmentation:
- Usage - applications work with views rather than entire relations;
- Efficiency - data is stored close to the place where it is most frequently used;
- Parallelism - with fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments;
- Security - data not required by local applications is not stored, and consequently not available to unauthorized users.
A disadvantage concerns performance: global applications that require data from several fragments located at different sites may be slower.

Allocation design.

A database is named distributed if any of its tables are stored at different sites; one or more of its tables are replicated and their copies are stored at different sites; one or more of its tables are fragmented and the fragments are stored at different sites; and so on. In general, a database is distributed if not all of its data is localized at a single site [6].

The problem of fragment allocation can be treated as a placement optimization problem for each data fragment. Efficient distribution of fragments requires a balance between costs (storage, processing and transmission of data), performance (especially response time) and data distribution restrictions.

Given: a set of m fragments $F = \{F_1, F_2, \dots, F_m\}$ exploited by a set of k applications $Q = \{Q_1, Q_2, \dots, Q_k\}$ on a set of p sites $S = \{S_1, S_2, \dots, S_p\}$.

Determine:
$Det(F, S) = \begin{cases} 1, & \text{if } F \text{ is allocated} \\ 0, & \text{otherwise} \end{cases}$

The allocation problem involves finding the optimal distribution of F to S.

To minimize: storage cost + communication cost + local processing cost

Subject to:
- response time constraints
- availability constraints
- network topology
- security restrictions

The allocation of fragments is closely related to the replication of data in the DDB. In the data allocation phase the designer must decide whether DDB fragments will be replicated and, if so, their degree of replication.

Definition 2.2. Replication. The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site. The alternative to replication is to store only one copy of relation r.

A database is replicated if the entire database or a portion of it (a table, some tables, one or more fragments, etc.) is copied and the copies are stored at different sites. The problem with having more than one copy of a database is maintaining the mutual consistency of the copies, that is, ensuring that all copies have identical schema and data content. Assuming replicas are mutually consistent, replication improves availability, since a transaction can read any of the copies [4]. In addition, replication provides more reliability, minimizes the chance of total data loss, and greatly improves disaster recovery.

Fragmentation and replication can be combined: a relation can be partitioned into several fragments and there may be several replicas of each fragment [7].

Methods of data allocation.

In designing data allocation, the following rule must be respected: data must be placed as close as possible to the location where it will be used. In practice, the methods used for data allocation are:
- The method of determining a non-redundant allocation (called the best choices method) implies evaluating each possible allocation and choosing a single node for each fragment. The method eliminates the possibility of placing a fragment at a particular node if a fragment is already assigned to that node.
- The method of redundant allocation on all profitable nodes implies the selection of all nodes for which the benefit of allocating a copy of the fragment is greater than the cost of allocation. Redundant final allocation is recommended when the frequency of data retrieval is higher than the frequency of data updates. Data replication is more convenient for systems that tolerate temporarily inconsistent data.
- The progressive replication method implies the initial implementation of a non-redundant solution, followed by the progressive introduction of replicated copies, starting from the most profitable node. The benefits are calculated by taking into consideration all query and update operations.

3. Cost analysis

Assume the set of sites is $S = \{S_1, S_2, \dots, S_p\}$. Let:
- $p$ be the total number of sites,
- $N_{Rel}$ be the total number of relations,
- $m$ be the total number of fragments,
- $N_{frag}$ be the cardinality of the fragmented relation,
- $n$ be the number of fragments of the fragmented relation,
- $N_{rep}$ be the cardinality of each of the other $(N_{Rel} - 1)$ replicated relations,
- $N_{join}$ be the cardinality of the joined relations,
- $k$ be the number of attributes in both fragmented and replicated relations,
- $K_{join}$ be the number of attributes after joining the relations from any site,
- $K_p$ be the number of attributes to be projected,
- $CT_{comp}$ be the cost per tuple comparison,
- $CT_{conc}$ be the cost per tuple concatenation,
- $T_{cost\_attr}$ be the transmission cost per attribute,
- $TC_R$ be the replication transmission cost,
- $CP_{p\_attr}$ be the cost per projected attribute.

Cost analysis of fragmentation and replication (CAFR).

The CAFR algorithm requires one of the relations referenced by a query to be fragmented and the other relations to be replicated at the sites that hold a fragment of the fragmented relation. The query is decomposed into the same number of subqueries as the number of sites, and each subquery is processed at one of these sites.

The cost for one of the relations to be fragmented across m sites is:

$$TC_{OF} = N_{frag} \cdot k \cdot T_{cost\_attr} \cdot n. \quad (1)$$

The cost for all other $(N_{Rel} - 1)$ relations to be replicated across n sites is:

$$TC_R = (N_{rep} \cdot k \cdot T_{cost\_attr}) \cdot (N_{Rel} - 1). \quad (2)$$

The local processing costs of these sites are (union cost for the fragmented relation and natural join cost):

$$LPC = \sum_{j=1}^{r} \sum_{i=1}^{n} Cost[\cup (F)_i]_j + [(N_{frag} - 1) + N_{rep} \cdot (N_{Rel} - 1) \cdot CT_{comp} + N_{join} \cdot CT_{conc}] \cdot p. \quad (3)$$

It follows that the total cost for the CAFR strategy is:

$$TC_{CAFR} = TC_{OF} + TC_R + LPC. \quad (4)$$
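Equations (1), (2) and (4) can be evaluated numerically once the parameters are known. The sketch below is a hedged illustration in Python: it assumes that adjacent symbols in equations (1) and (2) are multiplied and that the replication factor is $N_{Rel} - 1$. The local processing cost LPC of equation (3) is passed in as a given number, because its per-fragment `Cost[...]` term is not specified further in the text. The class name, function names and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CafrInputs:
    """Parameters of the CAFR transmission costs (names mirror the symbols)."""
    n_frag: int          # N_frag: cardinality of the fragmented relation
    n_rep: int           # N_rep: cardinality of each replicated relation
    n_rel: int           # N_Rel: total number of relations
    k: int               # k: number of attributes per relation
    n: int               # n: number of fragments of the fragmented relation
    t_cost_attr: float   # T_cost_attr: transmission cost per attribute


def fragmentation_cost(x: CafrInputs) -> float:
    """Equation (1), read as a product: TC_OF = N_frag * k * T_cost_attr * n."""
    return x.n_frag * x.k * x.t_cost_attr * x.n


def replication_cost(x: CafrInputs) -> float:
    """Equation (2): TC_R = (N_rep * k * T_cost_attr) * (N_Rel - 1)."""
    return (x.n_rep * x.k * x.t_cost_attr) * (x.n_rel - 1)


def total_cafr_cost(x: CafrInputs, lpc: float) -> float:
    """Equation (4): TC_CAFR = TC_OF + TC_R + LPC, with LPC supplied by the caller."""
    return fragmentation_cost(x) + replication_cost(x) + lpc


# Hypothetical workload: one relation of 1000 tuples split into 4 fragments,
# two further relations of 200 tuples each replicated, 5 attributes per relation.
example = CafrInputs(n_frag=1000, n_rep=200, n_rel=3, k=5, n=4, t_cost_attr=0.5)
tc_of = fragmentation_cost(example)            # 10000.0
tc_r = replication_cost(example)               # 1000.0
tc_cafr = total_cafr_cost(example, lpc=250.0)  # 11250.0
```

With these sample values the fragmentation term dominates, which only reflects the hypothetical cardinalities chosen here and not a general property of the strategy.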
CAFR permits parallel processing of a query, but it is not applicable to distributed queries in which all the relations referenced by the query are non-fragmented.

Cost analysis of partition and replication (CAPR).

The CAPR algorithm works as follows. For a given query, the minimum response time is estimated for the case in which all referenced data is transferred to and processed at only one of the sites. Next, for each referenced relation and each copy of the relation, the response time is estimated for the case in which the copy of the relation is partitioned and distributed to a subset of S and all the other relations are replicated at the sites where they are needed. CAPR determines a choice of processing sites and fragment sizes for the selected copy of the chosen relation so as to minimize the response time. Finally, the strategy that gives the minimum response time among all the copies of all the referenced relations is chosen.

The union cost:

$$UC = \sum_{i=1}^{m} [K_{join} \cdot N_{join} \cdot CT_{conc}]_j. \quad (5)$$

If all the relations are transferred to any one of the sites, in order to project and union the relations, then the cost is:

$$TC_{PROJ\_UNI} = \sum_{I=1}^{NTR} [N_{rep} \cdot K_p \cdot CP_{p\_attr}]_I + UC. \quad (6)$$

Comparison of cost analysis.

CAFR or CAPR requires substantial data transfer and preparation before a query can be processed in parallel at different sites. Performing local reductions before data transfer can reduce the data replication cost and the join processing cost. However, local reduction takes time and delays data transfer, so it does not always reduce the response time for the processing of a given query. Comparing equations (6) and (4), it is concluded that $TC_{PROJ\_UNI} > TC_{CAFR}$, since sending all relations directly to the assembly site, where all joins are performed, is unfavorable due to its high transmission overhead and little exploitation of parallelism.
This is the worst case, and it depends on the given input query (one in which there are no conditions or predicates in the where clause).

The fragment and replicate strategy (CAFR) permits parallel processing of a query. CAFR is not applicable to distributed queries in which all the relations referenced by the query are non-fragmented. With the CAFR strategy, if no relation referenced by a query is fragmented, it is necessary to decide which relation will be partitioned into fragments, which copy of the relation should be used, how the relation will be partitioned, and where the fragments will be sent for processing. To resolve this problem, CAPR (the partition and replicate strategy) is presented. However, the CAPR algorithm favors joins involving small numbers of relations, and the improvement decreases as the number of relations involved in joins increases. The improvement of CAPR over single-site processing depends heavily on how fast a relation can be partitioned. If the implementation of relation partitioning is inefficient, CAPR actually gives a worse response time than single-site processing.
Conclusion

The objective of a data allocation algorithm is to determine an assignment of fragments to sites that minimizes the total data transfer cost involved in executing a set of queries. This is equivalent to minimizing the average query execution time, which is of primary importance in a wide range of distributed applications. Fragmentation in a distributed database management system increases the level of concurrency and therefore the system throughput for query processing. Distributed databases have appeared as a necessity, because they improve the availability and reliability of data and assure high performance in data processing by allowing parallel processing of queries, while also reducing processing costs.

References

[1] P. Beynon-Davies, Database systems (3rd ed.), New York: Palgrave-Macmillan,
[2] C.J. Date, An introduction to Database Systems (8th ed.), Addison Wesley,
[3] R. Elmasri and S. Navathe, Fundamentals of database systems (4th ed.), Boston: Addison-Wesley,
[4] N.M. Iacob (Ciobanu), The use of distributed databases in e-learning systems, Procedia Social and Behavioral Sciences Journal 15 (2011), 3rd World Conference on Educational Sciences - WCES 2011, Bahcesehir University, Istanbul - Turkey, February 2011,
[5] M.T. Ozsu and P. Valduriez, Principles of Distributed Database Systems (3rd ed.), New York: Springer,
[6] S. Rahimi and F.S. Haug, Distributed database management systems: A Practical Approach, IEEE, Computer Society, Hoboken, N.J.: Wiley,
[7] A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts (6th ed.), McGraw-Hill,

(Nicoleta - Magdalena Iacob (Ciobanu)) University of Pitesti, Faculty of Mathematics and Computer Science, Computer Science Department, Targu din Vale Street, No.1, Pitesti, Romania
address: nicoleta.iacob [email protected]
Database System Concepts
s Design Chapter 1: Introduction Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
3. Relational Model and Relational Algebra
ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra
Survey on Comparative Analysis of Database Replication Techniques
72 Survey on Comparative Analysis of Database Replication Techniques Suchit Sapate, Student, Computer Science and Engineering, St. Vincent Pallotti College, Nagpur, India Minakshi Ramteke, Student, Computer
Fundamentals of Database System
Fundamentals of Database System Chapter 4 Normalization Fundamentals of Database Systems (Chapter 4) Page 1 Introduction To Normalization In general, the goal of a relational database design is to generate
SQL Query Evaluation. Winter 2006-2007 Lecture 23
SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
6NF CONCEPTUAL MODELS AND DATA WAREHOUSING 2.0
ABSTRACT 6NF CONCEPTUAL MODELS AND DATA WAREHOUSING 2.0 Curtis Knowles Georgia Southern University [email protected] Sixth Normal Form (6NF) is a term used in relational database theory by Christopher
Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc
Big Data, Fast Data, Complex Data Jans Aasman Franz Inc Private, founded 1984 AI, Semantic Technology, professional services Now in Oakland Franz Inc Who We Are (1 (2 3) (4 5) (6 7) (8 9) (10 11) (12
Semantic Errors in SQL Queries: A Quite Complete List
Semantic Errors in SQL Queries: A Quite Complete List Christian Goldberg, Stefan Brass Martin-Luther-Universität Halle-Wittenberg {goldberg,brass}@informatik.uni-halle.de Abstract We investigate classes
Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases
Distributed Architectures Distributed Databases Simplest: client-server Distributed databases: two or more database servers connected to a network that can perform transactions independently and together
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries
Principles for Distributed Databases in Telecom Environment Imran Ashraf Amir Shahzed Khokhar
Master Thesis Computer Science Thesis no: MCS-2010-23 June 2010 Principles for Distributed Databases in Telecom Environment Imran Ashraf Amir Shahzed Khokhar School School of Computing of Blekinge Blekinge
AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT
AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT BY AYSE YASEMIN SEYDIM CSE 8343 - DISTRIBUTED OPERATING SYSTEMS FALL 1998 TERM PROJECT TABLE OF CONTENTS INTRODUCTION...2 1. WHAT IS A DISTRIBUTED DATABASE
DATABASE NORMALIZATION
DATABASE NORMALIZATION Normalization: process of efficiently organizing data in the DB. RELATIONS (attributes grouped together) Accurate representation of data, relationships and constraints. Goal: - Eliminate
Extending Data Processing Capabilities of Relational Database Management Systems.
Extending Data Processing Capabilities of Relational Database Management Systems. Igor Wojnicki University of Missouri St. Louis Department of Mathematics and Computer Science 8001 Natural Bridge Road
Relational Database Systems 2 1. System Architecture
Relational Database Systems 2 1. System Architecture Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 1 Organizational Issues
Chapter 1: Introduction
Chapter 1: Introduction Purpose of Database Systems View of Data Data Models Data Definition Language Data Manipulation Language Transaction Management Storage Management Database Administrator Database
Oracle8i Spatial: Experiences with Extensible Databases
Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction
Chapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
Guide to Performance and Tuning: Query Performance and Sampled Selectivity
Guide to Performance and Tuning: Query Performance and Sampled Selectivity A feature of Oracle Rdb By Claude Proteau Oracle Rdb Relational Technology Group Oracle Corporation 1 Oracle Rdb Journal Sampled
Integrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
When an organization is geographically dispersed, it. Distributed Databases. Chapter 13-1 LEARNING OBJECTIVES INTRODUCTION
Chapter 13 Distributed Databases LEARNING OBJECTIVES After studying this chapter, you should be able to: Concisely define each of the following key terms: distributed database, decentralized database,
1 File Processing Systems
COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.
Introduction to Databases
Page 1 of 5 Introduction to Databases An introductory example What is a database? Why do we need Database Management Systems? The three levels of data abstraction What is a Database Management System?
Secure Data Transfer and Replication Mechanisms in Grid Environments p. 1
Secure Data Transfer and Replication Mechanisms in Grid Environments Konrad Karczewski, Lukasz Kuczynski and Roman Wyrzykowski Institute of Computer and Information Sciences, Czestochowa University of
USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES
USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES 1 ALIREZA POORDAVOODI, 2 MOHAMMADREZA KHAYYAMBASHI, 3 JAFAR HAMIN 1, 3 M.Sc. Student, Computer Department, University of Sheikhbahaee,
A Multidatabase System as 4-Tiered Client-Server Distributed Heterogeneous Database System
A Multidatabase System as 4-Tiered Client-Server Distributed Heterogeneous Database System Mohammad Ghulam Ali Academic Post Graduate Studies and Research Indian Institute of Technology, Kharagpur Kharagpur,
Physical DB design and tuning: outline
Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning
PARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
Load Balancing in the Cloud Computing Using Virtual Machine Migration: A Review
Load Balancing in the Cloud Computing Using Virtual Machine Migration: A Review 1 Rukman Palta, 2 Rubal Jeet 1,2 Indo Global College Of Engineering, Abhipur, Punjab Technical University, jalandhar,india
RELATIONAL DATABASE DESIGN
RELATIONAL DATABASE DESIGN g Good database design - Avoiding anomalies g Functional Dependencies g Normalization & Decomposition Using Functional Dependencies g 1NF - Atomic Domains and First Normal Form
A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture
A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture Hyeon seok O, Namgi Kim1, Byoung-Dai Lee dept. of Computer Science. Kyonggi University, Suwon,
www.gr8ambitionz.com
Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1
Relational Algebra and SQL
Relational Algebra and SQL Johannes Gehrke [email protected] http://www.cs.cornell.edu/johannes Slides from Database Management Systems, 3 rd Edition, Ramakrishnan and Gehrke. Database Management
Schema Design and Normal Forms Sid Name Level Rating Wage Hours
Entity-Relationship Diagram Schema Design and Sid Name Level Rating Wage Hours Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Database Management Systems, 2 nd Edition. R. Ramakrishnan
DATABASE MANAGEMENT SYSTEMS. Question Bank:
DATABASE MANAGEMENT SYSTEMS Question Bank: UNIT 1 1. Define Database? 2. What is a DBMS? 3. What is the need for database systems? 4. Define tupule? 5. What are the responsibilities of DBA? 6. Define schema?
Database Management. Chapter Objectives
3 Database Management Chapter Objectives When actually using a database, administrative processes maintaining data integrity and security, recovery from failures, etc. are required. A database management
International Journal of Innovative Research in Computer and Communication Engineering. (An ISO 3297: 2007 Certified Organization)
Improved Performance of Web Based Database Management for Telemedicine by Using Three Fold Approach of Data Fragmentation,Websites Data Clustering and Data Allocation Sneha Katale 1, Sonali Hotkar 2, Abhijeet
MS-40074: Microsoft SQL Server 2014 for Oracle DBAs
MS-40074: Microsoft SQL Server 2014 for Oracle DBAs Description This four-day instructor-led course provides students with the knowledge and skills to capitalize on their skills and experience as an Oracle
Designing Database Solutions for Microsoft SQL Server 2012 MOC 20465
Designing Database Solutions for Microsoft SQL Server 2012 MOC 20465 Course Outline Module 1: Designing a Database Server Infrastructure This module explains how to design an appropriate database server
Distributed Databases
C H A P T E R 12 Distributed Databases Learning Objectives After studying this chapter, you should be able to: Concisely define the following key terms: distributed database, decentralized database, location
æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.
CMPT-354-Han-95.3 Lecture Notes September 10, 1995 Chapter 1 Introduction 1.0 Database Management Systems 1. A database management system èdbmsè, or simply a database system èdbsè, consists of æ A collection
Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.
Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap. 1 Oracle9i Documentation First-Semester 1427-1428 Definitions
1. INTRODUCTION TO RDBMS
Oracle For Beginners Page: 1 1. INTRODUCTION TO RDBMS What is DBMS? Data Models Relational database management system (RDBMS) Relational Algebra Structured query language (SQL) What Is DBMS? Data is one
Data Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
High Availability Storage
High Availability Storage High Availability Extensions Goldwyn Rodrigues High Availability Storage Engineer SUSE High Availability Extensions Highly available services for mission critical systems Integrated
