A Framework for the Design of Distributed Databases

Fernanda Baião, Marta Mattoso
Computer Science Department, COPPE/UFRJ, Federal University of Rio de Janeiro, Brazil
Gerson Zaverucha

This work presents a framework to handle the class fragmentation problem during the design of distributed object databases. The framework works at the conceptual level and thus uses the object data model to capture the application semantics represented by the user. The proposed framework integrates three modules. The heuristic module defines a set of heuristics to drive the fragmentation of object databases and incorporates them into a methodology that includes an analysis algorithm and horizontal and vertical class fragmentation algorithms. The theory revision module automatically improves the analysis algorithm through an artificial intelligence technique called theory revision, using as examples fragmentation schemas whose performance is previously known. Finally, the branch-and-bound module uses optimization techniques to perform an intelligent search for an optimal fragmentation schema over a larger hypothesis space than the one covered by the heuristic approach.

INTRODUCTION

Distributed and parallel processing in database management systems (DBMS) is an efficient way of improving the performance of applications that manipulate large volumes of data. This may be accomplished by avoiding access to irrelevant data during query execution and by reducing data exchange among sites, which are the two main goals of the design of distributed databases [1]. Moreover, many recent problem domains are supported by applications that, besides handling large volumes of data, are typically more complex than traditional applications.
Those applications require, at least at the conceptual level, a semantically richer data model, such as the object data model, that can represent complex structures and operations directly and more naturally. Therefore, to improve the performance of those applications, it is very important to design the distribution of information properly, taking the application semantics into account as much as possible. Distribution design involves deciding on the fragmentation and placement of data across the sites of a computer network. In a top-down approach, the first phase of distribution design is fragmentation, the process of clustering into fragments the information that applications access together. The fragmentation phase is followed by the allocation phase, which handles the physical placement of the generated fragments on the nodes of the network, as well as the replication of fragments. This work addresses the fragmentation phase. We believe that, by producing good fragmentation schemas with improved performance, data allocation and replication can then be carried out more efficiently: the fragmentation schema will reflect appropriate units of distribution according to the application access patterns, and may thus significantly reduce the search space of the allocation phase.

However, generating a good fragmentation schema for a database that uses the object data model is difficult for four basic reasons: (i) it is not a well-defined problem; (ii) it must take many parameters into account; (iii) it has many conflicting goals; and (iv) it requires estimates and heuristics that may sometimes conflict. Nevertheless, the designer may concentrate on semantic relationships, leaving physical distribution design to the last phase.

A class can be fragmented with two basic techniques: horizontal fragmentation and vertical fragmentation. In object databases, horizontal fragmentation distributes class instances across fragments, so a horizontal fragment of a class contains a subset of the whole class extension. Vertical fragmentation (VF), on the other hand, breaks up the logical structure of the class (its attributes and methods) and distributes it across the fragments. Horizontal fragmentation is usually subdivided into primary and derived horizontal fragmentation. Primary horizontal fragmentation (PHF) mainly optimizes set operations (searches over a class extension), first by reducing the amount of irrelevant data accessed and second by allowing applications to execute concurrently, thus achieving a high degree of parallelism. Derived horizontal fragmentation (DHF), in turn, can be viewed as a way of clustering objects of distinct classes on disk; it therefore directly addresses the relationships between classes and improves the performance of applications with navigational access.
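To make the three techniques concrete, here is a minimal Python sketch over a hypothetical toy schema (the `Module` and `Part` classes, their attributes and the fragment names are illustrative assumptions, not taken from this work). `primary_horizontal` splits one class extension by predicates, `derived_horizontal` places each `Part` in the fragment holding the `Module` it references, and `vertical` splits the attribute set while repeating the object identifier in every fragment so that objects can be reconstructed. Only attributes are fragmented here; methods are left out for brevity.

```python
from dataclasses import dataclass


# Hypothetical toy schema (illustrative only): every Part references one Module.
@dataclass
class Module:
    oid: int
    region: str


@dataclass
class Part:
    oid: int
    module_oid: int
    kind: str
    weight: float
    description: str


def primary_horizontal(extension, predicates):
    """PHF: split a class extension into subsets of instances, one per predicate.

    The predicates are assumed to be mutually exclusive and complete; this is not checked.
    """
    return {name: [obj for obj in extension if pred(obj)] for name, pred in predicates.items()}


def derived_horizontal(member_extension, owner_fragments, link):
    """DHF: put each member object in the fragment that holds the owner it references."""
    fragment_of = {owner.oid: name for name, owners in owner_fragments.items() for owner in owners}
    fragments = {name: [] for name in owner_fragments}
    for obj in member_extension:
        fragments[fragment_of[link(obj)]].append(obj)
    return fragments


def vertical(extension, attribute_groups):
    """VF: split the class structure; each fragment repeats the OID so objects can be rebuilt."""
    return {
        name: [{"oid": obj.oid, **{a: getattr(obj, a) for a in attrs}} for obj in extension]
        for name, attrs in attribute_groups.items()
    }


modules = [Module(1, "north"), Module(2, "south")]
parts = [
    Part(10, 1, "gear", 2.5, "steel gear"),
    Part(11, 2, "bolt", 0.1, "m4 bolt"),
    Part(12, 1, "bolt", 0.2, "m6 bolt"),
]
module_frags = primary_horizontal(modules, {
    "Module_north": lambda m: m.region == "north",
    "Module_south": lambda m: m.region == "south",
})
part_frags = derived_horizontal(parts, module_frags, link=lambda p: p.module_oid)
part_vfrags = vertical(parts, {"Part_a": ["kind", "weight"], "Part_b": ["module_oid", "description"]})
print([p.oid for p in part_frags["Module_north"]])  # → [10, 12]
```

Because each `Part` follows its owning `Module`, navigating from a module to its parts stays within one fragment, which is the clustering effect attributed to DHF above.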
It is also possible to apply both vertical and horizontal fragmentation to the same class (which we call hybrid fragmentation), or to apply different fragmentation techniques to different classes of the database schema (which we call mixed fragmentation). Many approaches in the literature address the problem of the design of distributed object databases (DDODB) [2, 3, 4, 5, 6]. However, because of its complexity, most of them rely on a specific set of estimates and heuristics. Some approaches also require an instantiated database to work on, which may limit their applicability. Most importantly, the distribution design algorithms presented are limited to a single fragmentation technique (horizontal or vertical, but not both) applied to all classes of the schema; they therefore propose either a horizontal-only or a vertical-only class fragmentation approach. We have already pointed out in [7] the benefits of mixed fragmentation (combining vertical and horizontal fragmentation in different classes of the schema) and of hybrid fragmentation (combining them in the same class) for application performance. It is also important to analyze the database schema and the application characteristics in order to propose good fragmentation schemas; however, other works in the literature do not address these issues.
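Hybrid and mixed fragmentation can be sketched by composing the two basic operations. In this illustrative Python sketch (class names, attributes and the `plan` format are assumptions), `hybrid` first partitions a class's instances horizontally and then splits each horizontal fragment vertically, while `mixed` applies a possibly different technique to each class and leaves classes absent from the plan unfragmented.

```python
# Objects are plain dicts with an "oid" key; class names, attributes and the plan format
# are hypothetical and chosen only to illustrate the idea.

def hf(objs, predicates):
    # Horizontal: partition the instances (predicates assumed disjoint and complete).
    return {name: [o for o in objs if p(o)] for name, p in predicates.items()}


def vf(objs, groups):
    # Vertical: project each attribute group, keeping the OID in every fragment.
    return {name: [{"oid": o["oid"], **{a: o[a] for a in attrs}} for o in objs]
            for name, attrs in groups.items()}


def hybrid(objs, groups, predicates):
    # Hybrid (same class): horizontal split first, then each horizontal fragment split vertically.
    out = {}
    for h_name, h_objs in hf(objs, predicates).items():
        for v_name, v_objs in vf(h_objs, groups).items():
            out[f"{h_name}.{v_name}"] = v_objs
    return out


def mixed(schema, plan):
    # Mixed (across classes): each class gets its own technique; unplanned classes stay whole.
    result = {}
    for cls, objs in schema.items():
        technique, *args = plan.get(cls, ("none",))
        if technique == "hf":
            result[cls] = hf(objs, *args)
        elif technique == "vf":
            result[cls] = vf(objs, *args)
        elif technique == "hybrid":
            result[cls] = hybrid(objs, *args)
        else:
            result[cls] = {cls: list(objs)}
    return result


schema = {
    "Employee": [{"oid": 1, "site": "rio", "name": "Ana", "salary": 10},
                 {"oid": 2, "site": "sp", "name": "Bia", "salary": 12}],
    "Project": [{"oid": 7, "title": "X", "budget": 100}],
    "Dept": [{"oid": 3, "name": "R&D"}],
}
plan = {
    "Employee": ("hybrid", {"public": ["site", "name"], "private": ["salary"]},
                 {"rio": lambda o: o["site"] == "rio", "sp": lambda o: o["site"] == "sp"}),
    "Project": ("vf", {"head": ["title"], "money": ["budget"]}),
}
result = mixed(schema, plan)
print(result["Employee"]["rio.private"])  # → [{'oid': 1, 'salary': 10}]
```

Applying HF before VF here is only a convenience so that the predicates can still see every attribute; the opposite order also works if each predicate uses only attributes present in the vertical fragment it is applied to.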

FRAMEWORK FOR THE DESIGN OF DISTRIBUTED DATABASES

This work presents a framework to handle the class fragmentation problem during the design of distributed databases, using the object model at the conceptual level. Thus, the ideas presented may be applied in different environments (such as domains where data is managed by object-relational or object-oriented database management systems), as long as the application conceptual model is compatible with the object-oriented model defined in this work. The proposed framework (illustrated in Figure 1) integrates three modules: the DDODB heuristic module, the theory revision module (TREND3) and the DDODB branch-and-bound module.

Figure 1. Overall framework for the class fragmentation in the DDODB (components: Distribution Designer; DDODB Heuristic Module with AA, VF and HF; TREND3 Module with FORTE; DDODB Branch-and-Bound Module with a query processing cost function).

The distribution designer provides input information about the database semantics, the operations that will be executed over the stored data, and additional quantitative information such as the estimated cardinality of each class. This information is then passed to the DDODB heuristic module, which defines a set of heuristics to search for the best fragmentation schema for a given database application. The algorithms of the heuristic module (AA, the Analysis Algorithm; VF, Vertical Fragmentation; and HF, Horizontal Fragmentation) follow this set of heuristics and quickly output a good fragmentation schema for the distribution designer to implement on the database. Intermediate results of the heuristic module are presented in [7, 8, 9].
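The role of the analysis algorithm (AA) can be sketched as a function from the designer's input to one fragmentation technique per class. The decision rules and thresholds below (`min_card`, `max_attr_ratio`, the 0.5 frequency cut-offs) and the `Operation`/`ClassInfo` input model are hypothetical placeholders chosen only to show the shape of the computation; they are not the heuristics actually used by the DDODB heuristic module.

```python
from dataclasses import dataclass, field


# Hypothetical input model: a summary of what the distribution designer supplies per class.
@dataclass
class Operation:
    frequency: int            # how often the application runs this operation
    attributes: set[str]      # attributes (or methods) the operation touches
    selective: bool           # True if it selects a subset of the extension by a predicate


@dataclass
class ClassInfo:
    name: str
    cardinality: int          # estimated size of the class extension
    attributes: set[str]
    operations: list[Operation] = field(default_factory=list)


def analyze(cls: ClassInfo, min_card: int = 1000, max_attr_ratio: float = 0.5) -> str:
    """Return 'none', 'hf', 'vf' or 'hybrid' for one class (placeholder rules only)."""
    total = sum(op.frequency for op in cls.operations)
    if total == 0 or not cls.attributes or cls.cardinality < min_card:
        return "none"  # small or unused class: fragmenting it is unlikely to pay off
    # Share of accesses that select a subset of instances (favours horizontal fragmentation).
    selective = sum(op.frequency for op in cls.operations if op.selective) / total
    # Share of accesses that touch only a small part of the class structure (favours vertical).
    narrow = sum(op.frequency for op in cls.operations
                 if len(op.attributes) / len(cls.attributes) <= max_attr_ratio) / total
    wants_hf, wants_vf = selective >= 0.5, narrow >= 0.5
    if wants_hf and wants_vf:
        return "hybrid"
    if wants_hf:
        return "hf"
    return "vf" if wants_vf else "none"


def analysis_algorithm(classes: list[ClassInfo]) -> dict[str, str]:
    # One decision per class, i.e. a mixed (possibly hybrid) fragmentation plan
    # that the VF and HF algorithms would then carry out.
    return {c.name: analyze(c) for c in classes}


classes = [
    ClassInfo("Part", 50_000, {"id", "kind", "weight", "doc", "image"},
              [Operation(80, {"id", "kind"}, True),
               Operation(20, {"id", "kind", "weight", "doc", "image"}, False)]),
    ClassInfo("Config", 10, {"key", "value"}, [Operation(5, {"key", "value"}, True)]),
    ClassInfo("Report", 5_000, {"id", "text"}, [Operation(10, {"id", "text"}, True)]),
]
plan = analysis_algorithm(classes)
print(plan)  # → {'Part': 'hybrid', 'Config': 'none', 'Report': 'hf'}
```

The point of the sketch is the output type: a per-class decision rather than one technique for the whole schema, which is what lets the heuristic module produce mixed and hybrid schemas.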
The performance results reported in [7, 8, 9], obtained in an experimental study on the OO7 benchmark, demonstrated the effectiveness of the DDODB heuristic module. The set of heuristics implemented by the DDODB heuristic module may be further improved automatically by a theory revision process that uses inductive logic programming (ILP) [10]. This process is called Theory REvisioN on the Design of Distributed Databases (TREND3) and is represented in our framework by the TREND3 module [11]. The improvement process is carried out by providing two inputs to the TREND3 module: the Prolog implementation of the analysis algorithm (representing the initial theory) and a fragmentation schema with previously known performance (representing a set of examples). The analysis algorithm is then automatically modified by a theory revision system (called FORTE) to produce a revised theory. The revised theory represents an improved analysis algorithm that is able to output the fragmentation schema given as input, and this revised analysis algorithm then replaces the original one in the DDODB heuristic module.

Additionally, the input information from the distribution designer may be passed to our third module, the DDODB branch-and-bound module [12]. This module is an alternative to the heuristic module in searching for the best fragmentation schema for a given database application. The branch-and-bound procedure searches for an optimal solution in the space of potentially good fragmentation schemas for an application and outputs its result to the distribution designer. Although the search space covered by the branch-and-bound algorithm is much larger than the one covered by the heuristic algorithm, its execution cost is also much higher. To handle this, the branch-and-bound algorithm tries to bound its search by using a query processing cost function to evaluate each fragmentation schema in the hypothesis space. This cost function, defined in [13], estimates the execution cost of queries on the distributed database being evaluated. The branch-and-bound algorithm then discards all fragmentation schemas whose estimated cost is higher than the cost of the fragmentation schema output by the heuristic module.
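The pruning rule described above can be illustrated with a small branch-and-bound sketch in Python. It assumes, purely for illustration, that the estimated cost of a schema is the sum of independent per-class costs returned by a hypothetical `cost(class, technique)` function; the actual cost function of [13] estimates query execution cost on the distributed database and need not decompose this way. The search starts with the heuristic schema's cost as the incumbent bound, tightens that bound whenever it finds a strictly cheaper complete schema (standard branch-and-bound practice), and discards any partial schema whose optimistic lower bound already exceeds the incumbent.

```python
def branch_and_bound(classes, options, cost, heuristic_schema):
    """Search one technique per class, pruning with the heuristic schema's cost as initial bound.

    cost(cls, option) is a hypothetical per-class estimate; the schema cost is assumed additive.
    Returns (best_schema, best_cost, discarded), where discarded holds the pruned (partial) schemas.
    """
    best_schema = dict(heuristic_schema)
    best_cost = sum(cost(c, heuristic_schema[c]) for c in classes)
    # Cheapest option per class gives an optimistic lower bound for the unassigned classes.
    min_cost = {c: min(cost(c, o) for o in options) for c in classes}
    discarded = []

    def search(i, partial, so_far):
        nonlocal best_schema, best_cost
        bound = so_far + sum(min_cost[c] for c in classes[i:])
        if bound > best_cost:
            discarded.append(dict(partial))  # cannot beat the incumbent: prune this subtree
            return
        if i == len(classes):
            if so_far < best_cost:           # strictly cheaper complete schema: new incumbent
                best_schema, best_cost = dict(partial), so_far
            return
        cls = classes[i]
        for option in options:
            partial[cls] = option
            search(i + 1, partial, so_far + cost(cls, option))
            del partial[cls]

    search(0, {}, 0)
    return best_schema, best_cost, discarded


COSTS = {  # hypothetical estimated costs per (class, technique)
    "Part":   {"none": 90, "hf": 60, "vf": 70, "hybrid": 45},
    "Module": {"none": 30, "hf": 20, "vf": 35, "hybrid": 40},
    "Doc":    {"none": 15, "hf": 25, "vf": 10, "hybrid": 30},
}
heuristic = {"Part": "hf", "Module": "hf", "Doc": "none"}  # estimated cost 95
best, best_cost, discarded = branch_and_bound(
    list(COSTS), ["none", "hf", "vf", "hybrid"], lambda c, o: COSTS[c][o], heuristic)
print(best, best_cost)  # → {'Part': 'hybrid', 'Module': 'hf', 'Doc': 'vf'} 75
```

In this toy run the heuristic schema (cost 95) is beaten by a mixed and hybrid schema of cost 75. The returned schema could act as a positive example and the entries of `discarded` as candidate negative examples for theory revision; how such examples would actually be encoded for FORTE is not shown here.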
Finally, the final result of the branch-and-bound algorithm, as well as the fragmentation schemas discarded during the search, may generate examples (positive or negative) for the TREND3 module, thus incorporating the branch-and-bound results into the DDODB heuristic module. The complete framework is detailed in [14].

CONCLUSIONS

This work presents a framework to handle the class fragmentation problem during the design of distributed object databases. The framework works at the conceptual level and thus uses the object data model to capture the application semantics represented by the user. The proposed framework integrates three modules (heuristic, knowledge-based and branch-and-bound). The heuristic module defines a set of heuristics to drive the fragmentation of object databases and incorporates them into a methodology that includes an analysis algorithm and horizontal and vertical class fragmentation algorithms. This addresses the need, pointed out by Özsu and Valduriez [1], for a distribution design methodology that encompasses horizontal and vertical fragmentation algorithms and uses them as part of a more general strategy. Experiments using our methodology produced fragmentation schemas with better performance than other fragmentation schemas proposed in the literature. The main contribution of the heuristic module is the analysis phase, which chooses the most adequate fragmentation technique for each class of the database schema, based on heuristics derived from previously obtained experimental results. With the algorithms currently proposed in the literature, the distribution designer is led to apply a single type of fragmentation to all classes. Even when the designer decides to apply a horizontal fragmentation algorithm to one class and a vertical fragmentation algorithm to another, he is left with no assistance in making this decision.

REFERENCES

[1] M. Özsu and P. Valduriez, Principles of Distributed Database Systems, 2nd edition (1st edition 1991), New Jersey, Prentice-Hall.
[2] L. Bellatreche, K. Karlapalem and A. Simonet, "Algorithms and Support for Horizontal Class Partitioning in Object-Oriented Databases", International Journal of Distributed and Parallel Databases, Kluwer Academic Publishers, vol. 8(2), 2000.
[3] Y. Chen and S. Su, "Implementation and Evaluation of Parallel Query Processing Algorithms and Data Partitioning Heuristics in Object Oriented Databases", International Journal of Distributed and Parallel Databases, Kluwer Academic Publishers, vol. 4(2), 1996.
[4] C. Ezeife and K. Barker, "Distributed Object Based Design: Vertical Fragmentation of Classes", International Journal of Distributed and Parallel Databases, Kluwer Academic Publishers, vol. 6(4), 1998.
[5] K. Karlapalem, S. Navathe and M. Morsi, "Issues in Distribution Design of Object-Oriented Databases". In M. Özsu et al. (eds.), Distributed Object Management, Morgan Kaufmann Publishers Inc., San Francisco, USA.
[6] M. Savonnet, M. Terrasse and K. Yétongnon, "Fragtique: A Methodology for Distributing Object Oriented Databases". In Proceedings of the International Conference on Computing and Information (ICCI'98), Winnipeg, Canada, 1998.
[7] F. Baião and M. Mattoso, "A Mixed Fragmentation Algorithm for Distributed Object Oriented Databases". In Special Issue of the Journal of Computing and Information (JCI), vol. 3(1), ICCI'98, March 2000.
[8] F. Baião, M. Mattoso and G. Zaverucha, "Towards an Inductive Design of Distributed Object Oriented Databases". In Proceedings of the Third IFCIS Conference on Cooperative Information Systems (CoopIS'98), IEEE CS Press, New York, USA, Aug 1998.
[9] F. Baião, M. Mattoso and G. Zaverucha, "Horizontal Fragmentation in Object DBMS: New Issues and Performance Evaluation". In Proceedings of the 19th IEEE International Performance, Computing and Communications Conference (IPCCC 2000), IEEE CS Press, Phoenix, Feb 2000.
[10] N. Lavrač and S. Džeroski, Inductive Logic Programming: Techniques and Applications, Ellis Horwood.
[11] F. Baião, M. Mattoso, J. Shavlik and G. Zaverucha, "Applying Theory Revision in the Design of Distributed Databases". In preparation, Feb.
[12] F. Baião, M. Mattoso, J. Shavlik and G. Zaverucha, "A Branch-and-Bound Approach for the Design of Distributed Databases". In preparation, Feb.
[13] G. Ruberg, F. Baião and M. Mattoso, "A Cost Model for the Evaluation of Path Expressions in Distributed Object Databases", submitted for publication, Nov.
[14] F. Baião, "A Methodology and Algorithms for the Design of Distributed Databases using Theory Revision", D.Sc. Thesis, COPPE/UFRJ, Dec.


More information

Task Scheduling in Hadoop

Task Scheduling in Hadoop Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed

More information

Comparative Analysis of Classification Algorithms on Different Datasets using WEKA

Comparative Analysis of Classification Algorithms on Different Datasets using WEKA Volume 54 No13, September 2012 Comparative Analysis of Classification Algorithms on Different Datasets using WEKA Rohit Arora MTech CSE Deptt Hindu College of Engineering Sonepat, Haryana, India Suman

More information

LONG BEACH CITY COLLEGE MEMORANDUM

LONG BEACH CITY COLLEGE MEMORANDUM LONG BEACH CITY COLLEGE MEMORANDUM DATE: May 5, 2000 TO: Academic Senate Equivalency Committee FROM: John Hugunin Department Head for CBIS SUBJECT: Equivalency statement for Computer Science Instructor

More information

The Design of a Distributed Database for Doctoral Studies Management

The Design of a Distributed Database for Doctoral Studies Management Informatica Economică vol. 14, no. 4/2010 139 The Design of a Distributed Database for Doctoral Studies Management Enikö Elisabeta TOLEA, Aurelian Razvan COSTIN Babes Bolyai University, Cluj-Napoca, Romania

More information

not necessarily strictly sequential feedback loops exist, i.e. may need to revisit earlier stages during a later stage

not necessarily strictly sequential feedback loops exist, i.e. may need to revisit earlier stages during a later stage Database Design Process there are six stages in the design of a database: 1. requirement analysis 2. conceptual database design 3. choice of the DBMS 4. data model mapping 5. physical design 6. implementation

More information

New Approach of Computing Data Cubes in Data Warehousing

New Approach of Computing Data Cubes in Data Warehousing International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of

More information

PHP Code Design. The data structure of a relational database can be represented with a Data Model diagram, also called an Entity-Relation diagram.

PHP Code Design. The data structure of a relational database can be represented with a Data Model diagram, also called an Entity-Relation diagram. PHP Code Design PHP is a server-side, open-source, HTML-embedded scripting language used to drive many of the world s most popular web sites. All major web servers support PHP enabling normal HMTL pages

More information

Abstract. Keywords: Data Warehouse, Views, Fragmentation, Performance benefit

Abstract. Keywords: Data Warehouse, Views, Fragmentation, Performance benefit Optimizing Partition-Selection Scheme for Warehouse Aggregate Views * C.I. Ezeife School of Computer Science University of Windsor Windsor, Ontario Canada N9B 3P4 cezeife@cs.uwindsor.ca Tel: (519) 253-3000

More information

Database Replication with Oracle 11g and MS SQL Server 2008

Database Replication with Oracle 11g and MS SQL Server 2008 Database Replication with Oracle 11g and MS SQL Server 2008 Flavio Bolfing Software and Systems University of Applied Sciences Chur, Switzerland www.hsr.ch/mse Abstract Database replication is used widely

More information

Towards Full-fledged XML Fragmentation for Transactional Distributed Databases

Towards Full-fledged XML Fragmentation for Transactional Distributed Databases Towards Full-fledged XML Fragmentation for Transactional Distributed Databases Rebeca Schroeder 1, Carmem S. Hara (supervisor) 1 1 Programa de Pós Graduação em Informática Universidade Federal do Paraná

More information

A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture

A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture A Virtual Machine Searching Method in Networks using a Vector Space Model and Routing Table Tree Architecture Hyeon seok O, Namgi Kim1, Byoung-Dai Lee dept. of Computer Science. Kyonggi University, Suwon,

More information

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar Object Oriented Databases OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar Executive Summary The presentation on Object Oriented Databases gives a basic introduction to the concepts governing OODBs

More information

Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

More information

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology Chapter 10 Practical Database Design Methodology Practical Database Design Methodology Design methodology Target database managed by some type of database management system Various design methodologies

More information

Secure Data Transfer and Replication Mechanisms in Grid Environments p. 1

Secure Data Transfer and Replication Mechanisms in Grid Environments p. 1 Secure Data Transfer and Replication Mechanisms in Grid Environments Konrad Karczewski, Lukasz Kuczynski and Roman Wyrzykowski Institute of Computer and Information Sciences, Czestochowa University of

More information

Data Engineering for the Analysis of Semiconductor Manufacturing Data

Data Engineering for the Analysis of Semiconductor Manufacturing Data Data Engineering for the Analysis of Semiconductor Manufacturing Data Peter Turney Knowledge Systems Laboratory Institute for Information Technology National Research Council Canada Ottawa, Ontario, Canada

More information

Lightweight Service-Based Software Architecture

Lightweight Service-Based Software Architecture Lightweight Service-Based Software Architecture Mikko Polojärvi and Jukka Riekki Intelligent Systems Group and Infotech Oulu University of Oulu, Oulu, Finland {mikko.polojarvi,jukka.riekki}@ee.oulu.fi

More information

Software Requirements Metrics

Software Requirements Metrics Software Requirements Metrics Fairly primitive and predictive power limited. Function Points Count number of inputs and output, user interactions, external interfaces, files used. Assess each for complexity

More information

An Analysis of Four Missing Data Treatment Methods for Supervised Learning

An Analysis of Four Missing Data Treatment Methods for Supervised Learning An Analysis of Four Missing Data Treatment Methods for Supervised Learning Gustavo E. A. P. A. Batista and Maria Carolina Monard University of São Paulo - USP Institute of Mathematics and Computer Science

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

Software Life-Cycle Management

Software Life-Cycle Management Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema

More information

Background knowledge-enrichment for bottom clauses improving.

Background knowledge-enrichment for bottom clauses improving. Background knowledge-enrichment for bottom clauses improving. Orlando Muñoz Texzocotetla and René MacKinney-Romero Departamento de Ingeniería Eléctrica Universidad Autónoma Metropolitana México D.F. 09340,

More information

VII. Database System Architecture

VII. Database System Architecture VII. Database System Lecture Topics Monolithic systems Client/Server systems Parallel database servers Multidatabase systems CS338 1 Monolithic System DBMS File System Each component presents a well-defined

More information

KEEP THIS COPY FOR REPRODUCTION PURPOSES. I ~~~~~Final Report

KEEP THIS COPY FOR REPRODUCTION PURPOSES. I ~~~~~Final Report MASTER COPY KEEP THIS COPY FOR REPRODUCTION PURPOSES 1 Form Approved REPORT DOCUMENTATION PAGE I OMS No. 0704-0188 Public reoorting burden for this collection of information is estimated to average I hour

More information

Fig. 3. PostgreSQL subsystems

Fig. 3. PostgreSQL subsystems Development of a Parallel DBMS on the Basis of PostgreSQL C. S. Pan kvapen@gmail.com South Ural State University Abstract. The paper describes the architecture and the design of PargreSQL parallel database

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms

Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,

More information

Distributed Databases

Distributed Databases Distributed Databases Chapter 1: Introduction Johann Gamper Syllabus Data Independence and Distributed Data Processing Definition of Distributed databases Promises of Distributed Databases Technical Problems

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

SEARCHING AND KNOWLEDGE REPRESENTATION. Angel Garrido

SEARCHING AND KNOWLEDGE REPRESENTATION. Angel Garrido Acta Universitatis Apulensis ISSN: 1582-5329 No. 30/2012 pp. 147-152 SEARCHING AND KNOWLEDGE REPRESENTATION Angel Garrido ABSTRACT. The procedures of searching of solutions of problems, in Artificial Intelligence

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Microsoft TMG Replacement with NetScaler

Microsoft TMG Replacement with NetScaler Microsoft TMG Replacement with NetScaler Replacing Microsoft Forefront TMG with NetScaler for Optimization This deployment guide focuses on replacing Microsoft Forefront Threat Management Gateway (TMG)

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

A Fast Partial Memory Approach to Incremental Learning through an Advanced Data Storage Framework

A Fast Partial Memory Approach to Incremental Learning through an Advanced Data Storage Framework A Fast Partial Memory Approach to Incremental Learning through an Advanced Data Storage Framework Marenglen Biba, Stefano Ferilli, Floriana Esposito, Nicola Di Mauro, Teresa M.A Basile Department of Computer

More information

A Flexible Machine Learning Environment for Steady State Security Assessment of Power Systems

A Flexible Machine Learning Environment for Steady State Security Assessment of Power Systems A Flexible Machine Learning Environment for Steady State Security Assessment of Power Systems D. D. Semitekos, N. M. Avouris, G. B. Giannakopoulos University of Patras, ECE Department, GR-265 00 Rio Patras,

More information

PHP FRAMEWORK FOR DATABASE MANAGEMENT BASED ON MVC PATTERN

PHP FRAMEWORK FOR DATABASE MANAGEMENT BASED ON MVC PATTERN PHP FRAMEWORK FOR DATABASE MANAGEMENT BASED ON MVC PATTERN Chanchai Supaartagorn Department of Mathematics Statistics and Computer, Faculty of Science, Ubon Ratchathani University, Thailand scchansu@ubu.ac.th

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Distributed Database Management Systems

Distributed Database Management Systems Page 1 Distributed Database Management Systems Outline Introduction Distributed DBMS Architecture Distributed Database Design Distributed Query Processing Distributed Concurrency Control Distributed Reliability

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Physical Database Design and Tuning

Physical Database Design and Tuning Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

Bachelor Degree in Informatics Engineering Master courses

Bachelor Degree in Informatics Engineering Master courses Bachelor Degree in Informatics Engineering Master courses Donostia School of Informatics The University of the Basque Country, UPV/EHU For more information: Universidad del País Vasco / Euskal Herriko

More information

Selective Naive Bayes Regressor with Variable Construction for Predictive Web Analytics

Selective Naive Bayes Regressor with Variable Construction for Predictive Web Analytics Selective Naive Bayes Regressor with Variable Construction for Predictive Web Analytics Boullé Orange Labs avenue Pierre Marzin 3 Lannion, France marc.boulle@orange.com ABSTRACT We describe our submission

More information

Adding Semantics to Business Intelligence

Adding Semantics to Business Intelligence Adding Semantics to Business Intelligence Denilson Sell 1,2, Liliana Cabral 2, Enrico Motta 2, John Domingue 2 and Roberto Pacheco 1,3 1 Stela Group, Universidade Federal de Santa Catarina, Brazil 2 Knowledge

More information

USING SCHEMA AND DATA INTEGRATION TECHNIQUE TO INTEGRATE SPATIAL AND NON-SPATIAL DATA : DEVELOPING POPULATED PLACES DB OF TURKEY (PPDB_T)

USING SCHEMA AND DATA INTEGRATION TECHNIQUE TO INTEGRATE SPATIAL AND NON-SPATIAL DATA : DEVELOPING POPULATED PLACES DB OF TURKEY (PPDB_T) ISPRS SIPT IGU UCI CIG ACSG Table of contents Table des matières Authors index Index des auteurs Search Recherches Exit Sortir USING SCHEMA AND DATA INTEGRATION TECHNIQUE TO INTEGRATE SPATIAL AND NON-SPATIAL

More information

The Role of Controlled Experiments in Software Engineering Research

The Role of Controlled Experiments in Software Engineering Research The Role of Controlled Experiments in Software Engineering Research Victor R. Basili 1 The Experimental Discipline in Software Engineering Empirical studies play an important role in the evolution of the

More information

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm , pp. 99-108 http://dx.doi.org/10.1457/ijfgcn.015.8.1.11 Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm Wang DaWei and Wang Changliang Zhejiang Industry Polytechnic College

More information

Object Oriented Database Management System for Decision Support System.

Object Oriented Database Management System for Decision Support System. International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 3, Issue 6 (June 2014), PP.55-59 Object Oriented Database Management System for Decision

More information