A Distribution Management System for Relational Databases in Cloud Environments


JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 11, NO. 2, JUNE 2013

Sze-Yao Li, Chun-Ming Chang, Yuan-Yu Tsai, Seth Chen, Jonathan Tsai, and Wen-Lung Tsai

Abstract: For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud environment rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of the DBDM is to keep together the data used in a single transaction, thus avoiding massive transmission of records in join operations. The proposed system has been implemented successfully. Preliminary experiments show that the DBDM performs the database partition and migration effectively. Moreover, the DBDM system is modularly designed to adapt to different database management systems (DBMSs) or different partition algorithms.

Index Terms: Data migration, database partition, distributed database, relational database.

Manuscript received October 31, 2012; revised January 1. This work was supported by the Taiwan Ministry of Economic Affairs and the Institute for Information Industry under the project titled Fundamental Industrial Technology Development Program (1/4). S.-Y. Li is with the Department of Applied Informatics and Multimedia, Asia University, Taichung (corresponding author, lsy@asia.edu.tw). C.-M. Chang and Y.-Y. Tsai are with the Department of Applied Informatics and Multimedia, Asia University, Taichung (cmchang@asia.edu.tw; yytsai@asia.edu.tw). S. Chen, J. Tsai, and W.-L. Tsai are with the Institute for Information Industry, Taipei (seth@iii.org.tw; jonathan@iii.org.tw; davetsai@iii.org.tw).

1. Introduction

People rely more and more on information systems in their daily life with the development and integration of information and communication technologies. The transactions incurred and the data generated have thus increased dramatically. Cloud computing is one of the technologies developed to share resources among tenants, improve system performance, and save operational cost. In addition to newly developed applications, legacy systems are also candidates for transplantation to cloud environments. On a cloud platform, programs can easily be replicated to multiple nodes to distribute the processing load; this is achieved through the virtual machine mechanism. To keep the database from becoming the next processing hot spot, it should also be distributed properly.

In order to handle huge amounts of data-processing requests, several key-value store systems, such as Google's Bigtable [1], Yahoo's PNUTS [2], Amazon's Dynamo [3], and other similar systems [4], have been developed. Data in these systems are replicated or partitioned across different servers so that the data store can be scaled in and out easily as needed. However, this kind of system tends to achieve availability at the sacrifice of data consistency, a trade-off known as the CAP (consistency, availability, and partition tolerance) theorem [5] or PACELC [6].
Although key-value store systems perform well for large-scale queries and simple updates, they can hardly provide the ACID (atomic, consistent, isolated, and durable) guarantees required by important database transactions [7]. For transaction processing systems built on relational databases to run in the cloud environment, an effective tool is needed to help manage the partitioning and/or replication of data while still providing ACID guarantees. With the aid of such a tool, less effort is required to migrate legacy systems to cloud platforms, and relational database professionals can be redeployed with less retraining.

When distributing data to different nodes, the decision of how to spread the data across those nodes significantly affects the performance and complexity of subsequent query and update tasks. Partitioning data across more nodes distributes the workload, at the cost of increased complexity in data allocation, result integration, and transaction coordination. On the other hand, although replicating data to different nodes lets the system absorb a large volume of similar queries, the synchronization required among data nodes on every modification becomes a consistency problem. Since the transaction behavior of applications varies, it is important to partition or replicate data according to the characteristics of the particular application system so that better performance can be achieved.

This paper proposes a transaction-based database partition system for relational databases, named the database distribution manager (DBDM).

The proposed DBDM has two basic modules, which provide the database partition and data migration functionalities, respectively. The database partition module produces a strategy stating whether each table in the database should be partitioned or replicated, and to which nodes it should be transferred. The algorithm in this module derives its decisions from the behavior of the application system, including the content and frequency of each transaction, the tables used in each structured query language (SQL) statement, and the relations among the tables in the database. In principle, the decisions made by this module keep the tables used in a single SQL statement together so that join operations can be completed within one node, avoiding massive data movement among nodes. The data migration module, in turn, actually moves data according to the difference between the current data distribution and the proposed one. The migration can be scheduled progressively by moving a limited amount of data at a time, which reduces the interference with the running system. The amount of data to be moved in one step and the time interval between two consecutive steps are parameters of the migration module.

The proposed system was implemented to verify its feasibility. Preliminary experiments show that the database is effectively distributed and that tables are grouped according to the transactions of the application system. Further experiments will be conducted to verify the effectiveness on load distribution and performance enhancement.

In the following, related works are discussed in the next section. The proposed system is introduced in Section 3. Section 4 illustrates the implementation and discusses some design issues. Section 5 concludes this paper.
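To make the output of the partition module concrete, the following sketch shows one possible representation of such a decision, in which every table is either replicated to a set of nodes or split into consecutive key ranges. The class and field names are illustrative assumptions for this description, not the actual DBDM data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of a partition decision: each table is either
# replicated to a set of nodes or range-partitioned into consecutive key
# segments, one (low_key, high_key, node) triple per segment.
@dataclass
class TableDecision:
    table: str
    replicate_to: Optional[List[str]] = None
    ranges: List[Tuple[int, int, str]] = field(default_factory=list)

@dataclass
class PartitionPlan:
    decisions: Dict[str, TableDecision] = field(default_factory=dict)

# Example: replicate a small lookup table, range-partition a large one.
plan = PartitionPlan({
    "department": TableDecision("department", replicate_to=["node1", "node2"]),
    "sales_order": TableDecision("sales_order",
                                 ranges=[(0, 49999, "node1"),
                                         (50000, 99999, "node2")]),
})
```

The migration module would then compare a plan of this kind against the current placement to derive the movements it has to schedule.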
2. Related Works

Among recent research on database partitioning, Rao et al. devised a tool named the DB2 design advisor to automatically select a good data partitioning solution [8], [9]. The tool takes a workload as input and suggests a distribution method along four facets, namely indexing, materialized views, database partitioning, and multi-table clustering. To take the dependencies among these facets into account, the authors designed a hybrid method that reduces execution time without degrading solution quality. These methods exploit parallelism and are well suited to applications that analyze massive data, such as data warehousing and mining. However, parallelizing transactions is hard because of the locking and synchronization imposed by the ACID requirements. Agrawal et al. proposed a technique for automatically partitioning a database both horizontally and vertically [10]. The technique was implemented and tested on Microsoft SQL Server. Although system performance and manageability were considered, their method focuses on a single node only, and extending the algorithm to partitioning across multiple nodes would require further effort. NewSQL systems have been proposed by several researchers for on-line transaction processing (OLTP) [11]. Their purpose is to distribute the workload while still guaranteeing the ACID requirements, with the effort principally focused on finding an optimal execution plan for data query and manipulation. As mentioned in the previous section, the efficiency of statement execution depends on how the data are distributed; a good partition strategy is therefore the basis of efficient query execution.

Some studies have addressed the data lookup problem in query processing [12], [13]. Improvements were made to the synchronization protocol so that distributed transactions can be conducted more efficiently. However, these improvements also rely on a proper distribution of data to obtain maximal performance. Other work has addressed server allocation after the database is partitioned. Apers used a model to evaluate cost functions under different server configurations, so that the distribution of data among servers can be decided with this model [14]. Genetic algorithms have also been applied to the server allocation problem [15]; the authors found that their algorithms outperformed greedy heuristics for several different partition sizes. Menon formulated server allocation as an integer programming problem with storage and processing capacities as constraints [16]. In addition to the factors considered in these studies, the current distribution of the database should also be taken into account, since the difference between the current status and the proposed distribution determines the amount of data movement, which affects system performance during redistribution.

In summary, many factors should be considered in order to partition a database properly, including the cost of ensuring the ACID requirements, the data migration needed to reach the new partition, the storage and processing capacities of the servers, server allocation, and the data lookup and transmission needed to complete a query. In the database partition system proposed in this paper, transaction behavior and database relations are used to judge the data distribution strategy, so that the majority of these factors are taken into consideration. The architecture of the proposed system is introduced in the next section.

3. System Architecture

The architecture of the proposed DBDM is shown in Fig. 1 along with its operating environment. In this environment, the DBDM operates outside the distributed database management system (DBMS). It retrieves the database schema and the statistics of the application system, evaluates partition strategies, and migrates data among the data nodes according to the partition decision. The distributed database (DB) administrator can monitor the current partition status from the interface and can direct the behavior of the partition and migration processes by adjusting threshold values and/or parameters. Some of the system information, such as the available servers and their capacities, is also set by the administrator.

Fig. 1. System architecture of the database distribution manager.

Since this research focuses on database partitioning and migration, the PostgreSQL DBMS and pgpool, both developed by the PostgreSQL Global Development Group, are used as the distributed DBMS to simplify the setup of the experimental environment. The proposed DBDM can be used with other distributed DBMSs as well by modifying the interfaces between the DBDM and the target distributed DBMS.

The DBDM comprises five modules. The Algorithm module evaluates the current database status and proposes the partition strategy. The Migration Manager module then actually moves records in the database according to the proposed partition strategy. Two types of information are needed for these two modules to complete their missions. The static information of the database, such as the table schemas and the relationships among tables, is provided by the Schema Extractor module. The Statistics Collector module in turn gathers the runtime information, for example, the number of records and the distribution of key values. Finally, the Management Interface module is in charge of the communication with the administrator. The two main functionalities of the DBDM, the Algorithm and the Migration Manager, are introduced in more detail next.

3.1 Algorithm

The objective of the Algorithm module is to maximize system performance under the constraint of guaranteeing the ACID requirements. System performance is in turn affected by the transaction processing time and the workload. Thus, the procedure and characteristics of handling a transaction are analyzed first so that the algorithm can be tailored to the requirements of transaction processing. The steps to process a distributed transaction are 1) parsing, 2) execution planning and optimizing, 3) distributed command dispatching, 4) result collecting and summarizing, and 5) distributed transaction committing. The analysis focuses on the steps related to the distributed process, namely data allocation, data transmission, and distributed transaction commitment.

Consider the data allocation process first. Two methods are generally used to partition a table and locate its records. The first stores and retrieves data according to a hash function of the key. Its advantage is constant-time data location, but it is hard to find a suitable hash function when the partitions must have irregular sizes. The second divides the table into consecutive pieces according to the key. Although the data locating time grows with the number of partitions, this method can produce partitions of arbitrary sizes. Since the distribution of key values in a table depends on the characteristics of the application and is not predictable, tables are divided into consecutive segments in this research.
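As a small illustration of the two allocation methods, the sketch below locates the node owning a key under consecutive-segment (range) partitioning with a binary search, and contrasts it with hash placement. The segment boundaries and node names are invented for the example and are not values used by the DBDM.

```python
import bisect

# Range partitioning: each node owns one consecutive key segment.
# The list holds the inclusive upper key of each segment, in sorted order.
SEGMENT_UPPER_KEYS = [9999, 49999, 99999]       # assumed example boundaries
SEGMENT_NODES = ["node1", "node2", "node3"]     # node owning each segment

def locate_by_range(key: int) -> str:
    """O(log n) in the number of segments, but segments may have any size."""
    idx = bisect.bisect_left(SEGMENT_UPPER_KEYS, key)
    if idx >= len(SEGMENT_NODES):
        raise KeyError(f"key {key} lies outside every segment")
    return SEGMENT_NODES[idx]

def locate_by_hash(key: int) -> str:
    """O(1) lookup, but the relative sizes of partitions cannot be tuned."""
    return SEGMENT_NODES[hash(key) % len(SEGMENT_NODES)]

print(locate_by_range(12345))   # -> node2
print(locate_by_hash(12345))    # -> whichever node the hash selects
```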
Next, consider the data transmission among data nodes. The most time-consuming part of a query is the join operation, which involves cross-product operations. Although the final result may be small, the amount of intermediate data is large because of the cross products. Massive data transmission is inevitable if the tables involved in a join operation reside on different nodes; conversely, only the query results need to be transmitted if those tables are stored on the same node. Therefore, the Algorithm module should strive to put the tables referenced by the same query statement together. Finally, distributed transaction commitment needs synchronization among the data nodes that participate in the transaction to guarantee the ACID requirements. Because of this and the data transmission problem, keeping the tables of the same transaction together effectively reduces the communication among nodes.
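The co-location principle can be checked mechanically: given the set of tables referenced by each SQL statement and a table-to-node assignment, one can count how many statements would force a cross-node join. The workload and the assignment below are invented purely for illustration.

```python
# Count the statements whose tables span more than one node; each such
# statement implies cross-node transmission of intermediate join data.
workload = [
    {"orders", "order_items"},      # join kept on a single node below
    {"orders", "customers"},        # join that spans two nodes below
    {"products"},                   # single-table statement, always local
]
assignment = {"orders": "node1", "order_items": "node1",
              "customers": "node2", "products": "node2"}

def cross_node_statements(statements, table_to_node):
    return sum(1 for tables in statements
               if len({table_to_node[t] for t in tables}) > 1)

print(cross_node_statements(workload, assignment))   # -> 1
```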

Two strategies of database partitioning can be concluded from the above discussion. First, the tables involved in the same query statement should be kept together so that massive data transmission among nodes can be avoided. Second, a table is divided into consecutive segments if partitioning is inevitable. However, grouping tables together centralizes data instead of distributing it, so a group itself should be divided if it becomes a transaction hot spot. To allow the partitioning of a group while keeping related tables together, replication and horizontal partitioning are used. In a group to be partitioned, the tables containing elementary information are replicated, since they act as lookup tables and normally contain little data. Moreover, tables with master-detail relationships are partitioned in pairs; for example, the table storing the items of sales orders is partitioned according to how the table storing the headers of sales orders is partitioned. In this way, the data in a group are distributed over multiple nodes while the related records needed in join operations remain intact.

After analyzing the behavior of transaction processing and its performance concerns, the Algorithm was designed with the modular structure depicted in Fig. 2. It consists of three main modules and three supporting modules. The main modules, namely Grouping Tables, Assign Servers to Groups, and Partition Horizontally, perform the steps of database partitioning, whereas the supporting modules provide the information needed from outside and/or from previous results. Among the three main modules, the Grouping Tables module first inspects every SQL statement and puts tables together if they are involved in a join operation (a sketch of this step is given below). Next, the Assign Servers to Groups module allocates servers to each group according to the transaction frequencies of the groups and the capabilities of the servers, such that each server gets a portion of the transaction load proportional to its capability. Finally, a group is partitioned horizontally into consecutive pieces by the last module if it is dispatched to more than one node.

3.2 Migration Manager

After the database partition is determined, the Migration Manager module actually moves the data to the proposed locations. This module works in three phases, as shown in Fig. 3. The Analyze Differences phase takes the current distribution of data and the partition decisions generated by the Algorithm module as its inputs, and compares each table in these inputs to determine the ranges of records that should be moved from one node to another. The Calculate Amounts to Move phase then evaluates, for every pair of nodes, the total amount of data to be moved from one node to the other according to the analyzed differences; the migration is divided into several smaller movements if system performance is a concern. As the last phase, the Move Data & Set Parameters module composes physical instructions from the movement information passed by the previous phase. The migration instructions are sent to each node involved in the migration process, and the distributed DBMS is also informed of the changes in data distribution.
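Below is a minimal sketch of the Grouping Tables step described in Section 3.1: tables that co-occur in the same SQL statement are merged into one group. The union-find structure and the sample workload are assumptions made for the sketch; the paper does not prescribe a particular implementation.

```python
# Group tables that appear together in SQL statements (sketch of "Grouping
# Tables"); union-find is only an assumed implementation choice.
def group_tables(statements):
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]     # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    for tables in statements:
        tables = list(tables)
        find(tables[0])                       # register single-table statements too
        for other in tables[1:]:
            union(tables[0], other)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

# Hypothetical workload: the set of tables referenced by each statement.
workload = [{"orders", "order_items"}, {"orders", "customers"}, {"warehouse"}]
print(group_tables(workload))
# two groups: {'orders', 'order_items', 'customers'} and {'warehouse'}
```

Each resulting group would then be handed to the server assignment and horizontal partitioning steps.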
Fig. 2. Modular structure of the Algorithm.

Fig. 3. Modular structure of the Migration Manager.
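As a rough sketch of the three Migration Manager phases in Fig. 3, the code below diffs a current placement against a target placement, totals the amount to be moved for every source/destination node pair, and issues the movements in steps bounded by the AmountInOneMove parameter, pausing IntervalBetweenTwoMoves seconds between steps. The flat placement representation and the row-count unit are assumptions made for the sketch.

```python
import time
from collections import defaultdict

# A placement maps (table, segment_id) -> (node, rows); this flat form is an
# assumption for the sketch, not the DBDM's internal representation.
def analyze_differences(current, target):
    moves = []                                   # (table, segment, src, dst, rows)
    for key, (dst, rows) in target.items():
        src = current.get(key, (None, 0))[0]
        if src is not None and src != dst:
            moves.append((key[0], key[1], src, dst, rows))
    return moves

def amounts_per_node_pair(moves):
    totals = defaultdict(int)                    # (src, dst) -> rows to move
    for _table, _seg, src, dst, rows in moves:
        totals[(src, dst)] += rows
    return dict(totals)

def migrate(moves, amount_in_one_move, interval_between_two_moves, issue):
    budget = amount_in_one_move                  # rows allowed in the current step
    for table, seg, src, dst, rows in moves:
        while rows > 0:
            if budget == 0:                      # step finished: wait, then refill
                time.sleep(interval_between_two_moves)
                budget = amount_in_one_move
            chunk = min(rows, budget)
            issue(table, seg, src, dst, chunk)   # e.g. copy + delete via SQL
            rows -= chunk
            budget -= chunk

current = {("sales_order", 0): ("node1", 80000), ("sales_order", 1): ("node1", 20000)}
target  = {("sales_order", 0): ("node1", 80000), ("sales_order", 1): ("node2", 20000)}
moves = analyze_differences(current, target)
print(amounts_per_node_pair(moves))              # {('node1', 'node2'): 20000}
migrate(moves, amount_in_one_move=5000, interval_between_two_moves=0,
        issue=lambda *step: print("move", step))
```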

It is noteworthy that system performance is affected by the selection, deletion, transmission, and insertion operations performed during the migration, and the effect varies with the operating environment. The same migration request that can be done in one shot in one environment may need to be executed progressively in another. Moreover, response time or throughput requirements that are tight for some applications may have wider tolerances in other systems. To deal with this variety, the system leaves the decision to the DB administrator through two parameters, AmountInOneMove and IntervalBetweenTwoMoves. The former specifies the upper limit on the amount of data in one move; the latter specifies how long the system should wait before the next movement proceeds. The administrator can set AmountInOneMove to a small value and IntervalBetweenTwoMoves to a large value so that the migration is hidden from the users. The drawback of this setting is that it takes longer to reach the desired partition; it suits frequent redistribution, where relatively little data has changed and therefore little data needs to migrate. In the other situation, where the application system runs only intermittently, the administrator can set AmountInOneMove to a huge value and run the partition and migration process during periodic maintenance so that the migration is completed in one piece.

4. Implementations

This section presents the results of physically implementing the proposed system. Some design issues are also discussed as a guideline for further research.

4.1 Implementation Results

The proposed system was implemented on Linux using PostgreSQL and pgpool as the distributed DBMS. Part of the transactions and data of a hospital's inventory system were used as test data. Preliminary tests showed that the system partitioned the database and performed the migration as expected. More data and queries are currently being prepared to further test robustness and performance.

The original configuration of the system contains three nodes with the same computation power; a fourth node with 80% of the power of the previous nodes is added in the new configuration. A snapshot of the DBDM exhibiting the state of the database is shown in Fig. 4. The current status column describes the current distribution of the database, which may be either the previously calculated result or the status after several migration steps, depending on the migration strategy. The objective distribution column shows the newly suggested distribution obtained from the Algorithm. The percentages in both columns represent portions of the transaction load. The effect of adding a new node can be seen by comparing the current status and objective distribution columns: the distribution of the table ITEM_EXCH changes under the new node configuration. The percentage of ITEM_EXCH allocated to server are01 differs from the others because other groups of tables are co-located on that node, while the percentages of ITEM_EXCH on servers are02 to are04 in the objective distribution show that they are indeed allocated according to their respective capabilities.

Fig. 4. Partial results of the implemented DBDM system.
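As a back-of-the-envelope check of the proportional allocation described above, the snippet below splits one group's transaction load over the experimental configuration of three equally capable nodes plus a fourth node at 80% of their power. The exact percentages shown in Fig. 4 are not reproduced here; the numbers are only the arithmetic of proportionality under these assumed capabilities.

```python
# Split a group's transaction load in proportion to server capability,
# assuming three equal nodes and a fourth at 80% of their power.
capabilities = {"are01": 1.0, "are02": 1.0, "are03": 1.0, "are04": 0.8}

def proportional_shares(caps):
    total = sum(caps.values())
    return {node: cap / total for node, cap in caps.items()}

for node, share in proportional_shares(capabilities).items():
    print(f"{node}: {share:.1%}")
# are01, are02, and are03 each get about 26.3% of the load and are04 about
# 21.1%; in the real system are01 deviates further because other table groups
# share that node.
```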
4.2 Discussions

One of the strategies of the proposed Algorithm is to group together the tables related by join operations. However, almost all tables in a relational data model are related, which means that in most cases the grouping process would produce a single group containing all tables. A closer examination of the relational data model shows that the most frequently joined tables are the ones storing basic reference information.

Typical examples are the employee, department, product, and customer tables. Such tables usually serve as reference lists and contain relatively few records. Removing these tables from the relational data model lets the remaining tables separate into smaller groups that are genuinely related in the application. Based on this observation, the system replicates the most frequently joined tables to the related groups while grouping the remaining tables. In the current system, the frequency with which a table is involved in joins is used to judge whether it should be replicated, and a parameter is introduced to indicate and adjust this frequency threshold. Identifying such tables automatically is a possible direction for future work.

A second design consideration is how to map the transaction distribution onto the data distribution. To simplify the implementation, the proposed system assumes that each transaction accesses the records of a table evenly. This is obviously not the case in practice; for example, trains in peak hours are queried more often than others, and undelivered sales orders are accessed more frequently than historical ones. To cope with this, various access patterns should be observed and recorded so that the Algorithm can distribute the data in accordance with the actual transaction load.

Finally, the modular design is important to the flexibility of the proposed system. In the global view of the system architecture (see Fig. 1), the Schema Extractor and Statistics Collector modules are deliberately separated from the partition and migration modules so that they can be replaced easily to support other DBMSs as needed. The command generation process in the Migration Manager module (see Fig. 3) is isolated for the same reason. The transaction frequencies used in the Algorithm module (see Fig. 2) are likewise isolated from the partition calculation steps, and their unit is transformed to percentages. Through this arrangement, the transaction frequencies used for partitioning can be replaced by other measures without affecting the algorithm. Conversely, the proposed architecture can be used as a platform to test different partition strategies by simply replacing the algorithm body.

5. Conclusions

In cloud environments, a transaction processing application needs to maintain the ACID guarantees while handling huge amounts of data efficiently. Providing a partition strategy for the distributed database is important for fulfilling these requirements. This paper proposes a database distribution management system, the DBDM, which distributes the data according to the transaction behaviors of the application system. It splits tables into groups so that join operations can be conducted within a group. Moreover, large groups are horizontally partitioned in pairs by observing the master-detail relationships between tables. These strategies avoid massive transmission of records in join operations among data nodes.

The proposed DBDM system was implemented to verify its feasibility. Preliminary experiments showed that the DBDM performed the database partition and migration so that the transactions were distributed over the nodes. Extensive experiments will be conducted to evaluate the effectiveness of the performance improvements. The DBDM system is modularly designed so that it can be easily adapted to different DBMSs, and the modules implementing the Algorithm can be replaced to evaluate the performance of different algorithms.
References

[1] F. Chang, J. Dean, S. Ghemawat, et al., "Bigtable: a distributed storage system for structured data," in Proc. of the 7th Symposium on Operating Systems Design and Implementation, Seattle, 2006.
[2] B. F. Cooper, R. Ramakrishnan, U. Srivastava, et al., "PNUTS: Yahoo!'s hosted data serving platform," in Proc. of the 34th Int. Conf. on Very Large Data Bases, Auckland, 2008.
[3] G. DeCandia, D. Hastorun, M. Jampani, et al., "Dynamo: Amazon's highly available key-value store," in Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, 2007.
[4] D. Agrawal, A. E. Abbadi, S. Antony, and S. Das, "Data management challenges in cloud computing infrastructures," Lecture Notes in Computer Science, vol. 5999, pp. 1-10.
[5] S. Gilbert and N. Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, vol. 33, no. 2.
[6] D. J. Abadi, "Consistency tradeoffs in modern distributed database system design: CAP is only part of the story," Computer, vol. 45, no. 2.
[7] M. Stonebraker, N. Hachem, and P. Helland, "The end of an architectural era: (it's time for a complete rewrite)," in Proc. of the 33rd Int. Conf. on Very Large Data Bases, Vienna, 2007.
[8] J. Rao, C. Zhang, G. Lohman, et al., "Automating physical database design in a parallel database," in Proc. of the Int. Conf. on Management of Data and Symposium on Principles of Database Systems, Madison, 2002.
[9] D. C. Zilio, J. Rao, S. Lightstone, et al., "DB2 design advisor: integrated automatic physical database design," in Proc. of the 30th Int. Conf. on Very Large Data Bases, Toronto, 2004.
[10] S. Agrawal, V. Narasayya, and B. Yang, "Integrating vertical and horizontal partitioning into automated physical database design," in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, Paris, 2004.
[11] A. Pavlo, C. Curino, and S. Zdonik, "Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems," in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, Scottsdale, 2012.
[12] A. Thomson, T. Diamond, S.-C. Weng, et al., "Fast distributed transactions for partitioned database systems," in Proc. of the ACM SIGMOD Int. Conf. on Management of Data, Scottsdale, 2012.
[13] A. L. Tatarowicz, C. Curino, E. P. C. Jones, et al., "Lookup tables: fine-grained partitioning for distributed databases," in Proc. of the 28th IEEE Int. Conf. on Data Engineering, Washington, 2012.
[14] P. M. G. Apers, "Data allocation in distributed database systems," ACM Trans. on Database Systems, vol. 13, no. 3.
[15] A. L. Corcoran and J. Hale, "A genetic algorithm for fragment allocation in a distributed database system," in Proc. of the 1994 ACM Symposium on Applied Computing, Phoenix, 1994.
[16] S. Menon, "Allocating fragments in distributed databases," IEEE Trans. on Parallel and Distributed Systems, vol. 16, no. 7.

Sze-Yao Li was born in Keelung City. He received the Ph.D. degree in computer science from National Tsing Hua University. He worked with the Computer and Communications Laboratory, Industrial Technology Research Institute, beginning in 1985, and served as a software manager with a workflow/ERP company beginning in 2000. He is currently an assistant professor with the Department of Applied Informatics and Multimedia, Asia University. His research interests include databases and software testing.

Chun-Ming Chang received the B.S. degree from National Cheng Kung University in 1985 and the M.S. degree from National Tsing Hua University in 1987, both in electrical engineering, and the Ph.D. degree in electrical and computer engineering from the University of Florida. From 1998 to 2002, Dr. Chang served as a senior technical staff member and a senior software engineer with two communication companies, respectively. He joined the faculty of Asia University. His research interests include computer vision/image processing, video compression, virtual reality, computer networks, and robotics.

Yuan-Yu Tsai was born in Taichung. He received the B.S. degree from the Department of Computer Science and Information Engineering, National Central University in 2000, and the Ph.D. degree from the Institute of Computer Science, National Chung Hsing University. He is currently an assistant professor with the Department of Applied Informatics and Multimedia, Asia University. His research interests include computer graphics and information hiding algorithms for three-dimensional models and images.

Seth Chen was born in Tainan. He received the B.S. degree from the Department of Mechanical Engineering, National Cheng Kung University. He is currently the Director of the Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry. His research interests include cloud computing, distributed databases, and innovative service applications.

Jonathan Tsai was born in Chiayi. He received the B.S. degree from the Department of Computer Science and Information Management, Soochow University. He is currently a section manager with the Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry. His research interests include cloud computing, distributed databases, and software engineering.

Wen-Lung Tsai was born in Taipei. He received the B.S. degree from Ming Chuan University in 1998 and the M.S. degree from the Chinese Culture University. He is currently an engineer with the Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry.
He is also pursuing the Ph.D. degree with the Department of Information Management at National Central University. His research interests include cloud computing, distributed databases, software engineering, and project management.


Cloud Based Distributed Databases: The Future Ahead Cloud Based Distributed Databases: The Future Ahead Arpita Mathur Mridul Mathur Pallavi Upadhyay Abstract Fault tolerant systems are necessary to be there for distributed databases for data centers or

More information

Distributed file system in cloud based on load rebalancing algorithm

Distributed file system in cloud based on load rebalancing algorithm Distributed file system in cloud based on load rebalancing algorithm B.Mamatha(M.Tech) Computer Science & Engineering Boga.mamatha@gmail.com K Sandeep(M.Tech) Assistant Professor PRRM Engineering College

More information

An Intelligent Approach for Integrity of Heterogeneous and Distributed Databases Systems based on Mobile Agents

An Intelligent Approach for Integrity of Heterogeneous and Distributed Databases Systems based on Mobile Agents An Intelligent Approach for Integrity of Heterogeneous and Distributed Databases Systems based on Mobile Agents M. Anber and O. Badawy Department of Computer Engineering, Arab Academy for Science and Technology

More information

Data Management in the Cloud. Zhen Shi

Data Management in the Cloud. Zhen Shi Data Management in the Cloud Zhen Shi Overview Introduction 3 characteristics of cloud computing 2 types of cloud data management application 2 types of cloud data management architecture Conclusion Introduction

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

An Open Market of Cloud Data Services

An Open Market of Cloud Data Services An Open Market of Cloud Data Services Verena Kantere Institute of Services Science, University of Geneva, Geneva, Switzerland verena.kantere@unige.ch Keywords: Abstract: Cloud Computing Services, Cloud

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

More information

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.

More information

Secure Cloud Transactions by Performance, Accuracy, and Precision

Secure Cloud Transactions by Performance, Accuracy, and Precision Secure Cloud Transactions by Performance, Accuracy, and Precision Patil Vaibhav Nivrutti M.Tech Student, ABSTRACT: In distributed transactional database systems deployed over cloud servers, entities cooperate

More information

IPv4 and IPv6: Connecting NAT-PT to Network Address Pool

IPv4 and IPv6: Connecting NAT-PT to Network Address Pool Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(5):547-553 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Intercommunication Strategy about IPv4/IPv6 coexistence

More information

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013 Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options

More information

2.1.5 Storing your application s structured data in a cloud database

2.1.5 Storing your application s structured data in a cloud database 30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from

More information

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Study and Comparison of Elastic Cloud Databases : Myth or Reality? Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Network Monitoring. Chu-Sing Yang. Department of Electrical Engineering National Cheng Kung University

Network Monitoring. Chu-Sing Yang. Department of Electrical Engineering National Cheng Kung University Network Monitoring Chu-Sing Yang Department of Electrical Engineering National Cheng Kung University Outline Introduction Network monitoring architecture Performance monitoring Fault monitoring Accounting

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

Elasticity in Multitenant Databases Through Virtual Tenants

Elasticity in Multitenant Databases Through Virtual Tenants Elasticity in Multitenant Databases Through Virtual Tenants 1 Monika Jain, 2 Iti Sharma Career Point University, Kota, Rajasthan, India 1 jainmonica1989@gmail.com, 2 itisharma.uce@gmail.com Abstract -

More information

Locality Based Protocol for MultiWriter Replication systems

Locality Based Protocol for MultiWriter Replication systems Locality Based Protocol for MultiWriter Replication systems Lei Gao Department of Computer Science The University of Texas at Austin lgao@cs.utexas.edu One of the challenging problems in building replication

More information

A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications

A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications Li-Yan Yuan Department of Computing Science University of Alberta yuan@cs.ualberta.ca Lengdong

More information

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster , pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing

More information

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information