A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications
Li-Yan Yuan, Department of Computing Science, University of Alberta, [email protected]
Lengdong Wu, Department of Computing Science, University of Alberta, [email protected]
Yan Chi, Shanghai Shifang Software, [email protected]
Jia-Huai You, Department of Computing Science, University of Alberta, [email protected]

ABSTRACT

We propose to demonstrate Rubato DB, a highly scalable NewSQL system supporting various consistency levels, from ACID to BASE, for big data applications. Rubato DB employs the staged grid architecture with a novel formula-based protocol for distributed concurrency control. Our demonstration will present Rubato DB as one NewSQL database management system running on a collection of commodity servers against two benchmark sets. Demo attendees can modify the configuration of the system size, fine-tune the query workload, and visualize the performance on the fly through the graphical user interface. Attendees can experiment with various system scales and thus grasp the potential scalability of Rubato DB, whose performance grows linearly with the number of servers used, both for OLTP applications with strong consistency properties and for key-value storage applications with weak consistency properties.

Categories and Subject Descriptors

H.2.4 [Database Management]: Systems - Concurrency, Transaction processing; H.3.4 [Information Storage and Retrieval]: Systems and Software - Distributed systems

Keywords

Scalability; Concurrency Control; ACID; BASE

1. INTRODUCTION

Today, data is flowing into organizations at an unprecedented scale. The ability to scale out to process an increased workload has become an important factor in the proliferation and popularization of data management systems.
[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD 2015, Melbourne, Victoria, Australia. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX...$]

The pursuit of tackling the big data challenges has given rise to a plethora of data management systems characterized by high scalability. Diverse systems for processing big data explore various possibilities in the infrastructure design space. The trend of scaling out on lower-end, commodity servers has become popular, due to the drop in hardware prices and the improvement in performance and reliability. A notable phenomenon is the NoSQL movement that began in early 2009 and is growing rapidly. NoSQL systems are characterized as simplified, highly scalable systems with schema-free data models, simple APIs, horizontal scalability, and relaxed consistency [2, 5, 8]. The recently proposed NewSQL systems (named relative to NoSQL) aim to achieve the same high scalability and availability as NoSQL while preserving the ACID properties of transactions and the complex functionality of relational databases [11]. NewSQL database systems are appropriate for applications where traditional RDBMSs have been used but additional scalability and performance are required. Given the differences in core requirements between NoSQL and NewSQL systems, the implementation of NewSQL systems faces challenges in the following aspects:

Data model. NoSQL systems usually support only non-relational data models, such as key-value stores [5] and Bigtable-like stores [1].
NoSQL systems represent a recent evolution that trades complexity of the data model for scalability. Thus the implementation of NewSQL systems faces the first issue: is the complex relational data model an obstacle to scalability?

Consistency model. NoSQL systems represent an evolution in building scalable infrastructure by sacrificing strong consistency and opting for weak consistency [2, 5, 8]. However, strong consistency levels such as the ACID properties are essential for NewSQL systems to manage critical data that requires both liveness and safety guarantees. The implementation of NewSQL systems meets the second challenge: is it possible to achieve high scalability with ACID?
Architecture. Traditional database vendors employ new techniques to explore the scalability of the symmetric multiprocessing (SMP) architecture and the massively parallel processing (MPP) architecture. However, because of the inherent deficiencies due to resource contention, the scalability of such architectures is not comparable with that of NoSQL systems. Some NoSQL systems based on Bigtable [1] and the MapReduce framework [4] utilize the shared-nothing infrastructure to provide high scalability. A set of systems with high-level declarative languages, including Yahoo! Pig [9] and Facebook Hive [13], compile queries into the MapReduce framework on the Hadoop platform. To achieve high performance and scalability, innovative software architecture should be applied to NewSQL systems, so we need to consider: is it possible to scale out the commonly used single-server database system design?

We have developed a highly scalable NewSQL system, called Rubato DB [17], with satisfactory solutions to the challenges mentioned above, by applying the following principles:

- Exploring the close integration of data partitioning and the relational data model to eschew data skew for transactions.
- Obtaining high scalability with a MapReduce [4] service-oriented architecture.
- Scaling out the concurrency control protocol for ACID based on timestamps rather than any locking mechanism.
- Overcoming the weakness of BASE through stronger consistency models such as BASIC [16].

The implementation of Rubato DB combines and extends ideas from hybrid data partitioning, the staged event-driven architecture (SEDA) [15], the data-intensive MapReduce framework [4], and distributed timestamp-based concurrency control, using innovative technologies such as the formula-based protocol, software instructions, lazy loading, dynamic timestamp ordering, and multiple communication protocols.
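As an illustration of the staged, event-driven style that Rubato DB builds on, here is a minimal sketch in Python. The names (`Stage`, the toy handlers) and the structure are our assumptions for illustration, not Rubato DB's actual code: each stage owns an input queue and worker threads that pull requests one at a time, invoke a handler, and relay results to the next stage's queue.

```python
# A minimal SEDA-style staged module (illustrative sketch only).
import queue
import threading

class Stage:
    """An input queue plus worker threads that invoke an event handler
    and relay results to the next stage's queue (relay-style)."""
    def __init__(self, handler, workers=2):
        self.handler = handler            # application-supplied event handler
        self.inbox = queue.Queue()        # incoming request queue
        self.next_stage = None
        for _ in range(workers):
            threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            request = self.inbox.get()    # pull one request at a time
            result = self.handler(request)
            if self.next_stage is not None:
                self.next_stage.inbox.put(result)

# Toy two-stage pipeline: a "parse" stage feeding an "execute" stage.
results = queue.Queue()
execute = Stage(lambda r: results.put(r.upper()))
parse = Stage(lambda r: r.strip())
parse.next_stage = execute

parse.inbox.put("  select 1  ")
out = results.get()
print(out)  # SELECT 1
```

Because each stage is decoupled by its queue, stages can run in different threads (or, as in Rubato DB, on different servers), which is what enables both parallelism and pipelined execution.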
Rubato DB is a highly scalable NewSQL database system running on a collection of commodity servers that supports various consistency guarantees (1), including the traditional ACID and BASE and a novel consistency level in between, i.e., BASIC [16]; it conforms to SQL2003 and has been used in some commercial applications. The proposed demo is to demonstrate and verify the performance and scalability of Rubato DB under both the TPC-C benchmark [14] and the YCSB benchmark [3]. More specifically, the demonstration will allow demo attendees to try out Rubato DB for the following objectives:

1. Feasible Configuration. To deploy Rubato DB on a collection of commodity servers, ranging from 1 to 16 nodes.
2. Data Loading. To load large amounts of data distributed across a number of nodes automatically, according to a pre-defined data partitioning schema.
3. High Scalability. To demonstrate the linear scalability of Rubato DB for typical OLTP applications requiring the ACID properties under the TPC-C benchmark.
4. Various Consistency Levels. To compare the performance of Rubato DB with either BASE or BASIC properties against other NoSQL systems with BASE properties under the YCSB benchmark.

A friendly graphical user interface will be provided in the proposed demo to enable demo attendees to choose different experiments and deployment configurations, and to display performance results and internal workloads.

The rest of the paper is organized as follows. Section 2 describes the architecture of Rubato DB and some of its key features; Section 3 outlines the experiment setups and the demonstration we plan to show; Section 4 concludes.

(1) The name Rubato (from the Italian rubare, to rob), in music, means subtle rhythmic manipulation and nuance in performance. It describes the practice of Rubato DB of supporting various consistencies with freedom.

2. OVERVIEW OF RUBATO DB

Rubato DB is a scalable NewSQL database system supporting the relational data model for data-centric applications.
The main architectural components of Rubato DB are depicted in Figure 1 [17]. We first give a brief overview of several essential Rubato DB components before describing the contents of our demonstration.

Figure 1: Rubato DB system architecture

2.1 Socket Monitor

Rubato DB runs on a collection of servers as one database management system, with a single socket monitor that establishes and records the connection states for all client requests. The socket monitor adopts a load control scheme, called lazy loading, to reduce data contention and to minimize unnecessary rollbacks. The socket monitor maintains two lists of clients: a list of active clients with active transactions, and a list of requesting clients whose requests are waiting to be evaluated. The socket monitor will not process any request when the oldest waiting transaction is not among the oldest 20%, based on the following practical principle:
If all the current requests have a high potential to conflict with other requests, the system should rather wait awhile for new requests with lower conflict potential to arrive.

2.2 SQL Engine

Rubato DB's SQL engine processes all SQL queries, including aggregate functions and nested queries, updates, and all other requests according to SQL2003. The SQL engine is composed of a set of staged grid modules, each of which is a self-contained software module consisting of an incoming request queue, as illustrated in Figure 2. Threads within each staged grid module operate by pulling requests, one at a time, off the input queue, invoking the application-supplied event handler (e.g., parser, optimizer, query, update) to process them, and dispatching outgoing results by enqueuing them on the request queues of outgoing staged grid modules, located either in the same server or in another server within the LAN. Each request to the system is processed relay-style, passed from one staged grid module to the next until it is completed. Both parallelism and pipelined execution are supported by the engine. A set of software instructions is employed to carry each operation's backpack with its private state and data. The instruction, with a uniform format, is the only packet flowing through the different staged grid modules. To improve performance, multiple communication protocols are utilized, depending on the source and destination locations of the stages and/or system resources.

Figure 2: Staged Modules of SQL Engine

2.3 Transaction Manager

The transaction manager, consisting of the Transaction stage and the Formula DB, is responsible for coordinating data access on multiple nodes based on a novel formula protocol for concurrency (FPC) that ensures serializability. It is the transaction manager that performs all the transactional operations, including pre-commit (necessary for distributed concurrency), commit, and rollback. The FPC is a novel implementation of the classical multiversion timestamp concurrency control protocol [12], with two distinct features:

1. Instead of keeping multiple versions of updated data items, FPC uses formulas stored in memory (associated with the updated data items) to represent their multiple values, which reduces the overhead of storing multiple data versions.
2. The timestamp ordering of transactions may be altered to allow a transaction with an older timestamp to read a data item updated by a later transaction, as long as serializability is respected, which increases the degree of concurrency.

The two parts of the transaction manager perform their respective responsibilities:

(a) Transaction stage. A dedicated staged grid module that is located in every grid server and is responsible for all the basic transactional operations of the transaction manager, including pre-commit, commit, and rollback.
(b) Formula DB. A thread-free layer on top of Berkeley DB, such that all disk accesses in Rubato DB go through the Formula DB.

Three different consistency levels are supported by Rubato DB:

1. ACID. The strongest end of the consistency spectrum, for full transactional functionality.
2. BASE. The most notable weak consistency model used by NoSQL systems [2, 5, 8]. BASE can be summarized as: the system responds basically all the time (Basically Available), need not be consistent at all times (Soft state), but has to come to a consistent state eventually (Eventual consistency) [10].
3. BASIC. Rubato DB is not limited to ACID, which is too strong, or BASE, which is too weak; it also supports BASIC, a spectrum between these two extremes. BASIC stands for Basic Availability, Scalability, Instant Consistency [16].

BASE and BASIC offer a choice between a model that is faster but requires extra effort to deal with inconsistent results, and a model that delivers consistent results but is relatively slower, with higher latency. Figure 3 illustrates the various consistency levels supported by Rubato DB, together with the different tolerances for faults or network partitions.

Figure 3: Generalized CAP Theorem [16]

2.4 Storage Manager

Rubato DB uses a hybrid data storage partition solution that allows a table to be partitioned vertically and horizontally and stored separately over different grid nodes.
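To make the hybrid partition concrete, here is a small illustrative sketch. The column groups, the modulo routing, and all names are our assumptions for the sketch, not Rubato DB's actual scheme: a row is split vertically into frequent and static column groups, and routed horizontally to a grid node by its partition key.

```python
# Illustrative hybrid partitioning: vertical column groups plus
# horizontal grid routing. Group contents and the modulo placement
# are assumptions for this sketch.

FREQUENT = ("id", "balance")   # columns frequently accessed together
STATIC = ("name", "address")   # columns seldom accessed

def vertical_split(row):
    """Store one logical row as two disjoint column-group records."""
    return ({c: row[c] for c in FREQUENT},
            {c: row[c] for c in STATIC})

def grid_node(partition_key, num_nodes):
    """Horizontal (grid) partition: map a row to one of the nodes."""
    return partition_key % num_nodes

row = {"id": 42, "balance": 100.0, "name": "alice", "address": "edmonton"}
frequent, static = vertical_split(row)
node = grid_node(row["id"], 4)
print(frequent, static, node)  # the two vertical fragments, placed on node 2
```

The point of the split is that a scan touching only the frequent group never reads the static columns, while the grid routing decides which node stores both fragments of the row.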
The storage manager uses Berkeley DB for all disk accesses to direct attached storage (DAS). All tables, and their partitions if any, are stored on local disk as Berkeley DB files. To obtain maximum performance, when a table is created according to the schema, the storage manager provides facilities for users to specify fine-grained hybrid storage partitioning based on application semantics and the query workload. As depicted in Figure 4, the column partition enables users to store a single relation as a collection of disjoint (non-key-column) vertical partitions of different groups. Columns frequently accessed together are clustered into the same frequent column group, while columns seldom accessed are categorized into static column groups. Compression can also be applied to exploit the benefits of the column-oriented layout.

Figure 4: Hybrid partition of Storage Manager

The grid partition is used to deploy data on multiple nodes and thus provides scale-out capability. Rubato DB applies a tree-based schema for grid partitioning, as demonstrated in Figure 5. Tuples in every descendant table are partitioned according to the ancestor they descend from. As a consequence, data accessed within a transaction will be located in the same data node [6, 7]. This tree structure implies that, corresponding to every row in the root table, there is a group of related rows in the descendant tables. All rows that inherit from the same root are guaranteed to be co-located.

Figure 5: Grid partition illustration

3. DEMONSTRATION

The proposed demo is to demonstrate and verify the performance of Rubato DB under both the TPC-C and YCSB benchmark tests [3, 14]. The demonstration will allow demo attendees to try out Rubato DB with the following objectives:

1. Feasible Configuration. How to deploy Rubato DB on a collection of commodity servers?
2. Data Loading. How to load large amounts of data distributed across a number of nodes automatically, according to the predefined data partitioning schema?
3. High Scalability. What is the scalability of Rubato DB for typical OLTP applications requiring the ACID properties?
4. Various Consistency Levels. How does the performance of Rubato DB with either BASE or BASIC properties compare with that of other NoSQL systems with BASE properties for big data applications?

A graphical user interface will be provided to facilitate users in exploring Rubato DB.

3.1 Setup Details

All the demos will be conducted on a collection of four (4) commodity servers, each with 4 quad-core Intel Xeon CPUs, 64 GB of main memory, and SATA disks configured in RAID0, running Ubuntu Linux LTS. For simplicity, each server may be used as 1 to 4 nodes; that is, Rubato DB may be configured and then deployed on all four servers as 16 nodes such that all nodes are connected only through the TCP/IP protocol.

Our demo is based on two well-known representative benchmarks. We use the TPC-C [14] benchmark's set of transactions, database schema, and pre-generated data as our target domain for the demo of OLTP workloads. All the demos are conducted according to the TPC-C specification. In addition, demo attendees can also run the Yahoo! Cloud Serving Benchmark (YCSB) [3] on Rubato DB. The number of operations per second will be reported with a varying distribution of read/write operations.

In order to demonstrate the most important features of Rubato DB within a limited time, we have pre-loaded all the initial databases as per the benchmark specifications. For the TPC-C benchmark, we have populated 2000 warehouses, and all TPC-C tables are column-partitioned into frequent and static column groups, and then grid-partitioned across different nodes according to a schema structure. The number of warehouses deployed is proportional to the system size chosen by the attendee.
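The tree-based grid partitioning of Section 2.4, as applied to the TPC-C schema, can be sketched as follows. The modulo placement and the field names are our assumptions: the point is only that every descendant tuple carries its root warehouse id, so all rows descending from one warehouse land on the same node.

```python
# Sketch of tree-based grid partitioning over the TPC-C schema:
# WAREHOUSE is the root table; DISTRICT and CUSTOMER tuples are routed
# by the warehouse id of the root ancestor they descend from.
NUM_NODES = 16

def route(tuple_):
    # The placement function is an assumption; co-location only needs
    # it to depend solely on the root warehouse id.
    return tuple_["w_id"] % NUM_NODES

warehouse = {"w_id": 7}
district = {"w_id": 7, "d_id": 3}
customer = {"w_id": 7, "d_id": 3, "c_id": 1001}

nodes = {route(t) for t in (warehouse, district, customer)}
print(nodes)  # a single node: all related rows are co-located
```

Because a TPC-C transaction mostly touches rows under one warehouse, this placement lets most transactions run on a single node without distributed coordination.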
For YCSB, which is relatively simple, we define a multicolumn structure for each record, consisting of one key column of integer type and 20 data columns of text type. We pre-load 100 million 1KB records on each node, resulting in 120GB of raw data per node.

3.2 Demonstration Details

The proposed demonstration will perform both the TPC-C and YCSB benchmarks on various configurations, including the number of nodes (scalability), the number of warehouses (the workload for TPC-C), and the read/write weights (the workload for YCSB). The following table summarizes the results displayed for the various tasks and configurations.

Task    Nodes   Workload                          Display
TPC-C   1-16    number of warehouses              tpmC, CPU/memory usage
YCSB    1-16    proportion of write operations    throughput, latency
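The multicolumn YCSB record layout described in Section 3.1 (one integer key plus 20 text columns) can be sketched as below. The 50-byte field length is our assumption, chosen so that each record carries roughly 1KB of text payload.

```python
# Generate a YCSB-style record: one integer key column and 20 text
# data columns of 50 bytes each (the field size is our assumption).
import random
import string

def make_record(key, num_fields=20, field_len=50):
    rnd = random.Random(key)  # deterministic payload per key
    return {"key": key,
            **{f"field{i}": "".join(rnd.choices(string.ascii_lowercase, k=field_len))
               for i in range(num_fields)}}

rec = make_record(12345)
payload = sum(len(v) for k, v in rec.items() if k != "key")
print(len(rec), payload)  # 21 columns, 1000 bytes of text payload
```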
3.3 User Interaction

To facilitate interaction with Rubato DB during the demonstration, a friendly graphical user interface is provided, as shown in the mock-up in Figure 6.

Figure 6: Graphical User Interface of Rubato DB

The interface consists of the following panels:

1. Configuration Panel: enables users to choose (a) the demo task: the TPC-C or YCSB test; (b) the number of nodes, from 1 to 16; (c) the workload: the number of warehouses for TPC-C and the read/write weights for YCSB; and (d) the consistency level: ACID, BASIC, or BASE.
2. Performance Monitor Panel: displays the demo performance, including the number of transactions per minute (TPC-C) or operations per second (YCSB), the CPU time and memory usage, and the running status of each node.
3. Historical Graph Panel: visualizes the throughput curve as well as the CPU time and memory usage of each node, as specified by the user.
4. SQL Panel: an SQL console for users to query Rubato DB as well as to manipulate workload control.

3.4 Performance Expectation

Demo attendees can observe the performance of Rubato DB through the GUI. Here we provide the expected performance results so that attendees can make comparisons and verify the features of Rubato DB. Figure 7 shows that the performance (tpmC) of Rubato DB under the TPC-C benchmark test increases from 5,000 concurrent clients (500 warehouses) on a one-node system instance to 67,000 concurrent clients (6,700 warehouses) on a 16-node system instance. The results of the TPC-C benchmark test clearly show that the performance of Rubato DB scales up linearly with the number of nodes deployed, and the rollback ratio remains stable at a low level.

Figure 7: Expected Results of TPC-C Benchmark

For the results of YCSB, though users can set any proportion of read/write operations, we present the performance for a typical read-intensive workload (90% read and 10% write operations). We also present a comparison with Cassandra [8], an open-source key-value store clone of Dynamo [5], and HBase, an open-source Bigtable-like system [1]. The experiments are conducted with 1, 2, 4, 8, and 12 nodes.

Figure 8: Expected Throughput Comparison of YCSB

The result sets shown in Figure 9 clearly demonstrate that the performance and scalability of Rubato DB are comparable with those of other popular scalable NoSQL systems. By demonstrating the performance of Rubato DB with BASIC properties, we can also show that Rubato DB with
BASIC properties still preserves near-linear scalability, with increasing throughput and flat latency, the same as systems with BASE.

Figure 9: Expected Latency Comparison of YCSB

We can conclude that the cost induced by BASIC is acceptable compared with the extra effort needed to manipulate the inconsistent soft states under BASE; BASIC pays a reasonable price for higher consistency than BASE.

4. CONCLUSION

We present a demonstration of Rubato DB, a scalable NewSQL system, showcasing its main feature of scalability with various consistency properties from ACID to BASE, which provides a positive answer to the question of the trade-off between scalability and consistency. The development of NewSQL database systems is a topic of strong interest in recent years, with a great potential impact on data management, especially in the face of big data challenges. With experiments and explorations on our pioneering Rubato DB system, we are paving the road for the design and implementation of a new generation of database systems.

5. REFERENCES

[1] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems, 26(2).
[2] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proceedings of the VLDB Endowment, 1(2).
[3] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proc. of the 1st ACM Symposium on Cloud Computing. ACM.
[4] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1).
[5] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store.
ACM SIGOPS Operating Systems Review, 41(6).
[6] M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. HYRISE: A main memory hybrid storage engine. Proceedings of the VLDB Endowment, 4(2).
[7] E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. In Proc. of the 2010 ACM SIGMOD International Conference on Management of Data. ACM.
[8] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35-40.
[9] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. In Proc. of the 2008 ACM SIGMOD International Conference on Management of Data. ACM.
[10] D. Pritchett. BASE: An ACID alternative. Queue, 6(3):48-55.
[11] M. Stonebraker. New opportunities for New SQL. Communications of the ACM, 55(11):10-11.
[12] R. Thomas. A solution to the concurrency control problem for multiple copy databases. In Digest of Papers, IEEE COMPCON Spring, pages 56-62.
[13] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: A warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment, 2(2).
[14] TPC-C benchmark.
[15] M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for well-conditioned, scalable Internet services. ACM SIGOPS Operating Systems Review, 35(5).
[16] L. Wu, L. Yuan, and J. You. BASIC, an alternative to BASE for large-scale data management systems. In 2014 IEEE International Conference on Big Data.
[17] L. Yuan, L. Wu, J. You, and Y. Chi. Rubato DB: A highly scalable staged grid database system for OLTP and big data applications. In Proc. of the 23rd ACM International Conference on Information and Knowledge Management. ACM, 2014.
JackHare: a framework for SQL to NoSQL translation using MapReduce
DOI 10.1007/s10515-013-0135-x JackHare: a framework for SQL to NoSQL translation using MapReduce Wu-Chun Chung Hung-Pin Lin Shih-Chang Chen Mon-Fong Jiang Yeh-Ching Chung Received: 15 December 2012 / Accepted:
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis ABSTRACT E. Dede, M. Govindaraju SUNY Binghamton Binghamton, NY 13902 {edede,mgovinda}@cs.binghamton.edu Scientific
Survey of Large-Scale Data Management Systems for Big Data Applications
Lengdong Wu, Li-Yan Yuan, Jia-Huai You. JOURNAL OF COMPUTER SCIENCE AND TECHNOL- OGY : 1 Aug. 2014 Survey of Large-Scale Data Management Systems for Big Data Applications Lengdong Wu, Li-Yan Yuan, Jia-Huai
Scalable Multiple NameNodes Hadoop Cloud Storage System
Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai
Distributed Data Stores
Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra [email protected] Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA. Technology, Coimbatore. Engineering and Technology, Coimbatore.
A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA 1 V.N.Anushya and 2 Dr.G.Ravi Kumar 1 Pg scholar, Department of Computer Science and Engineering, Coimbatore Institute
Enhancing Massive Data Analytics with the Hadoop Ecosystem
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3, Issue 11 November, 2014 Page No. 9061-9065 Enhancing Massive Data Analytics with the Hadoop Ecosystem Misha
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Dynamo: Amazon s Highly Available Key-value Store
Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and
NoSQL Databases: a step to database scalability in Web environment
NoSQL Databases: a step to database scalability in Web environment Jaroslav Pokorny Charles University, Faculty of Mathematics and Physics, Malostranske n. 25, 118 00 Praha 1 Czech Republic +420-221914265
A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011
A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example
DIB: DATA INTEGRATION IN BIGDATA FOR EFFICIENT QUERY PROCESSING
DIB: DATA INTEGRATION IN BIGDATA FOR EFFICIENT QUERY PROCESSING P.Divya, K.Priya Abstract In any kind of industry sector networks they used to share collaboration information which facilitates common interests
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
Yahoo! Cloud Serving Benchmark
Yahoo! Cloud Serving Benchmark Overview and results March 31, 2010 Brian F. Cooper [email protected] Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears System setup and
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Daniel Gunter Lawrence Berkeley National Laboratory Berkeley, CA 94720 [email protected] Elif Dede Madhusudhan SUNY Binghamton
A Distribution Management System for Relational Databases in Cloud Environments
JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 11, NO. 2, JUNE 2013 169 A Distribution Management System for Relational Databases in Cloud Environments Sze-Yao Li, Chun-Ming Chang, Yuan-Yu Tsai, Seth
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Structured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey
Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey By, Mr. Brijesh B. Mehta Admission No.: D14CO002 Supervised By, Dr. Udai Pratap Rao Computer Engineering Department S. V. National
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING
BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
Cloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
NoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
Introduction to NoSQL
Introduction to NoSQL NoSQL Seminar 2012 @ TUT Arto Salminen What is NoSQL? Class of database management systems (DBMS) "Not only SQL" Does not use SQL as querying language Distributed, fault-tolerant
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous Test Suite
!000111333 IIIEEEEEEEEE IIInnnttteeerrrnnnaaatttiiiooonnnaaalll CCCooonnngggrrreeessssss ooonnn BBBiiiggg DDDaaatttaaa Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous
Scalable Queries For Large Datasets Using Cloud Computing: A Case Study
Scalable Queries For Large Datasets Using Cloud Computing: A Case Study James P. McGlothlin The University of Texas at Dallas Richardson, TX USA [email protected] Latifur Khan The University of
wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350
Business Intelligence and Column-Oriented Databases
Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Faculty of Organization and Informatics University of Zagreb Pavlinska 2, 42000 [email protected] Nikola Modrušan
Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015
Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor
Scalable Transaction Management on Cloud Data Management Systems
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 10, Issue 5 (Mar. - Apr. 2013), PP 65-74 Scalable Transaction Management on Cloud Data Management Systems 1 Salve
Scalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment
International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science
A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage
Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf
Trafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
Integrating Hadoop and Parallel DBMS
Integrating Hadoop and Parallel DBMS Yu Xu Pekka Kostamaa Like Gao Teradata San Diego, CA, USA and El Segundo, CA, USA {yu.xu,pekka.kostamaa,like.gao}@teradata.com ABSTRACT Teradata s parallel DBMS has
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
Accelerating Cassandra Workloads using SanDisk Solid State Drives
WHITE PAPER Accelerating Cassandra Workloads using SanDisk Solid State Drives February 2015 951 SanDisk Drive, Milpitas, CA 95035 2015 SanDIsk Corporation. All rights reserved www.sandisk.com Table of
Measuring Elasticity for Cloud Databases
Measuring Elasticity for Cloud Databases Thibault Dory, Boris Mejías Peter Van Roy ICTEAM Institute Univ. catholique de Louvain [email protected], [email protected], [email protected]
SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD
International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol-1, Iss.-3, JUNE 2014, 54-58 IIST SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE
High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo
High Availability for Database Systems in Cloud Computing Environments Ashraf Aboulnaga University of Waterloo Acknowledgments University of Waterloo Prof. Kenneth Salem Umar Farooq Minhas Rui Liu (post-doctoral
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
A Survey of Distributed Database Management Systems
Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,
How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)
WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
