Performance of Scalable Data Stores in Cloud
Pankaj Deep Kaur, Gitanjali Sharma

Abstract

Cloud computing has pervasively transformed the way applications utilize underlying infrastructure such as systems and software. System designers are rapidly deploying applications and services over the cloud to benefit from its elastic, scalable and pay-as-you-go model. Because many cloud applications are extensively data driven, the data management systems hosting these applications form a vital component of the cloud software stack. However, maintaining the performance of database read/write operations under fluctuating workloads, both regionally and globally, is quite challenging. In this context, distributed scalable data stores in the cloud promise high performance and reliable services through rapid partitioning, replication, elasticity and an automatic manager for self-management. Thus, the success of the cloud computing paradigm critically depends on scalable, elastic and automated DBMSs. This paper discusses the state of the art in techniques and technologies used for cloud databases. It presents the concepts of partitioning, replication, elastic scalability and automatic management. The paper also addresses the challenges faced by DBMS designers.

Index Terms: Amdahl's Law, Elasticity, Scalability.

I. INTRODUCTION

Owing to technological proliferation, service providers are rapidly switching from in-house computing infrastructure to cloud infrastructure, thereby giving rise to a plethora of data. This has led to unprecedented research and technological challenges for database management systems. Unlike in earlier scenarios, an interruption in a system now has far-reaching global consequences, making services unavailable to a large number of users. Thus, the experience of past decades has led DBMS designers to aim for scalable and reliable data stores with high read/write performance even in the presence of fluctuating workloads.
The need for a scalable data store emerges from the fluctuating load characteristics of web-based applications. A scalable data store ensures that system performance can be significantly improved by dynamically adding resources in accordance with the load. Systems can scale either vertically, by adding more resources to a single node, or horizontally, by adding more nodes. Further, the scalability of any system is tied to its underlying algorithms and computations [3]. Specifically, if a fraction α of the underlying algorithm is intrinsically sequential, then the remaining fraction 1 − α is parallelizable [3] and can gain from the deployment of multiple processors. For such a system, the peak speedup from deploying N CPUs is bounded as stated by Amdahl's law [1], [3]: $S(N) = \frac{1}{\alpha + (1 - \alpha)/N}$, which can never exceed $1/\alpha$. This implies that scalability is bounded by the underlying algorithms [3] and cannot necessarily be achieved by simply adding resources.

Another factor closely related to the scalability challenge is to devise a mechanism for responding to unanticipated load variations. This mechanism is called elasticity. Scalability is static in nature, i.e. it only guarantees that a system is capable of scaling from a few hundred to thousands of machines. Elasticity, however, is dynamic in nature and permits operational systems to scale on demand. Another issue in guaranteeing database performance is automatic self-management, particularly in the context of scalability, elasticity and load redistribution.

This paper briefly summarizes the concepts of partitioning, replication, elastic scalability and automatic management. It also addresses the inherent challenges faced by ongoing research in implementing these concepts over scalable cloud data stores. The rest of the paper is structured as follows. Section II reviews partitioning strategies that support on-demand scalability. Section III presents various replication approaches that guarantee continuous availability of data. Section IV discusses how live data migration is achieved through elastic scalability. Section V presents the desired characteristics of an automatic manager. Section VI reviews related work in the design space of scalable database systems. Finally, Section VII concludes the paper with suggestions for future work.

Revised version manuscript received on June 29. Pankaj Deep Kaur, Computer Science and Engineering, Guru Nanak Dev University, Regional Campus, Jalandhar, India. Gitanjali Sharma, Computer Science and Engineering, Guru Nanak Dev University, Regional Campus, Jalandhar, India.

II. DATABASE PARTITIONING

Partitioning determines where data is located across the geographically distributed servers in the cloud. It is therefore one of the important factors in the read, write and storage performance and the scalability [17] of the system. Data can be partitioned either vertically or horizontally (the latter is also called sharding).

A. Vertical Partitioning

Vertical partitioning is achieved by splitting the columns (i.e. attributes) of a database table into new subsets of columns such that there is a mapping between each subset of columns and an application functionality. As discussed in [21], it is essential to choose the appropriate tables and columns in order to create the right partitions, since multi-table join operations are then carried out within application code [21] and not over the relational schema. Column-family data stores are capable of providing both vertical and horizontal partitioning.

B. Horizontal Partitioning

Horizontal partitioning, or sharding, is achieved by splitting the rows (i.e. tuples) of a database table into different tables with fewer tuples each. Partitioning is done on the basis of shard keys, which can be directory-based, range-based or hash-based [21].
Based on the mapping between key and server node, write requests are propagated to the appropriate server. Sharding based on the latter two kinds of key is the most commonly implemented by major scalable data stores.
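To illustrate the three kinds of shard key mentioned above, the following is a minimal sketch of how a request router might map a key to a server. It is not taken from any of the cited data stores: the server names, the range boundaries and the directory contents are hypothetical, and MD5 is used only as a convenient stable hash.

```python
import bisect
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["server-0", "server-1", "server-2"]

# Range-based: sorted boundaries between key ranges (assumed values).
# Keys below "h" go to server-0, keys from "h" up to (not including) "q"
# go to server-1, and everything else goes to server-2.
RANGE_BOUNDS = ["h", "q"]

# Directory-based: an explicit lookup table from key to server (assumed values).
DIRECTORY = {"alice": "server-2", "bob": "server-0"}


def range_shard(key: str) -> str:
    """Pick the server whose key range contains the key."""
    return SERVERS[bisect.bisect_right(RANGE_BOUNDS, key)]


def hash_shard(key: str) -> str:
    """Pick a server from a stable hash of the key, modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]


def directory_shard(key: str) -> str:
    """Pick the server recorded for the key in the lookup directory."""
    return DIRECTORY[key]
```

In this sketch, range-based routing keeps consecutive keys together, hash-based routing spreads them out, and directory-based routing allows arbitrary placement at the cost of maintaining the lookup table. Note that the simple modulo scheme in `hash_shard` remaps most keys when the number of servers changes.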
The two common sharding techniques are therefore range partitioning and consistent hashing.

1) Range Partitioning

Range partitioning maps data to multiple partitions spread over multiple servers on the basis of ranges of shard keys [17]. This implies that one server is responsible for handling all read/write requests over a particular range of shard keys. However, it can result in hotspots and load balancing problems. Moreover, it requires a routing server that stores the mapping of ranges to partitions and nodes and thus helps in routing requests to the appropriate server. MongoDB [26], Cassandra [7], HBase [18] and BerkeleyDB [5] are among the databases that implement range partitioning.

2) Consistent Hashing

Consistent hashing uses a hash of the key. The output range of the hash function is treated as a circular ring (i.e. the largest value wraps around to the smallest value). The ring is split into as many ranges as there are nodes available. Each node is assigned a random value from the output range, which determines its position in the ring. Each data item is then mapped to a node in the ring by hashing its unique key to determine its position. Each node is thus responsible for the region between itself and its predecessor [17], so there is no need to store mapping information. However, consistent hashing is not efficient for range queries, as consecutive keys are scattered across multiple nodes. As stated in [17], this form of sharding is useful for dynamic resizing, since the addition or removal of a node only requires reassignment of the neighboring regions and the rest of the nodes remain unaffected. Voldemort [33], CouchDB [10], VoltDB [34], Clustrix [8], Riak [32] and Cassandra [7] use this type of sharding.

Some data stores, such as Redis and Memcached, implement no partitioning strategy at all and leave it to the client to come up with one.
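The consistent hashing scheme described above, which is also the kind of logic a client would have to supply for a store that does no partitioning itself, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used by any of the cited stores: the class and method names are hypothetical, node positions are derived from an MD5 hash of the node name instead of a random value so that the example is reproducible, and virtual nodes and replication are left out.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Each node owns the region between its predecessor and itself."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of all nodes
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def node_for(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # the largest value wraps to the smallest
        return self._owner[self._positions[idx]]
```

With this structure, adding a node only takes over keys from the node that follows it on the ring, and removing it hands those keys back, which is the resizing property noted in [17].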
Amazon SimpleDB provides its clients with a manual mechanism for partitioning, but an additional partitioning mechanism might be provided by the service provider itself to achieve the throughput level required by the service level agreement (SLA) [17]. On the other hand, partitioning in graph-oriented data stores [23] is complex to achieve owing to the highly mutable nature of graph data. Many graph partitioning mechanisms have been devised that try to strike a trade-off between two conflicting goals: storing related graphs on the same server for effective query processing, and avoiding too many nodes on the same server so that load balancing is not needed. Graph data stores do not support stable shard keys and hence do not implement partitioning in the way other data stores do. For example, Neo4J [27] provides cache sharding [17], whereas HyperGraphDB [20] uses autonomous objects [17] to handle communication among graphs stored on neighboring peer nodes. Furthermore, there are NewSQL data stores such as Google Spanner [9] and NuoDB [28] that implement slightly different partitioning mechanisms, as discussed briefly in [17].

III. DATABASE REPLICATION

Replication is the mechanism of storing multiple copies of the same data on different servers so that read/write requests can be executed by distributing queries over the replicas. The mechanism adopted for replication influences the performance of a DBMS. Besides being an important factor in scalability, it is also essential for guaranteeing availability and fault tolerance, and it affects the consistency level.

A. Approaches to Replication

According to Grolinger et al. [17], the main approaches to replication are master-slave, multi-master and masterless replication.

1) Master-Slave Replication

In master-slave replication, one node is designated as the master and the rest as slaves. Only the master is responsible for processing write requests and propagating data to the slaves. Thus, the direction of propagation is always from master to slaves. Data stores implementing master-slave replication include HBase [18], Redis [31] and BerkeleyDB [5].

2) Multi-Master and Masterless Replication

In multi-master replication, any number of nodes can process write requests, and updates are then propagated to every other node. Thus, propagation can be in any direction. Data stores implementing multi-master replication include Couchbase Server [11] and CouchDB [10]. Masterless replication is similar to the multi-master approach except that all nodes play the same role in the replication system [17]. Examples of data stores using masterless replication are Cassandra [7], Voldemort [33] and Riak [32]. NewSQL data stores achieve replication through a Paxos state machine algorithm, as in Google Spanner [9], or through a transaction/session manager [17], as in VoltDB [34] and Clustrix [8].

The read and write performance and the scalability of a data store are affected by the choice of replication approach. Master-slave replication provides read scalability but not write scalability. Multi-master and masterless replication provide both read and write scalability, because all nodes are allowed to handle both read and write requests.

B. Update Processing Overheads

The principal overhead in replication is the update processing required for remote as well as local propagation. As discussed by Kamal et al. [21], there are two main kinds of update processing: symmetric and asymmetric. The former requires a substantial amount of resources (CPU, I/O, etc.) at the remote replicas and can lead to divergent consistency for non-deterministic database operations. The latter first executes operations on the local replica and then bundles the changes into write sets, which are propagated to the remote replicas as a single message. These overheads determine how the data is actually replicated [21], i.e. by full or partial replication.
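To make the read/write scalability contrast from Section III-A concrete, the following is a minimal sketch of master-slave replication. It is an illustration under stated assumptions, not the mechanism of any cited data store: the class names are hypothetical, replicas are in-memory dictionaries, and updates are propagated synchronously, whereas real systems often propagate asynchronously.

```python
class Replica:
    """A single copy of the data, held in memory for illustration."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}

    def apply(self, key, value) -> None:
        self.data[key] = value


class MasterSlaveCluster:
    """All writes go through the master; reads are spread over every replica."""

    def __init__(self, master: Replica, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_reader = 0  # round-robin counter for reads

    def write(self, key, value) -> None:
        # The master is the only node that accepts writes ...
        self.master.apply(key, value)
        # ... and it propagates each update to the slaves
        # (synchronously here, which is a simplifying assumption).
        for slave in self.slaves:
            slave.apply(key, value)

    def read(self, key):
        # Any replica may serve a read, so adding slaves adds read capacity.
        replicas = [self.master] + self.slaves
        replica = replicas[self._next_reader % len(replicas)]
        self._next_reader += 1
        return replica.name, replica.data.get(key)
```

In this sketch every write still passes through the single master, which is why adding slaves improves read throughput but not write throughput. A multi-master or masterless variant would let `write` be called on any replica and would then need a rule for reconciling conflicting updates.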
1) Full Replication

In full replication, each participating node holds a replicated copy of the data, and every remote replica has exactly the same snapshot of the local database. Under update-heavy workloads this imposes a great deal of overhead, since update processing is required at multiple remote replicas.

2) Partial Replication

In partial replication, data is replicated to a group or subset of nodes instead of all participating nodes. Update processing can thus be localized to a few replicas. However, partial replication also faces challenges due to dynamically changing workloads and application requirements, as well as the complexity of determining the data items to be accessed in replication. The two basic variants are pure and hybrid partial replication [21].

a) Pure Partial Replication

In pure partial replication, no participating node has a full snapshot of the local database, i.e. all nodes hold a partial copy. It becomes difficult to predict which replica holds the data item to be accessed unless proper partitioning is done and the workload is more or less static.

b) Hybrid Partial Replication

In hybrid partial replication, some nodes have a full copy of the local database while others have only a subset. Read requests can be localized for efficient processing, and write requests can be distributed over different replicas. This can create hotspots and thus the need for load balancing.

C. Replication Patterns

As discussed by Kamal et al. [21], web applications are typically deployed over a multi-tier cloud architecture in which every tier is responsible for handling its own functionality, coordinating with the other tiers and providing the desired services to clients. Replicating a single tier is therefore not an effective way to achieve the desired scalability. Besides read and write requests, there are compute-intensive and data-intensive operations. The former need more resources or scalability at the application/logic tier, and the latter need the same at the data/persistence tier [21]. Moreover, in case of failures, interdependencies among tiers must not result in multiple executions of the same workload at both the application and the database tier. In view of these arguments, [21] classifies replication patterns as vertical and horizontal.

1) Vertical Replication Pattern

This pattern integrates one application server and one database server into a single replication unit, which can be replicated vertically to achieve higher scalability.
Here, the replication logic is transparent to the replication unit, which enables the unit to work seamlessly. However, it demands effective partitioning of the application and its corresponding data to achieve the expected scalability. Such systems are rarely used.

2) Horizontal Replication Pattern

This pattern allows each tier to replicate independently, with a replication awareness scheme [21] running in between to coordinate the tiers. It thus provides the flexibility to scale each tier independently, but the awareness mechanism is a must for effective performance. Such systems are used almost everywhere.

D. Replication Architectures

Depending on where the replication logic is implemented, the replication architectures presented by Kamal et al. [21] can be classified as shown in Table I.

Table I. Replication architectures.

Replication Architecture | Implementation
Kernel-Based (White-Box Replication) | Logic implemented in the database kernel
Centralised Middleware (Black-Box Replication) | Logic implemented in the middleware layer
Grey-Box Replication | Modified version of black-box replication; explicitly exposes concurrency control mechanisms by interfacing with the middleware
Replicated Centralised Middleware Based | A backup of the middleware is created
Distributed Middleware Based | Integrates every replica individually with a middleware instance

IV. DATABASE ELASTICITY

Elasticity is the ability of a system to scale with load fluctuations by adding resources during high workloads or by confining the tenants to fewer nodes during low workloads [3]. This is achieved dynamically in a live, on-demand system without any disruption of services. Besides minimizing operational cost, elasticity is also useful for live migration of databases with little impact on performance. Live database migration can be implemented over the two major architectures of cloud data stores, i.e. the shared storage architecture and the shared nothing architecture.

A. Shared Storage Architecture

The shared storage architecture keeps the persistent database image in network attached storage (NAS) [3], so the image is not migrated among the nodes. An iterative copy technique is used for live database migration. Iterative copy focuses on propagating only the main-memory state of a particular partition, which consists of the cached database state and the transaction execution state, such as the lock table and the read/write sets of active or committed transactions. This minimizes the delay and the downtime window of the tenant. It ensures transactional serializability during migration and transactional integrity in case of failures. HBase [18] and ElasTraS [12] use the shared storage architecture.

B. Shared Nothing Architecture

In the shared nothing architecture, each tenant has its own copy of the database, called a partition, which is stored on locally attached storage. Unlike in shared storage, the persistent database image must also be migrated here, and it is comparatively much larger than the main-memory state. Iterative copy is therefore not used. Instead, the migration technique is designed to avoid system downtime and service disruption, minimize the data propagated among nodes and ensure safe propagation during failures. This approach does not rely on replication, which provides flexibility in choosing any location for migration. For example, Zephyr [3] uses this approach by introducing on-demand pull and asynchronous push of data, thereby allowing the source and destination nodes to execute active and new transactions respectively.

V. AUTOMATIC MANAGER

Managing large database management systems presents remarkable challenges in system monitoring, operations and management. The automatic manager is accountable for the following: monitoring system behavior; tuning system performance; elastic scaling; load balancing on the basis of dynamic consumption patterns; modelling system characteristics in order to predict workload spikes; and taking effective control measures to deal with such spikes. Further, in order to guarantee efficient multi-tenant performance and service level agreements (SLAs) [3], the manager must model the dynamic behavior and resource requirements of the various tenants for elastic scaling. Migration costs must also be predicted when deciding where to migrate, when to migrate and which tenant to migrate. The automatic manager comprises two logical components: static and dynamic.

A. Static Component

The static component models the characteristics of tenants and their resource utilization in order to determine their placement and to identify tenants with complementary resource requirements that can be co-located. It presumes that once tenant characteristics are modeled and their placement is identified, the system will retain its behavior, and it is therefore called the static component.

B. Dynamic Component

The dynamic component models the characteristics of the entire system in order to determine the appropriate moment for elastic load balancing. It guarantees minimum changes in tenant placement and rebalances load through live database migration. The dynamic component identifies dynamic changes in load and resource usage characteristics.

VI. RELATED WORK

Peer-to-peer (P2P) systems [30] have lately been used to address the storage and distribution of data over a network. They support flat namespaces. Unstructured P2P systems [30] usually broadcast queries through the network in order to find as many peers as possible that share the same data. Examples include Freenet [2] and Gnutella [14].
Further evolution led to structured P2P systems [30], which route queries to selected peers holding the required data using routing protocols, as in Chord [2], Tapestry [2] and CAN [2]. Later, many systems such as Oceanstore [29] evolved with advances in efficient routing mechanisms. Distributed databases have been in use for decades to handle data storage and access through distributed transactions and query processing. Similar to distributed databases, there are distributed file systems that act as object stores and use hierarchical namespaces. These include Boxwood and Sinfonia. Distributed hash tables have also been used by projects such as Chord [2], [15] and Pastry [15].

ElasTraS, presented by Agrawal et al. and Das et al. [3], [12], implements schema-level partitioning [3], where different shards are independent of each other and the performance overhead of distributed transactions is therefore avoided. However, its partitioning was static. Many common databases, as evaluated by Das et al. [13] and Lindsay et al. [22], provide iterative copy [3] guarantees, transferring the main-memory state of a partition. Zephyr by Elmore et al. [16] was designed to ensure serializability and transaction isolation [22] without relying on replication, thereby reducing the amount of data transferred among nodes. RemusDB, described by Minhas et al. [25], on the other hand provides highly available data by using virtual machines and by preserving all ACID guarantees at the time of failure. Chimera by Minhas et al. [25] is another database architecture, which operates on a hybrid platform of shared and shared nothing DBMSs. It can thus scale out elastically and balance load through data sharing. Further, the discussions by Bhat et al. [6], Grolinger et al. [17] and Hu et al. [19] present a detailed overview of other NoSQL and NewSQL databases on the basis of the design decisions adopted.

VII. CONCLUSION

Database management systems on the cloud form an integral component of the cloud software stack. DBMS designers continuously face challenges in improving the performance of cloud databases while minimizing the operational cost of the system. The success of cloud computing is highly contingent on the effective design of scalable DBMSs. Hence, the techniques discussed in this paper can be helpful in understanding and addressing the challenges for future advancements. Considering previous and ongoing advances in the architectural design space, we conclude that upcoming scalable database projects must be capable of supporting automatic partitioning and replication in accordance with dynamic workloads. A further primary issue is to ensure rapid consistency with an acceptable level of latency.

REFERENCES

1. Amdahl's Law. [Online] Accessed 23 Nov
2. S. Androutsellis-Theotokis, A White Paper: A survey of peer-to-peer file sharing technologies. [Online] Accessed 23 Nov
3. D. Agrawal, A.E. Abbadi, S. Das and A.J. Elmore, Database scalability, elasticity and autonomy in the cloud [Extended Abstract]. Technical report, UCSB CS.
4. J. Baker, C. Bond, J.C. Corbett, J.J. Furman, A. Khorlin, J. Larson, J-M. Leon, A. Lloyd, V. Yushprakh (2011) Megastore: providing scalable, highly available storage for interactive services. Published under CIDR 11. Pages:
5. BerkeleyDB. overview/index.html Accessed 31 Jan
6. U. Bhat and S. Jadhav (2010) Moving towards non-relational databases. In IJCA ( ) Vol. 1, No
7. Cassandra. Accessed 31 Jan
8. Clustrix. Accessed 31 Jan
9. J.C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, JJ Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, D. Woodford (2012) Spanner: Google's globally-distributed database. Published in the proceedings of OSDI.
10. CouchDB. Accessed 31 Jan
11. Couchbase Server. Accessed 31 Jan
12. S. Das, S. Agrawal, D. Agrawal, A. E. Abbadi, ElasTraS: an elastic, scalable, and self managing transactional database for the cloud. UCSB Computer Science technical report
13. S. Das, S. Nishimura, D. Agrawal, A. E. Abbadi (2010) Live database migration for elasticity in a multitenant database for cloud platforms. Technical report, CS, UCSB (2010)
14. H. C. Ding, S. Nutanong, R. Buyya. Peer-to-peer networks for content sharing
15. S. El-Ansary, S. Haridi (2004) An overview of structured p2p overlay networks
16. A.J. Elmore, S. Das, D. Agrawal, A. E. Abbadi (2011) Zephyr: live migration in shared nothing databases for elastic cloud platforms. Published in ACM SIGMOD
17. K. Grolinger, W. A. Higashino, A. Tiwari, M. AM. Capretz (2013) Data management in cloud environments: NoSQL and NewSQL data stores. Journal of Cloud Computing: advances, systems and applications 2013, 2:22 doi: / X
18. HBase. Accessed 28 Feb
19. H. Hu, Y. Wen, T-S. Chua, X. Li (2014) Towards scalable systems for Big Data analytics: a technology tutorial. In IEEE Vol. 2, 2014 Pages
20. HyperGraphDB. Accessed 31 Jan
21. J.M.M. Kamal, M. Murshed (2014) Chapter 2 Distributed database management systems: architectural design choices for the cloud. Under Springer International Publishing, Mahmood (ed.), Cloud computing, Computer Communications and Networks
22. B.G. Lindsay (2008) Jim Gray at IBM. The transaction processing revolution. In SIGMOD Record Vol. 37, No. 2. Pages
23. T.S. Madhulatha (2012) Graph partitioning advance clustering technique. In IJCSES Vol. 3, No. 1. Pages
24. Memcached. Accessed 28 Feb
25. U. F. Minhas (2013) Scalable and highly available database systems in the cloud. PhD Thesis, University of Waterloo, Canada.
26. MongoDB. Accessed 31 Jan
27. Neo4J. Accessed 31 Dec
28. NuoDB. Accessed 28 Dec
29. Oceanstore. Accessed 12 Nov
30. Peer-to-peer. Accessed 12 Nov
31. Redis. Accessed 23 Feb
32. Riak. Accessed 28 Dec
33. Voldemort Feb
34. VoltDB. Accessed 12 Feb

Pankaj Deep Kaur, PHD, MIT gold medalist and UGC-JRF qualified, 32 publications in journals and conferences, specialization in cloud computing and big data, 35 publications in journals and conferences.

Gitanjali Sharma, B.Tech. in Computer Science and Engineering and currently pursuing M.Tech. in Computer Science and Engineering. Research area deals with cloud computing, big data and cloud databases, 2 publications in international conferences.
216
Report for the seminar Algorithms for Database Systems F1: A Distributed SQL Database That Scales
Report for the seminar Algorithms for Database Systems F1: A Distributed SQL Database That Scales Bogdan Aurel Vancea May 2014 1 Introduction F1 [1] is a distributed relational database developed by Google
A Taxonomy of Partitioned Replicated Cloud-based Database Systems
A Taxonomy of Partitioned Replicated Cloud-based Database Divy Agrawal University of California Santa Barbara Kenneth Salem University of Waterloo Amr El Abbadi University of California Santa Barbara Abstract
Amr El Abbadi. Computer Science, UC Santa Barbara [email protected]
Amr El Abbadi Computer Science, UC Santa Barbara [email protected] Collaborators: Divy Agrawal, Sudipto Das, Aaron Elmore, Hatem Mahmoud, Faisal Nawab, and Stacy Patterson. Client Site Client Site Client
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
How To Build Cloud Storage On Google.Com
Building Scalable Cloud Storage Alex Kesselman [email protected] Agenda Desired System Characteristics Scalability Challenges Google Cloud Storage What does a customer want from a cloud service? Reliability
Cloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
Cloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
Elasticity in Multitenant Databases Through Virtual Tenants
Elasticity in Multitenant Databases Through Virtual Tenants 1 Monika Jain, 2 Iti Sharma Career Point University, Kota, Rajasthan, India 1 [email protected], 2 [email protected] Abstract -
Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.
Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one
extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Megastore: Providing Scalable, Highly Available Storage for Interactive Services J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh Google Inc. Originally
TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES
Constantin Brâncuşi University of Târgu Jiu ENGINEERING FACULTY SCIENTIFIC CONFERENCE 13 th edition with international participation November 07-08, 2008 Târgu Jiu TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Hosting Transaction Based Applications on Cloud
Proc. of Int. Conf. on Multimedia Processing, Communication& Info. Tech., MPCIT Hosting Transaction Based Applications on Cloud A.N.Diggikar 1, Dr. D.H.Rao 2 1 Jain College of Engineering, Belgaum, India
The Sierra Clustered Database Engine, the technology at the heart of
A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel
NoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
Scalability of web applications. CSCI 470: Web Science Keith Vertanen
Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches
CloudDB: A Data Store for all Sizes in the Cloud
CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective
Database Scalability {Patterns} / Robert Treat
Database Scalability {Patterns} / Robert Treat robert treat omniti postgres oracle - mysql mssql - sqlite - nosql What are Database Scalability Patterns? Part Design Patterns Part Application Life-Cycle
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Journal of Cloud Computing: Advances, Systems and Applications
Journal of Cloud Computing: Advances, Systems and Applications This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be
MASTER PROJECT. Resource Provisioning for NoSQL Datastores
Vrije Universiteit Amsterdam MASTER PROJECT - Parallel and Distributed Computer Systems - Resource Provisioning for NoSQL Datastores Scientific Adviser Dr. Guillaume Pierre Author Eng. Mihai-Dorin Istin
Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk
Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria
Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.
Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat
The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL [email protected] / @marcua
The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL [email protected] / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 Load Balancing Heterogeneous Request in DHT-based P2P Systems Mrs. Yogita A. Dalvi Dr. R. Shankar Mr. Atesh
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
System Models for Distributed and Cloud Computing
System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems
City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015
City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Data-Intensive Computing Course Code: CS4480
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015
Cloud DBMS: An Overview Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Outline Definition and requirements S through partitioning A through replication Problems of traditional DDBMS Usage analysis: operational
A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage
Volume 2, No. 4, July-August 2013, International Journal of Information Systems and Computer Sciences, ISSN 2319-7595 Tejaswini S L Jayanthy et al., Available online at http://warse.org/pdfs/ijiscs03242013.pdf
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 BC Manual recording From tablets to papyrus to paper A. Payberah 2014
MakeMyTrip CUSTOMER SUCCESS STORY
MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently
Cloud Computing Architecture: A Survey
Cloud Computing Architecture: A Survey Abstract Nowadays, cloud computing is a complex and very rapidly evolving area that affects IT infrastructure, network services, data management and
Fault Tolerance in Hadoop for Work Migration
Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous
A Survey of Distributed Database Management Systems
Brady Kyle CSC-557 4-27-14 A Survey of Distributed Database Management Systems Big data has been described as having some or all of the following characteristics: high velocity, heterogeneous structure,
Divy Agrawal and Amr El Abbadi Department of Computer Science University of California at Santa Barbara
Divy Agrawal and Amr El Abbadi Department of Computer Science University of California at Santa Barbara Sudipto Das (Microsoft summer intern) Shyam Antony (Microsoft now) Aaron Elmore (Amazon summer intern)
In Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
How To Write A Database Program
SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store
Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale
WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept
Benchmarking and Analysis of NoSQL Technologies
Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The
1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
A Survey Paper: Cloud Computing and Virtual Machine Migration
A Survey Paper: Cloud Computing and Virtual Machine Migration 1 Yatendra Sahu, 2 Neha Agrawal 1 UIT, RGPV, Bhopal MP 462036, INDIA 2 MANIT, Bhopal MP 462051, INDIA Abstract - Cloud computing is one
NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management
NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management A B M Moniruzzaman Department of Computer Science and Engineering, Daffodil International
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING Carlos de Alfonso Andrés García Vicente Hernández 2 INDEX Introduction Our approach Platform design Storage Security
MyDBaaS: A Framework for Database-as-a-Service Monitoring
MyDBaaS: A Framework for Database-as-a-Service Monitoring David A. Abreu 1, Flávio R. C. Sousa 1 José Antônio F. Macêdo 1, Francisco J. L. Magalhães 1 1 Department of Computer Science Federal University
Making Sense of NoSQL A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY MANNING Shelter Island
Making Sense of NoSQL A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY MANNING Shelter Island contents foreword preface acknowledgments about this book Part 1 Introduction
Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings
Solution Brief Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings Introduction Accelerating time to market, increasing IT agility to enable business strategies, and improving
Understanding Neo4j Scalability
Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the
F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar ([email protected]) 15-799 10/21/2013
F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar ([email protected]) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords
Techniques for Scaling Components of Web Application
March 12-14, 2014, Hong Kong Techniques for Scaling Components of Web Application Ademola Adenubi, Olanrewaju Lewis, Bolanle Abimbola Abstract Every organisation is exploring the enormous benefits of
Big data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
A distributed system is defined as
A distributed system is defined as A collection of independent computers that appears to its users as a single coherent system CS550: Advanced Operating Systems 2 Resource sharing Openness Concurrency
The Total Cost of (Non) Ownership of a NoSQL Database Cloud Service
The Total Cost of (Non) Ownership of a NoSQL Database Cloud Service Jinesh Varia and Jose Papo March 2012 (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper)
Cloud Based Distributed Databases: The Future Ahead
Cloud Based Distributed Databases: The Future Ahead Arpita Mathur Mridul Mathur Pallavi Upadhyay Abstract Fault tolerant systems are necessary to be there for distributed databases for data centers or
Distributed Systems LEEC (2005/06 2º Sem.)
Distributed Systems LEEC (2005/06 2º Sem.) Introduction João Paulo Carvalho Universidade Técnica de Lisboa / Instituto Superior Técnico Outline Definition of a Distributed System Goals Connecting Users
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should
How To Manage A Multi-Tenant Database In A Cloud Platform
UCSB Computer Science Technical Report 21-9. Live Database Migration for Elasticity in a Multitenant Database for Cloud Platforms Sudipto Das Shoji Nishimura Divyakant Agrawal Amr El Abbadi Department
Hadoop: A Framework for Data-Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
Hadoop: A Framework for Data-Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Design and Evaluation of a Hierarchical Multi-Tenant Data Management Framework for Cloud Applications
Design and Evaluation of a Hierarchical Multi-Tenant Data Management Framework for Cloud Applications Pieter-Jan Maenhaut, Hendrik Moens, Veerle Ongenae and Filip De Turck Ghent University, Faculty of
A Survey on Cloud Database Management
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 2 Issue 1 Jan 2013 Page No. 229-233 A Survey on Cloud Database Management Ms.V.Srimathi, Ms.N.Sathyabhama and
NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
Database Replication with Oracle 11g and MS SQL Server 2008
Database Replication with Oracle 11g and MS SQL Server 2008 Flavio Bolfing Software and Systems University of Applied Sciences Chur, Switzerland www.hsr.ch/mse Abstract Database replication is used widely
DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING?
DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
GridScale Database Virtualization Software for Microsoft SQL Server
TECHNICAL WHITE PAPER GRIDSCALE DATABASE VIRTUALIZATION SOFTWARE FOR MICROSOFT SQL SERVER Typical enterprise applications are heavily reliant on the availability of data. Standard architectures of enterprise
Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
DataStax Enterprise, powered by Apache Cassandra (TM)
PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications
Performance Prediction, Sizing and Capacity Planning for Distributed E-Commerce Applications by Samuel D. Kounev ([email protected]) Information Technology Transfer Office Abstract Modern e-commerce
NoSQL Evaluation. A Use Case Oriented Survey
2011 International Conference on Cloud and Service Computing NoSQL Evaluation A Use Case Oriented Survey Robin Hecht Chair of Applied Computer Science IV University of Bayreuth Bayreuth, Germany robin.hecht@uni
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
Scalable Web Application
Scalable Web Applications Reference Architectures and Best Practices Brian Adler, PS Architect Scalable Web Application What? An application built on an architecture that
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Designing a Cloud Storage System
Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system's archival capacity (its ability to persistently store large volumes
Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
Planning the Migration of Enterprise Applications to the Cloud
Planning the Migration of Enterprise Applications to the Cloud A Guide to Your Migration Options: Private and Public Clouds, Application Evaluation Criteria, and Application Migration Best Practices Introduction
Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)
Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we
Introduction to NOSQL
Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo
Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
How To Create A P2P Network
Peer-to-peer systems INF 5040 autumn 2007 lecturer: Roman Vitenberg INF5040, Frank Eliassen & Roman Vitenberg 1 Motivation for peer-to-peer Inherent restrictions of the standard client/server model Centralised
Relational Databases in the Cloud
Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
ORACLE DATABASE 10G ENTERPRISE EDITION
ORACLE DATABASE 10G ENTERPRISE EDITION OVERVIEW Oracle Database 10g Enterprise Edition is ideal for enterprises that ENTERPRISE EDITION For enterprises of any size For databases up to 8 Exabytes in size.
Building highly available systems in Erlang. Joe Armstrong
Building highly available systems in Erlang Joe Armstrong How can we get 10 nines reliability? Why Erlang? Erlang was designed to program fault-tolerant systems Overview n Types of HA systems n Architecture/Algorithms
Scala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
Software-Defined Networks Powered by VellOS
WHITE PAPER Software-Defined Networks Powered by VellOS Agile, Flexible Networking for Distributed Applications Vello's SDN enables a low-latency, programmable solution resulting in a faster and more flexible
