Review of Query Processing Techniques of Cloud Databases Ruchi Nanda Assistant Professor, IIS University Jaipur.
|
|
|
- Ariel Osborne
- 10 years ago
- Views:
Transcription
1 Suresh Gyan Vihar University Journal of Engineering & Technology (An International Bi Annual Journal) Vol. 1, Issue 2, 2015,pp ISSN: Review of Query Processing Techniques of Cloud Databases Ruchi Nanda Assistant Professor, IIS University Jaipur. Abstract: The big challenge at the present time is to manage big distributed data like cloud. Traditional relational database management systems (RDBMS) are a choice but they are not well-suited to scale across large clusters of distributed servers. Hence alternatives to RDBMS have been developed. The development of new database management systems (DBMS) for the cloud computing environment or adaptability of the existing systems to the cloud computing environment is a critical component of cloud computing research. This paper focuses on the study of the types of existing DBMS in the cloud computing domain. It reviews the various alternatives to RDBMS. It also focuses on the study of the parameters responsible for the performance of cloud database queries and reviews the work carried out so far, for the enhancement of the cloud database queries. Studies conducted in the paper helps in identifying the areas where research is exclusively needed. Keywords: Databases, Cloud Databases, SQL stores, NoSQL data stores, cloud query performance, indexing. I. INTRODUCTION Cloud is a network of servers that pools different resources and cloud computing is a computing where the customers can access those servers using a web browser regardless of their device and location. A cloud provides access to the resources when there is demand and automatically deprivations them when there is no demand. A. Database Management System The relational model was first introduced by Edgar Codd in It uses a collection of tables to represent data items and their relationships. It is wellsuited for Online Transaction Processing Systems. These applications are based on the ACID properties i.e. Atomicity, Consistency, Isolation, and Durability (ACID). These traditional DBMS are also known as row-oriented DBMS, as they store the data row-byrow. They keep all the information about an entity together and are preferred when queries access the data regarding an entity. These row-oriented DBMS includes Oracle, DB2, SQLServer, Teradata. B. Cloud Databases The data is distributed across several machines in network, so efficient management of data is a big worry for organizations using services of cloud. Here, the most important requirement of database management system is of scalability. Though RDBMS are robust and a vast majority of current database systems are based on the relational model yet they are not well-suited to scaling across large clusters of servers as they are not designed to be distributed. So it is difficult to ensure consistency, referential integrity and query performance in a distributed relational environment. Structured Query Language (SQL) is the dominant language of RDBMS. But SQL databases don t scale [25]. Hence alternatives to relational database management systems have been developed. Some of them are: 1. Column-oriented DBMS is a database management system that stores the data by column. It is faster to read and allows a more efficient compression of the data. Hence query processing time is decreased. These are wellsuited for online analytical processing systems (OLAP) and data warehouses. Some of the database management systems which are already in the market are Vertica (a commercial version of C-store), SADAS as well as open-source DBMS like LucidDB and MonetDB. 2. Key-value Stores (non-relational DBMS or NoSQL datastores) are used for storing large scale data & provide easy access. Its data models are schema-free, join-less and horizontally scaled. Here, domain is just like a table, but there is not a predefined schema. It is like a bucket where we can put the items. Items are identified by keys and a given key can have dynamic set of attributes attached to it. Data is created, updated, deleted and retrieved using API method calls. Some of its projects include Dynamo[11] used by Amazon.com, Google s Bigtable [7] used in the Google s application, Cassandra used by 12
2 Facebook for inbox search and project Voldemort used by LinkedIn. 3. Mapreduce[7] is a distributed framework that allows us to process large amount of data in parallel approach. Programmers create different map and reduce functions on the basis of user queries. The data files are stored in distributed file system (DFS). This approach is being used in Google s web search service, for the generation of data stored in Bigtable. II. PARAMETERS DETERMINING PERFORMANCE OF DATABASE MANAGEMENT SYSTEMS The objective of performance enhancement is to minimize the response time for each query and to maximize the throughput of the database server. The performance of the database management system can be determined by the following factors: System level issues: These issues can perform serious performance degradation. It occurs if [26]: o Utilization of CPU is high o Loads of I/O changes frequently o Hardware and software is not configured properly o Operating System of virtual machines and hypervisor [25]. o Generally these issues are automatically managed by the cloud DBMS. So it reduces the need for extensive manual testing. Database design: The database design is one of the most important decision which has to be made carefully as the performance of the queries depends on: o File organization techniques o Constraints on the attributes o Normalization and De-normalization of relations. o Indexing Query Processing and Optimization techniques: Query processing and optimization are the main components of the database management system. The function of query processor [1] is to transform the query written in high-level language into a correct and efficient execution plan expressed in low-level language. As there are many ways to execute the same query, the aim of query optimization is to choose an efficient execution plan for processing a query. It chooses the one that minimizes the resources.. It takes information from the system catalog. The optimizer are required to consider factors such as the order in which to join the tables, the number of rows for each join when calculating an optimal access path, the algorithm to be used for performing the joins. 13
3 In a distributed environment like cloud, data is distributed to a number of sites, stored in its entirety on all sites or spilt on many sites. Here the query is processed and optimized in a different way. There are various issues which needs to be considered like, which copy of the data is to be used i.e. site selection, amount of data that needs to be transmitted from its location to the execution site, relative processing speed at each site and transmitting the final result to the site where the query is issued. The cost of the plan depends on these issues. III. LITERATURE REVIEW The section focuses on the techniques used to enhance the performance of query processor in cloud databases. This section highlights the current work on the specific techniques of indexing, processing and optimization of queries. Query efficiency is achieved by employing a pure key-value data model where, both key and value are arbitrary byte strings (e.g. Dynamo), or its variant (as in Bigtable), where key is an arbitrary byte string and value is a structured record consisting of a number of named columns. But these solutions lack support for secondary indexes, range queries and multidimensional queries [24]. The current solution is Mapreduce [10], which is a programming model and a framework for processing large sets of raw data. A map-reduce program consists of two functions: Map and Reduce. The Map function processes the input data by distributing them to worker nodes for parallel computation and produces a set of intermediate results as key-value pairs, while the reduce function aggregates all the intermediate results with the same key from each node to produce the result. It can be used for structured data analysis of large sets. The limitations of Mapreduce as given by [23] are: It produces the necessary secondary indices in an offline batch manner. Hence, secondary indexes are not up-to-date. So newly inserted rows cannot be queried until they are indexed. It does not provide data schema support, declarative query language and cost-based query optimizations. Cloud Global Index (CG-Index) [23], a secondary B + -tree based indexing scheme for cloud storage systems was proposed. It provides high scalability, high throughput, high availability and high concurrency. It provides range search and dictionary operations. Index distribution technique for desired scalability was used. Experiments on Amazon s EC2 were carried on and the results demonstrated that it handles a mixed workload of queries and updates efficiently, but the main limitation of CG-index was that it supports one-dimensional queries only. Performance of the query processing can be increased if operations are directly applied on the compressed data [2]. C-store [22] is a column oriented store which was extended by keeping this view. Algorithms especially suited for column-oriented systems compared with algorithms commonly used by traditional DBMS. The comparisons were performed by changing the parameters like query workload and size of the data set. The results showed that the performance benefits of operating directly on compressed data in column oriented schemes is much greater than the benefits in operating directly on roworiented schemes. They created decision-tree to aid the database designer to decide how to compress a particular column. The limitation with the optimizer was that it is not aware of the decompression costs of the various compression algorithms. Various greedy and approximation algorithms have been proposed for optimization of queries. But they do not scale well for realistic workloads [17]. Two greedy algorithms were developed later which emphasizes on finding the most beneficial view in each step instead of finding most promising query. A number of dynamic programming algorithms have been used but they are not able scale. So a new class of query optimization algorithms was developed known as Iterative dynamic programming. [16]. The performance of the database can be increased if common subexpressions from multiple queries can be evaluated only once and that can be reused in case the same subexpression again comes. Interapplication multi-query optimizer [18] was presented that re-uses previously computed (intermediate) results and eliminates redundant work. It was experimentally proved that the inter-application multi-query optimizer improves the query evaluation performance significantly. The limitation of interapplication optimizer is that the optimizer is limited to identify and re-use equivalent intermediate results, only. It is also beneficial to re-use the smallest superset of a requested intermediate result in case the equivalent result is not available, provided that using the superset is cheaper. The studies show that there is enough scope in the performance improvement of queries as there are certain limitations with the works reported in the past and the same have a scope for further improvement. 14
4 IV. FUTURE WORK & CONCLUSION With the increasing volume of data across the large number of applications, the challenge is to distribute the computations, responding to the query along with the distribution of data. Relational database management systems have grown overly complex, difficult to manage, and are struggling today to take full advantage of cloud computing technology [6]. So, there is need to reanalyze the design and processing of relational database technologies and refine the existing methods or develop new approaches exclusively for the cloud environment. These can be achieved by focusing on the following issues: How to reduce the execution time of various operations using indexing? How the desired information can be retrieved immediately from the database? How to get the results back from the database in time as well as in cost effective manner? For this, there is need to analyze thoroughly the operational and architectural characteristics of cloud databases. There is also a need to study the techniques of processing and optimization of queries in the cloud databases, so that existing techniques can be refined. REFERENCES [1] Abadi, D. J. (2009), Data Management in the Cloud: Limitations and Opportunities. IEEE Data Engineering Bulletin, Volume 32, Number 1, March 2009, pp [2] Abadi D. J., S.R. Madden and M.C. Ferreira (2006), Integrating Compression and Execution in Column-Oriented Database Systems. Proceedings of the 2006 ACM SIGMOD International conference on Management of data, ISBN: pp [3] Abadi D. J., S.R. Madden and N. Hachem (2008), Columnstores vs. row-stores: how different are they really? Proceedings of ACM SIGMOD International conference on Management of data, ISBN: , pp [4] Abadi, D. J. (2008), Query Execution in Column- Oriented Database Systems. Massachusetts Institute of Technology, U.S.A. [5] Anh (2009), Query Processing and Optimization. [6] Bobrowski, S. (2011), The Future of Databases is in the Clouds. in_the_clouds [7] Chang, F., J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber (2006), Bigtable: A Distributed Storage System for Structured Data. Proceedings of the 7th symposium on Operating systems design and implementation, ISBN: , pp [8] Cloud Computing vhttp:// [9] Connolly T. and C. Begg (1998), Database Systems A Practical Approach to Design, Implementation and Management, Pearson Education Ltd., New Delhi, India. [10] Dean, J. and S. Ghemawat (2004) Mapreduce: Simplified data processing on large clusters. Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation ACM pp [11] DeCandia, G., D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels(2007) Dynamo: Amazon s highly available keyvalue store, In SOSP, pp [12] Garret (2010), What is a Column Oriented Database? [13] Gounaris, A. (2009), A Vision for Next Generation Query Processors and an Associated Research Agenda. 2nd International Conference on Data Management in Grid and Peer-to-Peer Systems, pp [14] Jain, V. (1997), Information Technology, BPB Publications, New Delhi. [15] Hurwitz, J., R. Bloor, M. Kaufman and F. Halpur (2010), Cloud Computing for Dummies, Wiley Publishing Inc., Indianapolis, Indiana. [16] Kossmann, D. and K. Stocker (2000), Iterative dynamic programming: a new class of query optimization algorithms. ACM Transactions on Database Systems (TODS) Volume 25, Issue 1. [17] Kalnis, P. and D. Papadias (2003) Multi-query optimization for on-line analytical processing. Information Systems, Volume 28, Issue 5, pp [18] Manegold, S., A.J. Pellenkoft and M.L.Kersten, (2000), A Multi-Query Optimizer for Monet Springer-Verlag, 2000 (repository id: 11170). [19] Marks, E.A. and B. Lozano (2010), Executive Guide to Cloud Computing, John Wiley & Sons, Inc., Hoboken, New Jersey. [20] Oracle Enterprise Manager Cloud Control Advanced Installation and Configuration Guide, 12c Release 2 ( ) [21] Ramakrishnan, R. and J. Gehrke (2000), Database Management Systems, McGraw-Hill International Editions, Singapore. [22] Stonebraker, M., D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O Neil, P. E. O Neil, A. Rasin, N. Tran, and S. B. Zdonik (2005) C-Store: A Column-Oriented DBMS. In VLDB Trondheim, Norway, pp [23] Wu, S., D. Beng and Kun (2010), Efficient B-tree based indexing for cloud data processing., Volume 3, Issue
5 [24] Wu, S. and K.L. Wu (2009), An indexing framework for efficient retrieval on the cloud. IEEE Data Engineering Bulletin, 32(1):pp [25] Wiggins, A. (2009), SQL Databases Don't Scale. ale/ [26] Fujitsu Technology Solutions GmbH (2010) Performance Report on Hyper-V globalsp.ts.fujitsu.com/dmsp/publications/public/wp-pr- Hyper-V-en.pdf 16
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Joining Cassandra. Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece [email protected].
Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece [email protected] Joining Cassandra Binjiang Tao Computer Science Department University of Crete Heraklion,
Slave. Master. Research Scholar, Bharathiar University
Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually
A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems
A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-
Business Intelligence and Column-Oriented Databases
Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Faculty of Organization and Informatics University of Zagreb Pavlinska 2, 42000 [email protected] Nikola Modrušan
Hosting Transaction Based Applications on Cloud
Proc. of Int. Conf. on Multimedia Processing, Communication& Info. Tech., MPCIT Hosting Transaction Based Applications on Cloud A.N.Diggikar 1, Dr. D.H.Rao 2 1 Jain College of Engineering, Belgaum, India
AN EFFECTIVE PROPOSAL FOR SHARING OF DATA SERVICES FOR NETWORK APPLICATIONS
INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE AN EFFECTIVE PROPOSAL FOR SHARING OF DATA SERVICES FOR NETWORK APPLICATIONS Koyyala Vijaya Kumar 1, L.Sunitha 2, D.Koteswar Rao
A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications
A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications Li-Yan Yuan Department of Computing Science University of Alberta [email protected] Lengdong
Can the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Data Management Course Syllabus
Data Management Course Syllabus Data Management: This course is designed to give students a broad understanding of modern storage systems, data management techniques, and how these systems are used to
How to Build a High-Performance Data Warehouse By David J. DeWitt, Ph.D.; Samuel Madden, Ph.D.; and Michael Stonebraker, Ph.D.
1 How To Build a High-Performance Data Warehouse How to Build a High-Performance Data Warehouse By David J. DeWitt, Ph.D.; Samuel Madden, Ph.D.; and Michael Stonebraker, Ph.D. Over the last decade, the
Data Management in Cloud based Environment using k- Median Clustering Technique
Data Management in Cloud based Environment using k- Median Clustering Technique Kashish Ara Shakil Department of Computer Science Jamia Millia Islamia New Delhi, India Mansaf Alam Department of Computer
Telemetry Database Query Performance Review
Telemetry Database Query Performance Review SophosLabs Network Security Group Michael Shannon Vulnerability Research Manager, SophosLabs [email protected] Christopher Benninger Linux Deep Packet
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
How To Improve Performance In A Database
Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA. Technology, Coimbatore. Engineering and Technology, Coimbatore.
A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA 1 V.N.Anushya and 2 Dr.G.Ravi Kumar 1 Pg scholar, Department of Computer Science and Engineering, Coimbatore Institute
Scalable Multiple NameNodes Hadoop Cloud Storage System
Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai
extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
DISTRIBUTION OF DATA SERVICES FOR CORPORATE APPLICATIONS IN CLOUD SYSTEM
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE DISTRIBUTION OF DATA SERVICES FOR CORPORATE APPLICATIONS IN CLOUD SYSTEM Itishree Boitai 1, S.Rajeshwar 2 1 M.Tech Student, Dept of
Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015
Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor
DYNAMIC QUERY FORMS WITH NoSQL
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH
HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367
HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
EXPERIMENTAL EVALUATION OF NOSQL DATABASES
EXPERIMENTAL EVALUATION OF NOSQL DATABASES Veronika Abramova 1, Jorge Bernardino 1,2 and Pedro Furtado 2 1 Polytechnic Institute of Coimbra - ISEC / CISUC, Coimbra, Portugal 2 University of Coimbra DEI
NoSQL Database Options
NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Evaluation of NoSQL and Array Databases for Scientific Applications
Evaluation of NoSQL and Array Databases for Scientific Applications Lavanya Ramakrishnan, Pradeep K. Mantha, Yushu Yao, Richard S. Canon Lawrence Berkeley National Lab Berkeley, CA 9472 [lramakrishnan,pkmantha,yyao,scanon]@lbl.gov
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
USC Viterbi School of Engineering
USC Viterbi School of Engineering INF 551: Foundations of Data Management Units: 3 Term Day Time: Spring 2016 MW 8:30 9:50am (section 32411D) Location: GFS 116 Instructor: Wensheng Wu Office: GER 204 Office
In-Memory Data Management for Enterprise Applications
In-Memory Data Management for Enterprise Applications Jens Krueger Senior Researcher and Chair Representative Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University
Cassandra A Decentralized, Structured Storage System
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL
Cloud Data Management: A Short Overview and Comparison of Current Approaches
Cloud Data Management: A Short Overview and Comparison of Current Approaches Siba Mohammad Otto-von-Guericke University Magdeburg [email protected] Sebastian Breß Otto-von-Guericke University
Structured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
Communication System Design Projects
Communication System Design Projects PROFESSOR DEJAN KOSTIC PRESENTER: KIRILL BOGDANOV KTH-DB Geo Distributed Key Value Store DESIGN AND DEVELOP GEO DISTRIBUTED KEY VALUE STORE. DEPLOY AND TEST IT ON A
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar [email protected] 2 University of Computer
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
low-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD
International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol-1, Iss.-3, JUNE 2014, 54-58 IIST SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE
NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
Introduction to NOSQL
Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo
Which NoSQL Database? A Performance Overview
2014 by the authors; licensee RonPub, Lübeck, Germany. This article is an open access article distributed under the terms and conditions Veronika of the Creative Abramova, Commons Jorge Attribution Bernardino,
Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014
Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014 Agenda Why NoSQL? What are the key NoSQL architectures? How are they different from traditional RDBMS Systems? What types of problems do they
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
CloudDB: A Data Store for all Sizes in the Cloud
CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective
Big Automotive Data. Leveraging large volumes of data for knowledge-driven product development
Big Automotive Data Leveraging large volumes of data for knowledge-driven product development Mathias Johanson, Stanislav Belenki, Jonas Jalminger, Magnus Fant Alkit Communications AB Mölndal, Sweden {mathias,
A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011
A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example
What is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group [email protected] ABSTRACT We define analytic infrastructure to be the services,
How To Write A Database Program
SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store
A Distribution Management System for Relational Databases in Cloud Environments
JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 11, NO. 2, JUNE 2013 169 A Distribution Management System for Relational Databases in Cloud Environments Sze-Yao Li, Chun-Ming Chang, Yuan-Yu Tsai, Seth
The Quest for Extreme Scalability
The Quest for Extreme Scalability In times of a growing audience, very successful internet applications have all been facing the same database issue: while web servers can be multiplied without too many
One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008
One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone Michael Stonebraker December, 2008 DBMS Vendors (The Elephants) Sell One Size Fits All (OSFA) It s too hard for them to maintain multiple code
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
III. SYSTEM ARCHITECTURE
SQLMR : A Scalable Database Management System for Cloud Computing Meng-Ju Hsieh Institute of Information Science Academia Sinica, Taipei, Taiwan Email: [email protected] Chao-Rui Chang Institute of Information
DIB: DATA INTEGRATION IN BIGDATA FOR EFFICIENT QUERY PROCESSING
DIB: DATA INTEGRATION IN BIGDATA FOR EFFICIENT QUERY PROCESSING P.Divya, K.Priya Abstract In any kind of industry sector networks they used to share collaboration information which facilitates common interests
Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759
Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
CISC 432/CMPE 432/CISC 832 Advanced Database Systems
CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 [email protected] Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:
Referential Integrity in Cloud NoSQL Databases
Referential Integrity in Cloud NoSQL Databases by Harsha Raja A thesis submitted to the Victoria University of Wellington in partial fulfilment of the requirements for the degree of Master of Engineering
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15 2014 Amazon.com, Inc. and its affiliates. All rights
Benchmarking and Analysis of NoSQL Technologies
Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The
Advanced Data Management Technologies
ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
2.1.5 Storing your application s structured data in a cloud database
30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from
Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
NoSQL Databases. Polyglot Persistence
The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler
X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
Report Data Management in the Cloud: Limitations and Opportunities
Report Data Management in the Cloud: Limitations and Opportunities Article by Daniel J. Abadi [1] Report by Lukas Probst January 4, 2013 In this report I want to summarize Daniel J. Abadi's article [1]
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
Cleveland State University
Cleveland State University CIS 612 Modern Database Processing & Big Data (3-0-3) Fall 2015 Section 50 Class Nbr. 5378. Tues, Thu 4:30 5:45 PM Prerequisites: CIS 505 and CIS 530. CIS 611 Preferred. Instructor:
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012
Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation
Sharding by Hash Partitioning A database scalability pattern to achieve evenly sharded database clusters
Sharding by Hash Partitioning A database scalability pattern to achieve evenly sharded database clusters Caio H. Costa 1, João Vianney B. M. Filho 1, Paulo Henrique M. Maia 1, Francisco Carlos M. B. Oliveira
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12
Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language
Introduction to NoSQL
Introduction to NoSQL NoSQL Seminar 2012 @ TUT Arto Salminen What is NoSQL? Class of database management systems (DBMS) "Not only SQL" Does not use SQL as querying language Distributed, fault-tolerant
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
