Exploration of Non-Relational Database Models. Swayze Smartt. Department of Computer Science. Wake Forest University. Spring 2011 Honors Thesis


1 Smartt 1 Exploration of Non-Relational Database Models Swayze Smartt Department of Computer Science Wake Forest University Spring 2011 Honors Thesis Advised by Dr. Stan Thomas

Abstract

While relational databases have been popular for at least the last quarter-century, new non-relational models are quickly gaining popularity, largely due to the cloud computing movement and increased reliance on distributed computing environments. These new models, relying heavily on variations of the Google-inspired MapReduce functionality, promise new efficiencies and capabilities not available in older DBMS models. [1] This paper discusses the advantages and disadvantages of this new database paradigm and examines distributed database implementations currently on the market, including Amazon's SimpleDB, Google's Bigtable, and Apache CouchDB. The paper also discusses the importance of this new database model and its likely ascent as the preferred model for distributed environments.

1. Introduction

Since the 1970s, the relational database and the associated entity-relationship models have together been the standard for database development. [2] Recent trends in both software and hardware have opened up exploration of a new non-relational database model, relying on key-value pairs instead of the combination of primary and foreign keys stored in the familiar table format. [3] This new model, driven in large part by the cloud computing movement, promises a more efficient and scalable database structure, with faster processing through the powerful MapReduce function. [4] Despite its many benefits, the non-relational model has several drawbacks, the most notable being its frequent inability to structure data in a semantically meaningful way.

[1] Bhat and Jadhav, p.; [2] Harrington, p.; [3] Dean and Ghemawat, p.; [4] Dean and Ghemawat, p. 77

Regardless of whether the non-relational model becomes the new database standard, an unlikely event, the new model already has widespread use in specific applications and is projected to become more popular based on computing trends. Given these trends, a well-informed database administrator should have at least a basic understanding of the non-relational model and its desirability in many situations.

2. The Relational Database Model

2.1 Overview

The relational database model is over 40 years old, developed in 1970 at IBM by Dr. Edgar "Ted" Codd as an extremely intuitive way to store, process, and query data. [5] The model follows a common way of displaying data, in tables (or entities). The tables have rows (also called records or tuples) with various columns (also called fields or attributes) and relationships to other tables.

2.2 Normalization Rules

Relational DBMS design is governed by various rules for creating efficient, low-redundancy databases. Following a sequential process of normalization eliminates the chance of data anomalies occurring when queries update and delete data. [7] This structuring of the database allows for an additional layer of data integrity independent of any program logic. This distinction is important, as implementing the same integrity constraints in a non-relational model is much more difficult. [8] Normalization, however, also poses potential problems, as the process can be complex for large sets of data.

[5] Burleson, p.; Harrington, p.; Bhat and Jadhav, p.

3. The Non-Relational Model

3.1 Overview

To database administrators and many others familiar with the traditional relational model of database structure, the non-relational model may not seem like a database at all. One of the most ubiquitous applications of this paradigm is Google's massive archive of the internet, which reportedly takes up petabytes of space. [9] According to Stonebraker, "all the major Web-search engines use home-brew text software to serve us search results. None use relational DBMSs." [10] Several features, discussed below, make the non-relational model a desirable choice in database development.

3.2 Key-value Pairs and MapReduce

The fundamental relationship present in a non-relational database is the key-value relationship, in which some index, the key, is associated with a data item or set of data items, the value. The simplicity of this relationship allows for faster processing of data as compared to the traditional relational model.

Although not a database implementation per se, Google's MapReduce technique implements two simple functions, map and reduce, which allow distributed programming to occur automatically without the programmer having any specific knowledge of the underlying architecture or implementation. [11] The map function performs some operation on a key-value pair, generating intermediate key-value pairs. The reduce function then performs an operation on the set of intermediate values that share a key, consolidating them into a single value associated with that key. The most common illustration of the MapReduce function is an algorithm that counts the occurrences of each word in a document.

[9] Chang et al., p.; [10] Stonebraker, p.; [11] Dean and Ghemawat, p. 72

Figure 1: MapReduce Illustration*

    Function map(String documentName, String document) {
        for each word w in document:
            EmitIntermediate(w, 1);
    }

    Function reduce(String word, Iterator partialCounts) {
        int result = 0;
        for each partialCount in partialCounts:
            result += partialCount;
        Emit(result);
    }

*Adapted from Dean and Ghemawat

The implementation of MapReduce is incredibly simple given the complexity of what actually takes place at the hardware level. Google's Dean and Ghemawat eloquently describe this seemingly magical process of automatic parallelization:

"MapReduce automatically parallelizes and executes the program on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing required intermachine communication." [12]

Because MapReduce is so simple to use, it has grown in popularity among programmers who have little experience with parallelization but want to benefit from its superior performance over non-parallel relational models.

[12] Dean and Ghemawat, p. 72

4. Comparison of Different Non-Relational Databases
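Before turning to specific systems, the word count of Figure 1 can be made concrete. The following is a minimal single-process sketch in Python, not Google's implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the grouping step that a real runtime distributes across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(document_name, document):
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1


def reduce_fn(word, partial_counts):
    """Sum the partial counts collected for one word."""
    return word, sum(partial_counts)


def run_mapreduce(documents):
    """Hypothetical driver: run map, group values by key, then run reduce."""
    grouped = defaultdict(list)
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


counts = run_mapreduce({"a.txt": "the quick fox", "b.txt": "The lazy dog the end"})
print(counts["the"])  # 3
```

The programmer writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands in for the work the quoted passage attributes to the runtime system.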

This discussion will next look at three popular non-relational models and examine the advantages and disadvantages of each.

4.1 CouchDB

CouchDB is an open source distributed database management system that uses a RESTful HTTP API and stores data in the JavaScript Object Notation (JSON) format. [13] Couch is an acronym for Cluster of Unreliable Commodity Hardware, emphasizing its commitment both to being a distributed DBMS and to fault tolerance despite the use of commodity hardware. [14] As previously alluded to, CouchDB, like other non-relational DBMSs, is schema-less and requires that relationships between data objects be defined by the developer at a higher level than is typical of a relational database. [15] CouchDB uses views, which are server-side JavaScript functions, to enforce relationship constraints defined by the developer. [16] This schema-less system allows more flexibility and efficiency in data storage and processing; however, it potentially sacrifices data integrity if constraints are not properly enforced by the high-level developer.

CouchDB Structure

CouchDB databases work by storing documents, each with a unique ID and a revision number; all data in a CouchDB database is stored within one of these documents. [17] When updating, CouchDB simply increments the version number of a document. Concurrent updating is allowed; however, data control is lockless, meaning update conflict resolutions are all-or-nothing.

Bhat and Jadhav, p.; Leff and Rayfield, p.; Bhat and Jadhav, p.
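The lockless update behavior described above can be illustrated with a small in-memory sketch. This is not CouchDB's API: the `DocumentStore` class, its `put` and `get` methods, the `ConflictError` exception, and the integer revision counter are all hypothetical stand-ins chosen for illustration. The sketch shows only the all-or-nothing rule, in which a write is accepted when it names the current revision and rejected otherwise.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class DocumentStore:
    """Hypothetical in-memory store keyed by document ID."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=0):
        current_rev = self._docs.get(doc_id, (0, None))[0]
        if expected_rev != current_rev:
            # No locks are taken; the stale writer simply loses.
            raise ConflictError(
                f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
            )
        new_rev = current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocumentStore()
rev = store.put("doc1", {"name": "Wake Forest"})      # creates revision 1
store.put("doc1", {"name": "WFU"}, expected_rev=rev)  # accepted, revision 2
try:
    # A second writer still holding revision 1 tries to update.
    store.put("doc1", {"name": "stale"}, expected_rev=rev)
    outcome = "accepted"
except ConflictError:
    outcome = "rejected"
print(outcome)  # rejected
```

The rejected writer has to re-read the document and retry, which is the cost of avoiding locks.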

Figure 2: CouchDB Implementation [diagram labels: HTTP Request; CouchDB Engine; Replica Databases #1, #2, and #3; Documents Doc1 and Doc2 in each replica]

CouchDB replicates a database across multiple hosts, each with complete read/write access to the database. Conflicts are automatically resolved, and all prior versions of documents are retained and routinely compacted.

CouchDB Advantages

CouchDB is an extremely simple implementation of a non-relational database that allows for an easy and intuitive way to store data in semantically significant documents. The DBMS also allows much more flexibility in adding and removing attributes in documents, as well as having different attributes across documents in the same database. This flexibility minimizes the need for complex design decisions in implementing a database.

CouchDB Disadvantages

Besides the typical disadvantages associated with non-structured data, the lockless read/write controls allow conflicts to occur frequently. While this is a purposeful decision that allows substantial performance improvements over traditional locking methods, the developer must plan for conflicts: when two users update the same document simultaneously, one of them will receive an error for attempting to update a dirty document, and that user's changes will not be committed to the database.

[19] Apache CouchDB: Introduction, online

4.2 Amazon's SimpleDB

Amazon's SimpleDB is a highly scalable database-as-a-service implementation of a non-relational database. Although non-relational, SimpleDB still allows a developer to follow some of the traditional rules of the relational model. In SimpleDB, traditional tables are called domains, with rows of data called items and corresponding columns called attributes.

Amazon's SimpleDB Structure

Although similar to the traditional database model, SimpleDB deviates in many ways from the rules of relational databases. First normal form prevents two values from being stored in one column of data, while SimpleDB allows this by design. Furthermore, two items, even in the same domain, can have different attributes. [21] This would be analogous to a typical entity-relationship database allowing different columns for each row in one table. SimpleDB indexes domains automatically to allow for efficient querying without much concern for how the underlying data are stored. [22] The schema can also change as the need for the database changes, a characteristic of the SimpleDB model that would be almost impossible to implement in traditional relational databases. [23] Should a new need arise for the database to store a completely different attribute, a simple query allows this attribute to be stored in the relevant items without disturbing the structure of the database or the efficiency provided by the indices.

Amazon's SimpleDB Advantages

Amazon's SimpleDB integrates well with existing Amazon Web Services (AWS) and allows for completely cloud-based development solutions that extend well beyond database implementation. Usage of SimpleDB follows the same pay-as-you-go model as other software-as-a-service products. [24] As SimpleDB is very similar to the open source CouchDB, the advantages discussed earlier are also relevant here.

Amazon's SimpleDB Disadvantages

Some administrators may have concerns about entrusting data to a third-party cloud-based solution, and these concerns are well founded given recent problems Amazon has faced with its web services. [25] Aside from these concerns, having unstructured data means that the integrity of data must be preserved outside of the database environment, when writing to the database. When querying data, aggregate functions may be more difficult to implement because the syntax supported by SimpleDB is not traditional SQL. Frequently, data must first be queried from the database and then processed by the application to arrive at a desired aggregate value. The disadvantages associated with CouchDB are also relevant here.

4.3 Google's BigTable

Responding to the increasing petabytes of data gathered as it archived the World Wide Web, Google created BigTable, a reliable and highly scalable non-relational database-like storage system. [26] A number of features make Google's BigTable a good test case for the effectiveness of non-relational models in specific instances.

BigTable Structure

The BigTable data model is essentially a three-dimensional version of the typical database table, with loose schema requirements. One dimension stores both the webpage contents and the anchors that link to a particular page, another dimension stores other websites (arranged in alphabetical order, allowing webpages from the same website to be stored closely in memory), and the final dimension stores the cache of prior versions of a particular website.

Figure 3: BigTable Structure* [diagram labels: Other websites; row key edu.wfu.www; Webpage contents; Websites that link to wfu.edu; cell contents <html>, anchor:collegeboard.com/wfu, anchor:wakesg.com, Wake Forest University, WFU Homepage]
*Adapted from Chang et al.

BigTable Advantages

Most of the advantages associated with BigTable relate to the efficiency of queries and storage. Because data is stored contiguously in tablets based on some commonality in the data, fewer queries need to be performed to retrieve information.

BigTable Disadvantages

While BigTable is desirable for large-scale database implementations, the expense associated with setting up a distributed environment may not be justified for small-scale implementations. As with the other non-relational models, data integrity can be an issue. BigTable does have some structure, however, as data types are supported for each attribute.

[20] Kavanagh, online; [24] Amazon SimpleDB Pricing, online; [25] Goldman, online; [26] Chang et al., p. 2

5. Advantages of the Non-Relational Model

5.1 Efficiency

Perhaps the greatest impetus behind the growing popularity of the non-relational model is the increased efficiency associated with the structuring of the data. In many cases, these efficiencies can be substantial.

Row vs. Column Stores

Row stores are characterized by the physical data relating to a record's attributes being stored contiguously in memory. [27] This storage technique is common to typical relational DBMSs and prioritizes the speed of writing data over reading data. From the row store diagram, it is easy to understand why this storage technique is more effective at writing data, as data would commonly be written in logical groupings that follow a record's attributes.

Figure 4: Row Store Physical Data Storage [diagram: attributes employeeid, fname, lname; first names John, Bob, Sue, Tim, Steve; last names Robinson, Johnson, Peterson, Mead, Bandow]

Figure 5: Column Store Physical Data Storage [diagram: attributes employeeid, fname, lname; first names John, Bob, Sue, Tim, Steve; last names Robinson, Johnson, Peterson, Mead, Bandow]

However, in applications where the attributes of data are constantly changing, it is equally understandable why row stores would not be desirable. Consider the impact of adding another attribute, for example phoneno, to our sample database above. To maintain the physical grouping of the data, a new portion of memory must be allocated every n bytes of data, creating complexity. Another issue with this storage technique is that the entirety of each tuple must be brought into memory for a given query, including the irrelevant attributes. [28]

Column stores, in contrast, are characterized by the values of each attribute being stored contiguously in memory, prioritizing read operations. [29] Most commonly implemented in data warehousing systems, this storage technique allows much faster processing of queries and has the added benefit of holding in memory only those attributes relevant to the query. Many non-relational models organize data in this manner, focusing on the efficiency of querying the data rather than organizing the data in a way that necessarily follows the logical structure of the record and its attributes. This implementation allows new attributes to be added as database needs change without much disruption of the physical storage. In data warehousing, for example, the non-relational column store is reported to be 50 times faster than a relational row store. [30] The relational approach requires every column to be read, while the non-relational approach allows only the columns relevant to the query to be read. While relational databases with column store implementations do exist, namely among newer DBMSs, most legacy systems in place today still rely on code written in the 1980s, which implements the row store technique.

Hashing

Efficiencies extend beyond data warehousing to many other applications. Even the most basic web crawling algorithms that utilize non-relational databases for storage are at least two orders of magnitude faster than the relational databases marketed by major vendors. [32] The discussion of Google's BigTable earlier provides support for this claim, as efficiencies in hashing allow faster lookups of data. Instead of storing all attributes contiguously, BigTable will store references to the final location of a data item on disk, allowing more efficient read-optimized disks to be used.

5.2 Scalability

While relational databases do scale well up to a certain point, they are limited when expansion needs grow beyond one server. [33] As more and more servers are added, the relational model becomes increasingly complex, as database administrators must carefully plan how to properly balance system demands across hundreds or thousands of servers.

[27] Stonebraker et al, p.; [30] Stonebraker, p.; Bain, online

Non-relational models, on the other hand, are designed to scale well by implementing the key-value paradigm. Because all data associated with a particular key is stored together on the same server, there is no need to join tables across systems, dramatically reducing the complexity of adding new servers.

5.3 Simplicity

The final major benefit of the non-relational database model is its general simplicity. Because non-relational DBMSs are schema-less, there is no need to go through the normalization process or even to know what the final database will look like. All of the non-relational models discussed previously allow new attributes to be added by design without compromising the structure of the database. Furthermore, because normalization is not required, complex relationships do not need to be planned out as in the typical relational model. Because the structure of databases must often evolve to meet changing needs, this feature is especially important.

6. Disadvantages of the Non-Relational Model

6.1 Lack of Structure/Data Integrity

Because non-relational databases share data across different application platforms, data integrity is difficult to enforce and normalization is often sacrificed for performance. [34] While having a loosely defined structure may be beneficial in certain situations, constraints exist in the relational model to preserve the integrity of the data. With multiple applications performing frequent read/write queries against a database, ensuring that all of these applications properly preserve the integrity of the database can prove difficult. Instead of database integrity being controlled at the database administrator level, it is now controlled at the developer level, which could lead to problems in standardizing data.

[34] Bhat and Jadhav, p.
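One way to picture integrity enforcement moving to the developer level is a validation step that every application must run before writing. The sketch below is an illustration under stated assumptions, not a feature of any system discussed here: `REQUIRED_FIELDS`, `validate`, and `safe_put` are hypothetical names, the store is a plain dictionary, and the sample values are invented. The field names follow the employee example used in the row store discussion.

```python
# Hypothetical integrity rules that a relational schema would otherwise enforce.
REQUIRED_FIELDS = {"employeeid": int, "fname": str, "lname": str}


def validate(record):
    """Return a list of integrity problems found in one record."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


def safe_put(store, key, record):
    """Write only records that pass validation; the store itself checks nothing."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    store[key] = record


store = {}
# Extra attributes such as phoneno are accepted, as in a schema-less store.
safe_put(store, "emp:1", {"employeeid": 1, "fname": "John",
                          "lname": "Robinson", "phoneno": "555-0100"})
print(validate({"employeeid": "2", "fname": "Bob"}))
# ['employeeid should be int', 'missing lname']
```

Any application that writes to the dictionary directly bypasses `safe_put`, which is the standardization risk described above.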

6.2 Implementation/Migration

As a more practical concern, most major corporations, academic institutions, and government agencies have well-established database implementations. Migrating from a relational DBMS to a non-relational DBMS is a difficult process, and one which may deter many from even attempting it. Before considering conversion, these organizations must weigh the costs associated with the transition against the benefits provided by the non-relational model to determine whether such a move is practical. Although the transition may be difficult, because non-relational models are much more liberal than their relational counterparts, transitioning from a relational model to a non-relational one may prove less difficult than even the transition between two relational models.

7. Conclusion

Recent trends in both cloud and distributed computing are making the non-relational model more desirable. The need to scale quickly, disassociate hardware from the data model, and provide more efficient databases are all contributing factors in this transition. When demands outside of the relational realm have come about in the past, administrators pursued less-than-desirable bolt-on approaches. The most important aspect of the non-relational database movement has been the many varieties of databases that are available to developers outside the legacy systems. Developers no longer have to settle for the relational model when data needs dictate a different approach to storage.

While the relational model will likely continue to exist for the foreseeable future, the non-relational model will only grow in popularity. As demands for performance and scalability increase, variations of the key-value database will continue to evolve.

Works Cited

Amazon SimpleDB Pricing. Amazon Web Services. Web. 25 April

Apache CouchDB: Introduction. Apache Software Foundation. Web. 25 April

Bain, Tony. Is the Relational Database Doomed? Readwriteweb.com. ReadWrite Enterprise, 12 February Web. 24 April

Bhat, Uma and Shraddha Jadhav. Moving Towards Non-Relational Databases. International Journal of Computer Applications 1.13 (2010):

Burleson, Donald. Inside the Database Object Model. Boca Raton: CRC Press,

Chang, Fay, et al. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems 26.2/4 (2008):

Dean, Jeffrey and Sanjay Ghemawat. MapReduce: A Flexible Data Processing Tool. Communications of the ACM (2010) 53.1:

Goldman, David. Why Amazon's Cloud Titanic Went Down. CNN.com. CNN, 22 April Web. 24 April

Harrington, Jan. Relational Database Design and Implementation: Clearly Explained. Burlington: Morgan Kaufmann Publishers,

Kavanagh, David. Relating to Amazon SimpleDB. March 4, Web. 28 April

Leff, Avraham and James Rayfield. EDS: An Elastic Data-Service for Situational Applications. IEEE International Conference on Web Services (2010).

Stonebraker, Mike. Saying Good-bye to DBMSs. Communications of the ACM 52.9 (2010):

Stonebraker, Mike et al. C-Store: A Column-oriented DBMS. Proceedings of the 31st VLDB Conference (2005):


More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

Cloud computing doesn t yet have a

Cloud computing doesn t yet have a The Case for Cloud Computing Robert L. Grossman University of Illinois at Chicago and Open Data Group To understand clouds and cloud computing, we must first understand the two different types of clouds.

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

2.1.5 Storing your application s structured data in a cloud database

2.1.5 Storing your application s structured data in a cloud database 30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from

More information

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design

More information

wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350

More information

MapReduce (in the cloud)

MapReduce (in the cloud) MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

Relational Database Basics Review

Relational Database Basics Review Relational Database Basics Review IT 4153 Advanced Database J.G. Zheng Spring 2012 Overview Database approach Database system Relational model Database development 2 File Processing Approaches Based on

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

NoSQL Database Options

NoSQL Database Options NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

More information

Using a Data Warehouse as Part of a General Business Process Data Analysis System

Using a Data Warehouse as Part of a General Business Process Data Analysis System Claremont Colleges Scholarship @ Claremont CMC Senior Theses CMC Student Scholarship 2016 Using a Data Warehouse as Part of a General Business Process Data Analysis System Amit Maor Claremont McKenna College

More information

16.1 MAPREDUCE. For personal use only, not for distribution. 333

16.1 MAPREDUCE. For personal use only, not for distribution. 333 For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several

More information

Certified Apache CouchDB Professional VS-1045

Certified Apache CouchDB Professional VS-1045 Certified Apache CouchDB Professional VS-1045 Certified Apache CouchDB Professional Certification Code VS-1045 Vskills certification for Apache CouchDB Professional assesses the candidate for couchdb database.

More information

Trafodion Operational SQL-on-Hadoop

Trafodion Operational SQL-on-Hadoop Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Microsoft Azure Data Technologies: An Overview

Microsoft Azure Data Technologies: An Overview David Chappell Microsoft Azure Data Technologies: An Overview Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Blobs... 3 Running a DBMS in a Virtual Machine... 4 SQL Database...

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do. Database Management Systems Chapter 1 Mirek Riedewald Many slides based on textbook slides by Ramakrishnan and Gehrke 1 Logistics Go to http://www.ccs.neu.edu/~mirek/classes/2010-f- CS3200 for all course-related

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

CloudDB: A Data Store for all Sizes in the Cloud

CloudDB: A Data Store for all Sizes in the Cloud CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective

More information

Domain driven design, NoSQL and multi-model databases

Domain driven design, NoSQL and multi-model databases Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Moving Towards Non-Relational Databases

Moving Towards Non-Relational Databases Moving Towards Non-Relational Databases Uma Bhat Usha Mittal Institute of Technology, SNDT Women s University, Santacruz (W), Mumbai, 400049 Shraddha Jadhav Usha Mittal Institute of Technology, SNDT Women

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce

Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Exploring the Efficiency of Big Data Processing with Hadoop MapReduce Brian Ye, Anders Ye School of Computer Science and Communication (CSC), Royal Institute of Technology KTH, Stockholm, Sweden Abstract.

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

The Sierra Clustered Database Engine, the technology at the heart of

The Sierra Clustered Database Engine, the technology at the heart of A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel

More information

Introduction to Hadoop

Introduction to Hadoop 1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools

More information

This paper defines as "Classical"

This paper defines as Classical Principles of Transactional Approach in the Classical Web-based Systems and the Cloud Computing Systems - Comparative Analysis Vanya Lazarova * Summary: This article presents a comparative analysis of

More information

Hadoop and NoSQL Basics: Big Data Demystified. NYS Innovation Summit, 12/17/2013. Matt LeMay, @mattlemay

Hadoop and NoSQL Basics: Big Data Demystified. NYS Innovation Summit, 12/17/2013. Matt LeMay, @mattlemay Hadoop and NoSQL Basics: Big Data Demystified NYS Innovation Summit, 12/17/2013 Matt LeMay, @mattlemay When I want people to think I m smart, I just say HADOOP really loud. Hadoop! There it is. Big Data!

More information

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011 A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example

More information

Cloud Based Distributed Databases: The Future Ahead

Cloud Based Distributed Databases: The Future Ahead Cloud Based Distributed Databases: The Future Ahead Arpita Mathur Mridul Mathur Pallavi Upadhyay Abstract Fault tolerant systems are necessary to be there for distributed databases for data centers or

More information

Big Data Storage, Management and challenges. Ahmed Ali-Eldin

Big Data Storage, Management and challenges. Ahmed Ali-Eldin Big Data Storage, Management and challenges Ahmed Ali-Eldin (Ambitious) Plan What is Big Data? And Why talk about Big Data? How to store Big Data? BigTables (Google) Dynamo (Amazon) How to process Big

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Data Consistency on Private Cloud Storage System

Data Consistency on Private Cloud Storage System Volume, Issue, May-June 202 ISS 2278-6856 Data Consistency on Private Cloud Storage System Yin yein Aye University of Computer Studies,Yangon yinnyeinaye.ptn@email.com Abstract: Cloud computing paradigm

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Delivering Real-World Total Cost of Ownership and Operational Benefits

Delivering Real-World Total Cost of Ownership and Operational Benefits Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought

More information

Big Table A Distributed Storage System For Data

Big Table A Distributed Storage System For Data Big Table A Distributed Storage System For Data OSDI 2006 Fay Chang, Jeffrey Dean, Sanjay Ghemawat et.al. Presented by Rahul Malviya Why BigTable? Lots of (semi-)structured data at Google - - URLs: Contents,

More information

RDF graph Model and Data Retrival

RDF graph Model and Data Retrival Distributed RDF Graph Keyword Search 15 2 Linked Data, Non-relational Databases and Cloud Computing 2.1.Linked Data The World Wide Web has allowed an unprecedented amount of information to be published

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

How To Improve Performance In A Database

How To Improve Performance In A Database Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed

More information

Relational Databases in the Cloud

Relational Databases in the Cloud Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Data Management in the Cloud -

Data Management in the Cloud - Data Management in the Cloud - current issues and research directions Patrick Valduriez Esther Pacitti DNAC Congress, Paris, nov. 2010 http://www.med-hoc-net-2010.org SOPHIA ANTIPOLIS - MÉDITERRANÉE Is

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

Understanding NoSQL on Microsoft Azure

Understanding NoSQL on Microsoft Azure David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick

More information

nosql and Non Relational Databases

nosql and Non Relational Databases nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases

More information

Understanding NoSQL Technologies on Windows Azure

Understanding NoSQL Technologies on Windows Azure David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure

More information

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12 Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language

More information

How To Write A Database Program

How To Write A Database Program SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store

More information

Raima Database Manager Version 14.0 In-memory Database Engine

Raima Database Manager Version 14.0 In-memory Database Engine + Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

More information

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Analysis and Research of Cloud Computing System to Comparison of

More information

SWIFT. Page:1. Openstack Swift. Object Store Cloud built from the grounds up. David Hadas Swift ATC. HRL davidh@il.ibm.com 2012 IBM Corporation

SWIFT. Page:1. Openstack Swift. Object Store Cloud built from the grounds up. David Hadas Swift ATC. HRL davidh@il.ibm.com 2012 IBM Corporation Page:1 Openstack Swift Object Store Cloud built from the grounds up David Hadas Swift ATC HRL davidh@il.ibm.com Page:2 Object Store Cloud Services Expectations: PUT/GET/DELETE Huge Capacity (Scale) Always

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK OVERVIEW ON BIG DATA SYSTEMATIC TOOLS MR. SACHIN D. CHAVHAN 1, PROF. S. A. BHURA

More information

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information