Advanced Data Management Technologies
|
|
- Milton Joseph
- 8 years ago
- Views:
Transcription
1 ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE
2 ADMT 2014/15 Unit 15 J. Gamper 2/44 Outline 1 Motivation 2 NoSQL 3 Categories of NoSQL Datastores Key-Value Stores Column Stores Document Stores Graph Databases
3 Motivation ADMT 2014/15 Unit 15 J. Gamper 3/44 Outline 1 Motivation 2 NoSQL 3 Categories of NoSQL Datastores Key-Value Stores Column Stores Document Stores Graph Databases
4 Motivation ADMT 2014/15 Unit 15 J. Gamper 4/44 New Trends
5 Motivation ADMT 2014/15 Unit 15 J. Gamper 5/44 Big Data The Digital Age/1 IDC/EMC annual report The Diverse and Exploding Digital Universe : The worlds information is doubling every two years. In 2011 the world will create a staggering 1.8 zettabytes. By 2020 the world will generate 50 times the amount of information... while IT staff to manage it will grow less than 1.5 times. New information taming technologies such as deduplication, compression, and analysis tools are driving down the cost of creating, capturing, managing, and storing information to one-sixth the cost in 2011 in comparison to zettabyte = bytes = 1 bio. terabytes
6 Motivation ADMT 2014/15 Unit 15 J. Gamper 6/44 Big Data The Digital Age/2 The New York Stock Exchange generates about 1 terabyte of new trade data per day. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage. Ancestry.com, the genealogy site, stores around 2.5 petabytes of data. The Large Hadron Collider near Geneva will produce about 15 petabytes of data per year. But even an might produce a lot of data.
7 Motivation ADMT 2014/15 Unit 15 J. Gamper 7/44 3 V s of Big Data More V s are coming up: Veracity: accuracy and quality of data is difficult to control Value: it is important to turn big data it into value...
8 Motivation ADMT 2014/15 Unit 15 J. Gamper 8/44 RDBMSs The predominant choice in storing data up until now. First formulated in 1969 by Codd We are using RDBMS everywhere! BUT, are RDBMSs good in managing todays data?
9 Motivation ADMT 2014/15 Unit 15 J. Gamper 9/44 The Death of RDBMS?
10 Motivation ADMT 2014/15 Unit 15 J. Gamper 10/44 What is Wrong with RDBMSs? Nothing is wrong. They are great... SQL provides a rich, declarative language Database enforce referential integrity ACID properties are guaranteed Well understood by developers and administrators Support by many different languages
11 Motivation ADMT 2014/15 Unit 15 J. Gamper 11/44 ACID Properites Atomicity all or nothing Consistency any transaction will take the DB from one consistent state to another with no broken contraints (referential integrity) Isolation other operations cannot access data that has been modified during a transaction that has not yet completed Durability ability to recover the committed transaction updates against any kind of systems failure
12 Motivation ADMT 2014/15 Unit 15 J. Gamper 12/44 But there are some Problems with RDBMSs Problem: Complex objects Object/relational impedance mismatch Complicated to map rich domain model Performance issues: many rows in many tables, many joins,... Problem: Schema evolution Adding attributes to an object have to add columns to a table Expensive for large tables Holding locks on the tables for long time Problem: Semi-structered data Relational schema does not easily handle semi-structured data Common solutions Name/Value table: poor performance Serializable as Blob: fewer joins but no query capabilities Problem: Relational is hard to scale ACID does not scale well Easy to scale reads, but hard to scale writes
13 Motivation ADMT 2014/15 Unit 15 J. Gamper 13/44 One Size does not Fit All! There is nothing wrong with RDBMSs, but one size does not fit all! Alternative tools are available, just use the right tool. The rise of NoSQL databases marks the end of the era of relational database dominance. But NoSQL databases will not become the new dominators. Relational will still be popular, and used in the majority of situations. They, however, will no longer be the automatic choice.
14 NoSQL ADMT 2014/15 Unit 15 J. Gamper 14/44 Outline 1 Motivation 2 NoSQL 3 Categories of NoSQL Datastores Key-Value Stores Column Stores Document Stores Graph Databases
15 NoSQL ADMT 2014/15 Unit 15 J. Gamper 15/44 What is NoSQL? SQL or Not Only SQL or No to SQL? There is no standard definition! The term NoSQL was coined by Carlo Strozzi in 1998 In 2009 used by Eric Evans to refer to DBs which are non-relational, distributed and not conform to ACID. In 2009 first NoSQL conference Refers generally to data models that are non-relational, schema-free, non-(quite)-acid, horizontally scalable, distributed, easy replication support, simple API
16 NoSQL ADMT 2014/15 Unit 15 J. Gamper 16/44 Changing Requirements in the Web Age ACID properties are always desirable But, web applications have different needs from applications that RDBMS were designed for Low and predictable response time (latency) Scalability & elasticity (at low cost!) High availability Flexible schemas and semi-structured data Geographic distribution (multiple data centers) Web applications can (usually) do without Transactions, strong consistency, integrity Complex queries
17 NoSQL ADMT 2014/15 Unit 15 J. Gamper 17/44 CAP Theorem/1 Desired properties of web applications: Consistency the system is in a consistent state after an operation All clients see the same data Strong consistency (ACID) vs. eventual consistency (BASE) Availability the system is always on, no downtime Node failure tolerance all clients can find some available replica Software/hardware upgrade tolerance Partition tolerance the system continues to function even when split into disconnected subsets, e.g., due to network errors or addition/removal of nodes Not only for reads, but writes as well! CAP Theorem (E. Brewer, N. Lynch) In a shared-data system, at most 2 out of the 3 properties can be achieved at any given moment in time.
18 NoSQL ADMT 2014/15 Unit 15 J. Gamper 18/44 CAP Theorem/2 CA CP AP Single site clusters (easier to ensure all nodes are always in contact) e.g., 2PC When a partition occurs, the system blocks Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate e.g., sharded database System is still available under partitioning, but some of the data returned my be inaccurate i.e., availability and partition tolerance are more important than strict consistency e.g., DNS, caches, Master/Slave replication Need some conflict resolution strategy
19 NoSQL ADMT 2014/15 Unit 15 J. Gamper 19/44 BASE Properties Requirements regarding reliability, availability, consistency and durability are changing. For a growing number of applications, availability and partition tolerance are more important than strict consistency. These properties are difficult to achieve with ACID properties The BASE properties forfeit the ACID properties of consistency and isolation in favor of availability, graceful degradation, and performance BASE properties Basically Available an application works basically all the time; Soft-state does not have to be consistent all the time; Eventual consistency but will be in some known state eventually. i.e., an application works basically all the time (basically available), does not have to be consistent all the time (soft-state) but will be in some known state eventually (eventual consistency
20 NoSQL ADMT 2014/15 Unit 15 J. Gamper 20/44 BASE vs. ACID Should be considered as a spectrum between the two extremes rather than two altenatives excluding each other ACID BASE Strong consistency Weak consistency stale data OK Isolation Availability first Focus on commit Best effort Nested transactions Approximate answers OK Availability? Aggressive (optimistic) Conservative (pessimistic) Simpler! Difficult evolution (e.g., schema) Faster Easier evolution
21 NoSQL ADMT 2014/15 Unit 15 J. Gamper 21/44 NoSQL Pros and Cons Advantages Massive scalability (horizontal scalability), i.e., machines can be added/removed High availability Lower cost (than competitive solutions at that scale) (Usually) Predictable elasticity Schema flexibility, sparse & semi-structured data Quicker and cheaper to set up Disadvantages Limited query capabilities (so far) Eventual consistency is not intuitive to program Makes client applications more complicated No standardization Portability might be an issue Insufficient access control
22 Categories of NoSQL Datastores ADMT 2014/15 Unit 15 J. Gamper 22/44 Outline 1 Motivation 2 NoSQL 3 Categories of NoSQL Datastores Key-Value Stores Column Stores Document Stores Graph Databases
23 Categories of NoSQL Datastores ADMT 2014/15 Unit 15 J. Gamper 23/44 Categories of NoSQL Datastores Key-Value stores Simple K/V lookups (DHT) Column stores Each key is associated with many attributes (columns) NoSQL column stores are actually hybrid row/column stores Different from pure relational column stores! Document stores Store semi-structured documents (JSON) Map/Reduce based materialisation, sorting, aggregation, etc. Graph databases Not exactly NoSQL... Cannot satisfy the requirements for high availability and scalability/elasticity very well.
24 Categories of NoSQL Datastores ADMT 2014/15 Unit 15 J. Gamper 24/44 Focus of Different NoSQL Data Models
25 Categories of NoSQL Datastores ADMT 2014/15 Unit 15 J. Gamper 25/44 Comparison of SQL and NoSQL Data Models Data Model Performance Scalability Flexibility Complexity Functionality Key-value Stores high high high none variable (none) Column Store high high moderate low minimal Document Store high variable (high) high low variable (low) Graph Database variable variable high high graph theory Relational Database variable variable low moderate relational algebra
26 Categories of NoSQL Datastores Key-Value Stores ADMT 2014/15 Unit 15 J. Gamper 26/44 Key-Value Stores Simple data model: global collection of key-value pairs. Favor high scalability to handle massive data over consistency Rich ad-hoc querying and analytics features are mostly omitted (especially joins and aggregate operations are set aside). Simple API with put and get Key-value stores have existed for a long time, e.g., Berkeley DB. Recent developments have been inspired by Distributed Hashtables and Amazon s Dynamo DeCandia et al., Dynamo: Amazon s Highly Available Key-value Store, SOSP 07 Another important free and open-source key-value store is Voldemort. Multiple types In memory: Memcache On disk: Redis, SimpleDB Eventually consistent: Dynamo, Voldemort
27 Categories of NoSQL Datastores Key-Value Stores ADMT 2014/15 Unit 15 J. Gamper 27/44 Dynamo P2P key-value store at Amazon, 2007 Context and requirements at Amazon Infrastructure: tens of thousands of servers and network components located in many data centers around the world Commodity hardware is used, where component failure is the standard mode of operation Amazon uses a highly decentralized, loosely coupled, service oriented architecture consisting of hundreds of services Low latency and high throughput Simple query model: unique keys, blobs, no schema, no multi-access Scale out (elasticity) Simple API get(key): returning a list of objects and a context put(key, context, object): no return value Key and object values are not interpreted but handled as an opaque array of bytes
28 Categories of NoSQL Datastores Key-Value Stores ADMT 2014/15 Unit 15 J. Gamper 28/44 Voldemort/1 Key-value store initially developed for and still used at LinkedIn Inspired by Amazon s Dynamo Features Written in Java Simple data model and only simple and efficient queries no joins or complex queries no constraints on foreign keys etc. Performance of queries can be predicted well P2P Scale-out / elastic Consistent hashing of keyspace Eventual consistency / high availability Pluggable storage BerkeleyDB, In Memory, MySQL
29 Categories of NoSQL Datastores Key-Value Stores ADMT 2014/15 Unit 15 J. Gamper 29/44 Voldemort/2 API consists of three functions: get(key): returning a value object put(key, value): writing an object/value delete(key): deleting an object Keys and values can be complex, compound objects as well consisting of lists and maps
30 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 30/44 Column Stores Data model: each key is associated with multiple attributes (i.e., columns) Hybrid row/column store Inspired by Google BigTable Examples: BigTable, HBase, Cassandra
31 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 31/44 BigTable BigTable at Google, 2006 A distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers Observation Key-value pairs are a useful building block, but should not be the only one Design goal: data model should be richer than simple key-value pairs, and support sparse semi-structured data, but simple enough that it lends itself to a very efficient flat-file representation
32 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 32/44 BigTable Data Model/1 Sparse, distributed, persistent multidimensional sorted map Values are stored as arrays of bytes (strings) which are not interpreted Values are addressed by (row, column, timestamp) dimensions Example: Multidimensional sorted map with information that a web crawler might emit Flexible number of rows representing domains Flexible number of columns first column contains the content of the web page the others store link text from referring domains Every value has a timestamp
33 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 33/44 BigTable Data Model/2 Row Keys are arbitrary strings Data is sorted by row key Tablet Row range is dynamically partitioned into tablets (sequence of rows) Range scans are very efficient Row keys should be chosen to improve locality of data access Column, Column Family Column keys are arbitrary strings, unlimited number of columns Column keys can be grouped into families Data in a CF is stored and compressed together (Locality Groups) Access control on the CF level
34 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 34/44 BigTable Data Model/3 Timestamps Each cell has multiple versions Can be manually assigned Versioning Automated garbage collection Retain last N versions or versions newer than TS Architecture Data stored on GFS 1 Master server Thousands of Tablet servers
35 Categories of NoSQL Datastores Column Stores ADMT 2014/15 Unit 15 J. Gamper 35/44 BigTable Architecture Data is stored in a 3-level hierarchy similar to B + -trees Chubby file contains location of root tablet Root tablet contains all tablet locations in Metadata table Metadata table stores locations of actual tablets
36 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 36/44 Document Stores Similar to a key-value database, but with a major difference: value is a document. Inspired by Lotus Notes Flexible schema Any number of fields can be added Document mainly stored in JSON or BSON formats Example document: { } day: [ 2010, 01, 23 ], products: { apple: { price: 10 quantity: 6 } kiwi: { price: 20 quantity: 2 } } checkout: 100
37 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 37/44 CouchDB/1 Schema-free, document store DB Documents stored in JSON format (XML in old versions) B-tree storage engine MVCC model, no locking No joins, no PK/FK (UUIDs are auto assigned) Implemented in Erlang 1st version in C++, 2nd in Erlang and 500 times more scalable Replication (incremental) Documents UUID Old versions retained Custom persistent views using MapReduce RESTful HTTP interface
38 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 38/44 CouchDB/2 Main abstraction and data structure is a document Consist of named fields that have a key/name and a value Field name must be unique in document Value may be a string, number, boolean, date, ordered list, map References to other documents (URIs, URLs) are possible but not checked by the DB Example document Title : CouchDB, Last editor : , Last modified : 9/23/2010, Categories : [ Database, NoSQL, Document Database ], Body : CouchDB is a..., Reviewed : false
39 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 39/44 MongoDB Document store DB written in C++ Full index support Replication & high availability Supports ad-hoc querying Fast in-place updates Officially supported drivers available for multiple languages C, C++, Java, Javascript, Perla, PHP, Python, Ruby Map/Reduce GridFS Commercial support
40 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 40/44 MongoDB/2 A database resides on a MongoDB server A MongoDB database consists of one or more collections of documents Schema-free, i.e., documents in a collection may be heterogeneous Main abstraction and data structure is a document Comparable to an XML document or a JSON document Documents are stored in BSON Similar to JSON, but binary representation for efficiency reasons Example document: { title : MongoDB, last editor : , last modified : new Date ( 9/23/2010 ), body : MongoDB is a..., categories : [ Database, NoSQL, Document Database ], reviewed : false }
41 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 41/44 MongoDB Example Create a collection named mycoll with 10,000,000 bytes of preallocated disk space and no automatically generated and indexed document-field db.createcollection( mycoll, size: , autoindexid: false) Add a document into mycoll db.mycoll.insert(title: MongoDB, last editor:... ) Retrieve a document from mycoll db.mycoll.find(categories: [ NoSQL, Document Databases ])
42 Categories of NoSQL Datastores Document Stores ADMT 2014/15 Unit 15 J. Gamper 42/44 MongoDB Deployment
43 Categories of NoSQL Datastores Graph Databases ADMT 2014/15 Unit 15 J. Gamper 43/44 Graph Databases Data Model Nodes Relations Properties Inspired by Euler s graph theory Examples: Neo4j, InfiniteGraph
44 ADMT 2014/15 Unit 15 J. Gamper 44/44 Summary New trends emerged in the past decade: big data, complexity, connectivity, diversity, etc. New requirements: consistency, availability and partitioning tolerance. NoSQL provides flexible solution for such requirements. NoSQL taxonomy Key-value stores Column stores Document stores Graph databases Use the right data model for the right problem.
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
More informationPreparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
More informationSQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
More informationWhy NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
More informationCloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
More informationNoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
More informationAnalytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
More informationStructured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
More informationOverview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
More informationA COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
More informationSlave. Master. Research Scholar, Bharathiar University
Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually
More informationA Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011
A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example
More informationNoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
More informationwow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationnosql and Non Relational Databases
nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases
More informationNoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect
More informationDatabases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term 2014. Apologies to Martin Fowler ( NoSQL Distilled )
Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term 2014 Rise of Web and cluster-based computing NoSQL Movement Relationships vs. Aggregates Key-value store XML or JSON
More informationCloud data store services and NoSQL databases. Ricardo Vilaça Universidade do Minho Portugal
Cloud data store services and NoSQL databases Ricardo Vilaça Universidade do Minho Portugal Context Introduction Traditional RDBMS were not designed for massive scale. Storage of digital data has reached
More informationNoSQL. Thomas Neumann 1 / 22
NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,
More informationIntroduction to NOSQL
Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo
More informationMongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15
MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You
More informationNoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
More informationextensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
More informationComposite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
More informationBRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING
BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University
More informationComparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationA programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
More informationNot Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
More informationNOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE <GEOFF@ZORCHED.NET> @GEOFFLANE
NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE @GEOFFLANE WHAT IS NOSQL? NON-RELATIONAL DATA STORAGE USUALLY SCHEMA-FREE ACCESS DATA WITHOUT SQL (THUS... NOSQL) WIDE-COLUMN / TABULAR
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More informationBig Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
More informationCS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #13: NoSQL and MapReduce Announcements HW4 is out You have to use the PGSQL server START EARLY!! We can not help if everyone
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationStudy and Comparison of Elastic Cloud Databases : Myth or Reality?
Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri
More informationHow To Write A Database Program
SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store
More informationNoSQL Systems for Big Data Management
NoSQL Systems for Big Data Management Venkat N Gudivada East Carolina University Greenville, North Carolina USA Venkat Gudivada NoSQL Systems for Big Data Management 1/28 Outline 1 An Overview of NoSQL
More informationThe CAP theorem and the design of large scale distributed systems: Part I
The CAP theorem and the design of large scale distributed systems: Part I Silvia Bonomi University of Rome La Sapienza www.dis.uniroma1.it/~bonomi Great Ideas in Computer Science & Engineering A.A. 2012/2013
More informationMongoDB Developer and Administrator Certification Course Agenda
MongoDB Developer and Administrator Certification Course Agenda Lesson 1: NoSQL Database Introduction What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL Types of NoSQL
More informationMaking Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction
More informationthese three NoSQL databases because I wanted to see a the two different sides of the CAP
Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the
More informationВовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН
Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database
More informationIntroduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University
Introduction to NoSQL and MongoDB Kathleen Durant Lesson 20 CS 3200 Northeastern University 1 Outline for today Introduction to NoSQL Architecture Sharding Replica sets NoSQL Assumptions and the CAP Theorem
More informationDo Relational Databases Belong in the Cloud? Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com
Do Relational Databases Belong in the Cloud? Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com How do you model data in the cloud? Relational Model A query operation on a relation
More informationReferential Integrity in Cloud NoSQL Databases
Referential Integrity in Cloud NoSQL Databases by Harsha Raja A thesis submitted to the Victoria University of Wellington in partial fulfilment of the requirements for the degree of Master of Engineering
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationDomain driven design, NoSQL and multi-model databases
Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationApplication of NoSQL Database in Web Crawling
Application of NoSQL Database in Web Crawling College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China doi:10.4156/jdcta.vol5.issue6.31 Abstract
More informationSo What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
More informationBig Data Solutions. Portal Development with MongoDB and Liferay. Solutions
Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and
More informationDatabases 2 (VU) (707.030)
Databases 2 (VU) (707.030) Introduction to NoSQL Denis Helic KMI, TU Graz Oct 14, 2013 Denis Helic (KMI, TU Graz) NoSQL Oct 14, 2013 1 / 37 Outline 1 NoSQL Motivation 2 NoSQL Systems 3 NoSQL Examples 4
More informationNoSQL: Going Beyond Structured Data and RDBMS
NoSQL: Going Beyond Structured Data and RDBMS Scenario Size of data >> disk or memory space on a single machine Store data across many machines Retrieve data from many machines Machine = Commodity machine
More informationCloud Computing with Microsoft Azure
Cloud Computing with Microsoft Azure Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Azure's Three Flavors Azure Operating
More informationX4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
More informationNoSQL Evaluation. A Use Case Oriented Survey
2011 International Conference on Cloud and Service Computing NoSQL Evaluation A Use Case Oriented Survey Robin Hecht Chair of Applied Computer Science IV University ofbayreuth Bayreuth, Germany robin.hecht@uni
More information2.1.5 Storing your application s structured data in a cloud database
30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from
More informationNoSQL in der Cloud Why? Andreas Hartmann
NoSQL in der Cloud Why? Andreas Hartmann 17.04.2013 17.04.2013 2 NoSQL in der Cloud Why? Quelle: http://res.sys-con.com/story/mar12/2188748/cloudbigdata_0_0.jpg Why Cloud??? 17.04.2013 3 NoSQL in der Cloud
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationMaking Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014
Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014 Agenda Why NoSQL? What are the key NoSQL architectures? How are they different from traditional RDBMS Systems? What types of problems do they
More informationIn Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
More informationNoSQL Database Systems and their Security Challenges
NoSQL Database Systems and their Security Challenges Morteza Amini amini@sharif.edu Data & Network Security Lab (DNSL) Department of Computer Engineering Sharif University of Technology September 25 2
More informationNoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases
NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without
More informationUsing Object Database db4o as Storage Provider in Voldemort
Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how
More informationNoSQL Databases. Polyglot Persistence
The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler
More informationF1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013
F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords
More informationThe Quest for Extreme Scalability
The Quest for Extreme Scalability In times of a growing audience, very successful internet applications have all been facing the same database issue: while web servers can be multiplied without too many
More informationIntegrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
More information.NET User Group Bern
.NET User Group Bern Roger Rudin bbv Software Services AG roger.rudin@bbv.ch Agenda What is NoSQL Understanding the Motivation behind NoSQL MongoDB: A Document Oriented Database NoSQL Use Cases What is
More informationNOSQL DATABASES AND CASSANDRA
NOSQL DATABASES AND CASSANDRA Semester Project: Advanced Databases DECEMBER 14, 2015 WANG CAN, EVABRIGHT BERTHA Université Libre de Bruxelles 0 Preface The goal of this report is to introduce the new evolving
More informationCloud Computing Is In Your Future
Cloud Computing Is In Your Future Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Cloud Computing is Utility Computing Illusion
More informationHypertable Architecture Overview
WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for
More informationOpen source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
More informationBigtable is a proven design Underpins 100+ Google services:
Mastering Massive Data Volumes with Hypertable Doug Judd Talk Outline Overview Architecture Performance Evaluation Case Studies Hypertable Overview Massively Scalable Database Modeled after Google s Bigtable
More informationSentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
More informationNoSQL Database Options
NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has
More informationApplications for Big Data Analytics
Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:
More informationPractical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00
Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn
More informationINTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
More informationThe evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
More informationUnderstanding NoSQL Technologies on Windows Azure
David Chappell Understanding NoSQL Technologies on Windows Azure Sponsored by Microsoft Corporation Copyright 2013 Chappell & Associates Contents Data on Windows Azure: The Big Picture... 3 Windows Azure
More informationBenchmarking and Analysis of NoSQL Technologies
Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The
More informationA survey of big data architectures for handling massive data
CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context
More informationOpen Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
More informationCloudDB: A Data Store for all Sizes in the Cloud
CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective
More informationIntroduction to NoSQL
Introduction to NoSQL NoSQL Seminar 2012 @ TUT Arto Salminen What is NoSQL? Class of database management systems (DBMS) "Not only SQL" Does not use SQL as querying language Distributed, fault-tolerant
More informationbigdata Managing Scale in Ontological Systems
Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural
More informationUnderstanding NoSQL on Microsoft Azure
David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Data on Azure: The Big Picture... 3 Relational Technology: A Quick
More informationThe NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua
The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in
More informationNoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011
NoSQL - What we ve learned with mongodb Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011 DW2.0 and NoSQL management decision support intgrated access - local v. global - structured v.
More informationHBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367
HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive
More informationCluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.
Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one
More informationDatabase Management System Choices. Introduction To Database Systems CSE 373 Spring 2013
Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options
More informationNoSQL. What Is NoSQL? Why NoSQL?
SYSADMIN NoSQL GREG BURD Greg Burd is a Developer Advocate for Basho Technologies, makers of Riak. Before Basho, Greg spent nearly ten years as the product manager for Berkeley DB at Sleepycat Software
More informationMaking Sense of NoSQL Dan McCreary Ann Kelly
Sample Chapter Making Sense of NoSQL Dan McCreary Ann Kelly Chapter 1 Copyright 2013 Manning Publications brief contents PART 1 INTRODUCTION...1 1 NoSQL: It s about making intelligent choices 3 2 NoSQL
More informationHigh Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es
High Throughput Computing on P2P Networks Carlos Pérez Miguel carlos.perezm@ehu.es Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More information