26/05/2015. Relational Databases BIG DATA: STORING STRUCTURED INFORMATION. Information Retrieval: Storing Unstructured Information

Size: px
Start display at page:

Download "26/05/2015. Relational Databases BIG DATA: STORING STRUCTURED INFORMATION. Information Retrieval: Storing Unstructured Information"

Transcription

1 CC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2015 Information Retrieal: Storing Unstructured Information Lecture 9: NoSQL I Aidan Hogan aidhog@gmail.com Relational Databases BIG DATA: STORING STRUCTURED INFORMATION Relational Databases: One Size Fits All? 1

2 RDBMS: Performance Oerheads Structured Query Language (SQL): Declaratie Language Lots of Rich Features Difficult to Optimise! Atomicity, Consistency, Isolation, Durability (ACID): Maes sure your database stays correct Een if there s a lot of traffic! Transactions incur a lot of oerhead Multi-phase locs, multi-ersioning, write ahead logging Distribution not straightforward Transactional oerhead: the cost of ACID 640 tps for system with transactional support 12,700 tps for system without logs, transactions or loc scheduling RDBMS: Complexity ALTERNATIVES TO RELATIONAL DATABASES FOR QUERYING BIG STRUCTURED DATA? NoSQL What do you guys now about NoSQL? The Database Landscape Batch analysis of data Not using the relational model Using the relational model Real-time Stores documents (semi-structured alues) Not only SQL Maps Column Oriented Graph-structured data Relational Databases with focus on scalability to compete with NoSQL while maintaining ACID Cloud storage In-Memory 2

3 NoSQL NoSQL: CAP (not ACID) NoSQL CA: Guarantees to gie a correct response but only while networ wors fine (Centralised / Traditional) C CP: Guarantees responses are correct een if there are networ failures, but response may fail (Wea aailability) A P (No intersection) AP: Always proides a best-effort response een in presence of networ failures (Eentual consistency) Distributed! Sharding: splitting data oer serers horizontally Replication Lower-leel than RDBMS/SQL Simpler ad hoc APIs But you build the application (programming not querying) Operations simple and cheap Different flaours (for different scenarios) Different CAP emphasis Different scalability profiles Different query functionality Different data models The Database Landscape Not using the relational model Batch analysis of data Using the relational model Real-time NOSQL: KEY VALUE STORE Stores documents (semi-structured alues) Not only SQL Maps Column Oriented Graph-structured data Relational Databases with focus on scalability to compete with NoSQL while maintaining ACID Cloud storage In-Memory 3

4 Key Value Store Model It s just a Map / Associate Array put(ey,alue) get(ey) delete(ey) Key Afghanistan Albania Algeria Andorra la Vella Angola Antigua and Barbuda Value Kabul Tirana Algiers Andorra la Vella Luanda St. John s. But You Can Do a Lot With a Map Key country:afghanistan country:albania city:kabul city:tirana user:10239 Value capital@city:kabul,continent:asia,pop: #2011 capital@city:tirana,continent:europe,pop: #2013 country:afghanistan,pop: #2013 country:albania,pop: #2013 basedin@city:tirana,post:{103,10430,201} actually you can model any data in a map (but possibly with a lot of redundancy and inefficient looups if unsorted). Products Listings: prices, details, stoc THE CASE OF AMAZON Customer info: shopping cart, account, etc. Recommendations, etc.: 4

5 Amazon customers: Key Value Store: Amazon Dynamo(DB) Databases struggling But many Amazon serices don t need: SQL (a simple map often enough) or een: transactions, strong consistency, etc. Goals: Scalability (able to grow) High aailability (reliable) Performance (fast) Don t need full SQL, don t need full ACID Key Value Store: Distribution Key Value Store: Distribution How might a ey alue store be distributed oer multiple machines? What happens if a machine joins or leaes half way through? Or a custom partitioner Or a custom partitioner 5

6 Key Value Store: Distribution How can we sole this? Or a custom partitioner Consistent Hashing Aoid re-hashing eerything Hash using a ring Each machine pics n psuedo-random points on the ring Machine responsible for arc after its point If a machine leaes, its range moes to preious machine If machine joins, it pics new points Objects mapped to ring How many eys (on aerage) need to be moed if a machine joins or leaes? Amazon Dynamo: Hashing Consistent Hashing (128-bit MD5) Key Value Store: Replication A set replication factor (here 3) Commonly primary / secondary replicas Primary replica elected from secondary replicas in the case of failure of primary A1 B1 C1 D1 E1 Amazon Dynamo: Replication Replication factor of n Easy: pic n next bucets (different machines!) Amazon Dynamo: Object Versioning Object Versioning (per bucet) PUT doesn t oerwrite: pushes ersion GET returns most recent ersion 6

7 Amazon Dynamo: Object Versioning Object Versioning (per bucet) DELETE doesn t wipe GET will return not found Amazon Dynamo: Object Versioning Object Versioning (per bucet) GET by ersion Amazon Dynamo: Object Versioning Object Versioning (per bucet) PERMANENT DELETE by ersion wiped Amazon Dynamo: Model Named table with primary ey and a alue Primary ey is hashed / unordered Primary Key Afghanistan Albania Primary Key Kabul Tirana Countries Value capital:kabul,continent:asia,pop: #2011 capital:tirana,continent:europe,pop: #2013 Cities Value country:afghanistan,pop: #2013 country:albania,pop: #2013 Amazon Dynamo: Model Dual primary ey also aailable: Hash: unordered Range: ordered Countries Hash Key Range Key Value Vatican City 839 capital:vatican City,continent:Europe Nauru 9945 capital:yaren,continent:oceania Amazon Dynamo: CAP Two options for each table: AP: Eentual consistency, High aailability CP: Strong consistency, Lower aailability What s a CP system again? What s an AP system again? 7

8 Amazon Dynamo: Consistency Gossiping Keep alie messages sent between nodes with state Amazon Dynamo: Consistency Two ersions of one shopping cart: Quorums: N nodes responsible for a read write Multiple nodes acnowledge read/write for success At the cost of aailability! Hinted Handoff For transient failures A node coers for another node while its down How best to handle multiple conflicting ersions of a alue (nowing as reconciliation)? Application nows best ( and must support multiple ersions being returned) Amazon Dynamo: Vector Clocs Vector Cloc: A list of pairs indicating a node (i.e., a serer) and a time stamp Used to trac/order ersions Amazon Dynamo: Eentual Consistency using Merle Trees Merle tree is a hash tree Nodes hae hashes of their children Leaf node hashes from data: eys or ranges Amazon Dynamo: Eentual Consistency using Merle Trees Easy to erify regions of the Map Can compare leel-at-a-time Amazon Dynamo: Budgeting Assign throughput per table: allocate resources Reads (4 KB resolution): Writes (1 KB resolution) 8

9 Read More OTHER KEY VALUE STORES Other Key Value Stores Other Key Value Stores Other Key Value Stores NOSQL: DOCUMENT STORE 9

10 The Database Landscape Key Value Stores: Values are Documents Real-time Stores documents (semi-structured alues) Not only SQL Maps Column Oriented Graph-structured data In-Memory Not using the relational model Batch analysis of data Using the relational model Relational Databases with focus on scalability to compete with NoSQL while maintaining ACID Cloud storage Key country:afghanistan Value <country> <capital>city:kabul</capital> <continent>asia</continent> <population> <alue> </alue> <year>2011</year> </population> </country> Document-type depends on store XML, JSON, Blobs, Natural language Operators for documents e.g., filtering, in. indexing, XML/JSON querying, etc. MongoDB: JSON Based Key 6ads786a5a9 o Value (Document) { _id : ObjectId( 6ads786a5a9 ), name : Afghanistan, capital : Kabul, continent : Asia, population : { alue : , year : 2011 } } Document Stores Can inoe Jaascript oer the JSON objects Document fields can be indexed Recap Relational Databases don t sole eerything SQL and ACID add oerhead Distribution not so easy RECAP NoSQL: what if you don t need SQL or ACID? Something simpler Something more scalable Trade efficiency against guarantees 10

11 Recap Recap Key alue stores inspired by Amazon Dynamo Distributed maps Hash eys and range eys Table names Consistent hashing Replication Object ersioning / ector clocs Gossiping / Quorums / Hinted Hand-offs Merle trees Budgeting Document stores: documents as alues Support for JSON, XML alues, field indexing, etc. Questions 11

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

Transactions and ACID in MongoDB

Transactions and ACID in MongoDB Transactions and ACID in MongoDB Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently

More information

these three NoSQL databases because I wanted to see a the two different sides of the CAP

these three NoSQL databases because I wanted to see a the two different sides of the CAP Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the

More information

Dynamo: Amazon s Highly Available Key-value Store

Dynamo: Amazon s Highly Available Key-value Store Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

How To Improve Performance In A Database

How To Improve Performance In A Database Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

NoSQL Databases. Nikos Parlavantzas

NoSQL Databases. Nikos Parlavantzas !!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

More information

Distributed Data Stores

Distributed Data Stores Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High

More information

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Study and Comparison of Elastic Cloud Databases : Myth or Reality? Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri

More information

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch September 30, 2013 29-09-2013 1

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch September 30, 2013 29-09-2013 1 Big Data Management Big Data Management (BDM) Autumn 2013 Povl Koch September 30, 2013 29-09-2013 1 Overview Today s program 1. Little more practical details about this course 2. Recap from last time 3.

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI Big Data Development CASSANDRA NoSQL Training - Workshop March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 121109 Dubai UAE, email training-coordinator@isidusnet M: +97150

More information

Institutionen för datavetenskap Department of Computer and Information Science

Institutionen för datavetenskap Department of Computer and Information Science Institutionen för datavetenskap Department of Computer and Information Science Final thesis Exploration of NoSQL Technologies for managing hotel reservations by Sylvain Coulombel LiTH-IDA/ERASMUS-A--15/001--SE

More information

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

Open source, high performance database

Open source, high performance database Open source, high performance database Anti-social Databases: NoSQL and MongoDB Will LaForest Senior Director of 10gen Federal will@10gen.com @WLaForest 1 SQL invented Dynamic Web Content released IBM

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

MongoDB Developer and Administrator Certification Course Agenda

MongoDB Developer and Administrator Certification Course Agenda MongoDB Developer and Administrator Certification Course Agenda Lesson 1: NoSQL Database Introduction What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL Types of NoSQL

More information

Distributed Storage Systems

Distributed Storage Systems Distributed Storage Systems John Leach john@brightbox.com twitter @johnleach Brightbox Cloud http://brightbox.com Our requirements Bright box has multiple zones (data centres) Should tolerate a zone failure

More information

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011 NoSQL - What we ve learned with mongodb Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011 DW2.0 and NoSQL management decision support intgrated access - local v. global - structured v.

More information

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb, Consulting MTS The following is intended to outline our general product direction. It is intended for information

More information

INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

More information

Distributed Systems. Tutorial 12 Cassandra

Distributed Systems. Tutorial 12 Cassandra Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Introduction to NOSQL

Introduction to NOSQL Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo

More information

Preparing Your Data For Cloud

Preparing Your Data For Cloud Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability

More information

IBM Tivoli Monitoring Version 6.3 Fix Pack 2. Windows OS Agent Reference

IBM Tivoli Monitoring Version 6.3 Fix Pack 2. Windows OS Agent Reference IBM Tioli Monitoring Version 6.3 Fix Pack 2 Windows OS Agent Reference IBM Tioli Monitoring Version 6.3 Fix Pack 2 Windows OS Agent Reference Note Before using this information and the product it supports,

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

NoSQL in der Cloud Why? Andreas Hartmann

NoSQL in der Cloud Why? Andreas Hartmann NoSQL in der Cloud Why? Andreas Hartmann 17.04.2013 17.04.2013 2 NoSQL in der Cloud Why? Quelle: http://res.sys-con.com/story/mar12/2188748/cloudbigdata_0_0.jpg Why Cloud??? 17.04.2013 3 NoSQL in der Cloud

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15

More information

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University Introduction to NoSQL and MongoDB Kathleen Durant Lesson 20 CS 3200 Northeastern University 1 Outline for today Introduction to NoSQL Architecture Sharding Replica sets NoSQL Assumptions and the CAP Theorem

More information

Xiaowe Xiaow i e Wan Wa g Jingxin Fen Fe g n Mar 7th, 2011

Xiaowe Xiaow i e Wan Wa g Jingxin Fen Fe g n Mar 7th, 2011 Xiaowei Wang Jingxin Feng Mar 7 th, 2011 Overview Background Data Model API Architecture Users Linearly scalability Replication and Consistency Tradeoff Background Cassandra is a highly scalable, eventually

More information

Using Object Database db4o as Storage Provider in Voldemort

Using Object Database db4o as Storage Provider in Voldemort Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how

More information

Cassandra A Decentralized, Structured Storage System

Cassandra A Decentralized, Structured Storage System Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922

More information

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015 7/04/05 Fundamentals of Distributed Systems CC5- PROCESAMIENTO MASIVO DE DATOS OTOÑO 05 Lecture 4: DFS & MapReduce I Aidan Hogan aidhog@gmail.com Inside Google circa 997/98 MASSIVE DATA PROCESSING (THE

More information

The CAP theorem and the design of large scale distributed systems: Part I

The CAP theorem and the design of large scale distributed systems: Part I The CAP theorem and the design of large scale distributed systems: Part I Silvia Bonomi University of Rome La Sapienza www.dis.uniroma1.it/~bonomi Great Ideas in Computer Science & Engineering A.A. 2012/2013

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

In-Memory Database User Guide

In-Memory Database User Guide IBM soliddb Version 7.0 In-Memory Database User Guide SC27-3845-05 Note Before using this information and the product it supports, read the information in Notices on page 41. First edition, fifth reision

More information

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011 A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example

More information

2.1.5 Storing your application s structured data in a cloud database

2.1.5 Storing your application s structured data in a cloud database 30 CHAPTER 2 Understanding cloud computing classifications Table 2.3 Basic terms and operations of Amazon S3 Terms Description Object Fundamental entity stored in S3. Each object can range in size from

More information

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications NOSQL, BIG DATA AND GRAPHS Technology Choices for Today s Mission- Critical Applications 2 NOSQL, BIG DATA AND GRAPHS NOSQL, BIG DATA AND GRAPHS TECHNOLOGY CHOICES FOR TODAY S MISSION- CRITICAL APPLICATIONS

More information

Design Patterns for Distributed Non-Relational Databases

Design Patterns for Distributed Non-Relational Databases Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying

More information

Study concluded that success rate for penetration from outside threats higher in corporate data centers

Study concluded that success rate for penetration from outside threats higher in corporate data centers Auditing in the cloud Ownership of data Historically, with the company Company responsible to secure data Firewall, infrastructure hardening, database security Auditing Performed on site by inspecting

More information

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction

More information

NoSQL. Thomas Neumann 1 / 22

NoSQL. Thomas Neumann 1 / 22 NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensie Computing Uniersity of Florida, CISE Department Prof. Daisy Zhe Wang Map/Reduce: Simplified Data Processing on Large Clusters Parallel/Distributed

More information

Database Appliances, MapReduce, and NoSQL Compared to RDBMS

Database Appliances, MapReduce, and NoSQL Compared to RDBMS Appliances, MapReduce, and NoSQL Compared to RDBMS Prepared by Philip Czachorowsi Noember 17, 2011 1 Disclaimer All technical information is presented as is. Any mentions of products does not constitute

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Department of Software Systems. Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012

Department of Software Systems. Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012 1 MongoDB Department of Software Systems Presenter: Saira Shaheen, 227233 saira.shaheen@tut.fi 0417016438 Dated: 02-10-2012 2 Contents Motivation : Why nosql? Introduction : What does NoSQL means?? Applications

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

We are immersed in the Information Age where

We are immersed in the Information Age where Oeriew Big Data with Cloud Computing: an insight on the computing enironment, MapReduce, and programming frameworks Alberto Fernández, 1 Sara del Río, 2 Victoria López, 2 Abdullah Bawakid, 3 María J. del

More information

NoSQL Database Options

NoSQL Database Options NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

More information

ERserver. Single signon. iseries. Version 5 Release 3

ERserver. Single signon. iseries. Version 5 Release 3 ERserer iseries Single signon Version 5 Release 3 ERserer iseries Single signon Version 5 Release 3 Note Before using this information and the product it supports, be sure to read the information in Notices,

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

AS/400e. Digital Certificate Management

AS/400e. Digital Certificate Management AS/400e Digital Certificate Management AS/400e Digital Certificate Management ii AS/400e: Digital Certificate Management Contents Part 1. Digital Certificate Management............ 1 Chapter 1. Print

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information

Cassandra vs MySQL. SQL vs NoSQL database comparison

Cassandra vs MySQL. SQL vs NoSQL database comparison Cassandra vs MySQL SQL vs NoSQL database comparison 19 th of November, 2015 Maxim Zakharenkov Maxim Zakharenkov Riga, Latvia Java Developer/Architect Company Goals Explore some differences of SQL and NoSQL

More information

CS435 Introduction to Big Data

CS435 Introduction to Big Data CS435 Introduction to Big Data Final Exam Date: May 11 6:20PM 8:20PM Location: CSB 130 Closed Book, NO cheat sheets Topics covered *Note: Final exam is NOT comprehensive. 1. NoSQL Impedance mismatch Scale-up

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

How graph databases started the multi-model revolution

How graph databases started the multi-model revolution How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the

More information

IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012. Integration Guide

IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012. Integration Guide IBM Unica Marketing Operations and Campaign Version 8 Release 6 May 25, 2012 Integration Guide Note Before using this information and the product it supports, read the information in Notices on page 51.

More information

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega Big Data & Data Science Course Example using MapReduce Presented by What is Mongo? Why Mongo? Mongo Model Mongo Deployment Mongo Query Language Built-In MapReduce Demo Q & A Agenda Founders Max Schireson

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 4. Basic Principles Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ NoSQL Overview Main objective:

More information

ERserver. iseries. Backup, Recovery and Media Services (BRMS)

ERserver. iseries. Backup, Recovery and Media Services (BRMS) ERserer iseries Backup, Recoery and Media Serices (BRMS) ERserer iseries Backup, Recoery and Media Serices (BRMS) Copyright International Business Machines Corporation 1998, 2002. All rights resered.

More information

Rational Build Forge. AutoExpurge System. Version7.1.2andlater

Rational Build Forge. AutoExpurge System. Version7.1.2andlater Rational Build Forge AutoExpurge System Version7.1.2andlater Note Before using this information and the product it supports, read the information in Notices, on page 11. This edition applies to ersion

More information

Enterprise Operational SQL on Hadoop Trafodion Overview

Enterprise Operational SQL on Hadoop Trafodion Overview Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development

More information

INFO5011 Advanced Topics in IT: Cloud Computing

INFO5011 Advanced Topics in IT: Cloud Computing INFO5011 Advanced Topics in IT: Cloud Computing Week 5: Distributed Data Management: From 2PC to Dynamo Dr. Uwe Röhm School of Information Technologies Outline Distributed Data Processing Data Partitioning

More information

Domain driven design, NoSQL and multi-model databases

Domain driven design, NoSQL and multi-model databases Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra

More information

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University

More information

Tungsten Replicator, more open than ever!

Tungsten Replicator, more open than ever! Tungsten Replicator, more open than ever! MC Brown, Senior Product Line Manager September, 2015 2014 VMware Inc. All rights reserved. We Face An Age Old Problem BRS/Search 2 It s Gotten Worse 3 Much Worse

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

IBM SmartCloud Monitoring - Application Insight. User Interface Help SC27-5618-01

IBM SmartCloud Monitoring - Application Insight. User Interface Help SC27-5618-01 IBM SmartCloud Monitoring - Application Insight User Interface Help SC27-5618-01 IBM SmartCloud Monitoring - Application Insight User Interface Help SC27-5618-01 ii IBM SmartCloud Monitoring - Application

More information

multiparadigm programming Multiparadigm Data Storage for Enterprise Applications

multiparadigm programming Multiparadigm Data Storage for Enterprise Applications focus multiparadigm programming Multiparadigm Data Storage for Enterprise Applications Debasish Ghosh, Anshin Software Storing data the same way it s used in an application simplifies the programming model,

More information

Lecture 21: NoSQL III. Monday, April 20, 2015

Lecture 21: NoSQL III. Monday, April 20, 2015 Lecture 21: NoSQL III Monday, April 20, 2015 Announcements Issues/questions with Quiz 6 or HW4? This week: MongoDB Next class: Quiz 7 Make-up quiz: 04/29 at 6pm (or after class) Reminders: HW 4 and Project

More information

Ryan Horn, Lead Software Engineer at Twilio. November 12, 2014 Las Vegas. BDT312 Using the Cloud to Scale from a Database to a Data Platform

Ryan Horn, Lead Software Engineer at Twilio. November 12, 2014 Las Vegas. BDT312 Using the Cloud to Scale from a Database to a Data Platform BDT312 Using the Cloud to Scale from a Database to a Data Platform Ryan Horn, Lead Software Engineer at Twilio November 12, 2014 Las Vegas 2014 Amazon.com, Inc. and its affiliates. All rights reserved.

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Speed, scale, query: can NoSQL give us all three? Arun Gupta, @arungupta Matthew Revell, @matthewrevell Couchbase

Speed, scale, query: can NoSQL give us all three? Arun Gupta, @arungupta Matthew Revell, @matthewrevell Couchbase Speed, scale, query: can NoSQL give us all three? Arun Gupta, @arungupta Matthew Revell, @matthewrevell Couchbase The project management triangle 2015 Couchbase Inc. Photo by https://www.flickr.com/photos/centralasian/

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Best Practices for Migrating from RDBMS to Amazon DynamoDB

Best Practices for Migrating from RDBMS to Amazon DynamoDB Best Practices for Migrating from RDBMS to Amazon DynamoDB Leverage the Power of NoSQL for Suitable Workloads Nathaniel Slater March 2015 Contents Contents Abstract Introduction Overview of Amazon DynamoDB

More information

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging

More information