NoSQL Data Base Basics


NoSQL Data Base Basics
Course Notes in Transparency Format
Cloud Computing MIRI (CLC-MIRI)
UPC Master in Innovation & Research in Informatics, Spring 2013
Jordi Torres, UPC - BSC
www.jorditorres.eu

HDFS
Hadoop Distributed File System (HDFS): the standard storage mechanism for Hadoop.

HDFS: Hadoop Distributed File System
- Fault tolerance: assuming that failure will happen allows HDFS to run on commodity hardware.
- Streaming data access: HDFS is written with batch processing in mind and emphasizes high throughput rather than random access to data.
- Extreme scalability: HDFS scales to petabytes (current versions).
- Portability: HDFS is portable across platforms.

Hadoop: standard storage mechanism
Hadoop Distributed File System (HDFS)
- Most HDFS applications need a write-once-read-many access model for files. By assuming a file will remain unchanged after it is written, HDFS simplifies replication and speeds up data throughput.
- "Moving computation is cheaper than moving data" (locality of computation): because of the data volume, it is often much faster to move the program near the data, and HDFS has features to facilitate this.

Hadoop: standard storage mechanism
Starting point: http://hadoop.apache.org/docs/r1.0.4/hdfs_user_guide.html

Hadoop: standard storage mechanism
HDFS interface
- The interface is similar to that of regular filesystems.
- HDFS can only store and retrieve data, not index it.
- Simple random access to data is not possible.
Solution: higher-level layers such as HBase have been created to provide finer-grained functionality to Hadoop deployments.
[Figure: stack of MapReduce, HBase, HDFS]

HBase, the Hadoop database
- Creates indexes, which gives fast random access to its content.
- Modeled after Google's BigTable, it is a column-oriented database designed to store massive amounts of data.
- Uses HDFS as its storage system.
- Belongs to the NoSQL universe, similar to Cassandra and Hypertable, among others.
[Figure: stack of MapReduce, HBase, HDFS]

HBase versus HDFS (a brief comparison)
HDFS
- Optimized for: large files, sequential access (high throughput), append-only writes.
- Use for fact tables that are mostly append-only and require sequential full-table scans.
HBase
- Optimized for: small records (but many of them), random access, atomic record updates.
- Use for dimension lookup tables that are updated frequently and require random low-latency lookups.
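The difference between the two access patterns can be illustrated with a toy sketch. This is plain Python, not the HDFS or HBase API; the record layout and the function names `scan_lookup` and `indexed_lookup` are assumptions made for illustration only.

```python
import bisect

# Hypothetical data set: 1000 small records, kept sorted by row key.
records = sorted((f"row{i:04d}", i * i) for i in range(1000))
keys = [key for key, _ in records]  # a simple index over the row keys


def scan_lookup(key):
    """Sequential access: read every record until the key is found."""
    for k, value in records:
        if k == key:
            return value
    return None


def indexed_lookup(key):
    """Random access: binary search in the sorted key index."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None


value_by_scan = scan_lookup("row0500")
value_by_index = indexed_lookup("row0500")
```

Both functions return the same value; the scan touches up to every record, while the indexed lookup needs only a logarithmic number of comparisons, which is why an index matters for frequent low-latency lookups.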

HDFS: an example
A given file is broken down into blocks (default = 64 MB),
[Figure: a file split into blocks 1 to 5]

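HDFS: an example
then the blocks are replicated across the cluster (default = 3 replicas).
[Figure: blocks 1 to 5 of the file spread over five nodes, three blocks per node: (1, 3, 5), (2, 3, 4), (1, 3, 4), (2, 4, 5), (1, 2, 5)]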
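The two steps of the example, splitting and replicating, can be sketched as follows. This is a minimal illustration, not HDFS code: the 300 MB file size, the node names, and the round-robin placement are assumptions, and real HDFS placement is rack-aware rather than round-robin.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted above
REPLICATION = 3                # the default replication factor quoted above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the number of blocks needed to hold file_size bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative placement only (assumption); not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block + 1] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


file_size = 300 * 1024 * 1024  # hypothetical 300 MB file
blocks = split_into_blocks(file_size)
layout = place_replicas(
    blocks, ["node-a", "node-b", "node-c", "node-d", "node-e"]
)
```

With these assumed inputs the file needs five blocks, and every block ends up on three different nodes, so losing any single node still leaves two copies of each block.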

Resource management and scheduling
- A given job is broken down into tasks, then the tasks are scheduled to run as close to the data as possible.
- Optimized for: batch processing, failure recovery.
[Figure: tasks placed on the nodes that hold the corresponding blocks]
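A locality-aware scheduler of this kind can be sketched in a few lines. This is a simplified illustration, not the Hadoop scheduler: the function name `schedule`, the slot model, and the node names are assumptions, and the block locations reuse the layout of the replication example.

```python
def schedule(tasks, block_locations, free_slots):
    """Assign each task to a node, preferring nodes that hold its block.

    tasks: dict task_id -> block_id
    block_locations: dict block_id -> list of nodes holding a replica
    free_slots: dict node -> number of free task slots (not mutated)
    Returns dict task_id -> (node, is_data_local).
    """
    slots = dict(free_slots)
    assignment = {}
    for task_id, block_id in tasks.items():
        # Nodes that hold a replica of the block and still have capacity.
        local = [n for n in block_locations.get(block_id, []) if slots.get(n, 0) > 0]
        # Fall back to any node with capacity if no local slot is free.
        candidates = local or [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free slots in the cluster")
        node = max(candidates, key=lambda n: slots[n])  # most free slots first
        slots[node] -= 1
        assignment[task_id] = (node, bool(local))
    return assignment


block_locations = {
    1: ["n1", "n3", "n5"],
    2: ["n2", "n4", "n5"],
    3: ["n1", "n2", "n3"],
    4: ["n2", "n3", "n4"],
    5: ["n1", "n4", "n5"],
}
tasks = {f"task-{b}": b for b in block_locations}
plan = schedule(tasks, block_locations, {n: 1 for n in ["n1", "n2", "n3", "n4", "n5"]})
```

Because every block has three replicas, the scheduler has three chances to find a data-local slot for each task; in this assumed setup all five tasks run on a node that already stores their block.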

Common characteristics of NoSQL: shared-nothing systems
[Figure: shared RAM, shared disk, and shared nothing architectures]
Shared-nothing systems have proven to be the most cost-effective and flexible.
Source: http://www.slideshare.net/couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs?ref=http://www.slideshare.net/slideshow/embed_code/18124982?rel=0

Common characteristics of NoSQL: distributed models
- Master-slave: requests go to a master node; a standby master node is used only if the primary master fails.
- Peer-to-peer: requests are served by the peer nodes.
Peer-to-peer models do not have standby nodes that sit idle.
Source: http://www.slideshare.net/couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs?ref=http://www.slideshare.net/slideshow/embed_code/18124982?rel=0

Common characteristics of NoSQL: move queries to the nodes
Queries work best if they run on the local node that has the data.
Source: http://www.slideshare.net/couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs?ref=http://www.slideshare.net/slideshow/embed_code/18124982?rel=0
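One way to picture this is a scatter-gather query: the coordinator sends the predicate to every node, each node filters its own rows, and only the matches travel back. The sketch below is an in-process illustration; the `Node` class, `scatter_gather`, and the sample rows are all hypothetical.

```python
class Node:
    """A hypothetical storage node holding its own share of the rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def run_local(self, predicate):
        """Evaluate the query on this node's data; return only matches."""
        return [row for row in self.rows if predicate(row)]


def scatter_gather(nodes, predicate):
    """Send the query to every node and merge the (small) partial results."""
    results = []
    for node in nodes:
        results.extend(node.run_local(predicate))
    return results


nodes = [
    Node("n1", [{"user": "ana", "hits": 120}, {"user": "bob", "hits": 3}]),
    Node("n2", [{"user": "eve", "hits": 50}]),
]
heavy_users = scatter_gather(nodes, lambda row: row["hits"] >= 50)
```

Only the matching rows cross the network in this model, instead of every row being shipped to a central query engine.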

Alternatives to HBase/HDFS?
- Cassandra, an Apache project, originated at Facebook and is now in production at many large-scale websites (also at BSC).
- Hypertable was created at Zvents and spun out as an open source project.
Both are scalable column-store databases that follow the pattern of BigTable, similar to HBase.
[Figure: MapReduce on top of Cassandra; MapReduce on top of Hypertable]
And ...

And dozens more
http://nosql-database.org: list of NoSQL databases [currently 150]

NoSQL
- The concept has gained momentum in recent years.
- Today it is a mature and efficient alternative that can help solve scalability and performance problems (e.g. online applications with thousands of concurrent users and millions of hits a day).

NoSQL on Google Trends
Source: http://www.google.com/trends/explore#q=nosql

Different types of NoSQL systems
- Distributed key-value systems: Amazon's S3 key-value store (Dynamo), Voldemort (LinkedIn), Cassandra (Facebook)
- Column-based systems: BigTable (Google), HBase, Cassandra
- Document-based systems: CouchDB, MongoDB
- Graph databases: Neo4j

Common themes
- Horizontal scalability
- Clever use of hashing and caching
- Parallel execution of queries: move queries to the data, not the other way around
- Share resources when possible (example: the memcached protocol)
- Use simple interfaces when possible: put, get, delete
Source: Kelly-McCreary & Associates, LLC, http://www.slideshare.net/couchbase/webinar-making-sense-of-nosql-applying-nonrelational-databases-to-business-needs?ref=http://www.slideshare.net/slideshow/embed_code/18124982?rel=0
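Two of these themes, hashing for horizontal scalability and a simple put/get/delete interface, fit in one small sketch. It is an in-memory toy, not the API of any of the systems named in these notes: the class name `PartitionedStore` and the modulo placement are assumptions, and production stores such as Dynamo or Cassandra use consistent hashing so that adding a node moves only a fraction of the keys.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store that spreads keys over several nodes by hash."""

    def __init__(self, num_nodes):
        # Each "node" is just a dict here; in a real system it is a server.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        """Pick the node responsible for a key (hash modulo node count)."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed."""
        return self._node_for(key).pop(key, None) is not None


store = PartitionedStore(num_nodes=4)
store.put("user:1", "alice")
store.put("user:2", "bob")
found = store.get("user:1")
removed = store.delete("user:2")
```

Because the node is computed from the key alone, any client can locate the data without asking a central directory, which is what lets the three operations scale out with the number of nodes.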
