NOSQL DATABASE SYSTEMS

Size: px
Start display at page:

Download "NOSQL DATABASE SYSTEMS"

Transcription

1 NOSQL DATABASE SYSTEMS Big Data Technologies: NoSQL DBMS - SoSe

2 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

3 NoSQL Database Systems NoSQL Considered Categories of NoSQL Database Systems Key-Value Database Systems Document Database Systems Column Family Database Systems Big Data Technologies: NoSQL DBMS - SoSe

4 Key-Value Database Systems NoSQL Data Model Key-value pairs Unique keys Values arbitrary type (serialized byte arrays) or strings, lists, sets, ordered sets (of strings) Schema-free key key key key key value value value value value Storage Layout Hash-Maps, B-Trees, Indexes Primary indexes (Hash, B-tree) on key Secondary indexes on values? Big Data Technologies: NoSQL DBMS - SoSe

5 Key-Value Database Systems (Cont.) NoSQL Query Models Simple API set (key, value) value = get (key) delete (key) Operations on values? More complex operations Language Bindings MapReduce later in this chapter key key key key key value value value value value Systems Oracle Berkeley DB (mid-90s) Caches (EHCache, Memcache) Amazon Dynamo/S3, Redis, Riak, Voldemort, Big Data Technologies: NoSQL DBMS - SoSe

6 Document Store Database Systems NoSQL Data Model Key-value pairs with documents as value Document format: JSON or BSON (Binary JSON) Loosely structured name(key)-value pairs Hierarchical Additionally, MongoDB uses collections arbitrary documents could be grouped together documents in a collection should be similar to facilitate effective indexing { } "id": 1, "name": football boot", "price": 199, "stock": { "warehouse": 120, "retail": 10 } Storage Layout B-Trees to store the documents MongoDB: Documents in a single collection are stored together Big Data Technologies: NoSQL DBMS - SoSe

7 Document Store Database Systems (Cont.) NoSQL Indexes Primary indexes on documentid (key) Secondary indexes on JSON-names Default or user defined Composite indexes may be supported Query Models Simple API: set/get/delete Further query support differ widely Powerful ad-hoc queries with integrated query language (MongoDB) No ad-hoc queries, predefined views with indexes only (CouchDB & Couchbase) Language Bindings MapReduce later in this chapter Systems MongoDB, CouchDB, Couchbase, { } "id": 1, "name": football boot", "price": 199, "stock": { "warehouse": 120, "retail": 10 } Big Data Technologies: NoSQL DBMS - SoSe

8 Column Family Database Systems NoSQL Data Model Loosely structured by columns and column families ( set of nested maps ) Column Family set of columns grouped together into a bundle Column families have to be predefined Column Not predefined; any type or data (can be nested) Table Column Family Column Family Row Key1 column column column column column Row Key2 column column column column Big Data Technologies: NoSQL DBMS - SoSe

9 Column Family Database Systems (Cont.) NoSQL Data Model (Cont.) Example: Row Key: title Column Family text Column Family revision "NoSQL" "Redis" text:content: "A NoSQL database provides a mechanism " text:content: "Redis is an open-source, networked " revision:author: "Mendel" revision:comment": "changed " revision:author: "Torben" revision:comment: "initial " Column family database systems support multiple versions of each cell by timestamps: Row Key: title Time Stamp Column Family text Column Family revision "NoSQL" t5 text:content: " " revision:author: "Mendel" revision:comment: "changed " t4 revision:author: "Torben" revision:comment: "there " "Redis" t3 text:content: " " revision:author: "Torben" revision:comment: "initial " Big Data Technologies: NoSQL DBMS - SoSe

10 Column Family Database Systems (Cont.) NoSQL Row Key: title Time Stamp Column Family text Column Family revision "NoSQL" t5 text:content: " " revision:author: "Mendel" revision:comment: "changed view " Storage Layout Data is stored by column family t4 Row Key: title Time Stamp Column Family text column: content NoSQL t5 A NoSQL database provides a mechanism Redis t3 Redis is an open-source, networked revision:author: "Torben" revision:comment: there should be " "Redis" t3 text:content: " " revision:author: "Torben" revision:comment: "initial " Row Key: title Time Stamp ColumnFamily revision column: author column: comment NoSQL t5 Mendel changed view NoSQL t4 Torben there should be Redis t3 Torben initial Big Data Technologies: NoSQL DBMS - SoSe

11 Column Family Database Systems (Cont.) NoSQL Classical example: Web table Row Key Time Stamp Column Family contents Column Family anchor "com.cnn.www" t9 anchor:anchor:"cnnsi.com anchor:anchortext:"cnn" t8 t6 t5 "<html> " "<html> " anchor:anchor:"my.look.ch anchor:anchortext: "CNN.com" Big Data Technologies: NoSQL DBMS - SoSe

12 Column Family Database Systems (Cont.) NoSQL Query Models Simple API set (table, row, column, value) value = get (table, row, column) delete (table, row, column) timestamp optional Language Bindings More powerful query engines integrated (Cassandra Query Language) or as additional software products (e.g. Google App Engine / Google Datastore for BigTable, Hive for Data Warehousing on HBase) MapReduce later in this chapter Indexes Primary indexes (B-Trees sorted ordered) Default or user defined secondary indexes Systems Google BigTable, HBase, Cassandra, Amazon SimpleDB, Big Data Technologies: NoSQL DBMS - SoSe

13 NoSQL (Not only SQL): Definition NoSQL NoSQL Definition: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above. Source: S. Edlich, nosql-database.org Big Data Technologies: NoSQL DBMS - SoSe

14 NoSQL (Not only SQL): Definition NoSQL Next Generation Databases mostly addressing some of the points: non-relational schema-free simple API more complex APIs currently under development distributed and horizontally scalable easy replication support eventually consistent / BASE (not ACID) BASE as well as ACID are supported nowadays open-source??? Big Data Technologies: NoSQL DBMS - SoSe

15 NoSQL: The Essence NoSQL Data Model non-relational schema-free Scalability distributed and horizontally scalable easy replication support Big Data Technologies: NoSQL DBMS - SoSe

16 NoSQL Database Systems: Use Cases NoSQL Key-Value Database Systems Suitable Use Cases Storing Session Information User Profiles, Preferences Shopping Cart Data Examples Amazon (shopping carts) Temetra (meter data) Document Store Database Systems Event Logging Content Management Systems Blogging Platforms Web Analytics or Real-Time Analytics Forbes (CMS) MTV (CMS) Column Family Database Systems Event Logging Content Management Systems Blogging Platforms Google (web pages) Facebook (messaging) Twitter (places of interest) Big Data Technologies: NoSQL DBMS - SoSe

17 NoSQL Family Tree NoSQL Source: cloudant.com Big Data Technologies: NoSQL DBMS - SoSe

18 Solution Architectures (Examples) NoSQL Google Stack Hadoop Stack Source: Saake/Schallehn:2011 Big Data Technologies: NoSQL DBMS - SoSe

19 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

20 Data Modeling id ti Object-relational impedance mismatch Example: blog, blogpost, comment, author Object-oriented modeling Mapping to relational database Big Data Technologies: NoSQL DBMS - SoSe

21 Data Modeling Decisions id ti Primary Decision: Embedding vs. Referencing However, to consider There are no join operations within NoSQL database systems! There are no distributed transactions within NoSQL! Advantages and Disadvantages of Embedding Advantages and Disadvantages of Referencing Martin Fowler: Aggregate-Oriented Modeling Big Data Technologies: NoSQL DBMS - SoSe

22 Data Modeling: Document Store DBS How to realize references? id ti Direction of references? Embedding: What about denormalization and redundancy? Big Data Technologies: NoSQL DBMS - SoSe

23 Data Modeling: Column Family DBS id ti How to implement embedded objects in column family database systems? Variant 1: Using run-time named column qualifiers Variant 2: Using timestamps (or other id s) New (Cassandra CQL3): Using collection types (map, set, list) What about column families? Big Data Technologies: NoSQL DBMS - SoSe

24 Data Modeling id ti What about data modeling in key-value database systems? Data Modeling: Conclusion More degrees of freedom Embedding vs. referencing Denormalization and redundancy Big Data Technologies: NoSQL DBMS - SoSe

25 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

26 Application Development for NoSQL Simple command line APIs REST-API (Some) more powerful query languages / query engines Language Bindings Java, Ruby, C#, Python, Erlang, PHP, Perl, REST Thrift Big Data Technologies: NoSQL DBMS - SoSe

27 Application Development for NoSQL Example: title, content from blogpost with id = 042 // HBase get 'blogposts', '042', { COLUMN => ['blogpost_data:title', 'blogpost_data:content'] } // Cassandra SELECT title, content FROM blogposts WHERE id = '042'; // MongoDB db.blogposts.find( { _id : '042' }, { title: 1, content: 1 } ) // Couchbase function (doc) { if (doc._id == '042') { emit(doc._id, [doc.title, doc.content]); } } Big Data Technologies: NoSQL DBMS - SoSe

28 Application Development for NoSQL Challenge Big data Data distributed over several hundred notes (remember: scale out) Data-to-Code or Code-to-Data? Executing jobs in parallel over several nodes There is a need for appropriated algorithms and frameworks! Big Data Technologies: NoSQL DBMS - SoSe

29 MapReduce: Basic Idea Old idea from functional programming (LISP, ML, Erlang, Scala etc.) Divide tasks into small discrete tasks and run them in parallel Never change original data (pipe concept) Different operations on the same data do not influence No concurrency conflicts No deadlocks No race conditions MapReduce Basic idea and framework introduced by Google 2004: J. Dean and S. Gehmawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI' Big Data Technologies: NoSQL DBMS - SoSe

30 MapReduce: Basic Idea & WordCount Example Doc1 Doc2 Doc3 Doc4 Developers should implement two primary methods Map: (key1, val1) [(key2, val2)] Reduce: (key2, [val2]) [(key3, val3)] Documents Sport, Handball, Soccer Soccer, FIFA Documents Sport, Gym, Money Soccer, FIFA, Money MAP MAP Key Sport 1 Handball 1 Soccer 1 Value Soccer 1 Key Value FIFA 1 Sport 1 Gym 1 Money 1 Soccer 1 FIFA 1 Money 1 REDUCE REDUCE Key Sport 2 Handball 1 Soccer 3 Key FIFA 2 Gym 1 Money 2 Value Value Big Data Technologies: NoSQL DBMS - SoSe

31 MapReduce: Architecture and Phases Source: Big Data Technologies: NoSQL DBMS - SoSe

32 Hadoop Example Map & Reduce Functions (Example) public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { public void map(longwritable key, Text value, OutputCollector<Text, IntWritable> output, ) { String line = value.tostring(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasmoretokens()) { word.set(tokenizer.nexttoken()); output.collect(word, one); } } } public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, ) { int sum = 0; while (values.hasnext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } } Source: Big Data Technologies: NoSQL DBMS - SoSe

33 MapReduce: Optional Combine Phase Decrease the shuffling cost Reduce the result size of map functions Perform reduce-like function in each machine Documents Sport, Handball, Soccer Soccer, FIFA Documents Sport, Gym, Money Soccer, FIFA, Money MAP MAP Key Sport 1 Handball 1 Soccer 1 Value Soccer 1 Key FIFA Value 1 Sport 1 Gym 1 Money 1 Soccer 1 FIFA 1 Money 1 COMBINE COMBINE Key Value Sport 1 Handball 1 Soccer 2 FIFA 1 Key Value Sport 1 Gym 1 Money 2 Soccer 1 FIFA 1 REDUCE REDUCE Big Data Technologies: NoSQL DBMS - SoSe

34 MapReduce Frameworks MapReduce frameworks take care of Scaling Fault tolerance (Load balancing) MapReduce Frameworks Google (however, Google now promotes Dataflow) Apache Hadoop standalone or integrated in NoSQL (and SQL) DBMS Also commercial distributors: Cloudera, MapR, HortonWorks, Proprietary MapReduce framework integrated in NoSQL DBMS Big Data Technologies: NoSQL DBMS - SoSe

35 Map Reduce and Query Languages MapReduce paradigm is too low-level Only two declarative primitives (map + reduce) Custom code for simple operations like projection and filtering Code is difficult to reuse and maintain Combination of high-level declarative querying and low-level programming with MapReduce Dataflow Programming Languages HiveQL Pig (Jaql) Big Data Technologies: NoSQL DBMS - SoSe

36 Hadoop Stack Source: Saake/Schallehn:2011 Big Data Technologies: NoSQL DBMS - SoSe

37 HiveQL Hive: data warehouse infrastructure built on top of Hadoop, providing: Data Summarization Ad hoc querying Simple query language: HiveQL (based on SQL) Extendable via custom mappers and reducers Developed by Facebook, now subproject of Hadoop Big Data Technologies: NoSQL DBMS - SoSe

38 HiveQL: Example Source: Saake/Schallehn: Data Management in the Cloud, 2011 Big Data Technologies: NoSQL DBMS - SoSe

39 Pig A platform for analyzing large data sets Pig consists of two parts: PigLatin: A Data Processing Language Pig Infrastructure: An Evaluator for PigLatin programs Pig compiles Pig Latin into physical plans Plans are to be executed over Hadoop Interface between the declarative style of SQL and low-level, procedural style of MapReduce Big Data Technologies: NoSQL DBMS - SoSe

40 Pig: Example Source: Saake/Schallehn: Data Management in the Cloud, 2011 Big Data Technologies: NoSQL DBMS - SoSe

41 MapReduce in Practice VLDB 2012: Chen, Alspaugh, Katz: Interactive Analytical Processing in Big Data Systems: A CrossIndustry Study of MapReduce Workloads: 7 Hadoop deployments (Cloudera = Hadoop with commercial services) Big Data Technologies: NoSQL DBMS - SoSe

42 MapReduce in Practice (Cont.) Source: Chen, Alspaugh, Katz. Interactive Analytical Processing in Big Data Systems: A CrossIndustry Study of MapReduce Workloads; VLDB2012 Big Data Technologies: NoSQL DBMS - SoSe

43 MapReduce Trends Hadoop 2.0 with YARN (Abstract from MapReduce) Source: HortonWorks Apache In-Memory Hadoop Performance! Written in Scala Big Data Technologies: NoSQL DBMS - SoSe

44 Application Development for NoSQL MapReduce: Concept and Frameworks State of the art application development With relational database systems: Object-Relational Mapping (ORM) frameworks and standards (Java Persistence API etc.) Frameworks for Object-NoSQL mapping?! Big Data Technologies: NoSQL DBMS - SoSe

45 Object-NoSQL Mapper: Architecture Applikation id tit SELECT b.titel, b.text FROM blogpost b WHERE b.id = 042 Objekt-NoSQL Mapper id tit db.blogposts.find ( { _id : 042 }, { titel: 1, text: 1 } ) SELECT titel, text FROM blogposts WHERE id = 042 ; get blogposts, 042, { COLUMN => [ blogpost_daten:titel, blogpost_daten:text ] } { } "id" : "042", "titel" :... id titel 042 id titel 042 MongoDB Cassandra HBase Big Data Technologies: NoSQL DBMS - SoSe

46 Object-NoSQL Mapper: Market Overview Mapper for different Programming Languages Java,.NET, Python, Ruby Volatile Market Main Focus: Object-NoSQL Mapper for Java Standardization: Java Persistence API (JPA) with Java Persistence Query Language (JPQL) Categorization Multi Data Store Mapper Single Data Store Mapper Big Data Technologies: NoSQL DBMS - SoSe

47 Java Multi Data Store Mapper Support for Document Store, Column Family, and Graph Database Systems in Java Multi Data Store Mapper Document Store Couchbase Data Nucleus Eclipse Link Hibernate OGM Kundera CouchDB PlayORM MongoDB Column-Family DBMS Cassandra HBase Graph DBMS Neo4J Spring Data Big Data Technologies: NoSQL DBMS - SoSe

48 Java Multi Data Store Mapper Support for Key-Value Database Systems in Java Multi Data Store Mapper Key-Value DBMS AmazonDynamoDB Apache Solr Ehcache Data Nucleus Eclipse Link Hibernate Kundera PlayORM Elasticsearch GemFire Infinispan Oracle NoSQL Redis Spring Data Big Data Technologies: NoSQL DBMS - SoSe

49 Java Object-NoSQL Mapper: Supported Functionality Single Data Store Mapper *Limited functionality (depending from the underlying NoSQL data store) Source: Störl/Hauf/Klettke/Scherzinger: Schemaless NoSQL Data Stores Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015 Big Data Technologies: NoSQL DBMS - SoSe

50 Object-NoSQL Mapper: Query Language Support Challenge: Different Query Language Interfaces Examples: Most systems do not support any JOINS Many systems do not offer aggregate functions, LIKE operator, or NOT operator, Approaches 1. Offer only the particular subset of features that is implemented by all supported NoSQL data stores, i.e. the intersection of features 2. Distinguish by data store and offer only the set of features implemented by a particular NoSQL data store 3. Offer the same set of features for all supported NoSQL data stores, possibly complementing missing features by implementing them inside the Object-NoSQL Mapper Big Data Technologies: NoSQL DBMS - SoSe

51 Object-NoSQL Mapper: Query Language Support Approach 2: NoSQL data store specific support of JPQL operators Drawback: restricted portability Systems: Hibernate OGM, Kundera, EclipseLink Example: JPQL operators (selection) in Kundera Big Data Technologies: NoSQL DBMS - SoSe

52 Object-NoSQL Mapper: Query Language Support Approach 2: NoSQL data store specific support of JPQL operators Extension: Use third-party libraries to offer more functionality for some but not for all supported NoSQL data stores Systems: Hibernate OGM (Hibernate Search), Kundera each with Apache Lucene Application Object-NoSQL Mapper NoSQL-DBMS Search Engine Index Big Data Technologies: NoSQL DBMS - SoSe

53 Object-NoSQL Mapper: Query Language Support Approach 3: Offer the same set of features for all supported NoSQL data stores Complementing missing features by implementing them inside the Object-NoSQL Mapper Benefit: Portability Drawback: Performance Application Systems: DataNucleus, Hibernate OGM (announced) Object-NoSQL Mapper NoSQL-DBMS Big Data Technologies: NoSQL DBMS - SoSe

54 Object-NoSQL Mapper: Query Language Support Outlook: Combination of Approach 2 and 3 Application Object-NoSQL Mapper NoSQL-DBMS Search Engine Index Systems: Hibernate OGM (announced) Big Data Technologies: NoSQL DBMS - SoSe

55 Conclusion: Java Object-NoSQL Mapper Vendor Independency / Portability Standardized Query Language (JPQL) Support for different NoSQL data stores Supported query operators often depend on the capabilities of the underlying NoSQL data stores Performance (as of end of 2014) In reading data, there is only a small gap between native access and the Object-NoSQL Mappers for the majority of the evaluated products Yet in writing, object mappers introduce a significant overhead Further reading: U. Störl, Th. Hauf, M. Klettke and S. Scherzinger: Schemaless NoSQL Data Stores Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015 Big Data Technologies: NoSQL DBMS - SoSe

56 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

57 Partitioning Vertical vs. horizontal partitioning? Horizontal Partitioning Range-based Partitioning Based on an attribute value Disadvantage? Key-based Partitioning Based on key value Implemented with hash function usually why? Disadvantage? Big Data Technologies: NoSQL DBMS - SoSe

58 Partitioning: Consistent Hashing Consistent Hashing Karger et al 1997: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. Keys and nodes (identified by id or server name) are mapped onto a circle Keys are assigned to the node that is next to them in a clockwise direction Source: Big Data Technologies: NoSQL DBMS - SoSe

59 Partitioning: Consistent Hashing (Cont.) Consistent Hashing: Example Removal of server C Addition of server D Source: Big Data Technologies: NoSQL DBMS - SoSe

60 Partitioning: Consistent Hashing (Cont.) Consistent Hashing with virtual nodes Source: Big Data Technologies: NoSQL DBMS - SoSe

61 Partitioning in NoSQL Database Systems Key-based Partitioning Redis (client-side hash function), Riak, CouchBase, Range-based Partitioning BigTable, HBase, MongoDB, Big Data Technologies: NoSQL DBMS - SoSe

62 Replication Motivation / Benefit Performance enhancement Availability enhancement (fault tolerance) Tradeoff between benefits of replication and work required to keep replicas consistent Big Data Technologies: NoSQL DBMS - SoSe

63 Types of Replication Two basic (and orthogonal) parameters: Where is the update performed? everywhere vs. selected copy When are the updates propagated? synchronous vs. asynchronous Types of Replication Master-Slave Replication Where: Selected copy for update Multi-Master Replication Where: Update everywhere Big Data Technologies: NoSQL DBMS - SoSe

64 Master-Slave Replication Where is the update performed? One selected copy (primary node / primary copy) All other replicas are read only Different data items can have different primary nodes When are the updates propagated? synchronous vs. asynchronous (eager vs. lazy) Big Data Technologies: NoSQL DBMS - SoSe

65 Update Propagation Synchronous update propagation (eager replication) Expensive, blocking Asynchronous update propagation (lazy replication) Consistency Weak consistency later in this chapter Ensure consistency without synchronous update propagation? Read only from master node (replicas for failover purposes only) e.g. CouchBase Quorum Consensus Protocols Big Data Technologies: NoSQL DBMS - SoSe

66 Quorum Consensus: Server-Side Terminology N: the number of nodes that store replicas of the data W: the number of replicas that need to acknowledge the receipt of the update before the update completes R: the number of replicas that are contacted when a data item is accessed through a read operation W+R > N Quorum Assembly Strong Consistency W+R <= N Weak Consistency R W R W Big Data Technologies: NoSQL DBMS - SoSe

67 Quorum Consensus: Server-Side W+R > N strong consistency through quorum assembly W=N and R = 1 read optimized strong consistency (ROWA) W=1 and R = N write optimized strong consistency R = W = N/2 + 1 Majority Consensus A common choice in NoSQL database systems is N=3, R=2, W=2 Source: Big Data Technologies: NoSQL DBMS - SoSe

68 Quorum Consensus: Variations Unweighted vs. weighted votes Weighted votes Each node is assigned a weight, e.g. for better replicas Instead of sum of nodes N use sum of weights of nodes Static vs. dynamic quorum Dynamic Quorum Quorums can be chosen separately for each item, e.g. in case of unavailable nodes Big Data Technologies: NoSQL DBMS - SoSe

69 Quorum Consensus: Client-Side Example Cassandra (Version 1.2) Write Consistency Level Zero: A write must be written to at least one node. (If all replica nodes for the given row key are down hinted handoff) One/Two/Three: A write must be written to at least one/two/three replica node(s). Quorum*: A write must be written on a quorum of replica nodes (N/2 +1). All: A write must be written on all replica nodes. Read Consistency Level One/Two/Three: Returns a response from the closest / two / three of the closest replica (may be inconsistent). Quorum*: Returns the record with the most recent timestamp once a quorum of replicas (N/2 + 1) has responded. All: Returns the record with the most recent timestamp once all replicas have responded. The read operation will fail if a replica does not respond. *LOCAL_QUORUM / EACH_QUORUM: quorum of replica nodes in the same data center / in all data centers Big Data Technologies: NoSQL DBMS - SoSe

70 Asynchronous Write: Write-Related Strategies How to propagate a write operation to an unavailable node? Hinted Handoff Algorithms Do a hinted write to an alive node (e.g., nearest live replica) When the failed node returns to the cluster, the updates received by the neighboring nodes are handed off to it System can continue to handle requests as if the node where still there Implemented in Cassandra, Riak, etc. But: How does a node learn when a node is available? E.g., gossip protocols each node periodically sends its current view of the ring state to a randomly-selected peer (or other protocols to choose the peers) Big Data Technologies: NoSQL DBMS - SoSe

71 Read-Related Strategies Asynchronous Write: Read-Related Strategies A write has not propagated to all replicas Repair outdated replicas after read Read Repair Repair outdated replicas that have not been read Anti-Entropy Read Repair Algorithm A system may detect that several nodes are out of sync with older versions of the data requested in a read operation Mark the nodes with the stale data with a Read Repair flag Synchronizing the stale nodes with newest version of the data requested Implemented in Cassandra, Riak, etc. Big Data Technologies: NoSQL DBMS - SoSe

72 Multi-Master Replication Where is the update performed? Update everywhere New challenge: concurrent writes When are the updates propagated? Synchronous No concurrent writes Expensive, blocking, Deadlocks Asynchronous Write Quorum: W > N/2 to avoid concurrent writes Big Data Technologies: NoSQL DBMS - SoSe

73 Conflict Detection and Resolution How to detect older versions of data? How to detect concurrent writes? Timestamps! Timestamps in a distributed environment?! global unique timestamp:= <node identifier, unique local timestamp> Define within each node N i a logical clock (LC i )*, which generates the unique local timestamp If N i received a request from a transaction T with timestamp < N j, LC j > and LC i < LC j set LC i = LC j + 1 Common approach in NoSQL database systems: Vector Clocks *Lamport timestamp Big Data Technologies: NoSQL DBMS - SoSe

74 Conflict Detection: Vector Clocks Vector Clocks Vector clock = list of (node, counter) On receive: element-wise maximum A C:1 B:1 C:1 A:1 B:1 C:1 A:1 B:2 C:1 B C:1 B:1 C:1 A:1 B:1 C:1 A:1 B:2 C:1 read C C:1 B:1 C:1 A:1 B:1 C:1 A:1 B:2 C:1 A:1 B:2 C:2 read Big Data Technologies: NoSQL DBMS - SoSe

75 Conflict Detection: Vector Clocks (Cont.) Vector Clocks allows to determine whether one object is a direct descendant of the other / direct descendant of a common parent / are unrelated in recent heritage A C:1 B:1 C:1 A:1 B:1 C:1 B C:1 B:1 C:1 B:2 C:1 C C:1 B:1 C:1 B:2 C:1 A:1 B:1 C:1 Concurrent writes cannot be resolved automatically! Big Data Technologies: NoSQL DBMS - SoSe

76 Replication and Availability: Systems Master-Slave Replication Redis, HBase, MongoDB, CouchBase, etc. Master: single point of failure automatic failover mechanisms necessary Special instances responsible for monitoring (e.g. Redis, HBase) Handle failover with existing instances (e.g. MongoDB, CouchBase) Multi-Master Replication Riak Big Data Technologies: NoSQL DBMS - SoSe

77 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

78 CAP Theorem CAP Theorem (Eric Brewer, 2000): in a distributed database system, you can only have at most two of the following three characteristics: Consistency: all clients have the same view, even in case of updates Availability: every request received by a non-failing node in the system must result in a response (i.e., even when severe network failures occur, every request must terminate) Partition tolerance: system properties hold even when the network (system) is partitioned (i.e., nodes can still function when communication with other groups of nodes is los) Big Data Technologies: NoSQL DBMS - SoSe

79 CAP Theorem DBMS A, B C DBMS X, Y A DBMS K, L P Asymmetry of CAP properties Some are properties of the system in general Some are properties of the system only when there is a partition Big Data Technologies: NoSQL DBMS - SoSe

80 Critism of CAP Theorem [Aba2012]: Abadi, J. Consistency Tradeoffs in Modern Distributed Database System Design. In IEEE Computer 45(2) CAP has become increasingly misunderstood and misapplied, potentially causing significant harm. In particular, many designers incorrectly conclude that the theorem imposes certain restrictions on a DDBS during normal system operation, and therefore implement an unnecessarily limited system. In reality, CAP only posits limitations in the face of certain types of failures, and does not constrain any system capabilities during normal operation. The theorem simply states that a network partition causes the system to have to decide between reducing availability or consistency General: Fundamental tradeoff between consistency, availability and latency tradeoff between consistency and latency Big Data Technologies: NoSQL DBMS - SoSe

81 PACELC [Aba2012]: Rewriting CAP as PACELC: if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)? PA/EL Default versions of Amazon Dynamo, Cassandra, Riak However, R + W > N more consistency PC/EC HBase, BigTable PA/EC MongoDB Big Data Technologies: NoSQL DBMS - SoSe

82 Strong vs. Weak Concistency Strong consistency After the update completes, any subsequent access will return the updated value, i.e., any subsequent access from A, B, C will return D 1 Weak consistency The system does not guarantee that subsequent accesses will return the updated value D 1 (a number of conditions need to be met before D 1 is returned) Source: Sattler:2010 Big Data Technologies: NoSQL DBMS - SoSe

83 Eventual Concistency Eventual consistency (Vogel, 2008) Specific form of weak consistency Guarantees that if no new updates are made, eventually all accesses will return D 1 If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme. Source: Sattler:2010 Big Data Technologies: NoSQL DBMS - SoSe

84 BASE Alternative consistency model: BASE (Eric Brewer, 2000) Basically available Availability of the system even in case of failures Soft-state The state of the system may change over time, even without input (clients must accept stale state under certain circumstances.) Eventually consistent The system will become consistent over time, given that the system doesn't receive input during that time Big Data Technologies: NoSQL DBMS - SoSe

85 Variants of Eventual Consistency Causal consistency: If A notifies B about the update, B will read D 1 (but not C!) Read-your-writes: A will always read D 1 after its own update Session consistency: Read-your-writes inside a session Monotonic reads: If a process has seen D k, any subsequent access will never return any D i with i < k Monotonic writes: Guarantees to serialize the writes of the same process Source: Sattler:2010 Big Data Technologies: NoSQL DBMS - SoSe

86 Consistency and Replication: Example Facebook s strategy: The master copy is always in one location, a remote user typically has a closer but potentially stale copy. However, when users update their pages, the update goes to the master copy directly as do all the user s reads for a short time, despite higher latency. After 20 seconds, the user s traffic reverts to the closer copy, which by that time should reflect the update. Source: Eric A. Brewer: Pushing the CAP: Strategies for Consistency and Availability. In IEEE Computer 45(2) Based on: J. Sobel: Scaling Out. Facebook Engineering Notes, 20 Aug. 2008; Big Data Technologies: NoSQL DBMS - SoSe

87 ACID vs. BASE? Transactions What about isolation on the same node? MVCC BASE focus mainly on C and I however, what bout A and D in ACID? Atomicity Most NoSQL systems: No concept of transactional features Atomicity only for a single object, row, or document respectively Few NoSQL systems only: Concept of transactional features over multiple objects Redis, Google Datastore, Restriction: objects have to be located on the same instance Big Data Technologies: NoSQL DBMS - SoSe

88 Durability Transactions Durability in relational database systems: write-ahead logging (WAL) Concepts of Persistence and Durability in NoSQL systems In-memory only Durability by replication only In-memory and write-ahead logging by configuration Redis, CouchBase Write-ahead logging by default Riak, HBase, MongoDB, Some systems (e.g. Riak) use the write-ahead log as place of storage Big Data Technologies: NoSQL DBMS - SoSe

89 Categorization Data Model Storage Layout Query Models NoSQL Database Systems NoSQL Data Modeling id ti Application Development Scalability, Availability and Consistency Partitioning, Replication Consistency Models and Transactions Select the Right DBMS Performance and Benchmarks Polyglot Persistence Big Data Technologies: NoSQL DBMS - SoSe

90 Performance / Benchmarks Traditional database benchmarks Benchmarks simulate typical usage scenarios (OLTP: TPC-C, OLAP: TPC-H, SAP benchmarks) Metrics: Performance (transactions per minute) Price/performance Benchmarks for NoSQL database systems? What is a typical NoSQL scenario? Facebook? Log analytics? Web Caching? Metrics? Scalability Availability Partition Tolerance Big Data Technologies: NoSQL DBMS - SoSe

91 Benchmarks for NoSQL Systems Open research field! Up to now: Specific workloads only Yahoo! Cloud Serving Benchmark YCSB(SoCC 2010) Simple operations only (read, insert, delete, range scans) No specific use-case; different workload scenarios Workload Workload R Workload U Workload I Workload M Operations 100 % Reads 100 % Updates 100 % Inserts 50 % Reads 25 % Updates 25 % Inserts Big Data Technologies: NoSQL DBMS - SoSe

92 Yahoo! Cloud Serving Benchmark Yahoo! Cloud Serving Benchmark: Metrics Performance For constant hardware increase throughput measure latency Scaling Scale-up: Increase hardware, data size and workload proportionally measure latency Example: Elastic Speedup: measure latency during dynamically server addition Source: Cooper et al. Benchmarking Cloud Serving Systems with YCSB; SoCC 2010 Big Data Technologies: NoSQL DBMS - SoSe

93 Status Quo YCSB Yahoo! Cloud Serving Benchmark Lot of variants publishes (over 560 forks on GitHub ) Including implementation errors regarding measurement as well as analysis of results (see H. Wegert: Benchmarking von NoSQL- Datenbanksystemen, master s thesis, University of Applied Sciences, April 2015) General challenge: Comparison of different configurations of NoSQL database systems No independent Benchmarking Council for NoSQL database benchmarking up to now (like TPC for relational database systems) Big Data Technologies: NoSQL DBMS - SoSe

94 Benchmarking: Persistence Couchbase (2015) YCSB Thumbtack Technologies Version 4 Server Nodes, 4 Clients (Big Data Cluster h_da) Ø Operationen pro Sekunde (log) Keine Bestätigung Eine Bestätigung Zwei Bestätigungen Drei Bestätigungen Vier Bestätigungen 1 I Workload U Big Data Technologies: NoSQL DBMS - SoSe

95 Benchmarking: Durability Cassandra (2015) YCSB Thumbtack Technologies Version 4 Server Nodes, 4 Clients (Big Data Cluster h_da) Ø Operationen pro Sekunde (log) Periodic ( ms) Batch (50 ms) WAL deaktiviert 1 I U M Workload Big Data Technologies: NoSQL DBMS - SoSe

96 Select the Right DBMS Choosing between NoSQL or RDBMS?! Select the right (NoSQL) DBMS?! Criteria Data Analysis Estimated size of date Complexity of data Type of navigation Consistency Requirements Query Requirements Performance Requirements (latency, scalability) Non functional Requirements (license, company policies, security, documentation etc.) Costs (including development and administration) More detailed list: Prototyping and performance analysis! Big Data Technologies: NoSQL DBMS - SoSe

97 One Size Fits All? Polyglot Persistence Alternative: Choose the right system for the right task! Example: Amadeus Log Service Hundreds of terabyte log data each week (SOA architecture with several servers) Architecture (Prototype, Kossmann:2012) Distributed file system (HDFS) for compressed log data NoSQL system (HBase) for storage and instant random access (indexing by timestamp and SessionID) Full text search engine (Apache Solr) for queries on log messages MapReduce framework (Apache Hadoop) for analysis (usage statistics and error) Relational DBMS (Oracle) for meta data (user infos etc.) Polyglot Persistence (by Martin Fowler) Big Data Technologies: NoSQL DBMS - SoSe

98 Polyglot Persistence Example (Source: Sadalage, P. J., & Fowler, M. NoSQL Distilled. Pearson Education, 2013) E-Commerce Plattform Shopping cart and session data Completed Orders Inventory and Item Price Customer social graph Key-Value DBMS Document Store DBMS RDBMS (Legacy DB) Graph DBMS Big Data Technologies: NoSQL DBMS - SoSe

99 One size fits all? NewSQL DBMS Idea behind: Best of both worlds SQL ACID Non locking concurrency control High per-node performance Scale out, shared nothing architecture Opportunity 1: Development of new database systems VoltDB (Michael Stonebraker) Drizzle Opportunity 2: Integration in existing database systems MySQL Cluster JSON integration Big Data Technologies: NoSQL DBMS - SoSe

100 Trends: JSON Integration PostgreSQL 9.2 Native JSON support since release 9.2 (2012) Proprietary JSON Query API IBM DB2 Native JSON support since 10.5 (June 2013) Using MongoDB API IBM Informix Native JSON support since (September 2013) Using MongoDB API Oracle Native JSON Support since (July 2014) Proprietary JSON Query API Big Data Technologies: NoSQL DBMS - SoSe

101 Trends: JSON Integration Example DB2 JSON stored as BSON in BLOB column Source: IBM, 2013 Big Data Technologies: NoSQL DBMS - SoSe

102 Trends: Integration of MapReduce Trend: MapReduce (Hadoop) Integration in relational DBMS and data warehouse systems 2012 available Oracle BigData-Appliance Oracle NoSQL 2.0 (Key-Value-Store) IBM Infosphere with Hadoop support Microsoft SQL Server 2012 with Hadoop* support 2013 Integration of Hadoop in SAPs BigData portfolio (SAP HANA, SAP Sybase IQ, SAP Data Integrator, SAP Business Objects) Hadoop Integration in Teradata with SQL-H-API (instead of writing Map-Reduce jobs) SAS on Hadoop *Microsoft: Discontinuation of Microsoft's MapReduce framework Dryad Big Data Technologies: NoSQL DBMS - SoSe

103 The Evolving Database Landscape Big Data Technologies: NoSQL DBMS - SoSe

104 The Evolving Database Landscape Big Data Technologies: NoSQL DBMS - SoSe

105 Database Popularity Scoring: Google/Bing results, Google Trends, Stackoverflow, job offers, LinkedIn Big Data Technologies: NoSQL DBMS - SoSe

106 Big Data Technologies Introduction NoSQL Database Systems Column Store Database Systems In-Memory Database Systems Conclusion & Outlook Big Data Technologies: NoSQL DBMS - SoSe

NOSQL DATABASE SYSTEMS

NOSQL DATABASE SYSTEMS NOSQL DATABASE SYSTEMS Big Data Technologies: NoSQL DBMS - SoSe 2015 1 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application

More information

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

NoSQL Databases. Nikos Parlavantzas

NoSQL Databases. Nikos Parlavantzas !!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Introduction to NOSQL

Introduction to NOSQL Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Distributed Data Stores

Distributed Data Stores Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch November 11, 2013 10-11-2013 1

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch November 11, 2013 10-11-2013 1 Big Data Management Big Data Management (BDM) Autumn 2013 Povl Koch November 11, 2013 10-11-2013 1 Overview Today s program 1. Little more practical details about this course 2. Recap from last time (Google

More information

Preparing Your Data For Cloud

Preparing Your Data For Cloud Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13

Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13 Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13 Astrid Rheinländer Wissensmanagement in der Bioinformatik What is Big Data? collection of data sets so large and complex

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15

More information

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com

Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since Feb

More information

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team) Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we

More information

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

How To Write A Database Program

How To Write A Database Program SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch September 30, 2013 29-09-2013 1

Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch September 30, 2013 29-09-2013 1 Big Data Management Big Data Management (BDM) Autumn 2013 Povl Koch September 30, 2013 29-09-2013 1 Overview Today s program 1. Little more practical details about this course 2. Recap from last time 3.

More information

Data Management in the Cloud

Data Management in the Cloud Data Management in the Cloud Ryan Stern stern@cs.colostate.edu : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05 Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970

More information

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE <GEOFF@ZORCHED.NET> @GEOFFLANE

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE <GEOFF@ZORCHED.NET> @GEOFFLANE NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE @GEOFFLANE WHAT IS NOSQL? NON-RELATIONAL DATA STORAGE USUALLY SCHEMA-FREE ACCESS DATA WITHOUT SQL (THUS... NOSQL) WIDE-COLUMN / TABULAR

More information

.NET User Group Bern

.NET User Group Bern .NET User Group Bern Roger Rudin bbv Software Services AG roger.rudin@bbv.ch Agenda What is NoSQL Understanding the Motivation behind NoSQL MongoDB: A Document Oriented Database NoSQL Use Cases What is

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Cloud Databases: The Next Post- Relational World 18 April 2012 Prof. Chris Clifton Beyond RDBMS The Relational Model is too limiting! Simple data model doesn t capture semantics

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

NoSQL in der Cloud Why? Andreas Hartmann

NoSQL in der Cloud Why? Andreas Hartmann NoSQL in der Cloud Why? Andreas Hartmann 17.04.2013 17.04.2013 2 NoSQL in der Cloud Why? Quelle: http://res.sys-con.com/story/mar12/2188748/cloudbigdata_0_0.jpg Why Cloud??? 17.04.2013 3 NoSQL in der Cloud

More information

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect

More information

Slave. Master. Research Scholar, Bharathiar University

Slave. Master. Research Scholar, Bharathiar University Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually

More information

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Big Data and Scripting Systems build on top of Hadoop

Big Data and Scripting Systems build on top of Hadoop Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

nosql and Non Relational Databases

nosql and Non Relational Databases nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013 Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options

More information

Using Object Database db4o as Storage Provider in Voldemort

Using Object Database db4o as Storage Provider in Voldemort Using Object Database db4o as Storage Provider in Voldemort by German Viscuso db4objects (a division of Versant Corporation) September 2010 Abstract: In this article I will show you how

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

The CAP theorem and the design of large scale distributed systems: Part I

The CAP theorem and the design of large scale distributed systems: Part I The CAP theorem and the design of large scale distributed systems: Part I Silvia Bonomi University of Rome La Sapienza www.dis.uniroma1.it/~bonomi Great Ideas in Computer Science & Engineering A.A. 2012/2013

More information

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg HDB++: HIGH AVAILABILITY WITH Page 1 OVERVIEW What is Cassandra (C*)? Who is using C*? CQL C* architecture Request Coordination Consistency Monitoring tool HDB++ Page 2 OVERVIEW What is Cassandra (C*)?

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

NoSQL Databases. Polyglot Persistence

NoSQL Databases. Polyglot Persistence The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler

More information

The Cloud Trade Off IBM Haifa Research Storage Systems

The Cloud Trade Off IBM Haifa Research Storage Systems The Cloud Trade Off IBM Haifa Research Storage Systems 1 Fundamental Requirements form Cloud Storage Systems The Google File System first design consideration: component failures are the norm rather than

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required. What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

More information

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University

More information

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Cloud DBMS: An Overview Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Outline Definition and requirements S through partitioning A through replication Problems of traditional DDBMS Usage analysis: operational

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace Introduction to Polyglot Persistence Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace FOSSCOMM 2016 Background - 14 years in databases and system engineering - NoSQL DBA @ ObjectRocket

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI

Big Data Development CASSANDRA NoSQL Training - Workshop. March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI Big Data Development CASSANDRA NoSQL Training - Workshop March 13 to 17-2016 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 121109 Dubai UAE, email training-coordinator@isidusnet M: +97150

More information

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014 Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014 About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western

More information

Benchmarking and Analysis of NoSQL Technologies

Benchmarking and Analysis of NoSQL Technologies Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011

A Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011 A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Choosing the right NoSQL database for the job: a quality attribute evaluation

Choosing the right NoSQL database for the job: a quality attribute evaluation Lourenço et al. Journal of Big Data (2015) 2:18 DOI 10.1186/s40537-015-0025-0 RESEARCH Choosing the right NoSQL database for the job: a quality attribute evaluation João Ricardo Lourenço 1*, Bruno Cabral

More information

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Study and Comparison of Elastic Cloud Databases : Myth or Reality? Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri

More information

Enterprise Operational SQL on Hadoop Trafodion Overview

Enterprise Operational SQL on Hadoop Trafodion Overview Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development

More information

Big Data and Scripting Systems build on top of Hadoop

Big Data and Scripting Systems build on top of Hadoop Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform Pig is the name of the system Pig Latin is the provided programming language Pig Latin is

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information