NOSQL DATABASE SYSTEMS


1 NOSQL DATABASE SYSTEMS Big Data Technologies: NoSQL DBMS - SoSe

2 NoSQL Database Systems: Outline
- Categorization
  - Data Model
  - Storage Layout
  - Query Models
  - Solution Architectures
- Data Modeling
- Application Development
- Scalability, Availability and Consistency
  - Partitioning, Replication
  - Consistency Models and Transactions
- Select the Right DBMS
  - Performance and Benchmarks
  - Polyglot Persistence

3 NoSQL Database Systems
Considered categories of NoSQL database systems:
- Key-Value Database Systems
- Document Database Systems
- Column Family Database Systems

4 Key-Value Database Systems
Data Model
- Key-value pairs
- Unique keys
- Values: arbitrary type (serialized byte arrays) or strings, lists, sets, ordered sets (of strings)
- Schema-free
Storage Layout
- Hash maps, B-trees
Indexes
- Primary indexes (hash, B-tree) on the key
- Secondary indexes on values?

5 Key-Value Database Systems (Cont.)
Query Models
- Simple API
  - set(key, value)
  - value = get(key)
  - delete(key)
- Operations on values?
- More complex operations
  - Language bindings
  - MapReduce (later in this chapter)
Systems
- Oracle Berkeley DB (mid-90s)
- Caches (EHCache, Memcache)
- Amazon Dynamo/S3, Redis, Riak, Voldemort, ...
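The simple set/get/delete API above can be sketched in a few lines. This is only a toy in-memory illustration, not how any of the listed systems is implemented; the class name `KeyValueStore`, the key `session:42` and the use of `pickle` for serialization are assumptions made for the example.

```python
import pickle


class KeyValueStore:
    """Toy in-memory key-value store: unique keys, opaque serialized values."""

    def __init__(self):
        # A hash map plays the role of the primary index on the key.
        self._data = {}

    def set(self, key, value):
        # Values are kept as serialized byte arrays; the store never inspects them.
        self._data[key] = pickle.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else pickle.loads(raw)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.set("session:42", {"user": "alice", "cart": ["football boot"]})
session = store.get("session:42")
store.delete("session:42")
after_delete = store.get("session:42")
```

Because the value is opaque to the store, any "operation on values" (for example, adding one item to the cart) has to read, modify and rewrite the whole value on the client side.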

6 Document Store Database Systems
Data Model
- Key-value pairs with documents as values
- Document format: JSON or BSON (Binary JSON)
  - Loosely structured name(key)-value pairs
  - Hierarchical
- Additionally, MongoDB uses collections
  - Arbitrary documents could be grouped together
  - Documents in a collection should be similar to facilitate effective indexing

Example document:
  {
    "id": 1,
    "name": "football boot",
    "price": 199,
    "stock": {
      "warehouse": 120,
      "retail": 10
    }
  }

Storage Layout
- B-trees to store the documents
- MongoDB: documents in a single collection are stored together

7 Document Store Database Systems (Cont.)
Indexes
- Primary indexes on the document id (key)
- Secondary indexes on JSON names
  - Default or user-defined
  - Composite indexes may be supported
Query Models
- Simple API: set/get/delete
- Further query support differs widely
  - Powerful ad-hoc queries with an integrated query language (MongoDB)
  - No ad-hoc queries, predefined views with indexes only (CouchDB & Couchbase)
- Language bindings
- MapReduce (later in this chapter)
Systems
- MongoDB, CouchDB, Couchbase, ...
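To make the idea of a secondary index on a JSON name concrete, here is a minimal sketch. It assumes documents are held in a plain dictionary keyed by document id; the second document and the helper `build_secondary_index` are hypothetical and do not correspond to any particular product's API.

```python
from collections import defaultdict

# Primary index: document id -> document (the second document is made up).
documents = {
    1: {"id": 1, "name": "football boot", "price": 199,
        "stock": {"warehouse": 120, "retail": 10}},
    2: {"id": 2, "name": "shin guard", "price": 25,
        "stock": {"warehouse": 40, "retail": 5}},
}


def build_secondary_index(docs, field):
    """Map each value of a top-level JSON name to the ids of matching documents."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:  # schema-free: a document may lack the field
            index[doc[field]].add(doc_id)
    return index


price_index = build_secondary_index(documents, "price")
# Look up by value through the secondary index, then fetch by primary key.
matches = [documents[doc_id]["name"] for doc_id in sorted(price_index[199])]
```

The sketch also hints at why documents in a collection should be similar: an index on a name that most documents lack covers very little of the collection.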

8 Column Family Database Systems
Data Model
- Loosely structured by columns and column families ("set of nested maps")
- Column family
  - Set of columns grouped together into a bundle
  - Column families have to be predefined
- Column
  - Not predefined; any type of data (can be nested)

[Figure: a table with two column families; each row key (Row Key1, Row Key2) holds its own, varying number of columns per column family]

9 Column Family Database Systems (Cont.)
Data Model (Cont.)

Example:
  Row Key: title | Column Family text                                         | Column Family revision
  "NoSQL"        | text:content: "A NoSQL database provides a mechanism ..." | revision:author: "Mendel", revision:comment: "changed ..."
  "Redis"        | text:content: "Redis is an open-source, networked ..."    | revision:author: "Torben", revision:comment: "initial ..."

Column family database systems support multiple versions of each cell by timestamps:
  Row Key: title | Time Stamp | Column Family text  | Column Family revision
  "NoSQL"        | t5         | text:content: "..." | revision:author: "Mendel", revision:comment: "changed ..."
  "NoSQL"        | t4         |                     | revision:author: "Torben", revision:comment: "there ..."
  "Redis"        | t3         | text:content: "..." | revision:author: "Torben", revision:comment: "initial ..."

10 Column Family Database Systems (Cont.)
Storage Layout
- Data is stored by column family

Logical view:
  Row Key: title | Time Stamp | Column Family text  | Column Family revision
  "NoSQL"        | t5         | text:content: "..." | revision:author: "Mendel", revision:comment: "changed view ..."
  "NoSQL"        | t4         |                     | revision:author: "Torben", revision:comment: "there should be ..."
  "Redis"        | t3         | text:content: "..." | revision:author: "Torben", revision:comment: "initial ..."

Stored column family text:
  Row Key: title | Time Stamp | column: content
  NoSQL          | t5         | A NoSQL database provides a mechanism ...
  Redis          | t3         | Redis is an open-source, networked ...

Stored column family revision:
  Row Key: title | Time Stamp | column: author | column: comment
  NoSQL          | t5         | Mendel         | changed view
  NoSQL          | t4         | Torben         | there should be
  Redis          | t3         | Torben         | initial

11 Column Family Database Systems (Cont.)
Classical example: Web table
  Row Key       | Time Stamp | Column Family contents | Column Family anchor
  "com.cnn.www" | t9         |                        | anchor:anchor: "cnnsi.com", anchor:anchortext: "cnn"
                | t8         |                        | anchor:anchor: "my.look.ch", anchor:anchortext: "CNN.com"
                | t6         | "<html> ..."           |
                | t5         | "<html> ..."           |

12 Column Family Database Systems (Cont.)
Query Models
- Simple API
  - set(table, row, column, value)
  - value = get(table, row, column)
  - delete(table, row, column)
  - Timestamp optional
- Language bindings
- More powerful query engines, either integrated (Cassandra Query Language) or as additional software products (e.g. Google App Engine / Google Datastore for BigTable, Hive for data warehousing on HBase)
- MapReduce (later in this chapter)
Indexes
- Primary indexes (B-trees, kept in sorted order)
- Default or user-defined secondary indexes
Systems
- Google BigTable, HBase, Cassandra, Amazon SimpleDB, ...
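The cell-level API with an optional timestamp can be sketched as nested maps, in the spirit of the "set of nested maps" description of the data model. This is an illustrative toy, not the API of BigTable, HBase or Cassandra: the function names, the table name `wiki`, the integer timestamps standing in for t4 and t5, and the logical clock used when no timestamp is given are all assumptions.

```python
import itertools

_clock = itertools.count(1)  # assumed logical clock for writes without a timestamp
_tables = {}                 # table -> row key -> column -> {timestamp: value}


def set_cell(table, row, column, value, timestamp=None):
    ts = next(_clock) if timestamp is None else timestamp
    cell = _tables.setdefault(table, {}).setdefault(row, {}).setdefault(column, {})
    cell[ts] = value  # a new timestamp adds a version instead of overwriting


def get_cell(table, row, column, timestamp=None):
    versions = _tables.get(table, {}).get(row, {}).get(column, {})
    # Without a timestamp return the newest version, otherwise the newest one
    # that is not younger than the requested timestamp.
    candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
    return versions[max(candidates)] if candidates else None


def delete_cell(table, row, column):
    _tables.get(table, {}).get(row, {}).pop(column, None)


# Two versions of the same cell; columns are written as "family:qualifier".
set_cell("wiki", "NoSQL", "revision:author", "Torben", timestamp=4)
set_cell("wiki", "NoSQL", "revision:author", "Mendel", timestamp=5)
latest = get_cell("wiki", "NoSQL", "revision:author")
as_of_t4 = get_cell("wiki", "NoSQL", "revision:author", timestamp=4)
```

Real systems additionally keep rows sorted by row key and store each column family separately; the sketch leaves both out.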

13 NoSQL (Not only SQL): Definition
NoSQL Definition: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
Source: S. Edlich, nosql-database.org

14 NoSQL (Not only SQL): Definition
Next Generation Databases mostly addressing some of the points:
- non-relational
- schema-free
- simple API
  - Note: more complex APIs are currently under development
- distributed and horizontally scalable
- easy replication support
- eventually consistent / BASE (not ACID)
  - Note: BASE as well as ACID are supported nowadays
- open-source???

15 NoSQL: The Essence
Data Model
- non-relational
- schema-free
Scalability
- distributed and horizontally scalable
- easy replication support

16 NoSQL Database Systems: Use Cases
  Category                        | Suitable Use Cases                                                                                              | Examples
  Key-Value Database Systems      | Storing session information; user profiles, preferences; shopping cart data                                    | Amazon (shopping carts), Temetra (meter data)
  Document Store Database Systems | Event logging; content management systems; blogging platforms; web analytics or real-time analytics            | Forbes (CMS), MTV (CMS)
  Column Family Database Systems  | Event logging; content management systems; blogging platforms                                                  | Google (web pages), Facebook (messaging), Twitter (places of interest)

17 NoSQL Family Tree
[Figure: NoSQL family tree. Source: cloudant.com]

18 Solution Architectures (Examples)
[Figure: Google stack and Hadoop stack. Source: Saake/Schallehn:2011]


20 Data Modeling
- Object-relational impedance mismatch
- Example: blog, blogpost, comment, author
  - Object-oriented modeling
  - Mapping to a relational database

21 Data Modeling Decisions
- Primary decision: embedding vs. referencing
- Points to consider, however:
  - There are no join operations within NoSQL database systems!
  - There are no distributed transactions within NoSQL!
- Advantages and disadvantages of embedding
- Advantages and disadvantages of referencing
- Martin Fowler: Aggregate-Oriented Modeling
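The embedding vs. referencing decision can be illustrated with plain dictionaries standing in for stored documents. The field names, ids and the helper `load_post_with_comments` are hypothetical; the point of the sketch is only that, without join operations, following references becomes the application's job.

```python
# Embedding: the comments live inside the blogpost aggregate.
blogpost_embedded = {
    "_id": "042",
    "title": "NoSQL",
    "comments": [
        {"author": "alice", "text": "Nice post"},
        {"author": "bob", "text": "Thanks"},
    ],
}

# Referencing: blogposts and comments are separate collections linked by ids.
blogposts = {"042": {"_id": "042", "title": "NoSQL", "comment_ids": ["c1", "c2"]}}
comments = {
    "c1": {"_id": "c1", "author": "alice", "text": "Nice post"},
    "c2": {"_id": "c2", "author": "bob", "text": "Thanks"},
}


def load_post_with_comments(post_id):
    """Application-side 'join': one lookup for the post, one per referenced comment."""
    post = blogposts[post_id]
    resolved = [comments[comment_id] for comment_id in post["comment_ids"]]
    return {"_id": post["_id"], "title": post["title"], "comments": resolved}


joined = load_post_with_comments("042")
embedded_authors = [c["author"] for c in blogpost_embedded["comments"]]
joined_authors = [c["author"] for c in joined["comments"]]
```

Embedding returns the whole aggregate with a single read and a single-document write, while referencing avoids duplicating shared data at the cost of several lookups that are not covered by one transaction.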

22 Data Modeling: Document Store DBS
- How to realize references?
  - Direction of references?
- Embedding: what about denormalization and redundancy?

23 Data Modeling: Column Family DBS
- How to implement embedded objects in column family database systems?
  - Variant 1: using run-time named column qualifiers
  - Variant 2: using timestamps (or other ids)
  - New (Cassandra CQL3): using collection types (map, set, list)
- What about column families?
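Variant 1 can be sketched as follows: each embedded object is flattened into columns whose qualifiers are generated at run time from the object's id. The naming scheme `<family>:<id>_<field>`, the helper `embed_as_columns` and the sample comments are assumptions chosen for illustration; actual schemes vary by application.

```python
def embed_as_columns(family, objects, id_field):
    """Flatten embedded objects into columns named '<family>:<id>_<field>'."""
    columns = {}
    for obj in objects:
        for field, value in obj.items():
            if field != id_field:
                # The qualifier is built at run time; it is not part of any schema.
                columns[f"{family}:{obj[id_field]}_{field}"] = value
    return columns


comments = [
    {"id": "c1", "author": "alice", "text": "Nice post"},
    {"id": "c2", "author": "bob", "text": "Thanks"},
]

# One row per blogpost: regular columns plus the flattened comments.
row = {"blogpost_data:title": "NoSQL"}
row.update(embed_as_columns("comments", comments, "id"))
```

Only the column family (`comments`) has to be predefined; the number of columns per row grows with the number of embedded objects.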

24 Data Modeling
- What about data modeling in key-value database systems?
Data Modeling: Conclusion
- More degrees of freedom
  - Embedding vs. referencing
  - Denormalization and redundancy


26 Application Development for NoSQL
- Simple command line APIs
- REST API
- (Some) more powerful query languages / query engines
- Language bindings
  - Java, Ruby, C#, Python, Erlang, PHP, Perl, ...
  - REST
  - Thrift

27 Application Development for NoSQL
Example: title and content of the blogpost with id = 042

  // HBase
  get 'blogposts', '042', { COLUMN => ['blogpost_data:title', 'blogpost_data:content'] }

  // Cassandra
  SELECT title, content FROM blogposts WHERE id = '042';

  // MongoDB
  db.blogposts.find( { _id : '042' }, { title: 1, content: 1 } )

  // Couchbase
  function (doc) {
    if (doc._id == '042') {
      emit(doc._id, [doc.title, doc.content]);
    }
  }

28 Application Development for NoSQL
Challenge
- Big data
- Data distributed over several hundred nodes (remember: scale out)
- Data-to-code or code-to-data?
- Executing jobs in parallel over several nodes
- There is a need for appropriate algorithms and frameworks!

29 MapReduce: Basic Idea
- Old idea from functional programming (LISP, ML, Erlang, Scala etc.)
  - Divide tasks into small discrete tasks and run them in parallel
  - Never change the original data (pipe concept)
  - Different operations on the same data do not influence each other
    - No concurrency conflicts
    - No deadlocks
    - No race conditions
- MapReduce
  - Basic idea and framework introduced by Google in 2004: J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI

30 MapReduce: Basic Idea & WordCount Example
Developers should implement two primary methods:
- Map: (key1, val1) -> [(key2, val2)]
- Reduce: (key2, [val2]) -> [(key3, val3)]

WordCount example (Doc1 to Doc4, processed in two groups of two documents):
  Documents, group 1: "Sport, Handball, Soccer" and "Soccer, FIFA"
  Documents, group 2: "Sport, Gym, Money" and "Soccer, FIFA, Money"

  MAP output 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
  MAP output 2 (key, value): (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)

  REDUCE output 1 (key, value): (Sport, 2), (Handball, 1), (Soccer, 3)
  REDUCE output 2 (key, value): (FIFA, 2), (Gym, 1), (Money, 2)
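The WordCount example can be replayed with a single-process sketch of the two methods. The driver `run_mapreduce` is a stand-in written for this illustration (it does the grouping that a real framework performs in its shuffle phase, without any distribution or fault tolerance); the comma-separated document format is taken from the slide.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    # Map: (key1, val1) -> [(key2, val2)]; emit (word, 1) for every word.
    return [(word.strip(), 1) for word in text.split(",")]


def reduce_fn(word, counts):
    # Reduce: (key2, [val2]) -> [(key3, val3)]; sum the counts of one word.
    return [(word, sum(counts))]


def run_mapreduce(inputs, mapper, reducer):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key1, val1 in inputs.items():
        for key2, val2 in mapper(key1, val1):
            groups[key2].append(val2)
    result = {}
    for key2, values in groups.items():
        for key3, val3 in reducer(key2, values):
            result[key3] = val3
    return result


documents = {
    "Doc1": "Sport, Handball, Soccer",
    "Doc2": "Soccer, FIFA",
    "Doc3": "Sport, Gym, Money",
    "Doc4": "Soccer, FIFA, Money",
}
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```

Neither function modifies its input or shares state, which is what allows a framework to run many map and reduce calls in parallel on different nodes.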

31 MapReduce: Architecture and Phases
[Figure: MapReduce architecture and phases. Source: https://developers.google.com/appengine/docs/python/dataprocessing/overview]

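32 Hadoop Example: Map & Reduce Functions

  // Requires java.io.IOException, java.util.Iterator, java.util.StringTokenizer,
  // org.apache.hadoop.io.* and org.apache.hadoop.mapred.* (old "mapred" API).
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused output objects: the constant count 1 and the current word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      // Emit (word, 1) for every token of the input line.
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      // Sum all counts collected for this word.
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }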

33 MapReduce: Optional Combine Phase
- Decreases the shuffling cost
- Reduces the result size of the map functions
- Performs a reduce-like function on each machine

WordCount example with combine phase:
  Documents, group 1: "Sport, Handball, Soccer" and "Soccer, FIFA"
  Documents, group 2: "Sport, Gym, Money" and "Soccer, FIFA, Money"

  MAP output 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
  MAP output 2 (key, value): (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)

  COMBINE output 1 (key, value): (Sport, 1), (Handball, 1), (Soccer, 2), (FIFA, 1)
  COMBINE output 2 (key, value): (Sport, 1), (Gym, 1), (Money, 2), (Soccer, 1), (FIFA, 1)

  The combine outputs are then passed to REDUCE.
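The effect of the combine phase on shuffling cost can be checked with a small sketch that simulates the two map machines as two lists. The variable and function names are made up for the illustration, and the "shuffle cost" is simply the number of (key, value) pairs that would leave each machine.

```python
from collections import Counter

# Input documents per (simulated) map machine, as in the example above.
node_inputs = [
    ["Sport, Handball, Soccer", "Soccer, FIFA"],
    ["Sport, Gym, Money", "Soccer, FIFA, Money"],
]


def map_words(text):
    return [(word.strip(), 1) for word in text.split(",")]


def combine(pairs):
    # Reduce-like pre-aggregation, applied locally on one machine.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


map_outputs = [[pair for doc in docs for pair in map_words(doc)] for docs in node_inputs]
combined_outputs = [combine(pairs) for pairs in map_outputs]

pairs_shuffled_without_combine = sum(len(pairs) for pairs in map_outputs)
pairs_shuffled_with_combine = sum(len(pairs) for pairs in combined_outputs)

# The final reduce yields the same totals either way.
totals = Counter()
for pairs in combined_outputs:
    for word, count in pairs:
        totals[word] += count
```

Reusing the reduce logic as a combiner works here because addition is associative and commutative; a reduce function without that property cannot simply be applied per machine.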

34 MapReduce Frameworks
MapReduce frameworks take care of
- Scaling
- Fault tolerance
- (Load balancing)
MapReduce frameworks
- Google (however, Google now promotes Dataflow)
- Apache Hadoop
  - Standalone or integrated in NoSQL (and SQL) DBMS
  - Also commercial distributors: Cloudera, MapR, HortonWorks, ...
- Proprietary MapReduce frameworks integrated in NoSQL DBMS

35 MapReduce and Query Languages
- The MapReduce paradigm is too low-level
  - Only two declarative primitives (map + reduce)
  - Custom code for simple operations like projection and filtering
  - Code is difficult to reuse and maintain
- Combination of high-level declarative querying and low-level programming with MapReduce
- Dataflow programming languages
  - HiveQL
  - Pig
  - (Jaql)

36 Hadoop Stack
[Figure: Hadoop stack. Source: Saake/Schallehn:2011]

37 HiveQL
Hive: data warehouse infrastructure built on top of Hadoop, providing:
- Data summarization
- Ad hoc querying
Simple query language: HiveQL (based on SQL)
- Extendable via custom mappers and reducers
- Developed by Facebook, now a subproject of Hadoop

38 HiveQL: Example
[Figure: HiveQL example. Source: Saake/Schallehn: Data Management in the Cloud, 2011]

39 Pig
- A platform for analyzing large data sets
- Pig consists of two parts:
  - Pig Latin: a data processing language
  - Pig infrastructure: an evaluator for Pig Latin programs
    - Pig compiles Pig Latin into physical plans
    - Plans are to be executed over Hadoop
- Interface between the declarative style of SQL and the low-level, procedural style of MapReduce

40 Pig: Example
[Figure: Pig example. Source: Saake/Schallehn: Data Management in the Cloud, 2011]

41 MapReduce in Practice
- VLDB 2012: Chen, Alspaugh, Katz: Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads
- 7 Hadoop deployments (Cloudera = Hadoop with commercial services)

42 MapReduce in Practice (Cont.)
[Figure from: Chen, Alspaugh, Katz. Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads; VLDB 2012]

43 MapReduce Trends
- Hadoop 2.0 with YARN (abstracts from MapReduce). Source: HortonWorks
- Apache
  - In-memory Hadoop
  - Performance!
  - Written in Scala

44 Application Development for NoSQL
- MapReduce: concept and frameworks
- State-of-the-art application development
  - With relational database systems: Object-Relational Mapping (ORM) frameworks and standards (Java Persistence API etc.)
  - Frameworks for Object-NoSQL mapping?!

45 Object-NoSQL Mapper: Architecture
[Figure: the application sends one query to the Object-NoSQL Mapper, which translates it for each data store]

  Application:
    SELECT b.titel, b.text FROM blogpost b WHERE b.id = 042

  Object-NoSQL Mapper, translated queries:
    MongoDB:   db.blogposts.find( { _id : '042' }, { titel: 1, text: 1 } )
    Cassandra: SELECT titel, text FROM blogposts WHERE id = '042';
    HBase:     get 'blogposts', '042', { COLUMN => ['blogpost_daten:titel', 'blogpost_daten:text'] }

  Stored data:
    MongoDB:   document { "id" : "042", "titel" : ... }
    Cassandra: table with columns id, titel, ... and a row for id 042
    HBase:     table with columns id, titel, ... and a row for id 042

46 Object-NoSQL Mapper: Market Overview
- Mappers for different programming languages
  - Java, .NET, Python, Ruby
- Volatile market
- Main focus: Object-NoSQL Mappers for Java
  - Standardization: Java Persistence API (JPA) with Java Persistence Query Language (JPQL)
- Categorization
  - Multi Data Store Mapper
  - Single Data Store Mapper

47 Java Multi Data Store Mapper
Support for document store, column family, and graph database systems in Java Multi Data Store Mappers
[Table: support matrix of mappers vs. data stores; the individual support marks are not recoverable]
- Mappers: DataNucleus, EclipseLink, Hibernate OGM, Kundera, PlayORM, Spring Data
- Document stores: Couchbase, CouchDB, MongoDB
- Column family DBMS: Cassandra, HBase
- Graph DBMS: Neo4J

48 Java Multi Data Store Mapper
Support for key-value database systems in Java Multi Data Store Mappers
[Table: support matrix of mappers vs. data stores; the individual support marks are not recoverable]
- Mappers: DataNucleus, EclipseLink, Hibernate, Kundera, PlayORM, Spring Data
- Key-value DBMS: Amazon DynamoDB, Apache Solr, Ehcache, Elasticsearch, GemFire, Infinispan, Oracle NoSQL, Redis

49 Java Object-NoSQL Mapper: Supported Functionality
Single Data Store Mapper
[Table: supported functionality per mapper]
* Limited functionality (depending on the underlying NoSQL data store)
Source: Störl/Hauf/Klettke/Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015

50 Object-NoSQL Mapper: Query Language Support
Challenge: different query language interfaces
Examples:
- Most systems do not support any JOINs
- Many systems do not offer aggregate functions, the LIKE operator, the NOT operator, ...
Approaches
1. Offer only the particular subset of features that is implemented by all supported NoSQL data stores, i.e. the intersection of features
2. Distinguish by data store and offer only the set of features implemented by a particular NoSQL data store
3. Offer the same set of features for all supported NoSQL data stores, possibly complementing missing features by implementing them inside the Object-NoSQL Mapper
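Approaches 1 and 2 can be contrasted with a small sketch over sets of supported operators. The store names `store_a` to `store_c` and their capabilities are invented for the illustration and do not describe any real product.

```python
# Hypothetical capability sets; real products differ.
capabilities = {
    "store_a": {"=", "<", "LIKE", "COUNT"},
    "store_b": {"=", "<"},
    "store_c": {"=", "<", "COUNT"},
}

# Approach 1: offer only what every supported store implements.
common_features = set.intersection(*capabilities.values())


def is_supported(store, operator):
    # Approach 2: decide per data store; portable code must check first.
    return operator in capabilities[store]


like_on_a = is_supported("store_a", "LIKE")
like_on_b = is_supported("store_b", "LIKE")
```

With approach 1 the offered feature set shrinks every time a less capable store is added; with approach 2 a query using `LIKE` runs on one store but fails on another, which is the portability drawback.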

51 Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Drawback: restricted portability
- Systems: Hibernate OGM, Kundera, EclipseLink
- Example: JPQL operators (selection) in Kundera
  https://github.com/impetus-opensource/kundera/wiki/jpql

52 Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Extension: use third-party libraries to offer more functionality for some, but not for all, supported NoSQL data stores
- Systems: Hibernate OGM (Hibernate Search), Kundera; each with Apache Lucene
[Figure: Application -> Object-NoSQL Mapper -> NoSQL DBMS, with an additional search engine index attached to the mapper]

53 Object-NoSQL Mapper: Query Language Support
Approach 3: Offer the same set of features for all supported NoSQL data stores
- Complementing missing features by implementing them inside the Object-NoSQL Mapper
- Benefit: portability
- Drawback: performance
- Systems: DataNucleus, Hibernate OGM (announced)
[Figure: Application -> Object-NoSQL Mapper -> NoSQL DBMS]
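Approach 3 and its performance drawback can be sketched with a toy store that only supports full scans and a mapper that emulates a prefix search (in the style of `LIKE 'No%'`) on the client side. `ScanOnlyStore`, `ToyMapper` and the sample rows are hypothetical and are not modeled on DataNucleus or Hibernate OGM.

```python
class ScanOnlyStore:
    """Toy data store that can only return all rows; it has no LIKE operator."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_transferred = 0

    def scan(self):
        for row in self._rows:
            self.rows_transferred += 1  # every row crosses the store boundary
            yield row


class ToyMapper:
    """Emulates a prefix match inside the mapper (approach 3)."""

    def __init__(self, store):
        self._store = store

    def find_by_prefix(self, field, prefix):
        # The store cannot filter, so the mapper fetches everything and filters itself.
        return [row for row in self._store.scan()
                if str(row.get(field, "")).startswith(prefix)]


store = ScanOnlyStore([
    {"id": "042", "titel": "NoSQL"},
    {"id": "043", "titel": "Redis"},
    {"id": "044", "titel": "NoSQL mappers"},
])
mapper = ToyMapper(store)
hits = [row["id"] for row in mapper.find_by_prefix("titel", "NoSQL")]
```

The query is portable because it works against any store that can scan, but all three rows are transferred to answer a query with two results; on large data sets this is where the performance penalty comes from.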

54 Object-NoSQL Mapper: Query Language Support
Outlook: combination of approaches 2 and 3
- Systems: Hibernate OGM (announced)
[Figure: Application -> Object-NoSQL Mapper -> NoSQL DBMS, with an additional search engine index attached to the mapper]

55 Conclusion: Java Object-NoSQL Mapper
- Vendor independence / portability
  - Standardized query language (JPQL)
  - Support for different NoSQL data stores
  - Supported query operators often depend on the capabilities of the underlying NoSQL data stores
- Performance (as of end of 2014)
  - In reading data, there is only a small gap between native access and the Object-NoSQL Mappers for the majority of the evaluated products
  - Yet in writing, object mappers introduce a significant overhead
- Further reading: U. Störl, Th. Hauf, M. Klettke and S. Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015



More information

HPCHadoop: MapReduce on Cray X-series

HPCHadoop: MapReduce on Cray X-series HPCHadoop: MapReduce on Cray X-series Scott Michael Research Analytics Indiana University Cray User Group Meeting May 7, 2014 1 Outline Motivation & Design of HPCHadoop HPCHadoop demo Benchmarking Methodology

More information

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Slave. Master. Research Scholar, Bharathiar University

Slave. Master. Research Scholar, Bharathiar University Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually

More information

.NET User Group Bern

.NET User Group Bern .NET User Group Bern Roger Rudin bbv Software Services AG roger.rudin@bbv.ch Agenda What is NoSQL Understanding the Motivation behind NoSQL MongoDB: A Document Oriented Database NoSQL Use Cases What is

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects

More information

Hadoop: Understanding the Big Data Processing Method

Hadoop: Understanding the Big Data Processing Method Hadoop: Understanding the Big Data Processing Method Deepak Chandra Upreti 1, Pawan Sharma 2, Dr. Yaduvir Singh 3 1 PG Student, Department of Computer Science & Engineering, Ideal Institute of Technology

More information

Hadoop and Eclipse. Eclipse Hawaii User s Group May 26th, 2009. Seth Ladd http://sethladd.com

Hadoop and Eclipse. Eclipse Hawaii User s Group May 26th, 2009. Seth Ladd http://sethladd.com Hadoop and Eclipse Eclipse Hawaii User s Group May 26th, 2009 Seth Ladd http://sethladd.com Goal YOU can use the same technologies as The Big Boys Google Yahoo (2000 nodes) Last.FM AOL Facebook (2.5 petabytes

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Enterprise Operational SQL on Hadoop Trafodion Overview

Enterprise Operational SQL on Hadoop Trafodion Overview Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15

More information

Hadoop Framework. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN

Hadoop Framework. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Hadoop Framework technology basics for data scientists Spring - 2014 Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Warning! Slides are only for presenta8on guide We will discuss+debate addi8onal

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 3. Apache Hadoop Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Apache Hadoop Open-source

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12 Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350

More information

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

More information

NoSQL Databases. Polyglot Persistence

NoSQL Databases. Polyglot Persistence The future is: NoSQL Databases Polyglot Persistence a note on the future of data storage in the enterprise, written primarily for those involved in the management of application development. Martin Fowler

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft Big Data Architectures Tom Cahill, Vice President Worldwide Channels, Jaspersoft Jaspersoft + Big Data = Fast Insights Success in the Big Data era is more than about size. It s about getting insight from

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Hadoop Ecosystem Overview of this Lecture Module Background Google MapReduce The Hadoop Ecosystem Core components: Hadoop

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

NewSQL. Andy Pavlo February 6, 2012

NewSQL. Andy Pavlo February 6, 2012 NewSQL Andy Pavlo February 6, 2012 Outline The Last Decade of Databases NewSQL Introduction H-Store Early-2000s All the big players were heavyweight and expensive. Oracle, DB2, Sybase, SQL Server, etc.

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

Introduc)on to the MapReduce Paradigm and Apache Hadoop. Sriram Krishnan sriram@sdsc.edu

Introduc)on to the MapReduce Paradigm and Apache Hadoop. Sriram Krishnan sriram@sdsc.edu Introduc)on to the MapReduce Paradigm and Apache Hadoop Sriram Krishnan sriram@sdsc.edu Programming Model The computa)on takes a set of input key/ value pairs, and Produces a set of output key/value pairs.

More information

Table of Contents. Développement logiciel pour le Cloud (TLC) Table of Contents. 5. NoSQL data models. Guillaume Pierre

Table of Contents. Développement logiciel pour le Cloud (TLC) Table of Contents. 5. NoSQL data models. Guillaume Pierre Table of Contents Développement logiciel pour le Cloud (TLC) 5. NoSQL data models Guillaume Pierre Université de Rennes 1 Fall 2012 http://www.globule.org/~gpierre/ Développement logiciel pour le Cloud

More information

Big Data Analytics* Outline. Issues. Big Data

Big Data Analytics* Outline. Issues. Big Data Outline Big Data Analytics* Big Data Data Analytics: Challenges and Issues Misconceptions Big Data Infrastructure Scalable Distributed Computing: Hadoop Programming in Hadoop: MapReduce Paradigm Example

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

Xiaoming Gao Hui Li Thilina Gunarathne

Xiaoming Gao Hui Li Thilina Gunarathne Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal

More information

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without

More information

Word count example Abdalrahman Alsaedi

Word count example Abdalrahman Alsaedi Word count example Abdalrahman Alsaedi To run word count in AWS you have two different ways; either use the already exist WordCount program, or to write your own file. First: Using AWS word count program

More information

Hadoop/MapReduce. Object-oriented framework presentation CSCI 5448 Casey McTaggart

Hadoop/MapReduce. Object-oriented framework presentation CSCI 5448 Casey McTaggart Hadoop/MapReduce Object-oriented framework presentation CSCI 5448 Casey McTaggart What is Apache Hadoop? Large scale, open source software framework Yahoo! has been the largest contributor to date Dedicated

More information

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace Introduction to Polyglot Persistence Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace FOSSCOMM 2016 Background - 14 years in databases and system engineering - NoSQL DBA @ ObjectRocket

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Introduc8on to Apache Spark

Introduc8on to Apache Spark Introduc8on to Apache Spark Jordan Volz, Systems Engineer @ Cloudera 1 Analyzing Data on Large Data Sets Python, R, etc. are popular tools among data scien8sts/analysts, sta8s8cians, etc. Why are these

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Big Data and Scripting Systems build on top of Hadoop

Big Data and Scripting Systems build on top of Hadoop Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the

More information

Introduction to MapReduce Tore Risch Information Technology Uppsala University

Introduction to MapReduce  Tore Risch Information Technology Uppsala University Introduction to MapReduce http://user.it.uu.se/~torer/kurser/dm2/mapreduce.pdf Tore Risch Information Technology Uppsala University 2015-05-11 What is a NoSQL Database? A key/value store Basic index manager,

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

The Hadoop Eco System Shanghai Data Science Meetup

The Hadoop Eco System Shanghai Data Science Meetup The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related

More information

A Study of NoSQL and NewSQL databases for data aggregation on Big Data

A Study of NoSQL and NewSQL databases for data aggregation on Big Data A Study of NoSQL and NewSQL databases for data aggregation on Big Data ANANDA SENTRAYA PERUMAL MURUGAN Master s Degree Project Stockholm, Sweden 2013 TRITA-ICT-EX-2013:256 A Study of NoSQL and NewSQL

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759

More information