NOSQL DATABASE SYSTEMS
- Brent Horn
1 NOSQL DATABASE SYSTEMS Big Data Technologies: NoSQL DBMS - SoSe
2 Categorization
- Data Model
- Storage Layout
- Query Models
- Solution Architectures
- NoSQL Database Systems
- Data Modeling
- Application Development
- Scalability, Availability and Consistency
- Partitioning, Replication
- Consistency Models and Transactions
- Select the Right DBMS
- Performance and Benchmarks
- Polyglot Persistence
3 NoSQL Database Systems
- Considered Categories of NoSQL Database Systems
  - Key-Value Database Systems
  - Document Database Systems
  - Column Family Database Systems
4 Key-Value Database Systems
Data Model
- Key-value pairs
- Unique keys
- Values: arbitrary type (serialized byte arrays) or strings, lists, sets, ordered sets (of strings)
- Schema-free
Storage Layout
- Hash maps, B-trees, ...
Indexes
- Primary indexes (hash, B-tree) on key
- Secondary indexes on values?
5 Key-Value Database Systems (Cont.)
Query Models
- Simple API
  - set(key, value)
  - value = get(key)
  - delete(key)
- Operations on values?
- More complex operations
  - Language bindings
  - MapReduce (later in this chapter)
Systems
- Oracle Berkeley DB (mid-90s)
- Caches (EHCache, Memcache)
- Amazon Dynamo/S3, Redis, Riak, Voldemort, ...
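The simple set/get/delete API can be illustrated with a minimal in-memory sketch. The class name `KeyValueStore` and the key `cart:alice` are hypothetical and do not correspond to any of the listed systems; a Python dict stands in for the hash-based primary index.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, not a real product's API)."""

    def __init__(self):
        # A dict is a hash map, standing in for the primary hash index on the key.
        self._data = {}

    def set(self, key, value):
        # Values are opaque to the store; here they are serialized byte arrays.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is unknown.
        return self._data.get(key)

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.set("cart:alice", b'["football boot"]')
cart = store.get("cart:alice")
store.delete("cart:alice")
```

Because the store never inspects the value, any operation on the value itself (the open question on the slide) has to happen in the application after `get`.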
6 Document Store Database Systems
Data Model
- Key-value pairs with documents as values
- Document format: JSON or BSON (Binary JSON)
  - Loosely structured name (key)-value pairs
  - Hierarchical
- Additionally, MongoDB uses collections
  - Arbitrary documents can be grouped together
  - Documents in a collection should be similar to facilitate effective indexing
Example document:
    { "id": 1, "name": "football boot", "price": 199, "stock": { "warehouse": 120, "retail": 10 } }
Storage Layout
- B-trees to store the documents
- MongoDB: documents in a single collection are stored together
7 Document Store Database Systems (Cont.)
Indexes
- Primary indexes on document id (key)
- Secondary indexes on JSON names
  - Default or user-defined
  - Composite indexes may be supported
Query Models
- Simple API: set/get/delete
- Further query support differs widely
  - Powerful ad hoc queries with an integrated query language (MongoDB)
  - No ad hoc queries, only predefined views with indexes (CouchDB & Couchbase)
- Language bindings
- MapReduce (later in this chapter)
Systems
- MongoDB, CouchDB, Couchbase, ...
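A primary index on the document id plus a secondary index on a JSON name can be sketched as follows. This is an illustration under assumptions: `DocumentCollection` and `find_by` are hypothetical names, the second document is invented, and real document stores use B-trees rather than Python dicts.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document collection with secondary indexes on chosen JSON names."""

    def __init__(self, indexed_fields=()):
        self._docs = {}  # primary index: document id -> document
        # One secondary index per field: field value -> set of document ids.
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def set(self, doc):
        self.delete(doc["id"])  # drop stale index entries on overwrite
        self._docs[doc["id"]] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc["id"])

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def delete(self, doc_id):
        old = self._docs.pop(doc_id, None)
        if old is not None:
            for field, index in self._indexes.items():
                if field in old:
                    index[old[field]].discard(doc_id)

    def find_by(self, field, value):
        # Only works for indexed fields; raises KeyError otherwise.
        ids = self._indexes[field].get(value, set())
        return [self._docs[i] for i in sorted(ids)]


products = DocumentCollection(indexed_fields=["price"])
products.set({"id": 1, "name": "football boot", "price": 199,
              "stock": {"warehouse": 120, "retail": 10}})
products.set({"id": 2, "name": "shin guard", "price": 25})  # invented document
matches = products.find_by("price", 199)
```

The two documents do not share the same set of names, which is what schema-free means here; the index simply skips documents that lack the indexed name.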
8 Column Family Database Systems
Data Model
- Loosely structured by columns and column families ("set of nested maps")
- Column family
  - Set of columns grouped together into a bundle
  - Column families have to be predefined
- Column
  - Not predefined; any type of data (can be nested)
[Figure: a table with two column families; the row with Row Key1 holds five columns, the row with Row Key2 holds four]
9 Column Family Database Systems (Cont.)
Data Model (Cont.)
Example:
Row Key: title | Column Family text | Column Family revision
"NoSQL" | text:content: "A NoSQL database provides a mechanism ..." | revision:author: "Mendel", revision:comment: "changed ..."
"Redis" | text:content: "Redis is an open-source, networked ..." | revision:author: "Torben", revision:comment: "initial ..."
Column family database systems support multiple versions of each cell by timestamps:
Row Key: title | Time Stamp | Column Family text | Column Family revision
"NoSQL" | t5 | text:content: "..." | revision:author: "Mendel", revision:comment: "changed ..."
"NoSQL" | t4 | | revision:author: "Torben", revision:comment: "there ..."
"Redis" | t3 | text:content: "..." | revision:author: "Torben", revision:comment: "initial ..."
10 Column Family Database Systems (Cont.)
Storage Layout
- Data is stored by column family
Column family text:
Row Key: title | Time Stamp | column: content
NoSQL | t5 | A NoSQL database provides a mechanism ...
Redis | t3 | Redis is an open-source, networked ...
Column family revision:
Row Key: title | Time Stamp | column: author | column: comment
NoSQL | t5 | Mendel | changed view
NoSQL | t4 | Torben | there should be
Redis | t3 | Torben | initial
11 Column Family Database Systems (Cont.)
Classical example: Web table
Row Key | Time Stamp | Column Family contents | Column Family anchor
"com.cnn.www" | t9 | | anchor:anchor: "cnnsi.com", anchor:anchortext: "cnn"
"com.cnn.www" | t8 | | anchor:anchor: "my.look.ch", anchor:anchortext: "CNN.com"
"com.cnn.www" | t6 | "<html> ..." |
"com.cnn.www" | t5 | "<html> ..." |
12 Column Family Database Systems (Cont.)
Query Models
- Simple API
  - set(table, row, column, value)
  - value = get(table, row, column)
  - delete(table, row, column)
  - timestamp optional
- Language bindings
- More powerful query engines, either integrated (Cassandra Query Language) or as additional software products (e.g. Google App Engine / Google Datastore for BigTable, Hive for data warehousing on HBase)
- MapReduce (later in this chapter)
Indexes
- Primary indexes (B-trees, sorted)
- Default or user-defined secondary indexes
Systems
- Google BigTable, HBase, Cassandra, Amazon SimpleDB, ...
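The simple API with optional timestamps can be sketched on top of the "set of nested maps" view of the data model. `ColumnFamilyStore` is a hypothetical name, and the rule that `get` with a timestamp returns the newest version at or before that timestamp is an assumption of this sketch, not the documented behavior of any listed system.

```python
import itertools


class ColumnFamilyStore:
    """Toy store: table -> row key -> column -> {timestamp: value}."""

    def __init__(self):
        self._tables = {}
        self._clock = itertools.count(1)  # logical clock for implicit timestamps

    def set(self, table, row, column, value, timestamp=None):
        ts = next(self._clock) if timestamp is None else timestamp
        versions = (self._tables.setdefault(table, {})
                    .setdefault(row, {})
                    .setdefault(column, {}))
        versions[ts] = value  # older versions of the cell are kept

    def get(self, table, row, column, timestamp=None):
        versions = self._tables.get(table, {}).get(row, {}).get(column, {})
        # Assumption: return the newest version not newer than `timestamp`.
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def delete(self, table, row, column):
        # Removes all versions of the cell.
        self._tables.get(table, {}).get(row, {}).pop(column, None)


wiki = ColumnFamilyStore()
wiki.set("articles", "NoSQL", "revision:author", "Torben", timestamp=4)
wiki.set("articles", "NoSQL", "revision:author", "Mendel", timestamp=5)
latest = wiki.get("articles", "NoSQL", "revision:author")
older = wiki.get("articles", "NoSQL", "revision:author", timestamp=4)
```

The column name `revision:author` follows the family:qualifier notation of the example table; only the family part would have to be predefined.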
13 NoSQL (Not only SQL): Definition
NoSQL Definition: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
Source: S. Edlich, nosql-database.org
14 NoSQL (Not only SQL): Definition
Next Generation Databases mostly addressing some of the points:
- non-relational
- schema-free
- simple API (note: more complex APIs currently under development)
- distributed and horizontally scalable
- easy replication support
- eventually consistent / BASE (not ACID) (note: BASE as well as ACID are supported nowadays)
- open-source???
15 NoSQL: The Essence
Data Model
- non-relational
- schema-free
Scalability
- distributed and horizontally scalable
- easy replication support
16 NoSQL Database Systems: Use Cases
Key-Value Database Systems
- Suitable use cases: storing session information; user profiles, preferences; shopping cart data
- Examples: Amazon (shopping carts), Temetra (meter data)
Document Store Database Systems
- Suitable use cases: event logging; content management systems; blogging platforms; web analytics or real-time analytics
- Examples: Forbes (CMS), MTV (CMS)
Column Family Database Systems
- Suitable use cases: event logging; content management systems; blogging platforms
- Examples: Google (web pages), Facebook (messaging), Twitter (places of interest)
17 NoSQL Family Tree
[Figure: NoSQL family tree. Source: cloudant.com]
18 Solution Architectures (Examples)
[Figure: Google stack and Hadoop stack. Source: Saake/Schallehn:2011]
20 Data Modeling
- Object-relational impedance mismatch
- Example: blog, blogpost, comment, author
  - Object-oriented modeling
  - Mapping to relational database
21 Data Modeling Decisions
- Primary decision: embedding vs. referencing
- Points to consider:
  - There are no join operations within NoSQL database systems!
  - There are no distributed transactions within NoSQL!
- Advantages and disadvantages of embedding
- Advantages and disadvantages of referencing
- Martin Fowler: Aggregate-Oriented Modeling
22 Data Modeling: Document Store DBS
- How to realize references?
- Direction of references?
- Embedding: what about denormalization and redundancy?
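The embedding vs. referencing decision can be made concrete with the blogpost/comment example. The field names (`comments`, `comment_ids`) and the helper `load_comments` are hypothetical; the sketch only shows that, without joins, references have to be resolved by the application.

```python
# Embedding: comments live inside the blogpost document (one read, but
# comment data is duplicated if it is needed elsewhere).
embedded_post = {
    "_id": "042",
    "title": "NoSQL",
    "comments": [
        {"author": "Torben", "text": "initial"},
        {"author": "Mendel", "text": "changed"},
    ],
}

# Referencing: the blogpost stores only comment ids; comments are separate
# documents in their own collection.
blogposts = {"042": {"_id": "042", "title": "NoSQL", "comment_ids": ["c1", "c2"]}}
comments = {
    "c1": {"author": "Torben", "text": "initial"},
    "c2": {"author": "Mendel", "text": "changed"},
}


def load_comments(post, comment_collection):
    # Application-side "join": one extra lookup per reference.
    return [comment_collection[cid] for cid in post["comment_ids"]]


referenced_comments = load_comments(blogposts["042"], comments)
```

Here the reference points from the blogpost to its comments; the opposite direction (each comment storing the blogpost id) is equally possible and changes which lookups are cheap.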
23 Data Modeling: Column Family DBS
- How to implement embedded objects in column family database systems?
  - Variant 1: using run-time named column qualifiers
  - Variant 2: using timestamps (or other ids)
  - New (Cassandra CQL3): using collection types (map, set, list)
- What about column families?
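Variant 1 can be sketched as a flattening step: each embedded object contributes columns whose qualifiers are built at run time from the object's id. The naming scheme `family:object_id:field` and the function `embed_as_columns` are assumptions chosen for illustration, not a convention prescribed by HBase or Cassandra.

```python
def embed_as_columns(family, objects, id_field="id"):
    """Flatten embedded objects into columns with run-time named qualifiers."""
    columns = {}
    for obj in objects:
        for field, value in obj.items():
            if field != id_field:
                # Qualifier is only known at run time: "<object id>:<field>".
                columns[f"{family}:{obj[id_field]}:{field}"] = value
    return columns


row = embed_as_columns("comments", [
    {"id": "c1", "author": "Torben", "text": "initial"},
    {"id": "c2", "author": "Mendel", "text": "changed"},
])
```

This works because columns are not predefined; only the column family (`comments` here) has to exist in advance.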
24 Data Modeling
- What about data modeling in key-value database systems?
Data Modeling: Conclusion
- More degrees of freedom
  - Embedding vs. referencing
  - Denormalization and redundancy
26 Application Development for NoSQL
- Simple command line APIs
- REST API
- (Some) more powerful query languages / query engines
- Language bindings
  - Java, Ruby, C#, Python, Erlang, PHP, Perl, ...
  - REST
  - Thrift
27 Application Development for NoSQL
Example: title, content from blogpost with id = 042

    // HBase
    get 'blogposts', '042', { COLUMN => ['blogpost_data:title', 'blogpost_data:content'] }

    // Cassandra
    SELECT title, content FROM blogposts WHERE id = '042';

    // MongoDB
    db.blogposts.find( { _id : '042' }, { title: 1, content: 1 } )

    // Couchbase
    function (doc) {
      if (doc._id == '042') {
        emit(doc._id, [doc.title, doc.content]);
      }
    }
28 Application Development for NoSQL
Challenge
- Big data
- Data distributed over several hundred nodes (remember: scale out)
- Data-to-code or code-to-data?
- Executing jobs in parallel over several nodes
- There is a need for appropriate algorithms and frameworks!
29 MapReduce: Basic Idea
- Old idea from functional programming (LISP, ML, Erlang, Scala etc.)
  - Divide tasks into small discrete tasks and run them in parallel
  - Never change original data (pipe concept)
  - Different operations on the same data do not influence each other
    - No concurrency conflicts
    - No deadlocks
    - No race conditions
- MapReduce
  - Basic idea and framework introduced by Google in 2004: J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
30 MapReduce: Basic Idea & WordCount Example
Developers should implement two primary methods
- Map: (key1, val1) -> [(key2, val2)]
- Reduce: (key2, [val2]) -> [(key3, val3)]
WordCount example with four documents (Doc1 to Doc4), processed by two mappers and two reducers:
Documents for mapper 1: "Sport, Handball, Soccer"; "Soccer, FIFA"
Documents for mapper 2: "Sport, Gym, Money"; "Soccer, FIFA, Money"
MAP output (key, value)
- Mapper 1: (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
- Mapper 2: (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)
REDUCE output (key, value)
- Reducer 1: (Sport, 2), (Handball, 1), (Soccer, 3)
- Reducer 2: (FIFA, 2), (Gym, 1), (Money, 2)
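The two signatures can be tried out with a single-process simulation of the WordCount example. This is a sketch, not a distributed framework: `map_fn`, `shuffle`, `reduce_fn` and `map_reduce` are hypothetical names, and the assignment of the four texts to Doc1 through Doc4 is assumed from their order on the slide.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    # Map: (key1, val1) -> [(key2, val2)]; emit (word, 1) per word.
    return [(word.strip(), 1) for word in text.split(",")]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word, counts):
    # Reduce: (key2, [val2]) -> [(key3, val3)]; sum the counts per word.
    return [(word, sum(counts))]


def map_reduce(documents):
    mapped = [pair for doc_id, text in documents.items()
              for pair in map_fn(doc_id, text)]
    result = {}
    for word, counts in shuffle(mapped).items():
        for key, value in reduce_fn(word, counts):
            result[key] = value
    return result


documents = {
    "Doc1": "Sport, Handball, Soccer",
    "Doc2": "Soccer, FIFA",
    "Doc3": "Sport, Gym, Money",
    "Doc4": "Soccer, FIFA, Money",
}
word_counts = map_reduce(documents)
```

Neither function modifies its input, which is what allows a real framework to run many map and reduce calls in parallel without locks.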
31 MapReduce: Architecture and Phases
[Figure: MapReduce architecture and phases]
32 Hadoop Example
Map & Reduce Functions (Example)

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        // Fields used below (not shown on the slide)
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // The trailing parameter is elided on the slide
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, ...) {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // Emit (word, 1) for every token of the input line
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // The trailing parameter is elided on the slide
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, ...) {
            // Sum all counts collected for this word
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
33 MapReduce: Optional Combine Phase
- Decrease the shuffling cost
- Reduce the result size of map functions
- Perform a reduce-like function on each machine
Documents for mapper 1: "Sport, Handball, Soccer"; "Soccer, FIFA"
Documents for mapper 2: "Sport, Gym, Money"; "Soccer, FIFA, Money"
MAP output (key, value)
- Mapper 1: (Sport, 1), (Handball, 1), (Soccer, 1), (Soccer, 1), (FIFA, 1)
- Mapper 2: (Sport, 1), (Gym, 1), (Money, 1), (Soccer, 1), (FIFA, 1), (Money, 1)
COMBINE output (key, value)
- Combiner 1: (Sport, 1), (Handball, 1), (Soccer, 2), (FIFA, 1)
- Combiner 2: (Sport, 1), (Gym, 1), (Money, 2), (Soccer, 1), (FIFA, 1)
The combined pairs are then passed to REDUCE.
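The effect of the combine phase on shuffling cost can be checked with a small single-process sketch. The function names and the split of the documents over two machines (`machine_a`, `machine_b`) are assumptions that mirror the two mappers in the example.

```python
from collections import Counter


def map_fn(text):
    # Emit (word, 1) for every word of a document.
    return [(word.strip(), 1) for word in text.split(",")]


def combine(pairs):
    # Reduce-like pre-aggregation, run locally on the mapper's machine.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_all(pair_lists):
    # Final reduce over everything that was shuffled across the network.
    total = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            total[word] += count
    return dict(total)


machine_a = ["Sport, Handball, Soccer", "Soccer, FIFA"]
machine_b = ["Sport, Gym, Money", "Soccer, FIFA, Money"]

mapped_a = [pair for doc in machine_a for pair in map_fn(doc)]
mapped_b = [pair for doc in machine_b for pair in map_fn(doc)]
combined_a = combine(mapped_a)
combined_b = combine(mapped_b)

pairs_shuffled_without_combine = len(mapped_a) + len(mapped_b)
pairs_shuffled_with_combine = len(combined_a) + len(combined_b)
totals = reduce_all([combined_a, combined_b])
```

Combining is only safe here because summation is associative and commutative, so pre-aggregating per machine does not change the final counts.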
34 MapReduce Frameworks
- MapReduce frameworks take care of
  - Scaling
  - Fault tolerance
  - (Load balancing)
- MapReduce frameworks
  - Google (however, Google now promotes Dataflow)
  - Apache Hadoop
    - Standalone or integrated in NoSQL (and SQL) DBMS
    - Also commercial distributors: Cloudera, MapR, HortonWorks, ...
  - Proprietary MapReduce frameworks integrated in NoSQL DBMS
35 MapReduce and Query Languages
- The MapReduce paradigm is too low-level
  - Only two declarative primitives (map + reduce)
  - Custom code for simple operations like projection and filtering
  - Code is difficult to reuse and maintain
- Combination of high-level declarative querying and low-level programming with MapReduce
- Dataflow programming languages
  - HiveQL
  - Pig
  - (Jaql)
36 Hadoop Stack
[Figure: Hadoop stack. Source: Saake/Schallehn:2011]
37 HiveQL
- Hive: data warehouse infrastructure built on top of Hadoop, providing:
  - Data summarization
  - Ad hoc querying
- Simple query language: HiveQL (based on SQL)
- Extendable via custom mappers and reducers
- Developed by Facebook, now a subproject of Hadoop
38 HiveQL: Example
[Figure: HiveQL example. Source: Saake/Schallehn: Data Management in the Cloud, 2011]
39 Pig
- A platform for analyzing large data sets
- Pig consists of two parts:
  - Pig Latin: a data processing language
  - Pig infrastructure: an evaluator for Pig Latin programs
    - Pig compiles Pig Latin into physical plans
    - Plans are executed over Hadoop
- Interface between the declarative style of SQL and the low-level, procedural style of MapReduce
40 Pig: Example
[Figure: Pig example. Source: Saake/Schallehn: Data Management in the Cloud, 2011]
41 MapReduce in Practice
- VLDB 2012: Chen, Alspaugh, Katz: Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads
- 7 Hadoop deployments (Cloudera = Hadoop with commercial services)
42 MapReduce in Practice (Cont.)
[Figure: workload study results. Source: Chen, Alspaugh, Katz. Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads; VLDB 2012]
43 MapReduce Trends
- Hadoop 2.0 with YARN (abstracts from MapReduce)
  [Figure: Hadoop 2.0 with YARN. Source: HortonWorks]
- Apache [project logo not preserved]
  - In-Memory Hadoop Performance!
  - Written in Scala
44 Application Development for NoSQL
- MapReduce: concept and frameworks
- State-of-the-art application development
  - With relational database systems: Object-Relational Mapping (ORM) frameworks and standards (Java Persistence API etc.)
  - Frameworks for Object-NoSQL mapping?!
45 Object-NoSQL Mapper: Architecture
Application issues one query:
    SELECT b.titel, b.text FROM blogpost b WHERE b.id = '042'
Object-NoSQL Mapper translates it for the target data store:
- MongoDB: db.blogposts.find( { _id : '042' }, { titel: 1, text: 1 } )
- Cassandra: SELECT titel, text FROM blogposts WHERE id = '042';
- HBase: get 'blogposts', '042', { COLUMN => ['blogpost_daten:titel', 'blogpost_daten:text'] }
Stored data: a document { "id" : "042", "titel" : ... } in MongoDB; rows with the columns id and titel in Cassandra and HBase
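The translation step of such a mapper can be sketched as three small functions that render one logical lookup in each store's native syntax, reusing the query forms from the blogpost example earlier in the deck. The function names are hypothetical, and building queries by string interpolation is done only for illustration; a real mapper would use the stores' client libraries and parameter binding.

```python
def to_mongodb(collection, key, fields):
    # Projection document: every requested field set to 1.
    projection = ", ".join(f"{field}: 1" for field in fields)
    return f"db.{collection}.find( {{ _id : '{key}' }}, {{ {projection} }} )"


def to_cql(table, key, fields):
    return f"SELECT {', '.join(fields)} FROM {table} WHERE id = '{key}';"


def to_hbase_shell(table, key, family, fields):
    # HBase addresses columns as family:qualifier.
    columns = ", ".join(f"'{family}:{field}'" for field in fields)
    return f"get '{table}', '{key}', {{ COLUMN => [{columns}] }}"


fields = ["title", "content"]
mongo_query = to_mongodb("blogposts", "042", fields)
cql_query = to_cql("blogposts", "042", fields)
hbase_query = to_hbase_shell("blogposts", "042", "blogpost_data", fields)
```

The HBase variant needs one extra argument, the column family, which shows that the mapper must carry store-specific mapping metadata in addition to the object model.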
46 Object-NoSQL Mapper: Market Overview
- Mappers for different programming languages: Java, .NET, Python, Ruby, ...
- Volatile market
- Main focus: Object-NoSQL mappers for Java
  - Standardization: Java Persistence API (JPA) with Java Persistence Query Language (JPQL)
- Categorization
  - Multi data store mappers
  - Single data store mappers
47 Java Multi Data Store Mapper
Support for document store, column family, and graph database systems in Java multi data store mappers
- Mappers compared: DataNucleus, EclipseLink, Hibernate OGM, Kundera, PlayORM, Spring Data
- Document stores: Couchbase, CouchDB, MongoDB
- Column family DBMS: Cassandra, HBase
- Graph DBMS: Neo4J
[Table: support matrix of mappers vs. data stores; cell entries not preserved]
48 Java Multi Data Store Mapper
Support for key-value database systems in Java multi data store mappers
- Mappers compared: DataNucleus, EclipseLink, Hibernate OGM, Kundera, PlayORM, Spring Data
- Key-value DBMS: Amazon DynamoDB, Apache Solr, Ehcache, Elasticsearch, GemFire, Infinispan, Oracle NoSQL, Redis
[Table: support matrix of mappers vs. data stores; cell entries not preserved]
49 Java Object-NoSQL Mapper: Supported Functionality
[Table: supported functionality per mapper, including single data store mappers; entries not preserved]
* Limited functionality (depending on the underlying NoSQL data store)
Source: Störl/Hauf/Klettke/Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015
50 Object-NoSQL Mapper: Query Language Support
Challenge: different query language interfaces
- Examples:
  - Most systems do not support any JOINs
  - Many systems do not offer aggregate functions, the LIKE operator, or the NOT operator, ...
Approaches
1. Offer only the particular subset of features that is implemented by all supported NoSQL data stores, i.e. the intersection of features
2. Distinguish by data store and offer only the set of features implemented by a particular NoSQL data store
3. Offer the same set of features for all supported NoSQL data stores, possibly complementing missing features by implementing them inside the Object-NoSQL Mapper
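The three approaches can be compared with plain set operations. The feature matrix below is invented for illustration (`store_a`, `store_b`, `store_c` are not real products and the operator sets are not measured data), and modelling approach 3 as the union of all features is a simplifying assumption.

```python
# Hypothetical feature matrix, invented for illustration only.
SUPPORTED = {
    "store_a": {"=", "AND", "LIKE"},
    "store_b": {"=", "AND", "NOT"},
    "store_c": {"=", "AND", "LIKE", "NOT", "COUNT"},
}


def approach_1(supported):
    # Intersection: only what every data store implements.
    return set.intersection(*supported.values())


def approach_2(supported, store):
    # Per store: exactly what the chosen data store implements.
    return supported[store]


def approach_3(supported):
    # Same features everywhere; the mapper emulates what a store lacks.
    offered = set.union(*supported.values())
    emulated = {store: offered - features
                for store, features in supported.items()}
    return offered, emulated


common = approach_1(SUPPORTED)
offered, emulated = approach_3(SUPPORTED)
```

The `emulated` sets make the trade-off visible: the larger they are, the more work moves from the data store into the mapper.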
51 Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Drawback: restricted portability
- Systems: Hibernate OGM, Kundera, EclipseLink
- Example: JPQL operators (selection) in Kundera
52 Object-NoSQL Mapper: Query Language Support
Approach 2: NoSQL data store specific support of JPQL operators
- Extension: use third-party libraries to offer more functionality for some, but not all, supported NoSQL data stores
- Systems: Hibernate OGM (Hibernate Search), Kundera; each with Apache Lucene
[Figure: Application, Object-NoSQL Mapper, NoSQL DBMS, search engine index]
53 Object-NoSQL Mapper: Query Language Support
Approach 3: Offer the same set of features for all supported NoSQL data stores
- Complementing missing features by implementing them inside the Object-NoSQL Mapper
- Benefit: portability
- Drawback: performance
- Systems: DataNucleus, Hibernate OGM (announced)
[Figure: Application, Object-NoSQL Mapper, NoSQL DBMS]
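Why approach 3 costs performance can be seen in a minimal sketch of a mapper-side NOT operator. `find_not_equal` and `fetch_all` are hypothetical names; the sketch assumes the data store can only hand back all documents, and it treats a document without the field as "not equal", which is a choice made here rather than a rule from the source.

```python
def find_not_equal(fetch_all, field, value):
    """Emulate "WHERE field <> value" inside the mapper.

    Every document is transferred from the data store and filtered
    client-side, which is the performance drawback of approach 3.
    """
    return [doc for doc in fetch_all() if doc.get(field) != value]


# Stand-in for a data store that lacks a NOT operator.
stored_blogposts = [
    {"id": "041", "author": "Torben"},
    {"id": "042", "author": "Mendel"},
]
not_by_torben = find_not_equal(lambda: stored_blogposts, "author", "Torben")
```

The application code stays portable because it never sees whether the filter ran in the store or in the mapper.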
54 Object-NoSQL Mapper: Query Language Support
Outlook: combination of approaches 2 and 3
- Systems: Hibernate OGM (announced)
[Figure: Application, Object-NoSQL Mapper, NoSQL DBMS, search engine index]
55 Conclusion: Java Object-NoSQL Mapper
- Vendor independence / portability
  - Standardized query language (JPQL)
  - Support for different NoSQL data stores
  - Supported query operators often depend on the capabilities of the underlying NoSQL data stores
- Performance (as of end of 2014)
  - In reading data, there is only a small gap between native access and the Object-NoSQL mappers for the majority of the evaluated products
  - Yet in writing, object mappers introduce a significant overhead
- Further reading: U. Störl, Th. Hauf, M. Klettke and S. Scherzinger: Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015
NOSQL DATABASE SYSTEMS
NOSQL DATABASE SYSTEMS Big Data Technologies: NoSQL DBMS - SoSe 2015 1 Categorization NoSQL Data Model Storage Layout Query Models Solution Architectures NoSQL Database Systems Data Modeling id ti Application
Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015
Big Data Technologies Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015 Situation: Bigger and Bigger Volumes of Data Big Data Use Cases Log Analytics (Web Logs, Sensor
Cloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
NoSQL Databases. Nikos Parlavantzas
!!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!
Big Data Management. Big Data Management. (BDM) Autumn 2013. Povl Koch November 11, 2013 10-11-2013 1
Big Data Management Big Data Management (BDM) Autumn 2013 Povl Koch November 11, 2013 10-11-2013 1 Overview Today s program 1. Little more practical details about this course 2. Recap from last time (Google
Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05
Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Preparing Your Data For Cloud
Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Hadoop WordCount Explained! IT332 Distributed Systems
Hadoop WordCount Explained! IT332 Distributed Systems Typical problem solved by MapReduce Read a lot of data Map: extract something you care about from each record Shuffle and Sort Reduce: aggregate, summarize,
Structured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13
Big Data Analytics with MapReduce VL Implementierung von Datenbanksystemen 05-Feb-13 Astrid Rheinländer Wissensmanagement in der Bioinformatik What is Big Data? collection of data sets so large and complex
CS54100: Database Systems
CS54100: Database Systems Cloud Databases: The Next Post- Relational World 18 April 2012 Prof. Chris Clifton Beyond RDBMS The Relational Model is too limiting! Simple data model doesn t capture semantics
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team [email protected]
Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team [email protected] Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since Feb
HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367
HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive
Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)
Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we
NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect
How To Write A Database Program
SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
HPCHadoop: MapReduce on Cray X-series
HPCHadoop: MapReduce on Cray X-series Scott Michael Research Analytics Indiana University Cray User Group Meeting May 7, 2014 1 Outline Motivation & Design of HPCHadoop HPCHadoop demo Benchmarking Methodology
Internals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
Comparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
Introduction to MapReduce and Hadoop
Introduction to MapReduce and Hadoop Jie Tao Karlsruhe Institute of Technology [email protected] Die Kooperation von Why Map/Reduce? Massive data Can not be stored on a single machine Takes too long to process
NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE <[email protected]> @GEOFFLANE
NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF LANE @GEOFFLANE WHAT IS NOSQL? NON-RELATIONAL DATA STORAGE USUALLY SCHEMA-FREE ACCESS DATA WITHOUT SQL (THUS... NOSQL) WIDE-COLUMN / TABULAR
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding
Hadoop: Understanding the Big Data Processing Method
Hadoop: Understanding the Big Data Processing Method Deepak Chandra Upreti 1, Pawan Sharma 2, Dr. Yaduvir Singh 3 1 PG Student, Department of Computer Science & Engineering, Ideal Institute of Technology
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
