Sherpa: Cloud Computing of the Third Kind
- April Mosley
- 8 years ago
Transcription
1 Sherpa: Cloud Computing of the Third Kind. Raghu Ramakrishnan, Yahoo! and Platform Engineering Team
2 What's in a Name? Data Intensive Super Scalable Computing; Grid Computing; Super Computing; Cloud Computing; Parallel Database Management Systems; Distributed Database Management Systems. These vary across: workload, programming model, ownership model, architectural trade-offs.
3 Cloud Computing: Computing as a Service. Packaged Software; Cloud Computing. CPU Intensive; Data Intensive. High-throughput (e.g., Condor); Transactional Storage & Serving (e.g., PNUTS, S3, SSDS, UDB); Analytic (e.g., SSDS, Hadoop).
4 Trivia Question: What's the world's most widely used parallel programming language?
5 Why Not Use an RDBMS for Analytics? RDBMS provides too much: ACID transactions; complex query language; lots and lots of knobs to turn. RDBMS provides too little: lots of optimization and tuning possible for analytics (e.g., column stores, bit-map indexes); flexible programming model (e.g., Group By vs. Map-Reduce; multi-dimensional OLAP). But many good ideas to borrow! Declarative language; parallelization and optimization techniques; value of data consistency.
6 Why Not Use an RDBMS for OLTP? RDBMS provides too much: ACID transactions; complex query language; lots and lots of knobs to turn. RDBMS provides too little: lack of (cost-effective) scalability and availability; not enough schema/data type flexibility. RDBMS and Sherpa aim for different parts of the space. RDBMS: heavyweight, strongly consistent OLTP. Sherpa: lightweight but massive scale, relaxed consistency OLTP.
7 I want a big, virtual database. "What I want is a robust, high performance virtual relational database that runs transparently over a cluster, nodes dropping in and out of service at will, read-write replication and data migration all done automatically. I want to be able to install a database on a server cloud and use it like it was all running on one machine." -- Greg Linden's blog, 2006. We're building a hosted version of such a system.
8 An Example Web App: heavy use of simple database operations. Updates: uploads; tags (a photo tagged as flower). Queries: Your Photos; Photos tagged as flower; Friend activity (Sonja uploaded, Brandon tagged a photo).
9 Why Hosted? Simple API; rapid application development; on-demand scaling; DBA functions amortized across applications.
10 Rapid Application Development. What does it take to get the Next Great Thing off the ground? Now: set up multiple replicas of a clustered data store; set up a system for indexing; set up a system for caching; set up auxiliary DBMS instances for reporting, etc.; set up the feeds and messaging between them; write the application logic. Fairly complex system at first line of new code. Our vision: write the application logic; use a hosted infrastructure to store and query your data. Or, as Joshua Schachter puts it: "The next cool thing shouldn't take a team of 30, it should be three guys, PHP and a long weekend."
11 Implications. Data management as a service: scientists and others who've resisted (installing, maintaining, and) using DBMSs will find it much easier to reap the benefits; data centers and computing centers will come into vogue again. The Web is becoming open (e.g., OpenSocial, OpenID): hosted back-ends and RAD tools will make Web application development accessible to all; ideas will be the most valuable currency, not the wherewithal to build complex systems. Paradigm shifts possible for how we do research in many fields: build applications that embed your algorithms and test them directly in the field; computer scientists can interact directly with users (ironically, this would still be a breakthrough of sorts after four decades!); many other disciplines (e.g., sociology, microeconomics) can design and conduct online experiments involving unprecedented numbers of participants.
12 PNUTS: DB in the Cloud. Parallel database; geographic replication; indexes and views; structured, flexible schema; hosted, managed infrastructure. Example: CREATE TABLE Parts (ID VARCHAR, StockNumber INT, Status VARCHAR). [Figure: three copies of a table with keys A, B, C, D, E, F and values E, W, W, E, C, E]
13 Sherpa Data Services. Applications. PNUTS services: query planning and execution; index maintenance. YCA: authorization. Distributed infrastructure for tabular data: data partitioning; update consistency; replication. YDOT FS: ordered tables. YDHT FS: hash tables. YMB: pub/sub messaging. Zookeeper: consistency service.
14 Guiding Principles for PNUTS. Reliable and robust storage: replication for fault tolerance; predictable consistency guarantees. Simple to use: simple operations set; minimal client configuration; service-level authentication; flexible schemas. Highly scalable / performant: partitioning data over many machines; horizontal scaling at every level; data is local to its usage; predictable performance via quality of service levels; predicates evaluated on back end; cheaper consistency guarantees than full ACID. Multiple rich access methods: hash and ordered table types; system-maintained secondary indexes; optimization for complex access patterns. Rapid provisioning of new storage: simple, automated cluster growth; cheap table creation; pay as you grow, grow big as you need. Operationally cheap: automated failover; automated load balancing; no single points of failure; hosted platform.
15 Data Model and Retrieval. YDOT/YDHT data model: key-value dictionary; the value can be packed with multiple attributes. YDHT operations (hash table calls): Get; Set (insert and update); Remove; Scan. YDOT: YDHT + ordered ranges. PNUTS data model: relational tables with flexible schema; typed, declared attributes; fast addition of new attributes. PNUTS operations (PNUTS query language): point lookup; range queries; Insert/Update/Remove; complex predicates; ordering; Top-K. Primary API is web services (JSON over HTTP). Client libraries for various languages (PHP, C++, Java, ...).
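The difference between the two storage layers on this slide can be sketched in a few lines. This is only a toy, in-memory illustration: the class names `HashTableStore` and `OrderedTableStore` and the method `range_scan` are hypothetical and are not the YDHT or YDOT API, which the slides do not spell out.

```python
import bisect


class HashTableStore:
    """Toy stand-in for a YDHT-style table: get, set, remove, scan."""

    def __init__(self):
        self._records = {}

    def get(self, key):
        return self._records.get(key)

    def set(self, key, value):
        # One call covers both insert and update, as on the slide.
        self._records[key] = value

    def remove(self, key):
        self._records.pop(key, None)

    def scan(self):
        # A hash table promises no useful key order.
        return list(self._records.items())


class OrderedTableStore(HashTableStore):
    """Toy stand-in for a YDOT-style table: same calls plus ordered ranges."""

    def __init__(self):
        super().__init__()
        self._sorted_keys = []

    def set(self, key, value):
        if key not in self._records:
            bisect.insort(self._sorted_keys, key)
        super().set(key, value)

    def remove(self, key):
        if key in self._records:
            del self._sorted_keys[bisect.bisect_left(self._sorted_keys, key)]
        super().remove(key)

    def range_scan(self, low, high):
        """Return records with low <= key < high, in key order."""
        start = bisect.bisect_left(self._sorted_keys, low)
        end = bisect.bisect_left(self._sorted_keys, high)
        return [(k, self._records[k]) for k in self._sorted_keys[start:end]]
```

With the fruit keys used later in the deck, `range_scan("B", "H")` on a table holding Apple, Banana, Grape, and Lime would return the Banana and Grape records in that order, which is the kind of query a pure hash table cannot answer without a full scan.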
16 YDHT. Scalable distributed record store. Optimized for small reads and writes. Focus on ease of operations, multi-region redundancy, organic scalability. Storage as a service. Components: clients, tablet controller, routers, storage servers.
17 Ways to use YDHT. As a primary store (APP, YDHT). As a materialized view/cache (APP, YDHT, primary store). As part of PNUTS (APP, PNUTS, YDHT).
18 Data Concepts: YDHT. Labeled in the figure: table, primary key, record, tablet, fields.
Primary key | Value
Grape | Grapes are good to eat
Lime | Limes are green
Apple | Apple is wisdom
Strawberry | Strawberry shortcake
Orange | Arrgh! Don't get scurvy!
Avocado | But at what price?
Lemon | How much did you pay for this lemon?
Tomato | Is this a vegetable?
Banana | The perfect fruit
Kiwi | New Zealand
19 Data Concepts: YDOT. Ordered by primary key. Tablets contain clustered ranges.
Primary key | Value
Apple | Apple is wisdom
Avocado | But at what price?
Banana | The perfect fruit
Grape | Grapes are good to eat
Kiwi | New Zealand
Lemon | How much did you pay for this lemon?
Lime | Limes are green
Orange | Arrgh! Don't get scurvy!
Strawberry | Strawberry shortcake
Tomato | Is this a vegetable?
20 YDOT: Ordered Table Store. YDOT provides clustered, ordered retrieval of records. [Figure: a router holds the interval map (storage unit 1, Canteloupe, storage unit 3, Lime, storage unit 2, Strawberry, storage unit 1) and directs lookups such as Grapefruit, Pear, and Lime to the storage unit whose tablet covers the key. Tablets shown: Apple, Avocado, Banana, Blueberry; Canteloupe, Grape, Kiwi, Lemon; Lime, Mango, Orange; Strawberry, Tomato, Watermelon.]
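The routing step in this figure amounts to a binary search over sorted tablet boundaries. The sketch below is an assumption about how such a lookup could work, not the actual router code; the class name `RangeRouter` is hypothetical, and the boundary-to-unit mapping is read off the figure as best the transcript allows.

```python
import bisect


class RangeRouter:
    """Maps a key to the storage unit whose tablet covers that key."""

    def __init__(self, boundaries, units):
        # boundaries[i] is the first key of tablet i + 1; tablet 0 starts at
        # the smallest possible key, so there is one more unit than boundary.
        if len(units) != len(boundaries) + 1:
            raise ValueError("need exactly len(boundaries) + 1 units")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self._boundaries = list(boundaries)
        self._units = list(units)

    def route(self, key):
        # bisect_right sends a key equal to a boundary to the tablet
        # that starts at that boundary.
        return self._units[bisect.bisect_right(self._boundaries, key)]


# Interval map as it appears in the figure (keys spelled as on the slide).
router = RangeRouter(
    boundaries=["Canteloupe", "Lime", "Strawberry"],
    units=["storage unit 1", "storage unit 3", "storage unit 2", "storage unit 1"],
)
```

Under this mapping, `router.route("Grapefruit")` lands on storage unit 3 and `router.route("Pear")` on storage unit 2. Because the same storage unit can appear more than once, moving a tablet only requires changing one entry in the map.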
21 Data Concepts: PNUTS. Schema: declared, typed fields. Retains tablet structure of YDHT/YDOT.
Name | Description | Price
Apple | Apple is wisdom | $1
Avocado | But at what price? | $3
Banana | The perfect fruit | $2
Grape | Grapes are good to eat | $12
Kiwi | New Zealand | $8
Lemon | How much did you pay for this lemon? | $1
Lime | Limes are green | $9
Orange | Arrgh! Don't get scurvy! | $2
Strawberry | Strawberry shortcake | $900
Tomato | Is this a vegetable? | $
22 Flexible Schema. Primary table:
Posted date | Listing id | Item | Price
6/1/ | | Couch | $570
6/1/ | | Bike | $86
6/3/ | | Car | $1123
6/5/ | | Lamp | $15
Additional attributes shown for some rows: Color (Red); Condition (Good, Fair).
23 Asynchronous Replication
24 Mastering. [Figure: three replicas of a table with keys A, B, C, D, E, F and values E, W, W, E, C, E; one replica is labeled tablet master]
25 Basic Consistency Model. Goal: make it easier for applications to reason about updates and cope with asynchrony; an alternative to transactions in an asynchronous world. What happens to a record with primary key "Brian"? [Timeline: Generation 1: record inserted (v. 1), update (v. 2), update (v. 3), delete. Generation 2: record inserted (v. 1), update (v. 2), update (v. 3), update (v. 4), delete. Generation 3: record inserted (v. 1), delete.] Guarantees: every reader will always see some consistent, but possibly stale, version. Readers can request a more up-to-date version, but may pay extra latency. Special case: critical read (writers/readers see their own writes). Writers can verify that the record is still at the version they expect.
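The guarantees on this slide can be made concrete with a toy model of one record that has a master copy and a single lagging replica. Everything here is an illustrative assumption: the class `VersionedRecord`, its method names, and the explicit `replicate()` call are hypothetical and only mimic the behaviour the slide describes.

```python
class StaleVersionError(Exception):
    """Raised when a writer's expected version no longer matches the record."""


class VersionedRecord:
    """One record at its master, plus one lagging replica (toy model)."""

    def __init__(self, value):
        self._versions = [value]   # _versions[i] holds version i + 1
        self._replica_at = 1       # newest version the replica has applied

    def write(self, value):
        """Apply an update at the master and return the new version number."""
        self._versions.append(value)
        return len(self._versions)

    def test_and_set_write(self, expected_version, value):
        """Write only if the record is still at expected_version."""
        current = len(self._versions)
        if expected_version != current:
            raise StaleVersionError(
                f"expected v. {expected_version}, record is at v. {current}")
        return self.write(value)

    def replicate(self):
        """Asynchronous replication catching up (triggered by hand here)."""
        self._replica_at = len(self._versions)

    def read_any(self):
        """Cheap read from the replica: consistent, but possibly stale."""
        return self._replica_at, self._versions[self._replica_at - 1]

    def read_latest(self):
        """Read the newest version from the master (the extra-latency path)."""
        return len(self._versions), self._versions[-1]

    def read_critical(self, required_version):
        """Return a version at least as new as required_version."""
        if self._replica_at >= required_version:
            return self.read_any()
        return self.read_latest()
```

A writer that keeps the version number returned by `write` and passes it to `read_critical` always sees its own write, even while the replica is behind. A writer that passes an outdated version to `test_and_set_write` gets an error instead of silently overwriting a newer update.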
26 Distribution. Distribution for parallelism; data shuffling for load balancing. [Figure: listing rows keyed by posted date (Couch $570, Car $1123, Bike $86, Chair $10, Lamp $19, Bike $56, Scooter $18, Hammer $8000) spread across several servers]
27 Tablet Splitting and Balancing. Each storage unit has many tablets. Tablets may grow over time. Overfull tablets split. A storage unit may become a hotspot. Shed load by moving tablets to other servers.
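The two mechanisms named on this slide, splitting an overfull tablet and shedding load from a hot storage unit, can be sketched as follows. The functions `split_tablet` and `shed_load`, the record-count threshold, and the use of record count as a load measure are all simplifying assumptions; the slides do not say how the real system measures load or picks split points.

```python
def split_tablet(tablet, max_records):
    """Split a key-sorted list of (key, value) records in half if overfull."""
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def shed_load(assignment):
    """Move one tablet from the most loaded storage unit to the least loaded.

    `assignment` maps a storage unit name to its list of tablets and is
    modified in place. Returns the moved tablet, or None if no move helps.
    """
    def load(unit):
        return sum(len(tablet) for tablet in assignment[unit])

    hot = max(assignment, key=load)
    cold = min(assignment, key=load)
    if hot == cold or not assignment[hot]:
        return None
    # Move the smallest tablet, and only if it narrows the load gap.
    tablet = min(assignment[hot], key=len)
    if load(hot) - load(cold) <= len(tablet):
        return None
    assignment[hot].remove(tablet)
    assignment[cold].append(tablet)
    return tablet
```

Calling `shed_load` repeatedly until it returns `None` gives a crude balancer. A production system would presumably also weigh request rates, not just tablet sizes.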
28 Architecture. Data-path components; each can be scaled horizontally. Components shown: clients, tablet map, load balancer, server monitor, tablet controller, routers, WS API, YMB, SU API, storage units, query processing; Cluster 1 and Cluster 2.
29 Yahoo! Message Broker (YMB). Pub/sub based on reliable logging: topic-based; persistent subscriptions; multi-region presence. Guarantees in the presence of at most one YMB machine failure: published messages will be delivered on live subscriptions system-wide; messages published in one region will be delivered to all subscribers in the order they were published (partial order); published messages remain available for re-delivery until the subscriber calls consume(). If there are two machine failures: subscribers will be notified of a broken subscription, since messages may have been lost. Uses in YDHT/PNUTS: reliably replicate data and updates between regions; reliably communicate coordination/synchronization messages between distributed actors; reliably log to-do actions for individual actors.
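The re-delivery guarantee (messages stay available until the subscriber calls consume()) is what lets a log-based broker drive replication. Below is a minimal single-process sketch of that idea. The class `TopicBroker` and all of its methods except the name `consume` are hypothetical, and nothing here models multiple regions or machine failures.

```python
class TopicBroker:
    """Toy topic-based pub/sub over an append-only log per topic."""

    def __init__(self):
        self._log = {}        # topic -> messages in publish order
        self._consumed = {}   # (topic, subscriber) -> messages consumed so far

    def subscribe(self, topic, subscriber):
        log = self._log.setdefault(topic, [])
        # A new subscription starts at the current end of the topic log.
        self._consumed[(topic, subscriber)] = len(log)

    def publish(self, topic, message):
        self._log.setdefault(topic, []).append(message)

    def pending(self, topic, subscriber):
        """Messages delivered but not yet consumed, in publish order."""
        start = self._consumed[(topic, subscriber)]
        return list(self._log[topic][start:])

    def consume(self, topic, subscriber, count=1):
        """Acknowledge the oldest `count` pending messages."""
        key = (topic, subscriber)
        available = len(self._log[topic]) - self._consumed[key]
        self._consumed[key] += min(count, available)
```

A replica that applies an update and crashes before calling `consume` will see the same message again from `pending` after it restarts, so updates are not lost; the cost is that applying an update must be safe to repeat.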
30 Quality of Service. Hosted platform supporting multiple applications, and eventually multi-tenancy! Inter-application isolation: applications run on leased servers; performance is as good as those servers give you; unaffected by other applications; some shared infrastructure, overprovisioned to ensure performance agreements. Intra-application isolation: how to share my data without hurting my app's performance? Gold versus best-effort access; best-effort may be interrupted to serve gold requests.
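Gold versus best-effort access can be approximated with a two-tier priority queue. This sketch only orders waiting requests; it does not interrupt a best-effort request that is already running, which the slide says the real system may do. The names `RequestQueue`, `GOLD`, and `BEST_EFFORT` are hypothetical.

```python
import heapq
import itertools

GOLD, BEST_EFFORT = 0, 1   # lower number is served first


class RequestQueue:
    """Serves all waiting gold requests before any best-effort request."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order within a tier

    def submit(self, request, tier):
        heapq.heappush(self._heap, (tier, next(self._seq), request))

    def next_request(self):
        """Pop the next request to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        _tier, _seq, request = heapq.heappop(self._heap)
        return request
```

If a best-effort scan is submitted first and two gold lookups arrive afterwards, `next_request` hands out both lookups before the scan.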
31 BigTable. BigTable overview: rows and columns abstraction with flexible schemas, data versioning, and range scans; built on top of GFS. Things BigTable emphasizes that we don't (for now, anyway): keeping multiple versions; tight integration with MapReduce. Things we emphasize that BigTable doesn't: asynchrony; geographic replication; indexing.
32 Dynamo. Dynamo overview: highly write-available data store; uses gossip and eventual consistency (can write anywhere, and eventually the update will propagate to all replicas). PNUTS versus Dynamo: Dynamo is a hash table, PNUTS is both hashed and ordered; the eventual consistency model exposes dirty data, while PNUTS can operate in high availability or high consistency mode; gossip is not tuned for geographic replication; no record structure or indexes in Dynamo.
33 Summary. Hosted data management is a new frontier. Beyond the issues we discussed, many novel aspects arise because of hosting (e.g., multi-tenancy). Paradigm shift that goes beyond the technology (e.g., new kinds of usage, new business models). Formulas for new research problems: old research problem + fine-grained asynchrony; old research problem + hosted service model. Formulas for solutions? None so far, but lots of good ideas in the old solutions!
More informationArchitectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
More informationDatabase Scalability {Patterns} / Robert Treat
Database Scalability {Patterns} / Robert Treat robert treat omniti postgres oracle - mysql mssql - sqlite - nosql What are Database Scalability Patterns? Part Design Patterns Part Application Life-Cycle
More informationIntroduction to NOSQL
Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo
More informationDistributed Systems. Tutorial 12 Cassandra
Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationTips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier
Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier Simon Law TimesTen Product Manager, Oracle Meet The Experts: Andy Yao TimesTen Product Manager, Oracle Gagan Singh Senior
More informationHypertable Architecture Overview
WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for
More informationUsing RDBMS, NoSQL or Hadoop?
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
More informationNoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
More informationA survey of big data architectures for handling massive data
CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationDomain driven design, NoSQL and multi-model databases
Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra
More informationCitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationSQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation
SQL Server 2014 New Features/In- Memory Store Juergen Thomas Microsoft Corporation AGENDA 1. SQL Server 2014 what and when 2. SQL Server 2014 In-Memory 3. SQL Server 2014 in IaaS scenarios 2 SQL Server
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationTutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
More informationApache Hadoop FileSystem and its Usage in Facebook
Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs
More informationORACLE COHERENCE 12CR2
ORACLE COHERENCE 12CR2 KEY FEATURES AND BENEFITS ORACLE COHERENCE IS THE #1 IN-MEMORY DATA GRID. KEY FEATURES Fault-tolerant in-memory distributed data caching and processing Persistence for fast recovery
More informationHigh Availability Using MySQL in the Cloud:
High Availability Using MySQL in the Cloud: Today, Tomorrow and Keys to Success Jason Stamper, Analyst, 451 Research Michael Coburn, Senior Architect, Percona June 10, 2015 Scaling MySQL: no longer a nice-
More informationThe evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
More informationEvaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
More information