Scaling To 1 Billion Hits A Day. Chander Dhall [email protected]


1 Scaling To 1 Billion Hits A Day Chander Dhall [email protected]

2 About Me: Microsoft MVP; Tech Ed Speaker; ASP.NET Insider; Web API Advisor; Pluralsight Author; Dev Chair, Dev Connections

3 About Me: Conference Organizer, jssaturday; Leader, NodeLA user group; Leader, .NET user group at UTDallas; Owner, Chander Dhall, Inc.; Conference Organizer, MVPMIX.com; Chander Tech Podcast

4 Free Resharper

5 Why? Amazon claims that just an extra 1/10th of a second on their response times will cost them 1% in sales. Google: a ½-second increase in latency caused traffic to drop by a fifth.

6 Theory of Scaling #devconnections

7 Practice of Scalability #devconnections

8 Agenda: Why is it important to scale? Creating a scalable solution (in incremental steps): propose an architecture; identify failures and bottlenecks; identify downtime; apply a better solution; repeat until we solve (in 10 steps). Then some bonus stuff (a better solution).

9 Unfortunate Solution Load Balancer S S S S S Services

10 Gilbert and Lynch white paper. Network 1, node A (write algorithm): { name : Chander, gender : m }. Network 2, node B (read algorithm): { name : Chander, gender : m }.

11 Happy path scenario (update message). Network 1, node A (write algorithm): { name : Dhall, gender : m }. Network 2, node B (read algorithm): { name : Chander, gender : m }.

12 Happy path scenario. Network 1, node A (write algorithm): { name : Dhall, gender : m }. Network 2, node B (read algorithm): { name : Dhall, gender : m }.

13 Network partitions (update message). Network 1, node A (write algorithm): { name : Dhall, gender : m }. Network 2, node B (read algorithm): { name : Chander, gender : m }.

14 CAP Theorem: Consistency, Availability, Partition Tolerance

15 Brewer's CAP Theorem: Consistency (or, more appropriately, Atomic); Availability; Partition Tolerance: "No set of failures less than total network failure is allowed to cause the system to respond incorrectly" (Gilbert & Lynch)
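The partition scenario on the preceding slides can be sketched in a few lines. This is a toy model, not the Gilbert and Lynch formalism: the `Replica` class, the `partitioned` flag, and the `prefer_consistency` switch are all illustrative assumptions. It shows that once the update message is lost, node B can either answer with stale data (available but not consistent) or refuse to answer (consistent but not available).

```python
class Replica:
    """A node holding one copy of the record (hypothetical helper)."""

    def __init__(self, name, record):
        self.name = name
        self.record = dict(record)


def write(primary, peer, update, partitioned):
    """Apply the update on the primary; replicate only if the link is up."""
    primary.record.update(update)
    if not partitioned:
        peer.record.update(update)
        return True   # the peer received the update message
    return False      # the update message was lost in the partition


def read(replica, in_sync, prefer_consistency):
    """Answer from local state, or refuse when the state may be stale."""
    if not in_sync and prefer_consistency:
        raise RuntimeError(f"{replica.name} unavailable: cannot confirm latest value")
    return replica.record["name"]


initial = {"name": "Chander", "gender": "m"}

# Happy path: the update message reaches B, so B reads the new value.
a, b = Replica("A", initial), Replica("B", initial)
synced = write(a, b, {"name": "Dhall"}, partitioned=False)
happy_read = read(b, synced, prefer_consistency=True)

# Network partition: B never sees the update.
a, b = Replica("A", initial), Replica("B", initial)
synced = write(a, b, {"name": "Dhall"}, partitioned=True)
stale_read = read(b, synced, prefer_consistency=False)  # available, but stale
```

Calling `read(b, synced, prefer_consistency=True)` in the partitioned case raises instead of returning, which is the "consistent but unavailable" choice.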

16 Just FYI: Consistency (in CAP theorem); Atomicity (in ACID); Consistency (in ACID) means any transaction will bring the database from one valid state to another.

17 Fallacies of Distributed Computing: The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator.

18 Fallacies of Distributed Computing: Transport cost is zero. The network is homogeneous.

19 Why is Scalability Important: Instant success, thanks to social networking. Twitter: 200 billion tweets per year. Facebook: 1.23 billion monthly active users. Billions of devices (desktops, tablets, mobile). Need: millions of hits with zero downtime.

20 Why is Scalability Important: "The website was working great UNTIL we launched." Instagram was down on the launch day.

21 The Variables. Scalability: number of users / sessions / transactions / operations the entire system can handle. Performance: optimal utilization of resources. Responsiveness: time taken per operation. Availability: probability of the application being available at any given point in time. Downtime Impact: the impact of a downtime of a server/service/resource (number of users, type of impact, etc.).

22 Major Factors: platform selection; hardware; application design; database/datastore structure and architecture; caching strategy; asynchronous processing; deployment process and architecture; monitoring mechanisms; and more.

23 Step 1: App Server & DB Server. [Diagram: App Server, Database Server]

24 Step 2: Vertical Scaling of App Server & DB Server. Throw more RAM and CPU. [Diagram: App Server, Database Server]

25 Step 2 - Vertical Scaling (or Scale Up): increasing the hardware resources without changing the number of nodes. Disadvantages: law of diminishing returns; downtime; increases downtime impact; incremental costs increase exponentially.

26 Step 3 Vertical Partitioning (Services). Introduction: deploying each service on a separate node. Advantages: increases availability (per app); easy to tune and optimize; reduces context switching; simple to implement. [Diagram: App Server, Db Server]

27 Step 3 Vertical Partitioning (Services). Disadvantages: sub-optimal resource utilization; may not increase overall availability; finite scalability. [Diagram: App Server, Db Server]
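Why splitting services onto separate nodes "may not increase overall availability" can be shown with a small calculation. The sketch below assumes node failures are independent and uses made-up availability figures; the function names are illustrative. When a request needs every node in a chain, the availabilities multiply, so the chain is less available than any single node. Only redundant copies of the same node push availability up.

```python
def serial_availability(*components):
    """Availability of a request path that needs every component to be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(availability, copies):
    """Availability when any one of several identical copies can serve."""
    return 1.0 - (1.0 - availability) ** copies


# Assumed figures: each node is up 99% of the time.
app_plus_db = serial_availability(0.99, 0.99)   # app server AND db server
two_app_servers = redundant_availability(0.99, 2)
```

With these assumed numbers the app-plus-database chain is available about 98.01% of the time, while two interchangeable app servers reach about 99.99%.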

28 Vertical Partitioning Distribute the responsibilities. Increased number of nodes. Each node (or cluster) performs separate Tasks Each node (or cluster) is different from the other

29 Step 4 Horizontal Scaling Load Balancer DB Server

30 Horizontal Scaling (Scale Out): replication of nodes; nodes perform the same tasks; nodes are identical.
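Because horizontally scaled nodes are identical, the load balancer is free to send any request to any node. A minimal round-robin sketch follows; the class and server names are hypothetical, and real load balancers also track health and connection counts.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands requests to identical servers in rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def route(self, request):
        # The request content is ignored: every node can serve every request.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
distribution = Counter(balancer.route(f"request-{i}") for i in range(9))
```

Nine requests across three servers land three on each.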

31 Sticky Sessions: subsequent requests from a user are sent to the original server. Asymmetrical load distribution. Downtime impact: loss of session data. [Diagram: User 1, Load Balancer]
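The asymmetrical load that sticky sessions cause can be illustrated with a toy balancer that pins each user to the first server they were given. The class, user names, and traffic mix are assumptions made for the example.

```python
from collections import Counter


class StickyBalancer:
    """Pins each user to one server for the lifetime of the session."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignment = {}   # user -> server
        self.next_index = 0

    def route(self, user):
        if user not in self.assignment:
            # New users are spread evenly; the skew comes from later traffic.
            server = self.servers[self.next_index % len(self.servers)]
            self.assignment[user] = server
            self.next_index += 1
        return self.assignment[user]


balancer = StickyBalancer(["app1", "app2"])
traffic = ["alice"] * 8 + ["bob"] * 2   # one heavy user, one light user
load = Counter(balancer.route(user) for user in traffic)
```

Users are split evenly, yet `app1` ends up with 8 of the 10 requests. If `app1` goes down, the session data it held for that user is gone as well.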

32 Central Session Store: the session store is a single point of failure. Session reads and writes generate disk + network I/O. [Diagram: Load Balancer, App Servers, Session Store]

33 Clustered Session Management: no single point of failure. Session reads are instantaneous. Session writes generate network I/O. An increase in the number of nodes increases network I/O exponentially. What happens when a user request arrives before inter-node communication has finished, or when inter-node communication fails? [Diagram: Load Balancer]
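A rough count of replication traffic shows why clustered sessions stop scaling. The model below is an assumption, not something stated on the slide: every session write on one node is sent to every other node, and each node handles the same number of writes. Under that model the message count grows with the square of the node count, which is already steep.

```python
def replication_messages(nodes, writes_per_node):
    """Messages needed when each write is copied to every other node."""
    return nodes * (nodes - 1) * writes_per_node


# Assumed workload: 100 session writes per node.
growth = {n: replication_messages(n, 100) for n in (2, 5, 10, 20)}
```

Going from 2 to 20 nodes multiplies the node count by 10 but the replication messages by 190 (200 versus 38,000).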

34 Recommendations: Use a scaled version of a Central Session Store (recommended). Use Clustered Session Management ONLY if you have a smaller number of app servers and fewer session writes. Don't use sticky sessions if you want to scale.
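The recommended central session store can be sketched as follows. This is an in-memory stand-in with hypothetical class names; a real deployment would put a networked, replicated store behind the same `load`/`save` interface. The point is that app servers keep no session state of their own, so any server can handle any request.

```python
import copy


class SessionStore:
    """Stand-in for a shared session store reachable by all app servers."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        return copy.deepcopy(self._sessions.get(session_id, {}))

    def save(self, session_id, session):
        self._sessions[session_id] = copy.deepcopy(session)


class AppServer:
    """Stateless app server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]


store = SessionStore()
server_a, server_b = AppServer("app1", store), AppServer("app2", store)
server_a.add_to_cart("s1", "book")
cart = server_b.add_to_cart("s1", "pen")   # a different server, same session
```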

35 Load Balanced App Server Cluster: Active-Active assumes that each LB is independently able to take up the load of the other. [Diagram: Users, two Load Balancers]

36 Step 5 Vertical Partitioning (Hardware) Load Balancer Load Balancer DB Server SAN

37 Step 5 Vertical Partitioning (Hardware). Advantages: allows scaling up the DB server; boosts performance of the DB server. Disadvantages: increases cost.

38 Step 6 Horizontal Scaling (DB). Introduction: increasing the number of DB nodes, referred to as scaling out the DB server. Options: Shared Nothing Cluster; Real Application Cluster (or Shared Storage Cluster).

39 Step 6 Horizontal Scaling (DB) Load Balancer Load Balancer DB Server SAN

40 Step 6 Horizontal Scaling (DB) Load Balancer Load Balancer DB Replica DB Server DB Server DB Server SAN

41 Step 7 Vertical / Horizontal Partitioning (DB). Introduction: increasing the number of DB clusters by dividing the data. Options: Vertical Partitioning (dividing tables / columns); Horizontal Partitioning (dividing by rows (value)).

42 Step 7 Vertical / Horizontal Partitioning (DB) Load Balancer Load Balancer DB Cluster DB Server DB Server DB Server SAN

43 Step 7 Vertical / Horizontal Partitioning (DB). Vertical Partitioning. App Cluster -> Db Cluster 1: Twitter Table, Facebook Table; Db Cluster 2: Users Table, Products Table.

44 Step 7 Vertical / Horizontal Partitioning (DB). Horizontal Partitioning. App Cluster -> 1st Million Users: Db Cluster 1 (Twitter Table, Facebook Table); 2nd Million Users: Db Cluster 2 (Twitter Table, Facebook Table).
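Horizontal partitioning by user range, as on the slide above, amounts to a small routing function. The cluster names and the assumption that users are numbered from 1 are illustrative.

```python
USERS_PER_CLUSTER = 1_000_000
CLUSTERS = ["db-cluster-1", "db-cluster-2"]   # hypothetical cluster names


def cluster_for_user(user_number):
    """Route a 1-based user number to the cluster holding that user's rows."""
    if user_number < 1:
        raise ValueError("user numbers start at 1")
    index = (user_number - 1) // USERS_PER_CLUSTER
    if index >= len(CLUSTERS):
        raise LookupError(f"no cluster provisioned for user {user_number}")
    return CLUSTERS[index]
```

Every cluster carries the same tables; only the rows differ. Adding the third million users means provisioning a third cluster and appending it to `CLUSTERS`.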

45 Step 7 Vertical / Horizontal Partitioning (DB) Load Balancer Load Balancer Hash Map SAN DB Cluster DB Cluster DB DB DB DB DB DB
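The "Hash Map" box in this diagram decides which DB cluster owns a given key. One common way to build such a lookup, shown here as an assumption rather than the deck's actual design, is to hash the key and take the result modulo the number of clusters. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings changes between processes.

```python
import hashlib


def shard_for_key(key, shards):
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


shards = ["db-cluster-1", "db-cluster-2"]   # hypothetical names
owner = shard_for_key("user:csdhall", shards)
```

The weakness of plain modulo hashing is that changing the number of shards remaps most keys, which is one reason re-sharding is disruptive.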

46 Step 8 Separating Sets Global Redirector Global Look up Hash Map Load Balancer Load Balancer Load Balancer Load Balancer Hash Map Hash Map DB Cluster DB Cluster DB Cluster DB Cluster DB DB DB DB DB DB DB DB DB DB DB DB Set 1-10 Million Users Set Million Users

47 Step 9 Caching. Add caches within the App Server: object cache; session cache; API cache; page cache. Software: Memcached; Redis; Azure Cache (AppFabric).
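An in-process object cache of the kind listed above can be sketched as a cache-aside helper with a time-to-live. The class name, the `loader` callback, and the injectable `clock` are illustrative assumptions; products such as Memcached or Redis provide the same idea across the network.

```python
import time


class TTLCache:
    """Cache-aside helper: return a fresh cached value or reload it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                 # cache hit, still fresh
        value = loader(key)                 # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value
```

A usage example: `cache.get_or_load("user:1", load_user_from_db)`, where `load_user_from_db` is whatever function performs the expensive lookup.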

48 Step 10 HTTP Accelerator. A good HTTP accelerator / reverse proxy performs the following: redirects static content requests to a lighter HTTP server (lighttpd); caches content based on rules; uses async non-blocking IO; maintains a limited pool of keep-alive connections to the App Server; intelligent load balancing. Solutions: Nginx (HTTP / IMAP); Perlbal; hardware accelerators plus LBs.

49 More Important Stuff: CDNs. IP anycasting. Async non-blocking IO (for all network servers). If possible, async non-blocking IO for disk. Incorporate a multi-layer caching strategy where required: L1 cache in-process with the App Server; L2 cache across the network boundary; L3 cache on disk. Grid computing.
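The L1/L2/L3 caching strategy can be sketched as a lookup that walks the tiers from fastest to slowest. Plain dictionaries stand in for the in-process, network, and on-disk caches here, and the function name is an assumption.

```python
def tiered_get(key, tiers, load_from_source):
    """Look up key in caches ordered fastest first.

    Returns (value, depth), where depth is the index of the tier that
    answered, or None if the value had to come from the source.
    """
    for depth, tier in enumerate(tiers):
        if key in tier:
            value = tier[key]
            for faster in tiers[:depth]:
                faster[key] = value        # promote into the faster tiers
            return value, depth
    value = load_from_source(key)
    for tier in tiers:
        tier[key] = value                  # populate every tier on a full miss
    return value, None


l1, l2, l3 = {}, {"page:/home": "<html>home</html>"}, {}
result = tiered_get("page:/home", [l1, l2, l3], lambda key: "rendered " + key)
```

Here the value is found in the second tier and copied into the first, so the next request for the same key is served in-process.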

50 NoSql Vs Relational. [Chart with axes "Scalability and Performance" and "Depth of Functionality"; plotted: Memcached, Key Value, Document Databases, Relational Databases]

51 NoSql vs Relational. No joins: do you need them though? Transactions. RDBMS great for concurrency, integrity or data type validity.

52 Relational -> NoSql. Ever increasing users. Scalability needs. Highly structured data to structured, semi-structured and unstructured data. Advent of high speed data networking. Distributed computing. Cheap and plentiful memory.

53 Relational -> NoSql

54 Scaling RDBMS RDBMS sharding Highly disruptive to re-shard. Lose benefits of relational model. Create and maintain schema on every server.

55 Scaling RDBMS Denormalizing Why use a RDBMS?

56 Scaling RDBMS Distributed caching for RDBMS (eg: memcached) Speed up reads only. Cold cache thrash. Management costs.

57 Relational -> NoSql Schemaless. Auto-sharding. Distributed querying. Integrated caching.

58 NoSQL Types: Key Value; Ordered Key Value; Wide Column Store; Document Store / Full Text Search; Graph DBs; Object DBs.

59 Key-Value Store. Pros: Simple. Programmer friendly. Powerful. Fast. Cons: Key range support not good. Aggregation support lacking. [Diagram: keys mapped to values]

60 Ordered Key-Value Store. Pros: Processes key ranges. More powerful. Cons: No framework for value modeling. [Diagram: keys mapped to values]
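What an ordered key-value store adds over a plain one is the ability to scan a range of keys. A minimal in-memory sketch using the standard `bisect` module follows; the class name and key format are assumptions.

```python
import bisect


class OrderedKV:
    """Key-value store that keeps keys sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKV()
for key in ["2014-03-02", "2014-01-15", "2014-02-20", "2014-02-01"]:
    store.put(key, f"event on {key}")
february = store.range("2014-02-01", "2014-03-01")
```

Values are still opaque to the store, which is the "no framework for value modeling" drawback.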

61 Big Table. Pros: Model values as maps of maps of maps. Cons: Not appropriate for schemas of arbitrary complexity. [Diagram: key/value pairs]

62 Big Table. Pros: Model values as maps of maps of maps. Cons: Not appropriate for schemas of arbitrary complexity.

63 Big table Key Column family Key Column family
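"Maps of maps of maps" can be pictured as row key -> column family -> column -> value. The nested dictionaries below are only a picture of that data model, with made-up row and column names; they say nothing about how a wide-column store lays data out on disk.

```python
from collections import defaultdict


def make_table():
    """Row key -> column family -> column -> value."""
    return defaultdict(lambda: defaultdict(dict))


table = make_table()
table["user:1"]["profile"]["name"] = "Chander"
table["user:1"]["profile"]["gender"] = "m"
table["user:1"]["address"]["city"] = "los angeles"

families = sorted(table["user:1"])          # column families of one row
profile = dict(table["user:1"]["profile"])  # all columns in one family
```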

64 Document/Full-Text. Pros: Collection of documents which contain key-value collections. Natural data modeling. Programmer friendly. Web based. Mostly REST/JSON friendly.

65 Document/Full text search databases. [Diagram: keys mapped to values] Person : { name : Chander Dhall, address : { city : los angeles, state : CA, zip : } }

66 Graph databases Key Key Key Key

67 Step 12- Finalizing Caching Load Balancer Load Balancer Hash Map Offline Processing DB Cluster DB Cluster Master DB DB DB DB DB DB Master Slave Slave Slave Slave No Sql SAN Search Db

68 NoSql Paradigm - Denormalization. Data duplication and denormalization are first-class citizens. Increases total data volume. Simplifies query processing, esp. in a distributed environment.

69 NoSql Paradigm - Atomic Aggregates. Relational tables: Checking (Id, Min bal); Account (Id, Account No.); Savings (Id, Interest rate). Aggregates: Account { Type : Checking, Id : chk123, Min Bal : 10000 }; Account { Type : Savings, Id : sav123, Interest Rate : 5% }

70 NoSql Paradigm - No joins. SQL joins happen at query time, hence a performance penalty. Handled in the application instead.

71 NoSql Paradigm - Enumerable keys. Sequential Ids for composite keys, e.g. DeptId_employeeId. Group into buckets sorted by timestamp, day, week, etc.
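The enumerable-key idea can be sketched with two small key builders. The zero-padding width and the bucket name format are assumptions added so that string ordering matches numeric and chronological ordering.

```python
from datetime import datetime


def composite_key(dept_id, employee_id, width=6):
    """Build a DeptId_employeeId key; padding keeps string order numeric."""
    return f"{dept_id}_{employee_id:0{width}d}"


def day_bucket(stream, moment):
    """Name the per-day bucket that an event at `moment` belongs to."""
    return f"{stream}:{moment:%Y-%m-%d}"


keys = sorted(composite_key("D7", n) for n in (10, 9, 100))
bucket = day_bucket("orders", datetime(2014, 3, 5, 13, 0))
```

Because the keys are predictable, a client can enumerate `D7_000001`, `D7_000002`, and so on, or walk buckets day by day, without a secondary index.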

72 NoSql paradigm - Index table. Main table (Employee Id -> Details): [email protected]; State: CA; Dept: IT / [email protected]; State: TX; Dept: Sales / 2234 -> [email protected]; State: AL; Dept: IT / [email protected]; State: WA; Dept: Sales. Index table by State (State -> Employee Ids): CA -> 1234, , 1236, 1244; TX -> 8000, 8100, 8235, 8266; AL -> 2212, , 2234, 2256. Index table by Dept (Dept -> Employee Ids): IT -> 1234, 1235, 1236, 1244; Sales -> 8000, 8100, 8235, ; Acc -> 2212, 2221, 2234, 2256.
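An index table is a second key-value mapping maintained alongside the main one. The sketch below builds such a table from a handful of made-up employee records; the ids and field names are illustrative and do not reproduce the slide's data.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each value of `field` to the sorted ids of records that have it."""
    index = defaultdict(list)
    for record_id, details in sorted(records.items()):
        index[details[field]].append(record_id)
    return dict(index)


# Hypothetical main table: employee id -> details.
employees = {
    1234: {"state": "CA", "dept": "IT"},
    1236: {"state": "CA", "dept": "IT"},
    8000: {"state": "TX", "dept": "Sales"},
    8100: {"state": "TX", "dept": "Sales"},
}
by_state = build_index(employees, "state")
by_dept = build_index(employees, "dept")
```

The application has to keep the index tables in step with the main table on every write, which is the cost of this pattern.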

73 No sql paradigm Tree Index Country - USA State - CA City - LA { property : [{ facilityname : abc, facilityid : 111 }, { facilityname : xyz, facilityid : 222 }] } Properties Facilities

74 NoSql paradigm - Composite Key. Employees (IT Employees, Sales Employees): IT:Software:1123 -> EmpName: John; Address: Los Angeles. IT:Software:2323 -> EmpName: Kevin; Address: Dallas, TX. IT:Hardware:6767 -> EmpName: Matt; Address: San Francisco. Sales:Online:832 -> EmpName: Katie; Address: Austin, TX. Sales:Online:423 -> EmpName: Karen; Address: Irvine, CA. Sales:Store:556 -> EmpName: Richard; Address: San Diego. Queries: Dept = IT:* or Dept = Sales:Online:*
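The wildcard queries on this slide (`IT:*`, `Sales:Online:*`) become prefix scans over composite keys. The sketch below uses the keys and names from the slide, assuming they pair up in the order listed; a dictionary filter stands in for the range scan an ordered store would perform.

```python
def prefix_scan(store, prefix):
    """Return the entries whose composite key starts with `prefix`."""
    return {key: value for key, value in sorted(store.items())
            if key.startswith(prefix)}


# Key format assumed to be Dept:Team:EmployeeId.
employees = {
    "IT:Software:1123": "John",
    "IT:Software:2323": "Kevin",
    "IT:Hardware:6767": "Matt",
    "Sales:Online:832": "Katie",
    "Sales:Online:423": "Karen",
    "Sales:Store:556": "Richard",
}
it_staff = prefix_scan(employees, "IT:")
online_sales = prefix_scan(employees, "Sales:Online:")
```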

75 NoSql paradigm - Grouping. Colocation of a user's data. U123:O111 -> Product Ids: [ Surface, xbox ]. U124:O123 -> Product Ids: [ Win 8, xbox ]. U124:O234 -> Product Ids: [ Win phone, surface ]. U124:O999 -> Product Ids: [ office, azure sub ]. U125:O789 -> Product Ids: [ msdn, office ]. U125:O945 -> Product Ids: [ surface, xbox ].

76 Inverted search & direct aggregation. Records (EmpId, dept, city, ...): 111: Dept-Sales, City: LA; 222: Dept-IT, City: Dallas. Inverted index: Dept-IT: [111, 123, 234, ...]; Dept-Sales: [673, 343, 434, ...]; City: Dallas; City: LA.
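Inverted search can be sketched by building one posting set per field value and intersecting the sets at query time. Records 111 and 222 follow the slide; record 333 and the `field:value` term format are assumptions added so the example has something to intersect.

```python
def build_inverted(records):
    """Map each 'field:value' term to the set of record ids containing it."""
    inverted = {}
    for record_id, fields in records.items():
        for name, value in fields.items():
            inverted.setdefault(f"{name}:{value}", set()).add(record_id)
    return inverted


def search(inverted, *terms):
    """Return the ids of records matching every term."""
    postings = [inverted.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


records = {
    111: {"dept": "Sales", "city": "LA"},
    222: {"dept": "IT", "city": "Dallas"},
    333: {"dept": "IT", "city": "LA"},      # hypothetical extra record
}
inverted = build_inverted(records)
it_in_la = search(inverted, "dept:IT", "city:LA")
headcount_by_dept = {term: len(ids) for term, ids in inverted.items()
                     if term.startswith("dept:")}
```

The last line is the "direct aggregation" part: counts per department come straight from the size of each posting set, without scanning the records.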

77 No sql paradigm Materialized paths Electronics TV Phones Computers Cameras Samsung Apple LG LCD LED

78 No sql paradigm Materialized paths TV { entity : TV, category : Electronics } { entity : Samsung, category : Electronics, TV } Samsung Apple LG LCD LED { entity : Samsung, category : Electronics, TV, LCD }
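With materialized paths, each entity stores the full list of its ancestors, so a subtree query becomes a prefix comparison with no recursive lookups. The documents below follow the slide, with the `category` field treated as a list; the `under` helper is an illustrative assumption.

```python
catalog = [
    {"entity": "TV", "category": ["Electronics"]},
    {"entity": "Samsung", "category": ["Electronics", "TV"]},
    {"entity": "Samsung", "category": ["Electronics", "TV", "LCD"]},
]


def under(items, *path):
    """Entities whose stored path starts with the given ancestors."""
    prefix = list(path)
    return [item["entity"] for item in items
            if item["category"][:len(prefix)] == prefix]


tv_subtree = under(catalog, "Electronics", "TV")
everything = under(catalog, "Electronics")
```

The trade-off is that moving a node means rewriting the stored path of every entity beneath it.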

79 No sql paradigm Nested sets Electronics TV Phones Samsung Sony Cell Landline

80 No sql paradigm Nested sets Electronics TV Phone Samsung Sony Cell Landline
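In the nested-set model each node gets a left and a right number from a depth-first walk, and a node's descendants are exactly the nodes whose interval falls inside its own. The tree shape below is an assumption read off the flattened slide (Electronics over TV and Phones, each with two children).

```python
def number_tree(tree, root):
    """Assign (left, right) bounds to every node by depth-first traversal."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def descendants(bounds, node):
    """Nodes whose interval lies strictly inside the given node's interval."""
    left, right = bounds[node]
    return sorted(name for name, (l, r) in bounds.items()
                  if left < l and r < right)


tree = {
    "Electronics": ["TV", "Phones"],
    "TV": ["Samsung", "Sony"],
    "Phones": ["Cell", "Landline"],
}
bounds = number_tree(tree, "Electronics")
tv_children = descendants(bounds, "TV")
```

Reads are a single range comparison, while inserting a node requires renumbering everything to its right.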

81 Flattening nested documents. Name: Chander; Hadoop: Expert; Nodejs: Expert; Spanish: Novice. Document: { name : chander, skills : hadoop, nodejs, Spanish, level : expert, expert, novice }. Query: skills:hadoop AND level:expert

82 Flattening nested documents. Name: Chander; Hadoop: Expert; Nodejs: Expert; Spanish: Novice. Flattened document: { name : chander, skills_1 : hadoop, skills_2 : nodejs, skills_3 : spanish, level_1 : expert, level_2 : expert, level_3 : novice }
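The reason for the numbered fields can be shown with a query that goes wrong on the unflattened document. With two parallel lists, `skills:spanish AND level:expert` matches because both values appear somewhere in the document, even though Spanish is at novice level. Numbering the fields keeps each skill tied to its own level. The matching functions below are simplified stand-ins for what a search engine would do.

```python
doc = {
    "name": "chander",
    "skills": ["hadoop", "nodejs", "spanish"],
    "level": ["expert", "expert", "novice"],
}


def naive_match(document, skill, level):
    """Match each field independently, as a search over parallel lists would."""
    return skill in document["skills"] and level in document["level"]


def flatten(document):
    """Turn the parallel lists into numbered skills_N / level_N fields."""
    flat = {"name": document["name"]}
    pairs = zip(document["skills"], document["level"])
    for position, (skill, level) in enumerate(pairs, start=1):
        flat[f"skills_{position}"] = skill
        flat[f"level_{position}"] = level
    return flat


def flat_match(flat, skill, level):
    """Match only when skill and level share the same position number."""
    position = 1
    while f"skills_{position}" in flat:
        if (flat[f"skills_{position}"] == skill
                and flat[f"level_{position}"] == level):
            return True
        position += 1
    return False


flat = flatten(doc)
false_positive = naive_match(doc, "spanish", "expert")
corrected = flat_match(flat, "spanish", "expert")
```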

83 References Highly scalable blog

84 References Building Scalable Architecture


More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Building Scalable Applications Using Microsoft Technologies

Building Scalable Applications Using Microsoft Technologies Building Scalable Applications Using Microsoft Technologies Padma Krishnan Senior Manager Introduction CIOs lay great emphasis on application scalability and performance and rightly so. As business grows,

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Large-Scale Web Applications

Large-Scale Web Applications Large-Scale Web Applications Mendel Rosenblum Web Application Architecture Web Browser Web Server / Application server Storage System HTTP Internet CS142 Lecture Notes - Intro LAN 2 Large-Scale: Scale-Out

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Building Scalable Web Sites: Tidbits from the sites that made it work. Gabe Rudy

Building Scalable Web Sites: Tidbits from the sites that made it work. Gabe Rudy : Tidbits from the sites that made it work Gabe Rudy What Is This About Scalable is hot Web startups tend to die or grow... really big Youtube Founded 02/2005. Acquired by Google 11/2006 03/2006 30 million

More information

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging

More information

Distributed Storage Systems

Distributed Storage Systems Distributed Storage Systems John Leach [email protected] twitter @johnleach Brightbox Cloud http://brightbox.com Our requirements Bright box has multiple zones (data centres) Should tolerate a zone failure

More information

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without

More information

Domain driven design, NoSQL and multi-model databases

Domain driven design, NoSQL and multi-model databases Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect

More information

Adding scalability to legacy PHP web applications. Overview. Mario Valdez-Ramirez

Adding scalability to legacy PHP web applications. Overview. Mario Valdez-Ramirez Adding scalability to legacy PHP web applications Overview Mario Valdez-Ramirez The scalability problems of legacy applications Usually were not designed with scalability in mind. Usually have monolithic

More information

MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli

MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli Leveraging MySQL Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli MySQL is a popular, open-source Relational Database Management System (RDBMS) designed to run on almost

More information

Common Server Setups For Your Web Application - Part II

Common Server Setups For Your Web Application - Part II Common Server Setups For Your Web Application - Part II Introduction When deciding which server architecture to use for your environment, there are many factors to consider, such as performance, scalability,

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

The CAP theorem and the design of large scale distributed systems: Part I

The CAP theorem and the design of large scale distributed systems: Part I The CAP theorem and the design of large scale distributed systems: Part I Silvia Bonomi University of Rome La Sapienza www.dis.uniroma1.it/~bonomi Great Ideas in Computer Science & Engineering A.A. 2012/2013

More information

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. [email protected] www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

Future-Proofing MySQL for the Worldwide Data Revolution

Future-Proofing MySQL for the Worldwide Data Revolution Future-Proofing MySQL for the Worldwide Data Revolution Robert Hodges, CEO. What is Future-Proo!ng? Future-proo!ng = creating systems that last while parts change and improve MySQL is not losing out to

More information

High Throughput Computing on P2P Networks. Carlos Pérez Miguel [email protected]

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es High Throughput Computing on P2P Networks Carlos Pérez Miguel [email protected] Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Distribution transparency. Degree of transparency. Openness of distributed systems

Distribution transparency. Degree of transparency. Openness of distributed systems Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science [email protected] Chapter 01: Version: August 27, 2012 1 / 28 Distributed System: Definition A distributed

More information

Developing Scalable Java Applications with Cacheonix

Developing Scalable Java Applications with Cacheonix Developing Scalable Java Applications with Cacheonix Introduction Presenter: Slava Imeshev Founder and main committer, Cacheonix Frequent speaker on scalability [email protected] www.cacheonix.com/blog/

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2014/15 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2014/15 Unit 15

More information

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2 DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University

More information

How To Use Big Data For Telco (For A Telco)

How To Use Big Data For Telco (For A Telco) ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information