Scaling Pinterest. Yash Nelapati Ascii Artist. Pinterest Engineering. Saturday, August 31, 13


1 Scaling Pinterest Yash Nelapati Ascii Artist

2 Pinterest is... An online pinboard to organize and share what inspires you.


6-10 Growth: March 2010
Chart: page views per day (axis: Mar 2010, Jan 2011, Jan 2012, May 2012)
RackSpace
1 small Web Engine
1 small MySQL DB
1 Engineer + 2 Founders

11-13 Growth: January 2011
Chart: page views per day (axis: Mar 2010, Jan 2011, Jan 2012)
Amazon EC2 + S3 + CloudFront
1 Nginx, 4 Web Engines
1 MySQL DB + 1 Read Slave
1 Task Queue + 2 Task Processors
1 MongoDB
2 Engineers + 2 Founders


15-17 Growth: September 2011
Chart: page views per day (axis: Mar 2010, Jan 2011, Jan 2012, May 2012)
Amazon EC2 + S3 + CloudFront
2 Nginx, 16 Web Engines + 2 API Engines
5 Functionally Sharded MySQL DB + 9 read slaves
4 Cassandra Nodes
15 Membase Nodes (3 separate clusters)
8 Memcache Nodes
10 Redis Nodes
3 Task Routers + 4 Task Processors
4 Elastic Search Nodes
3 Mongo Clusters
3 Engineers (8 Total)

18 It will fail. Keep it simple.

19-21 Growth: April 2012
Chart: page views per day (axis: Mar 2010, Jan 2011, Jan 2012, May 2012)
Amazon EC2 + S3 + Edge Cast
135 Web Engines + 75 API Engines
10 Service Instances
80 MySQL DBs (m1.xlarge) + 1 slave each
110 Redis Instances
60 Memcache Instances
2 Redis Task Manager + 60 Task Processors
3rd party sharded Solr
15 Engineers (25 Total)

22 Growth January 2012

23 Scaling Pinterest

24-27 Growth: August 2013
Chart: page views per day (axis: April 2012, August 2013)
Amazon EC2 + S3 + Edge Cast
400+ Web Engines API Engines
70+ MySQL DBs (hi.4xlarge on SSDs) + 1 slave each
100+ Redis Instances
230+ Memcache Instances
10 Redis Task Manager Task Processors
70+ Engineers (130+ Total)
6 services (80 instances)
Sharded Solr
20 HBase
12 Kafka + Azkaban
8 Zookeeper Instances
12 Varnish

28 Architecture diagram. Components shown: ELB; Puppet; Routing & Filtering (Varnish); Task Queue (Redis); StatsD; Monit; Ganglia; API App (Python); Web App (Python); Task Processing (Python/Pyres); Follower Service (Python/Thrift); MySQL Service (Java/Finagle); Memcache Mux (Nutcracker); Feed Service (Python/Thrift); Images (S3 + CDN); MySQL; Memcache; Redis; HBase. All connection pairings managed by ZooKeeper.

29 Diagram. Components shown: API (Python); Web App (Python); Task Processing (Python/Pyres); Kafka; S3 Copier; Tripwire; S3; EMR; Hive; Azkaban.


31 Technologies

32 Choosing Your Tech
Questions to ask:
Does it meet your needs?
How mature is the product?
Is it commonly used? Can you hire people who have used it?
Is the community active?
How robust is it to failure?
How well does it scale? Will you be the biggest user?
Does it have good debugging tools? Profiler? Backup software?
Is the cost justified?

33 Hosting: Why Amazon Web Services?
Variety of servers running Linux
Very good peripherals, such as load balancing, DNS, map reduce, basic firewalls, and more
Good reliability (don't throw tomatoes at me!)
Very active dev community
Not cheap, but...
New instances ready in seconds

34 Code: Why Python?
Extremely mature
Well known and well liked
Solid active community
Very good libraries specifically targeted to web development
Effective rapid prototyping
Free

35 Production Data: Why MySQL and Memcache?
Extremely mature
Well known and well liked
Rarely catastrophic loss of data
Response time to request rate increases linearly
Very good software support: XtraBackup, Innotop, Maatkit
Solid active community
Free

36 Production Data: Why Redis?
Well known and well liked
Active community
Consistently good performance
Variety of convenient and efficient data structures
3 Flavors of Persistence: Now, Snapshot, Never
Free

37 Production Data: Why HBase? (Why not MySQL)
Efficient Storage
Handle large write throughput
Solid Hadoop interface
Maturing quickly, used by Facebook
Built on HDFS
Free

38 Production Data: What happened to Cassandra, Mongo, ES, and Membase?
Does it meet your needs?
How mature is the product?
Is it commonly used? Can you hire people who have used it?
Is the community active?
How robust is it to failure?
How well does it scale? Will you be the biggest user?
Does it have good debugging tools? Profiler? Backup software?
Is the cost justified?

39 If you're the biggest user of a technology, the challenges will be greatly amplified

40 What's happening now?

41 Employee Growth Challenge: One Codebase + Lots of Engineers = Deploy Hell
Major bugs and performance issues stall deploys
Performance issues creep in under radar
7+ development teams, 1 ops team
Workload changing more rapidly and less predictably
Want developers to not fear moving fast

42 Employee Growth Solution: Deploy Checkpoints
Aggressive unit tests (careful! don't erase your DB!)
Rings of deployment: canary, employees only, 5% of user base, etc.
Continuous deployment
Production integration tests
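The "rings of deployment" idea can be illustrated with a small gating function. This is only a sketch under stated assumptions: the ring names, the `user_bucket` hashing scheme, and the `sees_new_build` helper are hypothetical and do not come from the slides, which list only the rings themselves.

```python
import hashlib

def user_bucket(user_id: int, salt: str = "deploy") -> float:
    """Map a user id to a stable value in [0, 100) (hypothetical scheme)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return (int(digest[:8], 16) % 10000) / 100.0

def sees_new_build(user_id: int, is_employee: bool, ring: str) -> bool:
    """Decide whether a request is served by the new build in a given ring."""
    if ring == "canary":
        # Assumption: canary hosts take no user-targeted traffic here.
        return False
    if ring == "employees":
        return is_employee
    if ring == "five_percent":
        # Employees stay on the new build; other users join by stable bucket.
        return is_employee or user_bucket(user_id) < 5.0
    if ring == "everyone":
        return True
    raise ValueError(f"unknown ring: {ring}")
```

Hashing the user id keeps each user in the same bucket across requests, so widening a ring only adds users and never flips existing ones back and forth.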

43 Uptime & Latency Challenge: Increase Availability, Decrease Latency
Push for better uptime and lower latency
Initially, most uptime and latency issues due to DB + caching
Fewer Instances => Few, but big failures
More Instances => More smaller failures + more complexity
How aggressively can you retry without hurting the system?

44 Uptime & Latency Solution: Metrics Dashboard and Alerts
Create dashboard + alerts, and review response times weekly
When? Soon after launch at latest
Profile everything
  MySQL - Maatkit, InnoTop
  Memcache - Maatkit
  Frontend - New Relic
  General Ops - StatsD, Nagios / Monit, Ganglia

45 Uptime & Latency Solution: Configuration Manager and Failover
Provides load balancing and automatic connection reconfiguration
When? 30+ caches / DBs
One option: Intermediate load balancers
  Example: HAProxy, Nginx, Varnish
  Extra latency hop
  More complication
  Configuration hassle (1 LB / 7 services?)

46-48 Coordination Solution: Zookeeper
Centralized configuration management
Used for service discovery
Notifies of service failures
WATCH and its callback are pretty reliable
Experiment framework
Diagram: Services, app, Zookeeper (labels: Register, WATCH)
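The register/WATCH flow on these slides can be sketched with an in-memory stand-in. To be clear about assumptions: `ToyRegistry` and its methods are hypothetical and are not the ZooKeeper API; the sketch only shows the shape of the interaction, where services register and the app is called back when membership changes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

class ToyRegistry:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._instances: Dict[str, Set[str]] = defaultdict(set)
        self._watchers: Dict[str, List[Callable[[List[str]], None]]] = defaultdict(list)

    def register(self, service: str, address: str) -> None:
        """A service instance announces itself."""
        self._instances[service].add(address)
        self._notify(service)

    def unregister(self, service: str, address: str) -> None:
        """Models an instance failing or shutting down."""
        self._instances[service].discard(address)
        self._notify(service)

    def watch(self, service: str, callback: Callable[[List[str]], None]) -> None:
        """Subscribe to membership changes; the callback gets the current list at once.

        Real ZooKeeper watches fire once and must be re-set; this toy keeps
        them persistent for brevity.
        """
        self._watchers[service].append(callback)
        callback(sorted(self._instances[service]))

    def _notify(self, service: str) -> None:
        snapshot = sorted(self._instances[service])
        for callback in self._watchers[service]:
            callback(snapshot)
```

An app would typically use the callback to rebuild its connection pool for that service, which is the "automatic connection reconfiguration" the earlier slide asks for.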

49 MySQL Failover Part 1: Configuration Manager and Failover
Diagram: App, Zookeeper, MySQL hosts A and B; readonly=true; config { master : A }

50 MySQL Failover Part 2: Configuration Manager and Failover
Diagram: App, Zookeeper, MySQL hosts A and B; readonly=true; config { master : B }

51 MySQL Failover Part 2: Configuration Manager and Failover
Diagram: App, Zookeeper, MySQL hosts A and B; readonly=false; config { master : B }
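The three diagram states can be replayed in a few lines. This is a sketch under a loudly stated assumption: the slides do not say which host the `readonly` flag belongs to, and the code below assumes it describes standby host B. `ToyMySQL`, `ToyConfig`, and `fail_over` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ToyMySQL:
    name: str
    read_only: bool  # assumption: the slides' readonly flag refers to the standby

@dataclass
class ToyConfig:
    """Stand-in for the shared config the slides show as { master : A }."""
    master: str

def fail_over(config: ToyConfig, standby: ToyMySQL) -> List[Tuple[str, bool]]:
    """Replay the slide order and record (config master, standby read_only)."""
    states = [(config.master, standby.read_only)]   # slide 49
    config.master = standby.name                    # repoint the shared config first
    states.append((config.master, standby.read_only))  # slide 50
    standby.read_only = False                       # then open the standby for writes
    states.append((config.master, standby.read_only))  # slide 51
    return states
```

Under this reading, writes sent to B between the two steps would be rejected rather than accepted before B is ready.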

52 Connections Challenge: Number of Connections Rising
Initially, entire app tier connected to all Memcache, Redis, MySQL
On Memcache...
  20k connections * 10 kB / connection = 195 MB / Memcache
  40 Memcaches means 7.6 GB used on connections
  Connection space is not allocated from slab memory!
  Can eventually cause Memcache process to leak into swap
On MySQL
  At least 256 kB / connection
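The arithmetic on this slide can be checked directly. The sketch below assumes binary units (1 kB = 1024 bytes), which is the reading that reproduces the slide's 195 MB and 7.6 GB figures; the variable names are illustrative only.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

connections_per_node = 20_000          # from the slide
bytes_per_connection = 10 * KIB        # 10 kB, assuming 1 kB = 1024 bytes
memcache_nodes = 40                    # from the slide

# Memory spent purely on connection buffers, per node and across the fleet.
per_node_bytes = connections_per_node * bytes_per_connection
per_node_mib = per_node_bytes / MIB
fleet_gib = memcache_nodes * per_node_bytes / GIB

print(f"{per_node_mib:.1f} MiB per Memcache node")
print(f"{fleet_gib:.1f} GiB across {memcache_nodes} nodes")
```

With decimal units the per-node figure would be exactly 200 MB, so the slide's 195 MB suggests the binary interpretation was intended.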

53 Connections Solution: Connection Pooling and Multiplexing
Data Services, Nutcracker
When? Once any service gets close to 10k connections
Success: Memcache
  Once was >20k connections
  Now 1.3k connections
But, aggressive fan-out causes...
  Network contention
  Incast congestion

54-55 Memcache Failures
Diagram: App, Nutcracker; Ketama ring adjusted
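"Ketama ring adjusted" refers to consistent hashing: when a cache node drops out, only the keys it owned move. The following is a generic consistent-hash ring, offered as a sketch; it is not Nutcracker's or libketama's exact algorithm, and the `HashRing` class, its replica count, and the node names are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List

class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely rare collision: keep the first owner
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        """Drop a failed node; its keys fall through to the next point on the ring."""
        for i in range(self._replicas):
            point = self._hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Removing one node leaves every key that lived on the surviving nodes where it was, which limits the cache-miss spike after a Memcache failure.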

56 Why Java Over Python?
Finagle
RPC for high concurrency
Twitter
Completely asynchronous
Previous experience with Finagle
Lots of compatible libraries
JVM
Lots of bells and whistles - Ostrich, Zipkin, Iago

57 How did you shard?

58 How we sharded
Diagram: db00001 ... db00512; db00513 ... db01024; db03073 ... db03583; db03584 ... db04096

59 Master Master Replication: High Availability
Diagram: db00001 ... db00512; db00513 ... db01024; db03073 ... db03583; db03584 ... db04096

60 Split your Shards: Increased Load on DB?
Diagram: db00001 ... db00256; db00001 ... db00512; db00256 ... db00512

61 ID Structure
64 bits: Shard ID | Type | Local ID
A lookup data structure maps each physical server to its shard ID range (cached by each app server process)
Shard ID denotes which shard
Type denotes object type (e.g., pins)
Local ID denotes position in table
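A 64-bit ID of this shape can be packed and unpacked with shifts and masks. The slide gives only the total width and the field order, so the bit widths below (16 / 10 / 36) are an assumption for illustration, as are the host names and the `SHARD_RANGES` table standing in for the lookup structure.

```python
import bisect
from typing import Tuple

# Assumed field widths; the slide states only that the ID is 64 bits.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36

def pack_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Combine the three fields into one integer: shard | type | local."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def unpack_id(full_id: int) -> Tuple[int, int, int]:
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = full_id & ((1 << LOCAL_BITS) - 1)
    type_id = (full_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (full_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id

# Hypothetical lookup table: (first shard, last shard, physical host).
SHARD_RANGES = [(1, 512, "mysql-host-a"), (513, 1024, "mysql-host-b")]
_RANGE_STARTS = [first for first, _, _ in SHARD_RANGES]

def host_for_shard(shard_id: int) -> str:
    """Find the physical server whose range contains shard_id."""
    idx = bisect.bisect_right(_RANGE_STARTS, shard_id) - 1
    if idx < 0 or shard_id > SHARD_RANGES[idx][1]:
        raise KeyError(shard_id)
    return SHARD_RANGES[idx][2]
```

Because the shard ID is embedded in every object ID, routing a read needs only the small cached range table and no central lookup per object.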

62 Objects & Mappings
Object tables (e.g., pin, board, user, comment)
  Local ID -> MySQL blob (JSON / Serialized thrift)
Mapping tables (e.g., user has boards, pin has likes)
  Full ID -> Full ID (+ timestamp)
  Naming schema is noun_verb_noun
Queries are PK or index lookups (no joins)
Data DOES NOT MOVE
All tables exist on all shards
No schema changes required (index = new table)

63 Looking Forward: What's next?
Continually improve Pinner experience
Better uptime and lower latency
Help Pinners discover more of the things they love
Reduce spam and abuse
Continually collaborate and build bigger, better, faster products
140 Pinployees and beyond
MySQL 5.6
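The object-table and mapping-table pattern can be tried out with an in-memory database. This is a sketch with several assumptions: it uses SQLite from the Python standard library as a stand-in for MySQL, models a single shard (so local and full IDs coincide), and the table and function names are hypothetical apart from following the noun_verb_noun convention.

```python
import json
import sqlite3
from typing import Any, Dict, List

conn = sqlite3.connect(":memory:")  # SQLite stands in for one MySQL shard
conn.executescript("""
    -- Object table: local ID -> serialized blob
    CREATE TABLE pins (local_id INTEGER PRIMARY KEY, data TEXT NOT NULL);
    -- Mapping table, named noun_verb_noun: ID -> ID (+ timestamp)
    CREATE TABLE board_has_pins (
        board_id   INTEGER NOT NULL,
        pin_id     INTEGER NOT NULL,
        created_at INTEGER NOT NULL,
        PRIMARY KEY (board_id, pin_id)
    );
""")

def save_pin(local_id: int, fields: Dict[str, Any]) -> None:
    """Store the whole object as a JSON blob, so new fields need no schema change."""
    conn.execute("INSERT INTO pins (local_id, data) VALUES (?, ?)",
                 (local_id, json.dumps(fields)))

def add_pin_to_board(board_id: int, pin_id: int, created_at: int) -> None:
    conn.execute("INSERT INTO board_has_pins VALUES (?, ?, ?)",
                 (board_id, pin_id, created_at))

def pins_on_board(board_id: int) -> List[Dict[str, Any]]:
    """Two-step read: index lookup on the mapping, then PK lookups. No JOIN."""
    rows = conn.execute(
        "SELECT pin_id FROM board_has_pins WHERE board_id = ? ORDER BY created_at",
        (board_id,)).fetchall()
    result = []
    for (pin_id,) in rows:
        (data,) = conn.execute(
            "SELECT data FROM pins WHERE local_id = ?", (pin_id,)).fetchone()
        result.append(json.loads(data))
    return result
```

Splitting the read into separate lookups is what lets the mapping row and the object row live on different shards in the sharded setup, since no query ever needs both tables at once.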

64 pinterest.com/yashh



More information

Table Of Contents. 1. GridGain In-Memory Database

Table Of Contents. 1. GridGain In-Memory Database Table Of Contents 1. GridGain In-Memory Database 2. GridGain Installation 2.1 Check GridGain Installation 2.2 Running GridGain Examples 2.3 Configure GridGain Node Discovery 3. Starting Grid Nodes 4. Management

More information

Last time. Today. IaaS Providers. Amazon Web Services, overview

Last time. Today. IaaS Providers. Amazon Web Services, overview Last time General overview, motivation, expected outcomes, other formalities, etc. Please register for course Online (if possible), or talk to Yvonne@CS Course evaluation forgotten Please assign one volunteer

More information

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database

More information

Cloud Computing For Bioinformatics

Cloud Computing For Bioinformatics Cloud Computing For Bioinformatics Cloud Computing: what is it? Cloud Computing is a distributed infrastructure where resources, software, and data are provided in an on-demand fashion. Cloud Computing

More information

BASICS OF SCALING: LOAD BALANCERS

BASICS OF SCALING: LOAD BALANCERS BASICS OF SCALING: LOAD BALANCERS Lately, I ve been doing a lot of work on systems that require a high degree of scalability to handle large traffic spikes. This has led to a lot of questions from friends

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace Introduction to Polyglot Persistence Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace FOSSCOMM 2016 Background - 14 years in databases and system engineering - NoSQL DBA @ ObjectRocket

More information

High-Availability in the Cloud Architectural Best Practices

High-Availability in the Cloud Architectural Best Practices 1 High-Availability in the Cloud Architectural Best Practices Josh Fraser, VP Business Development, RightScale Brian Adler, Sr. Professional Services Architect 2 # RightScale World s #1 cloud management

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

Server Architecture for High- Performance Drupal

Server Architecture for High- Performance Drupal Server Architecture for High- Performance Drupal Robert Ristroph rgristroph@gmail.com @robgr http://www.drupalcampphoenix.com /high-performance-serverarchitecture Outline What is performance? Scaling?

More information

Ryan Horn, Lead Software Engineer at Twilio. November 12, 2014 Las Vegas. BDT312 Using the Cloud to Scale from a Database to a Data Platform

Ryan Horn, Lead Software Engineer at Twilio. November 12, 2014 Las Vegas. BDT312 Using the Cloud to Scale from a Database to a Data Platform BDT312 Using the Cloud to Scale from a Database to a Data Platform Ryan Horn, Lead Software Engineer at Twilio November 12, 2014 Las Vegas 2014 Amazon.com, Inc. and its affiliates. All rights reserved.

More information

NoSQL: Going Beyond Structured Data and RDBMS

NoSQL: Going Beyond Structured Data and RDBMS NoSQL: Going Beyond Structured Data and RDBMS Scenario Size of data >> disk or memory space on a single machine Store data across many machines Retrieve data from many machines Machine = Commodity machine

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

MakeMyTrip CUSTOMER SUCCESS STORY

MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es High Throughput Computing on P2P Networks Carlos Pérez Miguel carlos.perezm@ehu.es Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured

More information

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Connecting the World Through Games Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Daniel McCaffrey General Manager, Platform and Analytics Engineering World s leading social game

More information

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Scalability of web applications. CSCI 470: Web Science Keith Vertanen Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

Preparing Your Data For Cloud

Preparing Your Data For Cloud Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability

More information

Practical Load Balancing

Practical Load Balancing Practical Load Balancing Ride the Performance Tiger Illtil Peter Membrey David Hows Eelco Plugge Apress8 Contents About the Authors About the Technical Reviewers Special Thanks to serverlove Acknowledgments

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics data 4

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, 2015. Anthony Yeh, Software Engineer, YouTube. http://vitess.

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, 2015. Anthony Yeh, Software Engineer, YouTube. http://vitess. YouTube Vitess Cloud-Native MySQL Oracle OpenWorld Conference October 26, 2015 Anthony Yeh, Software Engineer, YouTube http://vitess.io/ Spoiler Alert Spoilers 1. History of Vitess 2. What is Cloud-Native

More information

Cache All The Things

Cache All The Things Cache All The Things About Me Mike Bell Drupal Developer @mikebell_ http://drupal.org/user/189605 Exactly what things? erm... everything! No really... Frontend: - HTML - CSS - Images - Javascript Backend:

More information

Wisdom from Crowds of Machines

Wisdom from Crowds of Machines Wisdom from Crowds of Machines Analytics and Big Data Summit September 19, 2013 Chetan Conikee Irfan Ahmad About Us CloudPhysics' mission is to discover the underlying principles that govern systems behavior

More information

BeBanjo Infrastructure and Security Overview

BeBanjo Infrastructure and Security Overview BeBanjo Infrastructure and Security Overview Can you trust Software-as-a-Service (SaaS) to run your business? Is your data safe in the cloud? At BeBanjo, we firmly believe that SaaS delivers great benefits

More information

Spotify services. The whole is greater than the sum of the parts. Niklas Gustavsson. måndag 4 mars 13

Spotify services. The whole is greater than the sum of the parts. Niklas Gustavsson. måndag 4 mars 13 Spotify services The whole is greater than the sum of the parts Niklas Gustavsson Me Distributed systems geek Spotify since 2011 ngn@spotify.com @protocol7 Last year Architectural overview Lots of questions!

More information

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects

More information