Removing Failure Points and Increasing Scalability for the Engine that Drives webmd.com




Matt Wilson, Director, Consumer Web Operations, WebMD (@mattwilsoninc), 9/12/2013

About this talk
- Go over the original site architecture and its challenges
- How a request goes through the system
- Caching
- DB / NAS
- How the challenges were addressed
- Reasons why we picked the technology we did

About WebMD Technology
- Properties: WebMD, Medscape, MedicineNet, eMedicine, UK cobrand
- Serving nearly 1 billion pageviews a month, 132 million unique visitors
- Running over 200 separate applications, the vast majority developed in-house
- Environments: Dev/Devint, QA01/02, QA00, Production/DR
- Two main data centers, geographically diverse
- OS: mix of Linux and Windows
- Datastores: SQL Server, Oracle, MongoDB, Vertica and MySQL
- Web: mix of Apache and IIS
- App: mix of Tomcat, ASP, .NET 2.x - 4.x
- Service: ActiveMQ, Memcache,

Anatomy of a Request: www.webmd.com/allergies
Components: WebMD User, Layer 7 Switch / Load Balancer, WebMD Runtime Server, Runtime DB Server (Clustered), NAS
1. The user's request for www.webmd.com/allergies passes through the Layer 7 switch / load balancer to a WebMD Runtime Server.
2. The Runtime Server asks the Runtime DB Server: "What assets/xml/xsl are associated with this URL?"
3. The Runtime DB Server answers: "Here you go sir." \\nasserver\blah\blah\blah.xml, \\nasserver\blah\blah\blah.xsl
4. The Runtime Server fetches the content from the NAS.
5. The NAS returns the content: blah.xml and blah.xsl.
6. The server processes the XML/XSL and returns the content to the user.
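The request path on these slides can be sketched as follows. This is a minimal illustration, not WebMD's runtime: `RUNTIME_DB` and `NAS` are hypothetical in-memory stand-ins for the clustered runtime DB and the NAS share, and because Python's standard library has no XSLT engine, `transform` is a placeholder that only pulls the `<title>` out of the XML instead of applying the stylesheet.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical stand-in for the runtime DB: URL -> (XML path, XSL path) on the NAS.
RUNTIME_DB: Dict[str, Tuple[str, str]] = {
    "www.webmd.com/allergies": (r"\\nasserver\blah\blah\blah.xml",
                                r"\\nasserver\blah\blah\blah.xsl"),
}

# Hypothetical stand-in for the NAS share: path -> file contents.
NAS: Dict[str, str] = {
    r"\\nasserver\blah\blah\blah.xml": "<page><title>Allergies</title></page>",
    r"\\nasserver\blah\blah\blah.xsl": "<xsl:stylesheet/>",
}


def transform(xml_text: str, xsl_text: str) -> str:
    # Placeholder for the XSLT step: the stylesheet is ignored and the
    # <title> element is simply wrapped in HTML.
    title = ET.fromstring(xml_text).findtext("title", default="")
    return f"<html><h1>{title}</h1></html>"


def handle_request(url: str) -> str:
    # Ask the runtime DB which XML/XSL assets belong to this URL.
    xml_path, xsl_path = RUNTIME_DB[url]
    # Fetch both files from the NAS.
    xml_text, xsl_text = NAS[xml_path], NAS[xsl_path]
    # Process the XML/XSL and return the rendered page to the user.
    return transform(xml_text, xsl_text)
```

Without any caching, every page view in this flow costs one DB lookup and two NAS reads, which is the load the caching slides are about.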

Where's the cache, bro?
- Page object cached in the server's memory for 5 min
- Widgets (code snippets on the page) are cached for different intervals: 60 min, 24 hours, 3 hours
- Widget caching is determined at page design time in the content publishing system
- The runtime system caches widgets on disk and/or in memory, which is configurable in the publishing system
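The cache rules on this slide amount to a per-entry time-to-live: one TTL for page objects and a TTL chosen per widget. The sketch below is a hedged illustration under assumptions: `TTLCache`, the widget names and the injectable clock are hypothetical, and it models only the in-memory tier, not the on-disk one.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after a per-entry TTL in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it so the caller refetches from the origin
            return None
        return value


# TTLs from the talk: page objects 5 min; widgets 60 min, 3 hours or 24 hours,
# chosen per widget at page design time. The widget names are hypothetical.
PAGE_TTL = 5 * 60
WIDGET_TTLS = {"nav": 24 * 3600, "headlines": 60 * 60, "sidebar": 3 * 3600}
```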

Caching, Caching, Caching
- How are existing cached objects updated?
- How are in-memory page objects updated?

Caching, Caching, Caching
- A background thread calls the NAS/DB for data and replaces the widget and page cache objects (a pull model)
- The background thread works from a queue-like data structure
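The background refresh can be sketched as a single worker thread draining a queue. In this sketch `BackgroundRefresher` and the `fetch` callable (standing in for the NAS/DB call) are hypothetical, and dropping a refresh request for a key that is already queued is an assumption about how the queue dampens herding, not something the talk specifies.

```python
import queue
import threading
from typing import Callable, Dict, Set


class BackgroundRefresher:
    """Refreshes cached objects from an origin (NAS/DB stand-in) on one worker thread."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch                    # hypothetical origin call
        self.cache: Dict[str, str] = {}       # widget/page cache being refreshed
        self._pending: Set[str] = set()       # keys queued or being fetched
        self._lock = threading.Lock()
        self._queue: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def request_refresh(self, key: str) -> None:
        with self._lock:
            if key in self._pending:          # already queued: coalesce the request
                return
            self._pending.add(key)
        self._queue.put(key)

    def _run(self) -> None:
        while True:
            key = self._queue.get()
            try:
                self.cache[key] = self.fetch(key)  # replace the cached object in place
            except Exception:
                pass                               # keep serving the old object on failure
            finally:
                with self._lock:
                    self._pending.discard(key)
                self._queue.task_done()

    def wait_idle(self) -> None:
        """Block until every queued refresh has been processed."""
        self._queue.join()
```

Because one thread works through the queue, a burst of refresh requests reaches the NAS/DB one at a time; the flip side is that updates land on each server only as fast as its own queue drains.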

This is all fine and dandy until...

Caching, Caching, Caching
- What's the problem with this method?
- What does this method protect?
- Is it good enough?

Caching, Caching, Caching
- The background thread queues calls to the NAS/DB for update requests, which creates a natural barrier to new-content herding problems
- Not all web servers get the updated content at the same time
- Additional web servers mean more calls to the NAS/DB
- Individual web servers do not have the same cache
- A publishing event could take up to an hour to refresh content
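The "more web servers, more NAS/DB calls" point is simple arithmetic: with per-server caches, each web server refetches every changed object on its own, so origin load grows linearly with the number of servers. The hypothetical helper below compares that with a single shared cache tier that loads each object from the origin once; it is a simplified model, not a measurement.

```python
def origin_calls_per_refresh(web_servers: int, changed_objects: int, shared_cache: bool) -> int:
    """NAS/DB calls needed to refresh `changed_objects` cached objects.

    Per-server caches (pull model): every server refetches every changed object.
    Shared cache tier: each changed object is loaded from the origin once.
    """
    if shared_cache:
        return changed_objects
    return web_servers * changed_objects
```

For example, with a hypothetical 40 web servers and 500 changed objects, the per-server model costs 40 × 500 = 20,000 origin calls per refresh, versus 500 with a shared tier.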

NAS Problem
- Problem constraints: still need a proven storage method; ubiquitous protocol
- Solution constraints: 200 apps to update (or not); use the NAS as a backup method

What can replace a NAS filestore?
- Does the solution need to provide an SMB / NFS interface? Can we use something else?
- Remember, we have 200 apps to update
- Looked at Scality, Cassandra, MongoDB, Couchbase

Why Couchbase?
- Memcached protocol already in use at WebMD
- Add servers to the cluster without client reconfiguration
- Support for hundreds of thousands of transactions per second
- Content stored in memory: fast set/get
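Reusing the memcached protocol meant existing clients did not need a new wire format. To show what that protocol looks like, here is a sketch of the memcached text protocol's `set`/`get` framing. The talk does not say whether WebMD's clients used the text or the binary protocol, the helper names are hypothetical, and only message construction and parsing are shown, with no network I/O.

```python
from typing import Optional


def build_set(key: str, value: bytes, exptime: int = 0, flags: int = 0) -> bytes:
    # Storage command: "set <key> <flags> <exptime> <bytes>" line, then the data block.
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    # Retrieval command: "get <key>" terminated by CRLF.
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(response: bytes) -> Optional[bytes]:
    # A hit is "VALUE <key> <flags> <bytes>", CRLF, the data, CRLF, "END", CRLF.
    # A miss is just "END" followed by CRLF.
    if response.startswith(b"END"):
        return None
    header, rest = response.split(b"\r\n", 1)
    length = int(header.split()[3])  # the <bytes> field of the VALUE line
    return rest[:length]
```

For instance, `build_set("k", b"abc", 300)` produces `b"set k 0 300 3\r\nabc\r\n"`, which any memcached-compatible endpoint should accept.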

We have to fail over the DB Cluster

DB Problem
- Problem constraints: improve availability; improve scalability
- Solution constraints: no code updates; it needs to just work

Data Access Pattern
- Narrow-read workload: many copies on many nodes (line borrowed from Theo)
- The database workload does not exceed what one server's hardware can handle
- All data is read-only
- Load balancing works better than clustering in this pattern

DB Solution
- Read-only DB calls are load balanced across the SQL servers
- Peer-to-peer replication keeps the SQL servers in sync
- The publishing system writes to one SQL server
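The read/write split can be sketched as a small router: every write goes to the one SQL server the publishing system targets, and read-only calls rotate round-robin over the replicated peers, skipping any marked as down. `ReadWriteRouter` and the server names are hypothetical; in the talk this job is done by load balancing in front of the databases rather than by application code, since the constraint was no code changes.

```python
import itertools
from typing import List, Set


class ReadWriteRouter:
    """Routes writes to one SQL server and round-robins reads over healthy peers."""

    def __init__(self, writer: str, readers: List[str]) -> None:
        self.writer = writer
        self.readers = readers
        self.down: Set[str] = set()           # maintained by an external health check
        self._rr = itertools.cycle(readers)

    def route_write(self) -> str:
        return self.writer                    # the publishing system is the only writer

    def route_read(self) -> str:
        # Try each peer at most once per call, skipping unhealthy ones.
        for _ in range(len(self.readers)):
            server = next(self._rr)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy read replica")
```

Because every peer holds a full copy of the read-only data, taking one out of rotation only shifts its share of reads to the others; no cluster failover is involved.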

Putting it all together
Components: Publishing System, NAS, Couchbase (persistent data store), ActiveMQ, WebMD Web Servers (which also make read-only DB requests)
1. Content is written to the DB / NAS and Couchbase. Couchbase gets the same content as the NAS, and the DB holds metadata about the content. All of these writes are part of one transaction.
2. The publishing system publishes object IDs to ActiveMQ.
3. Each web server gets object IDs off the queue.
4. The web server fetches the xml/xsl content from Couchbase.
5. The web server compiles the page and stores the cache objects on disk and in Couchbase.
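The push flow for publishing can be sketched with in-memory stand-ins: a dict for Couchbase, a dict for the NAS, and one `queue.Queue` per web server as its ActiveMQ subscription. Delivering every object ID to every web server (topic-style fan-out) is an assumption, as are the `content:`/`cache:` key names and `compile_page`, which stands in for the real XML/XSL compilation. The DB metadata write and the transaction spanning all stores are not modeled.

```python
import queue
from typing import Dict, List, Tuple


def compile_page(xml: str, xsl: str) -> str:
    # Placeholder for the real XML/XSL compilation step.
    return f"compiled({xml},{xsl})"


class WebServer:
    def __init__(self, name: str, shared_store: Dict[str, str]) -> None:
        self.name = name
        self.shared_store = shared_store                 # stand-in for Couchbase
        self.local_cache: Dict[str, str] = {}            # stand-in for the on-disk cache
        self.inbox: "queue.Queue[str]" = queue.Queue()   # this server's subscription

    def drain(self) -> None:
        """Process every object ID currently waiting for this server."""
        while True:
            try:
                object_id = self.inbox.get_nowait()
            except queue.Empty:
                return
            # Fetch xml/xsl from the shared store instead of the NAS.
            xml = self.shared_store[f"content:{object_id}:xml"]
            xsl = self.shared_store[f"content:{object_id}:xsl"]
            page = compile_page(xml, xsl)
            self.local_cache[object_id] = page               # local cache copy
            self.shared_store[f"cache:{object_id}"] = page   # shared cache copy


def publish(object_id: str, xml: str, xsl: str,
            nas: Dict[str, Tuple[str, str]], shared_store: Dict[str, str],
            subscribers: List[WebServer]) -> None:
    nas[object_id] = (xml, xsl)                      # NAS keeps the same content as a backup
    shared_store[f"content:{object_id}:xml"] = xml
    shared_store[f"content:{object_id}:xsl"] = xsl
    for server in subscribers:                       # push the ID to every web server
        server.inbox.put(object_id)
```

The design choice this illustrates is that content changes are pushed to all servers as soon as they are published, instead of each server discovering them on its own refresh schedule.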

Add New Server
- A new web server gets its cache objects from Couchbase
- All web servers have the same cache
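A new server warming up from the shared tier amounts to a read-through lookup: check the local cache, then Couchbase, and compile only on a miss in both. `get_page`, the `cache:` key prefix and the `compile_fn` callback are hypothetical names for this sketch, with a plain dict standing in for Couchbase.

```python
from typing import Callable, Dict


def get_page(object_id: str, local_cache: Dict[str, str],
             shared_store: Dict[str, str], compile_fn: Callable[[str], str]) -> str:
    # Local (disk/memory) cache first.
    page = local_cache.get(object_id)
    if page is None:
        # Then the shared cache that every web server writes to.
        page = shared_store.get(f"cache:{object_id}")
        if page is None:
            # Miss everywhere: compile once and share the result.
            page = compile_fn(object_id)
            shared_store[f"cache:{object_id}"] = page
        local_cache[object_id] = page
    return page
```

A freshly added server starts with an empty local cache, so its first requests are served from the shared tier rather than recompiled from NAS content, which is why all servers end up with the same cached pages.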

Was it worth it?
- Fixed the DB problem by using peer-to-peer replication and load balancing, with no code changes
- Fixed the NAS problem by adding a caching layer to reduce calls to the NAS
- Replaced the cache pull model with a push model for the content publishing system: publishing to all web servers now takes seconds, and all web servers have the same cached content
- Serving content is now faster, with less latency
- Able to virtualize the web servers

Why was this successful?
- Multi-disciplined team: Operations, Development, QA and Project Management
- Buy-in from senior management
- Creative solutions within constraints: resources, time, problem and solution
- Phased implementation

Questions? www.webmd.com/careers