How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations



Similar documents
Building a large scale CDN with Apache Trafficserver. Jan van Doorn jan_vandoorn@cable.comcast.com

Web Application Hosting Cloud Architecture

Load Balancing Web Applications

Accelerating Wordpress for Pagerank and Profit

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

LinuxWorld Conference & Expo Server Farms and XML Web Services

Scalable Architecture on Amazon AWS Cloud

The Application Front End Understanding Next-Generation Load Balancing Appliances

LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

Database Scalability and Oracle 12c

Configuring HAproxy as a SwiftStack Load Balancer

THE MASTER LIST OF DNS TERMINOLOGY. v 2.0

Implementing a secure high visited web site by using of Open Source softwares. S.Dawood Sajjadi Maryam Tanha. University Putra Malaysia (UPM)

GLOBAL SERVER LOAD BALANCING WITH SERVERIRON

Deployment Topologies

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon Max Putas

Load Balancing Microsoft Sharepoint 2010 Load Balancing Microsoft Sharepoint Deployment Guide

Copyright

Assignment # 1 (Cloud Computing Security)

Chapter 10: Scalability

Building Success on Acquia Cloud:

Linux VPS with cpanel. Getting Started Guide

Designing, Scoping, and Configuring Scalable Drupal Infrastructure. Presented by David Strauss

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

OnApp Cloud. The complete platform for cloud service providers. 114 Cores. 286 Cores / 400 Cores

DNS ROUND ROBIN HIGH-AVAILABILITY LOAD SHARING

Cisco Videoscape Distribution Suite Service Broker

Wikimedia architecture. Mark Bergsma Wikimedia Foundation Inc.

ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy

Lecture 8b: Proxy Server Load Balancing

CS 188/219. Scalable Internet Services Andrew Mutz October 8, 2015

Cloud Computing with Amazon Web Services and the DevOps Methodology.

Lab 5 Explicit Proxy Performance, Load Balancing & Redundancy

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

How To Run A Web Farm On Linux (Ahem) On A Single Computer (For Free) On Your Computer (With A Freebie) On An Ipv4 (For Cheap) Or Ipv2 (For A Free) (For

Monitoring Remedy with BMC Solutions

A simple object storage system for web applications Dan Pollack AOL

Barracuda Load Balancer Online Demo Guide

Yahoo! Communities Architectures Ian Flint

THE MASTER LIST OF DNS TERMINOLOGY. First Edition

CS514: Intermediate Course in Computer Systems

Wikimedia Architecture Doing More With Less. Asher Feldman Ryan Lane Wikimedia Foundation Inc.

CS312 Solutions #6. March 13, 2015

KT ucloud storage. Two Years of Life with OpenStack Swift / Jaesuk Ahn, Cloud OS Dev. Team, Korea Telecom

Getting Started with AWS. Hosting a Static Website

Apache Traffic Server Extensible Host Resolution

HW9 WordPress & Google Analytics

API documentation - 1 -

ISPS & WEBHOSTS SETUP REQUIREMENTS & SIGNUP FORM LOCAL CLOUD

Q&A Session for Understanding Atrium SSO Date: Thursday, February 14, 2013, 8:00am Pacific

Internet Content Distribution

the missing log collector Treasure Data, Inc. Muga Nishizawa

Security of IPv6 and DNSSEC for penetration testers

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida

OVERVIEW. The complete IaaS platform for service providers

Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION

Overview. The OnApp Cloud Platform. Dashboard APPLIANCES. Used Total Used Total. Virtual Servers. Blueprint Servers. Load Balancers.

ZingMe Practice For Building Scalable PHP Website. By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015

MID-TIER DEPLOYMENT KB

LOCKSS on LINUX. Installation Manual and the OpenBSD Transition 02/17/2011

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

IPV6 SERVICES DEPLOYMENT

Web Caching and CDNs. Aditya Akella

How To Understand The Power Of A Content Delivery Network (Cdn)

BASICS OF SCALING: LOAD BALANCERS

Real-Time Analysis of CDN in an Academic Institute: A Simulation Study

Global Server Load Balancing

Testing & Assuring Mobile End User Experience Before Production. Neotys

Maintaining Non-Stop Services with Multi Layer Monitoring

DEDICATED MANAGED SERVER PROGRAM

Migrating SaaS Applications to Windows Azure

CA Virtual Assurance/ Systems Performance for IM r12 DACHSUG 2011

DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP LTM with Apache Tomcat and Apache HTTP Server

Investment Management System. Connectivity Guide. IMS Connectivity Guide Page 1 of 11

The deployment of OHMS TM. in private cloud

v7.8.2 Release Notes for Websense Content Gateway

Practical Load Balancing

Purpose-Built Load Balancing The Advantages of Coyote Point Equalizer over Software-based Solutions

DPtech ADX Application Delivery Platform Series

CHAPTER 4 PERFORMANCE ANALYSIS OF CDN IN ACADEMICS

VMware vcenter Log Insight Getting Started Guide

Managed Servers ASA Extract FY14

Scaling Analysis Services in the Cloud

Chapter 2 TOPOLOGY SELECTION. SYS-ED/ Computer Education Techniques, Inc.

GeoCloud Project Report USGS/EROS Spatial Data Warehouse Project

CLOUD BASED SERVICE (CBS STORAGE)

Technical Overview Simple, Scalable, Object Storage Software

Cloud Computing Trends

Getting Started with AWS. Static Website Hosting

The last 18 months. AutoScale. IaaS. BizTalk Services Hyper-V Disaster Recovery Support. Multi-Factor Auth. Hyper-V Recovery.

Stackato PaaS Architecture: How it works and why.

SwiftStack Filesystem Gateway Architecture

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment?

Transcription:

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations Jan van Doorn Distinguished Engineer VSS CDN Engineering 1

What is a CDN? 2 Content Router get customer to best cache for his requested content in his location Health Protocol a way to tell CR what caches are able to take work Hundreds of Caches the HTTP/1.1 compatible work horses in multiple tiers and edge locations Management and Monitoring System a way to manage a geographically disperse set of servers Reporting System log file analysis of edge, mid and CR contacts

Our Overall Design Goals 3 Open standards based No vendor or system lock-in Cost effective All customer facing parts are IPv6 and IPv4 Horizontally scalable Optimized for video, but not exclusively for video Loosely coupled components, stateless 100% availability, handle component failure gracefully Simple

The Caches Hardware & OS 4 Off the shelf hardware - ride Moore's Law! Spinning disks (!) 24 900Gb SATA disks for caching 2 mirrored OS drives Lots of memory, for linear TV 1x10GE initially, 2x10GE upgrades being planned Connected to Aggregation Routers (first application to do so) Linux CentOS 6.1 / 6.2

The Caches - Software 5 Any HTTP 1.1 Compliant cache will work We chose Apache Traffic Server () Top Level Apache project (NOT httpd!) Extremely scalable and proven Very good with our VOD load Efficient storage subsystem uses raw disks Extensible through plugin API Vibrant development community Added handful of plugins for specific use cases

The Comcast Content Router (CCR) Tomcat Java application built in-house Multiple VMs around the country in DNS Round Robin Routes by DNS, HTTP 302, or REST Can route based on: 6 Regexp on URL host name (DNS and HTTP 302 redirect) Regexp on URL Path and headers (HTTP 302 redirect) Client location Coverage Zone File from network Geo IP lookup Edge cache health Edge cache load

The Health Protocol Rascal Server 7 Tomcat Java application built in-house HTTP GETs vital stats from each cache every 5 seconds Modified stats_over_http plugin on caches exposes app and system stats Determines and exposes state of caches to CRs Allows for real time monitoring / graphing of CDN stats Exposes 5 min avg/min/max to NE&TO Service Performance Database Redundant by having 2 instances running independent of each other. CRs pick one randomly

Configuration Management 8 Twelve Monkeys tool built in-house Web based jquery UI Mojolicious Perl framework MySQL database REST interfaces Integrated into standard Ops methods and best practices from day one Monitoring from Health Protocol through Rascal server

Log files and Reporting 9 Splunk> The only commercial product we used Well defined interfaces No vendor lock-in possible Easy to get started, may move to Open Source on this later ipcdn usage metrics by delivery service

Choosing Open Source Components 10 License Functionality Performance Plugin architecture / flexibility Development Community Vibrant, active project Friendly / Helpful

What About Support? 11 Most Open Source projects now have third parties selling support The active community is really important here Keep dev close to ops (DevOps model) DIY surgical patches for your problem are usually much faster than a release (from either a vendor, or a Open Source project) Get someone on staff to become part of the Open Source community

Comcast ipcdn NE&TO SPDB Splunk> logs 12M Config RASCAL CCR N x cache per mid cluster NE&TO nagios ORIGIN N x cache per mid cluster Mid tier caches are fwd proxies 12 N x cache per edge cluster 14 total edge clusters N x cache per edge cluster Edge caches are reverse proxies Client

Comcast ipcdn Summary 13 Comcast Content Router Caches in hierarchy Stateless 3 Mid Tier locations DNS Round Robin 14 Edge Tier locations Rascal Health Monitoring 7 12 Caches / location 12 Monkeys Configuration Management 22TB disk / cache No clustering, content affinity by url hash Small number of plugins Splunk Logging Collection and Analytics

National Engineering & Technical Operations 1

Comcast ipcdn - Health NE&TO SPDB Splunk> logs H3 12M Config RASCAL H2 CCR H1 NE&TO nagios H1 ORIGIN 15 Client

Comcast ipcdn - Health H1: Rascal does HTTP GET for key application and system stats from an plugin every 5 second H2: CCR does HTTP GET of combined edge status every 1 second H3: SPDB front-end pulls avg/min/max of certain key stats into SPDB by HTTP GET every 5 mins Rascal is redundant by having 2 separate instances of it running completely independent of each other 16

Comcast ipcdn - Config NE&TO SPDB Splunk> logs NE&TO nagios 12M Config C2 C3 CCR C1 ORIGIN 17 RASCAL Client

Comcast ipcdn - Config C1: Each Cache does a HTTP GET for it's configuration every 15 minutes; auto-applies safe changes C2: Each Rascal does a HTTP GET for it's configuration every X seconds; auto-applies C3: Each CCR does a HTTP GET for it's configuration every X seconds; auto-applies. 12 Monkeys is not redundant at this time 18

Comcast ipcdn Client Request (DNS) NE&TO SPDB Splunk> logs 12M Config RASCAL CCR R2 NE&TO nagios DNS resolver R1 R4 R3 R5 ORIGIN 19 Client

Comcast ipcdn Client Request (DNS) R1: Client does DNS Q for host in delivery URL R2: DNS resolver queries DNS auth servers, last one is CCR; CCR answers with IP address of best cache based on DNS Resolver IP R3: Client does HTTP GET for /path to edge, edge applies reverse proxy rule to map to org server name R4: On miss, edge selects mid from parent location based on /path hash, does HTTP GET for http://org/path (fwd proxy request) R5: On miss mid does HTTP GET for /path on org 20

Comcast ipcdn Client Request (302) NE&TO SPDB Splunk> logs 12M Config RASCAL CCR R2 NE&TO nagios DNS resolver R3 R5 R1 R4 R6 ORIGIN 21 Client

Comcast ipcdn Client Request (302) R1: Client does DNS Q for host in delivery URL R2: DNS resolver queries DNS auth servers, last one is CCR; CCR returns it's own IP R3: Client does HTTP GET for /path to CCR, CCR responds with 302 of best edge cache R4: Client does HTTP GET for /path to edge, edge applies reverse proxy rule to map to org server name R5: On miss, edge selects mid from parent location based on /path hash, does HTTP GET for http://org/path (fwd proxy request) R6: On miss mid does HTTP GET for /path on org 22

Comcast ipcdn Client Request (ALT) NE&TO SPDB Splunk> logs 12M Config RASCAL CCR R2 NE&TO nagios DNS resolver R3 R5 R1 R4 R6 ORIGIN 23 Client

Comcast ipcdn Client Request (ALT) R1: Client does DNS Q for host in delivery URL R2: DNS resolver queries DNS auth servers, last one is CCR; CCR returns it's own IP R3: Client does HTTP GET for /path?formatjson to CCR, CCR responds with 200 OK and json pointing to best edge cache R4: Client does HTTP GET for /path to edge, edge applies reverse proxy rule to map to org server name R5: On miss, edge selects mid from parent location based on /path hash, does HTTP GET for http://org/path (fwd proxy request) R6: On miss mid does HTTP GET for /path on org 24

ipcdn and IPv6 25 Client decides to do IPv6 by doing a AAAA DNS query; if it gets a response on the AAAA request, and has a non link-local IPv6 address itself, it'll use that. If not, it'll fall back to IPv4 All hostnames and domain names are the same for IPv6 and IPv4 All Edge contact points are be IPv6 On the Edge cache, the same remap rule applies for IPv6 and IPv4 Our Mid <> Edge traffic is IPv4 initially Our Mid <> ORG traffic is IPv4 Our management and monitoring traffic is IPv4

Questions / Links jan_vandoorn@cable.comcast.com NETO-VSS-CDNENG@cable.comcast.com 26 : http://trafficserver.apache.org/ OmniTI ( support): http://omniti.com/ Mojolicious: http://mojolicio.us/ jquery: http://jquery.com/

National Engineering & Technical Operations 2