Measuring CDN Performance
Hooman Beheshti, VP Technology
Why this matters
- Performance is one of the main reasons we use a CDN
- Seems easy to measure, but isn't
- Performance is an easy way to comparison shop
- Nuanced
- Metric overload
Common mistakes
- Getting lost in data
- Focusing on one thing and one thing alone (we like numbers!)
- Forgetting about our applications
- Letting vendors influence us (I'm fully aware of the irony!)
Goals
- Share measurement experiences
- Show what the measurement landscape looks like
- Help guide you toward a good testing plan
- Avoid pitfalls
Background
CDNs: delivery platforms
- We won't talk about the extra stuff: security, GSLB, TLS offload, etc.
- We also won't talk about page-level optimizations
- We will focus on the delivery side: delivering HTTP objects
Delivery: static/cached objects (diagram: Client ↔ CDN Node ↔ Origin)
Delivery: dynamic/uncached objects
What we'll be focusing on
- How we measure
- Metrics to measure
- What to measure
- Some gotchas, misconceptions, and common mistakes
Measurement Techniques (how we measure)
Measurement techniques
- Pretend users: synthetic tests (not actual users)
- Real users: in the browser (actual users)
Synthetic testing
Synthetic testing
- Usually a large network of test nodes all over the globe
- Highly scalable, can do lots of tests at once
- Many vendors have this model; examples: Catchpoint, Dynatrace (Gomez), Keynote, Pingdom, etc.
Synthetic testing
- Built to do full performance and availability testing
- Lots of monitors emulating what real users do: DNS, traceroute, ping, streaming, mobile, HTTP object, browser, transactions/flows
- Tests set up with some frequency to repeatedly test things
- Aggregates reported
Backbone nodes
- Test machines sitting in datacenters all around the globe
- Terrible indicators of raw performance: no latency, infinite bandwidth
- But really good at: availability, scale, backend problems, global reach
Last mile nodes
- Test machines sitting behind a real, home-like internet connection
- Much better at reporting what you can expect from users, but sometimes unreliable
- Also not as dense in deployment
(diagram: backbone vs. last-mile test node placement)
Synthetic testing
Pros:
- Geographic distribution
- Lots of options for testing
- Really good for on-the-spot troubleshooting
- Last-mile nodes can be pretty good proxies for performance
Cons:
- Not real users!
- Backbone nodes can be misleading
Real users (RUM)
RUM
- Use JavaScript to collect timing metrics
- Can collect lots of things through browser APIs: page metrics, asset metrics, user-defined metrics
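As a rough illustration (not any particular vendor's beacon), a minimal RUM collector might read the Navigation Timing API and beacon the results; the /rum endpoint below is a made-up placeholder:

```typescript
// Minimal RUM sketch: read page-level timings from the Navigation Timing API
// and beacon them. The "/rum" collection endpoint is an assumption.
window.addEventListener("load", () => {
  // Defer one tick so loadEventEnd has been populated.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType(
      "navigation"
    ) as PerformanceNavigationTiming[];
    if (!nav) return;
    const metrics = {
      dns: nav.domainLookupEnd - nav.domainLookupStart,
      tcp: nav.connectEnd - nav.connectStart,
      ttfb: nav.responseStart - nav.requestStart,
      download: nav.responseEnd - nav.responseStart,
      load: nav.loadEventEnd, // the navigation entry's startTime is 0
    };
    // sendBeacon delivers even if the user navigates away mid-send.
    navigator.sendBeacon("/rum", JSON.stringify(metrics));
  }, 0);
});
```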
Use test assets
- Use this model to initiate tests in the browser
- Some vendors: Cedexis, TurboBytes, CloudHarmony, more
- Usually this isn't their business, but the data drives their main business objectives
- You can build this yourself too
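The do-it-yourself version is simple in principle: fetch the same small test object from each candidate CDN and time it. A minimal sketch, where the hostnames and object path are placeholders and a real harness would repeat this many times and aggregate server-side:

```typescript
// DIY test-asset probe: time the same object from two candidate CDNs.
// Hostnames and path are assumptions, purely illustrative.
async function timedFetch(url: string): Promise<number> {
  const start = performance.now();
  await fetch(url, { cache: "no-store" }); // don't let the browser cache skew results
  return performance.now() - start;
}

async function probe(): Promise<void> {
  const candidates = [
    "https://cdn-a.example.com/probe/50kb.bin",
    "https://cdn-b.example.com/probe/50kb.bin",
  ];
  for (const url of candidates) {
    console.log(url, `${(await timedFetch(url)).toFixed(1)} ms`);
  }
}
probe();
```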
Use real assets in the page
- Collect timings from actual objects: Resource Timing
- Vendors: SOASTA, New Relic, most synthetic vendors
- Boomerang (open source)
- Google Analytics User Timings
DATA, DATA, DATA
- For either RUM technique, we need A LOT of data: too much variance
- Most vendors don't use averages; they use medians and percentiles
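To see why, here is a toy percentile helper with illustrative numbers: a single slow outlier drags the mean far from what most users saw, while the median stays put and a high percentile exposes the tail:

```typescript
// Toy percentile helper; the TTFB samples below are made up.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

const ttfbMs = [12, 15, 14, 250, 13, 16, 14]; // one outlier
console.log(ttfbMs.reduce((a, b) => a + b, 0) / ttfbMs.length); // mean ~47.7: misleading
console.log(percentile(ttfbMs, 50)); // median 14: what the typical user saw
console.log(percentile(ttfbMs, 95)); // p95 250: exposes the tail
```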
Real user measurements
Pros:
- Real users, real browsers, real-world conditions
- If you use your own content, could be close to what your users experience
- With enough data, great for granular analysis
Cons:
- We need a lot of data
- If you do it yourself, data infrastructures aren't trivial
Measurement Metrics
(diagram: the anatomy of a request between Client and Server, built up phase by phase: DNS, TCP, (TLS), HTTP, Download, with each round trip costing 1 x RTT)
Timeline: DNS → TCP → (TLS) → TTFB → Download (TTLB - TTFB). What each component reflects:
- DNS: RTT to the DNS server, DNS iterations, DNS caching and TTLs
- TCP: RTT to the cache server (CDN footprint & routing algorithms)
- (TLS): RTT to the cache server (or RTTs, depending on TLS False Start), efficiency of the TLS engine
- TTFB: RTT to where the object is stored + storage efficiency (different for requests to origin); lower bound = network RTT
- Download (TTLB - TTFB): bandwidth, congestion avoidance algorithms (and RTT!)
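To make the RTT dependence concrete, a back-of-the-envelope cost model for one object fetch on a cold connection (every number below is an assumption, purely for illustration):

```typescript
// Rough cost of a single cold-connection object fetch. All numbers assumed.
const rttDns = 15;   // ms to the resolver (ignores iterations and caching)
const rttEdge = 20;  // ms to the CDN cache node
const tlsRounds = 1; // 1 with TLS False Start / TLS 1.3, otherwise 2
const ttfbExtra = 5; // storage/processing time at the cache, on top of the RTT
const transfer = 30; // time to download the body

const total =
  rttDns +                // DNS
  rttEdge +               // TCP handshake
  tlsRounds * rttEdge +   // TLS handshake
  (rttEdge + ttfbExtra) + // TTFB: network RTT is the lower bound
  transfer;               // TTLB - TTFB
console.log(`${total} ms`); // 110 ms, and most of it is RTT
```

Most of the total is round trips, which is why CDN footprint and routing dominate small-object performance.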
Core object metrics
- Not every request experiences every metric:
  - DNS: once per domain
  - TCP/TLS: once per connection
  - HTTP/Download: for every object (not already in browser cache)
- All techniques/tools measure and report these metrics
Resource timing: http://www.w3.org/TR/resource-timing/
Resource timing: window.performance.getEntries()
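The core per-object metrics fall straight out of the Resource Timing entries, as in this sketch. Note that for cross-origin assets (CDN-hosted ones included) the detailed timings read as zero unless the response carries a Timing-Allow-Origin header:

```typescript
// Sketch: derive the core object metrics from Resource Timing entries.
const entries = performance.getEntriesByType(
  "resource"
) as PerformanceResourceTiming[];
for (const entry of entries) {
  console.log(entry.name, {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcp: entry.connectEnd - entry.connectStart,
    ttfb: entry.responseStart - entry.requestStart,
    download: entry.responseEnd - entry.responseStart, // TTLB - TTFB
  });
}
```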
Object metrics or page metrics?
Test conditions: Download 15 Mbps, Upload 5 Mbps, Latency 10 ms vs. 25 ms
(charts: onload, Speed Index, and Start Render compared at 10 msec vs. 25 msec latency)
What the???
- We always assume all things equal, but too many factors affect page load time: 3rd parties (sometimes varying), content from origin, layout, JS execution, etc.
- Too much variance
3rd parties (chart source: httparchive.org)
To be clear
- Always use WebPageTest (or something like it) to understand your application's performance profile
- Continue to monitor application performance, and always spot check
- Be extremely careful when using it to gauge/compare CDN performance; it can mislead you
- If using RUM to measure page metrics, with lots of data, things become a little more meaningful (data volume handles variance)
What to measure (plus the right metrics)
Most commonly
- Pick a normal object, e.g. some object on the home page
- Set up testing from multiple places (usually with a synthetic vendor), and hopefully not backbone!
- Compare either overall load time, or some object metrics
Totally application-dependent
Example: web application
Web application: objects
- The ratio of objects coming from CDN cache vs. those coming from origin (through the CDN) should determine the objects to test
- If the HTML is from origin, we must measure it: it's essential to critical page metrics
Web application: object sizes
Web application: metrics
- On any page: DNS queries only happen a small number of times, 6 TCP connections per domain, many many many HTTP fetches
- Core metrics: TTFB; Download (TTLB - TTFB) if there are important large objects
- Should have a good idea of DNS/TCP/TLS, but less critical
Web application
- If the CDN is only for static/cacheable objects: one or two representative assets
- TTFB and maybe download most important
(diagram: Client ↔ CDN Node)
X-Cache: HIT (verify the test asset is actually served from cache)
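A quick way to check, sketched below; the header name and URL are assumptions, since both the cache-status header and what it reports vary by CDN, and cross-origin header visibility depends on CORS exposure:

```typescript
// Sketch: confirm a test object is served from CDN cache by inspecting a
// cache-status response header. "x-cache" and the URL are assumptions.
async function checkCacheStatus(url: string): Promise<void> {
  const res = await fetch(url, { cache: "no-store" }); // bypass the browser cache
  console.log(url, res.headers.get("x-cache")); // expect "HIT" once the object is warm
}
checkCacheStatus("https://cdn.example.com/static/app.js");
```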
Web application
- If the CDN is also for the whole site: sample of key HTML pages, delivered from origin
- TTFB will show efficiency of routing (and connection management) to origin
- TTLB will show efficiency of delivery
(diagram: Client ↔ CDN Node ↔ CDN Node ↔ Web Server)
Example: software download
Software download: objects
- Pick a standard file that users will be downloading, with a representative file size
- Also pick something you expect to be on the CDN but not fetched all that often (more on this later)
Software download: metrics
- The ratio of TCP-to-HTTP is closer to 1:1, especially if you have a dedicated download domain; could mean the same for DNS
- For large files, we care about download time
- Core metrics: TTFB (+ TCP and maybe DNS, if applicable); TTLB
- TTLB - TTFB will usually be a larger component
Bandwidth/Download measurements
Download time
- Be careful about where you expect this download to happen in the lifetime of a TCP connection
- In the beginning of the connection: a function of init_cwnd and TCP slow start (and RTT)
- Later in the connection: a function of congestion avoidance and bandwidth
- Large files will experience both (see the sketch below)
(graph: bytes delivered over time; the slow-start ramp at the start of a connection, then congestion avoidance)
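A back-of-the-envelope slow-start model makes the point. All parameters are assumptions: init_cwnd of 10 segments, 1460-byte segments, the window doubling every RTT, no loss:

```typescript
// Toy slow-start model: how many round trips to deliver an object?
// Assumptions: init_cwnd = 10 segments, 1460-byte MSS, cwnd doubles per RTT, no loss.
function rttsToDownload(bytes: number, initCwnd = 10, mss = 1460): number {
  let delivered = 0;
  let cwnd = initCwnd;
  let rtts = 0;
  while (delivered < bytes) {
    delivered += cwnd * mss; // one window's worth of data per round trip
    cwnd *= 2;               // exponential growth during slow start
    rtts += 1;
  }
  return rtts;
}

console.log(rttsToDownload(50 * 1024));   // 3 RTTs for a 50 KB object
console.log(rttsToDownload(1024 * 1024)); // 7 RTTs for a 1 MB object
```

Early in a connection, download time is dominated by RTT rather than bandwidth; only once a large transfer leaves slow start does it become bandwidth-bound.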
Cache hit ratios
Cache hit ratio, traditional calculation: 1 - (Requests to Origin / Total Requests)
(diagram sequence: a client request arrives at a Cache in front of the Origin; a miss opens TCP and HTTP connections to the Origin; with tiered caches, a COLD cache can be filled from a HOT peer instead of the Origin, and that still counts as a cache hit under the traditional calculation)
Isn't this better? Hits / Total Requests, measured @ edge; i.e., Hits / (Hits + Misses) @ edge
Cache hit ratio:
1 - (Requests to Origin / Total Requests) measures offload
vs.
Hits / (Hits + Misses) @ edge measures performance
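A toy example with made-up counts shows how far apart the two numbers can be:

```typescript
// Same traffic, two ratios. All counts are invented for illustration.
const edgeHits = 900;      // answered straight from the edge cache
const edgeMisses = 100;    // the edge had to fetch from elsewhere
const originRequests = 40; // misses that actually reached the origin
                           // (the other 60 were filled by peer/shield caches)

const offload = 1 - originRequests / (edgeHits + edgeMisses); // 0.96
const edgeHitRatio = edgeHits / (edgeHits + edgeMisses);      // 0.90
console.log({ offload, edgeHitRatio });
```

The traditional number says the origin is 96% protected; the edge number says 10% of user requests paid a miss penalty.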
Effect on long tail content (long tail: cacheable but seldom fetched)
Test buckets: Popular, Medium Tail (1hr), Long Tail (6hr)

                   Popular    1hr Tail   6hr Tail
Measurements       77,000+    38,000+    6,400+
Connect (median)   14 msec    15 msec    16 msec
Wait (median)      19 msec    26 msec    32 msec
Isn't this better? (the same comparison, shown across the Popular, Medium Tail (1hr), and Long Tail (6hr) buckets)
Now
Does all this really matter? Yes, but...
The bigger picture
- It's really easy to lock in on a metric
- Performance absolutely matters, but true performance isn't always as easy to measure. Beyond the headline number:
  - Storage model and long tail content
  - Cache hit ratios, for offload and for performance
  - Footprint (HTTP vs. TLS footprint)
  - Caching something you didn't think you could cache
  - Serving stale content if necessary (see the sketch below)
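One standard mechanism for serving stale content (an illustration, not necessarily what the talk had in mind) is the RFC 5861 Cache-Control extensions; whether a given CDN honors them is CDN-specific. A Node-style origin handler setting them:

```typescript
// Illustration: RFC 5861 Cache-Control extensions let caches serve stale
// content while revalidating, or when the origin errors.
import { createServer } from "node:http";

createServer((_req, res) => {
  res.setHeader(
    "Cache-Control",
    // fresh for 5 min; stale for up to 1 min while refetching;
    // stale for up to 1 day if the origin is erroring
    "max-age=300, stale-while-revalidate=60, stale-if-error=86400"
  );
  res.end("hello");
}).listen(8080);
```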
Key takeaways
- Everything is application-dependent: evaluate how your application works and what impacts performance the most
- Don't get locked into a single number
- Always know your application performance and bottlenecks
- Be mindful of the bigger picture!
Thank you! hooman@fastly.com