Rigorous Performance Testing on the Web Grant Ellis Senior Performance Architect, Instart Logic
Who is Instart Logic? - Software company focused on Application Delivery - We work with globally known brands whose business depends on performance, and make their sites and apps really fast - Team includes big data, virtualization and web performance experts from Google, Facebook, Akamai, Cisco, Citrix, VMware, and Aster Data
Who Uses Instart Logic?
Response Time: 4.98 seconds How was the data collected? Aggregated? Normalized? What is response time? What does that mean for the users? Did any actual human beings see this response time? What devices/browsers were used? Laptop? Phone? Tablet? Where were the users located?
Performance Testing: Two Truths 1. Methodology matters more than results 2. Statistical analysis can (and sometimes does) lie. It is really easy to make great results look poor, or to make poor results look great, either deliberately or accidentally.
Table of Contents
The Internet, The Bottleneck, and The Test: A brief history
Last-Mile Performance Tools (It's dangerous to go alone!)
Now I have data... Lots of data
But, wait, there's more (data)!
Need more? Meet the CDF.
Tie it all together
First: A quick network primer! Need For Speed: Packet Edition, created by Raphaël Luta http://www.aptiwan.com/packetstory/
The Internet, The Bottleneck, and The Test: A brief history The Dawn of the (World Wide) Web Adoption viable for commerce and business Performance detractors: - Weak server hardware - Clumsy scaling technology - Poor first-mile connectivity Primary Bottlenecks: - Hardware - First-mile connectivity
The Internet, The Bottleneck, and The Test: A brief history [Diagram: LAST MILE, MIDDLE MILE, FIRST MILE, HARDWARE; components: ISP, ADC; two segments marked Bottleneck] The test repeatedly loads whole pages. Measured performance takes into account the page, the embedded objects, and the server latency introduced by a then-traditional three-tier architecture.
The Internet, The Bottleneck, and The Test: A brief history Data center scale was conquered. Adoption on the web increased again: - Google, Facebook, fully-baked e-commerce, others - Governments digitized records and moved vital functions to the Web Performance detractors: - Middle-mile copper - Congested switches - Poorly maintained peering points Primary Bottlenecks: - Middle-mile
The Internet, The Bottleneck, and The Test: A brief history [Diagram: LAST MILE, MIDDLE MILE, FIRST MILE, HARDWARE; components: CDN, ISP, ADC; one segment marked Bottleneck] Backbone products from Gomez and Keynote enable ongoing performance testing (e.g. monitoring) from multiple geographies at the same time. Beware: Some content delivery networks have taken care to place their nodes on the same network, or even the same rack, as synthetic testing nodes. Look for unrealistically low response times in your embedded objects!
The Internet, The Bottleneck, and The Test: A brief history [Diagram: LAST MILE, MIDDLE MILE, FIRST MILE, HARDWARE; components: CDN, ISP, ADC; one segment marked Bottleneck] Last-mile latency, packet loss. Browser mechanics.
The Application Delivery Challenge Today [Figure: latency (ms) by connection type: Wired, LTE, WiFi, 4G, 3G] Source: High Performance Browser Networking by Ilya Grigorik, Figures 7-16 and 10-6. Available for free online: http://chimera.labs.oreilly.com/books/1230000000545/index.html
Last-Mile Performance Tools (It's dangerous to go alone!) JMeter and LoadRunner measure: - From a single geography (usually on-premise) - With a single browser Keynote backbone / Gomez backbone: - Report only on average - Use fixed (backbone) connectivity - Still simulate data None of the above measure: - Multiple devices - Multiple connection types - True user experience - Impact from wireless technologies So, we need more tools!
Last-Mile Performance Tools (It's dangerous to go alone!)
Synthetic Testing
Pros: User experience metrics; open source; multiple device types; multiple connection types (traffic shaping); great reports; captures waterfall diagrams
Cons: Limited analysis tools; difficult to monitor performance; platform stability; it's still synthetic
Real User Monitoring (RUM), e.g. boomerang.js
Pros: True user experience; easy set-up; great browser support; multiple device types; multiple connection types; open source tools available
Cons: Requires live traffic (responsive, not preemptive); measurement impacts results; Safari data is limited; outliers can be extreme and must be removed
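The slide says RUM outliers must be removed but does not say how. As a rough sketch of one common option (an assumption, not the speaker's method), the slowest tail can be dropped by keeping only a fixed fraction of the fastest samples. The function name `trim_outliers`, the `keep_fraction` parameter, and the sample data are all hypothetical.

```python
def trim_outliers(load_times_s, keep_fraction=0.99):
    """Keep the fastest `keep_fraction` of samples and drop the slow tail.

    The cut-off is a methodology choice: report it alongside any result.
    """
    ordered = sorted(load_times_s)
    keep = max(1, int(len(ordered) * keep_fraction))
    return ordered[:keep]


# Hypothetical RUM samples in seconds; the 600 s entry is a stalled session.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 600]
print(trim_outliers(samples, keep_fraction=0.9))
```

Because the threshold changes the averages that follow, it is worth stating it whenever trimmed data is presented.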
First: New vocab for last-mile tools Fully Loaded - Entire page has been loaded - Including asynchronous functions like analytics beacons. - The browser hasn't utilized the Internet connection for a while - Generally transparent from a user's perspective For a long time, Fully Loaded was all we had. With mature client-side technologies, the Fully Loaded metric is much less relevant: it does not take into account browser mechanics, and it fires only after the connection has gone idle, which has nothing to do with user experience!
First: New vocab for last-mile tools Document Complete (or Onload) - The page is assembled by the browser and ready for the user. - (Almost) always visually complete - User can use the scroll bars, click links, or search. - The browser may still be doing things in the background. Some sites defer loading of prominent content until after document complete. Some Front-End Optimization (FEO) packages defer script execution until after document complete. In this case, an interactive site may look visually complete at document complete, but won't actually be responsive or usable until after those scripts execute!
First: New vocab for last-mile tools Start Render (or Render Start) - Browser paints something (anything) on the screen. - May be all or most of the page, or a single image, or a single paragraph, or a single pixel. - The moment your user knows that the web site is actually working. - Abandonment (usually) happens before Start Render.
First: New vocab for last-mile tools
Load Time: Otherwise known as Document Complete.
First Byte: Network latency plus server latency. Transparent for users. Critical path for all browser functions. Minimize wherever possible.
Start Render: Otherwise known as Render Start.
Visually Complete: All visual components of the page are painted on the screen.
Speed Index: Loosely, the average time for visual components to be painted on the screen.
Fully Loaded: The same Fully Loaded as before. The browser stops using the connection.
First: New vocab for last-mile tools BEWARE: Visually complete is not the same as functional. Some Front-End Optimizations defer JavaScript execution to make the page look visually complete faster, but users may not be able to click links, scroll the window, or search!
First: New vocab for last-mile tools Speed Index, more technically: the area above the curve when visual completeness is plotted over time across all paint events (lower is better). The same warnings around visual completeness apply: sites with great Speed Indexes are not necessarily functional as quickly as they are visible.
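The "area above the curve" definition of Speed Index can be sketched in a few lines. This is a minimal illustration, assuming visual completeness is available as a list of (time in ms, fraction complete) samples and is treated as a step function; the function name and the sample pages are hypothetical, not output from any real tool.

```python
def speed_index(samples):
    """Sum the area above the visual-completeness curve (lower is better).

    `samples` is a list of (time_ms, completeness) pairs sorted by time,
    with completeness between 0.0 and 1.0. Completeness is assumed to be
    0.0 before the first sample and to hold its value between samples.
    """
    total = 0.0
    prev_time, prev_done = 0.0, 0.0
    for time_ms, done in samples:
        total += (time_ms - prev_time) * (1.0 - prev_done)
        prev_time, prev_done = time_ms, done
    return total


# Two hypothetical pages, both visually complete at 2000 ms.
late_paint = [(1000, 0.5), (2000, 1.0)]
early_paint = [(500, 0.9), (2000, 1.0)]

print(speed_index(late_paint))
print(speed_index(early_paint))
```

Both pages share the same Visually Complete time, yet the page that paints most of its content early scores much lower, which is the distinction the metric is meant to capture.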
Now I have data... lots of data. Over 6,000 data points. What can we do with this?
Possible interpretations
Series   Average   Median   Standard Deviation
blue     8.947     7.323    4.792
red      9.239     7.168    5.357
green    8.155     6.977    4.844
purple   14.104    13.109   4.397
Over 6,000 data points, reduced to a handful of numbers: a gross oversimplification. May be useful. But, look at how the graph changes with slightly different cuts. Could be highly misleading!
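To see how aggregate figures can hide the shape of the data, here is a small sketch using Python's standard `statistics` module. The two data sets are made up for illustration and are not the measurements behind the table; `summarize` is a hypothetical helper.

```python
import statistics


def summarize(load_times_s):
    """Return the three aggregates shown on the slide for one data set."""
    return {
        "average": statistics.mean(load_times_s),
        "median": statistics.median(load_times_s),
        "stdev": statistics.stdev(load_times_s),  # sample standard deviation
    }


# Hypothetical load times in seconds.
steady = [6.0, 7.0, 7.0, 7.0, 8.0]      # everyone sees about 7 s
split = [2.0, 2.0, 2.0, 14.0, 15.0]     # most users fast, some very slow

for name, data in (("steady", steady), ("split", split)):
    print(name, summarize(data))
```

The two sets share the same average even though no user in the second set experienced anything close to it, which is why a single column of the table should not be read on its own.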
But, wait! There's more (data)! None of these representations capture the whole picture! There are hundreds of permutations of variability: different Internet connection types, devices, browsers, geographies, wireless connection quality, and computing power. And then, there's the natural variability of the Internet. Plots over time usually aren't that relevant for web performance: they oversimplify, are sometimes misleading, and are rarely actionable.
But, wait! There's more (data)! We can't take all these things and distill them into one number, or even one number plotted over time. Enter the histogram: The histogram expresses how many users experienced a particular page load time.
Reading the histogram: Taller bars mean that more users saw the load time in that interval.
Shorter bars mean that fewer users saw the load time in that interval.
Faster transaction times are on the left side of the histogram.
When the taller bars are on the left side, it means that more users saw a fast experience. If you are comparing two experiences, plot the histograms on the same chart!
In the example chart, red is definitely faster than blue: fast users got faster, medium users got faster, and slow users got faster.
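A histogram of load times, and a side-by-side view of two experiences, can be sketched with the standard library alone. The bucket width, the function names, and the blue/red sample data below are illustrative assumptions rather than the data shown on the slides.

```python
from collections import Counter


def histogram(load_times_s, bucket_width_s=1.0):
    """Count page loads per bucket; keys are bucket start times in seconds."""
    counts = Counter(int(t // bucket_width_s) for t in load_times_s)
    return {b * bucket_width_s: counts[b] for b in sorted(counts)}


def render_side_by_side(series, bucket_width_s=1.0):
    """Return text rows with one bar per series for every occupied bucket."""
    hists = {name: histogram(times, bucket_width_s) for name, times in series.items()}
    starts = sorted({start for h in hists.values() for start in h})
    rows = []
    for start in starts:
        bars = " | ".join(
            name + " " + ("#" * hists[name].get(start, 0)).ljust(5) for name in series
        )
        rows.append(f"{start:5.1f}s  {bars}")
    return rows


# Hypothetical load times in seconds, for illustration only.
blue = [2.4, 3.1, 3.8, 4.2, 4.9, 5.5, 6.7, 8.0]
red = [1.2, 1.9, 2.3, 2.8, 3.4, 3.9, 4.6, 6.1]

for row in render_side_by_side({"blue": blue, "red": red}):
    print(row)
```

Changing `bucket_width_s` changes how the bars look, so the bucket size belongs in the description of the methodology.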
Need More? Meet the Cumulative Distribution Function (CDF) We all love histograms: - Everything is represented - Easy to consume But, they still have shortcomings: - Finite granularity - Arbitrary bucket designations Maybe we need something a little more high-octane!
Need More? Meet the Cumulative Distribution Function (CDF) The Cumulative Distribution Function (CDF) expresses the percentage of page loads completed within a given amount of elapsed time.
Reading the CDF: So, for blue, approximately 20% of page loads were completed in 5 seconds or less.
Slightly less than 70% of transactions were done in 10 seconds or less.
As with histograms, a better (faster) CDF is one whose curve lies to the left of and above this one. Two data sets are easy to compare!
The red line is higher and more to the left. A greater percentage of users are done with their page load at any given time.
The gap between the lines is the differential. At the point marked on the chart, only 80% of blue users were done with their page load. After the same amount of time, more than 90% of red users were done.
The red curve is above and to the left of the blue curve in all cases. Red is faster for all users.
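The readings above can be reproduced with an empirical CDF. The following is a minimal sketch under stated assumptions: the blue and red samples are invented for illustration, and `fraction_done` and `faster_everywhere` are hypothetical helper names.

```python
from bisect import bisect_right


def fraction_done(load_times_s, t):
    """Fraction of page loads completed in t seconds or less."""
    ordered = sorted(load_times_s)
    return bisect_right(ordered, t) / len(ordered)


def faster_everywhere(a, b, checkpoints):
    """True if a's CDF is never below b's and is strictly above it somewhere."""
    gaps = [fraction_done(a, t) - fraction_done(b, t) for t in checkpoints]
    return all(g >= 0 for g in gaps) and any(g > 0 for g in gaps)


# Hypothetical load times in seconds, for illustration only.
blue = [4, 6, 8, 9, 12, 14, 15, 18, 20, 25]
red = [3, 4, 5, 6, 8, 9, 10, 12, 14, 18]

for t in (5, 10):
    print(f"{t}s: blue {fraction_done(blue, t):.0%}, red {fraction_done(red, t):.0%}")
print("red faster for all users:", faster_everywhere(red, blue, range(0, 26)))
```

Unlike a histogram, this view needs no bucket size: every sample contributes a step to the curve, and the gap between two curves at any time can be read off directly.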
Tie it all together The Internet is a jungle. Methodology matters more than results. Statistics can lie. Pick your tool wisely. Irrelevant metrics mislead. Performance is never a single number. Powerful visualizations trump aggregate figures. Spreadsheets are your friend.
Thanks! http://grant.ellis.io grant@instartlogic.com