CSE 135 Server Side Web Languages Lecture # 12. Web Performance Notes

Web Performance Notes

Core Ideas
- Given the trade-offs of server-side work we really need to think about time, but interestingly most gains come on the client side!
- To a user, time passed matters, not bytes sent.
- There is a difference between perceived time and actual time.
  - Page paint time matters.
  - The amount of screen refresh matters: frames, and an emphasis on reflows during the HTML/CSS parse.
  - How the screen refreshes matters: all at once vs. incrementally.
  - Application pacing matters: preloaders, travel search, etc.

Core Ideas
- To some Web site owners, bytes sent may matter quite a bit as well because of cost.
- Obviously the cost is bandwidth.
  - How much does 50K cost? Nothing.
  - How much does 50K * thousands of customers cost? Maybe something.
- Note the design focus of e-commerce sites: $ and bytes go into content, not navigation.
- Heavy pages don't just cost bandwidth; they may cost hardware in terms of scalability. Servers can't finish with a connection as quickly, so you will need more of them sooner.
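The "50K times thousands of customers" point can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name, the traffic volume, and the price per gigabyte are assumptions, not figures from the lecture.

```python
def monthly_transfer_cost(page_bytes, views_per_month, dollars_per_gb):
    """Return the monthly bandwidth bill, in dollars, for one page weight."""
    gigabytes = page_bytes * views_per_month / 1_000_000_000
    return gigabytes * dollars_per_gb


# Hypothetical inputs: 50 KB of page weight at an assumed 0.10 dollars per GB.
single_view = monthly_transfer_cost(50_000, 1, 0.10)
busy_site = monthly_transfer_cost(50_000, 10_000_000, 0.10)
print(f"one view: ${single_view:.6f}, ten million views: ${busy_site:.2f}")
```

One view is effectively free; the same 50K multiplied by a large audience becomes a line item, which is the slide's point.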

Speed is not just a simple case of bytes

Core Idea: The Golden Rule of Optimization
- Less data, less often, and close by!
- Send as little as you need to, as infrequently as you need to, if you want to go faster.
- Example: compression -> less to send.
- Example: caching -> less frequent requests.
- Close by: reduce latency by traveling less distance.

Scale != Speed
- Adding more servers doesn't make a site faster.
- Scaling does not mean faster unless things are overloaded.
- If a server is overloaded, offloading it may make it appear to go faster, but in that sense the server was not operating at optimal efficiency to begin with.
- Do you know the tolerance of a server, connection, etc.? That is, the number of concurrent connections it can handle, the amount of traffic before the pipe is saturated, and so on.

Latency: Your Real Enemy
- Upgrading from 5 to 10 Mbps gets only about a 5% load time improvement!
- Each 20 ms cut from the round-trip time gives a linear load time improvement.
- Bandwidth and latency: you had better know what each means!
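A toy model helps separate the two terms. The sketch below assumes load time is simply a number of sequential round trips plus raw transfer time; real browsers overlap requests and TCP ramps up gradually, so this will not reproduce the percentages quoted above. The function name and all the numbers are hypothetical.

```python
def load_time_seconds(page_bytes, round_trips, rtt_ms, bandwidth_mbps):
    """Toy model: latency cost of sequential round trips plus transfer time."""
    latency_part = round_trips * rtt_ms / 1000.0
    transfer_part = page_bytes * 8 / (bandwidth_mbps * 1_000_000)
    return latency_part + transfer_part


# Hypothetical page: 500 KB fetched over 30 sequential round trips.
baseline = load_time_seconds(500_000, 30, 100, 5)
more_bandwidth = load_time_seconds(500_000, 30, 100, 10)
less_latency = load_time_seconds(500_000, 30, 80, 5)
print(baseline, more_bandwidth, less_latency)
```

In this model, doubling bandwidth only shrinks the transfer term, while every millisecond removed from the round-trip time is multiplied by the number of round trips.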

Simple (Re)view to Think about Optimization
[Diagram: User Agent of some sort <-- HTTP Request / HTTP Response over the network --> Web Server hardware and software (Apache, IIS, Zeus, etc.) -> Server-side programming technology (CGI, Apache module, ISAPI, scripting tech such as PHP) -> Backend system (e.g. database)]

But...
- Steve Souders (and others) have stated that 80-90%* of your user response time is client-side.
- Start there: most gains, simple to do, easy to measure.
- http://stevesouders.com/
- Shouldn't this be in a client-side class? Yes, but many aspects have to be performed upon delivery.

Optimizations Step by Step

Web Overview: Steps 1 & 2 The Request

Web Overview: Steps 1 & 2 Issues
- The main challenge is DNS, which is both fragile and robust.
- Don't skimp on DNS servers.
- Consider DNS replication or managed services such as UltraDNS (www.ultradns.com).
- Consider using shortened and contingency names to help users.
  - Forget the www: minimal domains (e.g. www.pint.com ~ pint.com).
  - Contingency hosts: w, ww, wwww. All pretty much free, just DNS entries.
  - Contingency domains:
    - Expanded out: powellinternet.com
    - Products and brands: www.ipod.com
    - Typos: www.gooogle.com, www.amazom.com
    - Misspellings: www.zerox.com
    - Forgot the period: wwwpint.com (a new domain in minimal form)

Web Overview: Step 3 Transmit Request

Step 3 Notes
- To improve this step, reduce travel time.
  - Close the gap: closer -> fewer hops, less distance (geographical/network).
  - Geographical sensitivity implies edge servers a la CDNs (content distribution networks).
  - Example: getting beer at the ball park.
- Reduce the request size.
  - Make the payload smaller; analyze the payload.
  - No savings here, as the request is small; the response, though, will be large.
- Increase bandwidth? Not really helpful: 10K on a T3 vs. 10K on a T1!?

Step 4: Processing Response

Step 4 Issues
- Bottleneck: server capacity. Can the server take my request?
- No, too busy right now:
  - Not enough capacity or incoming bandwidth (hardware/software).
  - Flash traffic: too many requests.
  - Holding requests open too long (many slow downloaders).
  - Taking too long to fulfill requests: the processing time of requests is significant.
- Throw bandwidth and hardware at the problem.

Step 4 Solution 1

Step 4 Solution 2

Server Capacity Notes
- The simple solution of DNS round-robin is often used for sites with only a few servers.
- While easy to set up, it has cons:
  - Sub-optimal distribution of traffic across the servers.
  - Does not deal well with hung servers.
- To improve the situation you can create a cluster using software or hardware.
- Hardware often makes more sense (except for cost).
- There are many intelligent switch/load balancer vendors (Foundry, F5, Cisco, Radware, etc.).

Server Capacity Notes
- A load balancer will distribute load across a server farm based upon metrics such as least busy, most available, closest network-wise, fastest response, etc.
- Even if you cannot afford a load balancer, you can still segment server traffic.
- Consider a machine like images.pint.com to handle your image traffic, store.pint.com to handle your SSL traffic, and so on. Then, just by changing the links in your HTML, you will distribute some load around.
- This approach with images may also benefit user download speed, since browsers will open new connections to the other domain and parallelize your requests a bit better.
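The difference between round-robin and a "least busy" policy can be sketched in a few lines. This is only an illustration of the two selection rules, not how any vendor's device works; the class names, server names, and connection counts are all hypothetical.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, active_connections):
        return next(self._cycle)


class LeastBusy:
    """Picks the server with the fewest active connections right now."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, active_connections):
        return min(self._servers, key=lambda s: active_connections.get(s, 0))


servers = ["web1", "web2", "web3"]
# Hypothetical snapshot: web1 is nearly hung with connections piling up.
active = {"web1": 120, "web2": 4, "web3": 7}
print(RoundRobin(servers).pick(active), LeastBusy(servers).pick(active))
```

Round-robin keeps sending its share of traffic to the struggling server, which is the "does not deal well with hung servers" problem; a metric-based policy routes around it.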

Step 4 Solution 2 Redux

Server Capacity Contd.
- Can the server take my request now? Yes, but the response is still slow!
- Static content:
  - Disk problems.
  - Disk-to-network copy delays.
- Get a faster disk drive.
- Most Web servers are not CPU bound; they are disk and network bound.
- The best solution is memory caching, but that does cost money.

Server Capacity Contd.
- Can the server take my request now? Yes, but there are dynamic content problems.
- Generation times: it does take time to build a page on the fly, and if you are doing a lot of this your box may be CPU bound.
- A significant problem here is the so-called "static dynamic" page: http://www.xyz.com/pressrelease.asp?id=5 is, on most sites, the same page regardless of the user, yet it is rebuilt on every single visit, which costs time and CPU resources.
- Solution: pre-generate the content into static HTML.
- Solution: use a reverse proxy cache.

Step 4 Solution Contd.

Step 4 Solution: Cache Issues
- Note the similarity between the caching solution and the interpretation vs. compilation problem in coding.
- A cache functions by responding to requests for Web objects. If the requested object is unknown, it is fetched from the origin server; if it has been fetched recently, the request is served from the cache.
- What you can't put into a proxy cache easily: extremely dynamic content, particularly personalized content.
- You can do it by recoding pages to let the proxy know what is cacheable and what is not, but the work involved may be significant.
- This problem also plagues CDNs (see later slide).
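The fetch-or-serve behavior described above can be sketched as a small in-memory cache with a time-to-live. This is a simplified illustration, not a real reverse proxy: the class name, the `fetch_from_origin` callable, and the 60-second lifetime are assumptions, and real caches also honor HTTP cache headers.

```python
import time


class ProxyCache:
    """Serve recently fetched objects from memory; otherwise ask the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._fetch(url)  # unknown or expired: go back to the origin
        self._store[url] = (now + self._ttl, body)
        return body


origin_hits = []


def slow_origin(url):
    # Stand-in for an expensive dynamic page build on the origin server.
    origin_hits.append(url)
    return "<html>press release</html>"


cache = ProxyCache(slow_origin)
cache.get("/pressrelease.asp?id=5")
cache.get("/pressrelease.asp?id=5")
print(len(origin_hits))
```

The second request for the "static dynamic" page never reaches the origin, which is where the CPU savings come from. Personalized pages would need a richer cache key or must bypass the cache entirely.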

Step 4 Solution: Connection Offload
- Many sites get network bound since they cannot let go of a connection until the last ACK packet is back from the user.
- Given the mix of fast and slow users, you may find that a box gets saturated sooner than it should.
- Solution: network stack tuning.
- Solution: TCP termination at the load balancer. Terminating and muxing the connections gives the servers orderly workloads to handle.
- Add in the overhead of crypto (SSL) and you really want to offload to a server with a special card, or have SSL decryption in your terminating device.

On to the real problem
- Once the request has been processed, either by the server or by a cache, and is ready to be sent, you hit the real trouble spot in the process: result delivery.

Step 6: Returning the Data

Payload Issues
- The return is composed of headers and data.
- The bulk of the payload is of course the data of a Web page, which is composed of two types of components:
  - Text: HTML/XHTML/XML, CSS, JavaScript
  - Binary: Images (GIF, JPEG, PNG), Animations (animated GIFs, Flash, Shockwave), Video, Audio, PDF, Downloads (.exe, .zip, ...)

Addressing Payload Issues
- Once again, our network-aware Web development mantra: send less, less often.
- Reduce content sent. Do you really need that image, Flash splash page, etc.?
- Cool design may = more money in delivery.
  - The fallacy of flat bandwidth rates.
  - Incremental bandwidth costs.
- Real-world examples: goodbye graphic rollovers, CSS-oriented design, advertisements as a higher % of byte payload.

Addressing Payload Issues Contd.
- Compress images properly.
- Beware of decompression time with images of large physical dimensions; paint delay may far outweigh delivery savings.
- Don't be packet stupid: the envelope holds a basic minimum amount, about 1K, and making a file smaller than that doesn't help!
- Designers beware: some acceleration devices will recompress your images, so, sorry to say, what you produce may not reach the end user the way you intended.

Addressing Payload Issues Contd.
- Crunch HTML.
  - Who needs white space, comments, etc.?
  - Some types of <meta> tags are wasted.
  - Color remap: #ff0000 -> red. In some cases the other way: name -> hex.
  - Entity remap; in some cases the other way as well.
- Most changes would not hurt search engines, users, etc.
- Most of these byte shaves have to be done automatically to be of value, but they add up.
- I am anti-view source!? (for many reasons)

Addressing Payload Issues Contd.
- Crunch CSS.
  - Same whitespace and comment issues as HTML.
  - You can also use shorter id and class names: .p1 instead of .paragraph1, or .tc instead of .tablecell. This should be done automatically to preserve the readability of the source.
  - Color condensing: #FF0000 can become #f00.
  - Rule condensing: shorthand rules, e.g. background, not background-image.
  - Rule rewriting to take advantage of repetition.
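A few of these CSS shaves (comments, whitespace, hex color condensing) can be sketched with regular expressions. This is a deliberately naive illustration, not a production minifier: the function names are hypothetical, and the patterns do not protect string literals, url() values, or id selectors that happen to look like hex colors.

```python
import re

_HEX6 = re.compile(r"#([0-9a-fA-F]{6})\b")


def condense_hex_colors(css):
    """Rewrite #RRGGBB as #rgb when each pair repeats, e.g. #FF0000 -> #f00."""

    def shorten(match):
        h = match.group(1).lower()
        if h[0] == h[1] and h[2] == h[3] and h[4] == h[5]:
            return "#" + h[0] + h[2] + h[4]
        return match.group(0)  # not condensable: leave untouched

    return _HEX6.sub(shorten, css)


def crunch_css(css):
    """Strip comments and redundant whitespace, then condense colors."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse whitespace runs
    css = re.sub(r"\s*([{};])\s*", r"\1", css)       # no spaces around { } ;
    css = re.sub(r":\s+", ":", css)                  # no space after a colon
    return condense_hex_colors(css).strip()


source = "/* header */\n.p1 {\n  color: #FF0000;\n  background: #A1B2C3;\n}\n"
print(crunch_css(source))
```

Running a step like this at build time keeps the readable stylesheet for development while shipping the smaller one, in line with "code for development, prepare for delivery".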

Addressing Payload Issues Contd.
- Crunch JavaScript.
  - Same whitespace and comment issues as HTML & CSS.
  - Variable and function renaming can produce significant savings: function validation() becomes function v() or similar.
  - Some basic dead code elimination.
  - Semicolon removal in some places.
  - Object remapping: var d = document; d.write(); d.write(); etc.
- Script roll-up would be very useful and would also reduce requests: <script src="one.js"> and <script src="two.js"> become <script src="three.js">.
- On most Web sites, keeping separate JS files is a developer value, not a delivery value.
- Code for development, prepare for delivery.

Addressing Payload Issues Contd.
- URL and filename optimizations.
  - Index file removal: <a href="products/index.html"> becomes <a href="products/">. The catch is that you need a Web server around during development.
  - Dependent file renaming: instead of <img src="bnrolloveron.gif">, remap to <img src="b.gif">.
  - Path reduction: instead of paths like ../../images/logo.gif, remap to /i/logo.gif or, better yet, /i/l.gif.
- This saves a large amount of space file-wise, since the names are often repeated all over the place.
- The user never types these in, so no harm there, and there is some obfuscation benefit as well.

Addressing Payload Issues Contd.
- Source optimization of (X)HTML can be more beneficial than it would seem, because the HTML tends to be the root document from which future requests are made. Slow it down and you add a small delay to everything else.
- Do not confuse source optimization with obfuscation; they have different goals.
- Unless you are a massive-traffic site, you should do these types of optimizations automatically using a tool like w3compiler.com; otherwise it just isn't worth the effort most of the time.
- With Web programming: develop for maintenance, but prepare for delivery.
- Other aspects of a site, like PDF and Flash, can be compressed with tools, but the focus here is on what is most commonly used.

HTTP Compression
- Transparent and harmless, given that browsers send an Accept-Encoding header to negotiate it.
- Some of the biggest sites use this: Google, Amazon, etc.
- Only works on text formats: HTML, CSS, JS, PDF, and some Office formats.
- Savings as high as 70%.
- Implementations: Apache: mod_gzip. IIS: httpzip, or use an appliance like a Redline.
- It will use CPU cycles, but your server isn't doing much anyway.
- It does increase the Time to First Byte (TTFB) but significantly decreases the Time to Last Byte (TTLB).
- Latency issue with compression (LAN vs. dial-up value).
- Saves bandwidth no matter what.
- Be careful with the hit of recompressing dynamic content.
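The negotiation and the size savings can be sketched with Python's standard gzip module. This is a minimal illustration rather than a server implementation: the function names are hypothetical, quality values such as `gzip;q=0` are ignored, and the sample markup is made up, so the savings it reports say nothing about a real site.

```python
import gzip


def negotiate(body: bytes, accept_encoding: str):
    """Return (response_headers, payload), compressing only if the client allows it."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(body)
    return {}, body  # client did not offer gzip: send the body untouched


def savings(original: bytes, payload: bytes) -> float:
    """Fraction of bytes saved, e.g. 0.7 means the payload is 70% smaller."""
    return 1.0 - len(payload) / len(original)


# Hypothetical, highly repetitive markup; real pages compress less dramatically.
html = b"<tr><td class='cell'>value</td></tr>\n" * 200
headers, payload = negotiate(html, "gzip, deflate")
print(headers, f"{savings(html, payload):.0%} smaller")
```

Compressing once and storing the result avoids paying the CPU cost on every request, which matters most for the dynamic-content case mentioned in the last bullet.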

Addressing Payload Issues Contd. - Caching
- "Why do you keep sending me that logo?!" Signed, your browser.
- 304 Not Modified and network chatter issues.
- Control the cache and download only what you need, when you need it.
- JS and CSS embedded in the page is not so good; linked files with good cache control headers are much better.
- Unfortunately, little control is possible unless we design for it in the first place.
- Design a cache control policy: when do things expire?
- Consider organizing your site to help with this, e.g. /images/cached.
- Note: caching and compression address different things and can sometimes be in conflict.

Addressing Payload Issues Contd. - Caching
- The danger of a stale cache: the browser or some intermediary is holding my image until next Tuesday and I really need to update it!
- Solution: rename the object. <img src="logo.gif"> becomes <img src="logo1.gif">.
- Lots of work by hand, but easy in a CMS.
- Watch out with caching your base documents, then!
- Fineground has an interesting automatic caching policy generation technique.
- Some browsers can cut corners, though, and this can cause you trouble.
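The renaming trick can be automated. The slide renames by hand (logo.gif to logo1.gif); the sketch below swaps in a different but related technique, deriving the new name from a hash of the file's content so it changes exactly when the file does. The function name and the naming scheme are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha1(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Hypothetical image bytes standing in for two versions of the same logo.
old_url = fingerprinted_name("images/logo.gif", b"old logo bytes")
new_url = fingerprinted_name("images/logo.gif", b"new logo bytes")
print(old_url, new_url)
```

With names like these the object itself can be cached for a very long time, while the base document that references it must stay fresh, which is the caution about caching base documents above.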

Addressing Payload Issues Contd. - Exotic Stuff
- Delta encoding.
  - Notice that most pages have similar structures and sometimes even similar content.
  - Why do we keep sending the same HTML tags, tables, etc.?
  - You don't have to: you could send a base page and then send only the differences from page to page.
  - Read about the idea here: http://webreference.com/internet/software/servers/http/deltaencoding/intro/
- Some AFE (Application Front End) appliances implement this using a proxy and JavaScript. It produces amazing results, though it is obviously more dangerous than other solutions.
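The base-page-plus-differences idea can be sketched with Python's difflib. This only illustrates the concept; it is not the HTTP delta encoding wire format, nor what any appliance actually does, and the function names, operation format, and sample pages are assumptions.

```python
import difflib


def make_delta(base: str, page: str):
    """Describe `page` as copies from `base` plus the literal new text."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse base[i1:i2]
        elif j2 > j1:                             # "replace" or "insert"
            ops.append(("insert", page[j1:j2]))   # only this text is sent
        # "delete" needs no operation: that base range is simply not copied
    return ops


def apply_delta(base: str, ops) -> str:
    """Rebuild the page on the receiving side from the base and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


base = "<html><body><table><tr><td>Products</td></tr></table></body></html>"
page = "<html><body><table><tr><td>Contact us</td></tr></table></body></html>"
delta = make_delta(base, page)
sent = sum(len(op[1]) for op in delta if op[0] == "insert")
print(apply_delta(base, delta) == page, sent, len(page))
```

Only the inserted text travels over the wire; the shared tags are reconstructed from the base page the client already holds. The danger the slide mentions is real: if the client's base copy differs from what the server assumes, the rebuilt page is wrong.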

Still having troubles
- You can do all this and still have a slow site, at least for some users.
- Point-source Web serving will always have latency problems.
- You could set up multiple Web farms around the world and then perform global load balancing between them.
- Redirection choices can be based upon server availability, network distance, geography, or some mixture of these metrics.
- The downside to multiple farms is, of course, increased data center and hardware costs.

Global Web Farm Idea

Solving the Latency Issue: CDNs
- Because hardware and co-location costs go way up, some people use CDN services.
- CDNs replicate and move content to the edge of the network, improving reliability, scalability, and performance.
- To redirect requests to an edge cache, either DNS must be modified or special URLs must be used [e.g. ARL].
- Obviously the second takes more effort, but it may allow for more flexibility in caching decisions.

CDN Solution Move content to network edge caches

Edge Server Example Old Akamai Approach

Implications of CDN Besides Cost
- Even with a CDN you still have last-mile issues, which can be significant.
- Another problem is dynamic content assembly at the edge.
  - Indicate what's cacheable and what is not.
  - Edge Side Includes (www.esi.org).
- This suggests that edge caches may become more intelligent edge servers in the future, thus moving us to a distributed computing style.
- Who knew it was going to get this complicated?!

Request Reduction
- In modern broadband situations the number of requests can significantly affect the performance of a page.
- Bundling dependent objects can tremendously improve the performance of a site.
- Sometimes the separation is frankly more out of convention than because it is appropriate.
- Example, JavaScript:
  <script src="file1.js"></script> <script src="file2.js"></script>
  becomes
  <script src="filebundle.js"></script>
- Given JavaScript's shared namespace, there is no reason why not.

Request Reduction Contd.
- For CSS files you see a similar situation as with JS.
- For images you could adopt an idea called CSS sprites, where you make one large image tile of all the independent pieces and then show portions of it:
  <img src="pixel.gif" style="background:url('image_1878169298.png') -3095px 0px no-repeat; height: 45px; width: 33px;" />
- Next image:
  <img src="pixel.gif" style="background:url('image_1878169298.png') -2896px 0px no-repeat; height: 32px; width: 199px;" />
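The background offsets in markup like this are normally generated rather than typed by hand. The sketch below assumes the pieces are laid out left to right in a single row and computes each piece's negative x offset; the function name, sheet file name, and piece names are hypothetical, and the two sizes are simply borrowed from the example above.

```python
def sprite_styles(sheet_url, images):
    """Map each image name to an inline style showing its slice of the sheet.

    `images` is a list of (name, width, height) tuples, assumed to be packed
    left to right in one row of the sprite sheet.
    """
    styles = {}
    x = 0
    for name, width, height in images:
        offset = f"-{x}px" if x else "0px"  # shift the sheet left by x pixels
        styles[name] = (
            f"background:url('{sheet_url}') {offset} 0px no-repeat; "
            f"height: {height}px; width: {width}px;"
        )
        x += width
    return styles


# Hypothetical pieces of a navigation bar packed into one sheet.
for name, style in sprite_styles("sprites.png",
                                 [("home", 33, 45), ("search", 199, 32)]).items():
    print(f'<img src="pixel.gif" style="{style}" /> <!-- {name} -->')
```

Every piece now comes from one image request instead of one request per piece, at the cost of a build step that keeps the sheet and the offsets in sync.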

Request Reduction in Action

All About End Users?
- Bytes vs. time.
- Read, Decide, Click, Wait, Repeat.
- Download ahead of time:
  - Flash preloaders, cache tricks, JS preload.
  - Mozilla prefetch: http://www.mozilla.org/projects/netlib/Link_Prefetching_FAQ.html
  - Precache example: http://ajaxref.com/ch8/longscroll.html
- Don't forget browser bulk: IE vs. Opera vs. Mozilla vs. Safari. They are different pieces of software with different qualities of execution.

How do you know you are doing well?
- Measure server time, network time, and paint time.
- Server time is easy, network time is harder, and paint time requires injecting JavaScript to start and stop a timer: http://ajaxref.com/ch6/connectionspeed.html
- It is interesting to note that such features are now coming directly to browsers (http://blogs.msdn.com/b/ie/archive/2010/06/28/measuringweb-page-performance.aspx) and the W3C is creating a performance working group (http://www.w3.org/2010/06/webperf.html).

SPDY, Sockets and Beyond
- Can we fix HTTP?
- How hard would it be to do HTTP 2.0? Pretty hard; we can't even do simple HTTP as it stands. Evidence: proxies, GET/POST, header issues, compression, etc.
- SPDY offers some solution using an SSL tunnel.
- If you can't fix it, then offer something else. A parallel protocol? WebSocket?
- Does this get at the underlying file protocol vs. app protocol difference? I think so.