HTTP Caching & Cache-Busting for Content Publishers




HTTP Caching & Cache-Busting for Content Publishers
Michael J. Radwin
http://public.yahoo.com/~radwin/
OSCON 2005, Thursday, August 4th, 2005

Agenda
- HTTP in 3 minutes
- Caching concepts: hit, miss, revalidation
- 5 techniques for caching and cache-busting
- Not covered in this talk:
    - Proxy deployment
    - HTTP acceleration (a.k.a. reverse proxies)
    - Database query results caching

Motivation
- Publishers have a lot of web content: HTML, images, Flash, movies
- Speed is an important part of the user experience
- Bandwidth is expensive: use what you need, but avoid unnecessary extra
- Personalization differentiates
- Show timely data (stock quotes, news stories)
- Get accurate advertising statistics
- Protect sensitive info (e-mail, account balances)

Note: Proxy deployment is not covered. It is an interesting subject and deserves an entire lecture by itself:
- Configuring proxy cache servers (e.g. Squid)
- Configuring browsers to use proxy caches
- Transparent/interception proxy caching
- Intercache protocols (ICP, HTCP)

HTTP and Proxy Review

HTTP: Simple and elegant
1. Client connects to www.example.com port 80
2. Client sends GET request
(Diagram: client and server connected across the Internet.)

HTTP: Simple and elegant
3. Server sends response
4. Client closes connection
(Diagram: client and server connected across the Internet.)

HTTP example

    mradwin@machshav:~$ telnet www.example.com 80
    Trying 192.168.37.203...
    Connected to w6.example.com.
    Escape character is '^]'.
    GET /foo/index.html HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:36:12 GMT
    Last-Modified: Thu, 12 May 2005 21:08:50 GMT
    Content-Length: 3688
    Connection: close
    Content-Type: text/html

    <html><head>
    <title>Hello World</title>...

Browsers use private caches

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Last-Modified: Thu, 12 May 2005 21:08:50 GMT
    Content-Length: 3688
    Content-Type: text/html

(Diagram: the response is stored in the browser cache.)

Note: The client stores a copy of http://www.example.com/foo/index.html on its hard disk, along with a timestamp.

Revalidation (Conditional GET)

    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Thu, 12 May 2005 21:08:50 GMT

    HTTP/1.1 304 Not Modified

Revalidate using the Last-Modified time.

Note: The presence of the If-Modified-Since header is what makes this a Conditional GET, sometimes called an IMS GET. If the content had actually changed, the server would simply reply with a 200 OK and send the full content.
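The client side of a Conditional GET can be sketched in a few lines of Python. This is only an illustration of the exchange above, not any browser's actual code: the function names build_conditional_get and body_after_response are made up for this example.

```python
def build_conditional_get(path, host, last_modified=None):
    """Return the request text for a GET; conditional if a timestamp is cached."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if last_modified is not None:
        # Echo back the Last-Modified value saved with the cached copy.
        lines.append(f"If-Modified-Since: {last_modified}")
    return "\r\n".join(lines) + "\r\n\r\n"


def body_after_response(status, response_body, cached_body):
    """Pick the body to display: the cached copy on 304, the new one otherwise."""
    if status == 304:
        return cached_body      # server confirmed the cached copy is still good
    return response_body        # 200 OK carries the full, updated content


request = build_conditional_get(
    "/foo/index.html", "www.example.com", "Thu, 12 May 2005 21:08:50 GMT"
)
print(request)
```

Without a cached timestamp the same function produces an ordinary, unconditional GET.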

Non-Caching Proxy

Client to proxy:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Proxy to server:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Server to proxy:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Caching Proxy: Miss

Client to proxy:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Proxy to server:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Server to proxy:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Proxy cache: saves copy.

Caching Proxy: Hit

Client to proxy:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Proxy to client (the server is not contacted):

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Proxy cache: fresh copy!

Caching Proxy: Revalidation

Client to proxy:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com

Proxy to server:

    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Thu,...

Server to proxy:

    HTTP/1.1 304 Not Modified

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Length: 3688
    Content-Type: text/html

Proxy cache: stale copy.
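The three proxy outcomes (miss, hit, revalidation) come down to one decision per request. The sketch below is a simplified model, assuming each cached entry carries a known freshness lifetime in seconds; the Entry class and classify function are hypothetical names, and real proxies compute freshness from several headers.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    last_modified: str   # value to send back as If-Modified-Since
    stored_at: float     # time the copy was stored or last revalidated
    max_age: float       # assumed freshness lifetime, in seconds


def classify(cache, url, now):
    """Return 'miss', 'hit' or 'revalidate' for a request at time `now`."""
    entry = cache.get(url)
    if entry is None:
        return "miss"            # no copy: fetch from origin and save it
    if now - entry.stored_at < entry.max_age:
        return "hit"             # fresh copy: answer without contacting origin
    return "revalidate"          # stale copy: send a conditional GET to origin


cache = {
    "http://www.example.com/foo/index.html": Entry(
        b"<html>...", "Thu, 12 May 2005 21:08:50 GMT", stored_at=100.0, max_age=60.0
    )
}
print(classify(cache, "http://www.example.com/foo/index.html", now=120.0))
```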

Top 5 Caching Techniques

Assumptions about content types
(Figure: content types (HTML, CSS, JavaScript, images, Flash, PDF), grouped as dynamic or static content, placed by rate of change once published (frequently, occasionally, rarely/never) and by audience (personalized vs. same for everyone).)

Top 5 techniques for publishers
1. Use Cache-Control: private for personalized content
2. Implement Images Never Expire policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for occasionally-changing static content
5. Use random tags in URL for accurate hit metering or very sensitive content

1. Cache-Control: private for personalized content

Bad Caching: Jane's 1st visit
The URL isn't all that matters.

Client to proxy:

    GET /inbox?msg=3 HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane

Proxy to server:

    GET /inbox?msg=3 HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane

Server to proxy:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Type: text/html

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Type: text/html

Proxy cache: saves copy.

Bad Caching: Jane's 2nd visit
Jane sees the same message upon return.

Client to proxy:

    GET /inbox?msg=3 HTTP/1.1
    Host: webmail.example.com
    Cookie: user=jane

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Type: text/html

Proxy cache: fresh copy of Jane's.

Bad Caching: Mary's visit
Witness a false positive cache hit.

Client to proxy:

    GET /inbox?msg=3 HTTP/1.1
    Host: webmail.example.com
    Cookie: user=mary

Proxy to client:

    HTTP/1.1 200 OK
    Last-Modified: Thu,...
    Content-Type: text/html

Proxy cache: fresh copy of Jane's.

What's cacheable?
- HTTP/1.1 allows caching anything by default
    - Unless overridden with Cache-Control header
- In practice, most caches avoid anything with
    - Cache-Control/Pragma header
    - Cookie/Set-Cookie header
    - WWW-Authenticate/Authorization header
    - POST/PUT method
    - 302/307 status code (redirects)
    - SSL content

Note, quoting the HTTP/1.1 specification (RFC 2616):

    13.4 Response Cacheability

    Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation. If there is neither a cache validator nor an explicit expiration time associated with a response, we do not expect it to be cached, but certain caches MAY violate this expectation (for example, when little or no network connectivity is available). A client can usually detect that such a response was taken from a cache by comparing the Date header to the current time.

Cache-Control: private
- Shared caches are bad for personalized content
    - Mary shouldn't be able to read Jane's mail
- Private caches are perfectly OK
    - Speed up web browsing experience
- Avoid personalization leakage with a single line in httpd.conf or .htaccess:

    Header set Cache-Control private

Note: HTTP/1.0 proxies aren't expected to understand the Cache-Control header. If you're really concerned about user information leakage and there's a possibility that your users are behind HTTP/1.0 proxies, use technique #5 (random strings in the URL).
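To make the Jane and Mary scenario concrete, here is a toy shared cache keyed by URL only, written as a hedged Python sketch. SharedCache and inbox_origin are invented for this illustration, and the cache honors nothing except the private directive; it is not a model of any real proxy.

```python
class SharedCache:
    """Toy shared proxy cache keyed by URL only (cookies are ignored)."""

    def __init__(self):
        self._store = {}

    def fetch(self, url, cookie, origin):
        if url in self._store:
            return self._store[url]          # cache hit, whoever is asking
        headers, body = origin(url, cookie)
        directives = {
            d.strip().lower() for d in headers.get("Cache-Control", "").split(",")
        }
        if "private" not in directives:      # shared caches must not store private responses
            self._store[url] = body
        return body


def inbox_origin(private):
    """Hypothetical webmail origin; the body depends on the Cookie header."""
    def origin(url, cookie):
        headers = {"Cache-Control": "private"} if private else {}
        return headers, f"inbox for {cookie}"
    return origin


leaky = SharedCache()
print(leaky.fetch("/inbox?msg=3", "user=jane", inbox_origin(False)))
print(leaky.fetch("/inbox?msg=3", "user=mary", inbox_origin(False)))  # Mary gets Jane's page
```

With inbox_origin(True) the same two requests each reach the origin, so Mary sees her own inbox.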

2. Images Never Expire policy

Images Never Expire Policy
- Dictate that images (icons, logos) once published never change
    - Set Expires header 10 years in the future
- Use new names for new versions
    - http://us.yimg.com/i/new.gif
    - http://us.yimg.com/i/new2.gif
- Tradeoffs
    - More difficult for designers
    - Faster user experience, bandwidth savings

Note: Pushing images to a separate server typically means that designers can't use 1-click publishing solutions such as Microsoft FrontPage.

Imgs Never Expire: mod_expires

    # Works with both HTTP/1.0 and HTTP/1.1
    # (10*365*24*60*60) = 315360000 seconds
    ExpiresActive On
    ExpiresByType image/gif A315360000
    ExpiresByType image/jpeg A315360000
    ExpiresByType image/png A315360000

Note: 24 * 60 * 60 * 365 * 10 = 315360000 seconds in ten years. You may wish to add other MIME types such as application/x-shockwave-flash.
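The ten-year arithmetic and the resulting headers can be checked with a short Python sketch. The function name far_future_headers is made up for this example; it only mirrors what the mod_expires directives above ask Apache to emit, using ten 365-day years as the slide does.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Ten 365-day years, as on the slide (leap days are ignored).
TEN_YEARS = 10 * 365 * 24 * 60 * 60


def far_future_headers(now):
    """Return Cache-Control and Expires headers ten years past `now` (UTC)."""
    expires = now + timedelta(seconds=TEN_YEARS)
    return {
        "Cache-Control": f"max-age={TEN_YEARS}",
        # usegmt=True renders the HTTP-style "GMT" suffix; needs a UTC datetime.
        "Expires": format_datetime(expires, usegmt=True),
    }


print(TEN_YEARS)
print(far_future_headers(datetime.now(timezone.utc)))
```

Because leap days are ignored, the Expires date lands a couple of days short of the same calendar date ten years later, which makes no practical difference for a never-expire policy.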

Imgs Never Expire: mod_headers

    # Works with HTTP/1.1 only
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Cache-Control \
        "max-age=315360000"
    </FilesMatch>

    # Works with both HTTP/1.0 and HTTP/1.1
    <FilesMatch "\.(gif|jpe?g|png)$">
    Header set Expires \
        "Mon, 28 Jul 2014 23:30:00 GMT"
    </FilesMatch>

Note: You may wish to add other file extensions such as swf. Cache-Control is preferred for HTTP/1.1; Expires is for compatibility with HTTP/1.0 clients and proxies. When both headers are present, HTTP/1.1 clients typically prefer the Cache-Control header.

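mod_images_never_expire

    /* Enforce policy with module that runs at URI translation hook.
     * ap_table_get is the Apache 1.3 table API; the includes below are
     * the headers this fragment is assumed to need. */
    #include <string.h>    /* strrchr */
    #include <strings.h>   /* strcasecmp */
    #include "httpd.h"
    #include "http_config.h"

    static int translate_imgexpire(request_rec *r) {
        const char *ext;
        /* Look at the extension after the last '.' in the URI */
        if ((ext = strrchr(r->uri, '.')) != NULL) {
            if (strcasecmp(ext, ".gif") == 0 ||
                strcasecmp(ext, ".jpg") == 0 ||
                strcasecmp(ext, ".png") == 0 ||
                strcasecmp(ext, ".jpeg") == 0) {
                /* Any conditional request for an image is answered with 304 */
                if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                    ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                    /* Don't bother checking filesystem, just hand back a 304 */
                    return HTTP_NOT_MODIFIED;
                }
            }
        }
        return DECLINED;
    }

Note: Also http://use.perl.org/~geoff/journal/22049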

3. Cookie-free static content

Use a cookie-free Top Level Domain for static content
- For maximum efficiency use 2 domains
    - www.example.com for dynamic HTML
    - static.example.net for images
- Many proxies won't cache Cookie requests
    - But: multimedia is never personalized
    - Cookies irrelevant for images

Note: static.example.com won't cut it, because many cookies will be issued with domain=.example.com. Unless you're 100% sure you'll only issue cookies with domain=www.example.com, you'll need to use a completely different TLD. Yahoo!, for example, uses yahoo.com for dynamic HTML content and yimg.com for images and other static content.
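Why static.example.com still receives cookies can be shown with a simplified version of cookie domain matching, sketched here in Python. The function domain_matches is a hypothetical helper that ignores details such as IP-address hosts and public-suffix rules.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified check: would a cookie with this domain be sent to this host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")   # ".example.com" -> "example.com"
    # Sent to the domain itself and to any subdomain of it.
    return host == domain or host.endswith("." + domain)


for host in ("www.example.com", "static.example.com", "static.example.net"):
    print(host, domain_matches(host, ".example.com"))
```

Only the host on the separate domain, static.example.net, is outside the reach of a domain=.example.com cookie.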

Typical GET request w/cookies

    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v;
        brandflash=1;
        B=amfco1503sgp8&b=2;
        F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s.
        txa44psmlliftvojb_m5wecwy_.7&b=k1it;
        LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo
        &l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0;
        GTSessionID835990899023=83599089902340645635;
        Y=v=1&n=6eecgejj7012f
        &l=h03m8d50c8bo/o&p=m012o33013000007&jb=16
        47
        &r=a8&lg=us&intl=us&np=1;
        PROMO=SOURCE=fp5;
        YGCV=d=;
        T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-
        &a=yae&sk=daawrz5hldun2t&d=c2wbt0rbekfurxdprfv3twpfek5ets0byqfzquubb2sbwlcwlqf0axabw
        UhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
        FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-;
        LYS=l_fh=0&l_vo=myla;
        PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB;
        YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA%0190035-4231%014480%0134.051590%01-118.384342%019%01a%0190035
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

Note: Since a Cookie header is sent, some proxies will refuse to cache the response.

Same request, no Cookies

    GET /i/foo/bar/quux.gif HTTP/1.1
    Host: static.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.7) Gecko/20040707 Firefox/0.8
    Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Accept-Language: en-us,en;q=0.7,he;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

- Bonus: much smaller GET request
    - Dial-up MTU size is 576 bytes, PPPoE is 1492
    - Request shrinks from 1450 bytes to 550

4. Apache defaults for static, occasionally-changing content

Revalidation works well
- Apache handles revalidation for static content
    - Browser sends If-Modified-Since request
    - Server replies with short 304 Not Modified
    - No special configuration needed
- Use if you can't predict when content will change
    - Page designers can change content immediately
    - No renaming necessary
- Cost: extra HTTP transaction for 304
    - Smaller with Keep-Alive, but large sites disable it

Note: Each HTTP request has some latency. When you disable Keep-Alive (as any large site typically must do to scale), each HTTP request requires a full 3-way TCP handshake. The handshake latency can be perceptible, especially on a slow connection such as a 56k modem.

Successful revalidation

    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Thu, 12 May 2005 21:08:50 GMT

    HTTP/1.1 304 Not Modified

(Diagram: the browser cache keeps its copy.)

Note: Apache simply stat()s the file and compares the timestamp to the If-Modified-Since timestamp. If the file's timestamp is less than or equal to the If-Modified-Since header, it returns 304 Not Modified.
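The timestamp comparison described in the note can be sketched in Python. This is an illustration of the rule only, not Apache's implementation; the function revalidate is a hypothetical name, and it takes the file's modification time as an argument instead of calling stat().

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def revalidate(file_mtime, if_modified_since):
    """Return 304 if the file is no newer than the client's copy, else 200."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                       # unparseable header: send full content
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    mtime = file_mtime.replace(microsecond=0)
    return 304 if mtime <= since else 200


header = "Thu, 12 May 2005 21:08:50 GMT"
unchanged = datetime(2005, 5, 12, 21, 8, 50, tzinfo=timezone.utc)
updated = datetime(2005, 7, 13, 12, 57, 22, tzinfo=timezone.utc)
print(revalidate(unchanged, header), revalidate(updated, header))
```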

Updated content

    GET /foo/index.html HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Thu, 12 May 2005 21:08:50 GMT

    HTTP/1.1 200 OK
    Last-Modified: Wed, 13 Jul 2005 12:57:22 GMT
    Content-Length: 4525
    Content-Type: text/html

(Diagram: the browser cache stores the new copy.)

Note: The content has been modified. The client tries to revalidate again, but revalidation fails because the content at that URI has been updated. Apache returns 200 OK with the full content.

5. URL Tags for sensitive content, hit metering

URL Tag technique
- Idea
    - Convert public shared proxy caches into private caches
    - Without breaking real private caches
- Implementation: pretty simple
    - Assign a per-user URL tag
    - No two users use the same tag
    - Users never see each other's content
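One possible way to assign a per-user tag is sketched below in Python. The slides do not say how tags are generated; deriving the tag from an HMAC of a user ID, the parameter name tag, and the function tag_url are all assumptions made for this illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, user_id, secret):
    """Append a stable per-user tag so shared caches key each user separately."""
    # Assumption: the tag is an HMAC of the user ID, truncated for brevity.
    tag = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [("tag", tag)]
    return urlunsplit(parts._replace(query=urlencode(query)))


secret = b"example-only-secret"   # hypothetical server-side key
print(tag_url("http://webmail.example.com/inbox?msg=3", "jane", secret))
print(tag_url("http://webmail.example.com/inbox?msg=3", "mary", secret))
```

Because the tag is stable for a given user, that user's own private cache still gets hits, while a shared cache sees Jane's and Mary's URLs as different resources.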

URL Tag example
- Goal: accurate advertising statistics
- Do you trust proxies?
    - Send Cache-Control: must-revalidate
    - Count 304 Not Modified log entries as hits
- If you don't trust 'em
    - Ask client to fetch tagged image URL
    - Return 302 to highly cacheable image file
    - Count 302s as hits
    - Don't bother to look at cacheable server log

Hit-metering for ads (1)

    <script type="text/javascript">
    var r = Math.random();
    var t = new Date();
    document.write("<img width='109' height='52' src='http://ads.example.com/ad/foo/bar.gif?t=" + t.getTime() + ";r=" + r + "'>");
    </script>
    <noscript>
    <img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
    </noscript>

Note: No, this is not RFC 2227, which uses headers like Connection: meter and Meter: count=1/0.

Hit-metering for ads (2)

    GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1
    Host: ads.example.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456
    Cookie: uid=c50df33e-e202-4206-b1f3-946aedf9308b

    HTTP/1.1 302 Moved Temporarily
    Date: Wed, 28 Jul 2004 23:45:06 GMT
    Location: http://static.example.net/i/foo/bar.gif
    Content-Type: text/html

    <a href="http://static.example.net/i/foo/bar.gif">Moved</a>

Hit-metering for ads (3)

    GET /i/foo/bar.gif HTTP/1.1
    Host: static.example.net
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.7) Gecko/20040707 Firefox/0.8
    Referer: http://www.example.com/foo/bar.php?abc=123&def=456

    HTTP/1.1 200 OK
    Date: Wed, 28 Jul 2004 23:45:07 GMT
    Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
    ETag: "69079e-ad91-40212cc8"
    Cache-Control: public,max-age=315360000
    Expires: Mon, 28 Jul 2014 23:45:07 GMT
    Content-Length: 6096
    Content-Type: image/gif

    GIF89a...
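The server side of this exchange, counting the uncacheable tagged request and redirecting to the cacheable image, might look roughly like the Python sketch below. The handler name handle_ad_request, the in-memory counter, and the /ad/ to /i/ path mapping are assumptions based on the example URLs above, not the author's implementation.

```python
from collections import Counter
from urllib.parse import urlsplit

STATIC_BASE = "http://static.example.net/i"   # cookie-free host from the example
hits = Counter()                              # hypothetical in-memory hit counter


def handle_ad_request(url):
    """Count one hit per tagged request, then redirect to the cacheable image."""
    path = urlsplit(url).path                 # the ?t=...;r=... tag is ignored
    if not path.startswith("/ad/"):
        return 404, {}
    name = path[len("/ad/"):]                 # e.g. "foo/bar.gif"
    hits[name] += 1                           # every 302 served is one ad view
    return 302, {"Location": f"{STATIC_BASE}/{name}"}


print(handle_ad_request("http://ads.example.com/ad/foo/bar.gif?t=1090538707;r=0.51"))
print(handle_ad_request("http://ads.example.com/ad/foo/bar.gif?t=1090538999;r=0.12"))
print(hits["foo/bar.gif"])
```

The random tag keeps each request to the ad host unique, so proxies cannot answer it from cache, while the image bytes themselves are served from the highly cacheable static host.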

URL Tags & user experience
- Does not require modifying HTTP headers
    - No need for Pragma: no-cache or Expires in past
    - Doesn't break the Back button
- Browser history & visited-link highlighting
    - JavaScript timestamps/random numbers
        - Easy to implement
        - Breaks visited-link highlighting
    - Session or persistent ID preserves history
        - A little harder to implement

Breaking the Back button
- User expectation: Back button works instantly
    - Private caches normally enable this behavior
- Aggressive cache-busting breaks the Back button
    - Server sends Pragma: no-cache or Expires in past
    - Browser must re-visit server to re-fetch page
    - Hitting network is much slower than hitting disk
    - User perceives lag
- Use aggressive approach very sparingly
    - Compromising user experience is A Bad Thing

Summary

Review: Top 5 techniques
1. Use Cache-Control: private for personalized content
2. Implement Images Never Expire policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for occasionally-changing static content
5. Use random tags in URL for accurate hit metering or very sensitive content

Pro-caching techniques
- Cache-Control: max-age=<bignum>
- Expires: <10 years into future>
- Generate static content headers
    - Last-Modified, ETag
    - Content-Length
- Avoid cgi-bin, .cgi or ? in URLs
    - Some proxies (e.g. Squid) won't cache
    - Workaround: use PATH_INFO instead

Note: In other words, these are ways to make dynamic content look like static content.

Cache-busting techniques
- Use POST instead of GET
- Use random strings and ? char in URL
- Omit Content-Length & Last-Modified
- Send explicit headers on response
    - Breaks the Back button
    - Only as a last resort

    Cache-Control: max-age=0,no-cache,no-store
    Expires: Tue, 11 Oct 1977 12:34:56 GMT
    Pragma: no-cache

Recommended Reading
- Web Caching and Replication. Michael Rabinovich & Oliver Spatscheck. Addison-Wesley, 2001.
- Web Caching. Duane Wessels. O'Reilly, 2001.

Slides: http://public.yahoo.com/~radwin/