The Web History (I) The Web History (II)

Similar documents
The Web: some jargon. User agent for Web is called a browser: Web page: Most Web pages consist of: Server for Web is called Web server:

Protocolo HTTP. Web and HTTP. HTTP overview. HTTP overview

Network Technologies

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment?

CONTENT of this CHAPTER

Project #2. CSE 123b Communications Software. HTTP Messages. HTTP Basics. HTTP Request. HTTP Request. Spring Four parts

Application Layer: HTTP and the Web. Srinidhi Varadarajan

HTTP. Internet Engineering. Fall Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

Internet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT)

World Wide Web. Before WWW

Evolution of the WWW. Communication in the WWW. WWW, HTML, URL and HTTP. HTTP Abstract Message Format. The Client/Server model is used:

CS640: Introduction to Computer Networks. Applications FTP: The File Transfer Protocol

By Bardia, Patit, and Rozheh

The Hyper-Text Transfer Protocol (HTTP)

1 Introduction: Network Applications

HTTP Protocol. Bartosz Walter

reference: HTTP: The Definitive Guide by David Gourley and Brian Totty (O Reilly, 2002)

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

Computer Networks. Lecture 7: Application layer: FTP and HTTP. Marcin Bieńkowski. Institute of Computer Science University of Wrocław

Computer Networks 1 (Mạng Máy Tính 1) Lectured by: Dr. Phạm Trần Vũ MEng. Nguyễn CaoĐạt

Lecture 8a: WWW Proxy Servers and Cookies

Chapter 27 Hypertext Transfer Protocol

Lecture 8a: WWW Proxy Servers and Cookies

Introduction to LAN/WAN. Application Layer (Part II)

Evolution of the WWW. Communication in the WWW. WWW, HTML, URL and HTTP. HTTP - Message Format. The Client/Server model is used:

DATA COMMUNICATOIN NETWORKING

Application layer Web 2.0

Lektion 2: Web als Graph / Web als System

WHAT IS A WEB SERVER?

Web Caching and CDNs. Aditya Akella

WWW. World Wide Web Aka The Internet. dr. C. P. J. Koymans. Informatics Institute Universiteit van Amsterdam. November 30, 2007

Internet Technologies Internet Protocols and Services


Hypertext for Hyper Techs

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture # Apache.

Application Example: WWW. Communication in the WWW. WWW, HTML, URL and HTTP. Loading of Web Pages. The Client/Server model is used in the WWW

CloudOYE CDN USER MANUAL

Lecture 2. Internet: who talks with whom?

Architecture of So-ware Systems HTTP Protocol. Mar8n Rehák

Content'Delivery'Infrastructure' HTTP'Overview' A'Web'Page' Computer Networks. Lecture'9:'HTTP'

SWE 444 Internet and Web Application Development. Introduction to Web Technology. Dr. Ahmed Youssef. Internet

Streaming Stored Audio & Video

Proxies. Chapter 4. Network & Security Gildas Avoine

LabVIEW Internet Toolkit User Guide

Content Delivery Networks

How To Understand The Power Of A Content Delivery Network (Cdn)

Blue Coat Security First Steps Solution for Deploying an Explicit Proxy

DATA COMMUNICATOIN NETWORKING

Single Pass Load Balancing with Session Persistence in IPv6 Network. C. J. (Charlie) Liu Network Operations Charter Communications

Introduction to Computer Security

DEPLOYMENT GUIDE Version 1.1. Deploying the BIG-IP LTM v10 with Citrix Presentation Server 4.5

Front-End Performance Testing and Optimization

Internet Technologies 4-http. F. Ricci 2010/2011

Web Programming. Robert M. Dondero, Ph.D. Princeton University

M3-R3: INTERNET AND WEB DESIGN

Connecting with Computer Science, 2e. Chapter 5 The Internet

Implementing Reverse Proxy Using Squid. Prepared By Visolve Squid Team

Internet Information TE Services 5.0. Training Division, NIC New Delhi

Content Distribu-on Networks (CDNs)

Domain Name System (DNS)

Outline Definition of Webserver HTTP Static is no fun Software SSL. Webserver. in a nutshell. Sebastian Hollizeck. June, the 4 th 2013

Chapter 6 Virtual Private Networking Using SSL Connections

Introduction to Web Technology. Content of the course. What is the Internet? Diana Inkpen

Data Communication I

Configuration Guide. BES12 Cloud

No. Time Source Destination Protocol Info HTTP GET /ethereal-labs/http-ethereal-file1.html HTTP/1.

Internet Content Distribution

Introduction to Computer Security Benoit Donnet Academic Year

THE PROXY SERVER 1 1 PURPOSE 3 2 USAGE EXAMPLES 4 3 STARTING THE PROXY SERVER 5 4 READING THE LOG 6

DEPLOYMENT GUIDE DEPLOYING THE BIG-IP LTM SYSTEM WITH CITRIX PRESENTATION SERVER 3.0 AND 4.5

DEERFIELD.COM. DNS2Go Update API. DNS2Go Update API

FAQs for Oracle iplanet Proxy Server 4.0

APACHE WEB SERVER. Andri Mirzal, PhD N

Apache Server Implementation Guide

Deployment Guide Microsoft IIS 7.0

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

2 Downloading Access Manager 3.1 SP4 IR1

Web Application Firewall

BITS-Pilani Hyderabad Campus CS C461/IS C461/CS F303/ IS F303 (Computer Networks) Laboratory 3

Lecture 11 Web Application Security (part 1)

Cookies Overview and HTTP Proxies

DEPLOYMENT GUIDE Version 1.2. Deploying the BIG-IP system v10 with Microsoft Exchange Outlook Web Access 2007

Instructor: Betty O Neil

DEPLOYMENT GUIDE Version 1.2. Deploying the BIG-IP System v10 with Microsoft IIS 7.0 and 7.5

Using TestLogServer for Web Security Troubleshooting

Transport Layer Security Protocols

Terminology. Internet Addressing System

1. The Web: HTTP; file transfer: FTP; remote login: Telnet; Network News: NNTP; SMTP.

Remote login (Telnet):

The World Wide Web: History

CTIS 256 Web Technologies II. Week # 1 Serkan GENÇ

Guide to Analyzing Feedback from Web Trends

TCP/IP Networking An Example

Transcription:

Goals of Today s Lecture EE 122: The World Wide Web Ion Stoica TAs: Junda Liu, DK Moon, David Zats http://inst.eecs.berkeley.edu/~ee122/ (Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley) Main ingredients of the Web URIs, HTML, HTTP Key properties of HTTP Request-response, stateless, and resource meta-data Performance of HTTP Parallel connections, persistent connections, pipelining Web components, proxies, and servers Caching vs. replication 1 2 The Web History (I) The Web History (II) Vannevar Bush (1890-1974) 1945: Vannevar Bush, Memex: "a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility" Ted Nelson 1967, Ted Nelson, Xanadu: A world-wide publishing network that would allow information to be stored not as separate files but as connected literature Owners of documents would be automatically paid via electronic means for the virtual copying of their documents Coined the term Hypertext (See http://www.iath.virginia.edu/elab/hfl0051.html) Memex 3 4 The Web History (III) Web Components Tim Berners-Lee World Wide Web (WWW): a distributed database of pages linked through Hypertext Transport Protocol (HTTP) First HTTP implementation - 1990 Tim Berners-Lee at CERN HTTP/0.9 1991 Simple GET command for the Web HTTP/1.0 1992 Client/ information, simple caching HTTP/1.1-1996 5 Content Objects Send requests / Receive responses s Receive requests / Send responses Store or generate the responses Proxies Placed between clients and servers Act as a server for the client, and a client to the server Provide extra functions Caching, anonymization, logging, transcoding, filtering access Explicit or transparent ( interception ) 6 1

HTML A Web page has several components Base HTML file Referenced objects (e.g., images) Content: How? HyperText Markup Language (HTML) Representation of hypertext documents in ASCII format Web browsers interpret HTML when rendering a page Several functions: Format text, reference images, embed hyperlinks (HREF) Straight-forward to learn Syntax easy to understand Authoring programs can auto-generate HTML Source almost always available 7 URI Content: How? Uniform Resource Identifier (URI) Uniform Resource Locator (URL) Provides a means to get the resource http://www.ietf.org/rfc/rfc3986.txt! Uniform Resource Name (URN) Names a resource independent of how to get it urn:ietf:rfc:3986 is a standard URN for RFC 3986 8 URL Syntax Content: How? protocol://hostname[:port]/directorypath/resource (e.g., http://inst.eecs.berkeley.edu/~ee122/fa08/index.html)! protocol hostname port directory path resource http, ftp, https, smtp, rtsp, etc. Fully Qualified Domain Name (FQDN), IP address Defaults to protocol s standard port e.g. http: 80/tcp https: 443/tcp Hierarchical, often reflecting file system Identifies the desired resource Can also extend to program executions: http://us.f413.mail.yahoo.com/ym/showletter?box= %40B %40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917 9 _3552_1289957100&Search=&Nhead=f&YY=31454&order=do wn&sort=date&pos=0&view=a&head=b HTTP Client-: How? HyperText Transfer Protocol (HTTP) Client-server protocol for transferring resources Important properties: Request-response protocol Resource metadata Stateless ASCII format % telnet www.cs.berkeley.edu 80! GET /istoica/ HTTP/1.0! 10 <blank line, i.e., CRLF> HTTP Big Picture Finish display page Client Request image 1 Transfer image 1 Request image 2 Transfer image 2 Request text Transfer text Client-to- Communication HTTP Request Message Request line: method, resource, and protocol version Request headers: provide information or modify request Body: optional data (e.g., to POST data to the server) request line GET /somedir/page.html HTTP/1.1 Host: www.someschool.edu header User-agent: Mozilla/4.0 lines Connection: close Accept-language: fr (blank line) 11 carriage return line feed indicates end of message 12 2

Client-to- Communication HTTP Request Message Request line: method, resource, and protocol version Request headers: provide information or modify request Body: optional data (e.g., to POST data to the server) Request methods include: GET: Return current value of resource, run program, HEAD: Return the meta-data associated with a resource POST: Update resource, provide input to a program, Headers include: Useful info for the server (e.g. desired language) 13 -to-client Communication HTTP Response Message Status line: protocol version, status code, status phrase Response headers: provide information Body: optional data status line (protocol, status code, status phrase) header lines data e.g., requested HTML file HTTP/1.1 200 OK Connection close Date: Thu, 06 Aug 2006 12:00:15 GMT : Apache/1.3.0 (Unix) Last-Modified: Mon, 22 Jun 2006... Content-Length: 6821 Content-Type: text/html (blank line) data data data data data... 14 -to-client Communication HTTP Response Message Status line: protocol version, status code, status phrase Response headers: provide information Body: optional data Response code classes Similar to other ASCII app. protocols like SMTP Code Class Example 1xx Informational 100 Continue! 2xx Success 200 OK! 3xx Redirection 304 Not Modified! 4xx Client error 404 Not Found! Web : Generating a Response Return a file URL matches a file (e.g., /www/index.html) returns file as the response generates appropriate response header Generate response dynamically URL triggers a program on the server runs program and sends output to client Return meta-data with no body 5xx error 503 Service Unavailable! 15 16 HTTP Resource Meta-Data HTTP is Stateless Client-: How? Meta-data Info about a resource A separate entity Examples: Size of a resource Last modification time Type of the content Data format classification e.g., Content-Type: text/html Enables browser to automatically launch an appropriate viewer Borrowed from e-mail s Multipurpose Internet Mail Extensions (MIME) Usage example: Conditional GET Request Client requests object If-modified-since If object hasn t changed, server returns HTTP/1.1 304 Not Modified No body in the server s response, only a header 17 Stateless protocol Each request-response exchange treated independently s not required to retain state This is good Improves scalability on the server-side Don t have to retain info across requests Can handle higher rate of requests Order of requests doesn t matter This is bad Some applications need persistent state Need to uniquely identify user or store temporary info e.g., Shopping cart, user preferences and profiles, usage tracking, 18 3

State in a Stateless Protocol: Cookies Client-side state maintenance Client stores small(?) state on behalf of server Client sends state in future requests to the server Can provide authentication Request Response Set-Cookie: XYZ Request Cookie: XYZ 19 Putting All Together Client : How? Client- Request-Response HTTP Stateless Get state with cookies Content URI/URL HTML Meta-data 20 Web Browser Is the client Generates HTTP requests User types URL, clicks a hyperlink or bookmark, clicks reload or submit Automatically downloads embedded images Submits the requests (fetches content) Via one or more HTTP connections Presents the response Parses HTML and renders the Web page Invokes helper applications (e.g., Acrobat, MediaPlayer) Maintains cache Stores recently-viewed objects and ensures freshness Web Browser History 1990, WorldWideWeb, Tim Berners-Lee, NeXT computer 1993, Mosaic, Marc Andreessen and Eric Bina 1994, Netscape 1995, Internet Explorer. 21 22 Web Serve Handle client request: 1. Accept a TCP connection 2. Read and parse the HTTP request message 3. Translate the URI to a resource 4. Determine whether the request is authorized 5. Generate and transmit the response Web site vs. Web server Web site: one or more Web pages and objects united to provide the user an experience of a coherent collection Web server: program that satisfies client requests for Web resources 23 5 Minute Break Questions Before We Proceed? 24 4

HTTP Performance Fetch HTTP Items: Stop & Wait Most Web pages have multiple objects ( items ) e.g., HTML file and a bunch of embedded images How do you retrieve those objects? One item at a time Client Start fetching page Request item 1 Transfer item 1 Request item 2 Transfer item 2 Time What transport behavior does this remind you of? Finish; display page Request item 3 Transfer item 3 2 RTTs per object 25 26 Concurrent Requests & Responses Pipelined Requests & Responses Use multiple connections in parallel Does not necessarily maintain order of responses Client = Why? = Why? Network = Why? R1 T1 Is this fair? N parallel connections use bandwidth N times more aggressively than just one What s a reasonable/fair limit as traffic competes with that of other users? R2 T2 R3 T3 27 Batch requests and responses Reduce connection overhead Multiple requests sent in a single batch Small items (common) can also share segments Maintains order of responses Item 1 always arrives before item 2 How is this different from concurrent requests/responses? Client Request 1 Request 2 Request 3 Transfer 1 Transfer 2 Transfer 3 28 Persistent Connections Enables multiple transfers per connection Maintain TCP connection across multiple requests Including transfers subsequent to current page Client or server can tear down connection Performance advantages: Avoid overhead of connection set-up and tear-down Allow TCP to learn more accurate RTT estimate Allow TCP congestion window to increase i.e., leverage previously discovered bandwidth Default in HTTP/1.1 Can use this to batch requests on a single connection Improving HTTP Performance Many clients transfer same information Generates redundant server and network load experience unnecessary latency Example: 5 objects, RTT=50ms 29 30 5

Caching Caching on the Client How? Modifier to GET requests: If-modified-since returns not modified if resource not modified since specified time Response header: Expires how long it s safe to cache the resource No-cache ignore all caches; always get resource directly from server 31 Example: Conditional GET Request Return resource only if it has changed at the server Save server resources! Request from client to server: GET /~ee122/fa08/ HTTP/1.1! Host: inst.eecs.berkeley.edu! User-Agent: Mozilla/4.03! If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT! <CRLF>! How? Client specifies if-modified-since time in request compares this against last modified time of desired resource returns 304 Not Modified if resource has not changed. or a 200 OK with the latest version otherwise 32 Caching with Reverse Proxies Cache documents close to server decrease server load Typically done by content providers Only works for static content Caching with Forward Proxies Cache documents close to clients reduce network traffic and decrease latency Typically done by ISPs or corporate LANs Reverse proxies Reverse proxies Forward proxies 33 34 Caching w/ Content Distribution Networks Caching with CDNs (cont.) Integrate forward and reverse caching functionality One overlay network (usually) administered by one entity e.g., Akamai Provide document caching Pull: Direct result of clients requests Push: Expectation of high access rate Also do some processing Handle dynamic web pages Forward proxies CDN Transcoding 35 36 6

Example: Akamai Example: Akamai Akamai creates new domain names for each client content provider. e.g., a128.g.akamai.net The CDN s DNS servers are authoritative for the new domains The client content provider modifies its content so that embedded URLs reference the new domains. DNS server for www.cnn.com www.cnn.com Akamaizes its content. get http://www.cnn.com lookup a128.g.akamai.net local DNS server akamai.net DNS servers Akamaize content, e.g.: http://www.cnn.com/imageof-the-day.gif becomes http://a128.g.akamai.net/image- Akamaized response object has inline URLs for secondary content at a128.g.akamai.net of-the-day.gif and other Akamai-managed DNS names. 37 38 a Akamai servers store/ cache secondary content for Akamaized services. b c Caching vs. Replication Why move content closer to users? Reduce latency for the user Reduce load on the network and the server How? Caching Replicate content on demand after a request Store the response message locally for future use Challenges: May need to verify if the response has changed and some responses are not cacheable Replication Planned replication of content in multiple locations Update of resources handled outside of HTTP Can replicate scripts that create dynamic responses 39 Conclusions Key ideas underlying the Web Uniform Resource Identifier (URI), Locator (URL) HyperText Markup Language (HTML) HyperText Transfer Protocol (HTTP) Browser helper applications based on content type Performance implications Concurrent connections, pipelining, persistent conns. Main Web components, servers, proxies, CDNs Next lecture: drilling down to the link layer K & R 5.1-5.3, 5.4.1 40 7