Evolution of the WWW Communication in the WWW World Wide Web (WWW) Access to linked documents, which are distributed over several computers in the History of the WWW Origin 1989 in the nuclear research laboratory CERN in Switzerland. Developed to exchange data, figures, etc. between a large number of geographically distributed project partners via. First text-based version in 1990. First graphic interface (Mosaic) in February 1993, developed on to Netscape, Explorer Standardization by the WWW consortium (http//www.w3.org). The Client/Server model is used Client (a Browser) Presents the actually loaded WWW page Permits navigating in the network (e.g. through clicking on a hyperlink) Offers a number of additional functions (e.g. external viewer or helper applications). Usually, a browser can also be used also for other services (e.g. FTP, e-mail, news, ). Server Process which manages WWW pages. Is addressed by the client e.g. through indication of an URL (Uniform Resource Locator = logical address of a web page). The server sends the requested page (or file) back to the client. Page 1 Page 2 WWW, HTML, URL and HTTP HTTP - Format WWW stands for World Wide Web and means the world-wide cross-linking of information and documents. command URL GET http//server.name/path/file.type The standard protocol used between a web server and a web client is the HyperText Protocol (HTTP). uses the TCP port 80 defines the allowed requests and responses is an ASCII protocol protocol HTTP server domain name path name file name Each web page is addressed by a unique URL (Uniform Resource Locator) (e.g. http//www-i4.informatik.rwth-aachen.de/education/tcpip). The standard language for web documents is the HyperText Markup Language (HTML). Page 3 GET http// www.informatik.rwth-aachen.de / info / general.html Instructions on a URL are GET Load a web page HEAD Load only the header of a web page PUT Store a web page on the server POST Append something to the request passed to the web server DELETE Delete a web page Page 4
Loading of Web Pages Browser PC TCP/IP network Browser asks DNS for the IP address of the server DNS answers Browser opens a TCP connection to port 80 of the computer Browser sends the command GET /info/general.html WWW server DNS server Loading of Web Pages Example Call of the URL http//www.informatik.rwth-aachen.de/material/general.html 1. The Browser determines the URL (which was clicked or typed). 2. The Browser asks the DNS for the IP address of the server www.informatik.rwthaachen.de. 3. DNS answers with 137.226.116.241. 4. The browser opens a TCP connection to port 80 of the computer 137.226.116.241 5. Afterwards, the browser sends the command GET /material/general.html 6. The WWW server sends back the file general.html. 7. The connection is terminated. 8. The browser analyzes the WWW page general.html and presents the text. 9. If necessary, each picture is reloaded over a new connection to the server (The address is included in the page general.html in form of an URL). WWW server sends back the file general.html Connection is terminated Page 5 Note! Step 9 applies only to HTTP/1.0! With the newer version HTTP/1.1 all referenced pictures are loaded before the connection termination (more efficiently for pages with many pictures). Page 6 Usual URLs HTTP Request Header The main application are web pages, but URLs are usable for other types of documents also URL name http FTP Hypertext (HTML) FTP Used for http//www.cs.vu.nl/~ast Example FTP//FTP.cs.vu.nl/pub/minix/README method sp URL sp version cr lf Request line necessary part, e.g. GET server.name/path/file.type Header lines optionally, further information to the host/document, e.g. Accept-language fr file news news Gopher Local File Newsgroup News Article Gopher file///usr/suzanne/prog.c newscomp.os.minix newsaa0134223112cs.utah.edu gopher//gopher.tc.umn.edu/11/libraries header field name calue cr lf cr lf Entity Body optionally. Further data, if the Client transmits data (POST method) mailto E-Mail mailtojohn@acm.org telnet Remote login telnet//www.w3.org80 sp space cr/lf carriage return/line feed Page 7 Page 8
HTTP Response Header version sp status code sp phrase cr lf cr lf Entity Body inquired data HEAD method the server answers, but does not transmit the inquired data (debugging) Status LINE status code and phrase indicate the result of an inquiry and an associated message, e.g. 200 OK 400 Bad Request 404 Not Found Groups of status messages 1xx Only for information 2xx Successful inquiry 3xx Further activities are necessary 4xx Client error (syntax) 5xx Server error Page 9 Proxy Server A Proxy is an intermediate entity used by several browsers. It takes over tasks of the browsers (complexity) and servers for more efficient page loading! HTTP Browser Proxy Server Server e.g. HTTP Caching of WWW pages A proxy temporarily stores the pages loaded by browsers. If a page is requested by a browser which already is in the cache, the proxy controls whether the page has changed since storing it. If not, the page can be passed back from the cache. If yes, the page is normally loaded from the server and again stored in the cache, replacing the old version. Support when using additional protocols A browser enables also access to FTP, News, Gopher or telnet servers etc. Instead of implementing all protocols in the browser, it can be realized the proxy. The proxy then speaks HTTP with the browser and e.g. FTP with a FTP server. Integration into a Firewall The proxy can deny the access to certain web pages (e.g. inside companies). Page 10 FTP - File Protocol FTP - Confirmed file transfer FTP is the standard for the transmission of files FTP is used to copy a complete file from one computer to another FTP offers different possibilities apart from the pure file transfer Interactive access by the user (e.g. change of directories) Format specification (binary or text files, ASCII or EBCDIC code) Authentication (login name and password) Structure FTP-Client FTP-Server FTP Client A Connection setup TCP (port 21) FTP connect to the server Login Login OK Password Password OK logged in GET file B FTP server Connection setup TCP (port 20) exchange Control Process Control Kontrollverbindung Connection Datenverbindung Connection Control Process Operating Betriebssystem System Operating Betriebssystem System Port B Port A Port 21 Port 20 TCP/IP Network Netz Termination of TCP connection (port 20) command QUIT Termination of connection (port 21) Page 11 Page 12
Course of an FTP Session FTP - Commands 1. The FTP client selects a random port number A for itself. 2. It contacts the master control process of the FTP server on port 21 (for FTP control connections). The login name and password are being queried. 3. The FTP server provides a slave control process for the control connection between client Port A and server port 21. 4. Over the control connection, the FTP client can send commands (e.g. folder directory, showing of directory contents, transfer of a file). 5. If the FTP client requests a file, then it first selects a random port B, sets up a data transfer process and tells the server the port number by using the control connection. 6. The FTP server sets up a data transfer process with local port 20 (standard port for FTP data connections), which accepts only connections of port B of the FTP client (by this it is ensured that the transfer is made to the correct process no further authentication is necessary). 7. A connection between server port 20 and client port B is established, the data are being transferred. 8. Afterwards, the connection is terminated and both data transfer processes terminate as well (i.e. each transfer needs new processes and a new connection). Command open disconnect user cd lcd pwd get/mget put/mput binary ascii dir/ls help delete bye Effect Connect to the FTP server Terminate the FTP session Send user information after connecting Change directory on the remote computer Change directory on the own computer Show the path to the directory on the remote computer The client receives a (resp. several) document The client sends a (resp. several) document Set the transmission mode to binary Set the transmission mode to ASCII List contents of the remote directory Help for commands Delete a remote file Terminate the FTP session, abort Page 13 Page 14 FTP - Responses TFTP - Trivial File Protocol Response Effect Tentative positive response the action was started, but the client 1yz must be waiting for another response. 2yz The inquiry was completely worked on. 3yz Positive intermediate response the command was accepted, but a further command is expected. 4yz Temporary negative response the request was not worked on, but the reason for the fault is only temporarily, later on a repetition can take place. 5yz Durably negative response the command was not accepted and should not be repeated. x0z Syntax error x1z Information x2z The message refers to the connection x3z Answers to login commands x4z No fixed usage x5z Status of the file system Page 15 TFTP is a very simple protocol for file transfer Communication runs over port 69 and uses UDP, not TCP TFTP does not have authentication TFTP always uses 512-byte blocks TFTP Client A GET Path/file.name (512 bytes) ACK (512 bytes) (512 bytes) ACK (350 bytes) ACK B TFTP Server Timeout Page 16
Electronic Mail E-Mail Early systems A simple file transmission took place, with the convention that the first line contains the address of the receiver of the file. Problems E-Mail to groups, structuring of the e-mail, delegation of the administration to a secretary, file editor as user interface, no mixed media Solution X.400 as standard for e-mail transfer. This specification was however too complex and badly designed. Generally accepted only became a simpler system, cobbled together by a handful of computer science students the Simple Mail Protocol (SMTP). Electronic Mail E-Mail An e-mail system generally consists of two subsystems (UA, normal e-mail program) Usually runs on the computer of the user and helps during the processing of e-mails Creation of new and answering of old e-mail Receipt and presentation of e-mail Administration of received e-mail (MTA, e-mail server) Usually runs in the background (around the clock) Delivery of e-mail which is sent by s Intermediate storage of messages for users or other s Page 17 Page 18 Structure of an E-Mail E-Mail Header For sending an e-mail, the following information is needed from the user (usually normal text + attachments, e.g. word file, GIF image ) Destination address (in general in the form mailbox@location, e.g. thissen@informatik.rwth-aachen.de) Possibly additional parameters concerning e.g. priority or security E-Mail formats two used standards RFC 822 MIME (Multipurpose Mail Extensions) With RFC 822 an e-mail consists of a simple envelope (created by the based on the data in the e-mail header), a set of header fields (each one line ASCII text), a blank line, and the actual message ( Body). Header Meaning To Address of the main receiver (possibly several receivers or also a mailing list) Cc Carbon copy, e-mail addresses of less important receivers Bcc Blind carbon copy, a receiver which is not indicated to the other receivers From Person who wrote the message Sender Address of the actual sender of the message (possibly different to From person) Received One entry per on the path to the receiver Return Path Path back to the sender (usually only e-mail address of the sender) Date Transmission date and time Reply to E-Mail address to which answers are to be addressed -Id Clear identification number of the e-mail (for later references) In-Reply-to -Id of the message to which the answer is directed References Other relevant -Ids Subject One line to indicate the contents of the message (is presented the receiver) Page 19 Page 20
E-Mail Header RFC 822 only suitably for messages of pure ASCII text without special characters. Nowadays demanded additionally E-Mail in languages with special characters (e.g. French or German) E-Mail in languages not using the Latin alphabet (e.g. Russian) E-Mail in languages not at all using an alphabet (e.g. Japanese) E-Mail not completely consisting of pure text (e.g. audio or video) MIME keeps the RFC-822 format, but additionally defines a structure in the Body (by using additional headers), and coding rules for non-ascii characters. Header Meaning MIME-Version Content-Description Content-Id Content-- Encoding Content-Type Used version of MIME is marked String which describes the contents of the message Clear identifier for the contents Coding which was selected for the contents of the email (some networks understand e.g. only ASCII characters). Examples base64, quoted-printable Type/Subtype regarding RFC 1521, e.g. text/plain, image/jpeg, multi-part/mixed MIME MIME-Version 1.0 Content-Type MULTIPART/MIXED; BOUNDARY= "8323328-2120168431-824156555=325" --8323328-2120168431-824156555=325 Content-Type TEXT/PLAIN; charset=us-ascii A picture is in the appendix --8323328-2120168431-824156555=325 Content-Type IMAGE/JPEG; name="picture.jpg" Content--Encoding BASE64 Content-ID <PINE.LNX.3.91.960212212235.325B@localhost> Content-Description /9j/4AAQSkZJRgABAQEAlgCWAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQIBAQEBA QIBAQECAgICAgICAgIDAwQDAwMDAwICAwQDAwQEBAQEAgMFBQQEBQQEBAT/ 2wBDAQEBAQEBAQIBAQIEAwIDBAQEBA [ ] KKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAoooo AKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiig AooooAD//Z ---8323328-2120168431-824156555=325 Page 21 Page 22 E-Mail over POP3 and SMTP E-Mail over POP3 and SMTP Simple Mail Protocol (SMTP) Sending e-mails over a TCP connection (port 25) SMTP is a simple ASCII protocol Without checksums, without encryption Receiving machine is the server and begins with the communication If the server is ready for receiving, it signals this to the client. This sends the information from whom the e-mail comes and who the receiver is. If the receiver is known to the server, the client sends the message, the server confirms the receipt. Post Office Protocol version 3 (POP3) Get e-mails from the server over a TCP connection, port 110 Commands for logging in and out, message download, deleting messages on the server (maybe without transferring them to the client) Only copies e-mails of the remote server to the local system 1 writes an e-mail Client 1 (UA 1) formats the e-mail, produces the receiver list, and sends the e-mail to its mail server (MTA 1) Server 1 (MTA 1) Sets up a connection to the SMTP server (MTA 2) of the receiver and sends a copy of the e-mail Server 2 (MTA 2) Produces the header of the e-mail and places the e-mail into the appropriate mailbox Client 2 (UA 2) sets up a connection to the mail server and authenticates itself with username and password (unencrypted!) Server (MTA 2) sends the e-mail to the client Client 2 (UA 2) formats the e-mail 2 reads the e-mail 1 1 2 2 SMTP POP3 SMTP Page 23 Page 24
SMTP - Command Sequence Communication between partners (from abc.com to beta.edu) in text form of the following kind S 220 <beta.edu> Service Ready /* Receiver is ready/* C HELO <abc.com> /* Identification of the sender/* S 250 <beta.edu> OK /* Server announces itself */ C MAIL FROM<Krogull@abc.com> /* Sender of the e-mail */ S 250 OK /* Sending is permitted */ C RCPT TO<Bolke@beta.edu> /* Receiver of the e-mail */ S 250 OK /* Receiver known */ C DATA /* The data are following */ S 354 Start mail inputs; end with <crlf>.<crlf> on a line by itself C From Krogull@. <crlf>.<crlf> /* of the whole e-mail, including all headers */ S 250 OK C QUIT /* Terminating the connection */ S 221 <beta.edu> Server Closing S = server, receiving MTA / C = Client, sending MTA Page 25 POP3 Get e-mails from the server by means of POP3 Client (UA) PC TCP/IP network TCP connection port 110 Greetings Commands Replies POP3 Server (MTA) Minimal protocol with only two command types Copy e-mails to the local computer Delete e-mails from the server Authorizing phase USER name PASS string Transaction phase STAT LIST [msg] RETR msg DELE msg NOOP RSET QUIT Page 26 POP3 Protocol IMAP as POP3 Variant Authorizing phase user identifies the user pass is its password +OK or -ERR are possible server answers Transaction phase list for the listing of the message numbers and the message sizes retr to requesting a message by its number dele deletes the appropriate message S +OK POP3 server ready C user alice S +OK C pass hungry S +OK user successfully logged in C list S 1 498 S 2 912 S. C retr 1 S <message 1 contents> S. C dele 1 C retr 2 S <message 2 contents> S. C dele 2 C quit S +OK Page 27 Enhancement of POP3 IMAP (Interactive Mail Access Protocol) TCP connection over port 143 E-Mails are not downloaded and stored locally, but remain on the server The client performs all actions remotely. This is suitable for users who need access to their e-mails from different hosts Protocols are more complex than with POP3 set up and manage remote mailboxes Meanwhile also many operators of web pages offer email services gmx, web.de, yahoo, Here finally again HTTP serves as protocol for the access to the e-mails. The management is similar as with IMAP, only that the client is integrated into the web server. Page 28