QAFQAZ UNIVERSITY
Computer Engineering Department
Internet Technologies
Internet Protocols and Services
Dr. Abzetdin ADAMOV, Chair of Computer Engineering Department
aadamov@qu.edu.az
http://ce.qu.edu.az/~aadamov
HTTP Protocol
The Hypertext Transfer Protocol (HTTP) is one of the most frequently used application-layer network protocols, designed for data communication on the World Wide Web. HTTP was developed by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C); as a result, HTTP/1.1 was published as RFC 2616 in 1999.
HTTP defines a simple request-response language.
A web client communicates with a web server by using HTTP.
HTTP defines how to phrase the request correctly and what the response should look like.
Note: HTTP does not define how the network connection is made or managed, nor how the information is actually transmitted; that is done by lower-level protocols such as TCP/IP.
Uniform Resource Locator
An HTTP client uses Uniform Resource Identifiers (URIs) in its requests to specify the address of the document to retrieve. The generic URI syntax is described in RFC 3986; the URL (Uniform Resource Locator), the best-known kind of URI, was specified by the IETF in RFC 1738. The figure shows the syntax and components of the URL standard. A URL specifies both the location of a resource and the mechanism for retrieving it. Together, the Hypertext Transfer Protocol, the Uniform Resource Locator and the Hypertext Markup Language form the World Wide Web.
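The components of a URL can be inspected with Python's standard urllib.parse module. This is only an illustrative sketch; the URL below is a hypothetical example and does not come from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com:8080/docs/index.html?lang=en&page=2#intro"

parts = urlsplit(url)
print("scheme:  ", parts.scheme)    # the mechanism for retrieving the resource
print("host:    ", parts.hostname)  # the server that holds the resource
print("port:    ", parts.port)      # TCP port (80 is assumed for http when omitted)
print("path:    ", parts.path)      # location of the resource on the server
print("query:   ", parts.query)     # parameters passed to the resource
print("fragment:", parts.fragment)  # position inside the document
```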
HTTP request-response communication
A single HTTP request-response exchange between client and server is called an HTTP session. A typical HTTP session diagram is shown in the figure. Every HTTP session is initiated by a request from the user agent (browser). The user agent establishes a TCP connection to a particular port on the server; by default, a Web server listens on port 80, waiting for client requests. After processing the request, the Web server sends back a status code, a description of the requested resource and instructions (in the form of headers), and then either the resource itself or an error message (in the case of a bad request).
HTTP request
An HTTP request consists of the following 4 components:
1. Request Line: includes the request method (indicating the purpose of the client's request), the resource path and name (URI), and the HTTP version supported by the user agent, for example GET /index.html HTTP/1.1. Table 3.1 lists the most important methods and their purposes.
2. Request Header Fields: enable the client to send additional information about the requested resource and to identify itself to the server (all header fields except Host are optional). Table 3.2 lists the request header fields and their purposes.
3. An empty line indicating the end of the header fields.
4. Message body to be sent to the server (if any).
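The four components listed above can be assembled into the bytes that travel over the TCP connection. The following is a minimal sketch, not a full HTTP client; the function name build_request is an assumption introduced here for illustration.

```python
def build_request(method, path, headers, body=b""):
    """Assemble an HTTP/1.1 request from its four components."""
    # 1. Request line: method, resource path and protocol version.
    request_line = f"{method} {path} HTTP/1.1"
    # 2. Header fields, one "Name: value" pair per line.
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # 3. Every line ends with CRLF; an empty line closes the header section.
    head = "\r\n".join([request_line, *header_lines]) + "\r\n\r\n"
    # 4. Optional message body.
    return head.encode("ascii") + body


raw = build_request("GET", "/index.html",
                    {"Host": "qu.edu.az", "Connection": "close"})
print(raw.decode("ascii"))
```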
An example of HTTP request
GET /index.php HTTP/1.1
Host: qu.edu.az
Connection: keep-alive
Referer: http://www.google.com/?q=qafqaz%20university
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24
Accept: text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-us,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cache-Control: no-cache
HTTP request methods
Method    Description
GET       Requests the specified resource from the server; used only to retrieve information.
POST      Sends information to be stored on the server. Generally used to submit Web form data to be processed on the server.
HEAD      Works like a GET request, but returns only meta-information about the resource, not its content.
PUT       Sends a new copy of an existing resource to the server.
DELETE    Permanently deletes the specified resource from the server.
CONNECT   Reserved for TCP/IP tunneling to facilitate SSL (e.g. HTTPS).
HTTP request header fields
Header Field      Description
Host              Specifies the host and port from which the resource is requested.
Accept            Specifies the media types (MIME) that the user agent can accept in the response.
User-Agent        Provides descriptive information about the user agent that made the request and the operating system on which it runs.
Accept-Encoding   Restricts the content encodings that are acceptable in the response.
Accept-Language   Specifies the preferred human languages of the user agent that made the request.
Accept-Charset    Specifies the preferred character sets for the response content.
Cookie            Specifies the name and value of a cookie previously sent by the server with the Set-Cookie header.
Referer           Allows the user agent to inform the server of the address (URI) from which the requested address (URI) was obtained (may be used for statistics, optimization, etc.).
Connection        Specifies the kind of connection preferred by the user agent.
Cache-Control     Enables the user agent to allow or forbid caching of the response content.
HTTP response
1. Status Line: starts with the HTTP protocol version, followed by the status code and its associated reason phrase. The status code informs the client agent of the status of the response before the client starts interpreting it (see Tables 3.3 and 3.4).
2. Response Header Fields: allow the server to send instructions to the client agent, additional information about the server, and descriptive and instructive information about the resource requested by the URI.
3. Information: the body of the resource.
HTTP response Status Code
HTTP response status code classification
Class   Purpose             Description
1xx     Informational       The server uses these codes to inform the client about the progress of the request.
2xx     Success             The request was successfully accepted and completed.
3xx     Redirection         The request should be redirected to another location (URI).
4xx     Client-side Error   The request cannot be completed because of an error in the request, such as bad syntax.
5xx     Server-side Error   The request cannot be completed because of a server-side problem.
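Because the class is determined by the first digit of the code, a client can classify any status code with integer division. A minimal sketch, assuming the class names from the table above; the function name status_class is introduced here for illustration.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client-side Error",
    5: "Server-side Error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]  # the first digit selects the class


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```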
HTTP response Status Code
Code   Reason Phrase                                 Description
200    OK (Document follows)                         The request succeeded. The requested information follows.
301    Moved Permanently                             The document has moved to a new URL.
302    Found (Moved Temporarily)                     The document has moved temporarily to a new URL.
304    Not Modified                                  The document has not been modified since the date specified in a GET request with If-Modified-Since.
401    Unauthorized                                  The information is restricted; please retry with proper authentication.
402    Payment Required                              The information requires paying a fee; please retry with proper payment (not used often).
403    Forbidden                                     Access is forbidden.
404    Not Found                                     The information could not be found or permission was denied. This error is returned if the requested URL does not exist or was misspelled.
500    Internal Server Error (Server Error)          The server experienced an error.
HTTP response header fields
Header Field        Description
Server              Identifies the server software and the operating system it runs on.
Date                Date and time of the server response.
Content-Length      Indicates how many bytes are sent in the response body.
Content-Type        The MIME type of the response content.
Content-Language    Specifies the language of the content.
Content-Encoding    Indicates the type of encoding applied to the content, for example gzip.
Last-Modified       Specifies the date/time when the document was last changed.
Connection          Specifies the desired connection option after the response.
Expires             Date/time after which a cached document is considered stale.
Allow               Indicates the request methods supported by the server.
Location            Specifies the new location (URI) of the resource in the case of redirection (302 status code).
Refresh             Indicates how soon the client agent should request the resource again. This field may also be used for redirection after a particular amount of time.
Transfer-Encoding   Indicates the transformation applied to the message in order to transfer it from the server to the client agent.
Set-Cookie          Creates a cookie related to the resource.
WWW-Authenticate    Specifies the authentication scheme when the resource requires authentication (401 status code) to be accessed.
Example of HTTP server response
Successful (status code 200) server response:
HTTP/1.1 200 OK
Date: Sat, 21 May 2011 02:43:55 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8a PHP/5.3.3
Expires: Mon, 26 Jul 1990 05:00:00 GMT
Cache-Control: no-cache, must-revalidate
Last-Modified: Mon, 16 May 2011 13:14:11 GMT
Set-Cookie: lang_id=2; expires=sat, 28-May-2011 02:43:55 GM
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Failure (status code 404) server response:
HTTP/1.1 404 Not Found
Date: Sat, 21 May 2011 02:55:10 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8a PHP/5.3.3
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8
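A response such as the 404 example above can be split into its status line, header fields and body with a few string operations. This is a simplified sketch: the function name parse_response is an assumption, and repeated header fields (for example several Set-Cookie lines) would overwrite each other in this version.

```python
def parse_response(raw):
    """Split a raw HTTP response into version, code, reason, headers, body."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.1 404 Not Found\r\n"
       b"Date: Sat, 21 May 2011 02:55:10 GMT\r\n"
       b"Content-Length: 0\r\n"
       b"Connection: close\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"\r\n")
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"])
```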
MIME Standard
MIME (Multipurpose Internet Mail Extensions) is a standard originally designed to indicate the format of files or information carried by SMTP, but it is also used by other protocols such as HTTP, where the MIME type appears in the Content-Type header. The various media types are defined in RFC 2046.
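A MIME type has the form type/subtype and may carry parameters such as charset. As a small sketch, Python's standard email package can take apart the Content-Type value seen in the server response example; the choice of this module is only one convenient option.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "text/html; charset=utf-8"

print(msg.get_content_type())      # full MIME type
print(msg.get_content_maintype())  # general category of the data
print(msg.get_content_subtype())   # specific format within the category
print(msg.get_content_charset())   # character set parameter
```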
HTTP session management
At any moment a Web server may be serving hundreds of client requests, and all of them may request different resources. How does the server manage to send the right response to each particular request? A typical session management scenario consists of the following interactions:
1. The user agent sends an HTTP request;
2. The Web server sends back an HTTP response that includes an instruction (Set-Cookie header) to create a cookie;
3. The user agent sends an HTTP request that includes the cookies (Cookie header) received from the server;
4. The Web server sends back an HTTP response;
5. Steps 3 and 4 may be repeated until the cookie expires.
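Steps 2 and 3 of the scenario above can be sketched with Python's standard http.cookies module. The cookie name lang_id comes from the response example; the Path attribute and the variable names are assumptions made for illustration.

```python
from http.cookies import SimpleCookie

# Step 2: the user agent stores the cookie found in the Set-Cookie header.
jar = SimpleCookie()
jar.load("lang_id=2; Path=/")  # value of a Set-Cookie header (Path is assumed)

# Step 3: the next request returns only the name=value pairs in a Cookie header.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)

# Server side: the Cookie header is parsed to recognize the returning client.
received = SimpleCookie()
received.load(cookie_header)
print(received["lang_id"].value)
```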
HTTP session management (cont) HTTP session management by cookie:
Electronic Mail
The email address standard is described in RFC 5322 (section 3.4.1). According to the standard, an email address is a string of ASCII characters separated into two parts: the part before the "@" symbol, called the "local-part" or username, and the part after the "@" symbol, called the domain name, which is the destination of the email. Just as a postal mailbox cluster may hold one or many mailboxes at the same address, each domain may contain one or even millions of mailboxes (as with Gmail, Hotmail, Yahoo, etc.).
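Splitting an address into its two parts takes one string operation. The sketch below only separates the parts at the last "@" symbol; it does not validate the full RFC 5322 grammar, and the function name split_address is introduced here for illustration.

```python
def split_address(address):
    """Split an email address into (local-part, domain)."""
    # Split at the last "@" so the domain is always the final component.
    local_part, separator, domain = address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    return local_part, domain


print(split_address("smith@b.com"))
```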
Email Usage Growth by Year
Electronic mail, or email, is a method of delivering digital messages to one or more recipients in an electronic environment.
Email is one of the most popular Internet services.
Email increases people's ability to communicate even when they are geographically far apart.
Email General Structure
Like other Internet services, the email system is based on Internet standards and some dedicated protocols. Many different email protocols are implemented by different email servers; however, some are common to all email servers and email clients:
Basic email format standard, RFC 5322
Multipurpose Internet Mail Extensions (MIME) standard
Simple Mail Transfer Protocol (SMTP)
Post Office Protocol (POP3) or Internet Message Access Protocol (IMAP)
The Internet Message (Email) Format
The first Internet message standard was described by RFC 733 in 1977. It was replaced by RFC 822 in 1982, which remained in use for almost twenty years. The current email standard, RFC 5322, was published in 2008. According to this standard, an Internet message (or email) consists of an envelope and content (for more information, see RFC 5321).
Internet Message Header Fields
Header          Description
From:           The name and email address of the message originator.
Date:           The local date and time when the message was written or sent.
Message-ID:     Machine-readable unique identifier generated by the mail server; designed to prevent multiple delivery and to serve as a reference in In-Reply-To.
In-Reply-To:    Used for reply messages only; contains the Message-ID of the original message(s), creating a relational tree of messages.
To:             Email address(es) of the primary recipient(s).
Cc:             Email address(es) of the secondary recipient(s). Generally used to indicate recipients who have no immediate relation to the matter but should be informed.
Bcc:            Same as Cc, but hidden from recipients. SMTP removes this header field before delivering the message.
Subject:        Textual, human-readable summary of the message.
Content-Type:   MIME type of the message content, which lets the email agent display the message properly.
Received:       Contains information about all mail servers that were involved in the message delivery.
References:     Like In-Reply-To, uses Message-ID(s), but is designed to identify a thread of correspondence.
Keywords:       Keywords specified by the sender.
Reply-To:       Email address that should be used when the recipient replies to the message.
Return-Path:    Indicates the email address of the message's sender. The value of this header has to be the same as the From address of the SMTP envelope.
Delivered-To:   The email address of the recipient.
Sender:         Actual sender of the message (generally, the address listed in From is used).
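Several of the header fields above can be seen in a message built with Python's standard email package. This is an illustrative sketch; all addresses, the subject and the domain a.com are hypothetical values chosen for the example.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

msg = EmailMessage()
msg["From"] = "Alice Example <alice@a.com>"     # message originator (hypothetical)
msg["To"] = "smith@b.com"                       # primary recipient
msg["Cc"] = "bob@a.com"                         # secondary recipient
msg["Subject"] = "Meeting notes"                # human-readable summary
msg["Date"] = formatdate(localtime=True)        # local date and time of writing
msg["Message-ID"] = make_msgid(domain="a.com")  # unique identifier
msg.set_content("Plain-text body of the message.")  # also sets Content-Type

# Headers come first, then an empty line, then the message body.
print(msg.as_string())
```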
Received Header and Spam
One of the most important headers, Received:, deserves a more detailed review. This header significantly simplifies the fight against spam and spammers. When we receive unsolicited bulk email, our email agent normally shows only the standard To:, From:, Subject:, and Date: headers, as for any other email. The From: address may appear to belong to someone we know well, or to an organization whose name we respect or trust. In reality, these spoofed messages do not originate from the address that appears in the From: header. To see where the message was really sent from, it is necessary to check the Received: fields, which tell us the route the message took on its way to us.
Received Header and Spam
Delivered-To: my.address@gmail.com
Return-Path: <SRS0=M78ycc=RT=p3slh174.shr.phx3.secureserver.net=lindaadleen2@qafqaz.edu.az>
Received: by 10.220.162.197 with SMTP id w5cs344529vcx;
    Sun, 17 Oct 2010 05:24:20 -0700 (PDT)
Received: from bosmailscan05.eigbox.net ([10.20.15.5])
    by bosmailout03.eigbox.net with esmtp (Exim) id 1P7SHj-0007rH-Qy
    for www.adamov@gmail.com; Sun, 17 Oct 2010 08:24:19 -0400
Received: from p3slh174.shr.phx3.secureserver.net (localhost.localdomain [127.0.0.1])
    by p3slh174.shr.phx3.secureserver.net (8.12.11.20060308/8.12.11) with ESMTP id o9hcof7n030063
    for <aict2011@qafqaz.edu.az>; Sun, 17 Oct 2010 05:24:15 -0700
Received: (from lindaadleen2@localhost)
    by p3slh174.shr.phx3.secureserver.net (8.12.11.20060308/8.12.11/Submit) id o9hcoevk030054;
    Sun, 17 Oct 2010 05:24:14 -0700
Date: Sun, 17 Oct 2010 05:24:14 -0700
Message-Id: <201010171224.o9HCOEvK030054@p3slh174.shr.phx3.secureserver.net>
To: aict2011@qafqaz.edu.az
Subject: xxxxxxxxxxxxxxxxx!!!!!
From: vangelis@mail.ru
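Each mail server adds its own Received: line at the top of the message, so reading these lines from the bottom up gives the route in chronological order. The sketch below does this with Python's standard email package on an abridged copy of the headers above (the Received: values are shortened for readability and the body text is a placeholder).

```python
from email import message_from_string

# Abridged version of the example headers; the body line is a placeholder.
raw = """\
Received: by 10.220.162.197 with SMTP id w5cs344529vcx; Sun, 17 Oct 2010 05:24:20 -0700 (PDT)
Received: from bosmailscan05.eigbox.net ([10.20.15.5]) by bosmailout03.eigbox.net with esmtp (Exim); Sun, 17 Oct 2010 08:24:19 -0400
Received: (from lindaadleen2@localhost) by p3slh174.shr.phx3.secureserver.net; Sun, 17 Oct 2010 05:24:14 -0700
To: aict2011@qafqaz.edu.az
From: vangelis@mail.ru

message body
"""

msg = message_from_string(raw)
hops = msg.get_all("Received")  # topmost (most recent) hop comes first
route = list(reversed(hops))    # oldest hop first

print("Claimed sender:", msg["From"])
for number, hop in enumerate(route, start=1):
    where, _, when = hop.rpartition(";")  # the timestamp follows the last ";"
    print(number, where.strip(), "|", when.strip())
```

In this sample the first hop names the local account lindaadleen2 on a secureserver.net host, which does not match the mail.ru address claimed in From:.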
Email Physical Architecture and Protocols General architecture of email system and protocols:
How does email delivery work?
Email delivery is the whole process of transferring a message from the source to the destination. Let us go through the process step by step:
1. Using an email agent, the sender submits an email for smith@b.com.
2. The SMTP service of the mail server that received the sender's message resolves the email domain b.com. To do so, the mail server uses the DNS service (see "DNS resolving") to ask the name server (NS) of b.com for the MX record. The MX record specifies the mail server designated to receive all email for the domain b.com. In our example, the name of this mail server is mail.b.com.
3. The email is routed to the recipient's mail server, mail.b.com.
4. The SMTP service of mail.b.com places the email into the recipient's mailbox, smith, in the mail store.
5. The recipient checks for email for user smith@b.com using the POP3 service through his email agent. To access the mailbox, the user has to pass the authentication process of the POP3 service.
6. If the authentication module accepts the user's credentials, the email is downloaded to the user's email agent.
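The six steps above can be sketched as a small in-memory simulation. Nothing here touches the network: the dictionaries are hypothetical stand-ins for the DNS MX records, the mail store and the POP3 authentication module, and the function names and the password are assumptions made for illustration.

```python
# Hypothetical stand-ins for DNS, the mail store and the POP3 user database.
MX_RECORDS = {"b.com": "mail.b.com"}
MAIL_STORES = {"mail.b.com": {"smith": []}}
PASSWORDS = {("mail.b.com", "smith"): "secret"}


def deliver(recipient, message):
    """Steps 1-4: resolve the MX record and store the message."""
    user, _, domain = recipient.rpartition("@")
    mail_server = MX_RECORDS[domain]                # step 2: MX lookup
    MAIL_STORES[mail_server][user].append(message)  # steps 3-4: route and store
    return mail_server


def fetch(mail_server, user, password):
    """Steps 5-6: authenticate, then download the mailbox content."""
    if PASSWORDS.get((mail_server, user)) != password:
        raise PermissionError("authentication failed")
    return list(MAIL_STORES[mail_server][user])


server = deliver("smith@b.com", "Hello, Smith")
inbox = fetch(server, "smith", "secret")
print(server, inbox)
```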
How does email delivery work? Detailed structure of email delivery:
SMTP, Simple Mail Transfer Protocol
SMTP was defined in 1982 by RFC 821 and has been revised many times since. The latest update was made in 2008 by RFC 5321.
Advantages of SMTP:
SMTP is very popular because it is supported by all platforms and most vendors
SMTP is simple, so it has low implementation and administration costs
SMTP is persistent: it tries many times to resend messages whose delivery failed
Disadvantages of SMTP:
SMTP does not support binary data
SMTP does not support sender authentication
SMTP does not have any embedded encryption mechanism
SMTP protocol commands
Command                  Description
HELO                     Starts the SMTP conversation. The command is followed by the domain name of the sender.
EHLO                     Has the same meaning as HELO, but is used in the extended SMTP (ESMTP) protocol.
MAIL From                Initiates a new mail transaction. Takes a From argument with the sender's original email address, which is used as the From: field of the SMTP envelope.
RCPT To                  Identifies the recipient's email address, which is used as the To: field of the SMTP envelope. The command has to be repeated with different arguments in order to send the message to multiple recipients.
DATA                     Signifies that the email message body follows. The end of the message body is marked by a "." on a line by itself.
QUIT                     Releases the TCP/IP connection. To start another mail transaction, use the MAIL command before using the QUIT command.
VRFY                     Used by the source SMTP server to ask the destination SMTP server whether a given email username exists. Some servers disable this feature for security purposes.
Subject: Cc: Reply-To:   These header lines can be included in the content sent after the DATA command. They are not SMTP commands in their own right. They should be separated from the message body by an empty line.
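The order in which a client issues these commands can be sketched as a function that produces the lines of one mail transaction. It only builds the text and does not connect to any server; the function name and all addresses are hypothetical. Doubling a leading "." in body lines is a detail taken from RFC 5321 that keeps such a line from being mistaken for the end-of-data marker.

```python
def smtp_client_lines(client_domain, sender, recipients, headers, body):
    """Return the lines a client sends during one SMTP mail transaction."""
    lines = [f"HELO {client_domain}", f"MAIL FROM:<{sender}>"]
    # RCPT is repeated once for every recipient.
    lines += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    lines.append("DATA")
    # Header lines, an empty line, then the message body.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append("")
    for line in body.splitlines():
        # A body line starting with "." gets an extra "." (RFC 5321).
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a "." on a line by itself ends the message
    lines.append("QUIT")
    return lines


session = smtp_client_lines("a.com", "alice@a.com",
                            ["smith@b.com", "jones@b.com"],
                            {"Subject": "Hi"}, "Hello\n.hidden")
print("\n".join(session))
```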
POP3, Post Office Protocol (ver. 3)
Command   Description
USER      Specifies a valid username that has an account on the POP3 server.
PASS      Follows immediately after the USER command and specifies the password for user authentication on the POP3 server.
STAT      Returns the number of messages and the total size of the mailbox.
LIST      Lists the message number and size of each message. If a message number is specified, returns the size of the specified message.
LAST      Returns the message number of the last message not marked as read or deleted. Removed according to RFC 1725.
RETR      Returns the full text of the specified message and marks that message as read.
TOP       Returns the specified number of lines from the specified message number.
DELE      Marks the specified message for deletion.
RSET      Resets any messages that have been marked as read or deleted to the standard unread state.
NOOP      Returns a simple acknowledgement without performing any function.
APOP      Allows a more secure method of POP3 authentication, in which a cleartext password does not have to be sent. Instead, the client computes an MD5 digest from the server's timestamp (which typically contains a process ID and clock value) and the password, and sends it to the POP3 server.
QUIT      Ends the POP3 session.
UIDL      Returns a "unique-id listing" consisting of characters in the range 0x21 to 0x7E. The server never reuses a unique-id as long as the entity using it exists.
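The APOP digest can be sketched with Python's standard hashlib module. Following RFC 1939, the digest is the MD5 hash of the timestamp string from the server greeting followed by the shared password. The greeting timestamp, username and password below are hypothetical values.

```python
import hashlib


def apop_digest(timestamp, password):
    """Return the hexadecimal MD5 digest sent as the APOP argument."""
    # The timestamp is used exactly as received, including the angle brackets.
    return hashlib.md5((timestamp + password).encode("ascii")).hexdigest()


# Hypothetical greeting timestamp: <process-id.clock@hostname>
timestamp = "<1234.1305945835@pop.example.com>"
digest = apop_digest(timestamp, "secret")
print(f"APOP smith {digest}")
```

Because the timestamp changes with every connection, the digest is different each time, so an eavesdropper cannot simply replay it in a later session.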
SMTP/POP3 Emulation by Telnet
C:\> telnet smtp.mail.ru 25
220 smtp15.mail.ru ESMTP ready
helo
501 5.5.4 Invalid argument
helo ff
250 smtp15.mail.ru
auth login
......
235 Authentication succeeded
mail from:adamov_a@yahoo.com
501 sender address must match authenticated user
421 smtp15.mail.ru: SMTP command timeout - closing connection
SMTP/POP3 Emulation by Telnet
telnet pop.mail.ru 110
user username_here
pass password_here
list
retr number
dele number
quit