Huffman Code. Information Theory




Layer 5: Session Layer

(Figure: OSI reference model, from top to bottom: Application, Presentation, Session, Transport, Network, Data Link and Physical Layer.)

Layer 5 is the lowest of the application-oriented layers. It controls dialogs, i.e. the exchange of related information:
- Synchronization of partner instances by synchronization points: data may have been transferred correctly but nevertheless have to be partially retransmitted (e.g. if the sender crashes in the middle of the data transmission). Therefore, synchronization points can be set on layer 5 at arbitrary times during communication. If a connection breaks down, the entire data transmission does not have to be repeated; it can resume at the last synchronization point.
- Dialog management during half-duplex transmission: layer 5 controls the order in which the communication partners are allowed to send their data.
- Connection establishment, data transmission and connection termination for layers 5 to 7.
- Use of different tokens for assigning transmission rights, for terminating connections and for setting synchronization points.

Layer 6: Presentation Layer

Layer 6 hides the use of different data structures or differences in their internal representation. It guarantees that sender and receiver attach the same meaning to the data:
- Adapting character codes: ASCII (7-bit American Standard Code for Information Interchange) vs. EBCDIC (8-bit Extended Binary Coded Decimal Interchange Code)
- Adapting number notation: 32/40/56/64 bits; Little Endian (byte 0 of a word is on the right) vs. Big Endian (byte 0 is on the left)
- Abstract Syntax Notation One (ASN.1) as transfer syntax

Main tasks of layer 6:
1.) Negotiation of the transfer syntax
2.) Mapping of its own data representation onto the transfer syntax
3.) Additionally: data compression (source coding) and data encryption

Codes

Source coding generally converts the representation of messages into a sequence of code words. The goal is efficient coding: removing redundancy, i.e. data compression.

Codes are meaningful only if they are uniquely decodable, i.e. every character sequence consisting of code words can be split unambiguously into a sequence of code words. In communication, immediately decodable codes are important: a sequence of code words can be decoded unambiguously word by word from its beginning, without looking at the characters that follow.

Prefix code: no code word may be a prefix of another one. A prefix code is immediately decodable.

Example: C = {0, 10, 011, 11111} is uniquely decodable, but not immediately decodable (0 is a prefix of 011).

For every uniquely decodable code there exists an immediately decodable code whose code words are not longer.
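The difference between a merely uniquely decodable code and a prefix code can be checked mechanically. The following is a minimal sketch, assuming code words are given as strings of "0"/"1"; the helper names and the prefix code P = {0, 10, 110, 11100} are hypothetical, chosen to have the same word lengths as C. The Kraft sum (sum of 2^-length) is at most 1 for every uniquely decodable binary code, which is why a prefix code with the same lengths as C must exist.

```python
def is_prefix_code(words):
    """True if no code word is a prefix of another one (immediately decodable)."""
    for a in words:
        for b in words:
            if a != b and b.startswith(a):
                return False
    return True

def kraft_sum(words):
    """Sum of 2**-len(w); at most 1 for every uniquely decodable binary code."""
    return sum(2.0 ** -len(w) for w in words)

def decode_prefix(bits, words):
    """Decode word by word from the left; only correct for prefix codes."""
    result, current = [], ""
    for bit in bits:
        current += bit
        if current in words:  # a complete code word: emit it without looking ahead
            result.append(current)
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word: " + current)
    return result

C = ["0", "10", "011", "11111"]
P = ["0", "10", "110", "11100"]  # hypothetical prefix code with the same lengths as C

print(is_prefix_code(C))            # False: "0" is a prefix of "011"
print(is_prefix_code(P))            # True
print(kraft_sum(C), kraft_sum(P))   # 0.90625 0.90625
print(decode_prefix("0101100", P))  # ['0', '10', '110', '0']
```

With C, the greedy left-to-right decoder would fail: after reading "0" it cannot know whether the word is "0" or the start of "011" without looking at the following bits.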

Information Theory

What is information?

Definition: The mean information content (entropy) of a character is defined by

$$H = \sum_{i=1}^{N} p_i \log_a \frac{1}{p_i} = -\sum_{i=1}^{N} p_i \log_a p_i$$

with N: number of different characters, p_i: frequency (probability) of character i (i = 1, ..., N), a: base of the logarithm (a = 2 gives the entropy in bits).

Figuratively, the entropy indicates how surprised we are by the character that comes next.

Example 1: Given 4 characters, all N = 4 characters equally frequent (p_i = 0.25 for all i).

$$H = \sum_{i=1}^{4} 0.25 \log_2 4 = \log_2 4 = 2 \text{ bit}$$

There is no better coding than one with 2 bits per character.

Example 2: Given 4 characters, the first character has the frequency p_1 = 1, thus p_2 = p_3 = p_4 = 0.

$$H = 1 \cdot \log_2 1 + 3 \cdot \lim_{p \to 0} p \log_2 \frac{1}{p} = 0 + 0 = 0 \text{ bit}$$

The entropy is 0 bit: since only character 1 is ever transferred, it does not even need to be coded and transmitted.

Huffman Code

The entropy indicates the minimum mean number of bits per character needed for coding. A good approximation to this theoretical minimum (for the mean code word length) is obtained with a binary tree whose leaves are the characters to be coded.

Huffman code (a prefix code)
Precondition: the frequencies of occurrence of all characters are known.
Principle: more frequent characters get shorter code words than rarer ones.

1.) List all characters together with their frequencies.
2.) Select the two list elements with the smallest frequencies and remove them from the list.
3.) Make them the leaves of a tree whose root carries the sum of both probabilities; put this tree back into the list.
4.) Repeat steps 2 and 3 until the list contains only one element.
5.) Label all edges: father to left son with 0, father to right son with 1. The code words result from the paths from the root to the leaves.

Huffman Code - Example

The characters A, B, C, D and E are given with the probabilities p(A) = 0.27, p(B) = 0.36, p(C) = 0.16, p(D) = 0.14, p(E) = 0.07. Entropy: 2.13 bit.

(Figure: Huffman tree built in four merge steps:
1. E (0.07) and D (0.14) form p(ED) = 0.21
2. C (0.16) and ED (0.21) form p(CED) = 0.37
3. A (0.27) and B (0.36) form p(AB) = 0.63
4. CED (0.37) and AB (0.63) form the root p(ABCDE) = 1.00
In each merge, the left branch is labeled 0 and the right branch 1.)

Resulting code words: w(A) = 10, w(B) = 11, w(C) = 00, w(D) = 011, w(E) = 010

Frequency of Characters and Character Sequences (English language, in %)

Letter   %       Digram   %      Trigram   %
E        13.05   TH       3.16   THE       4.72
T        9.02    IN       1.54   ING       1.42
O        8.21    ER       1.33   AND       1.13
A        7.81    RE       1.30   ION       1.00
N        7.28    AN       1.08   ENT       0.98
I        6.77    HE       1.08   FOR       0.76
R        6.64    AR       1.02   TIO       0.75
S        6.46    EN       1.02   ERE       0.69
H        5.85    TI       1.02   HER       0.68
D        4.11    TE       0.98   ATE       0.66
L        3.60    AT       0.88   VER       0.63
C        2.93    ON       0.84   TER       0.62
F        2.88    HA       0.84   THA       0.62
U        2.77    OU       0.72   ATI       0.59
M        2.62    IT       0.71   HAT       0.55
P        2.15    ES       0.69   ERS       0.54
Y        1.51    ST       0.68   HIS       0.52
W        1.49    OR       0.68   RES       0.50
G        1.39    NT       0.67   ILL       0.47
B        1.28    HI       0.66   ARE       0.46
V        1.00    EA       0.64   CON       0.45
K        0.42    VE       0.64   NCE       0.43
X        0.30    CO       0.59   ALL       0.44
J        0.23    DE       0.55   EVE       0.44
Q        0.14    RA       0.55   ITH       0.44
Z        0.09    RO       0.55   TED       0.44

Codes like the Huffman code are not necessarily limited to individual characters. Depending on the application, it can be more useful to code whole character strings directly, e.g. frequent digrams and trigrams of the English language.
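The entropy formula and the Huffman procedure (steps 1.) to 5.)) can be checked against the example with a short sketch. It assumes probabilities are given as a dict and uses a min-heap to find the two smallest list elements; the convention "smaller probability goes left (edge 0)" is an assumption that happens to reproduce the code words from the example. The function names are hypothetical.

```python
import heapq
import math
from itertools import count

def entropy(probs):
    """H = -sum(p_i * log2 p_i) in bits; characters with p_i = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Return {character: code word}, merging the two least frequent elements each round."""
    if len(probs) == 1:  # a single character still needs a one-bit code word
        return {ch: "0" for ch in probs}
    order = count()  # tie-breaker so the heap never has to compare dicts
    # Each heap entry is (probability, tie-breaker, {character: partial code word}).
    heap = [(p, next(order), {ch: ""}) for ch, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_left, _, left = heapq.heappop(heap)    # smallest probability -> edge 0
        p_right, _, right = heapq.heappop(heap)  # second smallest -> edge 1
        merged = {ch: "0" + w for ch, w in left.items()}
        merged.update({ch: "1" + w for ch, w in right.items()})
        heapq.heappush(heap, (p_left + p_right, next(order), merged))
    return heap[0][2]

probs = {"A": 0.27, "B": 0.36, "C": 0.16, "D": 0.14, "E": 0.07}
code = huffman_code(probs)
mean_length = sum(p * len(code[ch]) for ch, p in probs.items())

print(code)  # {'C': '00', 'E': '010', 'D': '011', 'A': '10', 'B': '11'}
print(round(entropy(probs), 2), round(mean_length, 2))  # 2.13 2.21
```

The mean code word length of 2.21 bits lies slightly above the entropy of 2.13 bits, as expected for a code that must use a whole number of bits per character. The same `entropy` function gives 2.0 for Example 1 and 0.0 for Example 2.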

Arithmetic Coding

Characteristics:
- Achieves an optimal coding rate, just as Huffman coding does.
- Difference to Huffman: the entire data stream is assigned a probability, which is the product of the probabilities of the characters it contains. Coding a character takes all previous characters into account.
- The data are coded as an interval of real numbers between 0 and 1. Every value within the interval can be used as code word.
- The minimum length of the code is determined by the assigned probability.
- Disadvantage: the data stream can only be decoded as a whole.

Arithmetic Coding - Example

Code the data ACAB with p_A = 0.5, p_B = 0.2, p_C = 0.3. Each step divides the current interval into sub-intervals for A, B and C (in this order) in proportion to their probabilities:

(Figure: interval subdivision
- A: [0, 0.5), p_A = 0.5 (B: [0.5, 0.7), C: [0.7, 1))
- AC: [0.35, 0.5), p_AC = 0.15 (AA: [0, 0.25), AB: [0.25, 0.35))
- ACA: [0.35, 0.425), p_ACA = 0.075 (ACB: [0.425, 0.455), ACC: [0.455, 0.5))
- ACAB: [0.3875, 0.4025), p_ACAB = 0.015 (ACAA: [0.35, 0.3875), ACAC: [0.4025, 0.425)))

ACAB can be coded by any binary number from the interval [0.3875, 0.4025). The code length is -log2(p_ACAB) = 6.06, rounded up to 7 bits, e.g. 0.0110010 (binary) = 0.390625.

Layer 7: Application Layer

Collection of frequently used communication services:
- Identification of communication partners
- Detection of the availability of communication partners
- Authentication
- Negotiation of the quality of the transmission
- Synchronization of cooperating applications

Application Protocols in the TCP/IP Reference Model

(Figure: Internet protocol stack
- Application: HTTP (WWW), FTP and TFTP (file transfer), SMTP (e-mail), Telnet (virtual terminal), DNS (name service), SNMP (network management)
- Transport: TCP, UDP
- Internet: IP, ICMP, IGMP, ARP, RARP
- Layers 1/2: Ethernet, Token Ring, Token Bus, Wireless LAN)
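The arithmetic coding example (ACAB with p_A = 0.5, p_B = 0.2, p_C = 0.3) can be reproduced with exact fractions. This is a minimal sketch, not a practical coder (real implementations use integer arithmetic and renormalization). The rule used to pick the code word length is an assumption: take the shortest bit string b1..bk such that every number starting with 0.b1..bk lies in the interval, so trailing bits cannot change the result. That rule yields the 7 bits from the example: the 6-bit value 0.011001 lies in the interval, but 0.011001 followed by further 1 bits can exceed 0.4025. The decoder assumes the number of characters is known.

```python
import math
from fractions import Fraction

# Symbol probabilities from the example; dict order (A, B, C) fixes the sub-interval order.
PROBS = {"A": Fraction(1, 2), "B": Fraction(1, 5), "C": Fraction(3, 10)}

def encode_interval(message, probs):
    """Shrink [0, 1) once per character; returns the final interval [low, high)."""
    low, width = Fraction(0), Fraction(1)
    for ch in message:
        offset = Fraction(0)  # total probability of the symbols ordered before ch
        for sym, p in probs.items():
            if sym == ch:
                break
            offset += p
        low += offset * width
        width *= probs[ch]
    return low, low + width

def code_word(low, high):
    """Shortest b1..bk such that every number starting with 0.b1..bk lies in [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)          # first multiple of 2**-k at or above low
        if Fraction(m + 1, 2 ** k) <= high:  # [m/2**k, (m+1)/2**k) fits completely
            return format(m, "0{}b".format(k))
        k += 1

def decode(bits, length, probs):
    """Recover `length` characters from the code word 0.bits (length must be known)."""
    x = Fraction(int(bits, 2), 2 ** len(bits))
    low, width, out = Fraction(0), Fraction(1), []
    for _ in range(length):
        offset = Fraction(0)
        for sym, p in probs.items():
            if x < low + (offset + p) * width:  # x falls into sym's sub-interval
                out.append(sym)
                low += offset * width
                width *= p
                break
            offset += p
    return "".join(out)

low, high = encode_interval("ACAB", PROBS)
print(float(low), float(high))  # 0.3875 0.4025
bits = code_word(low, high)
print(bits, round(-math.log2(high - low), 2))  # 0110010 6.06
print(decode(bits, 4, PROBS))  # ACAB
```

Decoding repeats the same interval subdivision, which is why the whole stream has to be processed from the start.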

Application Protocols in the TCP/IP Reference Model

Protocols of the application layer are common communication services. They are defined for special purposes and specify:
- the types of the messages sent,
- the syntax of the message types,
- the semantics of the message types,
- rules defining when and how an application process sends a message or responds to it.

Usually they follow a client/server structure. Processes on the application layer use TCP/IP or UDP/IP sockets.

DNS - Domain Name System

IP addresses are difficult for humans to remember, but computers can deal with them perfectly. Symbolic names are easier for humans to handle, but computers unfortunately cannot deal with them.

(Figure: the top-level domain de contains rwth-aachen, which contains informatik; the host name metatron.informatik.rwth-aachen.de maps to the IP address 137.226.12.221.)

DNS - Concept

1. DNS manages the mapping of logical computer names to IP addresses (and further services).
2. DNS is a distributed database, i.e. the individual segments are subject to local control.
3. The structure of the database's name space reflects the administrative organization of the Internet.
4. The data of each local area are available in the entire network by means of a client/server architecture.
5. Robustness and speed of the system are achieved by replication and caching of the naming data.
6. Main components:
   - Name server: server which manages information about a part of the database
   - Resolver: client which requests naming information from name servers

DNS - Architecture

(Figure: DNS architecture with the components user program, resolver, name server, master files, shared database, remote resolver and remote name servers, exchanging requests, responses, references, updates and administrative requests/responses.)

Structure of the Database

To structure all information, the database is represented as a tree:
- Each node of the tree is marked with a label that identifies it relative to its father node.
- Each (internal) node is the root of a sub-tree.
- Each of these sub-trees represents a domain.
- Each domain can be divided into sub-domains.

Domain Names

The name of a domain consists of the sequence of labels (separated by ".") starting at the root of the domain and going up to the root of the whole tree. The leaf nodes store the IP addresses associated with the names given by the label sequence.

(Figure: domain tree with generic domains (com, edu, gov, mil) and country domains (e.g. se, de); the domain de contains the sub-domain rwth-aachen, which in turn contains informatik. The logical name metatron.informatik.rwth-aachen.de has the associated IP address 137.226.12.221.)

Administration of a Domain
- Each domain can be managed by a different organization.
- The responsible organization can split a domain into sub-domains and delegate responsibility for them to other organizations.
- The father domain manages pointers to the roots of the sub-domains in order to forward requests to them.
- The name of a domain corresponds to the domain name of its root node.

Index of the Database
- The names of the domains serve as index for the database.
- Each computer in the network has a domain name which refers to further information concerning the computer.
- The data associated with a domain name are stored in so-called Resource Records (RR).

(Figure: example tree with the domains edu, com, gov, mil and ca, or, nv, with sub-domains such as berkeley, ba, la, oakland and rinkon. The upper levels are managed by the Network Information Center, the domain berkeley.edu by UC Berkeley. A leaf node stores the IP address 192.2.18.44.)
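The tree structure can be mimicked with nested dictionaries: each node is a dict keyed by its children's labels, and a domain name is obtained by collecting labels from a node up to the root. This is a toy sketch for illustration only; the `@address` marker key and the helper names are hypothetical, and real name servers store Resource Records per zone rather than one global tree.

```python
ADDRESS = "@address"  # hypothetical marker key for data stored at a node (not a label)

def add_host(tree, name, address):
    """Insert a host into a nested-dict tree, walking the labels from the root down."""
    node = tree
    for label in reversed(name.rstrip(".").split(".")):
        node = node.setdefault(label, {})  # each child is identified by its label
    node[ADDRESS] = address

def lookup(tree, name):
    """Follow the labels from the root; return the stored address or None."""
    node = tree
    for label in reversed(name.rstrip(".").split(".")):
        if label not in node:
            return None
        node = node[label]
    return node.get(ADDRESS)

def domain_names(tree, suffix=""):
    """Yield every domain name: the labels from a node up to the root, joined by '.'."""
    for label, child in tree.items():
        if label == ADDRESS:
            continue
        name = label + "." + suffix if suffix else label
        yield name
        yield from domain_names(child, name)

tree = {}
add_host(tree, "metatron.informatik.rwth-aachen.de", "137.226.12.221")
print(lookup(tree, "metatron.informatik.rwth-aachen.de"))  # 137.226.12.221
print(list(domain_names(tree)))
# ['de', 'rwth-aachen.de', 'informatik.rwth-aachen.de', 'metatron.informatik.rwth-aachen.de']
```

Each internal node (de, rwth-aachen.de, informatik.rwth-aachen.de) is the root of a sub-tree and therefore names a domain, while the leaf carries the address.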

Domain Name Aliases
- Computers can have one or more secondary names, so-called domain name aliases.
- Aliases are pointers from one domain name to another one (the canonical domain name).

Name Space
- The inverted tree represents the Domain Name Space.
- The depth of the tree is limited to 127 levels.
- Labels can have up to 63 characters.
- A label of length 0 is reserved for the root node.
- The Fully Qualified Domain Name (FQDN) is the absolute domain name, given relative to the root of the tree, e.g. informatik.rwth-aachen.de. (the final dot stands for the root).
- Domain names that are given not relative to the root of the tree, but relative to another domain, are called relative domain names.

(Figure: us tree with ca, or and nv; below them ba and la, and below ba the nodes oakland, rinkon and mailhub. The node rinkon stores the IP address 192.2.18.44; for the alias mailhub no IP address is stored, but the logical name rinkon.ba.ca.us.)

Domains
- A domain consists of all computers whose domain name lies within the domain.
- Leaves of the tree represent individual computers and refer to network addresses, hardware information and mail routing information.
- Internal nodes of the tree can describe both a computer and a domain.
- Domains are often denoted relatively or by their level:
  Top-level domain: child of the root node
  First-level domain: child of the root node (i.e. a top-level domain)
  Second-level domain: child of a first-level domain, etc.

Top-Level Domains

Originally the name space was divided into seven top-level domains:
1. com: commercial organizations
2. edu: educational organizations
3. gov: government organizations
4. mil: military organizations
5. net: network organizations
6. org: non-commercial organizations
7. int: international organizations

Additionally, each country got its own top-level domain. In the meantime, the name space has been extended by further top-level domains. Within the individual top-level domains, different structuring conventions apply:
- Australia: edu.au, com.au, etc.
- UK: co.uk (commercial organizations), ac.uk (academic organizations), etc.
- Germany: completely unstructured
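Relative names, FQDNs and aliases can be tied together in a small sketch: a relative name is completed with an origin domain, label lengths are checked against the 63-character limit, and aliases are followed until a name with an address is reached. The records below are hypothetical and only modelled on the figure (mailhub as alias of rinkon.ba.ca.us.); the dictionaries stand in for what a name server would hold as resource records.

```python
# Hypothetical records modelled on the figure: the alias carries no address,
# only a pointer to its canonical name.
ALIASES = {"mailhub.ba.ca.us.": "rinkon.ba.ca.us."}
ADDRESSES = {"rinkon.ba.ca.us.": "192.2.18.44"}
MAX_LABEL = 63

def qualify(name, origin):
    """Turn a relative domain name into an FQDN by appending the origin domain."""
    if name.endswith("."):
        fqdn = name  # already absolute: ends with the empty root label
    else:
        fqdn = name + "." + origin.rstrip(".") + "."
    for label in fqdn[:-1].split("."):
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError("invalid label length: %r" % label)
    return fqdn

def resolve(name, origin, max_aliases=8):
    """Follow alias pointers to the canonical name; return (canonical name, address)."""
    fqdn = qualify(name, origin)
    for _ in range(max_aliases):  # guard against alias loops
        if fqdn in ADDRESSES:
            return fqdn, ADDRESSES[fqdn]
        if fqdn not in ALIASES:
            return fqdn, None
        fqdn = ALIASES[fqdn]
    raise RuntimeError("alias chain too long")

print(qualify("oakland", "ba.ca.us"))  # oakland.ba.ca.us.
print(resolve("mailhub", "ba.ca.us"))  # ('rinkon.ba.ca.us.', '192.2.18.44')
```

The loop limit is an arbitrary safety choice for the sketch; it prevents two aliases pointing at each other from looping forever.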

Name Servers and Zones
- Information about the name space is stored in name servers.
- Name servers manage the complete information for a certain part of the name space; this part is called a zone.
- The information about a zone is loaded either from a file or from another name server.
- The name server has authority for the zone.
- A name server can be responsible for several zones.

Domains and Zones

Domain and zone are different concepts. Through delegation, zones are (except at the lowest levels of the tree) smaller than domains, therefore name servers have to manage less information.

(Figure: the edu domain with the sub-domains berkeley, nwu and purdue, next to com and org. The edu zone covers the edu node, while berkeley.edu and purdue.edu are delegated as separate zones.)

Zones

There are no guidelines on how domains are divided into zones; each domain can choose its own division. Some zones (e.g. edu) do not manage IP addresses; they only store references to other zones.

Name Resolution
- Generally: mapping of names to addresses.
- The term name resolution also designates the process in which a name server searches the name space for data it is not responsible for.
- For the search, a name server needs the domain name and the addresses of the root name servers.
- A name server can ask a root name server about every name in the name space.
- Root name servers know the responsible name servers for each top-level domain.
- On request, a root name server returns the names and addresses of the name servers responsible for the top-level domain of the searched name.
- The top-level name servers in turn manage references to the name servers responsible for the second-level domains.
- If additional information is missing, each search begins at the root name servers.

Iterative and Recursive Name Resolution

(Figure: resolution of girigiri.gbrmpa.gov.au. The resolver sends a request to its name server. That name server asks the root name server, which returns a reference to the au name server; the au name server returns a reference to gov.au; the gov.au name server returns a reference to gbrmpa.gov.au; the gbrmpa.gov.au name server returns the address of girigiri.gbrmpa.gov.au.)

One distinguishes between recursive and iterative requests, and correspondingly between recursive and iterative resolution:
- In recursive resolution, a resolver sends a recursive request to a name server. The name server must answer either with the requested information or with an error message, i.e. it may not refer to another name server.
- If the addressed name server is not responsible for the requested information, it must contact other name servers. It can start a recursive or an iterative request; usually it uses an iterative request.
- With the iterative request, the name server tries to shorten the resolution process by directing the request to the name server best suited for the requested information (i.e. if known, a name server on a lower level is contacted instead of the root name server).

Root Name Server
- Requests that a name server cannot answer are passed upward in the tree.
- Name servers on the upper levels are heavily loaded.
- Requests that go into another zone often pass through the root name server.
- Thus, the root name server must always be available.
- Therefore: replication. There are 13 instances of the root name server, more or less distributed over the whole world.
- Problem: the placement of the root name servers is very centralized.

Mapping of Addresses to Names
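The difference between following references yourself (iterative) and letting the contacted server do the work (recursive) can be simulated in memory. This is a minimal sketch under stated assumptions: the server names (ns.au, ns.gov.au, ...) and the address 192.0.2.10 (from a range reserved for documentation) are made up, and `query` models a single non-recursive request that returns either an answer, a reference or an error.

```python
# Hypothetical delegation data modelled on the girigiri.gbrmpa.gov.au figure.
SERVERS = {
    "root":             {"refer": {"au.": "ns.au"}},
    "ns.au":            {"refer": {"gov.au.": "ns.gov.au"}},
    "ns.gov.au":        {"refer": {"gbrmpa.gov.au.": "ns.gbrmpa.gov.au"}},
    "ns.gbrmpa.gov.au": {"answer": {"girigiri.gbrmpa.gov.au.": "192.0.2.10"}},
}

def query(server, name):
    """One iterative request: returns an answer, a reference, or an error."""
    data = SERVERS[server]
    if name in data.get("answer", {}):
        return "answer", data["answer"][name]
    for zone, next_server in data.get("refer", {}).items():
        if name == zone or name.endswith("." + zone):
            return "refer", next_server
    return "error", None

def resolve_iteratively(name, start="root"):
    """The asking name server follows every reference itself, starting at the root."""
    server, trace = start, []
    while True:
        kind, value = query(server, name)
        trace.append((server, kind, value))
        if kind != "refer":
            return value, trace
        server = value

def resolve_recursively(name, server="root"):
    """Each contacted server passes the request on; only the final result travels back."""
    kind, value = query(server, name)
    return resolve_recursively(name, value) if kind == "refer" else value

address, trace = resolve_iteratively("girigiri.gbrmpa.gov.au.")
for step in trace:
    print(step)
print(address, resolve_recursively("girigiri.gbrmpa.gov.au."))
```

The trace shows the four requests from the figure (root, au, gov.au, gbrmpa.gov.au). A cache of earlier references would let the name server skip the upper levels, which is the shortcut described above.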
- Information in the database is indexed by names.
- Mapping a name to an address is simple.
- Mapping an address to a name is more difficult (it would require a complete search of the name space).
- Solution: a special area of the name space that uses addresses as labels, the in-addr.arpa domain.
- Nodes in this domain are labeled in the usual notation for IP addresses (four octets separated by dots).
- The in-addr.arpa domain has 256 sub-domains, each of which again has 256 sub-domains, and so on. On the fourth level, the resource records are attached to the last octet; they refer to the domain name of the computer or network with that address.
- The IP address appears backwards because the name is read starting from the leaf node (IP address 15.16.192.152 => sub-domain 152.192.16.15.in-addr.arpa).
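Building the in-addr.arpa name is a simple string operation: reverse the four octets and append the domain. The sketch below does this by hand (the helper name is hypothetical) and compares it with the `reverse_pointer` attribute of Python's standard `ipaddress` module, which produces the same name.

```python
import ipaddress

def reverse_name(address):
    """Build the in-addr.arpa name of an IPv4 address by reversing its four octets."""
    octets = address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: " + address)
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(reverse_name("15.16.192.152"))                         # 152.192.16.15.in-addr.arpa
print(ipaddress.ip_address("15.16.192.152").reverse_pointer)  # 152.192.16.15.in-addr.arpa
```

A reverse lookup then queries this name like any other domain name, walking down the tree 15, 16, 192, 152.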

(Figure: the in-addr.arpa tree below arpa, with sub-domains 0 to 255 on each level; the path 15, 16, 192, 152 leads to the node for the address 15.16.192.152, which refers to the host winnie.corp.hp.com.)

Caching & Time to Live
- Caching is the buffering of information in a name server that is not responsible for this information. For further requests, the information is already present and the resolution process can be sped up.
- Not only information about the requested hosts is stored, but also all information about other name servers used in the resolution process.
- The Time to Live (TTL) indicates how long data may be buffered. The TTL limits how long outdated information can be used.
- A small TTL gives high consistency; a large TTL gives faster name resolution.

DNS Protocol

DNS defines only one message format, which is used both for requests and for responses:
- Identification: 16 bits that uniquely identify a request, so that requests and responses can be matched.
- Flags: 16 bits, marking among other things (1) request/response, (2) authoritative/non-authoritative, (3) iterative/recursive, (4) recursion available.
- Number of ...: the number of questions and of resource records in each of the following sections.
- Questions: names to be resolved.
- Answers: resource records answering the question. If the searched name is only an alias, this includes the resource record pointing to the canonical name.
- Authority: resource records identifying the responsible name servers.
- Additional information: further data related to the request.

(Figure: message layout: Identification | Flags; Number of Questions | Number of Answer RRs; Number of Authority RRs | Number of Additional RRs; followed by Questions, Answers, Authority and Additional information, each with a variable number of entries.)

Evolution of the WWW

World Wide Web (WWW): access to linked documents distributed over many computers in the Internet.

History of the WWW
- Origin: 1989 at the nuclear research laboratory CERN in Switzerland, developed to exchange data, figures, etc. between a large number of geographically distributed project partners via the Internet.
- First text-based version in 1990.
- First graphical interface (Mosaic) in February 1993, further developed into Netscape and Internet Explorer.
- Standardization by the World Wide Web Consortium (http://www.w3.org).
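Returning to the DNS message format described earlier: the fixed part is a 12-byte header of six 16-bit fields (identification, flags and four counters), followed by the sections. The sketch below packs a request with Python's `struct` module and decodes the flag bits. The identifier 0x1234 is an arbitrary example value; the bit positions (QR 0x8000, AA 0x0400, RD 0x0100, RA 0x0080) and the type/class code 1 (A record, Internet) follow the DNS specification. Names are encoded as length-prefixed labels ending with the zero-length root label.

```python
import struct

def encode_name(name):
    """Encode a domain name as length-prefixed labels, ending with the empty root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label must have 1 to 63 characters: %r" % label)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(ident, name, recursion_desired=True, qtype=1, qclass=1):
    """Header (six 16-bit fields) plus one question; qtype 1 = A record, qclass 1 = Internet."""
    flags = 0x0100 if recursion_desired else 0  # QR = 0 (request), RD bit = recursive request
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)  # 1 question, no resource records
    return header + encode_name(name) + struct.pack("!2H", qtype, qclass)

def parse_header(message):
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "response": bool(flags & 0x8000),             # request/response
        "authoritative": bool(flags & 0x0400),        # authoritative answer
        "recursion_desired": bool(flags & 0x0100),    # recursive request
        "recursion_available": bool(flags & 0x0080),  # server offers recursion
        "counts": (qd, an, ns, ar),
    }

query = build_query(0x1234, "metatron.informatik.rwth-aachen.de")
print(query[:12].hex())  # 123401000001000000000000
print(len(query))        # 52
print(parse_header(query))
```

A response reuses the identification of the request, which is how the resolver matches the two.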

Communication in the WWW

The client/server model is used:

Client (a browser):
- Presents the currently loaded WWW page.
- Permits navigating in the network (e.g. by clicking on a hyperlink).
- Offers a number of additional functions (e.g. external viewers or helper applications).
- Usually, a browser can also be used for other services (e.g. FTP, e-mail, news, ...).

Server:
- Process which manages WWW pages.
- Is addressed by the client, e.g. by specifying a URL (Uniform Resource Locator = logical address of a web page).
- Sends the requested page (or file) back to the client.

WWW, HTML, URL and HTTP
- WWW stands for World Wide Web and means the world-wide cross-linking of information and documents.
- The standard protocol used between a web server and a web client is the HyperText Transfer Protocol (HTTP). HTTP uses TCP port 80, defines the allowed requests and responses, and is an ASCII protocol.
- Each web page is addressed by a unique URL (Uniform Resource Locator), e.g. http://www-i4.informatik.rwth-aachen.de/education/tcpip.
- The standard language for web documents is the HyperText Markup Language (HTML).

HTTP - Message Format

(Figure: structure of the command GET http://www.informatik.rwth-aachen.de/info/general.html: GET is the command, followed by the URL consisting of protocol (http), domain name (www.informatik.rwth-aachen.de), path (info) and file (general.html).)

Instructions on a URL are:
- GET: load a web page
- HEAD: load only the header of a web page
- PUT: store a web page on the server
- POST: append something to the resource named in the request on the web server
- DELETE: delete a web page

Loading of Web Pages

(Figure: the browser on a PC asks DNS for the IP address of the server; DNS answers; the browser opens a TCP connection to port 80 of the server computer and sends the command GET /info/general.html; the WWW server sends back the file general.html; the connection is terminated.)
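Before a browser can send its request, it has to split the URL into the server name (to ask DNS), the port (80 unless given) and the path (for the request line). A minimal sketch using Python's standard `urllib.parse.urlsplit`; the helper name is hypothetical, only plain http URLs are accepted, and the request line format follows the HTTP/1.0 behaviour described in the loading example.

```python
from urllib.parse import urlsplit

def request_line_for(url):
    """Split a URL into server name, port and the HTTP/1.0 request line."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only http URLs are handled in this sketch")
    host = parts.hostname          # name the browser resolves via DNS
    port = parts.port or 80        # default TCP port of HTTP
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, "GET {} HTTP/1.0".format(path)

print(request_line_for("http://www.informatik.rwth-aachen.de/info/general.html"))
# ('www.informatik.rwth-aachen.de', 80, 'GET /info/general.html HTTP/1.0')
```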

Loading of Web Pages

Example: calling the URL http://www.informatik.rwth-aachen.de/material/general.html
1. The browser determines the URL (which was clicked or typed).
2. The browser asks DNS for the IP address of the server www.informatik.rwth-aachen.de.
3. DNS answers with 137.226.116.241.
4. The browser opens a TCP connection to port 80 of the computer 137.226.116.241.
5. Afterwards, the browser sends the command GET /material/general.html.
6. The WWW server sends back the file general.html.
7. The connection is terminated.
8. The browser analyzes the WWW page general.html and presents the text.
9. If necessary, each picture is loaded over a new connection to the server (its address is included in the page general.html in the form of a URL).

Note: step 9 applies only to HTTP/1.0. With the newer version HTTP/1.1, the connection is kept open so that all referenced pictures are loaded over it before it is terminated (more efficient for pages with many pictures).

HTTP Request Header

method sp URL sp version cr lf
header field: value cr lf
...
header field: value cr lf
cr lf
Data

(sp: space, cr lf: carriage return/line feed)

- Request line: mandatory part, e.g. GET /path/file.type
- Header lines: optional, further information about the host/document, e.g.
  Host: www.rwth-aachen.de
  Accept-language: fr
  User-agent: Opera/6.0
- Entity body: optional; further data if the client transmits data (POST method)

HTTP Response Header

version sp status code sp phrase cr lf
header field: value cr lf
...
header field: value cr lf
cr lf
Data

- Entity body: the requested data. With the HEAD method, the server answers but does not transmit the requested data (useful for debugging).
- Status line: status code and phrase indicate the result of a request and an associated message, e.g.
  200 OK
  400 Bad Request
  404 Not Found
- Groups of status codes:
  1xx: information only
  2xx: successful request
  3xx: further action necessary
  4xx: client error (e.g. syntax)
  5xx: server error

Proxy

A proxy is an intermediate entity used by several browsers. It takes over tasks from the browsers (reducing their complexity) and provides for more efficient page loading.

(Figure: browsers communicate with the proxy via HTTP; the proxy accesses the Internet, e.g. via HTTP.)

- Caching of WWW pages: a proxy temporarily stores the pages loaded by browsers. If a browser requests a page that is already in the cache, the proxy checks whether the page has changed since it was stored. If not, the page can be returned from the cache. If it has, the page is normally loaded from the server and stored in the cache again, replacing the old version.
- Support for additional protocols: a browser also provides access to FTP, News, Gopher or Telnet servers etc. Instead of implementing all protocols in the browser, this can be done in the proxy. The proxy then speaks HTTP with the browser and, e.g., FTP with an FTP server.
- Integration into a firewall: the proxy can deny access to certain web pages (e.g. in schools).
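The request and response layouts above (start line, header lines, empty line, body, all separated by CR LF) can be produced and parsed with plain string operations. This sketch does not open a network connection; the `User-Agent` value and the sample response text are hypothetical, and header names are lower-cased on parsing because HTTP header names are case-insensitive.

```python
CRLF = "\r\n"
STATUS_GROUPS = {1: "information", 2: "success", 3: "further action necessary",
                 4: "client error", 5: "server error"}

def build_request(method, path, headers, body=""):
    """Request line, header lines, empty line, optional entity body."""
    lines = ["{} {} HTTP/1.1".format(method, path)]
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body

def parse_response(raw):
    """Split a response into version, status code, phrase, headers and entity body."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

request = build_request("GET", "/material/general.html",
                        {"Host": "www.informatik.rwth-aachen.de",
                         "User-Agent": "example-client/0.1"})  # hypothetical agent name
print(repr(request))

raw = "HTTP/1.1 404 Not Found" + CRLF + "Content-Type: text/html" + CRLF + \
      "Content-Length: 9" + CRLF + CRLF + "not found"  # hypothetical response
version, code, phrase, headers, body = parse_response(raw)
print(code, phrase, STATUS_GROUPS[code // 100])  # 404 Not Found client error
```

HTTP/1.1 requires the Host header, which is why it appears in the sketch; with HTTP/1.0 the request line alone would be valid.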

Electronic Mail: E-Mail

Early systems: a simple file transfer took place, with the convention that the first line contains the address of the receiver of the file.

Problems: e-mail to groups, structuring of e-mails, delegation of the administration to a secretary, a file editor as user interface, no mixed media.

Solution: X.400 as standard for e-mail transfer. However, this specification was too complex and badly designed. What became generally accepted was a simpler system, cobbled together by a handful of computer science students: the Simple Mail Transfer Protocol (SMTP).

An e-mail system generally consists of two subsystems:
- User Agent (UA, normal e-mail program): usually runs on the user's computer and helps with processing e-mails: creating new and answering old e-mails, receiving and presenting e-mails, administering received e-mails.
- Message Transfer Agent (MTA, e-mail server): usually runs in the background (around the clock); delivers e-mails sent by users and temporarily stores messages for users or other message transfer agents.

(Figure: user agents exchange e-mails via message transfer agents over the Internet.)

Structure of an E-Mail

To send an e-mail, the following information is needed from the user:
- Message (usually normal text plus attachments, e.g. Word file, GIF image, ...)
- Destination address (generally in the form mailbox@location, e.g. thissen@informatik.rwth-aachen.de)
- Possibly additional parameters concerning e.g. priority or security

E-mail formats: two standards are used, RFC 2822 and MIME (Multipurpose Internet Mail Extensions).

With RFC 2822, an e-mail consists of a simple envelope (created by the message transfer agent based on the data in the e-mail header), a set of header fields (each one line of ASCII text), a blank line, and the actual message (message body).

E-Mail Header
Header        Meaning
To:           Address of the main receiver (possibly several receivers or a mailing list)
Cc:           Carbon copy: e-mail addresses of less important receivers
Bcc:          Blind carbon copy: receivers not shown to the other receivers
From:         Person who wrote the message
Sender:       Address of the actual sender of the message (may differ from the From person)
Received:     One entry per message transfer agent on the path to the receiver
Return-Path:  Path back to the sender (usually only the e-mail address of the sender)
Date:         Date and time of transmission
Reply-To:     E-mail address to which answers are to be sent
Message-Id:   Unique identification number of the e-mail (for later references)
In-Reply-To:  Message-Id of the message to which this answer is directed
References:   Other relevant Message-Ids
Subject:      One line indicating the contents of the message (shown to the receiver)
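Several of these header fields can be filled in with Python's standard `email` package, which also produces the RFC 2822 layout (header lines, blank line, body). A minimal sketch with hypothetical addresses and subject; Received and Return-Path are normally added by the message transfer agents on the way, so the sketch does not set them, and Bcc is usually not transmitted to the other receivers.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

# Hypothetical addresses; only the header names come from the table above.
msg = EmailMessage()
msg["From"] = "Alice Example <alice@example.org>"
msg["To"] = "bob@example.org"
msg["Cc"] = "carol@example.org"
msg["Reply-To"] = "team@example.org"
msg["Subject"] = "Meeting notes"
msg["Date"] = formatdate(localtime=True)             # transmission date and time
msg["Message-ID"] = make_msgid(domain="example.org")  # unique id for later references
msg.set_content("Hello Bob,\nthe notes follow in the next e-mail.\n")

# RFC 2822 layout: header lines, one blank line, then the message body.
text = msg.as_string()
head, _, body = text.partition("\n\n")
print(head)
print(body)
```

Besides the fields set explicitly, `set_content` adds MIME headers (MIME-Version, Content-Type, Content-Transfer-Encoding) to the head.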

E-Mail Header
RFC 822 (like its successor RFC 2822) is only suitable for messages of pure ASCII text without special characters. Nowadays the following is additionally demanded:
- E-mail in languages with special characters (e.g. French or German)
- E-mail in languages not using the Latin alphabet (e.g. Russian)
- E-mail in languages not using an alphabet at all (e.g. Japanese)
- E-mail not consisting entirely of text (e.g. audio or video)
MIME keeps the RFC 2822 format, but additionally defines a structure for the message body (using additional headers) and encoding rules for non-ASCII characters.

Header                       Meaning
MIME-Version:                Version of MIME used
Content-Description:         String describing the contents of the message
Content-Id:                  Unique identifier for the contents
Content-Transfer-Encoding:   Encoding selected for the contents of the e-mail (some networks only understand e.g. ASCII characters); examples: base64, quoted-printable
Content-Type:                Type/subtype according to RFC 1521, e.g. text/plain, image/jpeg, multipart/mixed

MIME
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-2120168431-824156555=:325"

--8323328-2120168431-824156555=:325
Content-Type: TEXT/PLAIN; charset=us-ascii

A picture is in the appendix
--8323328-2120168431-824156555=:325
Content-Type: IMAGE/JPEG; name="picture.jpg"
Content-Transfer-Encoding: BASE64
Content-ID: <PINE.LNX.3.91.960212212235.325B@localhost>
Content-Description:

/9j/4AAQSkZJRgABAQEAlgCWAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQIBAQEBA
QIBAQECAgICAgICAgIDAwQDAwMDAwICAwQDAwQEBAQEAgMFBQQEBQQEBAT/
2wBDAQEBAQEBAQIBAQIEAwIDBAQEBA [...]
KKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAoooo
AKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiig
AooooAD//Z
--8323328-2120168431-824156555=:325--

E-Mail over POP3 and SMTP
Simple Mail Transfer Protocol (SMTP)
- Sends e-mails over a TCP connection (port 25)
- SMTP is a simple ASCII protocol, without checksums and without encryption
- The receiving machine is the server and begins the communication. If the server is ready to receive, it signals this to the client. The client then states from whom the e-mail comes and who the receiver is. If the receiver is known to the server, the client sends the message and the server confirms its receipt.
Post Office Protocol version 3 (POP3)
- Gets e-mails from the server over a TCP connection (port 110)
- Commands for logging in and out, downloading messages, and deleting messages on the server (possibly without transferring them to the client)
- Only copies e-mails from the remote server to the local system

Sending and receiving an e-mail (SMTP between UA 1 and MTA 1 and between the two MTAs, POP3 between MTA 2 and UA 2):
1. User 1 writes an e-mail.
2. Client 1 (UA 1) formats the e-mail, produces the receiver list, and sends the e-mail to its mail server (MTA 1).
3. Server 1 (MTA 1) sets up a connection to the SMTP server (MTA 2) of the receiver and sends a copy of the e-mail.
4. Server 2 (MTA 2) produces the header of the e-mail and places the e-mail into the appropriate mailbox.
5. Client 2 (UA 2) sets up a connection to the mail server and authenticates itself with user name and password (unencrypted!).
6. Server 2 (MTA 2) sends the e-mail to the client.
7. Client 2 (UA 2) formats the e-mail.
8. User 2 reads the e-mail.
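In the flow above, the sending user agent formats the e-mail before handing it to its mail server. For a message with an attachment, this means producing a MIME structure like the multipart/mixed example shown earlier. A minimal sketch using Python's standard `email` package follows; the function name, the subject line and the placeholder JPEG bytes are assumptions, and the library (not the slide) chooses the boundary string.

```python
from email.message import EmailMessage


def build_mail_with_picture(sender: str, recipient: str, jpeg: bytes) -> EmailMessage:
    """Build a multipart/mixed message: a text/plain part plus a JPEG attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Picture"  # hypothetical subject
    # set_content creates the text/plain body and adds "MIME-Version: 1.0".
    msg.set_content("A picture is in the appendix")
    # add_attachment converts the message to multipart/mixed; binary data
    # gets Content-Transfer-Encoding: base64 by default.
    msg.add_attachment(jpeg, maintype="image", subtype="jpeg", filename="picture.jpg")
    return msg


fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(16)  # placeholder bytes, not a real image
msg = build_mail_with_picture("Krogull@abc.com", "Bolke@beta.edu", fake_jpeg)
for part in msg.walk():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])
```

Calling `msg.as_string()` shows the serialized form: the same layout as the slide's example, with a generated boundary between the text part and the base64-encoded image.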

SMTP - Command Sequence
Communication between the partners (from abc.com to beta.edu) takes place in text form, for example:
S: 220 <beta.edu> Service Ready             /* Receiver is ready */
C: HELO <abc.com>                           /* Identification of the sender */
S: 250 <beta.edu> OK                        /* Server announces itself */
C: MAIL FROM:<Krogull@abc.com>              /* Sender of the e-mail */
S: 250 OK                                   /* Sending is permitted */
C: RCPT TO:<Bolke@beta.edu>                 /* Receiver of the e-mail */
S: 250 OK                                   /* Receiver known */
C: DATA                                     /* The data follow */
S: 354 Start mail input; end with <crlf>.<crlf> on a line by itself
C: From: Krogull@... <crlf>.<crlf>          /* Transfer of the whole e-mail, including all headers */
S: 250 OK
C: QUIT                                     /* Terminating the connection */
S: 221 <beta.edu> Closing
S = server, receiving MTA / C = client, sending MTA

POP3
Getting e-mails from the server by means of POP3:
[Figure: a client (UA) on a PC opens a TCP connection across the TCP/IP network to port 110 of the POP3 server (MTA); the server sends a greeting, the client sends commands, the server sends replies.]
Minimal protocol with only two types of commands:
- Copy e-mails to the local computer
- Delete e-mails from the server
Authorization phase: USER name, PASS string
Transaction phase: STAT, LIST [msg], RETR msg, DELE msg, NOOP, RSET, QUIT

POP3 Protocol
Authorization phase:
- user: identifies the user
- pass: the user's password
- +OK or -ERR are the possible answers
Transaction phase:
- list: lists the message numbers and message sizes
- retr: requests a message by its number
- dele: deletes the corresponding message
S: +OK POP3 server ready
C: user alice
S: +OK
C: pass hungry
S: +OK user successfully logged in
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: <message 1 contents>
S: .
C: dele 1
C: retr 2
S: <message 2 contents>
S: .
C: dele 2
C: quit
S: +OK

IMAP as POP3 Variant
Enhancement of POP3: IMAP (Interactive Mail Access Protocol, today Internet Message Access Protocol)
- TCP connection over port 143
- E-mails are not downloaded and stored locally, but remain on the server. The client performs all actions remotely.
- This is suitable for users who need access to their e-mails from different hosts.
- The protocol is more complex than POP3: remote mailboxes have to be set up and managed.
Meanwhile, many web portals also offer e-mail services (gmx, web.de, yahoo, ...). Here, HTTP serves once again as the protocol for accessing the e-mails. The management is similar to IMAP, except that the client is integrated into the web page.
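The SMTP and POP3 transcripts earlier rely on the same framing rule: a multi-line block (the SMTP DATA section, a POP3 LIST or RETR reply) ends with a line containing only ".", so a message line that itself begins with "." is sent with an extra "." (dot-stuffing, described in RFC 5321 section 4.5.2 and RFC 1939). The sketch below, in Python with hypothetical helper names, builds the client side of the SMTP dialogue and reads a POP3 multi-line reply. It opens no network connections and does not check reply codes, which a real client must do before each step (e.g. wait for 354 before sending the data). On the wire, the HELO argument is the bare domain; the angle brackets in the transcript are placeholders.

```python
def dot_stuff(text: str) -> str:
    """Encode a message for SMTP DATA: CRLF lines, dot-stuffing, final <CRLF>.<CRLF>."""
    lines = text.splitlines()
    # A line starting with "." gets an extra "." so it is not mistaken for the end marker.
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\r\n".join(stuffed) + "\r\n.\r\n"


def smtp_client_lines(helo_domain: str, sender: str, recipients: list[str], message: str) -> list[str]:
    """Client side of an SMTP dialogue (commands shown without their trailing CRLF)."""
    lines = [f"HELO {helo_domain}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]  # one RCPT TO per receiver
    # After the server's 354 reply, the whole message follows as one dot-stuffed block.
    lines += ["DATA", dot_stuff(message), "QUIT"]
    return lines


def read_multiline(lines: list[str]) -> list[str]:
    """Body of a POP3 multi-line reply (lines after +OK): stop at ".", undo dot-stuffing."""
    body = []
    for line in lines:
        if line == ".":
            return body
        body.append(line[1:] if line.startswith(".") else line)
    raise ValueError("multi-line reply not terminated by '.'")


def parse_list(lines: list[str]) -> list[tuple[int, int]]:
    """Turn a POP3 LIST reply into (message number, size in octets) pairs."""
    return [(int(num), int(size)) for num, size in (entry.split() for entry in read_multiline(lines))]


print(parse_list(["1 498", "2 912", "."]))  # [(1, 498), (2, 912)]
```

Keeping the framing in its own helper mirrors the protocols themselves: the same stuffing rule is applied by the SMTP client when sending and undone by the POP3 client when receiving.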

Conclusion
- IP is the core protocol that enables the Internet; IPv4 is still in use.
- Lots of helper protocols, e.g. RSVP for connection-oriented communication, DHCP for dynamic address assignment (e.g. for mobile devices).
- IPv6 would simplify many things, but migration is hard.
- TCP and UDP are two different transport protocols: TCP is connection-oriented, UDP is connectionless. Several other protocols, such as RTP, fill the gap between them for today's needs.
- Application protocols have nothing to do with physical communication; they deal more with the contents of communication, and all are based on the client/server paradigm.