Computer Networks
Lecture 7: Application layer: FTP and HTTP
Marcin Bieńkowski
Institute of Computer Science, University of Wrocław
Computer networks (II UWr), Lecture 7

Reminder: Internet reference model
[Figure: protocol stack, layers 5 to 1. Labels: DNS, FTP, BSD sockets interface, UDP, TCP, IP, ICMP, Ethernet, ARP / RARP.]

Outlook
1 FTP
2 HTTP
3 Proxy servers

FTP

FTP: File Transfer Protocol
- Protocol for sending files to and receiving files from a server.
- The server listens on port 21.
- After connecting, the client issues Unix-like commands.
- For data transmission, an additional port is opened.

FTP: connection for data (e.g., downloading a file)
Active mode
- The FTP client chooses a port, informs the server about it, and starts listening on that port.
- The FTP server connects to this port and sends the data to it.
- Problematic if the client is behind a firewall.
Passive mode
- The client requests that the server choose a port.
- The server picks a port, informs the client about it, and starts listening.
- The client connects to this port and receives data from it.

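In both modes the chosen port is announced over the control connection. The sketch below assumes the conventional encoding, which the slide does not spell out: the PORT command (active mode) and the 227 reply to PASV (passive mode) carry four address bytes and two port bytes as comma-separated decimal numbers. The helper names are hypothetical.

```python
import re
from typing import Tuple


def format_port_command(ip: str, port: int) -> str:
    """Active mode: the client announces the address and port it listens on."""
    high, low = divmod(port, 256)  # the port is sent as two separate bytes
    return "PORT " + ",".join(ip.split(".") + [str(high), str(low)])


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Passive mode: extract the address and port chosen by the server."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a valid PASV reply: " + reply)
    numbers = [int(group) for group in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return ip, port


print(format_port_command("192.168.1.2", 5001))
print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,2,19,137)"))
```

The address 192.168.1.2 and port 5001 are made-up illustration values.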
HTTP

HTTP: HyperText Transfer Protocol
- Protocol for sending files (like FTP).
- Very mature and complex protocol (version 1.1).
- Uses a different namespace than FTP.
- Uses port 80.

URL (Uniform Resource Locator) (1)
- A URL identifies a given resource.
- It consists of two colon-separated parts:
  - scheme (http, ftp, mailto, file, ...)
  - resource-dependent part
Examples:
  http://www.ii.uni.wroc.pl/index.html
  http://pl.wikipedia.org/wiki/url
  ftp://ftp.kernel.org/pub/index.html
  mailto:jan.kowalski@serwer.com

URL (2)
URL for the schemes http and ftp. The part after the colon consists of:
- //
- domain name
- optionally :port
- /
- resource identifier within the server
Example: http://www.ii.uni.wroc.pl:80/~mbi/dyd/sieciw_10s/
Notes:
- / in the identifier denotes hierarchical structure.
- The resource identifier is not necessarily a path to a file!

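The split of a URL into scheme, domain name, optional port, and resource identifier can be tried out with Python's standard urllib.parse module. This is only an illustration of the structure described on the slides; the helper name describe is hypothetical.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a URL into the parts named on the slides."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # None for schemes without //host, e.g. mailto
        "port": parts.port,       # None when no explicit :port is given
        "resource": parts.path,
    }


for url in (
    "http://www.ii.uni.wroc.pl/index.html",
    "http://www.ii.uni.wroc.pl:80/~mbi/dyd/sieciw_10s/",
    "mailto:jan.kowalski@serwer.com",
):
    print(describe(url))
```

For mailto there is no // part, so the whole scheme-dependent part ends up in the resource field.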
HTTP request and reply
How it works:
- The user enters a URL in the web browser; it is split into parts (we assume that scheme = http).
- The web browser establishes a connection with the web server on port 80.
- It sends a request (GET method).
- The server analyses the request and fetches the appropriate file from the disk.
- The server sets an appropriate reply header and MIME type.
- The server sends the file.
- The server closes the connection (or waits for another request).
- The web browser performs an action depending on the MIME type (displays / uses a plugin / uses an external application).

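The request and reply from the steps above are plain text, which the following minimal sketch makes concrete. It builds the bytes of a GET request and splits a reply into status code, header fields, and body, without opening any network connection. The function names are hypothetical, the query string is ignored, and a well-formed reply is assumed.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Return the bytes a browser could send for a simple GET request."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.hostname}",
        "Connection: close",
        "",  # an empty line ends the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_reply(raw: bytes):
    """Split a reply into (status code, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


print(build_get_request("http://www.ii.uni.wroc.pl/index.html"))
print(parse_reply(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"))
```

The reply passed to parse_reply is a hand-written sample, not captured server output.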
Keep-alive connections
- TCP connection handshake = large overhead.
- Usually the web browser wants to download many documents at once (e.g., an HTML web page + pictures).
- HTTP/1.1 standard: the connection is kept alive by default.
- The connection is closed if the request contains Connection: close

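The keep-alive rule can be written as a small decision function. This is a sketch: the HTTP/1.1 branch follows the slide, while the HTTP/1.0 branch (close by default unless Connection: keep-alive is sent) is an added assumption about the older version. Header names are assumed to be already lowercased, and the function name is hypothetical.

```python
def keeps_connection_open(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this request."""
    connection = headers.get("connection", "").lower()
    if version == "HTTP/1.1":
        # Kept alive by default, closed only on an explicit request.
        return connection != "close"
    # Assumption for older versions: closed unless keep-alive is requested.
    return connection == "keep-alive"


print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.1", {"connection": "close"}))
```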
MIME type
For every file sent, the server should set the Content-type field appropriately.
Examples:
  text/plain                  text file
  text/html                   HTML page
  image/jpeg                  JPEG picture
  video/mpeg                  MPEG video
  application/msword          DOC document
  application/pdf             PDF document
  application/octet-stream    sequence of bytes without an interpretation

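One simple way for a server to pick the Content-type value is a lookup by file extension. The sketch below uses the MIME types from the table; the mapping from extensions to types and the function name are assumptions for illustration, since the slide lists only the types themselves.

```python
import os

# Extension-to-type mapping: an illustrative assumption, not from the slide.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".mpeg": "video/mpeg",
    ".doc": "application/msword",
    ".pdf": "application/pdf",
}


def content_type_for(filename: str) -> str:
    """Pick a Content-type value; unknown files are sent as raw bytes."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, "application/octet-stream")


print(content_type_for("index.html"))
print(content_type_for("archive.xyz"))
```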
HTTP replies
Important types of replies:
- 200 OK
- 301 Moved Permanently
- 302 Found
- 304 Not Modified
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 500 Internal Server Error

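The codes listed above are available in Python's standard http module, and the first digit of a code gives its general class. A short sketch (the helper status_class and its class labels are hypothetical):

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Classify a reply by the first digit of its status code."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")


for code in (200, 301, 302, 304, 401, 403, 404, 500):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```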
HTML
- HTTP was designed for sending hypertext = text + links to other texts.
- This role is played by HTML.
- HTTP + HTML = WWW.
- HTML standardization is a W3C task.

HTML versions
Quick look into history:
- HTML 1.0, 2.0: mainly academic usage, content is most important.
- HTML 3.0, 3.2, 4.0: the emphasis shifts to presentation (mixed with content).
- HTML 4.01, also known as "everything is allowed":
  - many sloppily written web pages
  - the web browser has to cope not only with the complicated standard but also with dozens of deviations from it.
- XHTML 1.0: based on XML, rigid structure, separates content and structure (HTML) from the presentation (CSS styles).
  - rigid format = easier processing
  - automatic processing of data on the web page
  - one HTML, many CSS = different versions for different recipients (PDAs, phones, visually impaired users, ...)

Dynamic WWW
Client-side dynamics
- JavaScript: simple object-oriented interpreted language, code embedded in the HTML.
- Java applets, Flash, Silverlight: applications executed by different web browser plugins.
Server-side dynamics
- A URI may point to a program whose output is HTML (+ header).
- CGI (Common Gateway Interface): standard allowing for execution of an arbitrary external program.
- Mechanisms integrated with the web server (PHP, JSP, ASP, mod_perl, ...).
- Forms, parameter passing (GET and POST methods).
- Cookies = session handling; HTTP itself is stateless.

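Parameter passing and cookies can be illustrated with the Python standard library. In this sketch the form fields, the /search path, and the cookie name session are made-up examples. It shows the same encoded parameters placed in the URL (GET) or in the request body (POST), and a cookie travelling from server to client and back.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode

# Hypothetical form contents.
form = {"q": "computer networks", "page": "2"}
encoded = urlencode(form)

# GET: parameters are appended to the resource identifier.
get_request_line = f"GET /search?{encoded} HTTP/1.1"

# POST: the same parameters travel in the request body.
post_body = encoded.encode("ascii")
post_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(post_body)),
}

# Server side: decoding works the same way for both methods.
received = parse_qs(encoded)

# Session handling: the server hands out a cookie ...
cookie = SimpleCookie()
cookie["session"] = "abc123"
set_cookie_header = cookie.output(header="Set-Cookie:")

# ... and recognises the client when the cookie comes back in a later request.
returned = SimpleCookie("session=abc123")

print(get_request_line)
print(received)
print(set_cookie_header)
print(returned["session"].value)
```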
Abuses of the HTTP protocol
- Some WWW services allow automated (non-human) access.
- Instead of creating a new protocol, use HTTP as transport.
- REST (Representational State Transfer): creating a web service using the existing HTTP methods (GET, PUT, POST, DELETE).
- REST is not a standard, rather a philosophy.
- Easy to automate, but also human-readable.
- Example services: eBay, Amazon, Twitter, Flickr, ...

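The REST idea of reusing existing methods can be sketched with a tiny in-memory service. Everything here is hypothetical: the NoteService class, the /notes resource, and the particular choice of status codes (201, 204, 405 are real HTTP codes but do not appear on the slides). No real HTTP server is involved; handle only mimics how a request would be dispatched.

```python
class NoteService:
    """Toy resource collection addressed as /notes and /notes/<id>."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def handle(self, method, path, body=None):
        """Return (status code, result) for one request."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "notes" or len(parts) > 2:
            return 404, None
        if len(parts) == 1:
            if method == "GET":      # list all resources
                return 200, dict(self.notes)
            if method == "POST":     # create a new resource
                note_id = self.next_id
                self.next_id += 1
                self.notes[note_id] = body
                return 201, note_id
            return 405, None
        if not parts[1].isdigit():
            return 404, None
        note_id = int(parts[1])
        if method == "PUT":          # create or replace a given resource
            self.notes[note_id] = body
            return 200, body
        if note_id not in self.notes:
            return 404, None
        if method == "GET":          # read one resource
            return 200, self.notes[note_id]
        if method == "DELETE":       # remove one resource
            del self.notes[note_id]
            return 204, None
        return 405, None


service = NoteService()
print(service.handle("POST", "/notes", "buy milk"))
print(service.handle("GET", "/notes/1"))
print(service.handle("DELETE", "/notes/1"))
print(service.handle("GET", "/notes/1"))
```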
Proxy servers

Proxy servers
Instead of connecting directly to the web server, the browser may connect to a proxy server.
What for?
- Limiting the traffic to remote web pages: web content is stored in the proxy cache.
- Controlling access to web resources.
Note: "proxy server" usually means a WWW proxy, but other services also have proxy servers: ARP, DNS, DHCP, whois (programming task!).

Proxy server
How it works:
- It usually listens on port 8080.
- If its cache does not contain the requested page, or if the cached copy is outdated, the proxy connects to the given page and stores the reply in the cache.
- The proxy returns the answer to the client.

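The cache logic described above fits in a few lines. This is a sketch under simplifying assumptions: the class name CachingProxy is hypothetical, and both the contact with the origin server and the freshness test are passed in as plain functions, so no networking code is shown.

```python
class CachingProxy:
    """Answer from the cache when possible, otherwise ask the origin server."""

    def __init__(self, fetch_from_origin, is_fresh):
        self.fetch_from_origin = fetch_from_origin  # function: url -> reply
        self.is_fresh = is_fresh                    # function: reply -> bool
        self.cache = {}

    def get(self, url):
        entry = self.cache.get(url)
        if entry is None or not self.is_fresh(entry):
            # Missing or outdated: contact the origin and store the reply.
            entry = self.fetch_from_origin(url)
            self.cache[url] = entry
        return entry


origin_requests = []


def fake_origin(url):
    """Stand-in for a real web server; records every request it receives."""
    origin_requests.append(url)
    return "contents of " + url


proxy = CachingProxy(fake_origin, is_fresh=lambda reply: True)
proxy.get("http://www.ii.uni.wroc.pl/index.html")
proxy.get("http://www.ii.uni.wroc.pl/index.html")
print(len(origin_requests))  # the second request was served from the cache
```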
Proxy server, cont.
How the proxy checks whether the page in its cache is up to date:
- The WWW server sets the field Expires: in the reply header; after this date, the proxy evicts the page from the cache.
- The WWW server may set the field Pragma: no-cache and/or Cache-Control: no-cache; such a page will not be stored in the proxy cache at all.
- The client may set these fields in the request; the proxy will then ignore the contents of its cache.
- In the remaining cases: a heuristic based on the Last-modified: field.

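The freshness rules can be sketched as two small functions. Several details are assumptions: header names are taken to be lowercased dictionary keys, dates are assumed to be well-formed, and the heuristic treats a page as fresh for 10% of its age at the time it was stored. The slide only says that a heuristic based on Last-modified is used, so that fraction is an illustrative choice, not a rule from the lecture.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_cacheable(headers: dict) -> bool:
    """A reply marked no-cache is not stored in the proxy cache at all."""
    directives = headers.get("cache-control", "") + "," + headers.get("pragma", "")
    return "no-cache" not in directives.lower()


def is_fresh(headers: dict, stored_at: datetime, now: datetime,
             heuristic_fraction: float = 0.1) -> bool:
    """Decide whether a cached reply may still be served."""
    if "expires" in headers:
        return now < parsedate_to_datetime(headers["expires"])
    if "last-modified" in headers:
        # Assumed heuristic: a page unchanged for a long time before it was
        # stored is expected to stay unchanged for a while longer.
        age_when_stored = stored_at - parsedate_to_datetime(headers["last-modified"])
        return now - stored_at < heuristic_fraction * age_when_stored
    return False


stored = datetime(2015, 10, 21, 10, 0, tzinfo=timezone.utc)
later = datetime(2015, 10, 21, 10, 30, tzinfo=timezone.utc)
print(is_fresh({"last-modified": "Wed, 21 Oct 2015 00:00:00 GMT"}, stored, later))
```

All dates in the example are made up.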
Anonymous proxy servers
- A normal proxy server adds its own fields to our request, e.g.,
    X-Forwarded-For: (our IP address)
    Via: (proxy IP address)
- There are anonymous proxy servers that do not add these headers.

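The difference between a normal and an anonymous proxy can be sketched as a function that prepares the headers of the forwarded request. The function names and the addresses are hypothetical. The "1.1 " prefix in the Via value (the protocol version the request was received with) is an assumption about the usual field format and is not shown on the slide.

```python
def append_value(headers: dict, name: str, value: str) -> None:
    """Append to a comma-separated header, creating it if necessary."""
    headers[name] = f"{headers[name]}, {value}" if name in headers else value


def forward_headers(request_headers: dict, client_ip: str, proxy_name: str,
                    anonymous: bool = False) -> dict:
    """Return the headers the proxy sends on to the web server."""
    headers = dict(request_headers)  # do not modify the caller's dictionary
    if anonymous:
        return headers               # an anonymous proxy adds nothing
    append_value(headers, "X-Forwarded-For", client_ip)
    append_value(headers, "Via", f"1.1 {proxy_name}")  # assumed value format
    return headers


request = {"Host": "www.ii.uni.wroc.pl"}
print(forward_headers(request, "192.0.2.10", "198.51.100.1"))
print(forward_headers(request, "192.0.2.10", "198.51.100.1", anonymous=True))
```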