CS3250 Distributed Systems

Lecture 4: More on Network Addresses

Domain Name System (DNS)

Human beings (apart from network administrators and hackers) rarely use IP addresses, even in their human-readable dotted-quad format. Instead they use more descriptive domain names such as cs-server.aston.ac.uk or www.cs.aston.ac.uk. These are easy to remember and give strong hints about the location of the machine, its purpose and/or the organisation that owns it.

In the early days of the ARPANET (from which the Internet grew) a flat naming scheme was used. Every machine on the net was listed, with its IP address, in a single hosts database file (often a text file called /etc/hosts). As the number of machines grew this scheme quickly became unworkable: the host files became too big, had to be changed frequently to add entries, and name clashes often arose. A hierarchical, domain-based naming scheme, the Domain Name System (DNS), was therefore introduced, together with a distributed database for implementing the mapping from names to IP addresses.

The Internet is divided into several hundred top-level domains. These comprise the generic domains:

  int   various international organisations
  edu   educational establishments (mainly in the US)
  mil   military establishments in the US
  gov   US government organisations
  net   network providers
  com   companies
  org   various non-commercial organisations

plus geographic domains (normally individual countries), such as uk, nl (Netherlands), jp (Japan) etc. There is also a us domain, although most organisations in the US belong directly to one of the generic top-level domains. Until a few years ago, machines belonging to the generic domains (except int) were located within the USA. This is no longer the case, particularly for the com and net domains. Subdividing the top-level domains into sub-domains is the responsibility of individual national organisations, or of similar bodies for the generic domains.
Note that domain names are case-insensitive, so edu and EDU denote the same domain. Below the top-level generic domains there are domains for individual organisations (universities, individual companies etc.); any sub-domains within these are the responsibility of the individual organisations themselves. Some geographic domains (uk, jp) have sub-domains such as ac, org and co which mirror the generic US-based domains. Other countries (for example nl) place individual organisation domains directly under the top-level domain. In the USA the geographic us domain is subdivided into domains for each state.

The domains for individual organisations may be further subdivided, for example into domains for different university departments or for the various sections of a large company (research and development, marketing, personnel etc.). Again, responsibility for naming within these is delegated to the individual departments. Within these departmental domains there are names which denote individual hosts or services (although further subdivision into domains is possible). A typical domain name is cs.aston.ac.uk. This denotes the Computer Science domain within Aston, which is part of the academic community domain (ac) of the national UK domain. Names need only be unique within their local domain. Thus cs.aston.ac.uk, cs.bham.ac.uk and cs.aston.co.uk are all valid domain names and do not clash. Note that these days domain names are invariably written in little-endian form, with the most specific part first. This was not the case about 10 years ago, when both big-endian (most general part first) and little-endian forms were used.

Each domain is responsible for providing a name server which manages and provides access to information on the names within that domain. Some small departmental sub-domains may arrange for the organisation's name server to handle the names within the sub-domain.

A Barnes, 2003   CS3250/L4
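The naming rules above (case-insensitivity, hierarchical nesting, uniqueness only within the enclosing domain, little- versus big-endian order) can be illustrated with a few lines of Python. This is a minimal sketch using plain string handling only; no DNS look-up takes place. The function names labels, same_name, enclosing_domains and big_endian are hypothetical helpers chosen for this illustration.

```python
from __future__ import annotations


def labels(domain: str) -> list[str]:
    """Split a domain name into labels, most specific first (little-endian form).

    Case is folded because domain names are case-insensitive; an optional
    trailing dot is ignored.
    """
    return domain.rstrip(".").lower().split(".")


def same_name(a: str, b: str) -> bool:
    """True if two domain names denote the same domain (ignoring case)."""
    return labels(a) == labels(b)


def enclosing_domains(domain: str) -> list[str]:
    """The domain itself followed by each enclosing domain up to the top level."""
    parts = labels(domain)
    return [".".join(parts[i:]) for i in range(len(parts))]


def big_endian(domain: str) -> str:
    """Rewrite a name with the most general label first (the older big-endian form)."""
    return ".".join(reversed(labels(domain)))


if __name__ == "__main__":
    print(enclosing_domains("cs.aston.ac.uk"))
    # ['cs.aston.ac.uk', 'aston.ac.uk', 'ac.uk', 'uk']
    print(same_name("CS.Aston.AC.UK", "cs.aston.ac.uk"))  # True
    print(same_name("cs.aston.ac.uk", "cs.bham.ac.uk"))   # False: same label, different parent
    print(big_endian("cs.aston.ac.uk"))                   # uk.ac.aston.cs
```

The two cs labels do not clash because the names are only compared as a whole, including every enclosing domain.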

[Figure: part of the DNS name space, showing generic domains (com, edu, org) and country domains (uk, jp, nl) with their sub-domains, divided into name server zones.]

It is important to realise that the name hierarchy is distinct from the division of the Internet into physical networks and internets. One departmental domain could span several distinct LANs, and two or more departments sharing the same LAN could have different name domains.

The Domain Name Database

The domain name database for a particular domain contains a number of resource records for each name. These have the form

  DomainName  TimeToLive  Class  Type  Value

The DomainName is the name to which the record applies; often there are several records for each name. TimeToLive is the time (in seconds) for which the information in the record is likely to remain valid. For Internet information the Class always has the value IN; other values are possible for other network protocols. The Type and Value fields specify the type of information and its value:

  Type   Meaning              Value
  SOA    Start of Authority   Parameters for this domain
  A      IP address           32-bit IP address
  MX     Mail exchange        Priority, and a domain willing to accept email for this domain
  NS     Name server          Name of a name server for this domain
  CNAME  Canonical name       Real (canonical) domain name for an alias
  PTR    Pointer              Domain name for an IP address (used for reverse look-up)
  HINFO  Host info            CPU and OS type of host as ASCII text
  TXT    Text                 Uninterpreted text giving general information on the name

The SOA record gives information about the domain: the administrator's email address, a serial number, and various time-out values and flags. The A record gives the IP address of a host (a multi-homed host can have several A records).

The MX record specifies a host willing to accept email for this domain. Some hosts cannot accept incoming email (e.g. PCs which are switched off overnight). For other hosts it may not be convenient to receive email even though they are capable of handling it (e.g. individual workstations in a public laboratory). The priority in an MX record gives the order of preference in which the various email servers for the domain should be tried; the server with the lowest priority number is tried first.

A CNAME record makes a name an alias for a service; the record gives the canonical name of the real host which provides this service. For example www.cs.aston.ac.uk is an alias for the CS web server. At one time the real host could be cs-server.aston.ac.uk and later it could be changed to csultra.aston.ac.uk, without the need to change URLs on web pages etc. PTR is another form of alias, which is often used to look up the domain name corresponding to an IP address.

Note that a domain name (such as a CNAME entry) may not always refer to an actual machine. Also, the IP address corresponding to a domain name may vary depending on the type of service required. For example, a telnet connection may go directly to the machine, but email sent to a name that has an MX resource record will be delivered to another machine.

Name Servers

Each domain is responsible for arranging for one or more name servers to store and provide access to the resource records relating to that domain. Generally there is a primary name server plus one or more back-up name servers in case the primary server goes down.

A local name server stores definitive information about its own local domain. These are authoritative records, which are guaranteed to be up to date. In addition it may store information about nearby domains which are accessed frequently from this domain, and it caches information about domains that have been accessed recently. These cached records are not authoritative, as they may be out of date. The TimeToLive field indicates how long cached records are likely to remain valid. After that time has expired the cached information should not be used; it should be retrieved again from an authoritative source over the network.

Although they are not authoritative, cached records are important because they speed up the name look-up process. If a domain is contacted once it is likely to be contacted again in the near future. Later requests can then be satisfied immediately from the cached copy, instead of retrieving the information from the authoritative source over the network each time.
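The resource record format described above, together with the MX priority and CNAME alias rules, can be sketched in Python. This is a minimal, hypothetical model. It assumes each record is written on one line as "DomainName TimeToLive Class Type Value" with whitespace-separated fields; real zone files have a richer syntax. The mail host names and the 192.0.2.10 address in the sample data are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    name: str    # DomainName the record applies to
    ttl: int     # TimeToLive, in seconds
    rclass: str  # Class: IN for Internet information
    rtype: str   # Type, e.g. "A", "MX", "CNAME"
    value: str   # Value; its meaning depends on the type


def parse_record(line: str) -> ResourceRecord:
    """Parse one 'DomainName TimeToLive Class Type Value' line (simplified format)."""
    name, ttl, rclass, rtype, value = line.split(maxsplit=4)
    return ResourceRecord(name.lower(), int(ttl), rclass.upper(), rtype.upper(), value)


def mail_hosts(records: list[ResourceRecord], domain: str) -> list[str]:
    """Mail hosts for a domain in the order to try them: lowest priority number first."""
    candidates = []
    for rr in records:
        if rr.name == domain.lower() and rr.rtype == "MX":
            priority, host = rr.value.split()
            candidates.append((int(priority), host))
    return [host for _, host in sorted(candidates)]


def canonical_name(records: list[ResourceRecord], name: str) -> str:
    """Follow CNAME records from an alias to the name of the real host."""
    name = name.lower()
    seen = set()
    while name not in seen:
        seen.add(name)
        target = next((rr.value for rr in records
                       if rr.name == name and rr.rtype == "CNAME"), None)
        if target is None:
            return name
        name = target.lower()
    raise ValueError("CNAME loop")


# Hypothetical sample data for illustration only.
ZONE = """\
www.cs.aston.ac.uk    86400 IN CNAME cs-server.aston.ac.uk
cs-server.aston.ac.uk 86400 IN A     192.0.2.10
cs.aston.ac.uk        86400 IN MX    20 mail2.cs.aston.ac.uk
cs.aston.ac.uk        86400 IN MX    10 mail1.cs.aston.ac.uk
"""

if __name__ == "__main__":
    records = [parse_record(line) for line in ZONE.splitlines()]
    print(canonical_name(records, "www.cs.aston.ac.uk"))  # cs-server.aston.ac.uk
    print(mail_hosts(records, "cs.aston.ac.uk"))  # ['mail1.cs.aston.ac.uk', 'mail2.cs.aston.ac.uk']
```

Changing the real web server only requires editing the CNAME line. Anything that refers to www.cs.aston.ac.uk keeps working.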
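The caching rule above can be sketched as follows: a cached, non-authoritative record may be used only until its TimeToLive expires, after which it must be fetched again. This is a hypothetical minimal cache, not the internals of any real name server. The clock is passed in as a parameter so that expiry can be shown without waiting, and the address 192.0.2.20 is invented.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class TTLCache:
    """Cache of non-authoritative answers, each usable only until its TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (value, expiry time)

    def put(self, name: str, value: str, ttl: int) -> None:
        """Store an answer obtained from an authoritative server."""
        self._entries[name.lower()] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        """Return a cached value, or None if absent or expired (caller must re-fetch)."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[key]  # stale: discard it and force a fresh look-up
            return None
        return value


if __name__ == "__main__":
    now = [0.0]                            # a fake clock that we advance by hand
    cache = TTLCache(clock=lambda: now[0])
    cache.put("cs.bham.ac.uk", "192.0.2.20", ttl=3600)
    now[0] = 3599.0
    print(cache.get("cs.bham.ac.uk"))      # 192.0.2.20 (still valid)
    now[0] = 3600.0
    print(cache.get("cs.bham.ac.uk"))      # None (expired)
```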
Suppose a user on a machine in the domain cs.aston.ac.uk wishes to look up information relating to the domain cs.bham.ac.uk. The machine contacts the name server for cs.aston.ac.uk, which looks the name up in its local database. (The protocol used to communicate with name servers is also called DNS; it is an application-level protocol in the TCP/IP suite.) If the information is not cached there, this name server contacts the name server for the domain aston.ac.uk. If the information is not cached there either, the request goes to the name server for bham.ac.uk. We will suppose that this server also lacks the cached information, so it contacts the name server for cs.bham.ac.uk. That server returns an authoritative record of the information, which reaches the local name server cs.aston.ac.uk. The local name server caches the information in its local database and returns it to the requesting host. Subsequent requests from the domain cs.aston.ac.uk for the domain cs.bham.ac.uk will use the locally cached information until its time to live has expired.

Two types of search are possible: recursive and non-recursive.

[Figure: recursive search, involving the host and the name servers cs.aston.ac.uk, aston.ac.uk, bham.ac.uk and cs.bham.ac.uk.]

In a recursive search, if the local name server cs.aston.ac.uk [1] cannot satisfy the query itself, it contacts the name server of a higher-level domain, aston.ac.uk. This name server then contacts the server at bham.ac.uk, and so on until the request is satisfied. The information is then passed back along the call chain to the original requester. In this case each name server on the call chain will probably cache the information as well as returning it to the name server that asked for it.

[1] This example is for illustration purposes only. There is no name server for this domain; the Aston name server handles such requests.
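The two kinds of search can be compared with a small simulation of the chain of name servers in the example above. This is a hypothetical toy model of the lecture's illustrative example only. Each NameServer knows just the next server in the fixed chain cs.aston.ac.uk, aston.ac.uk, bham.ac.uk, cs.bham.ac.uk, TTLs are ignored, and the address 192.0.2.20 is invented. The function resolve_non_recursive models the referral-based search described next.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NameServer:
    """A toy name server in the lecture's illustrative chain (hypothetical model)."""

    zone: str
    records: dict[str, str] = field(default_factory=dict)  # authoritative: name -> address
    next_server: Optional[NameServer] = None                # next server in the fixed chain
    cache: dict[str, str] = field(default_factory=dict)    # non-authoritative answers

    def local_answer(self, name: str) -> Optional[str]:
        """Answer from authoritative records first, then from the cache."""
        return self.records.get(name) or self.cache.get(name)


def resolve_recursive(server: NameServer, name: str, trace: list[str]) -> str:
    """Recursive search: each server forwards the query and caches the reply."""
    trace.append(server.zone)
    answer = server.local_answer(name)
    if answer is None:
        if server.next_server is None:
            raise LookupError(name)
        answer = resolve_recursive(server.next_server, name, trace)
        server.cache[name] = answer  # cached on the way back along the call chain
    return answer


def resolve_non_recursive(local: NameServer, name: str, trace: list[str]) -> str:
    """Non-recursive search: the local server contacts every server itself."""
    trace.append(local.zone)
    answer = local.local_answer(name)
    if answer is not None:
        return answer
    asked = local
    while answer is None:
        # The parent at first; afterwards, the server named in the last referral.
        referral = asked.next_server
        if referral is None:
            raise LookupError(name)
        trace.append(referral.zone)
        answer = referral.local_answer(name)
        asked = referral
    local.cache[name] = answer  # only the local name server caches the reply
    return answer


def build_chain(address: str = "192.0.2.20") -> list[NameServer]:
    """Servers in order [cs.aston.ac.uk, aston.ac.uk, bham.ac.uk, cs.bham.ac.uk]."""
    cs_bham = NameServer("cs.bham.ac.uk", records={"cs.bham.ac.uk": address})
    bham = NameServer("bham.ac.uk", next_server=cs_bham)
    aston = NameServer("aston.ac.uk", next_server=bham)
    cs_aston = NameServer("cs.aston.ac.uk", next_server=aston)
    return [cs_aston, aston, bham, cs_bham]


if __name__ == "__main__":
    for resolve in (resolve_recursive, resolve_non_recursive):
        servers = build_chain()
        trace = []
        resolve(servers[0], "cs.bham.ac.uk", trace)
        cached_at = [s.zone for s in servers if "cs.bham.ac.uk" in s.cache]
        print(resolve.__name__, trace, "cached at:", cached_at)
```

Both runs contact the same four servers. After the recursive search the answer is cached at cs.aston.ac.uk, aston.ac.uk and bham.ac.uk. After the non-recursive search only the local server cs.aston.ac.uk holds it.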

[Figure: non-recursive search, involving the host and the name servers cs.aston.ac.uk, aston.ac.uk, bham.ac.uk and cs.bham.ac.uk.]

In the non-recursive case, if the name server at aston.ac.uk cannot satisfy the request, it returns a message to the name server at cs.aston.ac.uk. The message tells it to contact the name server at bham.ac.uk and gives it the IP address to use. The name server cs.aston.ac.uk then contacts the name server for bham.ac.uk itself. If this server can satisfy the request it does so; if not, it directs the local name server to contact the name server for cs.bham.ac.uk.

The program nslookup can be used to retrieve DNS information about a particular host name. For example, to retrieve information about the host www.aberdeen.ac.uk via the Aston name server called terrapin(.aston.ac.uk), run

  /usr/sbin/nslookup -d www.aberdeen.ac.uk terrapin

The -d is the 'debug' option, which gives some interesting extra feedback. Run man nslookup for more information on the other facilities provided by the utility.

Communication Endpoints in the Transport Layer

In the sections above, the delivery and routing of packets has been discussed using IP addresses. In effect the discussion has concerned the network (or internet) layer, which deals with the delivery of packets from one host machine to another across the network. However, machines are not the final destination of packets. The information needs to be delivered to a particular application executing in a process on the destination machine.

Yet regarding a particular process as the endpoint of a communication channel is not satisfactory. An application running on the source machine is unlikely to have detailed information about individual processes running on the destination machine. Even if it did, suppose a process dies and the application is restarted (perhaps because the remote machine was rebooted). The process ID (etc.) of the process running the application is now completely different.
However, from the point of view of the source machine, communication is taking place over the same channel before and after the restart on the destination machine. To overcome this problem, the transport-layer protocols TCP and UDP in the TCP/IP suite identify communication endpoints by an IP address and a port number. The port number is an integer denoting an abstract destination point on the machine specified by the IP address. The local operating system provides one or more interface mechanisms, via system calls, for processes to specify and access communication ports. In later lectures we will discuss in detail the socket mechanism, which is the most commonly used interface to transport-layer communication ports. Other mechanisms are available and will be discussed briefly; they are usually similar to the socket mechanism in many respects.

A communication channel can be thought of as a link between two communication endpoints. Note, however, that a particular endpoint on a local machine can be used to communicate with several different endpoints on remote machines. We should think of there being several distinct communication channels which happen to share one endpoint.

Ports are generally buffered by the operating system. If data arrives on a port before a process is ready to accept it, the data is stored in a FIFO input queue until the process is ready. Similarly, several output requests may be buffered until a sufficient amount of data is ready, which is then sent out onto the network as a single packet.
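Endpoint identification, the sharing of one endpoint by several channels, and input buffering can all be observed with Python's standard socket module on the loopback interface. This is a minimal sketch assuming UDP over IPv4 on 127.0.0.1. The function name shared_endpoint_demo is hypothetical, and binding to port 0 simply asks the operating system to choose an unused port.

```python
import socket


def shared_endpoint_demo():
    """One UDP endpoint receives from two different remote endpoints.

    Both datagrams are sent before the receiver reads anything, so the
    operating system must hold them in the receiving port's input queue.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    try:
        server.bind(("127.0.0.1", 0))     # port 0: let the OS pick an unused port
        server.settimeout(2.0)            # avoid hanging forever if a datagram is lost
        endpoint = server.getsockname()   # the endpoint: (IP address, port number)
        for client, message in zip(clients, (b"first", b"second")):
            client.bind(("127.0.0.1", 0))
            client.sendto(message, endpoint)
        # Nothing has been read yet; both datagrams are waiting in the queue.
        received = [server.recvfrom(1024) for _ in clients]
        senders = [c.getsockname() for c in clients]
        return endpoint, received, senders
    finally:
        server.close()
        for client in clients:
            client.close()


if __name__ == "__main__":
    endpoint, received, senders = shared_endpoint_demo()
    print("receiving endpoint:", endpoint)
    for data, sender in received:
        print(data, "from", sender)
```

The receiving endpoint is a single (IP address, port) pair, yet recvfrom reports two different sender endpoints. These are two channels sharing one local endpoint.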

The same port number used with different protocols denotes distinct communication endpoints. Thus a communication endpoint in the transport layer is specified by the triple

  (IP address, port number, protocol)

Traditionally, certain low port numbers are reserved for standard network services such as SMTP, TELNET, FTP and HTTP, and for other services such as finger, date and time services. These reserved port numbers are generally in the range 1-512. On systems where the concept of a super-user makes sense, port numbers in the range 513-1023 are reserved for use by processes running with root (super-user) privileges. Normal application programs must use port numbers of 1024 and above, unless of course they are connecting to a standard network service.

If two or more processes on the same machine try to create a communication endpoint with the same port number and the same protocol, only one of the attempts will succeed. The others will fail because the address is already in use. As a general convention, if an application program specifies zero as the port number of a communication endpoint, the operating system allocates an unused port number for the endpoint.
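The rules about port allocation and port clashes can be checked with a short Python sketch using the standard socket module. It assumes a typical Unix-like system with IPv4 on the loopback interface. It also assumes that no other program happens to hold the chosen port number for TCP. The names try_bind and port_rules_demo are hypothetical.

```python
import errno
import socket


def try_bind(kind: int, port: int) -> bool:
    """Return True if a new socket of this kind can be bound to 127.0.0.1:port.

    The socket is closed again afterwards. SO_REUSEADDR is not set.
    """
    sock = socket.socket(socket.AF_INET, kind)
    try:
        sock.bind(("127.0.0.1", port))
        return True
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise  # an unexpected error, not a port clash
        return False
    finally:
        sock.close()


def port_rules_demo() -> dict:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as first:
        first.bind(("127.0.0.1", 0))  # port 0: the OS allocates an unused port
        port = first.getsockname()[1]
        return {
            "allocated_port": port,
            # Same port number, same protocol: the address is already in use.
            "udp_same_port_ok": try_bind(socket.SOCK_DGRAM, port),
            # Same port number, different protocol: a distinct endpoint.
            "tcp_same_port_ok": try_bind(socket.SOCK_STREAM, port),
        }


if __name__ == "__main__":
    print(port_rules_demo())
```

On a typical Linux system the result shows an OS-chosen port number and a failed second UDP bind (errno EADDRINUSE, "address already in use"). A TCP bind to the same number succeeds, because the protocol is part of the endpoint's identity.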