INTRODUCTION

Computer network programming allows information to be exchanged within and between networks. The primary connection point for this communication is known as a network socket, and the two endpoints of a connection form a socket pair. Because computing platforms often vary from one end system to the next, protocols have been developed to reconcile the differences. The most commonly used protocol suite for network sockets is Transmission Control Protocol/Internet Protocol (TCP/IP), and the vast majority of network sockets in use are Internet sockets (Wikipedia, Network Sockets).

The primary Application Programming Interface (API) for Internet sockets, which is also used for Inter-Process Communication (IPC), is known as Berkeley sockets (BSD sockets), and all contemporary operating systems include some version of it. Berkeley sockets began in 1983 as part of the proprietary Unix library. The library was released to the public in 1989, and in the early 1990s a Winsock version was incorporated into the Windows OS. Other variations evolved over time, including POSIX sockets, and an alternative known as the STREAMS-based Transport Layer Interface (TLI) is also available today (Wikipedia, Berkeley Sockets).

For our project, we decided to create a couple of Internet socket pairs in Python to learn more about TCP socket programming and functionality. We chose Python because neither of us had worked with it before and we wanted to gain some experience in this area as well. What we learned is that Python is an open-source, high-level programming language that's fast, flexible, easy to learn and simple to understand. It's an ideal scripting tool for preliminary programming models. Web developers often use it for prototyping because it requires considerably fewer lines of code, allowing them to get products to market much faster than otherwise possible. It also interoperates with programming languages such as C, C++ and Java.
In fact, several Python implementations and language extension packages are available, including CPython, Jython and Cython (Wikipedia, Python_(programming_language)). Many scientific organizations such as NASA and the National Weather Service, educational institutions including the University of California at Irvine, and web-based companies such as Yahoo!, Google and YouTube make use of Python (Python.org, Organizations Using Python).

NETWORK SOCKETS

There are some rudimentary functions that all web servers need to provide. These include creating the socket, binding it to a port and IP address, listening for incoming client requests, accepting the client's connection to the server, allowing both server and client to send and receive messages, handling errors, and closing the connection. The following diagram illustrates the flow of function between the server and client.
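The sequence of calls listed above can be sketched in a few lines. This is a minimal illustration written for Python 3, not the project code in the Appendix (which targets Python 2). The helper names serve_once, fetch and demo and the fixed reply body are assumptions made for the example.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, read its request, send a fixed reply, and close."""
    conn, _addr = server_sock.accept()        # blocks until a client connects
    try:
        conn.recv(1024)                       # read the request (up to 1024 bytes)
        body = b"hello from the server"
        # sendall() keeps sending until every byte has been handed to the OS
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + body)
    finally:
        conn.close()


def fetch(host, port, path="/"):
    """Connect, send a GET request, and read until the server closes."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, port))
    try:
        client.sendall(("GET %s HTTP/1.0\r\n\r\n" % path).encode("ascii"))
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:                      # empty result means the peer closed
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        client.close()


def demo():
    """Run the server in a thread and the client in the main thread."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # create
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))     # bind; port 0 lets the OS pick a free port
    server.listen(1)                  # listen
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    reply = fetch("127.0.0.1", port, "/index.html")
    worker.join()
    server.close()                    # close
    return reply


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The thread is only there so that both ends can run in one process for the demonstration. In the project itself the server and client run in separate Python shells.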
While we were given skeleton code for building our basic server prototype, we still needed to do some research to discover how Python could accomplish these objectives. It was surprisingly simple to locate short coding examples, but that still left us with three main problems to resolve:

1. Learning how the Python split methods work so that we could parse data from the message header. For a look at how we accomplished this, see the section on Basic Socket Pair.
2. Deciding how best to implement looping so that we could process serial GET requests from the client without having to terminate and restart the connection each time. We discuss this in the section on Moduled Socket Pair.
3. Resolving an open port bug discovered during the testing phase of programming our sockets. This proved to be our biggest challenge during this exercise, and we cover it in greater detail in the section on Simple Server.

BASIC SOCKET PAIR

We began by creating a basic web server and client from the ground up in order to learn as much as possible about the fundamentals of TCP socket pairs. On the server side, creating the socket begins with initializing the server socket and designating an IP address and port
number. Once we had done this, we bound the socket to the host (here: localhost) and the port number. Next, we set the server to listen for requests. When a request is received, the server examines the requested file name. If the request does not name a file, the server substitutes the default file name (i.e. index.html). This parsing is done with the split() method, and the result is stored in the splitmessage list shown in our code. We use a similar approach for extracting information from the request header.

The coding for the client side was even simpler, as it only needed to open the connection, take input for the file name request, and send it, properly formatted, to the correct server. We set the maximum buffer size to 1024 bytes and kept the connection open until all bytes had been received. A close operation is then invoked for both the client and the server.

We were able to get our server to send the requested html file fairly easily, but quickly discovered that there were several different ways in which the file data could be sent using Python methods. One method was to send one line at a time, which was our original approach so that we could observe how this worked. We immediately noticed problems, as only one of the browsers tested was able to receive the file properly. We then tried the send() method, but learned that send() may transmit only part of the data. It returns the number of bytes actually sent, so the caller has to check that value and resend the remainder. We found that the sendall() method keeps sending until all of the data has been sent or an error occurs. On testing, we again found that this method was not working well on all browsers, so we decided to use a for-loop that would run until it reached the end of the file. This approach worked the best
during all testing. Finally, we decided to use the atexit module to register a function that closes the server socket, so that the sockets are closed when the Python shell is restarted. We also set the SO_REUSEADDR socket option so that the port address can be reused if it is still open. While this isn't exactly the solution we had been hoping for, it was much better than what we had been working with, so we considered this a success.

MODULED SOCKET PAIR

After more research, we decided to create a modified socket pair using modules. A module is a file of Python definitions (or a script) that the interpreter can load. Modules can be accessed by either the main program or by other modules using the import statement. Being able to access pre-existing socket libraries is far more efficient than re-coding everything from the ground up whenever you want to create a socket pair. Our moduled client uses the import statement to access the HTTP library (httplib), which handles request formatting and general socket functions. We then added more detail to refine the parsing of user input so that it can detect domain dot syntax and directory slash syntax. This allows us to make use of information across multiple browsing platforms. While we can still get an invalid URL message on some requests, this seems to be a common outcome for all browsers.
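The dot-and-slash parsing described above can be sketched as a single function. This is a simplified illustration in Python 3 rather than the modclient.py code from the Appendix. The function name parse_target, its default arguments and the host example.com are assumptions made for the example.

```python
def parse_target(user_input, current_host="127.0.0.1", current_port=8000):
    """Split user input into a (host, port, path) tuple.

    Rules assumed for this sketch:
      ""                      -> default page on the current server
      "page.html"             -> that file on the current server
      "/dir/page.html"        -> that path on the current server
      "host[:port]/dir/page"  -> a new server (port defaults to 80)
    Raises ValueError for a bare word that is neither a file nor a domain.
    """
    if user_input == "":
        return current_host, current_port, "/index.html"
    if user_input.startswith("/"):
        return current_host, current_port, user_input

    # Everything before the first slash is a file name or a server name.
    first, slash, rest = user_input.partition("/")
    if not slash:
        if first.endswith((".html", ".htm")):
            return current_host, current_port, "/" + first
        if "." not in first:
            raise ValueError("%r is not a valid server" % first)

    # A colon in the server part introduces an explicit port number.
    host, colon, port_text = first.partition(":")
    port = int(port_text) if colon else 80
    return host, port, "/" + rest


if __name__ == "__main__":
    for text in ("", "homepage.html", "/sub1/info.html",
                 "www.bob.com/homepage.html", "example.com:8080/a/b.html"):
        print(repr(text), "->", parse_target(text))
```

Returning a tuple instead of updating variables in place keeps the parsing separate from the connection handling, which makes the rules easy to check one input at a time.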
We also set to work trying to find ways for the client to keep the connection to the server open until the user quits or enters a new server. Problems arose during this phase when a server instance crashed or got interrupted and we experienced the port-still-in-use problem. This has to do with the TIME_WAIT state and is explained in section 2.7 of http://www.faqs.org/faqs/unix-faq/socket/. This had been an ongoing problem since we created the first socket pair. Up until this point, we had to keep changing the port numbers in both server and client whenever something went wrong, such as a crash or premature termination, and our initial scripting attempts to solve this bug had failed. When this error occurs, the port is still in use even though both client and server sockets have been closed. In this event, the only remedy we found was to restart Python. We developed a preliminary work-around by including code to keep the server running until the user enters control-c in the Python console.

SIMPLE SERVER

After still more research, we found some new approaches to try. The first involved attempting to add quit functionality for the server using the Python shell. Unfortunately, it still didn't work with a persistent server. Whenever we tried to obtain raw_input, the program would stop and wait, because raw_input blocks until the user presses Enter. Then, we tried to put in a control-c method for stopping the program. This worked, but we still wanted more functionality.
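One possible way around the blocking prompt described above is to run the server loop in a background thread, which leaves the main thread free to wait for a quit command. We did not do this in our project. The sketch below is written for Python 3, where the relevant modules are http.server, http.client and threading, and the names HelloHandler, start_server, stop_server and demo are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet for the demonstration


def start_server(host="127.0.0.1", port=0):
    """Run serve_forever() in a background thread; port 0 picks a free port."""
    httpd = HTTPServer((host, port), HelloHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.daemon = True
    thread.start()
    return httpd, thread


def stop_server(httpd, thread):
    """Stop the loop, wait for the thread, and release the port."""
    httpd.shutdown()       # asks serve_forever() to return
    thread.join()
    httpd.server_close()   # closes the listening socket


def demo():
    """Start the server, fetch one page from the main thread, then stop."""
    httpd, thread = start_server()
    port = httpd.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    response = conn.getresponse()
    status, body = response.status, response.read()
    conn.close()
    stop_server(httpd, thread)
    return status, body


if __name__ == "__main__":
    print(demo())
```

In an interactive version, the main thread could loop on a prompt and call stop_server() when the user types quit. Note that shutdown() has to be called from a different thread than the one running serve_forever().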
At this point, we decided to code a third server version, Simple Server, incorporating the SimpleHTTPServer and SocketServer modules into our code. This was definitely a better approach, as it handles the secondary GET requests that browsers make for images, which is a real time-saver.

RESULTS AND ANALYSIS

While it is relatively easy to get a basic TCP/IP server and client coded in Python without a great deal of knowledge or experience working in the language, it is far easier to make use of the preexisting modules and libraries. Most of the functionality we have come to expect for Internet Protocol has already been created. Additionally, new browser versions come out presupposing that most people will be using these established methods.

One important item not yet discussed is HTTP status code handling. Our code for Basic Server uses 200 OK and 404 Not Found. After looking around at the research and comparing our base code with other examples, we decided this was a good place to begin. We had success in getting our Basic Server to close on a 404 Not Found error.

When we compare the three server versions, we see that the Simple Server is far and away the most functional. It allows us to better parse information being sent from client to server and provides a way to reuse the same connection for multiple exchanges between users. Furthermore, we are able to take raw input directly from the client so that we can tailor the file transfer to meet the user's specific needs, and it gives us the ability to send multiple image
files in a single exchange. The next server in terms of ease of use is the moduled server. This server uses the BaseHTTPServer module, which allows you to handle HTTP GET requests. Specifically, the BaseHTTPRequestHandler class gives you access to variables useful for processing both GET and POST requests. The most useful function is serve_forever(). Using this method, we can continue to use the server until a keyboard interrupt terminates it. The other unique aspect of the moduled server is that we used it to learn more about Python. We explored using an if __name__ == '__main__' statement block so that the code could be made increasingly modular if we later desired. Again, our basic TCP server only provides bare-bones functionality and is useful, though not as usable as the other two versions.

On the client side, the most functional is our moduled client. This client starts with a default server connection. The client then gives the user options for input, including a single file, a full URL with file, a file path on the current server, nothing (for the default page), and quit. For instance, the user can enter a URL, then navigate to another site and then stop the client. Codewise, the client parses the user input by splitting it on slashes to separate the path elements, splitting the first element on dots to detect domains, and then checking whether the first element ends in htm or html. We also did some error checking by entering invalid user input, and the client handles it appropriately. Additionally, the client checks whether a specific port is given and re-assembles the file path. Of course, none of these features are found on our basic TCP client.

LESSONS LEARNED

In order to run both the client and the server from the same end device, you must open two separate Python shells. Then open the server in one instance and the client in another.
Otherwise you would need to run a multi-threaded process, which doesn't make sense within the scope of this project.

Python is a very efficient language for low-level network programming. There are many modules (libraries) available, which provide most of the common network programming functions and protocols. We feel that in most cases, these modules should be used rather than attempting to program these functions yourself. If you don't require functionality outside of routine network programming models, it doesn't make sense to try to re-invent the wheel except as a learning exercise.

We learned that it's important to be able to manage port assignment and access. If a port stays in use even after the socket has been closed, you will get the error message "only one usage of each socket address is normally permitted." The work-arounds for this are not as straightforward as one might think, as they require reusing the open port rather than closing it.
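Reusing the port, as described above, comes down to setting the SO_REUSEADDR option on the listening socket before calling bind(). The following Python 3 sketch is illustrative only. The helper names listening_socket and rebind_demo are ours, and the demonstration deliberately has the server side close a connection first so that a connection on that port is left behind in TIME_WAIT before a new listener is bound to it.

```python
import socket


def listening_socket(host, port):
    """Create a TCP listener that is allowed to reuse a recently used port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); without it, bind() can fail with
    # "address already in use" while old connections sit in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def rebind_demo():
    """Close a listener after one connection, then bind the same port again."""
    first = listening_socket("127.0.0.1", 0)   # port 0: the OS picks a free port
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _addr = first.accept()
    conn.close()         # server side sends the first FIN
    client.recv(1)       # returns b"" once that FIN arrives
    client.close()
    first.close()

    second = listening_socket("127.0.0.1", port)   # same port, new listener
    same_port = second.getsockname()[1] == port
    second.close()
    return same_port


if __name__ == "__main__":
    print("rebound to the same port:", rebind_demo())
```

The exact behavior of SO_REUSEADDR differs somewhat between operating systems, so this should be read as a sketch of the idea rather than a guarantee for every platform.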
Handling this port-reuse problem correctly is critical because the TIME_WAIT state is important: it makes sure that all of the data has gone through. From FAQs.org:

A couple of points about the TIME_WAIT state.

o The end that sends the first FIN goes into the TIME_WAIT state, because that is the end that sends the final ACK. If the other end's FIN is lost, or if the final ACK is lost, having the end that sends the first FIN maintain state about the connection guarantees that it has enough information to retransmit the final ACK.

o Realize that TCP sequence numbers wrap around after 2**32 bytes have been transferred. Assume a connection between A.1500 (host A, port 1500) and B.2000. During the connection one segment is lost and retransmitted. But the segment is not really lost, it is held by some intermediate router and then re-injected into the network. (This is called a "wandering duplicate".) But in the time between the packet being lost & retransmitted, and then reappearing, the connection is closed (without any problems) and then another connection is established between the same host, same port (that is, A.1500 and B.2000; this is called another "incarnation" of the connection). But the sequence numbers chosen for the new incarnation just happen to overlap with the sequence number of the wandering duplicate that is about to reappear. (This is indeed possible, given the way sequence numbers are chosen for TCP connections.) Bingo, you are about to deliver the data from the wandering duplicate (the previous incarnation of the connection) to the new incarnation of the connection. To avoid this, you do not allow the same incarnation of the connection to be reestablished until the TIME_WAIT state terminates. Even the TIME_WAIT state doesn't complete solve the second problem, given what is called TIME_WAIT assassination. RFC 1337 has more details.

o The reason that the duration of the TIME_WAIT state is 2*MSL is that the maximum amount of time a packet can wander around a network is assumed to be MSL seconds. The factor of 2 is for the round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived implementations normally use 30 seconds instead. This means a TIME_WAIT delay between 1 and 4 minutes. Solaris 2.x does indeed use the recommended MSL of 120 seconds. ( http://www.faqs.org/faqs/unix-faq/socket/ )
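The 1 to 4 minute range quoted above follows directly from the 2*MSL rule. As a quick check of the arithmetic (the function name time_wait_seconds is ours, chosen for illustration):

```python
def time_wait_seconds(msl_seconds):
    """TIME_WAIT lasts 2 * MSL: one MSL for each leg of the round trip."""
    return 2 * msl_seconds


if __name__ == "__main__":
    for label, msl in (("Berkeley-derived", 30), ("recommended", 120)):
        seconds = time_wait_seconds(msl)
        print("%s MSL of %d s gives a TIME_WAIT of %d s (%d min)"
              % (label, msl, seconds, seconds // 60))
```

With the Berkeley-derived value of 30 seconds the delay is 60 seconds, and with the recommended 120 seconds it is 240 seconds, which matches the quoted range.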
CITATIONS AND SOURCES

FAQs.org. Ed. Vic Metcalf and Andrew Gierth. FAQ, 22 Jan. 1998. Web. 3 Dec. 2013.
Stack Overflow. Close Server Socket on Quit Crash. N.p., 14 Oct. 2011. Web. 1 Dec. 2013.
Stack Overflow. Python Binding Socket: "Address already in use". N.p., 14 Oct. 2011. Web. 17 June 2013.
Triajianto, Junian. "Simple HTTP Server and Client in Python." Code Project. N.p., 21 Sept. 2012. Web. 1 Dec. 2013.
Wiki: HTML Parser. Python.org, 2 July 2013. Web. 30 Nov. 2013.
Wiki: Web Client Programming. Python.org, 2 July 2013. Web. 30 Nov. 2013.
Wiki: Organizations Using Python. Python.org, 2 July 2013. Web. 30 Nov. 2013.
Wikipedia. Berkeley Sockets. Wikimedia Foundation, n.d. Web. 30 Nov. 2013.
Wikipedia. Network Sockets. Wikimedia Foundation, n.d. Web. 30 Nov. 2013.
Wikipedia. Python_(programming_language). Wikimedia Foundation, n.d. Web. 30 Nov. 2013.
APPENDIX

#CS 336 Fall 2013
#Adam Callaway and Kathleen Bodi
#Final Project - Python Web Server and Client
#sockserver.py : Rudimentary TCP Server responding to HTTP GET requests
#Version 1.4
#!/usr/bin/env python

#import socket module
from socket import *
import atexit
import sys

#turn code into a modular function rather than a standalone program
def main():
    #Prepare a server socket
    serversocket = socket(AF_INET, SOCK_STREAM)
    serversocket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    serverport = 8000
    serversocket.bind(('localhost',serverport))
    serversocket.listen(1)

    #attempting to prevent port lockout on crash or bug
    def close_socket():
        serversocket.close()
    atexit.register(close_socket)

    while True:
        #Establish the connection
        print ('Server is ready on port: '), serverport
        connectionsocket, addr = serversocket.accept()
        try:
            clientmessage = connectionsocket.recv(1024)
            #for debugging
            print "client message:: \n",clientmessage,"\n::end client message"
            #python splits string, assigns element 1 of array to var filename
            splitmessage = clientmessage.split()
            #for debugging
            print '\nsplitmessage::',splitmessage,'\n::end splitmessage'
            #get element 1 from the split (element 0 is 'GET')
            filename = splitmessage[1]
            print filename
            #an empty path ('/') falls back to the default page
            if not filename[1:]:
                filename = '/index.html'
            #opens a file using filename variable, split starting at 1st element
            #presumably stripping the '/' off the start
            f = open(filename[1:])
            outputdata = f.read()
            f.close()
            print "Sending data: ",filename
            print outputdata
            #Send one HTTP header line into socket
            #(the status line must come first and end with a blank line)
            connectionsocket.send('HTTP/1.1 200 OK\r\n\r\n')
            #Send the content of the requested file to the client
            #sending it one character at a time
            for i in range (0, len(outputdata)):
                connectionsocket.send(outputdata[i])
            #alternate method below
            #sendall instead of send, as the send() method is not guaranteed
            #to send all of the data you pass it and will simply return
            #the number of bytes that were successfully sent
            #reference: https://synack.me/blog/using-python-tcp-sockets
            #connectionsocket.sendall(outputdata)
            connectionsocket.close()
        except IOError:
            #Send response message for file not found
            print "404 Not Found"
            connectionsocket.send('HTTP/1.1 404 Not Found\r\n\r\n')
            #Close client socket
            print "Closing socket on port: ", serverport
            connectionsocket.close()
            break
    #redundant safety catch
    serversocket.close()

if __name__ == '__main__':
    main()

#CS 336 Fall 2013
#Adam Callaway and Kathleen Bodi
#Final Project - Python Web Server and Client
#sockclient.py : Rudimentary TCP HTTP Client:
#opens socket to server, sends HTTP GET requests, prints data, closes socket
#Version 1.5
#!/usr/bin/env python

#import socket module
from socket import *

servername = '127.0.0.1'
serverport = 8000

clientsocket = socket(AF_INET, SOCK_STREAM)
clientsocket.connect((servername,serverport))
filename = raw_input('enter filename to GET (or \'quit\' to close server): ')
#the request method must be upper case
clientsocket.send('GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n' % (filename, servername))

#print the first chunk as it arrives
receiveddata = clientsocket.recv(1024)
print receiveddata

#keep reading until the server closes the connection (empty string)
printstring = ""
while len(receiveddata):
    printstring += receiveddata
    receiveddata = clientsocket.recv(1024)
print printstring
clientsocket.close()

#CS 336 Fall 2013
#Adam Callaway and Kathleen Bodi
#Final Project - Python Web Server and Client
#modserver.py : Rudimentary TCP Server responding to HTTP GET requests
#uses Python's BaseHTTPServer module
#Version 1.2
#!/usr/bin/env python

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import BaseHTTPServer
import os
import time

HOST_NAME = "127.0.0.1"
PORT_NUMBER = 8000

# class derived from BaseHTTPRequestHandler
class BC_HTTPRequestHandler(BaseHTTPRequestHandler):
    # GET request handling (the method name must be do_GET)
    def do_GET(self):
        httpdir = '.' #default directory is same as server program
        try:
            #debugging print path
            print "requested file path: " + self.path
            if self.path == '/':
                self.path = '/index.html'
            if self.path.endswith('.html') or self.path.endswith('.htm'):
                sendfile = open(httpdir + self.path)
                #send the 200 OK message
                self.send_response(200)
                #send the header
                self.send_header("Content-type", "text/html")
                self.end_headers()
                self.wfile.write(sendfile.read())
                sendfile.close()
                return
        except IOError:
            self.send_error(404, 'The requested resource could not be found but may be available again in the future.')

if __name__ == '__main__':
    #start the server
    print "Attempting to Start Server..."
    server_class = BaseHTTPServer.HTTPServer
    httpd = server_class((HOST_NAME, PORT_NUMBER), BC_HTTPRequestHandler)
    #server has started
    print "(",time.asctime(),")", "Started Server at %s:%s" % (HOST_NAME, PORT_NUMBER)
    print "Now handling HTTP requests. Use \'ctrl+c\' to stop server."
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        #hit ctrl-c in console to stop server
        pass
    httpd.server_close()
    print "(",time.asctime(),")", "Stopped Server at %s:%s" % (HOST_NAME, PORT_NUMBER)

#CS 336 Fall 2013
#Adam Callaway and Kathleen Bodi
#Final Project - Python Web Server and Client
#modclient.py : Rudimentary TCP HTTP Client:
#uses Python's httplib module
#allows user to enter server name, file name, port number
#Version 1.9
#!/usr/bin/env python

import httplib
import sys

def main():
    servername = '127.0.0.1'
    serverport = '8000'
    filename = ''
    serverconnection = httplib.HTTPConnection(servername, serverport)

    while True:
        print "\nconnection open to server: %s on port: %s" % (servername, serverport)
        print "Your options for input are:"
        print " * a file to GET from current server (i.e. homepage.html)"
        print " * a complete URL to GET (i.e. www.bob.com/homepage.html)"
        print " * a new directory and/or file to GET on current server"
        print "   (i.e. /sub1/info.html or /sub2/)"
        print " * nothing to GET the current server's default page (index.html)"
        print " * quit the client (quit)"
        userinput = raw_input("what would you like? ")

        if userinput == 'quit':
            print("quitting the HTTP client.")
            serverconnection.close()
            break

        #print "\n[> debug start userinput::\n",userinput,"\n::end userinput debug <]"
        splitinput = userinput.split('/')
        #print "\n[> debug start splitinput::\n",splitinput,"\n::end splitinput debug <]"
        dotsplit = splitinput[0].split('.')
        #print "\n[> debug start dotsplit::\n",dotsplit,"\n::end dotsplit debug <]"

        #user hit 'enter' for current server's default
        if userinput == "":
            filename = "/index.html"
        #user entered an html filename
        elif splitinput[0].endswith('.html') or splitinput[0].endswith('.htm'):
            #request paths must start with '/'
            filename = '/' + splitinput[0]
        #user input splits to one entry and is not a domain or webpage
        elif len(splitinput) == 1 and len(dotsplit) == 1:
            print "\nerror: \'%s\' is not a valid server." % (splitinput[0])
            continue
        #assume we have a new address of some sort
        else:
            #if input is not another directory on old server, close connection to old server
            if not userinput.startswith('/'):
                print "Closing connection to %s." % servername
                serverconnection.close()
                #check if there's a port specified
                portsplit = splitinput[0].split(':')
                #print "[> debug start portsplit::\n",portsplit,"\n::end portsplit debug <]"
                if len(portsplit) > 1:
                    servername = portsplit[0]
                    serverport = portsplit[1]
                else:
                    servername = splitinput[0]
                    serverport = '80'
            #if there's more than one piece to the filename path
            if len(splitinput) > 1:
                #stitch it back together
                filestring = ''
                #for each element in the split from the second to last
                for s in splitinput[1:]:
                    filestring += '/' + s
                filename = filestring
            #print "\n[> debug servername: %s:%s <]" % (servername, serverport)
            #print "[> debug filename:", filename," <]"
            serverconnection = httplib.HTTPConnection(servername, serverport)

        #the request method must be upper case
        serverconnection.request("GET", filename)
        serverresponse = serverconnection.getresponse()
        print "\nHTTP Response from %s: " % servername
        print "=========================================================="
        print(serverresponse.status, serverresponse.reason)
        serverdata = serverresponse.read()
        print "\nHTTP Data from %s:%s%s::" % (servername, serverport, filename)
        print "=========================================================="
        print(serverdata)
        print "=========================================================="
        print "::end of HTTP Data from %s:%s%s." % (servername, serverport, filename)

if __name__ == '__main__':
    main()

#CS 336 Fall 2013
#Adam Callaway and Kathleen Bodi
#Final Project - Python Web Server and Client
#simpserver.py : Rudimentary TCP Server responding to HTTP GET requests
#uses Python's SimpleHTTPServer and SocketServer modules
#Version 1.7
#!/usr/bin/env python

import socket
import SimpleHTTPServer
import SocketServer
import time

HOST_NAME = '127.0.0.1'
PORT_NUMBER = 8000

#setting up socket to be reused
#prevents TCP port lockout issue (TIME_WAIT) on crash or bug
class SimpleServer(SocketServer.TCPServer):
    def server_bind(self):
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind(self.server_address)

if __name__ == '__main__':
    print "Attempting to Start Server..."
    Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
    httpd = SimpleServer((HOST_NAME, PORT_NUMBER), Handler)
    #server has started
    print "(",time.asctime(),")", "Started Server at %s:%s" % (HOST_NAME, PORT_NUMBER)
    print "Now handling HTTP requests. Use \'ctrl+c\' to stop server."
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        #hit ctrl-c in console to stop server
        pass
    #serve_forever() has already returned, so close the listening socket
    httpd.server_close()
    print "(",time.asctime(),")", "Stopped Server at %s:%s" % (HOST_NAME, PORT_NUMBER)