Virtual Private Network Using Peer-to-Peer Techniques
Peer-to-Peer VPN
Daniel Kasza
Massachusetts Academy of Math and Science

Abstract

The low performance of traditional virtual private networks (VPNs) based on the client-server model led to an investigation of peer-to-peer communication as a way to improve the bandwidth and latency of the communication between the connected clients. A new VPN protocol based on peer-to-peer connections was engineered. The protocol uses both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) to transfer Ethernet frames between the connected clients over IPv4 and IPv6 networks, and it improves network performance by making direct communication between the clients possible. An IPv4-compatible implementation was written in the Java and C programming languages using the Java Native Interface. The tests were done using Ubuntu Linux on three computers connected to a test network, and it was concluded that the new protocol has better client-to-client performance than traditional VPN protocols while decreasing the load on the server. The protocol can be used to create VPNs for applications that require low-latency communication, including computer games and Voice over IP. Because the protocol encapsulates Ethernet frames, it can also be used to interconnect separate Ethernet networks.

Introduction

High-speed Internet connections have recently become inexpensive and reliable. This led to the spread of VPN connections in place of direct dial-up or leased-line connections. VPNs connect offices in different parts of the world and make corporate networks accessible to employees anywhere using a single Internet connection. They also help preserve the remaining IPv4 address space by making computers accessible without public IP addresses; however, they are still not practical for real-time applications that are sensitive to delay or speed.
Online games, video streaming, and file sharing services are just a few of the numerous applications that cannot be used efficiently with the current point-to-point VPN protocols; however, many recent applications of peer-to-peer technologies have proved to be efficient, fast, and reliable. By combining peer-to-peer technologies with VPNs, it is possible to create faster VPNs.

Literature Review

Computer Networks

Modern computer networks handle data in the form of small blocks called packets. This method is called packet switching. All data is encapsulated in packets, which are moved toward their destination through different networks using the address provided in each packet. Packet switching can be connectionless or connection-oriented. With a connectionless protocol, the two communicating peers send and receive data in the form of individual packets; connection-oriented protocols instead present an interface that lets the two peers communicate using data streams.

The main protocol used over the Internet is the Internet Protocol version 4 (IPv4). It was designed to interconnect packet-switched computer networks (RFC 791, 1981). Hosts are identified by their IP address, which is a 32-bit number. IPv4 also provides packet fragmentation. It does not guarantee delivery: packets can be lost or can arrive out of order. The process of moving the packets toward their
destination is called routing and is done by routers. These devices connect subnets. An IP address contains the address of the subnet and the address of a host on that subnet.

The successor of IPv4 is IPv6, which is incompatible with IPv4. IPv6 has a larger, 128-bit address space and numerous new features compared to IPv4 (RFC 2460, 1998). It drops support for fragmentation by routers: if a packet is too big to be transmitted through a network segment, the router simply drops it, which simplifies the routing process. IPv6 was also designed to be more secure, has better support for multicasting, and has a new addressing method called anycast.

The number of available IPv4 addresses is decreasing. It is predicted that the full IPv4 address space will be assigned by the middle of 2011. This exhaustion led to the development of different techniques for conserving IP addresses. One of these is Network Address Translation (NAT), which provides a way to hide a private IP network behind a single public IP address (RFC 3022, 2001). Certain IP address ranges are reserved for use on private networks. The problem with NAT is that each connection has to be initiated from the private network, so hosts on the public network cannot reach the hosts of the private network. It also means that two hosts behind different NATs cannot connect to each other. For this reason, NAT traversal techniques were developed. One of these is UDP hole punching, which solves the problem by using a server on the public network to help open the connection. The two clients each connect to the public server, which opens a path through each NAT for that connection. The server then tells each client where it can find the other, so the clients can reuse the holes originally created for the communication with the server.

The User Datagram Protocol (UDP) is a connectionless protocol used over IPv4 and IPv6. It does not guarantee that the messages (datagrams) will arrive or that they will arrive in order (RFC 768, 1980).
It is used where lost packets do not need to be sent again because they would be out of date by the time they arrived, or where minimal communication delay is needed. Voice over IP and gaming are good examples of such applications. UDP is also suitable for simple devices because of its simplicity, which is why it is used for the Trivial File Transfer Protocol (TFTP). The Domain Name System is also based primarily on UDP.

The Transmission Control Protocol (TCP) is a connection-oriented protocol (RFC 761, 1980). All data sent over a TCP connection is guaranteed to arrive in order; however, that means that lost packets have to be retransmitted before the data stream can be reassembled on the receiver's side. This retransmission can cause delays in the communication. TCP is also more complicated than UDP, which makes it harder to implement in embedded systems and makes establishing a TCP connection a relatively slow process. These properties make TCP suitable for applications that require long-term, reliable connections. TCP is used for the Hypertext Transfer Protocol (HTTP), the File Transfer Protocol (FTP), the Post Office Protocol version 3 (POP3), the Simple Mail Transfer Protocol (SMTP), and many other protocols.
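The UDP hole-punching exchange described in this section can be sketched in Java. This is a minimal sketch under stated assumptions, not the implementation used in this project: all three parties run on the loopback interface, so no real NAT is traversed, and the class name, the text messages, and the `address:port` message format are hypothetical. It illustrates the central idea only: the server learns each client's endpoint from the source address of the datagram it receives and then tells each client the endpoint of the other.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names and message formats are assumptions.
class HolePunchSketch {
    static void send(DatagramSocket from, InetSocketAddress to, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        from.send(new DatagramPacket(data, data.length, to));
    }

    static DatagramPacket receive(DatagramSocket socket) throws IOException {
        DatagramPacket packet = new DatagramPacket(new byte[512], 512);
        socket.receive(packet);
        return packet;
    }

    static String text(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                StandardCharsets.UTF_8);
    }

    static String format(InetSocketAddress endpoint) {
        return endpoint.getAddress().getHostAddress() + ":" + endpoint.getPort();
    }

    static InetSocketAddress parse(String endpoint) throws IOException {
        int colon = endpoint.lastIndexOf(':');
        return new InetSocketAddress(InetAddress.getByName(endpoint.substring(0, colon)),
                Integer.parseInt(endpoint.substring(colon + 1)));
    }

    /** Runs the whole exchange on loopback and returns what the two clients received. */
    static String run() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0, loopback);
             DatagramSocket clientA = new DatagramSocket(0, loopback);
             DatagramSocket clientB = new DatagramSocket(0, loopback)) {
            for (DatagramSocket s : new DatagramSocket[] {server, clientA, clientB}) {
                s.setSoTimeout(2000); // fail instead of blocking forever if a datagram is lost
            }
            InetSocketAddress serverAddress =
                    new InetSocketAddress(loopback, server.getLocalPort());

            // 1. Both clients contact the public server (behind a NAT this opens the hole).
            send(clientA, serverAddress, "register");
            send(clientB, serverAddress, "register");

            // 2. The server sees each client's endpoint as the datagram's source address
            //    and tells each client where to find the other one.
            InetSocketAddress first = (InetSocketAddress) receive(server).getSocketAddress();
            InetSocketAddress second = (InetSocketAddress) receive(server).getSocketAddress();
            send(server, first, format(second));
            send(server, second, format(first));

            // 3. Each client reads its peer's endpoint and sends to it directly.
            InetSocketAddress peerOfA = parse(text(receive(clientA)));
            InetSocketAddress peerOfB = parse(text(receive(clientB)));
            send(clientA, peerOfA, "hello from A");
            send(clientB, peerOfB, "hello from B");
            return text(receive(clientB)) + " / " + text(receive(clientA));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On a real network the clients would sit behind separate NATs and the server would need a public address; the sketch keeps only the message flow.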
Attacks on Computer Networks

Figure 1. MITM attack. The attacker relays the communication between the two hosts. It can record and even modify packets.

One of the basic attacks is the man-in-the-middle (MITM) attack, which aims to divert the packet flow between two hosts and relay it through the attacker's computer (Schneier, 2004), which can then record and modify the communication. Encryption and authentication can be used to protect protocols from MITM attacks.

Replay attacks are usually used together with MITM attacks. The attacker replays (resends) previously recorded packets, usually without knowing their exact content. A vulnerable protocol may accept the replayed packets as legitimate communication and authenticate the attacker's computer or perform other operations.

Figure 2. Real-world DDoS example. A Dunkin Donuts can serve a limited number of customers in a given time period. If too many customers are waiting, some of them will time out and leave the restaurant (Schedule, 2009; Starset, 2009).

A denial-of-service (DoS) attack renders a service unavailable through excessive traffic. The goal is to consume the resources of the service (bandwidth, computing power) so that it cannot handle other requests (RFC 4732, 2006). Distributed DoS (DDoS) is a variant in which multiple computers are used to attack a service. It is harder to protect systems against DDoS attacks.

On Layer 2 Ethernet networks, hosts are identified by their Media Access Control (MAC) addresses. A MAC address is a 6-byte number. The parity of the first byte determines the addressing of the frame: MAC addresses with an even first byte are unicast addresses (Gergő Koós, personal communication).
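The parity rule for MAC addresses can be expressed as a single bit test. The following Java sketch is illustrative only; the class and method names are hypothetical and are not taken from the project's source code.

```java
// Hypothetical helper illustrating the unicast test on a 6-byte MAC address.
class MacAddressKind {
    /** Parses text such as "00:1a:2b:3c:4d:5e" into 6 bytes. */
    static byte[] parse(String text) {
        String[] parts = text.split(":");
        if (parts.length != 6) {
            throw new IllegalArgumentException("expected 6 bytes: " + text);
        }
        byte[] mac = new byte[6];
        for (int i = 0; i < 6; i++) {
            mac[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return mac;
    }

    /** An even first byte (lowest bit 0) marks a unicast address. */
    static boolean isUnicast(byte[] mac) {
        return (mac[0] & 0x01) == 0;
    }

    /** The broadcast address has all bits set (ff:ff:ff:ff:ff:ff). */
    static boolean isBroadcast(byte[] mac) {
        for (byte b : mac) {
            if ((b & 0xFF) != 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicast(parse("00:1a:2b:3c:4d:5e")));   // even first byte
        System.out.println(isUnicast(parse("ff:ff:ff:ff:ff:ff")));   // odd first byte
    }
}
```

A VPN client that forwards Ethernet frames could use such a test to decide whether a frame has a single destination or has to be delivered to several peers.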
VPNs

A virtual private network is a computer network that uses an existing network infrastructure to provide secure access to a network. It encapsulates the communication between the connected network devices. There are many different uses of VPNs; however, this paper focuses on the VPN protocols used over the Internet to provide inexpensive connections between computers.

Figure 3. A typical VPN topology. Both clients are connected to the Internet, but all VPN traffic has to go through a single server.

These virtual private networks were originally created to satisfy the need for a less expensive way to interconnect corporate networks than leased or owned lines. Because of this origin, a typical VPN uses a point-to-point topology; however, VPNs are now also used to connect individual computers together. The problem in this case is that the communication between two computers connected to the VPN is slowed and limited by the Internet connection and performance of the VPN server, and the reliability of the network depends on a single computer, because every packet has to reach the VPN server first, which then relays it to its destination. In this scenario the communication between the two computers puts excessive traffic on the VPN server and the network infrastructure, which makes the communication slow, even though the two computers could reach each other directly through the underlying network.

Figure 4. A VPN connection using the PPTP protocol. A PPTP tunnel is created above the Internet Protocol for a PPP connection that encapsulates Internet Protocol packets.

A frequently used protocol for VPNs is PPTP (Point-to-Point Tunneling Protocol), which encapsulates PPP (Point-to-Point Protocol) connections (RFC 2637, 1999).
When a client wants to connect to a VPN, it first has to establish a PPTP connection and then use this connection to create the actual network connection by tunneling PPP through it. Because PPP was already supported by most operating systems and devices, this protocol was simple to implement. Any PPP traffic can be transferred transparently through a PPTP tunnel, which makes it compatible with existing software and devices.

Although the point-to-point topology can make the communication slow, it has a security advantage. Because clients communicate only with the VPN server through the potentially insecure Internet connection, they have
to secure only this one connection by authentication and encryption; however, the security provided by VPN protocols is becoming less important as the security of protocols on other levels increases.

Peer-to-Peer Communication

A peer-to-peer (P2P) network comprises equally privileged participants. No participant is clearly a server or a client in the communication; each both provides and consumes resources. Peer-to-peer networks are already used for Voice over IP, file sharing, and several other applications.

Figure 5. Server-client file sharing. In the classical server-client model the server has to provide the file to every client.

Figure 6. BitTorrent file sharing. The only role of the tracker is to help peers connect to each other. The peers download the files from each other.

A widely used P2P protocol is BitTorrent, which enables fast file sharing over the Internet by making the downloaders uploaders at the same time (Cohen, 2008). As soon as a slice of the file is available at a peer, others can start downloading that slice from that peer, too. That way the original uploader does not have to upload the file to every client, and the spare bandwidth of the peers can be utilized. P2P protocols can also be more reliable because they are distributed across a network.

The disadvantage of P2P networks is that they are usually more difficult to develop and implement, and they can also suffer from security issues. Peers have to authenticate not just a single server but multiple peers. If encryption is used, it is also important to have a different encryption key for each connected peer.

Software and Tools

Linux is the name of the Unix-like operating systems based on the Linux kernel, which was originally created and released by Linus Torvalds in 1991. The kernel is typically packaged together with other software to form a desktop or server operating system.
These packages are called distributions. The Linux kernel is a highly scalable, portable, monolithic, modular kernel that runs on numerous kinds of computing devices. It can be found on small embedded computers, mobile phones, desktops, servers, and even mainframe computers.

The TAP interface is a virtual Ethernet interface implemented as a kernel driver that allows userland applications to communicate easily with the network stack of the host operating system (Krasnyansky, 2001). Because TAP is a virtual Ethernet interface, it works with Ethernet frames. Every TAP has two endpoints: one for the kernel and one for the userland application. From the point of view of the host operating system, the TAP appears as an ordinary Ethernet interface; however, a frame sent to it does not go to a physical Ethernet card but is received by the application connected to the other end of the TAP. The application can also construct frames and send them to the operating system using the same mechanism. That way TAP can be used to implement virtual network interfaces without modifying the operating system. It is used by several VPN client applications, and it is also used to connect operating systems running in a virtual machine to the host system.

Wireshark is an open source network analyzer (Wireshark, n.d.). It can be used to capture raw Ethernet traffic on a network and analyze the contents of the packets. It is available for the major operating systems because it relies on other cross-platform open source technologies. It captures network traffic through the pcap application programming interface, which is available for Unix and Unix-like operating systems (Solaris, Mac OS X, Linux, BSD) in the form of libpcap and for Windows in the form of WinPcap. Wireshark is useful for troubleshooting networks, software, and network protocols.
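An application attached to a TAP interface receives whole Ethernet frames, so one of its first tasks is to read the frame header. The Java sketch below parses the 14-byte Ethernet II header (destination MAC address, source MAC address, EtherType). It assumes that the byte array holds a raw frame with no additional prefix, which on Linux depends on how the TAP device was opened; the class name and the sample frame are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical parser for the header of a raw Ethernet II frame.
class EthernetHeader {
    static final int LENGTH = 14; // 6 + 6 + 2 bytes

    final byte[] destination = new byte[6];
    final byte[] source = new byte[6];
    final int etherType;

    EthernetHeader(byte[] frame) {
        if (frame.length < LENGTH) {
            throw new IllegalArgumentException("frame shorter than an Ethernet header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(frame); // big-endian (network order) by default
        buffer.get(destination);
        buffer.get(source);
        etherType = buffer.getShort() & 0xFFFF;     // read as an unsigned 16-bit value
    }

    /** Formats a MAC address as lowercase hex bytes separated by colons. */
    static String format(byte[] mac) {
        StringBuilder text = new StringBuilder();
        for (byte b : mac) {
            if (text.length() > 0) {
                text.append(':');
            }
            text.append(String.format("%02x", b & 0xFF));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Sample frame (assumed data): broadcast destination, EtherType 0x0806 (ARP).
        byte[] frame = {
            (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff,
            0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
            0x08, 0x06,
            0x00, 0x00
        };
        EthernetHeader header = new EthernetHeader(frame);
        System.out.println(format(header.source) + " -> " + format(header.destination)
                + " type 0x" + Integer.toHexString(header.etherType));
    }
}
```

The destination address is the field a frame-forwarding VPN client would inspect to choose where to send the frame.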
Research Plan

The goal is to design, implement, and test a new virtual private network protocol that uses recent improvements in peer-to-peer communication to make client-to-client communication through the VPN faster.

The server application will be programmed in Java. Only a simple server will be made to test the protocol. The client application will be programmed in Java and C. The TAP interface will be used to communicate with the host operating system. Because there is no TAP library available for Java, the TAP access will be programmed using the Java Native Interface (JNI). The TAP interface is a virtual Ethernet network interface driver that gives userland applications a way to create virtual network interfaces and communicate with the built-in networking stack of the operating system. It is already in use by several VPN applications and is available for the major operating systems, including Linux, Windows, Mac OS X, and different BSD variants, which makes it suitable for a project like this. Although Java applications are platform independent, the client application will be compatible only with Linux because the JNI code is Linux-specific; however, the program will be easily portable to other platforms.

Client-to-client communication will use the User Datagram Protocol (UDP) for fast communication. UDP hole punching will be used to traverse Network Address Translators. The clients will use the Transmission Control Protocol (TCP) to communicate with the server. Although TCP is slower than UDP, it is more reliable. Because all communication between the clients and the server is critical, TCP is more suitable for this part of the communication: it removes the need for error handling in the protocol itself. Although encryption will not be used, a challenge-response method will be used to authenticate the clients on the network, and the protocol will be designed so that encryption can be added by later extensions.
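The general shape of a challenge-response authentication can be sketched as follows. The plan does not specify the algorithm, so everything concrete here is an assumption: the sketch uses HMAC-SHA256 over a random 16-byte challenge with a pre-shared secret, and the class and method names are hypothetical. The verifier sends a fresh random challenge, the client returns a keyed hash of it, and the verifier recomputes the hash to check the answer without the secret ever crossing the network.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; the algorithm choice (HMAC-SHA256) is an assumption.
class ChallengeResponseSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Verifier side: creates a fresh, unpredictable challenge. */
    static byte[] newChallenge() {
        byte[] challenge = new byte[16];
        RANDOM.nextBytes(challenge);
        return challenge;
    }

    /** Client side: proves knowledge of the secret without sending it. */
    static byte[] respond(byte[] sharedSecret, byte[] challenge) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
            return mac.doFinal(challenge);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Verifier side: recomputes the expected response and compares in constant time. */
    static boolean verify(byte[] sharedSecret, byte[] challenge, byte[] response) {
        return MessageDigest.isEqual(respond(sharedSecret, challenge), response);
    }

    public static void main(String[] args) {
        byte[] secret = "example secret".getBytes(StandardCharsets.UTF_8);
        byte[] challenge = newChallenge();
        byte[] response = respond(secret, challenge);
        System.out.println(verify(secret, challenge, response));
    }
}
```

Because every challenge is new, a response recorded by an attacker does not match a later challenge, which is what protects this kind of scheme against simple replay attacks.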
During testing, a server computer and at least two client computers will be used. The computers will run Linux. The program code will be written using NetBeans. To find programming errors in the communication, Wireshark, a network sniffing application, will be used to record the communication between the clients.
Methodology

The programs were written using NetBeans (version 6.9.1, downloaded from netbeans.org). The computers used for testing were running Ubuntu Linux (version 10.10, downloaded from ubuntu.com) with the latest updates and the default-jdk package installed. The computers were connected to the Internet through a standard 10 Mbit/s Ethernet hub. One computer was used to write the program and run the server application. Two computers were used to test the client application. One computer was running Wireshark and recording the communication. This data was used to find the causes of unexpected errors.

A single computer was set up as both a VPSN (Virtual Private Switched Network, the new peer-to-peer protocol) server and a PPTP server, and it was connected to a Cisco Systems router (Cisco 2620XM). Two other computers were connected to a standard 10 Mbit/s Ethernet hub that was also connected to the router. The router was set to add a 25 ms delay to the communication between the two subnets. The two computers were connected to the VPSN and PPTP networks served by the third computer. A command was given to one of the client computers to simultaneously measure and record the latency between the two clients through the two VPNs, the latency of the direct path through the Ethernet hub, and the latency to the server computer. The latency was measured using the built-in ping application of the operating system. A total of 2000 measurements were made using 200-byte packets, at a rate of five measurements per second. The data was recorded to text files.

After these measurements, both the direct connection and the VPSN connection were flooded with 20000 ICMP echo requests (200 bytes each). Total time and packet loss were measured and recorded in text files. The data was processed using Microsoft Excel.
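The averaging step applied to the recorded ping output can be illustrated with a short Java sketch. The data in this study was processed in Microsoft Excel, so the code below is only an equivalent illustration, not the procedure that was used. It assumes the usual Linux ping reply format, in which each reply line contains a field such as `time=0.53 ms`; the class name and the sample lines in `main` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that averages the round-trip times found in ping output.
class PingLogAverage {
    // Matches the round-trip time field of a reply line, e.g. "time=0.53 ms".
    private static final Pattern TIME = Pattern.compile("time=([0-9]+(?:\\.[0-9]+)?) ms");

    /** Returns the mean round-trip time in milliseconds over all reply lines. */
    static double averageMs(Iterable<String> lines) {
        double sum = 0;
        int count = 0;
        for (String line : lines) {
            Matcher matcher = TIME.matcher(line);
            if (matcher.find()) {          // lines without a time field are skipped
                sum += Double.parseDouble(matcher.group(1));
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no latency samples found");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Sample lines are made up for illustration; they are not measured data.
        List<String> lines = Arrays.asList(
                "208 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.50 ms",
                "208 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.56 ms",
                "--- 10.0.0.2 ping statistics ---");
        System.out.println(averageMs(lines));
    }
}
```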
Results

Figure 7. Comparison of latency on different networks. Network latency is an important aspect of network performance. Smaller values are better.
Table 1. Average latency and performance comparison.

              Connection to server   Direct Connection   VPSN    PPTP
Average (ms)  5.59                   0.53                49.49   99.48
Performance   100%                   1046%               11%     6%

Table 2. Flooding data with 20000 packets.

              Direct Connection   VPSN
Time (ms)     6238                74192

Figure 8. Comparison of network traffic between the clients and between the clients and the server on a logarithmic scale. In traditional protocols, the clients' traffic would go through the server.
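The totals in Table 2 can be converted into per-packet times and a ratio between the two connections. The Java sketch below only restates the arithmetic on the published totals; the class and method names are hypothetical. With the values from Table 2 it gives about 0.31 ms per packet for the direct connection, about 3.71 ms per packet for VPSN, and a ratio of about 11.9.

```java
// Hypothetical helper restating the arithmetic on the totals from Table 2.
class FloodingRatio {
    static final int PACKETS = 20000;
    static final long DIRECT_TOTAL_MS = 6238;
    static final long VPSN_TOTAL_MS = 74192;

    /** Average handling time of one packet in milliseconds. */
    static double perPacketMs(long totalMs, int packets) {
        return (double) totalMs / packets;
    }

    /** How many times longer the slower connection took. */
    static double ratio(long slowerMs, long fasterMs) {
        return (double) slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        System.out.printf("direct: %.4f ms per packet%n", perPacketMs(DIRECT_TOTAL_MS, PACKETS));
        System.out.printf("VPSN:   %.4f ms per packet%n", perPacketMs(VPSN_TOTAL_MS, PACKETS));
        System.out.printf("ratio:  %.2f%n", ratio(VPSN_TOTAL_MS, DIRECT_TOTAL_MS));
    }
}
```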
Data Analysis and Discussion

The average latency values in Table 1 show that the Virtual Private Switched Network (VPSN, the peer-to-peer protocol) decreases the latency between clients compared to other protocols. The direct connection is even faster than the VPSN connection, but this is expected because VPSN itself runs over this connection. Figure 7 shows that although the average latency is low, the pairing process that happens every 60 seconds (every 300 packets) slows down the connection for a short period of time. The origin of the other high values is unknown. They may be caused by other traffic on the network during the measurements.

Table 2 shows that the performance of the network does not decrease under heavy load. Although VPSN takes about twelve times longer than the direct connection to handle 20000 packets, this is expected because it has about ten times higher latency than the direct connection. Because VPSN uses the direct connection to transfer data and that connection was also flooded, VPSN was not expected to match the performance of the direct connection.

An important aspect of peer-to-peer protocols is that they decrease the load on the servers. Figure 8 shows that VPSN can make client-to-client communication more efficient. The traffic on the server is only a small fraction of the traffic between the clients. That means a server that could serve a single network with traditional VPNs could serve more than 170 networks with a peer-to-peer protocol.

Conclusions

Based on the data collected, peer-to-peer communication is a viable solution to the drawbacks of traditional VPN protocols for client-to-client communication. Although the current version of VPSN is not ready for everyday use, it shows that with further research peer-to-peer VPNs could become real replacements for protocols based on the server-client model. The collected data also shows which parts of the protocol should be changed to improve its overall performance.
Limitations and Assumptions

The protocol does not support encryption. It is assumed that encryption would not change the speed of the communication significantly. The protocol was designed to speed up unicast communication. It is assumed that the majority of the communication between the clients is unicast traffic. In some rare cases, the clients might be able to communicate faster through the server than directly. Although the timeout in the pairing process may filter out these cases, it was generally assumed that direct communication is faster, because the other cases are very rare and the role of direct communication is not only to speed up the communication but also to decrease the load on the server.
Applications and Future Experiments

The protocol in its current state could be used where security is less important than low latency or where encryption is already provided on higher protocol levels. Although VPSN does not currently support encryption, it was designed to make the later addition of encryption algorithms simple. Key exchange and authentication could be part of the pairing process.

One of the current weaknesses of VPSN is the slow pairing process. It could be improved to make the latency more stable. One way to do this would be to start the re-pairing before the last pairing expires. That way the clients would always be paired, and packets would not have to wait until the pairing process is completed.

Literature Cited

Schneier, B. (2004). Crypto-Gram. Retrieved November 15, 2010, from http://www.schneier.com/crypto-gram-0404.html#6

Starset, R. (2009). Dunkin Donuts DDoS. Retrieved November 15, 2010, from http://www.flickr.com/photos/rubin110/4229606152/in/photostream/

Internet Denial-of-Service Considerations (RFC 4732). (2006). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc4732/

Internet Protocol (RFC 791). (1981). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc791/

Internet Protocol, Version 6 (RFC 2460). (1998). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc2460/

Point-to-Point Tunneling Protocol (RFC 2637). (1999). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc2637/

Schedule 26C3 Public wiki. (2009). Retrieved November 15, 2010, from http://events.ccc.de/congress/2009/wiki/schedule

Cohen, B. (2008). The BitTorrent Protocol Specification. Retrieved November 15, 2010, from http://www.bittorrent.org/beps/bep_0003.html

Traditional IP Network Address Translator (RFC 3022). (2001). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc3022/

Transmission Control Protocol (RFC 761). (1980). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc761/

Krasnyansky, M. (2001). Universal TUN/TAP device driver Frequently Asked Question. Retrieved November 15, 2010, from http://vtun.sourceforge.net/tun/faq.html

User Datagram Protocol (RFC 768). (1980). Retrieved November 15, 2010, from http://datatracker.ietf.org/doc/rfc768/

Wireshark Go Deep. (n.d.). Retrieved November 15, 2010, from http://www.wireshark.org/

Appendices

Included appendices:
VPSN Alpha 0 specifications
VPSN simple server source code

Acknowledgements

I would like to express my appreciation to Ms. Karen Lang, my advisor, who helped me with several aspects of my project. I would like to show my gratitude to Dr. Judith Sumner for helping me write this paper. I would also like to thank Gergő Koós, who helped me figure out how to handle the Ethernet packets.