CGHub Client Security Guide Documentation
Release 3.1

University of California, Santa Cruz

April 16, 2014
CONTENTS

1 Abstract
2 GeneTorrent: a secure, client/server BitTorrent
  2.1 GeneTorrent protocols
3 Secure installation and use of the CGHub client
  3.1 Firewall configuration
  3.2 Secondary security mechanisms
  3.3 Managing CGHub user authentication
  3.4 Network resource utilization
4 Frequently Asked Questions
  4.1 Why BitTorrent?
  4.2 Interaction with generic BitTorrent servers
Bibliography
CHAPTER ONE

ABSTRACT

This document addresses local security concerns regarding the use of GeneTorrent, the client software of the Cancer Genomics Hub (CGHub). GeneTorrent is built on BitTorrent, a mature and efficient TCP/IP-based protocol for parallel file transfer, but it replaces BitTorrent's distributed peer-to-peer topology with restricted, client-authenticated client/server communication to implement secure transfers. CGHub clients communicate only with CGHub, over client-authenticated SSL. The recommended client-side firewall configuration is described.
CHAPTER TWO

GENETORRENT: A SECURE, CLIENT/SERVER BITTORRENT

GeneTorrent uses the BitTorrent protocol [1], a highly efficient, mature, and robust TCP/IP protocol for parallel file transfer. While the original design goal of BitTorrent was to support distributed peer-to-peer (P2P) file transfer applications, GeneTorrent uses the BitTorrent protocol in a more restricted client-server configuration. Instead of relying on distributed, unauthenticated volunteer seed nodes, all seed node servers are provided by CGHub at the San Diego Supercomputer Center. GeneTorrent clients do not publicly advertise or seed; they communicate only with servers provided by CGHub. Unlike P2P BitTorrent clients, GeneTorrent uses client-authenticated SSL connections to implement secure transfers. Figure 2.1 contrasts the GeneTorrent client-server model with peer-to-peer applications.

Figure 2.1: Comparison of peer-to-peer and GeneTorrent network architectures. Panels: (a) BitTorrent peer-to-peer; (b) GeneTorrent client-server. In (a), a peer-to-peer application has no central server, and every node can talk directly to any other node using the BitTorrent protocol. With GeneTorrent (b), BitTorrent file transfers are sent over parallel, SSL-encrypted TCP connections directly between clients and the CGHub servers, passing through the firewalls on each side.

2.1 GeneTorrent protocols

GeneTorrent uses three underlying protocols to implement both uploads and downloads:

* CGHub Web Services API queries via HTTPS to access the metadata.
* CGHub Transactor queries using the BitTorrent Tracker protocol via HTTPS. These direct the client to the CGHub application servers for parallel transfers.
* Multiple client-authenticated, SSL-encrypted BitTorrent sockets that transfer the data files.

To perform a transfer, a client is authenticated as a valid NIH eRA Commons user and then authorized for the particular data set (TCGA, TARGET, etc.) and action (upload or download).

GeneTorrent communication is built on the OpenSSL library, which implements the Secure Sockets Layer (SSL v3) and Transport Layer Security (TLS v1) protocols [2]. CGHub encrypts the data with AES-256 [8] [4] and uses 2048-bit RSA keys with the sha1WithRSAEncryption signature algorithm [3].
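To make the term "client-authenticated SSL" concrete, the following is a minimal sketch using Python's standard ssl module. It is not GeneTorrent's implementation, which is built directly on OpenSSL; the function name make_client_context and the certificate and key paths are hypothetical, and the CGHub credential file is not assumed to be in a format this call accepts. The sketch only shows the general pattern: the client verifies the server's certificate and, when given one, presents its own certificate so the server can authenticate it.

```python
import ssl
from typing import Optional


def make_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server.

    If cert_file is given (hypothetical PEM paths), the client also
    presents its own certificate, which is what makes the connection
    client-authenticated.
    """
    # The default client context requires a valid server certificate
    # and checks that it matches the host name being contacted.
    ctx = ssl.create_default_context()
    if cert_file is not None:
        # Load the client's certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

In this pattern, server verification happens on every connection, while the client certificate is an additional step the server can require before any data is exchanged.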
CHAPTER THREE

SECURE INSTALLATION AND USE OF THE CGHUB CLIENT

This section covers best practices for configuring your network and system to allow GeneTorrent in a manner that does not compromise security. The secure use of the CGHub client is tied to the security requirements for using limited-access NCI data, as described in the dbGaP Security Best Practices [6] document.

3.1 Firewall configuration

CGHub clients require the following firewall configuration:

* Open the firewall only for outgoing TCP connections to CGHub. Incoming connections are not required. We recommend allowing access to the full CGHub network:

  Host               IP Address        Protocol  Ports
  *.cghub.ucsc.edu   192.35.223.0/24   TCP       443, 20893-20923, 21111

  This provides a simple configuration and prevents problems as CGHub adds more servers to accommodate growing capacity. It is possible to use a more restrictive IP address range:

  Host                     IP Address      Protocol  Ports
  cghub.ucsc.edu           192.35.223.5    TCP       443
  tracker.cghub.ucsc.edu   192.35.223.52   TCP       21111
  app01.cghub.ucsc.edu     192.35.223.51   TCP       20893-20923
  app02.cghub.ucsc.edu     192.35.223.52   TCP       20893-20923
  ...                      ...             ...       ...
  app16.cghub.ucsc.edu     192.35.223.66   TCP       20893-20923

* If the number of systems accessing CGHub is limited, configure firewalls to allow traffic to CGHub from only those systems.
* Whitelist the CGHub servers so that intrusion prevention deep packet inspection (DPI) is not applied to their traffic. DPI slows down transfers and is not useful for SSL-encrypted connections.

3.2 Secondary security mechanisms

Since GeneTorrent runs as an unprivileged program, it is not possible for the client to enforce security. All CGHub security is enforced at the server. Client system network security, as with all networked applications, is enforced using firewalls.

While a firewall should always be the first line of defense against disallowed network use, we recognize that restricted systems are often important to organizations as a secondary or internal security mechanism. We will work with institutions that desire locked-down systems to provide optional, compile-time restrictions in GeneTorrent that meet their needs. These changes will be incorporated into the standard code base. If multiple institutions require the same set of restrictions, we will provide binary distributions of GeneTorrent that meet their needs.

3.3 Managing CGHub user authentication

Users are authenticated with CGHub using their eRA Commons account [7]. Logging into eRA Commons via CGHub creates and downloads a CGHub credential associated with the user. The credential file contains a cryptographically encoded key which, when decoded, identifies the user to CGHub; the file has a .key extension. The CGHub credential is valid for 365 days, after which a new credential must be created using the CGHub login facility. Because the authentication process creates this credential, operations against CGHub can be scripted without requiring a P.I. to reveal their eRA password.

The CGHub credential should be protected with the same rigor as the data it allows you to access. It should not be stored in a public directory on your system. If you feel the security of your CGHub credential may have been compromised, please contact CGHub Support (support@cghub.ucsc.edu) to have it revoked.

3.4 Network resource utilization

GeneTorrent is a highly efficient protocol, so a single download can consume a large portion of a network's bandwidth. This can be addressed by using the --rate-limit option to control the transfer rate of a GeneTorrent instance.
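The recommended rule from section 3.1 (outgoing TCP to the full CGHub network on the listed ports) can be expressed as a small check, for example to audit a list of observed outbound connections. This is a sketch only: the network and port values come from the table in section 3.1, while the function name is_allowed_outbound is hypothetical, and the code does not configure any firewall.

```python
import ipaddress

# Values taken from the "full CGHub network" table in section 3.1.
CGHUB_NETWORK = ipaddress.ip_network("192.35.223.0/24")
# 443 and 21111, plus the inclusive range 20893-20923.
CGHUB_PORTS = {443, 21111} | set(range(20893, 20924))


def is_allowed_outbound(dest_ip: str, dest_port: int) -> bool:
    """Return True if an outgoing TCP connection matches the recommended rule."""
    in_network = ipaddress.ip_address(dest_ip) in CGHUB_NETWORK
    return in_network and dest_port in CGHUB_PORTS


print(is_allowed_outbound("192.35.223.51", 20900))
print(is_allowed_outbound("192.35.223.5", 80))
```

A site using the more restrictive per-host table would replace the single network with a mapping from each listed address to its permitted ports.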
CHAPTER FOUR

FREQUENTLY ASKED QUESTIONS

4.1 Why BitTorrent?

Q: Why did you use a product that uses the BitTorrent protocol? Although it is used in a secure, client-server configuration, why set off all those red flags around torrents?

A: Various parallel transfer software packages were considered, both commercial and open source. BitTorrent was chosen for the following reasons:

* BitTorrent is a mature, widely used protocol with several high-quality, free implementations available, along with associated tools to assist in debugging and analyzing behavior and traffic.
* It is a connection-oriented transfer protocol, which integrates well with the SSL security model.
* BitTorrent is horizontally scalable across multiple systems on both the server and the client. To take advantage of 10-100 gigabit networks, the entire system must be balanced to match the transfer rate of the network. Without redundant disk subsystems, disk I/O will be a bottleneck. BitTorrent allows us to parallelize transfers across multiple server systems with independent I/O subsystems. Currently the GeneTorrent client is restricted to multiple threads on a single system. However, the architecture allows for distributing downloads and uploads across multiple client systems.
* The random, block-oriented nature of the BitTorrent protocol provides an ideal platform for implementing random, remote access to BAM files. This is a very important feature planned for CGHub that will allow an application to access BAM data for small portions of the genome without downloading the full BAM.
* GeneTorrent's use of BitTorrent over SSL conforms to the OSI security model [5]. We believe that using a mature, well-understood protocol provides a better foundation for a secure system than using a proprietary protocol. Our use of BitTorrent is designed to take advantage of its performance without supporting unauthorized and unintended transfers.
* The choice of a TCP-based protocol over UDP enables use of TCP's mature congestion avoidance algorithms. The ability to prevent network congestion is important for transfers over wide, diverse networks.
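The block-oriented property mentioned in the list above comes from BitTorrent dividing a file into fixed-size pieces, so any byte range maps to a small set of piece indices. The sketch below shows that arithmetic only; it does not describe how the planned CGHub remote BAM access would work. The function name pieces_for_range and the 4 MiB piece length are assumptions chosen for illustration.

```python
def pieces_for_range(offset: int, length: int, piece_length: int) -> range:
    """Return the indices of the fixed-size pieces covering a byte range."""
    if offset < 0 or length <= 0 or piece_length <= 0:
        raise ValueError("offset must be >= 0; length and piece_length must be > 0")
    first = offset // piece_length
    # The last byte of the range is at offset + length - 1.
    last = (offset + length - 1) // piece_length
    return range(first, last + 1)


# Hypothetical values: 4 MiB pieces, a 1 MiB read starting 10 MiB into the file.
PIECE_LENGTH = 4 * 1024 * 1024
needed = pieces_for_range(offset=10 * 1024 * 1024,
                          length=1024 * 1024,
                          piece_length=PIECE_LENGTH)
print(list(needed))
```

Under these assumed values, the read falls entirely inside one piece, so only that piece would need to be transferred rather than the whole file.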
4.2 Interaction with generic BitTorrent servers

Q: Other than by use of a firewall or other network-level security that only permits connections to CGHub IP addresses, is the GeneTorrent client capable of exchanging files with other BitTorrent servers? For example, could it connect to other BitTorrent servers running inside my network? Or could it connect to BitTorrent servers anywhere if my firewall is relatively permissive to outbound connections?

A: Due to the additional, non-BitTorrent protocol steps used by GeneTorrent when initiating a transfer, GeneTorrent will not connect with standard BitTorrent programs. It would take non-trivial modifications to the GeneTorrent code to enable it to communicate with standard BitTorrent clients.

Note that freely available, easy-to-install standard BitTorrent clients could be run in any environment with a permissive firewall. The design of modern computer systems places network security enforcement at the firewall level. The NCI data use agreements require sites to follow the dbGaP Security Best Practices [6] guidelines for securing the network where GeneTorrent is used to download limited-access NCI data.
BIBLIOGRAPHY

[1] B. Cohen. The BitTorrent Protocol Specification Version 1103. February 2008.
[2] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol, Version 1.2. August 2008.
[3] B. Kaliski and J. Staddon. PKCS #1: RSA Cryptography Specifications (RFC 2437). October 1998.
[4] Arjen Lenstra. Unbelievable Security: Matching AES Security Using Public Key Systems. In Colin Boyd, editor, Advances in Cryptology - ASIACRYPT 2001, volume 2248 of Lecture Notes in Computer Science, pages 67-86. Springer Berlin / Heidelberg, 2001.
[5] Unknown. ISO 7498-2:1989: Open Systems Interconnection - Basic Reference Model - Part 2: Security Architecture. 1989.
[6] Unknown. dbGaP Best Practices Requirements: Security Best Practices - Level 2b. 2008.
[7] Unknown. eRA: Electronic Research Administration. 2012.
[8] National Institute of Standards and Technology. Federal Information Processing Standards Publication 197: Announcing the Advanced Encryption Standard (AES). November 2001.