Async: Secure File Synchronization

Vera Schaaber, Alois Schuette

University of Applied Sciences Darmstadt, Department of Computer Science,
Schoefferstr. 8a, 64295 Darmstadt, Germany
vera.schaaber@stud.h-da.de, alois.schuette@h-da.de
http://www.fbi.h-da.de

Abstract. This paper presents and compares multiple products for file synchronization between devices. The focus is on the architecture of the products, their range of features, their usability and their security. Additionally, the development of the software Async is described, a secure cloud storage service that offers synchronization. Due to client-side encryption and a peer-to-peer architecture of the cloud, Async offers strong protection from unauthorized access. Blocks of files are placed in a Chord network as key-value pairs. All devices involved detect changes in local files on the basis of changes in meta data and synchronize these with the cloud. Conflicting versions of files are detected reliably and displayed to the user. The interaction of parallel processes is coordinated by transactions and a compare-and-swap operation.

Keywords: Cloud, Security, File Synchronization, Chord, Peer-to-Peer

1 Introduction

Over the past few years, services that synchronize files between multiple devices have grown in popularity. They allow a user to access personal files anywhere with minimal effort. However, many of the available products have weaknesses in terms of privacy: they use server-side encryption, they transmit meta data unencrypted, they depend on a central server that offers a single point of attack, and their source code is not public, which leaves room for hidden backdoors for the provider or public authorities. In order to provide better privacy, other services do not offer any cloud storage and synchronize directly between devices instead. This results in a drop in usability, as the user has to operate a server.
(Published at IS 2015, The 11th International Conference on Interactive Systems, Ulyanovsk 2015.)

This situation motivates the goal of the present paper: to design and implement a secure file synchronizer that offers cloud storage in a peer-to-peer network. In order to synchronize files between devices, the following problems have to be solved:
1. transfer files between devices,
2. detect changes in a file and propagate them to other devices,
3. decide which version of a file should be propagated,
4. coordinate parallel processes.

The software Async consists of three components, as illustrated in Figure 1. The user chooses files for synchronization with the front end program Async. User input is transmitted to the Async daemon via HTTPS. The Async daemon runs in the background, regularly checks which steps are necessary to synchronize the files on the local device with the cloud, and carries them out. The cloud is hosted in a peer-to-peer network with a Chord [2] architecture. The Async daemon communicates with the cloud via the achordfs protocol, which is described in Section 3.

Fig. 1. The interaction of the front end program Async, the Async daemon and the Chord network

The paper proceeds as follows: Section 2 gives an overview of the Chord network. Section 3 describes how files are uploaded to and downloaded from that network by means of the achordfs protocol. The synchronization is discussed in Section 4. Section 5 addresses the coordination of concurrent processes, and Section 6 outlines how a user can share content with other users. Section 7 compares Async to other synchronizers. Finally, a conclusion is presented in Section 8.

2 The Chord Network

Every node in the Chord network is assigned an identifier (node id) with a fixed length of m bits. The identifier space is represented as a circle of numbers ranging from 0 to $2^m - 1$. The identifier of a node represents its place on this circle, which is also called the Chord ring. Similar to a distributed hash table,
data can be placed in the Chord network as key-value pairs, the key space being equal to the node id space. The key determines on which node the value will be saved. The value with key k is placed at the first node whose identifier is equal to or follows k in the identifier space. On the ring, this is the first node clockwise from k.

A computer that is not part of the network needs to know the IP address of only one of the participants in order to use the key-value storage. The known participant is asked via a lookup operation which node is responsible for a certain key. The node will either return the responsible node or ask a node that is closer to the responsible node to perform the lookup. Each node in a Chord network of N nodes has to maintain information about $O(\log N)$ other nodes in order to perform the lookup with $O(\log N)$ messages. The finger table that stores the information on other nodes in the network is updated regularly.

When a new node joins the Chord network, it is assigned a node id and thereby given a place on the Chord ring. As a consequence, all stored values that the new node is responsible for are migrated there. Similarly, a node should migrate its values to its immediate successor before leaving the Chord network.

3 The Achordfs Protocol

In order to use the Chord network to store files, the achordfs protocol [1] divides files into blocks of fixed length. If the file size is not divisible by the block length, the last block will be shorter than the others. For each block, a 160-bit key is calculated as the SHA-1 hash of the encrypted content of the block. The encrypted block is then stored in the Chord network under the corresponding key, which is also called the score. Since the hash is calculated on the basis of the encrypted block, and a random number of 192 bits is involved in the encryption, a key collision is very unlikely, even for different files with exactly the same content.
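The placement rule of Section 2 and the block scoring described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the block length `BLOCK_SIZE` is arbitrary, the helper names are hypothetical, and the cipher is a placeholder keystream built from SHA-256 that only stands in for whatever encryption achordfs really uses. Only the 192-bit random value and the SHA-1 score follow the text.

```python
import bisect
import hashlib
import os

BLOCK_SIZE = 4096   # assumption: the paper does not state the block length
NONCE_BYTES = 24    # the 192-bit random number mentioned in the text


def _keystream(key, nonce, length):
    """Placeholder keystream (SHA-256 in counter mode); NOT the real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_block(key, plain):
    """Encrypt one block with a fresh random nonce, stored in front of the data."""
    nonce = os.urandom(NONCE_BYTES)
    stream = _keystream(key, nonce, len(plain))
    return nonce + bytes(p ^ s for p, s in zip(plain, stream))


def decrypt_block(key, blob):
    """Inverse of encrypt_block."""
    nonce, cipher = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(c ^ s for c, s in zip(cipher, stream))


def split_blocks(data, size=BLOCK_SIZE):
    """Fixed-length blocks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def score(encrypted_block):
    """160-bit score: SHA-1 over the encrypted block."""
    return hashlib.sha1(encrypted_block).digest()


def successor(node_ids, key):
    """First node id equal to or following key on the ring (node_ids sorted)."""
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]  # wrap around past the largest id


if __name__ == "__main__":
    user_key = os.urandom(32)
    ring = sorted(int.from_bytes(os.urandom(20), "big") for _ in range(8))
    for block in split_blocks(b"x" * 10000):
        encrypted = encrypt_block(user_key, block)
        key_on_ring = int.from_bytes(score(encrypted), "big")
        print(len(block), hex(successor(ring, key_on_ring))[:12])
```

Because every call to `encrypt_block` draws a new nonce, two blocks with identical plaintext end up under different scores, which is the property the text relies on.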
The scores of all the blocks that make up one file are collected in a structure called stat, which also contains the meta data of the file. The stat is then encrypted and stored in the Chord network under a score as well.

To keep track of all the files that belong to one user, a structure called syncmessage is created. For each file a user wants to synchronize between devices, the syncmessage contains the file's path name, its score and a version number. The syncmessage is similar to the stat, but scores in the syncmessage always address a stat, while scores in a stat always address a block of content. The syncmessage is encrypted and stored in the Chord network under a key that is derived from the user's public key. This enables a user to download and decrypt all files from the Chord network on a new device simply by entering his or her private and public key. The syncmessage provides all necessary information on where to find the stats, which in turn provide information on where to find the blocks of each file.
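The chain from syncmessage to stat to blocks can be sketched as follows. This is only an illustration under stated assumptions: the field names, the JSON serialization, the in-memory dictionary `store` standing in for the Chord network, and the derivation of the syncmessage key as a SHA-1 hash of the public key are all hypothetical, and encryption is left out to keep the structure visible.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

store = {}  # stand-in for the Chord key-value storage


def put(value):
    """Store a value under its SHA-1 score and return the score."""
    key = hashlib.sha1(value).hexdigest()
    store[key] = value
    return key


@dataclass
class Stat:
    size: int            # meta data of the file (assumed fields)
    mtime: float
    block_scores: list   # scores in a stat always address content blocks


@dataclass
class SyncEntry:
    path: str
    stat_score: str      # scores in a syncmessage always address a stat
    version: int


def syncmessage_key(public_key):
    """Assumed derivation; the paper only says 'derived from the public key'."""
    return hashlib.sha1(public_key).hexdigest()


def upload_file(content, mtime, block_size=4):
    """Store the blocks, then a stat listing their scores; return the stat score."""
    scores = [put(content[i:i + block_size])
              for i in range(0, len(content), block_size)]
    stat = Stat(size=len(content), mtime=mtime, block_scores=scores)
    return put(json.dumps(asdict(stat)).encode())


def download_file(stat_score):
    """Follow a stat score to the blocks and reassemble the file."""
    stat = Stat(**json.loads(store[stat_score]))
    return b"".join(store[s] for s in stat.block_scores)


def publish(public_key, entries):
    """Write the syncmessage under the key derived from the public key."""
    payload = json.dumps([asdict(e) for e in entries]).encode()
    store[syncmessage_key(public_key)] = payload


def restore(public_key):
    """What a new device does: syncmessage -> stats -> blocks."""
    raw = store[syncmessage_key(public_key)]
    entries = [SyncEntry(**e) for e in json.loads(raw)]
    return {e.path: download_file(e.stat_score) for e in entries}


if __name__ == "__main__":
    pk = b"demo-public-key"
    s = upload_file(b"hello chord ring", mtime=1.0)
    publish(pk, [SyncEntry("notes.txt", s, version=1)])
    print(restore(pk))
```

The only fixed entry point is the syncmessage key; everything else is reached by following scores, which is why knowing the key pair is enough to restore all files on a new device.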
4 Synchronization

The Async daemon regularly synchronizes the files on the device it is running on with the files in the Chord network. It detects changes in local files by comparing the modification time of a file with this file's modification time during the last synchronization. Changes in the Chord network are detected by comparing the version number of a remote file, which can be read in the syncmessage, with the last version number this device downloaded. If a local change is detected, the changes in the affected file are uploaded to the Chord network together with a new stat structure. If there is a new version in the Chord network, it is downloaded. If there are changes both in the Chord network and on the local device, the user is informed of a conflict.

When all files are synchronized, a new syncmessage that contains the new scores addressing the changed stats is uploaded to the Chord network to replace the old one. Afterwards, the current modification time and the number of the latest downloaded version are stored on the local device for each file to assist in future synchronizations.

5 Coordination of Concurrent Processes

Multiple devices accessing files in the Chord network create a need for coordination to avoid lost updates and other concurrency problems. Since blocks of data are addressed by a score that depends on their content and a random number, a changed block will not overwrite any old data. Instead, it is uploaded under a unique new score. The syncmessage, on the other hand, is always stored under the same key for one user. To avoid data loss in the syncmessage, a compare-and-swap operation was implemented. It allows data to be overwritten only if the responsible Chord node receives a hash that matches the current content of the syncmessage. The Chord network receives encrypted data only, so this is a hash of the encrypted syncmessage.
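The per-file decision of Section 4 and the compare-and-swap guard described above can be sketched together. The sketch makes assumptions that the paper does not confirm: all names are hypothetical, SHA-1 is used as the hash for the comparison, a newer remote version is recognised by a larger version number, and a single in-process `ToyNode` with a lock stands in for the responsible Chord node.

```python
import hashlib
import threading
from enum import Enum


class Action(Enum):
    NONE = "none"
    UPLOAD = "upload"
    DOWNLOAD = "download"
    CONFLICT = "conflict"


def decide(local_mtime, synced_mtime, remote_version, synced_version):
    """Classify one file from the values stored at the last synchronization."""
    local_changed = local_mtime != synced_mtime
    remote_changed = remote_version > synced_version  # assumed ordering
    if local_changed and remote_changed:
        return Action.CONFLICT
    if local_changed:
        return Action.UPLOAD
    if remote_changed:
        return Action.DOWNLOAD
    return Action.NONE


def content_hash(encrypted):
    """Hash of the encrypted syncmessage (SHA-1 is an assumption here)."""
    return hashlib.sha1(encrypted).hexdigest()


class ToyNode:
    """Stand-in for the Chord node responsible for one syncmessage key."""

    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def compare_and_swap(self, key, expected_hash, new_value):
        """Overwrite only if the caller knows the hash of the current value."""
        with self._lock:  # check and write happen as one atomic step
            current = self._store.get(key)
            current_hash = None if current is None else content_hash(current)
            if current_hash != expected_hash:
                return False
            self._store[key] = new_value
            return True


if __name__ == "__main__":
    node = ToyNode()
    node.compare_and_swap("user", None, b"syncmessage v1")
    seen = content_hash(node.get("user"))  # devices A and B both read v1
    print(node.compare_and_swap("user", seen, b"v2 from A"))  # A succeeds
    print(node.compare_and_swap("user", seen, b"v2 from B"))  # B is rejected
```

In the usage at the bottom, device B is refused because its hash refers to a syncmessage that device A has already replaced, so A's update cannot be lost silently.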
A client therefore has to compute the hash of the encrypted syncmessage when downloading it, before decrypting it. If a client is denied an overwrite, it downloads the current syncmessage and adds its local changes to it, unless the new information in the current syncmessage leads to a conflict. The compare-and-swap operation, like all other reads and writes on the Chord network, is realized as an atomic transaction.

6 Share Content

A user can use Async to share files or folders with friends. (This feature has not been implemented yet.) The selected files will be encrypted with a new, randomly generated key. A new shared syncmessage will be stored in the Chord network under a hash of this key. The shared files will be synchronized in addition to the normal synchronization. A link can be created that allows the person invited to the shared content to skip entering a long key. Secure transmission of the secret key or link is left to the user, which
bears certain risks. Another way to share is to enter the public key of another user while specifying which files to share. The shared key is then calculated by means of a Diffie-Hellman key exchange. This way, only the public keys need to be exchanged manually by the users.

7 Comparison

Async has several advantages over some popular synchronization services. This section compares Async to three other products regarding key features and security. A comparison of more supplementary features may favour other products that have been in development over a longer period of time. The three products chosen are BitTorrent Sync [3], Dropbox [4] and Wuala [5]. Table 1 summarises the comparison and classifies the properties as good (green), ok (yellow) and bad (red). Reasons for the classification can be found in the description of the corresponding property.

7.1 Encryption

Dropbox is the only one of the compared products that uses server-side encryption. Client and server negotiate a key for encryption during transmission. Afterwards, the server decrypts the data and encrypts it again with a different key for storage. This enables employees of Dropbox Inc. to access the user's data. With client-side encryption, the data is encrypted on the client machine with the user's private key, which is not available to the provider of the cloud or software.

7.2 Open Source

The source code of Async has not been published yet, but it will be as soon as the development status allows it. We believe that publishing the source code is the only way to make sure there are no hidden backdoors. BitTorrent Sync is not open source, but it is based on the BitTorrent protocol, which is open source and well documented.

7.3 Architecture

Most services use a classical client-server architecture. For Async, a peer-to-peer architecture was chosen for its better security.
Peer-to-peer services do not depend on any central server and therefore do not include a single point of attack.

7.4 Cloud Storage and Limit of Data Volume

Some synchronization services do not offer any cloud storage, but synchronize directly between devices. As a result, the user has to have one device that is always online. This does not correspond to average user behaviour. It does, however, allow
for synchronization of an unlimited amount of data, since the software vendor does not provide any storage and does not bear the costs for it. The data limits indicated for services with cloud storage equate to the amount a user gets for free and without any bonuses.

7.5 Conflict Resolution

Most synchronizers notify the user if a conflict arises. BitTorrent Sync, on the other hand, does not. It resolves conflicts by choosing the newest version of a file as the correct one. The newest version is defined as the one with the latest modification time. This can lead to old versions being overwritten without the user being informed. Old versions can, however, be recovered because BitTorrent Sync offers version control.

7.6 Multiple Paths

Async offers the possibility to add any file or folder on a device to synchronization. Other services synchronize only one folder with an unlimited number of files and subfolders. As a consequence, users have to move all files they want to synchronize into this folder. With Async, no moving of files is necessary. A user can simply add multiple paths to synchronization. This also allows a user to have one file in a folder synchronized without affecting the other files in the same folder.

                     Async          BitTorrent Sync  Dropbox        Wuala
encryption           client side    client side      server side    client side
open source          yes (planned)  in parts         no             no
architecture         peer-to-peer   peer-to-peer     client-server  client-server
cloud storage        yes            no               yes            yes
data limit           n/a            unlimited        2 GB           0 GB
conflict resolution  user           newest           user           user
multiple paths       yes            no               no             no

Table 1. Comparison of file synchronizers

8 Conclusion

This paper presented a secure file synchronizer that offers cloud storage in a peer-to-peer network. All data and meta data are transmitted encrypted and can only be decrypted with the user's private key, which is not available to the software or server provider.
The decentralized architecture of the cloud does not offer a single point of attack because it is not dependent on any central server. The source code will be published soon in order to assure users that there are no hidden backdoors. Users do not need to operate a server or an always-online device. The result is a usable and secure synchronizer.
References

1. Seipel, L., Schuette, A.: Providing File Services using a Distributed Hash Table. In: Proceedings of the 11th International Conference on Interactive Systems, Ulyanovsk, Russia (2015)
2. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M. F., Dabek, F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking 11 (2003), pp. 17-32
3. BitTorrent Sync, BitTorrent Inc., www.getsync.com
4. Dropbox, www.dropbox.com
5. Wuala, LaCie AG, www.wuala.com