BigShare - A scalable file sharing service


Alexandros Daglis, Manolis Karpathiotakis, Georgios Psaropoulos
October 18

1 Introduction

In recent years, file hosting services have grown into an established domain and an important part of IaaS. The increasing popularity of such services has resulted in the creation of both dedicated cloud services (e.g., Dropbox) and services incorporated into the environments provided by major IT vendors (e.g., Microsoft Azure, Google Drive). In this document, we present the design of BigShare, a file hosting service to be deployed by EPFL. The purpose of BigShare is to offer file hosting and sharing services to users throughout the world. Our basic concern is to offer flexible file sharing: intuitive access, easy file organization, and detailed control over the access permissions of each file and directory. We also aim for high durability guarantees, scalability, and judicious use of storage and bandwidth.

BigShare is designed as a client-server system that provides a file hosting and sharing service. The system consists of the client software that runs on a variety of user devices, and the infrastructure hosting the service itself, which handles file storage and management while interacting with the client software. The environment of this system comprises the service's users, the physical environment surrounding our infrastructure, and the Internet that connects the clients to the service.

Our service provides each user with a private storage space and a variety of interfaces to manage it. Besides transferring files to and from their space, a user can also organize it with directories, edit file and directory metadata, and share files and directories with other users of the service. To facilitate that, we provide a flexible API that can be used to build a wide range of desktop and mobile client apps. These clients communicate with the server side of our system, which is responsible for reliably providing our service. Our infrastructure comprises an on-premises cluster that handles user requests and manages user information, and a storage system, which can be either Amazon S3 or our own data warehouse. While our design assumes the use of the former, we also discuss the alternative of maintaining the storage on-premises as well.

The rest of this document is organized as follows: Section 2 presents in more detail the requirements on which BigShare's design is based. Section 3 presents the architecture of BigShare, outlining the layers of the architecture and the interactions between them. Section 4 discusses implementation details behind the modules of BigShare. Section 5 explains the API exposed by each module. Section 6 discusses an alternative approach to our service's data storage, focusing on a potential on-premises deployment of BigShare. Section 7 introduces metrics we could use to evaluate various aspects of our system in a service deployment scenario. Finally, Section 8 concludes.

2 System Requirements

BigShare is designed according to a prioritized list of requirements:

Usability: We need to provide the functionality users have grown to expect from a file hosting service. File management options must be complete, intuitive, and easy to use; for instance, we need to support uploading, downloading, directory creation, sharing, deleting, and renaming. To achieve that, we want to provide an interface all users are already familiar with, i.e., one that resembles a conventional file system. Our aim is therefore to provide the abstraction of a file hierarchy that stems from a single home directory and consists of user-created files and directories.

Durability: We have to provide strong guarantees that an uploaded file will be retrievable in the future; in other words, data loss should be extremely rare. We address this concern with redundancy at the low level, by replicating data across multiple locations.

Scalability: BigShare has to scale, providing our file hosting and sharing service to millions of users. Our design tries to avoid bottlenecks that would limit the number of users. To this end, different sub-domains are used for the discrete parts of the service, and communication between these parts is kept to a minimum. Moreover, BigShare exploits the aforementioned redundancy to support load balancing.

Storage efficiency: We want to minimize the amount of data stored on our data servers. We achieve this through both intra- and inter-user deduplication: we identify identical data and store them once rather than multiple times. While replication might seem to act against the purpose of deduplication, i.e., storage savings, it is essential for preventing data loss and providing reliability. Deduplication and replication are thus two conscious design decisions with orthogonal purposes.

We also discuss various secondary characteristics that are important for internet services. Security is one such characteristic: we are aware of possible security issues and discuss some of them, but we do not delve into details, as security is not at the top of our priority list. While we do not design a system with obvious and naive security holes, building a system that provides exceptionally strong security guarantees is not one of our primary concerns either. Finally, availability is another important requirement of online internet services.

3 Architecture

BigShare is designed to be modular. The system consists of well-defined modules with discrete functionality and appropriate interfaces that enable straightforward interoperability. This section describes BigShare's modules and abstractions, how we apply a naming scheme on top of them, and how they are combined into a layered architecture.

3.1 Modules and Abstractions

Each module of our system has a specific role. Apart from the client, a module on its own, we identify three discrete functionalities that are essential for our service:

- Authentication
- Metadata control
- Data storage

This section provides a brief description of the four modules that comprise BigShare.

3.1.1 Client

The client is the software that is essential to access the service. It is the service's interface to the users: it exposes the service's functionality, but also restricts the form of requests to comply with the API and semantics of the service. Internally, it is responsible for file transmission to our storage component. Files are compressed and then split into chunks. Both the compressed chunks and any auxiliary payload information are encrypted before transmission, so that our metadata and data components are populated with encrypted information. As an additional security mechanism, TLS is used as the transmission protocol. We elaborate on the chunking mechanism in Section 4.3.

3.1.2 Authentication

The authentication module is responsible for initializing the client's communication with the service. User interaction with BigShare begins when a user decides to register with our service through a BigShare client. A registration request including a proposed username and password is sent to the authentication module, which processes it and responds with an acknowledgment message containing a status code that indicates either successful registration or failure; in the latter case, the client notifies the user of the reason for the failed attempt. Once a user has created a BigShare account with a unique username, they can use it to log into the service. Log-in uses a token-based mechanism: when a user wants to log into the service, their client communicates with the authentication module, which either provides a token that validates the client to interact with BigShare for a limited time, or responds with an error message that notifies the user of the error cause (usually wrong login credentials).

3.1.3 Data Storage

The data storage module is responsible for managing the data. All files are split into chunks of an upper-bounded size. Physical data representation and data loss prevention are this module's responsibilities; reliability is achieved through transparent data replication. The abstraction provided by the module is simple: it receives and stores chunks of data, which must be retrieved and returned unmodified upon subsequent requests. In our current BigShare design, the storage module is implemented on top of the Amazon S3 storage infrastructure.

The data storage module has no comprehension of files or directories; chunks are the module's first-class citizens, and any further blocking is handled internally by S3. Its only responsibilities are:

- receiving storage requests that lead to the storage of new data chunks, and
- receiving query requests for already stored chunks and returning the requested chunks.
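The following minimal sketch illustrates this abstraction: a content-addressed store that keeps chunks keyed by their SHA-2 names. The class and method names are ours, for illustration; the actual module is backed by S3.

import hashlib

class ChunkStore:
    """Toy content-addressed chunk store, a stand-in for the data
    storage module's abstraction (the real module is backed by S3)."""

    def __init__(self):
        self._chunks = {}  # SHA-2 name -> chunk bytes

    def store(self, chunk: bytes) -> str:
        name = hashlib.sha256(chunk).hexdigest()
        self._chunks[name] = chunk   # storing twice is a harmless no-op
        return name                  # the acknowledgment carries the name

    def retrieve(self, name: str) -> bytes:
        return self._chunks[name]    # KeyError models "chunk not found"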

Each chunk is identified by a unique SHA-2 signature. This uniqueness is verified by the metadata module (Section 3.1.4), which also uses these signatures for deduplication. The answer to each successful chunk storage request is an acknowledgment. A subsequent query for the chunk with a given identifier is resolved by the data module, and the contents of the appropriate chunk are returned.

3.1.4 Metadata Control

The metadata control module is responsible for keeping track of user file permissions and the mapping between files and their constituent chunks. Essentially, it provides the abstraction of a file system. The module is itself multilayered and is described in more detail in Section 3.2.2. After authentication, clients communicate with the metadata control module prior to any actual file transfer. The metadata control module is responsible for listing a user's accessible files upon request, checking permissions upon upload, handling download or modification requests, modifying permissions upon request, handling file sharing requests, and initiating the actual data transfer (upload or download) after all the appropriate checks have succeeded.

3.2 Layering

BigShare's modules are logically organized in two layers: the frontend and the backend. The frontend of BigShare is the client, a form of which runs on each user's machine (desktop/mobile app). The backend consists of three peer modules: authentication, metadata control, and data storage. Having peer modules in a single layer, rather than each module in a separate layer, benefits BigShare: a request does not have to go through several layers, but communicates directly with a certain module, according to the request's state. While the three peer modules materialize the backend layer of our service, each of them has a discrete functionality.

Figure 1 illustrates BigShare's layered architecture. The client transforms the user's requests into a sequence of communications with the three modules of the backend layer. While most communication occurs between the client and each of the backend modules, there is also some limited communication between the metadata module and the data storage and authentication modules, as we mention in Section 5.

Figure 1: High-level layered architecture of BigShare: the frontend (client) on top of the backend's authentication, metadata control, and data storage modules
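The metadata module's responsibilities can be summarized as an interface; the sketch below is illustrative (the method names are ours), while the concrete request/response API appears in Section 5.

from abc import ABC, abstractmethod

class MetadataControl(ABC):
    """Illustrative summary of the metadata control module's role."""

    @abstractmethod
    def list_contents(self, token: str, pathname: str) -> list[str]:
        """List a user's accessible files and directories under a path."""

    @abstractmethod
    def authorize_upload(self, token: str, pathname: str, filename: str) -> str:
        """Check permissions and return a unique key for the upload."""

    @abstractmethod
    def resolve_download(self, token: str, path: str) -> list[str]:
        """Return the ordered chunk signatures that comprise a file."""

    @abstractmethod
    def share(self, token: str, path: str, users: list[str], perms: str) -> None:
        """Grant permissions and link the file into users' Shared folders."""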

3.2.1 Frontend layer

The frontend layer includes the client module. This layer defines the user's interaction with the service. It exposes a user-friendly API and translates user requests to a form appropriate for the underlying service. Depending on the request and its state, the client knows which backend module to communicate with and how to formulate the message. The client also receives responses from the backend and transforms them into user-friendly messages.

Client sublayers: The client layer is itself multilayered, as illustrated in Figure 2. On the left side of the figure, the direct communication of the GUI sublayer with the request generation sublayer serves actions that do not require data transfers, such as the list command. The sublayers on the right side are needed for data transfers. To illustrate their functionality, we describe a file upload scenario. A user uses the client's GUI to initiate a file upload. The file is first compressed and then split into chunks in the second and third sublayers; these two steps take place in a pipelined fashion to achieve high performance. For each chunk, a SHA-2 signature is generated (sublayer 4). The encryption sublayer then encrypts the data being sent. In the general case, the encryption key is provided by BigShare's backend. Data encryption does not aim to prevent eavesdropping; that is prevented by using the TLS protocol for the client-backend communication. Instead, data are uploaded and stored encrypted in the data module, to ensure data privacy even in the case of an attack that results in data leaks. To obtain stronger security and privacy for sensitive files, users may use a custom encryption key. We discuss the role of the SHA-2 signatures and the need for an optional custom encryption key in Section 4.3. Finally, the bottom sublayer (request generation) formulates the request and sends it to the appropriate backend module.

Figure 2: Internal layers of the client (GUI, compression, file chunking & chunk handling, SHA-2 generation, encryption, request generation)
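A minimal sketch of the right-hand pipeline follows, assuming gzip for compression, SHA-256 as the SHA-2 variant, and a 4 MB chunk size; encryption and the pipelining itself are omitted for brevity.

import gzip
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the size used by Dropbox [4]

def prepare_upload(path: str) -> list[tuple[str, bytes]]:
    """Compress a file, split it into fixed-size chunks, and pair each
    chunk with its SHA-256 signature (computed before encryption)."""
    with open(path, "rb") as f:
        compressed = gzip.compress(f.read())
    chunks = [compressed[i:i + CHUNK_SIZE]
              for i in range(0, len(compressed), CHUNK_SIZE)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]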

3.2.2 Backend layer

Apart from the two high-level layers of the system, the Metadata Control and Data Storage modules are themselves layered. We describe these modules' sublayers below.

Metadata Control sublayers: BigShare allows its users to manipulate their own file system instance located in the cloud. To achieve this efficiently, BigShare borrows a subset of the straightforward yet effective design of the UNIX file system. The resulting BigShare file system stack is as follows:

- Link layer: enables sharing by providing links to a user's files/directories to specified users.
- Absolute path name layer: provides a root for the naming hierarchies. While there is a single root, each user sees their own home directory as the root of their file hierarchy.
- Path name layer: organizes files into naming hierarchies.
- File name layer: provides human-oriented names for files.
- Inode number layer: provides machine-oriented names for files.
- File layer: organizes chunks into files.

The file layer contains the information about the chunks that constitute each file. Every file entry contains the names of the relevant chunks, along with the order in which the file can be reconstructed. The chunk names are hash codes generated using SHA-2, which we consider adequate for chunk identification; we will see how they are utilized throughout the rest of this document. These file entries are similar to the concept of inodes, and we use the two terms interchangeably. On top of the file layer, the inode number layer implements a naming mechanism for the file entries; it is implemented in a straightforward manner, returning integer IDs that represent each file entry/inode. The first layer that deals with human-readable names is the file name layer, which associates the inode numbers with human-readable names, as provided by the users. Building on top of these layers, the path name layer provides support for user-specified directories. Every directory is characterized by a directory inode, which maintains the information about all the directory's contents as well as its user-provided name. This name also includes context information, which specifies its parent directories.

Supporting the presence of multiple users requires our system to accommodate growth, so we need to handle user additions in an elegant manner. To this end, the absolute path name layer provides a universal context (root) in the directory service to facilitate horizontal growth in the number of users. Specifically, any user registered in BigShare is provided a home directory, and these per-user directory trees are unified under a global context. This global information can be used to facilitate sharing, as paths between different user home directories can be specified. The information about this global context is only accessible to the system itself, not to its users. Besides restricting the visibility of the global context, a permission mechanism ensures that each user has access only to files and directories uploaded by them or shared with them. This is achieved by incorporating ownership information in the inode of each file or directory. Every uploaded file is assigned a single owner, namely the uploader. To enable sharing, read and write permissions can be granted to additional users or groups of users. However, enhancing a file entry with permissions for an additional user is not enough for them to access the file; the sharing mechanism must also create a link to the file in the user's home directory. This is handled by the metadata module's top layer, namely the link layer.
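To make the file and directory entries concrete, here is a minimal sketch of the inode structures described above; the field names are ours, for illustration.

from dataclasses import dataclass, field

@dataclass
class Inode:
    inode_id: int                   # integer ID from the inode number layer
    owner: str                      # the uploader is the single owner
    # additional users/groups and their granted rights ("r" or "rw")
    permissions: dict[str, str] = field(default_factory=dict)

@dataclass
class FileInode(Inode):
    # SHA-2 chunk names, in the order needed to reconstruct the file
    chunk_names: list[str] = field(default_factory=list)

@dataclass
class DirectoryInode(Inode):
    # human-readable child name -> inode number (file name layer mapping)
    children: dict[str, int] = field(default_factory=dict)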

Data Storage sublayers: The data storage module is also organized in layers. The minimum layer requirements are a file layer, which maps files (chunks, in our case) to disk blocks, and a block layer, responsible for identifying and managing physical disk blocks. As we are using Amazon S3 in our current BigShare design, the module's layer stack is much deeper, as it supports all the functionality and flexibility of a file system.

3.3 Naming

As we saw in the previous section, naming is utilized in the file system abstraction that our service provides. In addition, BigShare uses different namespaces to distinguish the discrete parts of the service. In the current design of BigShare, each client request is forwarded to a different backend sub-layer, depending on the request's stage. Each of the system's modules is assigned a different sub-domain. Specifically, the authentication module owns auth.bigshare.epfl.ch. Requests to the metadata control module employ the control[X].bigshare.epfl.ch sub-domain, where X is replaced with an integer based on the actual metadata server the client communicates with. In a similar manner, the data storage module is exposed through the data[Y].bigshare.epfl.ch sub-domain. We vary X and Y in the requests to enable scalability and load balancing: clients alternate between these domain names to distribute the load equally across the backend servers. Furthermore, as the number of users increases, we can easily scale the system with the increased load by adding more control and data sub-domains. Control servers balance the number of requests they serve, and data servers also distribute the stored chunks equally. Users are distributed across the service's metadata servers; the clients direct requests to a certain server according to the user account. As a request proceeds and data access is required, the metadata module notifies the client of the sub-domain(s) of the data server(s) that need to be contacted. We further discuss the scaling of the data and metadata modules in the following section.
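As a sketch of how a client could map a user account to a metadata sub-domain, the hash-based scheme and server count below are our assumptions, not fixed by the design:

import hashlib

NUM_CONTROL_SERVERS = 16  # hypothetical deployment size

def control_subdomain(username: str) -> str:
    """Statically partition users across metadata servers by hashing
    the account name, yielding a stable control[X] sub-domain."""
    digest = hashlib.sha256(username.encode()).hexdigest()
    x = int(digest, 16) % NUM_CONTROL_SERVERS
    return f"control{x}.bigshare.epfl.ch"

# Every request for the same account lands on the same metadata server.
print(control_subdomain("alice"))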

4 Implementation Considerations

In this section we elaborate on some important implementation aspects of our design. We analyze the mechanisms of file sharing and file update. We address performance concerns, presenting a deduplication mechanism that enables bandwidth and storage savings, and we also discuss how our design is influenced by our scalability requirements.

4.1 File Sharing

Sharing and permissions are handled by the various layers of the metadata module, as described in Section 3.2.2. At a high level, every user account has a default Shared folder under the home directory. This is where the link layer creates the references to the files other users have shared with the interested user. In other words, every user can find the files that have been shared with them under their User Home/Shared directory.

4.2 File Update

Special care needs to be taken for updates of existing files, especially shared ones. The concern arises from the possibility of one user updating a file that is being downloaded by another. The update itself is no different from a normal upload: it is as if a new file is uploaded, replacing an older file of the same name. However, as an update usually means that the new file is based on the file's previously uploaded version, it is highly likely that some of the file's chunks will not need to be uploaded, thanks to the deduplication mechanism described in Section 4.3. Thus, the primary concern that differentiates updates from normal uploads is concurrent access.

To address the concurrent access problem, we follow a versioning approach. The metadata layer keeps track of files that are being downloaded. If an update request arrives during a download, the metadata layer creates a new version of the file and initiates the update by responding to the client's update request. New download requests arriving during the update are served the old version of the file; when the update completes, subsequent download requests get the latest version. When all active downloads of the older version complete, the metadata module transparently deletes that older version and also notifies the data module to delete the old version's corresponding chunks once no user refers to them any more. The semantics provided are thus that a user gets the latest complete version of the file at the time the download request is initiated. A similar approach is followed when multiple concurrent update requests to the same file are received: after both versions are successfully uploaded, the one that was initiated last is eventually kept as the single valid version. Intuitively, a last-write-wins policy is enforced, where the ordering is based on the time of request initiation. As a final remark, these mechanisms are required not only for files shared by multiple users, but also in the general case, as a private file might be modified by multiple clients on different devices of the same user.
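A minimal sketch of this versioning logic follows; the bookkeeping structures are ours, for illustration, and persistence and locking are omitted.

class VersionedFile:
    """Toy last-write-wins versioning, as kept by the metadata layer."""

    def __init__(self, chunk_names: list[str]):
        self.versions = [chunk_names]  # chunk-name lists, oldest first
        self.readers = {}              # version index -> active downloads

    def begin_download(self) -> int:
        v = len(self.versions) - 1     # latest complete version
        self.readers[v] = self.readers.get(v, 0) + 1
        return v

    def end_download(self, v: int):
        self.readers[v] -= 1
        if self.readers[v] == 0 and v != len(self.versions) - 1:
            self.versions[v] = None    # old version becomes garbage; its
                                       # unreferenced chunks can be deleted

    def commit_update(self, chunk_names: list[str]):
        # The update initiated last wins; earlier concurrent versions
        # become eligible for deletion once no download references them.
        self.versions.append(chunk_names)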

4.3 Data Deduplication

To offer competitive performance, BigShare needs to reduce redundant network bandwidth and storage volume on the side of the data servers. Redundant information must therefore not be transmitted blindly to the data module, so we introduce a data deduplication mechanism that reduces the number of duplicate data copies to be transmitted and stored. Note that this mechanism is orthogonal to any replication taking place in the data module for reliability. BigShare tries to minimize both the data transmitted by the clients and the data volume stored in the data module. As previously explained, files are split into fixed-size chunks prior to transmission. When a chunk of data is to be uploaded, the client first submits the SHA-2 signature of the chunk to the metadata module; the module decrypts the signature and checks whether the data chunk has been uploaded in the past. If so, the client is notified that no further action is required, and it is enough to add the chunk's identifier to the inode entry of the file it belongs to. In effect, the user experiences a sped-up upload. The rest of the chunks, for which no already-stored duplicates have been identified, are uploaded normally and stored in the data module.
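A sketch of the metadata-side check follows, assuming a simple set of known signatures; the real module consults its file entries.

def filter_unknown(signatures: list[str], known: set[str]) -> list[str]:
    """Return only the chunk signatures the service has never stored;
    these are the chunks the client actually has to upload."""
    return [s for s in signatures if s not in known]

# Hypothetical round-trip: the client submits all signatures of a file
# and uploads only the chunks whose signatures come back as missing.
missing = filter_unknown(["sig_a", "sig_b", "sig_c"], known={"sig_b"})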

To achieve additional bandwidth savings, any chunk sent to the data module is compressed prior to transmission.

Our decision to make chunks the first-class citizens of the data module is also related to our deduplication efforts: by applying deduplication at the chunk level, storage savings also apply at the sub-file level. Another benefit is that, as files are deterministically chunked at the client and uploaded chunk by chunk, an upload can resume from an arbitrary chunk if it was previously interrupted.

The choice of the chunk size is a tradeoff. A smaller chunk could further improve the deduplication rate, and thus further decrease the storage capacity needed. However, the smaller the chunk, the larger the cost of keeping track of the chunks that comprise each file. The chunk size used by existing services is therefore typically several megabytes: GFS [5] uses a chunk size of 64MB, while 4MB is the chunk size used by Dropbox [4] and also the default size in Windows Azure [2].

The collision concern that may arise about deduplication using hash signatures, i.e., two different chunks having the same signature and one being lost because it is mistakenly considered identical to the other, is not unfounded. However, while SHA-1 was reported to be susceptible to such a flaw, SHA-2 is much stronger, and no collisions have been found for it so far. We therefore do not consider this concern substantial.

If we employ cross-user data deduplication, the privacy of the data in question can be considered compromised [6]. For example, a malevolent user could try to upload a sensitive file and, based on the upload time they experience, find out whether some other user has already uploaded that file. A different scenario is the following: if a bank uses a specific document template to inform a customer of their PIN, a malevolent user could upload multiple versions of this document, keeping the name of the target customer fixed and only changing the PIN. Again, the upload time can indicate whether a match has been found. As we realize that different users have different privacy requirements, we offer users two options. Users who opt for the vanilla version of BigShare get the mechanisms explained so far. Otherwise, users have the option of picking themselves the encryption key that will be used for data encryption. When the client uses such a key, BigShare cannot cross-compare this user's data with that of other users. Still, intra-user data deduplication remains available to handle the cases where the user uploads the same file again, using the same encryption method.
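The effect of the custom key on deduplication can be sketched as follows; mixing the key into the chunk name is our illustration of the mechanism, using a keyed MAC (HMAC) as the construction.

import hashlib
import hmac

def chunk_name(chunk: bytes, user_key: bytes | None = None) -> str:
    """Chunk identifier used for deduplication. Without a user key the
    name depends only on the contents, so identical chunks collide
    across users (cross-user dedup works). With a user-chosen key the
    name is user-specific: intra-user dedup still works, cross-user
    dedup does not."""
    if user_key is None:
        return hashlib.sha256(chunk).hexdigest()
    return hmac.new(user_key, chunk, hashlib.sha256).hexdigest()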

4.4 Scalability

Like many other distributed systems, BigShare is subject to the CAP theorem [1]. As its data module relies on Amazon S3, it employs eventual consistency for the stored chunks, aiming for high availability and partition tolerance. A similar tradeoff needs to be considered for the metadata module. One option would be to sacrifice high availability and employ a design similar to the Google File System [5]: the metadata module would comprise a single main node, and in case of failure, shadow nodes could be used as a fallback, providing an eventually consistent view of the metadata. Though simple in its design, this solution could obviously cause issues in case of master failure or significant load.

Alternatively, we can opt for a distributed solution, as depicted in Figure 3: user metadata storage would take place on one of N available metadata servers. Each node is assigned an area of responsibility from the user domain. Small node groups (e.g., groups of 3-4 nodes) would be formed, and rigid consensus mechanisms [7] would be applied within each of them.

Consistent hashing mechanisms are indeed applied to this end in cloud-based solutions [3], albeit with a sloppy quorum mechanism. By opting for this design, we can handle potential crashes and increased load more efficiently.

Figure 3: Distributed deployment of metadata servers

Employing the decentralized solution also means that we need to resort to best-effort inter-user deduplication: finding out whether a data chunk has already been uploaded by a different user requires probing multiple metadata nodes. Probabilistic data structures such as Bloom filters could be used to reduce the information transmitted for this purpose. Removing from storage duplicate chunks whose metadata is stored on different metadata servers can happen offline, when the service's load is low. In that case, the benefit of reduced storage requirements is still eventually gained, but the performance benefit to clients from online inter-user deduplication is reduced. In this document, we favor the distributed solution, as a single-master solution could significantly affect availability.
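As an illustration, here is a minimal Bloom filter that a metadata node could ship to its peers to summarize the chunk signatures it knows about; the bit-array size and hash count below are arbitrary choices, not part of the design.

import hashlib

class BloomFilter:
    """Toy Bloom filter over chunk signatures. False positives are
    possible (a probed node may still lack the chunk), but false
    negatives are not, so no duplicate is missed by offline dedup."""

    def __init__(self, m_bits: int = 1 << 20, k_hashes: int = 4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, signature: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{signature}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, signature: str):
        for p in self._positions(signature):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, signature: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(signature))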

5 API

This section describes the service's APIs. All of the backend layer's modules, i.e., Authentication, Metadata Control, and Data Storage, expose a set of functions, the vast majority of which are used directly by the client. Our RPC semantics follow the at-most-once semantics of HTTP. To avoid hanging, all RPCs have a default timeout; when a timeout expires, the call and its effects are assumed to have failed, and the RPC has to be re-initiated by the client.

5.1 Authentication API

User Registration: register <user> <password> -> ACK / NACK (username used)
The client uses this call to request the creation of a new account. The authentication module creates a new account entry in the database if the proposed username is unique, and acknowledges the account creation.

User Sign-In: login <user> <password> -> Token / NACK
The client requests to log into the service by sending the user's username and password. The authentication module looks up the (user, password) key-value pair in the database and replies with a token that grants access to the service's modules to the client acting on behalf of that user account.

Password change: change_pwd <user> <old password> <new password> -> ACK / NACK
The client requests to change the password of a given account. The authentication module first checks whether the (user, old password) key-value pair is valid, and if so, replaces the old password with the new one.

5.2 Metadata API

File Upload - Request: upload_request <pathname> <filename> <token> -> <unique key> / Non-existing path / Permission denied
The client requests to upload a new file. If the action is allowed, the metadata module returns a unique key that is used in the subsequent steps of the upload.

List Contents: list <pathname> <token> -> List<name> / Non-existing path / Permission denied
The client asks the metadata module to list the directories and files under a certain path (the requested path is always prefixed by the user's home folder). The metadata layer first checks whether the requested path corresponds to a valid directory, and if so, verifies that the requesting user (given their token) has permission to view the contents of the requested directory.

File Upload - Hashmap: upload_hashmap <token> <unique key> <hashmap> -> missing_chunks <hashmap>
(The unique key was acquired from File Upload - Request.) The client sends a hashmap that contains the hash code of each chunk of the to-be-uploaded file to the metadata module. The metadata module filters out the hash codes that already exist in the service (deduplication) and responds by sending back to the client the hashmap with the remaining hash values.

File Upload - Chunks: register_uploaded_chunks <list chunks> <unique key> -> ACK / NACK
After the client has successfully stored the chunks in the data module, it notifies the metadata module about the successful upload. The metadata module creates a new file entry for this user (identified by the unique key generated earlier) and adds the chunk IDs that comprise the file. This last API call completes the file upload procedure.

File Download: download <path/to/file> <token> -> list<chunk hashes> / Not found / Permission denied
Used by clients to request a file download. The metadata module verifies that the requested file exists and that the requesting user has the required permissions. If these conditions are satisfied, the response contains the list of chunk signatures (hashes) that comprise the file.

File Deletion: delete <path/to/file> <token> -> ACK / Not found / Permission denied
Used by clients to delete a file. The metadata module verifies that the file exists and that the user has the required permissions. If these conditions are satisfied, the metadata module removes the file's metadata from the user's account and acknowledges the deletion. If the chunks of the deleted file are not also referenced by another user (a deduplication effect), the metadata module creates a deletion request for those chunks and sends it to the data module.

File Share: share_file <pathname> list<userid> <permissions> <token> -> ACK / Invalid path / Invalid user id / Permission denied
Used by a client to share a file. If the requesting user is the file's owner, the request succeeds, and the metadata module creates a link under each of the requested users' Shared directories, granting the requested permissions (read/write).

Create directory: mkdir <pathname> <directory name> <token> -> ACK / Invalid path / Permission denied
Requested by a client. If allowed, the metadata module adds a new directory node in the path name hierarchy of that user.

Delete directory: rmdir <path/to/directory> <token> -> ACK / Invalid path / Permission denied
Requested by a client. If allowed, the metadata module deletes that node from the user's path name hierarchy and also recursively deletes all files contained in that directory, as described in the File Deletion function.

Rename directory: d_rename <path/to/directory> <new name> <token> -> ACK / Invalid path / Name collision / Permission denied
Used by a client to rename a target directory. The metadata module renames the directory if this is allowed by the user's permissions and if no other directory with the same name exists under the same namespace.

Change permissions: chmod <pathname> <permissions> <token> -> ACK / Invalid path / Permission denied
Used by a client to change the permissions of a target file or directory. If the action is allowed, the metadata module changes the permissions of the file or, for a directory, the permissions of all files under that directory, recursively.

5.3 Data API

Store Chunks: store_chunks list<<chunk id> <data chunk>> <authorization key> -> ACK / NACK
Used by a client to store the chunks of a to-be-uploaded file that the metadata module has identified as not already present in the service's storage. The data module is responsible for storing the uploaded data reliably and redundantly, and for acknowledging once it is done.

Retrieve Chunks: retrieve_chunks list<chunk ids> <authorization key> -> list<chunks> / Chunk not found
Used by a client to retrieve the chunks that comprise a file to be downloaded. The client previously acquired the list of chunk identifiers that comprise the file from the metadata module.

Delete Chunks: delete_chunks list<chunk ids> <authorization key> -> ACK / NACK
Called by the metadata module when user deletion requests result in chunks that are no longer referenced by anyone. Those chunks need to be deleted, so the metadata module requests the data module to do so.

Note: Since access to the S3 data storage requires authentication, we assume that the clients have suitable authorization keys. Conceptually, these can be obtained from S3 by the metadata module, and then be provided to the clients upon request.
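Putting the API together, the sketch below traces a complete file upload through the three modules; the auth, meta, and data objects are hypothetical RPC stubs for the corresponding sub-domains, and error handling is omitted.

def upload_file(auth, meta, data, user, password, pathname, filepath):
    """End-to-end upload: authenticate, negotiate deduplication with
    the metadata module, store only the missing chunks, and register
    the file. The auth/meta/data stubs are hypothetical."""
    token = auth.login(user, password)
    key = meta.upload_request(pathname, filepath, token)

    # signature -> chunk, from the client pipeline sketch in Section 3.2.1
    chunks = dict(prepare_upload(filepath))
    missing = meta.upload_hashmap(token, key, list(chunks))

    # Only chunks unknown to the service travel over the network; the
    # token stands in for the S3 authorization key mentioned in the Note.
    data.store_chunks([(sig, chunks[sig]) for sig in missing], token)
    meta.register_uploaded_chunks(list(chunks), key)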

6 In-house storage

The use of the Amazon S3 storage infrastructure provides us with highly available and scalable storage. Still, another option would be to deploy the entire BigShare system on EPFL-owned machinery, in order to be independent of any third-party service. Given our current design, the S3 data module could be replaced by an EPFL data module. The latter would implement the chunk layer of our web-based file system; below this layer, a key-value store could be used to store and retrieve our chunks efficiently.

Using an in-house solution deprives us of the flexibility offered by Amazon's services. Specifically, we no longer have an elastic solution that transparently adds or removes machinery based on our current needs and load. In addition, S3 allowed us to be agnostic to potential hardware crashes; for an in-house solution, we would need to set up a replication-based mechanism ourselves to provide failure protection. Keeping these replicas in sync would also require us to implement a state machine replication mechanism.

The extra effort required by an in-house solution can be amortized by the performance benefits gained from the use of dedicated servers. Price and performance per instance are better, as each node is not shared between us and other tenants. Performance is also more predictable, as resource allocation is more fine-grained and less random, and there is no longer a dependency on the workload of the other users of the physical machine. Performance further benefits from the removal of the extra authorization layer that S3 enforces on data access. As we would be able to build our own layers on top of the storage module, we could establish a unified authorization method for our entire service. In addition, this increased flexibility would allow for more versatile communication between the data and metadata modules. For example, when a client requests to download a file, the metadata module could directly notify the data module, so that the latter starts sending the requested data directly to the client, removing the need to use the client as a forwarder of the data retrieval request. In general, the predefined API and functionality provided by S3 limit the flexibility of our modules' interactions and in most cases require the data-metadata communication to occur via the client.

Choosing our own infrastructure also allows us to tune the data module's performance to our needs. For instance, we are free to scale up the machines comprising the storage clusters if needed, or to populate our datacenter with Memcached servers to provide fast access to hot data. Finally, the use of S3 imposes eventual consistency on our data module; if we aim for full ACID compliance, porting to an in-house solution is necessary.

7 Evaluation Metrics

A successful BigShare deployment should conform to the requirements we prioritized in our design: usability, durability, scalability, and storage efficiency. An evaluation of an implementation will be based on metrics that express each of those properties well. Below we discuss representative metrics for each property.

Usability cannot be evaluated objectively, as it is also a matter of GUI design; its evaluation thus relies on user feedback on how easy it is to navigate and customize their private directories according to their needs. This feedback also includes suggestions for new features, which can lead to a richer, more versatile API.

Durability can be concretely measured as the amount of data lost during a specified time period, for instance the number of chunks lost in a year. However, this metric mainly depends on the data module; when using S3, the evaluation results are thus representative of Amazon's storage system rather than of our service.

Scalability refers to offering a stable quality of service to our users that is only slightly affected by the increasing number of both users and files. Latency and bandwidth utilization are two metrics that quantify this: the average bandwidth dedicated to each request, as well as the average latency, are representative metrics for evaluating how our system responds to increasing load.

Storage efficiency is another measurable property. The effectiveness of our chunk deduplication mechanism can be evaluated by comparing the nominal total size of the data stored in our service with the actual space needed to store them. This metric can also be used to decide on the optimal chunk size for BigShare. The efficiency of our eventual inter-user deduplication policy can be evaluated by measuring the average time needed to identify similar chunks owned by different metadata servers.

Apart from the metrics we can use to directly evaluate our primary design goals, there are further important characteristics we should be able to evaluate for our system. We therefore also discuss a few more important metrics.

High availability is an important service property, even though we did not focus on providing it. A metric for our service's availability would be the frequency of outages. Based on these data and the identification of each outage's cause, we can identify unanticipated weaknesses and accordingly provision our infrastructure or redesign parts of the system.

Hardware utilization metrics are important for any service provider that owns at least part of the infrastructure. Both average and peak utilization are crucial for optimal infrastructure provisioning. If the load variation is significant, the use of our infrastructure will not be cost-effective; if our metrics indicate so, we should consider moving to a virtualized environment to allow for efficient consolidation.

8 Conclusion

File sharing services have become significantly popular in the past years. Web-based collaborative applications attract a large audience, which expects the data it shares to be persistent and readily accessible. We designed BigShare to address these needs. We achieve scalability of the service by using a modular architecture, employing clearly separated modules with minimal interactions between the layers they belong to. We provide details on the internal mechanisms used and discuss the tradeoffs involved in achieving scalability and high performance. Finally, we reconsider our design by substituting an in-house solution for the S3-powered black-box storage layer, and discuss the influence of this change on the overall service.

References

[1] Eric A. Brewer. Towards robust distributed systems (abstract). In PODC, page 7, 2000.

[2] Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, Jaidev Haridas, Chakravarthy Uddaraju, Hemal Khatri, Andrew Edwards, Vaman Bedekar, Shane Mainali, Rafay Abbasi, Arpit Agarwal, Mian Fahim ul Haq, Muhammad Ikram ul Haq, Deepali Bhardwaj, Sowmya Dayanand, Anitha Adusumilli, Marvin McNett, Sriram Sankaran, Kavitha Manivannan, and Leonidas Rigas. Windows Azure Storage: a highly available cloud storage service with strong consistency. In SOSP, pages 143-157, 2011.

[3] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In SOSP, pages 205-220, 2007.

[4] Idilio Drago, Marco Mellia, Maurizio M. Munafò, Anna Sperotto, Ramin Sadre, and Aiko Pras. Inside Dropbox: understanding personal cloud storage services. In Internet Measurement Conference, pages 481-494, 2012.

[5] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In SOSP, pages 29-43, 2003.

[6] Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Side channels in cloud services: deduplication in cloud storage. IEEE Security & Privacy, 8(6):40-47, 2010.

[7] Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16(2):133-169, 1998.


More information

Alliance Key Manager Solution Brief

Alliance Key Manager Solution Brief Alliance Key Manager Solution Brief KEY MANAGEMENT Enterprise Encryption Key Management On the road to protecting sensitive data assets, data encryption remains one of the most difficult goals. A major

More information

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wānanga o te Ūpoko o te Ika a Māui

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wānanga o te Ūpoko o te Ika a Māui VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wānanga o te Ūpoko o te Ika a Māui School of Engineering and Computer Science Te Kura Mātai Pūkaha, Pūrorohiko PO Box 600 Wellington New Zealand Tel: +64 4 463

More information

The Security Behind Sticky Password

The Security Behind Sticky Password The Security Behind Sticky Password Technical White Paper version 3, September 16th, 2015 Executive Summary When it comes to password management tools, concerns over secure data storage of passwords and

More information

SECURITY ANALYSIS OF A SINGLE SIGN-ON MECHANISM FOR DISTRIBUTED COMPUTER NETWORKS

SECURITY ANALYSIS OF A SINGLE SIGN-ON MECHANISM FOR DISTRIBUTED COMPUTER NETWORKS SECURITY ANALYSIS OF A SINGLE SIGN-ON MECHANISM FOR DISTRIBUTED COMPUTER NETWORKS Abstract: The Single sign-on (SSO) is a new authentication mechanism that enables a legal user with a single credential

More information

Online Transaction Processing in SQL Server 2008

Online Transaction Processing in SQL Server 2008 Online Transaction Processing in SQL Server 2008 White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 provides a database platform that is optimized for today s applications,

More information

High Security Online Backup. A Cyphertite White Paper February, 2013. Cloud-Based Backup Storage Threat Models

High Security Online Backup. A Cyphertite White Paper February, 2013. Cloud-Based Backup Storage Threat Models A Cyphertite White Paper February, 2013 Cloud-Based Backup Storage Threat Models PG. 1 Definition of Terms Secrets Passphrase: The secrets passphrase is the passphrase used to decrypt the 2 encrypted 256-bit

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Research and Application of Redundant Data Deleting Algorithm Based on the Cloud Storage Platform

Research and Application of Redundant Data Deleting Algorithm Based on the Cloud Storage Platform Send Orders for Reprints to reprints@benthamscience.ae 50 The Open Cybernetics & Systemics Journal, 2015, 9, 50-54 Open Access Research and Application of Redundant Data Deleting Algorithm Based on the

More information

The Sierra Clustered Database Engine, the technology at the heart of

The Sierra Clustered Database Engine, the technology at the heart of A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel

More information

Network Attached Storage. Jinfeng Yang Oct/19/2015

Network Attached Storage. Jinfeng Yang Oct/19/2015 Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability

More information

ANALYSIS OF SMART METER DATA USING HADOOP

ANALYSIS OF SMART METER DATA USING HADOOP ANALYSIS OF SMART METER DATA USING HADOOP 1 Balaji K. Bodkhe, 2 Dr. Sanjay P. Sood MESCOE Pune, CDAC Mohali Email: 1 balajibodkheptu@gmail.com, 2 spsood@gmail.com Abstract The government agencies and the

More information

IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE

IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE 1 M.PRADEEP RAJA, 2 R.C SANTHOSH KUMAR, 3 P.KIRUTHIGA, 4 V. LOGESHWARI 1,2,3 Student,

More information

How To Get To A Cloud Storage And Byod System

How To Get To A Cloud Storage And Byod System Maginatics Security Architecture What is the Maginatics Cloud Storage Platform? Enterprise IT organizations are constantly looking for ways to reduce costs and increase operational efficiency. Although

More information

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind

More information

Feature and Technical

Feature and Technical BlackBerry Enterprise Server for Microsoft Exchange Version: 5.0 Service Pack: 4 Feature and Technical Overview Published: 2013-11-07 SWD-20131107160132924 Contents 1 Document revision history...6 2 What's

More information

SHARPCLOUD SECURITY STATEMENT

SHARPCLOUD SECURITY STATEMENT SHARPCLOUD SECURITY STATEMENT Summary Provides details of the SharpCloud Security Architecture Authors: Russell Johnson and Andrew Sinclair v1.8 (December 2014) Contents Overview... 2 1. The SharpCloud

More information

SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX

SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX White Paper SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX Abstract This white paper explains the benefits to the extended enterprise of the on-

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

Security Overview Enterprise-Class Secure Mobile File Sharing

Security Overview Enterprise-Class Secure Mobile File Sharing Security Overview Enterprise-Class Secure Mobile File Sharing Accellion, Inc. 1 Overview 3 End to End Security 4 File Sharing Security Features 5 Storage 7 Encryption 8 Audit Trail 9 Accellion Public Cloud

More information

IT Architecture Review. ISACA Conference Fall 2003

IT Architecture Review. ISACA Conference Fall 2003 IT Architecture Review ISACA Conference Fall 2003 Table of Contents Introduction Business Drivers Overview of Tiered Architecture IT Architecture Review Why review IT architecture How to conduct IT architecture

More information

The Importance of a Resilient DNS and DHCP Infrastructure

The Importance of a Resilient DNS and DHCP Infrastructure White Paper The Importance of a Resilient DNS and DHCP Infrastructure DNS and DHCP availability and integrity increase in importance with the business dependence on IT systems The Importance of DNS and

More information

Gladinet Cloud Backup V3.0 User Guide

Gladinet Cloud Backup V3.0 User Guide Gladinet Cloud Backup V3.0 User Guide Foreword The Gladinet User Guide gives step-by-step instructions for end users. Revision History Gladinet User Guide Date Description Version 8/20/2010 Draft Gladinet

More information

Last Updated: July 2011. STATISTICA Enterprise Server Security

Last Updated: July 2011. STATISTICA Enterprise Server Security Last Updated: July 2011 STATISTICA Enterprise Server Security STATISTICA Enterprise Server Security Page 2 of 10 Table of Contents Executive Summary... 3 Introduction to STATISTICA Enterprise Server...

More information

Considerations In Developing Firewall Selection Criteria. Adeptech Systems, Inc.

Considerations In Developing Firewall Selection Criteria. Adeptech Systems, Inc. Considerations In Developing Firewall Selection Criteria Adeptech Systems, Inc. Table of Contents Introduction... 1 Firewall s Function...1 Firewall Selection Considerations... 1 Firewall Types... 2 Packet

More information

Key Considerations and Major Pitfalls

Key Considerations and Major Pitfalls : Key Considerations and Major Pitfalls The CloudBerry Lab Whitepaper Things to consider before offloading backups to the cloud Cloud backup services are gaining mass adoption. Thanks to ever-increasing

More information

THE WINDOWS AZURE PROGRAMMING MODEL

THE WINDOWS AZURE PROGRAMMING MODEL THE WINDOWS AZURE PROGRAMMING MODEL DAVID CHAPPELL OCTOBER 2010 SPONSORED BY MICROSOFT CORPORATION CONTENTS Why Create a New Programming Model?... 3 The Three Rules of the Windows Azure Programming Model...

More information

User's Guide. Product Version: 2.5.0 Publication Date: 7/25/2011

User's Guide. Product Version: 2.5.0 Publication Date: 7/25/2011 User's Guide Product Version: 2.5.0 Publication Date: 7/25/2011 Copyright 2009-2011, LINOMA SOFTWARE LINOMA SOFTWARE is a division of LINOMA GROUP, Inc. Contents GoAnywhere Services Welcome 6 Getting Started

More information

Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise

Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service

More information

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins Approved for Public Release; Distribution Unlimited. Case Number 15 0196 Cloud Gateway Monica Stebbins Agenda 2 Cloud concepts Gateway concepts My work 3 Cloud concepts What is Cloud 4 Similar to hosted

More information

MinCopysets: Derandomizing Replication In Cloud Storage

MinCopysets: Derandomizing Replication In Cloud Storage MinCopysets: Derandomizing Replication In Cloud Storage Asaf Cidon, Ryan Stutsman, Stephen Rumble, Sachin Katti, John Ousterhout and Mendel Rosenblum Stanford University cidon@stanford.edu, {stutsman,rumble,skatti,ouster,mendel}@cs.stanford.edu

More information

Multi-Datacenter Replication

Multi-Datacenter Replication www.basho.com Multi-Datacenter Replication A Technical Overview & Use Cases Table of Contents Table of Contents... 1 Introduction... 1 How It Works... 1 Default Mode...1 Advanced Mode...2 Architectural

More information

Monitoring Traffic manager

Monitoring Traffic manager Monitoring Traffic manager eg Enterprise v6 Restricted Rights Legend The information contained in this document is confidential and subject to change without notice. No part of this document may be reproduced

More information

Release Notes. LiveVault. Contents. Version 7.65. Revision 0

Release Notes. LiveVault. Contents. Version 7.65. Revision 0 R E L E A S E N O T E S LiveVault Version 7.65 Release Notes Revision 0 This document describes new features and resolved issues for LiveVault 7.65. You can retrieve the latest available product documentation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Bootstrap guide for the File Station

Bootstrap guide for the File Station Bootstrap guide for the File Station Introduction Through the File Server it is possible to store files and create automated backups on a reliable, redundant storage system. NOTE: this guide considers

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

White Paper: Cloud Identity is Different. World Leading Directory Technology. Three approaches to identity management for cloud services

White Paper: Cloud Identity is Different. World Leading Directory Technology. Three approaches to identity management for cloud services World Leading Directory Technology White Paper: Cloud Identity is Different Three approaches to identity management for cloud services Published: March 2015 ViewDS Identity Solutions A Changing Landscape

More information

Novell ZENworks 10 Configuration Management SP3

Novell ZENworks 10 Configuration Management SP3 AUTHORIZED DOCUMENTATION Software Distribution Reference Novell ZENworks 10 Configuration Management SP3 10.3 November 17, 2011 www.novell.com Legal Notices Novell, Inc., makes no representations or warranties

More information

Service Overview CloudCare Online Backup

Service Overview CloudCare Online Backup Service Overview CloudCare Online Backup CloudCare s Online Backup service is a secure, fully automated set and forget solution, powered by Attix5, and is ideal for organisations with limited in-house

More information

Evaluation of different Open Source Identity management Systems

Evaluation of different Open Source Identity management Systems Evaluation of different Open Source Identity management Systems Ghasan Bhatti, Syed Yasir Imtiaz Linkoping s universitetet, Sweden [ghabh683, syeim642]@student.liu.se 1. Abstract Identity management systems

More information

Basic Unix/Linux 1. Software Testing Interview Prep

Basic Unix/Linux 1. Software Testing Interview Prep Basic Unix/Linux 1 Programming Fundamentals and Concepts 2 1. What is the difference between web application and client server application? Client server application is designed typically to work in a

More information

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter

More information

Data Consistency on Private Cloud Storage System

Data Consistency on Private Cloud Storage System Volume, Issue, May-June 202 ISS 2278-6856 Data Consistency on Private Cloud Storage System Yin yein Aye University of Computer Studies,Yangon yinnyeinaye.ptn@email.com Abstract: Cloud computing paradigm

More information

Security Provider Integration LDAP Server

Security Provider Integration LDAP Server Security Provider Integration LDAP Server 2015 Bomgar Corporation. All rights reserved worldwide. BOMGAR and the BOMGAR logo are trademarks of Bomgar Corporation; other trademarks shown are the property

More information

Data Replication in Privileged Credential Vaults

Data Replication in Privileged Credential Vaults Data Replication in Privileged Credential Vaults 2015 Hitachi ID Systems, Inc. All rights reserved. Contents 1 Background: Securing Privileged Accounts 2 2 The Business Challenge 3 3 Solution Approaches

More information

Enterprise SSO Manager (E-SSO-M)

Enterprise SSO Manager (E-SSO-M) Enterprise SSO Manager (E-SSO-M) Many resources, such as internet applications, internal network applications and Operating Systems, require the end user to log in several times before they are empowered

More information

Scalable Multiple NameNodes Hadoop Cloud Storage System

Scalable Multiple NameNodes Hadoop Cloud Storage System Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai

More information

Final Year Project Interim Report

Final Year Project Interim Report 2013 Final Year Project Interim Report FYP12016 AirCrypt The Secure File Sharing Platform for Everyone Supervisors: Dr. L.C.K. Hui Dr. H.Y. Chung Students: Fong Chun Sing (2010170994) Leung Sui Lun (2010580058)

More information

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol-1, Iss.-3, JUNE 2014, 54-58 IIST SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE

More information