An Architecture for Replica Management in Grid Computing Environments


Abstract

We present the architecture of a replica management service that manages the copying and placement of files in a high-performance, distributed computing environment so as to optimize the performance of data-intensive applications. This architecture consists of two parts: a replica catalog or repository, where information can be registered about logical files, collections of files, and the physical locations where subsets of collections are stored; and a set of registration and query operations that are supported by the replica management service. The replica management service can be used by higher-level services such as replica selection and automatic creation of new replicas to satisfy application performance requirements. We describe important design decisions and implementation issues for the replica management service. Design decisions include a strict separation between file metadata and replication information, no enforcement of replica semantics or file consistency, and support for rollback after failures of complex operations. Implementation issues include options for the underlying technology of the replica catalog and the tradeoff between reliability and complexity.

1 Introduction

Data-intensive, high-performance computing applications require the efficient management and transfer of terabytes or petabytes of information in wide-area, distributed computing environments. Examples of such applications include experimental analyses and simulations in scientific disciplines such as high-energy physics, climate modeling, earthquake engineering, and astronomy. In such applications, massive datasets must be shared by a community of hundreds or thousands of researchers distributed worldwide. These researchers need to transfer large subsets of these datasets to local sites or other remote resources for processing. They may create local copies or replicas to overcome long wide-area data transfer latencies. The data management environment must provide security services such as authentication of users and control over who is allowed to access the data. In addition, once multiple copies of files are distributed at multiple locations, researchers need to be able to locate copies and determine whether to access an existing copy or create a new one to meet the performance needs of their applications.

We argue that the requirements of such distributed, data-intensive applications are best met by the creation of a Data Grid infrastructure that provides a set of orthogonal, application-independent services that can then be combined and specialized in different ways to meet the needs of specific applications. These services include a metadata management service that records information about the contents of files and the experimental conditions under which they were created; a replica management service that registers multiple copies of files at different physical locations and allows users to discover where files are located; a replica selection service that chooses the best replica for a data transfer based on predicted performance; and a secure, reliable, efficient data transfer protocol.

In this paper, we present the architecture of a replica management service charged with managing the copying and placement of files in a distributed computing system so as to optimize the performance of the data analysis process. Our goal in designing this service is not to provide a complete solution to this problem, but rather to provide a set of basic mechanisms that make it easy for users or higher-level tools to manage the replication process. Our proposed replica management service provides the following basic functions, sketched in code at the end of this section:

- Registration of files with the replica management service
- Creation and deletion of replicas for previously registered files
- Enquiries concerning the location of replicas

In turn, the basic functions provided by the replica management service can be used by higher-level services: for example, by replica selection services that select among available replicas based on the predicted performance of data transfers, and by replica creation services that automatically generate and register new replicas in response to data access patterns and the current state of the computational grid.

In this paper, we present the basic components of our architecture for a replica management service. To register replicas, users create entries in a replica catalog or repository. There are three types of entries: logical files, logical collections, and locations. We describe these entries and the registration and query operations that are supported by the replica management service. We also present important design decisions for the replica management architecture. These include:

- Separation of replication and file metadata information: Our architecture assumes a strict separation between metadata information, which describes the contents of files, and replication information, which is used to map logical file and collection names to physical locations. Metadata and replica management are orthogonal services.

- Replication semantics: Our architecture enforces no replica semantics. Files registered with the replica management service are asserted by the user to be replicas of one another, but the service does not make guarantees about file consistency.

- Rollback: If a failure occurs during a complex, multi-part operation, we roll back the state of the replica management service to the consistent state that existed before the operation began.

- No distributed locking: Our architecture does not assume the existence of a distributed locking mechanism. Because of this, it is possible for users to corrupt the replica management service by changing or deleting files on registered storage systems without informing the replica management service.

The paper concludes with a discussion of implementation issues for replica management.
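To make these basic functions concrete, the following minimal sketch shows how they might appear as a programmatic interface. It is illustrative only: the class and method names (ReplicaManagementService, register_file, and so on) are hypothetical and are not part of the architecture described in this paper.

class ReplicaManagementService:
    """Hypothetical client interface for the three basic functions."""

    def register_file(self, collection: str, logical_file: str, attributes: dict) -> None:
        """Register a logical file, with descriptive attributes such as size,
        as a member of a logical collection."""
        ...

    def create_replica(self, logical_files: list, source: str, destination: str) -> None:
        """Copy previously registered files to a destination storage system
        and register the new location with the service."""
        ...

    def delete_replica(self, logical_files: list, location: str) -> None:
        """Remove files from a registered location entry."""
        ...

    def list_locations(self, logical_file: str) -> list:
        """Enquiry: return all registered physical locations of a logical file."""
        ...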

2 A Motivating Example: High-Energy Physics Applications

We use high-energy physics experiments to motivate the design of our replica management architecture. We characterize the application with respect to parameters such as average file sizes, total data volume, rate of data creation, type of file access (write-once or write-many), expected access rates, type of storage system (file system or database), and consistency requirements for multiple copies of data. In this application, as well as in others that we have examined, such as climate modeling, earthquake engineering, and astronomy, we see a common requirement for two basic data management services: efficient access to, and transfer of, large files; and a mechanism for creating and managing multiple copies of files.

Experimental physics applications operate on and generate large amounts of data. For example, beginning in 2005, the Large Hadron Collider (LHC) at the European physics center CERN will produce several petabytes of raw and derived data per year for approximately 15 years. The data generated by physics experiments are of two types: experimental data, or information collected by the experiment; and metadata, or information about the experiment, such as the number of events and the results of analysis. File sizes and numbers of files are determined to some extent by the type of software used to store experimental data and metadata. For example, several experiments have chosen to use the object-oriented Objectivity database. Current experimental data files (e.g., within the BaBar experiment) range from 2 to 10 gigabytes, while metadata files are approximately 2 gigabytes. Objectivity currently limits database federations to 64K files. However, future versions of Objectivity will support more files, allowing average file sizes to be reduced.

Access patterns vary for experimental data files and metadata. Experimental data files typically have a single creator. During an initial production period lasting several weeks, these files are modified as new objects are added. After data production is complete, files are not modified. In contrast, metadata files may be created by multiple individuals and may be modified or augmented over time, even after the initial period of data production. For example, some experiments continue to modify metadata files to reflect the increasing number of total events in the database. The volume of metadata is typically smaller than that of experimental data.

The consumers of experimental physics data and metadata will number in the hundreds or thousands. These users are distributed at many sites worldwide. Hence, it is often desirable to make copies or replicas of the data being analyzed to minimize access time and network load. For example, Figure 1 shows the expected replication scheme for LHC physics datasets. Files are replicated in a hierarchical manner, with all files stored at a central location (CERN) and decreasing subsets of the data stored at national and regional data centers.

Figure 1: Scheme for hierarchical replication of physics data (Tier 0: CERN; Tier 1: national centers such as France, Italy, and England; Tier 2: regional centers such as Bologna, Pisa, and Padova).

Replication of physics datasets is complicated by several factors. First, security services are required to authenticate the user and control access to storage systems. Next, because datasets are so large, it may be desirable to replicate only interesting subsets of the data. Finally, replication of data subject to modification implies a need for a mechanism for propagating updates to all replicas. For example, consider the initial period of data production, during which files are modified for several weeks. During this period, users want their local replicas to be updated periodically to reflect the experimental data being produced. Typically, updates are batched and performed every few days. Since metadata updates take place over an indefinite period, these changes must also be propagated periodically to all replicas. In Table 1, we summarize the characteristics of high-energy physics applications.

Table 1: Characteristics of high-energy physics applications

Rate of data generation (starting 2005):        Several petabytes per year
Typical experimental database file sizes:       2 to 10 gigabytes
Typical metadata database file sizes:           2 gigabytes
Maximum number of database files in federation: Currently 64K; eventually millions
Period of updates to experimental data:         Several weeks
Period of updates to metadata:                  Indefinite
Type of storage system:                         Object-oriented database
Number of data consumers:                       Hundreds to thousands

3 Data Model

We assume the following data model. Data are organized into files. For convenience, users group files into collections. A replica or location is a subset of a collection that is stored on a particular physical storage system. There may be multiple, possibly overlapping subsets of a collection stored on multiple storage systems in a data grid. These grid storage systems may use a variety of underlying storage technologies and data movement protocols, which are independent of replica management.

We distinguish between logical file names and physical file names. A logical file name is a globally unique identifier for a file within the data grid's namespace. The logical file name may or may not have meaning for a human, for example, by recording information about the contents of a file. However, the replica management service does not use any semantic information contained in logical file names. The purpose of the replica management service is to map a unique logical file name to a possibly different physical name for the file on a particular storage device, as illustrated in the sketch below.
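As an illustration of this mapping, the following sketch resolves one logical file name to several physical instances. The representation, host names, and URLs are hypothetical, intended only to make the distinction between logical and physical names concrete; the architecture does not prescribe any particular representation or transfer protocol.

# A logical file name is globally unique within the data grid's namespace;
# each registered location may map it to a different physical name.
replica_mappings = {
    "precipitation98/jan98": [
        "gsiftp://storage.site1.example.org:2811/archive/climate/jan98.dat",
        "gsiftp://storage.site2.example.org:2811/data/jan98.dat",
    ],
}

def resolve(logical_name: str) -> list:
    """Return all known physical instances of a logical file (empty if none registered)."""
    return replica_mappings.get(logical_name, [])

print(resolve("precipitation98/jan98"))  # two physical instances of one logical file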

4 Replica Management in Grid Computing Environments

The replica management service is just one component in a computational grid environment that provides support for high-performance, data-intensive applications. A recently proposed architecture for computational grids [1] includes four levels:

- Fabric: At the lowest level of the grid architecture are the basic components and resources from which a computational grid is constructed. These include storage systems, networks, and catalogs.
- Connectivity: At the next level of the architecture are services concerned with communication and authentication. Typically, these are standard protocols.
- Resource: Services at the next highest level are concerned with providing secure, remote access to individual resources.
- Collective: Services at the collective level support the coordinated management of multiple resources.

Figure 2 shows a partial list of components at each level of the proposed grid architecture, with particular emphasis on components related to replica management. At the lowest, fabric level of the architecture are the basic components that make up the Grid, including storage systems, networks, and computational systems. In addition, the figure includes two catalogs: a metadata catalog that contains descriptive information about files and a replica catalog where information is stored about registered replicas. At the connectivity layer are various standard protocols for communication and security. At the resource level are services associated with managing individual resources, for example, storage and catalog management protocols as well as protocols for network and computation resource management. Finally, at the collective layer of the architecture are higher-level services that manage multiple underlying resources, including the replica management service that is the focus of this paper. Other services at the collective layer include services for replica selection, metadata management, management of replicated and distributed catalogs, and information services that provide resource discovery or performance estimation.

Figure 2: A partial list of elements of the Data Grid Reference Architecture [1] that are relevant to replica management. Application: particle physics application, climate modeling application, etc. Collective: replica management, replica selection, metadata, distributed catalog, and information services. Resource: storage, catalog, network, and compute management protocols. Connectivity: communication, service discovery (DNS), authentication, delegation. Fabric: storage systems, networks, compute systems, replica catalog, metadata catalog.

One of the key features of our architecture is that the replica management service is orthogonal to other services such as replica selection and metadata management. Figure 3 shows a scenario where an application accesses several of these orthogonal services to identify the best location for a desired data transfer. For example, consider a climate modeling simulation that will be run on precipitation data collected in 1998. The scientist running the simulation does not know the exact file names or locations of the data required for this analysis. Instead, the application specifies the characteristics of the desired data at a high level and passes this attribute description to a metadata catalog (1). The metadata catalog queries its attribute-based indexes and produces a list of logical files that contain data with the specified characteristics. The metadata catalog returns this list of logical files to the application (2). The application passes these logical file names to the replica management service (3), which returns to the application a list of physical locations for all registered copies of the desired logical files (4). Next, the application passes this list of replica locations (5) to a replica selection service, which identifies the source and destination storage system locations for all candidate data transfer operations. In our example, the source locations contain files with 1998 precipitation measurements, and the destination location is where the application will access the data. The replica selection service sends the candidate source and destination locations to one or more information services (6), which provide estimates of candidate transfer performance based on grid measurements and/or predictions (7). Based on these estimates, the replica selection service chooses the best location for a particular transfer and returns location information for the selected replica to the application (8). Following this selection process, the application performs the data transfer operations.

Figure 3: A data selection scenario in which the application consults the metadata service, replica management service, and replica selection service to determine the best source of data matching a set of desired data attributes.
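The eight-step scenario of Figure 3 is essentially a pipeline of service calls. The sketch below restates it as straight-line code; the service objects and method names are hypothetical stand-ins for the metadata, replica management, replica selection, and information services, not a defined API.

def select_best_replica(metadata_catalog, replica_mgmt, replica_selection, destination, attributes):
    """Walk the data selection scenario of Figure 3 (steps 1-8)."""
    # (1)-(2): describe the desired data by attributes; get back logical file names.
    logical_files = metadata_catalog.query(attributes)
    # (3)-(4): ask the replica management service for all registered physical copies.
    locations = {name: replica_mgmt.list_locations(name) for name in logical_files}
    # (5)-(8): the replica selection service ranks candidate transfers to the
    # destination, internally consulting information services for performance
    # estimates, and returns the selected source replica for each file.
    return replica_selection.choose(locations, destination)

# Example invocation (all objects and values hypothetical):
# best = select_best_replica(mc, rms, rss, "gsiftp://local.example.org/scratch",
#                            {"measurement": "precipitation", "year": 1998})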

5 The Replica Management Service Architecture

The architecture of the replica management service consists of a replica catalog or repository, where information about registered replicas is stored, and a set of registration and query operations that are supported by the service. Our architecture does not require a specific implementation for the replica catalog. In this section, we begin by defining the objects that are registered with the service. Next, we present important architecture design decisions that clarify the functionality provided by the replica management service. Finally, we briefly describe the operations supported by the service.

5.1 Managed Objects

As already discussed, the purpose of the replica management service is to allow users to register files with the service, create and delete replicas of previously registered files, and make enquiries about the location and performance characteristics of replicas. The replica management service must register three types of entries in a replica catalog or repository:

- Logical files
- Logical collections
- Locations

Logical files are entities with globally unique names that may have one or more physical instances. Users characterize individual files by registering them with the replica management service. A logical collection is a user-defined group of files. We expect that users will often find it convenient and intuitive to register and manipulate groups of files as a collection, rather than requiring that every file be registered and manipulated individually. A logical collection is simply a list of files and contains no information about the physical locations where files are stored.

Location entries in the replica management system contain all information required to map a logical collection to a particular physical instance of that collection. This might include such information as the hostname, port number, and access protocol of the physical storage system where the files are stored. Each location object represents a complete or partial copy of a logical collection on a storage system. One location entry corresponds to exactly one physical storage system location. Each logical collection may have an arbitrary number of associated location objects, each of which contains mapping information for a (possibly overlapping) subset of the files in the collection.

To illustrate the use of these objects for registering and querying replica information, we again use the example of precipitation measurements for the year 1998. Suppose that files contain one month of measurements, and that file names are jan98, feb98, mar98, etc. The manager of a climate modeling catalog could register all these files as belonging to a logical collection called precipitation98. In addition, the manager could register information, such as file size, about each file in separate logical file entries. If a storage system at site 1 stores a complete copy of the files in this logical collection, the manager would register a location entry in the catalog that contains all information needed to map from logical file names to physical storage locations at site 1. Similarly, if a storage system at site 2 stores only the files jan98, feb98, and mar98, this list of files as well as mapping information would be registered with the replica management service using a location entry. Subsequently, if a user queries the replica management service to determine all locations of the logical file feb98, the service will respond with physical storage locations for the file at sites 1 and 2. A query for the file jun98 would return only information about the location of the file at site 1. The sketch below walks through this example.
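The following sketch walks through the precipitation98 example with simple in-memory catalog entries. The dictionary representation and the find_locations helper are hypothetical; they illustrate how logical collection, logical file, and location entries relate to one another, not how a replica catalog must be implemented.

# Logical collection entry: a list of logical file names, no physical information.
collection = {"name": "precipitation98",
              "files": ["jan98", "feb98", "mar98", "apr98", "may98", "jun98"]}

# Logical file entries: per-file attributes such as size (values illustrative).
logical_files = {"jan98": {"size_gb": 2}, "feb98": {"size_gb": 2}}

# Location entries: one per physical storage system, each holding the mapping
# information for a (possibly partial) copy of the collection.
locations = [
    {"site": "site1", "host": "storage.site1.example.org", "protocol": "gsiftp",
     "path": "/archive/climate/", "files": collection["files"]},   # complete copy
    {"site": "site2", "host": "storage.site2.example.org", "protocol": "gsiftp",
     "path": "/data/", "files": ["jan98", "feb98", "mar98"]},       # partial copy
]

def find_locations(logical_file):
    """Return every location entry holding a physical copy of the logical file."""
    return [loc for loc in locations if logical_file in loc["files"]]

print([loc["site"] for loc in find_locations("feb98")])  # ['site1', 'site2']
print([loc["site"] for loc in find_locations("jun98")])  # ['site1']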

5.2 Architecture Design Decisions

Next, we discuss several important design decisions for the replica management service. Our motivation for several of these decisions was to clearly define the role of the service and to limit its complexity.

5.2.1 Separation of Replication and Metadata Information

One key observation is that the objects that can be registered with the replica management service contain only the information required to map logical file and collection names to physical locations. Any other information that might be associated with files or collections, such as descriptions of file contents or the experimental conditions under which files were created, should be stored in an orthogonal metadata management service. Our architecture places no constraints on the design or the contents of the metadata service. Typically, a user might first consult the metadata management service to select logical files based on metadata attributes such as the type of experimental results needed or the time when data were collected. Once the necessary logical files are identified, the user consults the replica management service to find one or more physical locations where copies of the desired logical files are stored.

5.2.2 Replication Semantics

The word replica has been used in a variety of contexts with a variety of meanings. At one extreme, the word replica is sometimes used to mean a copy of a file that is guaranteed to be consistent with the original, despite updates to the latter. A replica management architecture that supports this definition of replication would be required to implement the full functionality of a wide-area, distributed database, with locking of files during modification and atomic updates of all replicas. Because of the difficulty of implementing such a distributed database, our architecture operates at the other extreme: our replica management service explicitly does not enforce any replica semantics. In other words, for multiple replicas (locations) of a logical collection, we make no guarantees about file consistency, nor do we maintain any information on which was the original or source location from which one or more copies were made. When users register files as replicas of a logical collection, they assert that these files are replicas under a user-specific definition of replication. Our replica management service does not perform any operations to check, guarantee, or enforce the user's assertion.

5.2.3 Replica Management Service Consistency

Although our architecture makes no guarantees about consistency among registered file replicas, we must make certain guarantees about the consistency of information stored in the replica management service itself. Since computational and network failures are inevitable in distributed computing environments, the replica management service must be able to recover and return to a consistent state despite conflicting or failed operations. One way our architecture remains consistent is to guarantee that no file registration operation completes successfully unless the file exists completely on the corresponding storage system.

Consider a replica copy operation that includes copying a file from a source to a destination storage system and registering the new file in a location entry in the replica service. We must enforce an ordering on these operations, requiring that the copy operation completes successfully before registration of the file with the replica management service is allowed to complete. If failures occur and the state of the replica management service is corrupted, we must roll back the replica management service to a consistent state.

5.2.4 Rollback

Certain operations on the replica management service are atomic. If they complete, the state of the replica management service is updated. If they fail, the state of the replica management service is unchanged. Examples of atomic operations include adding a new entry to the replica management service, deleting an entry, or adding an attribute to an existing entry. Other operations on the replica management service consist of multiple parts. For example, consider an operation that copies a file to a storage system and registers the file with the replica management service. Our architecture does not assume that complex, multi-part operations are atomic. Depending on when a failure occurs during a multi-part operation, the information registered in the replica management service may become corrupted. We guarantee that if failures occur during complex operations, we will roll back the state of the replica management service to the previously consistent state before the operation began. This requires us to save sufficient state about outstanding complex operations to revert to a consistent state after failures.
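As a sketch of the ordering and rollback guarantees described in Sections 5.2.3 and 5.2.4, the following hypothetical copy-and-register operation registers a file only after the copy has completed, and on failure undoes whichever parts finished. The storage and catalog objects and their methods are illustrative stand-ins, not a defined API; a real service would persist this bookkeeping rather than keep it in local variables, since it must recover after crashes.

def copy_and_register(storage, catalog, source_url, dest_url, collection, logical_file):
    """Multi-part operation: copy a file, then register it; roll back on failure."""
    copied = registered = False
    try:
        # Ordering guarantee: the copy must complete successfully before
        # registration with the replica management service may complete.
        storage.copy_file(source_url, dest_url)
        copied = True
        catalog.add_to_location(collection, logical_file, dest_url)
        registered = True
    except Exception:
        # Roll back to the previously consistent state, undoing whichever
        # parts of the complex operation completed before the failure.
        if registered:
            catalog.remove_from_location(collection, logical_file, dest_url)
        if copied:
            storage.delete_file(dest_url)
        raise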

5.2.5 No Distributed Locking Mechanism

It is possible for users to corrupt our replica management service by changing or deleting files on an underlying storage system without informing the replica management service. We strongly discourage such operations, but the architecture does not prevent them. After such operations, information registered in the replica catalog may not be consistent with the actual contents of the corresponding storage systems. The replica management service could avoid such corruption if it could enforce that all changes to storage systems be made via calls to the replica management service. Enforcing this requirement would require a distributed locking mechanism that prevents changes to registered storage locations except via authorized replica management operations. Because of the difficulty of implementing such a distributed locking mechanism, our architecture does not assume that locking is available and does not guarantee that catalog corruption will not occur.

5.2.6 Requirements for Logical Files and Locations

We require that files that are registered in location or logical file entries also be registered in the corresponding logical collection entry. Conversely, we do not require that every file in a logical collection entry be registered in a location or logical file entry. In other words, there may be logical files associated with a logical collection that currently have no registered physical instances in the catalog.

5.2.7 Post-Processing Files After Data Transfer

Our architecture provides limited support for post-processing operations on transferred data. Certain applications would like to perform post-processing after a file is transferred to a destination storage system but before the file is registered in the replica management service. Examples of post-processing include decryption of data, running verification operations such as a checksum to confirm that the file was not corrupted during transfer, or attaching the transferred file to an object-oriented database. We limit the nature of allowed post-processing operations to maintain our consistency guarantees for the replica management service. In particular, we allow only those post-processing operations that do not alter file contents. Reading the contents of a transferred file to perform verification (checksum) calculations or registering the file in an external database would be allowed. However, decrypting a data file would not be allowed, since the contents of the file would change. These restrictions make it possible for us to roll back failed post-processing operations and restart them, if necessary.
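The restriction to operations that do not alter file contents admits post-processing such as the checksum verification mentioned above. The following is a minimal sketch, assuming the expected checksum was recorded alongside the logical file entry; the file path and the choice of MD5 are illustrative.

import hashlib

def verify_transfer(local_path: str, expected_md5: str) -> bool:
    """Read (but never modify) the transferred file and compare checksums.

    Because the file contents are unchanged, a failed or interrupted
    verification can simply be rolled back and restarted."""
    digest = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_md5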

5.3 Replica Management Operations

Our replica management architecture includes support for the following operations:

- Register a new entry in the replica management service:
  - A new logical collection, consisting of a list of logical file names
  - A new location, containing mapping information for a subset of files in an existing logical collection
  - A new logical file entry with specific information, such as size, describing a single file in an existing logical collection
- Modify an existing entry in the replica management service:
  - Add or delete a file from an existing logical collection or location entry
  - Add or delete a descriptive attribute of an existing entry
- Query the replica management service:
  - Find an entry, if it exists, for a specified logical file, logical collection, or location
  - Find all locations that include a physical copy of a specified logical file
  - Return requested attributes associated with an entry. For a logical collection entry, return the names of files in the collection. For a location entry, return attributes used to map logical names to physical names. For a logical file entry, return attributes that describe the logical file.
- Combined storage and registration operations:
  - Copy a file registered in an existing location entry from a source to a destination storage system and register the file in the corresponding location entry
  - Publish a file that is not currently represented in the replica catalog by copying it to a storage system and registering the file in corresponding location and logical collection entries
- Delete entries from the replica management service

6 Implementation Questions

For the replica management service architecture we have described, there are many possible implementations. In this section, we discuss a few implementation issues.

6.1 Storing and Querying Replica Information

A variety of technologies can store and query replica management service information. Two possibilities are relational databases and LDAP directories. A relational database provides support for indexing replica information efficiently and for database-language queries (e.g., SQL) of replica management information. An LDAP (Lightweight Directory Access Protocol) directory has a simple protocol for entering and querying information in a directory; an LDAP directory can use a variety of storage systems or back-ends to hold data, ranging from a relational database to a file system. A relational sketch of such a catalog appears below.
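To make the relational option concrete, the following sketch stores the three entry types in an SQLite database and performs the standard query that finds all locations holding a copy of a given logical file. The schema is hypothetical, chosen only to mirror the managed objects of Section 5.1; a production catalog would use a server-backed database, and an LDAP-based catalog would organize the same information as directory entries instead.

import sqlite3

con = sqlite3.connect(":memory:")  # illustrative; a real catalog would be persistent
con.executescript("""
    CREATE TABLE logical_collection (name TEXT PRIMARY KEY);
    CREATE TABLE logical_file (            -- per-file attributes such as size
        collection TEXT REFERENCES logical_collection(name),
        name TEXT,
        size_gb REAL,
        PRIMARY KEY (collection, name));
    CREATE TABLE location (                -- one entry per physical storage system
        id INTEGER PRIMARY KEY,
        collection TEXT REFERENCES logical_collection(name),
        host TEXT, port INTEGER, protocol TEXT);
    CREATE TABLE location_file (           -- the (possibly partial) subset each location holds
        location_id INTEGER REFERENCES location(id),
        file_name TEXT);
""")

# Indexed query: find all physical locations holding a copy of logical file 'feb98'.
rows = con.execute("""
    SELECT l.host, l.port, l.protocol
    FROM location AS l JOIN location_file AS lf ON lf.location_id = l.id
    WHERE lf.file_name = ?""", ("feb98",)).fetchall()
print(rows)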

6.2 Reliability and Availability

If the replica management service fails during an operation, it will be unavailable for some time, and its state upon recovery will be indeterminate. The degree of reliability and availability provided by a replica management service is an implementation decision. As with any system design, there is an inevitable tradeoff between the level of reliability and consistency after failures and the system's cost and complexity. Reliability and availability will be greatly improved in implementations that replicate and/or distribute the replica management service. Our architecture allows implementers to use services provided by relational databases and LDAP directories for distributing and replicating information. If high reliability and availability are required, then system builders must devote adequate resources to replicating the service, performing frequent checkpoint operations to facilitate quick recovery, and avoiding single hardware and software points of failure. This robustness must be engineered into the replica management service.

7 Conclusions

We have argued that data-intensive, high-performance computing applications such as high-energy physics require the efficient management and transfer of terabytes or petabytes of information in wide-area, distributed environments. Researchers performing these analyses create local copies or replicas of large subsets of these datasets to overcome long wide-area data transfer latencies. A Data Grid infrastructure to support these applications must provide a set of orthogonal, application-independent services that can then be combined and specialized in different ways to meet the needs of specific applications. These services include metadata management, replica management, replica selection, and secure, reliable, efficient data transfer.

We have presented an architecture for a replica management service. This architecture consists of two parts: a replica catalog or repository, where information can be registered about logical files, collections of files, and the physical locations where subsets of collections are stored; and a set of registration and query operations that are supported by the replica management service. The replica management service can be used by higher-level services such as replica selection and automatic creation of new replicas to satisfy application performance requirements.

In addition to describing the basic entities that are registered with the replica catalog, we presented several important design decisions for the replica management service. To make implementation of the replica management service feasible, we have limited its functionality. For example, our service does not guarantee or enforce any replica semantics or replica consistency. When users register files as replicas of a logical collection, they assert that these files are replicas under a user-specific definition of replication. We do not enforce consistency among replicas because doing so would require us to implement a wide-area, distributed database with a distributed locking mechanism and atomic updates of all replicas. Efficient implementation of such a wide-area distributed database remains a difficult, open research problem. Several other architecture decisions designed to clearly define the role and limit the complexity of the replica management service include the separation of replication and file metadata information, support for rollback of complex operations, and limits on the types of post-processing that can be performed on transferred files.

Finally, the paper presented a few implementation issues for the replica management service. One issue is the technology used to implement the replica catalog and the query protocol for that catalog. Another is the degree of reliability and availability provided by the replica management service. As with any system implementation, there is a tradeoff of cost and complexity when providing higher reliability and availability. A reliable system would include distribution and replication of the replica catalog, frequent checkpoint operations for fast recovery after failures, and hardware and software redundancy to avoid single points of failure.

References

[1] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, 15(3), 2001.


More information

ADDING A NEW SITE IN AN EXISTING ORACLE MULTIMASTER REPLICATION WITHOUT QUIESCING THE REPLICATION

ADDING A NEW SITE IN AN EXISTING ORACLE MULTIMASTER REPLICATION WITHOUT QUIESCING THE REPLICATION ADDING A NEW SITE IN AN EXISTING ORACLE MULTIMASTER REPLICATION WITHOUT QUIESCING THE REPLICATION Hakik Paci 1, Elinda Kajo 2, Igli Tafa 3 and Aleksander Xhuvani 4 1 Department of Computer Engineering,

More information

Data Storage in Clouds

Data Storage in Clouds Data Storage in Clouds Jan Stender Zuse Institute Berlin contrail is co-funded by the EC 7th Framework Programme 1 Overview Introduction Motivation Challenges Requirements Cloud Storage Systems XtreemFS

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

The CMS analysis chain in a distributed environment

The CMS analysis chain in a distributed environment The CMS analysis chain in a distributed environment on behalf of the CMS collaboration DESY, Zeuthen,, Germany 22 nd 27 th May, 2005 1 The CMS experiment 2 The CMS Computing Model (1) The CMS collaboration

More information

Distributed Data Management

Distributed Data Management Introduction Distributed Data Management Involves the distribution of data and work among more than one machine in the network. Distributed computing is more broad than canonical client/server, in that

More information

Massive Data Storage

Massive Data Storage Massive Data Storage Storage on the "Cloud" and the Google File System paper by: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung presentation by: Joshua Michalczak COP 4810 - Topics in Computer Science

More information

Deploying Exchange Server 2007 SP1 on Windows Server 2008

Deploying Exchange Server 2007 SP1 on Windows Server 2008 Deploying Exchange Server 2007 SP1 on Windows Server 2008 Product Group - Enterprise Dell White Paper By Ananda Sankaran Andrew Bachler April 2008 Contents Introduction... 3 Deployment Considerations...

More information

Prepared by Enea S.Teresa (Italy) Version 1.0 2006-October 24

Prepared by Enea S.Teresa (Italy) Version 1.0 2006-October 24 Mersea Information System: an Authentication and Authorization System to access distributed oceanographic data. Prepared by Enea S.Teresa (Italy) Version 1.0 2006-October 24 Revision History Date Version

More information

Analisi di un servizio SRM: StoRM

Analisi di un servizio SRM: StoRM 27 November 2007 General Parallel File System (GPFS) The StoRM service Deployment configuration Authorization and ACLs Conclusions. Definition of terms Definition of terms 1/2 Distributed File System The

More information

Microsoft 6436 - Design Windows Server 2008 Active Directory

Microsoft 6436 - Design Windows Server 2008 Active Directory 1800 ULEARN (853 276) www.ddls.com.au Microsoft 6436 - Design Windows Server 2008 Active Directory Length 5 days Price $4169.00 (inc GST) Overview During this five-day course, students will learn how to

More information

On the Cost of Reliability in Large Data Grids

On the Cost of Reliability in Large Data Grids Konrad-Zuse-Zentrum für Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Germany FLORIAN SCHINTKE, ALEXANDER REINEFELD On the Cost of Reliability in Large Data Grids ZIB-Report 02-52 (December

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

Status and Evolution of ATLAS Workload Management System PanDA

Status and Evolution of ATLAS Workload Management System PanDA Status and Evolution of ATLAS Workload Management System PanDA Univ. of Texas at Arlington GRID 2012, Dubna Outline Overview PanDA design PanDA performance Recent Improvements Future Plans Why PanDA The

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System 1 K.Valli Madhavi A.P vallimb@yahoo.com Mobile: 9866034900 2 R.Tamilkodi A.P tamil_kodiin@yahoo.co.in Mobile:

More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

SOLUTION BRIEF KEY CONSIDERATIONS FOR LONG-TERM, BULK STORAGE

SOLUTION BRIEF KEY CONSIDERATIONS FOR LONG-TERM, BULK STORAGE SOLUTION BRIEF KEY CONSIDERATIONS FOR LONG-TERM, BULK STORAGE IT organizations must store exponentially increasing amounts of data for long periods while ensuring its accessibility. The expense of keeping

More information

Diagram 1: Islands of storage across a digital broadcast workflow

Diagram 1: Islands of storage across a digital broadcast workflow XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,

More information

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing DFSgc Distributed File System for Multipurpose Grid Applications and Cloud Computing Introduction to DFSgc. Motivation: Grid Computing currently needs support for managing huge quantities of storage. Lacks

More information

Reconciliation and best practices in a configuration management system. White paper

Reconciliation and best practices in a configuration management system. White paper Reconciliation and best practices in a configuration management system White paper Table of contents Introduction... 3 A reconciliation analogy: automobile manufacturing assembly... 3 Conflict resolution...

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

An IDL for Web Services

An IDL for Web Services An IDL for Web Services Interface definitions are needed to allow clients to communicate with web services Interface definitions need to be provided as part of a more general web service description Web

More information

Forests, trees, and domains

Forests, trees, and domains Active Directory is a directory service used to store information about the network resources across a. An Active Directory (AD) structure is a hierarchical framework of objects. The objects fall into

More information

THE WINDOWS AZURE PROGRAMMING MODEL

THE WINDOWS AZURE PROGRAMMING MODEL THE WINDOWS AZURE PROGRAMMING MODEL DAVID CHAPPELL OCTOBER 2010 SPONSORED BY MICROSOFT CORPORATION CONTENTS Why Create a New Programming Model?... 3 The Three Rules of the Windows Azure Programming Model...

More information

Planning Domain Controller Capacity

Planning Domain Controller Capacity C H A P T E R 4 Planning Domain Controller Capacity Planning domain controller capacity helps you determine the appropriate number of domain controllers to place in each domain that is represented in a

More information

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES Constantin Brâncuşi University of Târgu Jiu ENGINEERING FACULTY SCIENTIFIC CONFERENCE 13 th edition with international participation November 07-08, 2008 Târgu Jiu TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED

More information