Distributed Storage Management Service in UNICORE
Tomasz Rękawek 1,2, Piotr Bała 1,2, Krzysztof Benedyczak 1,2

1 Interdisciplinary Center for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
2 Nicolaus Copernicus University, Toruń, Poland

{newton, bala, golbi}@mat.umk.pl

In this paper we present the UNICORE Distributed Storage Management Service (DSMS). The system provides a native UNICORE solution for exposing storage resources distributed between different UNICORE sites as a single logical space. Such a feature is highly desired by end-users, as it allows them to forget about the actual location of their files. Additionally, the system improves the Workflow System experience, as the larger storage space no longer limits the sizes of the files used in users' workflow jobs. The paper also compares the DSMS to other existing solutions, both those available in UNICORE and those from other grid middlewares.

1 Introduction

Experience with grid deployments shows that storage resources are as important as compute resources. The UNICORE middleware makes it possible to expose distributed compute resources uniformly, but in the case of distributed storage either separate solutions must be deployed and then integrated, or the system is limited to using separate storage resources independently. There are two possible approaches which can mitigate this situation. The first is to implement a new solution from scratch, integrating it as well as possible with the existing UNICORE stack. The other option is to provide a UNICORE interface to an existing solution from a different grid middleware. The latter approach brings all the problems of differing security mechanisms and can be painful for administrators, as they have to install and maintain additional (usually very complex) software. Furthermore, it is quite possible that even despite a significant effort the result would still have limitations arising from the differing middleware paradigms. As we believe that the cost of the second approach is higher than the implementation of a dedicated UNICORE solution, we decided to follow the first approach.

The next two sections of this paper present the existing storage capabilities in UNICORE and in other grid middlewares. Section 4 precisely defines the project aims. Next, the DSMS architecture is described, along with an explanation of its idea and realization. The subsequent Section 6 discusses synchronization issues and the respective solutions implemented to overcome them. Finally, a comparison with other systems is given, together with a description of current and future work.

2 Existing storage capabilities in UNICORE

The UNICORE Storage Management Service (SMS) provides an abstract, filesystem-like view of a storage resource. It offers common POSIX-like operations (e.g. mkdir, delete, listDirectory) and a few more to initiate file transfers [1].
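These operations can be pictured as a plain Java interface. The following is a minimal, hypothetical sketch for orientation only: the actual UNICORE interfaces are generated from WSRF/WSDL definitions and are considerably richer, and all names here are illustrative.

```java
import java.util.List;

// Hypothetical, simplified view of the SMS contract (illustrative names;
// the real UNICORE interfaces are WSRF-based and far richer).
public interface StorageManagement {
    // POSIX-like operations on the abstract file system
    void createDirectory(String path);
    void delete(String path);
    List<String> listDirectory(String path);

    // Transfer initiation: each returns the endpoint reference (EPR) of a
    // File Transfer Service instance that performs the actual data movement.
    String importFile(String path, String protocol); // client -> storage
    String exportFile(String path, String protocol); // storage -> client
}
```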
The standard UNICORE distribution includes the following SMS implementations:

- FixedStorage stores files in a directory defined in the configuration,
- HomeStorage stores files in the user's home directory,
- PathedStorage resolves the destination path at runtime, so environment variables can be used to parametrize it (e.g. /opt/uni-storages/$job).

Unfortunately, none of these implementations allows clients to distribute files throughout the grid. There are, however, three solutions that extend the standard storage capabilities: UniRODS, UniHadoop and the Chemomentum Data Management System.

UniRODS [2, 3] is a Storage Management Service implementation providing access to the iRODS [4] distributed data storage system. iRODS provides a shared disk space consisting of multiple physical storages. Metadata and storage information are persisted in the iCAT service, based on the PostgreSQL, MySQL or Oracle database engines. An advantage of UniRODS is its use of the standard UNICORE Storage Management Service interface, so client software can work with it without any modifications. UniRODS is also very easy to install, is kept simple, and integrates well into the internal UNICORE/X architecture. On the other hand, because the authorization mechanisms in UNICORE and iRODS are completely different, UniRODS has to use static files to map users. Another disadvantage is the necessity to install and maintain yet another complicated service on grid nodes.

UniHadoop [5] is a solution similar to UniRODS, but it uses a different backend: Apache Hadoop [6]. Hadoop is an open-source implementation of the Google File System [7] (GFS). GFS is optimized for using a large number of cheap disk servers, therefore replication and monitoring are priorities for the system. Files stored in GFS are split into fixed-size chunks which are distributed over chunk servers. Each chunk is 64 MB large and is replicated on three servers (so, for example, a 1 GB file becomes sixteen chunks occupying 3 GB of raw disk space). UniHadoop, like UniRODS, exposes the Storage Management Service interface and is additionally available in the standard UNICORE distribution. A further advantage is that Hadoop uses the system user name for authorization, so it is possible to apply the standard Storage Management security model, in which a grid user (identified by an X.509 distinguished name) is mapped to a system user (identified by a login). This mapping is done by an attribute source service (e.g. XUUDB or UVOS) existing in a grid. Unfortunately, with UniHadoop there is again the requirement to install and maintain an additional complex system, Apache Hadoop. Moreover, Apache Hadoop's internal security is not yet mature and lacks features which are crucial for the grid. There is no support yet for transport-level encryption, and the proposed authentication solutions are going to be based on Kerberos, which is not the best option for the X.509-certificate-based UNICORE.

The Chemomentum Data Management System [8] was designed as a native UNICORE distributed file storage with support for advanced metadata and ontologies, and with support for retrieving data from external storages like scientific databases available on-line. The system comprises a single entry-point service, called the Data Management System Access Service, and several back-end services providing the actual functionality. The physical storage of files is performed using the standard UNICORE SMS service. The Chemomentum DMS is a powerful system, but it poses significant problems for its potential users:

- The single service being the entry point to the whole system is a bottleneck.
- The whole system is complex and therefore hard to maintain. It seems that support for the system has been dropped and it is not being updated for the newest UNICORE releases. The effort needed to perform such an update can be assessed as quite high, as there are many services, and some of them require domain-specific knowledge (like the external database access tools).
- The system requires extensions on the client side. This is especially hard for UNICORE users, who have to install the extension, and for developers, who have to maintain plugins for each UNICORE client.

3 Data management solutions in other grid middlewares

Outside the UNICORE world, data management solutions are usually called Storage Resource Managers (SRM), which is at the same time the name of the popular SRM protocol [9] used to access a storage. In this section the four most popular managers are briefly introduced.

StoRM [10] is a simple SRM implementation. It exposes a selected POSIX file system. If distribution is needed, then a distributed file system (like IBM GPFS) has to be used. Authorization is done via POSIX Access Control Lists, and an external mapping (from grid users to system logins) is necessary. StoRM uses an internal database for storing metadata and file locations. StoRM also has a plug-in mechanism which can be used to utilize additional capabilities of the file system [10].

The Disk Pool Manager (DPM) is a lightweight solution for disk storage management [11]. Despite being advertised as lightweight, DPM is a more complex system than StoRM. A usual installation is distributed over two types of servers: a head node and disk nodes. The head node manages metadata and is the entry point for clients. Disk nodes store the physical data. The following services run on the head node:

- the DPM nameserver daemon, which manages metadata and the directory structure,
- the SRM daemon, a client entry point with the SRM interface implementation,
- DPM, the main daemon, which executes requests.

These services do not have to be on the same server, but they cannot be replicated.

dCache [12] is a much more sophisticated solution than StoRM and DPM. In addition to disk storage, it supports tertiary storage (usually magnetic tapes). That kind of storage is very slow but cheap. dCache has mechanisms to copy (cache) recently used files from tapes to disks. It can also recognize frequently opened files (so-called hot spots) and replicate them throughout the disk servers in order to balance the load. There are three types of servers in dCache:

- the head node, a unique node that runs the database, the name server and other non-replicable services,
- door nodes, entry points for clients (can be replicated),
- pool nodes, which store the data.
dCache's name server is called Chimera. It uses a PostgreSQL database for storing file metadata and the directory structure.

The CERN Advanced STORage manager (CASTOR) [13] is similar to dCache. It also supports tertiary storage and uses disks as cache memory. CASTOR consists of many stateless daemons querying a central database, and such an architecture increases scalability. The most important components are:

- the request handler, a lightweight service adding new requests to the database,
- the request scheduler, which schedules requests for execution (it is possible to use an external scheduler such as LSF or the Maui Cluster Scheduler),
- the stager, which executes clients' requests,
- the resource monitoring daemon, which gathers information for the scheduler,
- the name server, which stores the directory structure,
- the Distributed Logging Facility, which is responsible for logging.

The SRM resources are able to provide distributed storage; however, in practice the distribution is limited to the hosts of one site. The Logical File Catalogue (LFC) service [14], available in the gLite middleware, provides the capability to maintain a global file space. The LFC uses an Oracle or MySQL database and exposes a proprietary interface to control its contents. The LFC also implements an internal authorization system. It is integrated with the other gLite tools and serves as one of the principal components of the Worldwide LHC Computing Grid.

4 The aim of the project

The purpose of the system presented here is to combine multiple UNICORE storage resources into a single logical resource. This logical space should be usable both by grid client software directly and by grid services (for instance by the UNICORE Workflow Service). Users should not have to know where their files are physically stored. Optionally, advanced users should have the opportunity to be aware of the distributed nature of the storage and of the physical file locations. As can be learnt from the Chemomentum DMS lesson, the system should not be too complex, otherwise its future maintenance would be problematic.

The requirement which can be directly derived from the above paragraph is that the standard, existing UNICORE client software should be able to access and operate on the newly created solution. Therefore the interface to the distributed file space must be the UNICORE Storage Management Service. The service should not limit the standard capabilities of the Storage Management Service, such as the supported file transfer protocols. To achieve high performance the system should be ready to provide an arbitrary number of entry points. Additionally, file transfers must be performed directly between the source and destination storages, without employing any proxy. Last but not least, the system must use the standard UNICORE authorization and authentication mechanisms to enable a smooth and natural integration with the whole middleware.
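Because the distributed space is exposed through the standard SMS interface, a client interaction does not differ from one with an ordinary storage. A minimal sketch, reusing the hypothetical StorageManagement interface from Section 2; the paths are made up and the protocol identifier ("BFT", UNICORE's baseline file transfer) is used only for illustration:

```java
// Hypothetical client-side view: a DSMS endpoint is driven exactly like any
// other SMS endpoint, so standard clients need no modification.
public class DsmsClientExample {
    static void uploadInput(StorageManagement storage) {
        storage.createDirectory("/experiments/run-42");
        // The DSMS returns a File Transfer Service EPR; the client then
        // sends the data directly to the chosen physical storage.
        String ftsEpr = storage.importFile("/experiments/run-42/input.dat", "BFT");
        System.out.println("upload endpoint: " + ftsEpr);
    }
}
```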
[Figure 1: the ImportFile operation invoked on the DSMS; the DSMS calls add() on the dCatalogue (2), which returns an LDA, i.e. the chosen SMS and file path (3)]

5 DSMS approach

The DSMS consists of the following components:

- the data is stored on existing, standard Storage Management Services,
- the file locations and the directory structure are stored in a service called dCatalogue,
- the client entry point is a special Storage Management Service implementation called DSMS.

The presented system can use an unlimited number of storages, which are available from a UNICORE Registry. The number of entry points is also unlimited. The dCatalogue is a central service. It stores only a minimal amount of information about the files, in order to minimize synchronization and performance issues. The DSMS, besides exposing the Storage Management interface, also connects to the dCatalogue and to the physical storages. The overall idea is to mask the distributed nature of the system behind the standard Storage Management implementation. We will show how we have achieved this using two representative example operations.

Figure 1 presents the ImportFile operation invoked on the DSMS. The ImportFile operation allows a user to add a new file to the storage. At the beginning the user connects to the DSMS using a standard UNICORE client. The client invokes the ImportFile function (step 1) and waits for an endpoint reference of a File Transfer Service (FTS). The DSMS connects to the dCatalogue (2), asking for a physical SMS where the file should be saved. The dCatalogue chooses one SMS out of those existing in the grid (using a special algorithm) and returns its EPR and a physical file name (3). (EPR stands for an Endpoint Reference Address, which precisely defines the network address of a UNICORE service and a concrete resource accessible through this service.) This pair, an SMS EPR and a file path, is called a Location Dependent Address (LDA). The DSMS connects to the received SMS (4), calling its ImportFile operation. The SMS returns an FTS EPR (5), which is passed back to the client (6). Further communication is performed as usual: the client sends the file using the received EPR (7). Operations 2 and 4 are done on behalf of the client using the trust delegation mechanism, and hence the client is the owner of the file on the SMS. A user connecting directly to the SMS has access only to his or her own files. The dCatalogue also authorizes all operations using the client's X.509 distinguished name.
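The delegation just described can be summarized in a short sketch. All types and method names here are assumptions made for illustration, not the actual DSMS API; the security plumbing is reduced to passing a delegation token.

```java
// Illustrative sketch of the ImportFile flow from Figure 1 (all names are
// hypothetical; trust-delegation details are omitted).
interface Epr {}                                   // endpoint reference address
interface Delegation {}                            // trust-delegation token
record Lda(Epr smsEpr, String physicalPath) {}     // Location Dependent Address

interface DCatalogueClient {
    // Chooses a physical SMS for a new file and returns its LDA (steps 2-3).
    Lda add(String logicalPath, String clientDn);
}

interface SmsClient {
    Epr importFile(String physicalPath);           // steps 4-5 on the physical SMS
}

class DsmsImport {
    private final DCatalogueClient catalogue;
    private final java.util.function.BiFunction<Epr, Delegation, SmsClient> connect;

    DsmsImport(DCatalogueClient catalogue,
               java.util.function.BiFunction<Epr, Delegation, SmsClient> connect) {
        this.catalogue = catalogue;
        this.connect = connect;
    }

    /** Step 1: the client invokes ImportFile on the DSMS. */
    Epr importFile(String logicalPath, String clientDn, Delegation delegation) {
        // Steps 2-3: the dCatalogue picks a physical SMS and returns an LDA.
        Lda lda = catalogue.add(logicalPath, clientDn);
        // Steps 4-5: call ImportFile on the chosen SMS on behalf of the client.
        SmsClient sms = connect.apply(lda.smsEpr(), delegation);
        Epr ftsEpr = sms.importFile(lda.physicalPath());
        // Step 6: return the FTS EPR; step 7 (the transfer) bypasses the DSMS.
        return ftsEpr;
    }
}
```

The connect function stands in for whatever client factory a real implementation would use; the essential point is that the DSMS brokers endpoint references but never touches the file data itself.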
[Figure 2: the ListDirectory operation invoked on the DSMS; the lists obtained from SMS1 and SMS2 are merged (step 8)]

The ListDirectory operation, presented in Figure 2, works slightly differently. Although the directory structure is stored in the dCatalogue, the file metadata (such as size, privileges, etc.) has to be retrieved from the appropriate storages. So the DSMS (in step 2) invokes the lookupDir operation and gets the list of storages which contain files from the given directory (3). After that the DSMS connects to each storage and invokes the ListDirectory operation (4-7). When all the lists are collected, they are merged (8) and sent back to the user (9); a sketch of this flow is given at the end of this section.

The Storage Factory is a service introduced with UNICORE 6.3. It can create separate SMS instances for each workflow job (other use cases are also possible). An advantage of the Storage Factory is that it can remove all files created by a workflow job simply by removing the whole SMS instance. Storage Factories can also be used by the presented system, dynamically creating instances of the DSMS service: a factory can be configured to create each new DSMS within a separate namespace. A DSMS which was created using the Storage Factory can subsequently create new physical SMSes and store files in them. Of course, usage of the Storage Factory is not limited to the UNICORE Workflow System: it can serve any other application which can take advantage of the dynamic creation of SMS instances.
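The following sketch illustrates the ListDirectory flow from Figure 2, continuing the hypothetical types of the previous sketch; lookupDir, listDirectory and FileEntry are likewise assumptions, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of ListDirectory (Figure 2): directory structure comes
// from the dCatalogue, metadata from each physical SMS, and the partial
// listings are merged before being returned to the client.
record FileEntry(String name, long size, String permissions) {}

interface DCatalogueLookup {
    // Steps 2-3: which storages hold files of the given logical directory?
    List<Epr> lookupDir(String logicalDir, String clientDn);
}

interface SmsListing {
    List<FileEntry> listDirectory(String dir);     // per-storage partial listing
}

class DsmsListDirectory {
    private final DCatalogueLookup catalogue;
    private final java.util.function.BiFunction<Epr, Delegation, SmsListing> connect;

    DsmsListDirectory(DCatalogueLookup catalogue,
                      java.util.function.BiFunction<Epr, Delegation, SmsListing> connect) {
        this.catalogue = catalogue;
        this.connect = connect;
    }

    List<FileEntry> listDirectory(String logicalDir, String clientDn, Delegation delegation) {
        List<FileEntry> merged = new ArrayList<>();
        // Steps 4-7: query every storage that the dCatalogue reported.
        for (Epr smsEpr : catalogue.lookupDir(logicalDir, clientDn)) {
            merged.addAll(connect.apply(smsEpr, delegation).listDirectory(logicalDir));
        }
        // Steps 8-9: the merged listing goes back to the client.
        return merged;
    }
}
```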
6 Handling errors and synchronization issues

Synchronization between the dCatalogue database and the real file state on the SMS resources is a very important issue. There are two main possible causes of losing this synchronization:

A) a file which does not exist in the dCatalogue appears on an SMS resource,
B) a file which belongs to the DSMS is removed from an SMS, but not from the dCatalogue.

The DSMS operations which could suffer from the above problems can additionally be divided into three main groups:

1. directory listing operations (the ListDirectory and Find methods),
2. operations which add a new file (the ImportFile, ReceiveFile, Copy and Rename methods),
3. operations that get a file or file information (the ExportFile, SendFile, ListProperties, ChangePermissions, Copy and Rename methods).

During the ListDirectory and Find methods the DSMS gets the file lists from each used SMS, merges them and sends the result back to the client, so no synchronization problem is possible here: even if the data in the dCatalogue is invalid, the client will not notice it.

If we consider the operations which add a new file, synchronization problem A) may appear if a file with the given name exists on an SMS (but not in the dCatalogue) and the user tries to send that file again in order to replace it. To avoid problem A), the DSMS checks each used SMS, looking for a file with the given name. If the file exists, it is added to the dCatalogue and replaced by the new file. Synchronization problem B) cannot appear with this kind of operation.

The last group of operations are those which read a file or file information. If the user wants to fetch a file which does not exist in the dCatalogue, then before the FileNotFound exception is thrown, the file is searched for on each SMS. If it is found, it is added to the dCatalogue and returned to the user (and problem A) is thereby resolved). If the file exists in the dCatalogue, then the DSMS also checks whether the file exists on the proper SMS. If not, the file is removed from the dCatalogue and the user gets the mentioned exception. A sketch of this read-path repair logic is given below.

It must be noted that under standard circumstances none of the synchronization issues should appear. To cause a type A) problem, a user must either use the physical SMS directly (which administrators can make difficult by hiding the SMS from the Registry) or copy the files with other tools like FTP or SCP; additionally, the files must be placed in a special, hidden folder created by the DSMS. The situation with type B) issues is analogous. Because of this fact, and because of the performance penalty introduced by the auto-synchronization methods described above, they can be disabled in the DSMS configuration.
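A minimal sketch of the read-path auto-synchronization, reusing the hypothetical Epr and Lda types from the earlier sketches; all other names are assumptions as well.

```java
import java.util.List;

// Illustrative read-path repair: problem A is a file present on an SMS but
// missing from the dCatalogue; problem B is a catalogued file that has
// vanished from its SMS.
class FileNotFoundFault extends Exception {
    FileNotFoundFault(String path) { super(path); }
}

interface CatalogueSync {
    Lda lookup(String logicalPath);                       // null if unknown
    Lda register(String logicalPath, Epr sms, String physicalPath);
    void remove(String logicalPath);
}

interface SmsProbe {
    String find(String logicalPath);                      // physical path or null
    boolean exists(String physicalPath);
}

class SyncAwareLocator {
    private final CatalogueSync catalogue;
    private final List<Epr> allStorages;                  // every SMS the DSMS uses
    private final java.util.function.Function<Epr, SmsProbe> connect;

    SyncAwareLocator(CatalogueSync catalogue, List<Epr> allStorages,
                     java.util.function.Function<Epr, SmsProbe> connect) {
        this.catalogue = catalogue;
        this.allStorages = allStorages;
        this.connect = connect;
    }

    Lda locate(String logicalPath) throws FileNotFoundFault {
        Lda lda = catalogue.lookup(logicalPath);
        if (lda == null) {
            // Possible problem A: search every SMS before giving up; a hit
            // is registered in the dCatalogue and then served normally.
            for (Epr smsEpr : allStorages) {
                String physical = connect.apply(smsEpr).find(logicalPath);
                if (physical != null) {
                    return catalogue.register(logicalPath, smsEpr, physical);
                }
            }
            throw new FileNotFoundFault(logicalPath);
        }
        // Possible problem B: verify the entry against the SMS; if the
        // physical file is gone, drop the stale entry and report the error.
        if (!connect.apply(lda.smsEpr()).exists(lda.physicalPath())) {
            catalogue.remove(logicalPath);
            throw new FileNotFoundFault(logicalPath);
        }
        return lda;
    }
}
```

Both checks cost extra remote calls, which is exactly why the configuration switch mentioned above allows this logic to be disabled.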
7 Summary

The DSMS can be seen as a direct successor of the Chemomentum DMS: it is a native UNICORE solution. In comparison with its predecessor, the DSMS is more lightweight, as it does not provide some of the more advanced features (metadata, support for ontologies and external databases). Additionally, the DSMS removes the original system's bottleneck and the necessity of using a client extension. Therefore we hope that the DSMS will be much easier to maintain.

Compared to UniHadoop and UniRODS, the DSMS does not require the installation of an additional storage system and does not introduce security mismatches between UNICORE and another system. The DSMS is also quite different from typical SRM solutions: it is designed to support storage which is distributed between different sites, while solutions like DPM are designed to support disks exposed by different machines located in a single administrative domain. The DSMS can exploit arbitrary UNICORE SMS implementations (it may even be used to merge a Hadoop SMS and an iRODS SMS into a single logical space). Therefore the DSMS can rather be seen as an LFC counterpart. However, the LFC is designed differently, as it stores more data about the files centrally, which brings more complicated synchronization issues. The LFC also requires the usage of special operations to manage files in its context.

At the current project stage we can state that all the basic requirements for the distributed storage have been fulfilled. The system was recently integrated into the standard UNICORE distribution as an experimental type of storage. It can be seen as a perfect solution for a grid system comprising several sites, each of them providing its own storage resource.

Our current work is concentrated on measuring the performance of the system. We are paying special attention to the influence of the automatic dCatalogue synchronization feature on the overall performance. Of course, system scalability is another crucial aspect that must be carefully determined. In the future, more advanced strategies for choosing the optimal physical storage resource are planned. Currently a round-robin strategy is the default, and another available strategy chooses the storage resource with the most free space; both are sketched below.
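The two existing strategies can be pictured as interchangeable policies. A hypothetical sketch, reusing the Epr type from the earlier sketches; the StorageSelection and StorageInfo names are assumptions:

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of pluggable SMS-selection policies.
record StorageInfo(Epr epr, long freeBytes) {}

interface StorageSelection {
    Epr choose(List<StorageInfo> candidates);
}

// Default strategy: cycle through the available storages in turn.
class RoundRobinSelection implements StorageSelection {
    private final AtomicInteger next = new AtomicInteger();

    public Epr choose(List<StorageInfo> candidates) {
        int i = Math.floorMod(next.getAndIncrement(), candidates.size());
        return candidates.get(i).epr();
    }
}

// Alternative strategy: always pick the storage with the most free space.
class MostFreeSpaceSelection implements StorageSelection {
    public Epr choose(List<StorageInfo> candidates) {
        return candidates.stream()
                         .max(Comparator.comparingLong(StorageInfo::freeBytes))
                         .orElseThrow()
                         .epr();
    }
}
```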
References

1. Schuller, B.: General Introduction to UNICORE. URL: unicore6/files/general_introduction_to_unicore.pdf
2. Derc, K.: Integracja systemu iRODS ze środowiskiem gridowym UNICORE 6. Master's thesis, NCU Toruń.
3. Janiszewski, K.: Rozproszone bazy danych w systemach gridowych. Master's thesis, NCU Toruń.
4. The iRODS project.
5. Bari, W., Memon, A. S., Schuller, B.: Enhancing UNICORE Storage Management using Hadoop Distributed File System. Euro-Par 2009 Parallel Processing Workshops, LNCS, vol. 6043, Springer, Heidelberg.
6. The Apache Hadoop project.
7. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google File System. Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP 2003), Bolton Landing, NY, USA, October 19-22, ACM, 2003.
8. Ostropytskyy, V.: Data Management in Chemomentum. European Grid Technology Days, Brussels, 2007. URL: ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/ssai/cm-d2-par4-data-management-chemomentum_en.pdf
9. Sim, A., Shoshani, A., et al. (eds.): The Storage Resource Manager Interface Specification, Version 2.2. OGF.
10. Corso, E., Cozzini, S., Forti, A., Ghiselli, A., Magnoni, L., Messina, A., Nobile, A., Terpin, A., Vagnoni, V., Zappi, R.: StoRM: A SRM solution on disk based storage system. Proceedings of the Cracow Grid Workshop 2006, Kraków.
11. Grid Data Management: DPM. URL: lcgdm/wiki/dpm
12. The dCache Book 1.9.5. URL: Book-1.9.5/Book-a4.pdf
13. The CASTOR project.
14. The LFC webpage.