Journal of Information Technology and Applications Vol. 1 No. 4 March, 2007, pp. 231-238

A High-Performance Virtual Storage System for Taiwan UniGrid

Chien-Min Wang, Chun-Chen Hsu and Jan-Jan Wu
Institute of Information Science, Academia Sinica, Taipei, Taiwan
{cmwang, seeme, tk}@iis.sinica.edu.tw

Hsi-Min Chen
Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
wuj@iis.sinica.edu.tw

Abstract
In Taiwan, a community of educational and research organizations interested in Grid computing technologies founded a Grid computing platform called Taiwan UniGrid. Taiwan UniGrid consists of three primary portions: Computational Grid, Data Grid, and Web Portal. In this paper, we present the development of a virtual data storage system for Taiwan UniGrid. In addition to developing basic data storage functions, we identify three main requirements of the current development: high-performance data transfer, data sharing and single sign-on. To meet these requirements, we provide three corresponding features in our data storage system: Self-Adaptation for high-performance data transfer, forming user groups and specifying admission control for data sharing, and adopting GSI authentication to enable single sign-on. In addition, we develop a Java-based graphical user interface of the storage system that allows Grid users to manage data transparently, as if using local file systems.

Keywords: Data Grid, data storage system, data transfer, web service, single sign-on.

1. Introduction
With the rapid growth of computing power and storage capacity, many researchers and scientists have concentrated in recent years on the development of various Grid systems to efficiently utilize distributed computing and storage resources. In Taiwan, a community of educational and research organizations interested in Grid computing technologies founded a Grid computing platform called Taiwan UniGrid [1].
These organizations contribute their computer-cluster resources for sharing and collaboration. The objective of Taiwan UniGrid is to provide educational and research organizations with a powerful computing platform where they can study Grid-related issues, practice parallel programming on Grid environments and execute computing/data-intensive applications. Like other Grid systems, Taiwan UniGrid consists of three primary portions: Computational Grid, Data Grid and Web Portal. Computational Grid is responsible for managing scattered and heterogeneous computing resources and scheduling the jobs submitted by users. Data Grid is a virtual storage infrastructure that integrates distributed, independently managed data resources and allows users to save and retrieve their data with ease. Web Portal, developed by National Tsing Hua University, is a uniform user interface by which Grid users can design workflows, submit jobs, manage data, monitor job and resource status, etc. In this paper, we present the development of the data management system for Taiwan UniGrid. With the increasing distribution of storage resources and the growth of data sizes, the need for efficient Grid data management continues to grow. In recent years, many research and scientific organizations have engaged in building data management and storage tools for Grids, such as SDSC SRB (Storage Resource Broker) [2], SciDAC Data Grid Middleware [3], and the GriPhyN Virtual Data System [4]. SRB is a general Data Grid middleware that integrates distributed and heterogeneous storage resources and provides a virtualized access interface. It has become a production data management tool and has been adopted by several Grid projects. Thus, among these tools, we decided to build our virtual storage system for Taiwan UniGrid on SRB, while developing additional features that are not well supported by SRB. Before implementing the virtual storage system, we elicited requirements from user and administrator needs.
In addition to the basic Data Grid functions provided by SRB, we identify three main requirements of the current development, listed as follows.
High-performance data transfer: Since the size of data generated by scientific instruments and Grid applications has grown into the range of terabytes, large data transfers over the Internet usually lead to long latencies and become a bottleneck for job execution. Thus, the need for
high-performance data transfer is an important issue in Taiwan UniGrid.
Data sharing: Two important concepts of Grids are sharing and collaboration. Grid users, such as scientists and researchers, are accustomed to retrieving data collected by remote scientific instruments, analyzing the retrieved data with various analysis tools, and sharing the analyzed results for further processing. Therefore, how to help Grid users contribute or obtain shared data with ease is a crucial requirement in the development of a data management system.
Single sign-on: In essence, physical resources within a Grid system are distributed across different organizations and managed independently. Each organization has its own security policy. Without a single sign-on mechanism, Grid users have to keep a list of accounts for each machine by themselves. This becomes an obstacle to using Grid systems. Hence, we have to take the problem of single sign-on into account when we integrate our system with the Computational Grid and UniGrid Portal.
Consequently, in our system, we provide three features that address the corresponding requirements. For high-performance data transfer, we propose a multi-source data transfer algorithm, called Self-Adaptation [5], which speeds up the data transfer rate in data replication, downloading, moving, and copying. For data sharing, our system allows Grid users to share their data by forming user groups and specifying admission control on each data object. For single sign-on, we choose GSI (Grid Security Infrastructure) [6] as our user certification mechanism, by which Grid users only have to log in once and utilize Grid resources through certificates, so that they do not need to maintain separate accounts for each machine.
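To make the multi-source idea behind Self-Adaptation concrete, the sketch below assigns each replica site a share of the file proportional to its most recently measured bandwidth. This is a simplification: the class name, method signature and the purely proportional rule are our illustration only, not the actual Self-Adaptation algorithm, which additionally accounts for per-transfer overhead [5].

```java
import java.util.Arrays;

/**
 * Illustrative sketch: split a file of totalBytes into one segment per
 * replica site, proportional to each site's measured bandwidth.
 */
public class SegmentAssigner {
    public static long[] assignSegments(long totalBytes, double[] bandwidths) {
        double sum = 0;
        for (double b : bandwidths) sum += b;
        long[] segments = new long[bandwidths.length];
        long assigned = 0;
        // Faster sites get proportionally larger segments to fetch in parallel.
        for (int i = 0; i < bandwidths.length - 1; i++) {
            segments[i] = Math.round(totalBytes * (bandwidths[i] / sum));
            assigned += segments[i];
        }
        // The last site takes the remainder so the segments cover the file exactly.
        segments[bandwidths.length - 1] = totalBytes - assigned;
        return segments;
    }

    public static void main(String[] args) {
        // Three replica sites with measured bandwidths of 10, 30 and 60 MB/s.
        long[] seg = assignSegments(100_000_000L, new double[] {10, 30, 60});
        System.out.println(Arrays.toString(seg)); // larger shares go to faster sites
    }
}
```

In the real algorithm, segment sizes are re-estimated from the overhead and bandwidth observed during the previous round of transfers, so the assignment adapts as network conditions change.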
Besides these features, we also develop a Java-based graphical user interface of the storage system that allows Grid users to manipulate data transparently, as if using local file systems. The remainder of the paper is organized as follows. In Section 2, we explain the system framework and deployment. Section 3 presents the main features, including multi-source data transfer, data sharing, single sign-on, and the data management client. An operational scenario of Taiwan UniGrid is demonstrated in Section 4. Finally, we present some concluding remarks in the last section.

2. System Framework and Deployment
Figure 1 shows the framework of our virtual storage system. On the server side, the bottom left of the framework is a set of physical storage resources, including hard disks, tapes and databases, contributed by the members of Taiwan UniGrid. We adopt SRB as a data management middleware to integrate these scattered storage resources. It provides a variety of data and storage management functions. Although SRB furnishes an efficient data transfer approach using multiple TCP streams, we propose an alternative, called Self-Adaptation, that achieves a higher data transfer rate than the original approach. We explain the details of Self-Adaptation in Section 3. We add this alternative (the Self-Adaptation patch) to the original functions of SRB. A set of extended SRB APIs is built on top of SRB and the Self-Adaptation patch. The extended SRB APIs consist of the primary APIs provided by SRB and the APIs for high-performance data transfer, such as MSDTReplicate() and MSDTCopy().

Figure 1. The framework of the virtual storage system for Taiwan UniGrid.

On the right of the server side of the framework is a number of Web services used for data management. Web service technologies are playing an increasingly important role in the new generation of Grids.
Such technologies encapsulate heterogeneous software components, legacy systems and resources as services and describe their interfaces in a standard description language, i.e., WSDL [7]. Service providers can advertise their services in a registry, i.e., a UDDI [8] server, for clients to browse. If clients want to use the services advertised in a registry, the SOAP [9] technology helps them access the services through standard transport protocols, such as HTTP and SMTP. Therefore, we adopt Web service technologies in our system to integrate other software developed by third parties. There are two services implemented in the current system: the AutoReplicator service and the Account Management service. The AutoReplicator service is developed by Chung Hua University; Grid users can utilize it to set various replication policies. We develop the Account Management service to wrap the user authentication functions of UniGrid Portal for single sign-on. On the client side, the bottom is the data management library for UniGrid, which connects to the corresponding server-side extended SRB APIs
and data management services. We implemented two versions of the library: one is Java-based and the other is C-based. The data management library provides a uniform interface for data and storage management by which programmers can build various Grid applications to access the underlying storage resources.

Figure 2. The deployment of the virtual storage system for Taiwan UniGrid.

Figure 2 presents the deployment of our virtual storage system. Since there is a huge amount of storage resources distributed in Taiwan UniGrid, using a single information server to maintain the metadata regarding users, data and storage may cause server overloading and a single point of failure. To avoid these problems, we divided all storage resources in Taiwan UniGrid into five zones: Taipei_UniGrid, Hsinchu_UniGrid, Taichung_UniGrid, Tainan_UniGrid and Hualien_UniGrid. Each zone has an MCAT (SRB Metadata Catalog) server installed for maintaining the metadata of the users, data and storage resources. To enable flexible sharing, the administrators of an MCAT server can specify their own sharing policies; for instance, some resources can be shared with users registered in other zones, while others are kept private. In addition, the MCAT servers periodically synchronize their metadata with each other to keep the metadata consistent among zones. Through synchronization, Grid users registered in one zone can access storage resources located in other zones and retrieve shared data in a timely manner. The members of Taiwan UniGrid can contribute their storage resources by setting up SRB servers. Each SRB server consists of one or more storage resources and is registered to an MCAT server. Grid users can manipulate data objects in a storage resource of an SRB server, for example uploading data objects, creating replicas and modifying metadata of the data objects. The SRB server then automatically asks the MCAT server with which it is registered to update the metadata of the operated data and to synchronize with the other MCAT servers. Thus, a Grid user can log in to the SRB server closest to him/her and utilize storage resources in any zone of Taiwan UniGrid.

3. Main Features
In this section, we present the main features, including multi-source data transfer, data sharing and single sign-on, which implement the requirements listed in Section 1. In addition, we also develop a friendly graphical user interface of the virtual storage system that allows Grid users to manage their data as if using local file systems.

Figure 3. (a) The replica selection approach. (b) The multi-source data transfer approach.

3.1. Multi-source Data Transfer
To achieve high-performance data transfer, data replication has been a widely used technique that allows a Grid user to select the best replica site closest to a specific destination and transfer the selected replica to it. Instead of transferring data from the original source site, selecting the best replica can reduce the data transfer time over the Internet. A number of approaches have been proposed for selecting the best replica based on various criteria [10][11][12]. However, as shown in Figure 3(a), since such approaches only allow users to specify one replica for transfer in each selection, they have two major shortcomings. First, when several replicas have almost the same network performance, choosing a slightly better replica and discarding all the others does not fully utilize network resources. Second, selecting only one replica may degrade transfer reliability because, if the connection to the selected replica fails, the system has to execute the
selection algorithm again and reconnect to another replica. Some multi-source data transfer mechanisms have been presented recently to solve these problems [13][14], whereby a transferred data object can be assembled in parallel from multiple distributed replica sources, as shown in Figure 3(b). To improve the data transfer rate, we propose an efficient data transfer algorithm called Self-Adaptation. It not only enables data transfer from multiple replica sites, as other multi-source data transfer algorithms do, but is also more adaptive to the variability of network bandwidths. Self-Adaptation assigns proper segments of the transferred data to each replica site based on the overhead and bandwidth measured from the previous data transfer, so that it can achieve higher aggregate bandwidth. More information about Self-Adaptation and performance comparisons with other approaches can be found in [5]. Multi-source data transfer is the major contribution of this work to the data storage system. In the client-side library of the current system, we implement three data transfer functions based on Self-Adaptation to enable high-performance data transfer.
MSDTDownload(): Grid users or programs can download data objects to their local file systems; the downloaded objects are reassembled in parallel from the source and replica sites.
MSDTReplicate(): Grid users or programs, for example the AutoReplicator service, can make new data replicas at the specified destination resources; the new replicas are reassembled in parallel from the source and replica sites.
MSDTCopy(): Grid users or programs can make copies of data objects in the specified directories of the virtual storage system; the copies are reassembled in parallel from the source and replica sites of the original data objects.

3.2. Data Sharing
According to our experience, Grid users usually need a platform where they can work collaboratively. Although most Data Grid middleware provides sharing of storage resources, data sharing for collaborative work is not well supported. Therefore, in our system, we build a collaborative platform by combining user groups with access permissions specified on each data object. A group of users who need to work collaboratively can ask the administrators to form a user group. For instance, a user group can be built around a research topic in which a group of users is interested. Each Grid user can take part in many user groups simultaneously, as long as he/she is granted membership by the administrators. Once an administrator creates a user group, the system creates a group workspace, i.e., a group home directory, for sharing and collaboration. Each group workspace can be assigned one or more owners to manage admission to the workspace. In general, Grid users have their own personal workspace, i.e., a user home directory, where they can manage their private data objects. Data objects can be files, directories, replicas or links. Grid users can share their private data objects with others by specifying access permissions on the data objects. Figure 4 shows a screenshot of admission control for data sharing, by which Grid users can grant read or write permission on each data object to other users or groups. It also supports changing the owner of a specific data object. Alternatively, Grid users can share their data by uploading or copying private data objects directly into the group workspaces.

Figure 4. Admission control for data sharing.

3.3. Single Sign-on
Because software components and resources within a Grid system are distributed across different organizations and managed independently, allowing a Grid user to utilize all these software components and resources with one account becomes a crucial issue.
GSI (Grid Security Infrastructure) [6] is a promising solution to this issue in Grids. GSI uses X.509 certificates to securely authenticate users across the network. SRB supports two main methods of authentication: GSI and an SRB secure password system known as Encrypt1. GSI in SRB makes use of the same certificates and Public Key Infrastructure (PKI) [15] as the Globus Toolkit [16] and its tools, such as GridFTP [17]. Since we adopt the Globus Toolkit as the middleware for managing computing resources, we choose GSI as the main user authentication mechanism in our system to enable single sign-on across the Computational Grid and the Data Grid. To use Taiwan UniGrid, Grid users first have to register in UniGrid Portal. Users receive certificates issued by the UniGrid CA after being approved by system administrators. Meanwhile, the users' profiles are also registered to the Computational Grid and the Data Grid, i.e., Globus and SRB. Once users want
to use Taiwan UniGrid, they can log in to UniGrid Portal with their certificates, and the system automatically generates corresponding secure proxy certificates, valid for a few hours, for submitting jobs and managing data in distributed resources.

Figure 5. The cross-zone problem.

However, the current implementation of SRB does not fully support resource utilization across different zones with GSI authentication. As shown in Figure 5, for example, Grid_User1 and SRB_Server1 are registered in Zone A, while SRB_Server2 is registered in Zone B. If we adopt the Java-based client-side APIs provided by SRB, named Jargon, Grid_User1 connecting to SRB_Server2 with GSI authentication will fail to access the resources (Resource3 and Resource4) in Zone B. We call this the cross-zone problem. At present, SRB only supports access to cross-zone resources through its secure password authentication, Encrypt1. Since we deployed our system in five zones and developed Self-Adaptation to reassemble data objects in parallel from multiple replica sources, which may be located in different zones, our system is directly affected by the cross-zone problem. We address this problem from two perspectives, users and programs, in the following paragraphs. From the perspective of users, we intend to let Grid users log in once with their certificates and launch the data management client to manipulate their data without being concerned with the cross-zone problem. Thus, we propose an authentication process, as shown in Figure 6, to enable single sign-on for UniGrid Portal and the data management client.

Figure 6. The proposed authentication process enabling single sign-on for UniGrid Portal and the data management client.
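The session handling at the core of this process can be sketched as follows. All names here are illustrative: the real Account Management component is a Web service, the password is retrieved over SSL rather than by a local method call, and the deployed system removes the session after the SRB connection succeeds rather than at redemption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Illustrative sketch of the one-time session-key exchange behind Figure 6.
 */
public class AccountManagementSketch {
    // session key -> SRB password; each session is valid for one redemption only
    private final Map<String, String> sessions = new HashMap<>();

    /** Called by the portal after a successful certificate login. */
    public String createSession(String srbPassword) {
        String key = UUID.randomUUID().toString();
        sessions.put(key, srbPassword);
        return key;
    }

    /** Called by the data management client with the key passed from the portal. */
    public String redeemPassword(String sessionKey) {
        // Removing the session prevents a cached key from being replayed later.
        return sessions.remove(sessionKey);
    }

    public static void main(String[] args) {
        AccountManagementSketch service = new AccountManagementSketch();
        String key = service.createSession("srb-password");
        System.out.println(service.redeemPassword(key)); // prints "srb-password"
        System.out.println(service.redeemPassword(key)); // prints "null": session gone
    }
}
```

The client then uses the redeemed password together with the SRB profile to open an Encrypt1 connection; a second redemption with the same key yields nothing, which is the property that protects against replayed session keys.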
After a Grid user logs in to the portal successfully, the portal asks the Account Management service to create a session and return the necessary information, including a generated session key and a profile for connecting to SRB. The Grid user can launch the data management client to access data in storage resources after logging in to the portal. While launching the data management client, the portal passes the session key and SRB-related information to the client, and the client then uses the session key to obtain the user's password through SSL from the Account Management service. Finally, the client uses the password and the SRB-related information to connect to an SRB server with Encrypt1. Once the connection succeeds, the Account Management service removes the session. This prevents malicious users from using cached session keys to retrieve passwords from the Account Management service.

Figure 7. The proposed authentication process enabling single sign-on for computing nodes.

From the perspective of programs, Resource Broker delegates submitted jobs to computing nodes with limited proxy certificates, not full proxy certificates, for authentication. However, in the current implementation of SRB, limited proxy certificates fail to access storage resources located in different zones; only full proxy certificates are allowed to access cross-zone resources in SRB. Hence, we propose an authentication process, as shown in Figure 7, to deal with this problem. After Resource Broker submits jobs to computing nodes with limited proxy certificates, the computing nodes use the limited proxy certificates to obtain full proxy certificates from the Account Management service. Finally, the nodes can connect to SRB servers located in different zones with the full proxy certificates and access programs and data in storage resources.

3.4. The Data Management Client
We develop two kinds of clients of the virtual storage system.
One is a Java-based standalone version not integrated with UniGrid Portal and the Computational Grid. It is suitable for users who just want to store
their data without the need for computation support. The other is a Java Web Start version embedded in UniGrid Portal; Grid users can launch this client directly from UniGrid Portal after they log in.

Figure 8. A screenshot of the data management client.

Figure 8 shows a screenshot of the data management client. The left side of the client shows the file and directory list of local storage drives, and the right side shows the file and directory list of SRB storage drives. Once Grid users log in to our system, the system directs them to their default home directories automatically, and they can then access data or traverse the whole storage system. As shown in Table 1, the current implementation provides different operations for the various kinds of data objects.

Data object   Operations
File          download, upload, delete, copy, paste, rename
Directory     download, upload, delete, copy, paste, rename
Link          create, download, delete, copy, paste, rename
Replica       create, delete
Table 1. The supported operations for data objects in the virtual storage system.

Unlike typical FTP systems, our system allows users to specify the resources, for instance the closest resources, on which to store uploaded data. An uploaded data object can further be copied into several replicas distributed across different resources for reliability and efficiency of data transfer. In addition to replicas created by users, we also integrate the AutoReplicator service into the client. Users can set replication policies on data objects via the client, and the AutoReplicator service automatically creates replicas according to the specified policies. Furthermore, through the data management client, users can also specify access permissions on data objects, as shown in Figure 4, for sharing.

4. Operation Scenario of Taiwan UniGrid
In this section, we demonstrate an operation scenario of using Taiwan UniGrid.
Figure 9 shows the major components of Taiwan UniGrid and their interactions. The high-level operation scenario is explained as follows. A Grid user logs in to UniGrid Portal by entering his/her account and password, and UniGrid Portal employs the Account Management service to verify the user's identity. If the login succeeds, UniGrid Portal directs the user to his/her working web page, as shown in Figure 10. He/she launches the data management client (Figure 8) and uploads the programs and data needed by the jobs, which will be submitted later, to the data storage system. The user makes an execution plan for a job or designs a workflow of jobs on the working web page. Once the user has submitted a job, the portal asks Resource Broker to select computing resources based on the requirements of the submitted job. Resource Broker assigns the submitted job to the selected computing nodes. The selected computing nodes then retrieve the programs and data from the storage system and start computing. Once all computing nodes finish their work, the computed results are merged and stored back to the storage system. For reliability, the newly stored data can be replicated to other storage resources by the user or the AutoReplicator service.
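As a compressed, purely illustrative walk-through of this scenario, the sketch below stands in for the computing nodes and the storage system with in-memory stubs; all names, paths and the placeholder computation are ours and not part of the actual system, where the portal, Resource Broker and SRB servers are separate networked services.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stubbed walk-through of the operation scenario: upload, compute, store, replicate.
 */
public class ScenarioSketch {

    /** In-memory stand-in for the virtual storage system. */
    static class Storage {
        final Map<String, String> objects = new HashMap<>();
        void upload(String path, String data) { objects.put(path, data); }
        String retrieve(String path)          { return objects.get(path); }
        void replicate(String path)           { objects.put(path + ".replica", objects.get(path)); }
    }

    /** Stand-in computing node: retrieves input, computes, stores the result back. */
    static void runJob(Storage storage, String inputPath, String outputPath) {
        String input = storage.retrieve(inputPath); // nodes pull data from the storage system
        String result = input.toUpperCase();        // placeholder for the real computation
        storage.upload(outputPath, result);         // merged results are stored back
        storage.replicate(outputPath);              // and replicated for reliability
    }

    public static void main(String[] args) {
        Storage storage = new Storage();
        // 1. The user uploads programs and data via the data management client.
        storage.upload("/home/user1/input.dat", "raw data");
        // 2. The portal asks Resource Broker to pick nodes; a selected node runs the job.
        runJob(storage, "/home/user1/input.dat", "/home/user1/result.dat");
        System.out.println(storage.retrieve("/home/user1/result.dat")); // prints "RAW DATA"
    }
}
```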
Figure 9. The major components of Taiwan UniGrid and their interactions.

5. Concluding Remarks
In this paper, we present the development of a high-performance virtual storage system for Taiwan UniGrid. We employ SRB (Storage Resource Broker) as the basis for implementing the functions of the storage system. In addition, we identify three main requirements of the current implementation: high-performance data transfer, data sharing, and single sign-on. To meet these requirements, we propose the corresponding features: Self-Adaptation for high-performance data transfer, forming user groups and specifying admission control for data sharing, and adopting GSI authentication to enable single sign-on. We also develop a Java-based user interface of the storage system that allows Grid users to manage their data transparently, without being concerned with the low-level deployment of storage resources. In the future, we will continue improving our system to make it more powerful and useful.

Acknowledgement
This work was supported in part by the National Center for High-performance Computing under the national project, Taiwan Knowledge Innovation National Grid, and in part by the National Science Council under Contract No. NSC95-2221-E-001-002.

Figure 10. UniGrid Portal.

References
[1] Taiwan UniGrid, http://unigrid.nchc.org.tw.
[2] C. Baru, R. Moore, A. Rajasekar and M. Wan, The SDSC Storage Resource Broker, in CASCON '98: Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research, Canada, 1998. Also available at http://www.sdsc.edu/srb.
[3] B. Allcock, A. Chervenak, I. Foster, C. Kesselman and M. Livny, Data Grid tools: enabling science on big distributed data, Journal of Physics: Conference Series 16, 2005. Also available at http://www-fp.mcs.anl.gov/dsl/scidac/datagrid.
[4] Y. Zhao, M. Wilde, I. Foster, J. Voeckler, J. Dobson, E. Gilbert, T. Jordan and E.
Quigg, Virtual Data Grid Middleware Services for Data-Intensive Science, Concurrency and Computation: Practice & Experience, Vol. 18, Issue 6, 2004. Also available at http://vds.uchicago.edu/twiki/bin/view/vdsweb/webmain.
[5] C.-M. Wang, C.-C. Hsu, H.-M. Chen and J.-J. Wu, Efficient multi-source data transfer in data grids, in 6th IEEE International Symposium
on Cluster Computing and the Grid, Singapore, May 2006.
[6] I. Foster, C. Kesselman, G. Tsudik and S. Tuecke, A security architecture for computational grids, in ACM Conference on Computer and Communications Security, pages 83-91, ACM Press, 1998.
[7] WSDL: Web Services Description Language 1.1. Available at http://www.w3.org/TR/wsdl.
[8] UDDI: Universal Description, Discovery and Integration. Available at http://www.uddi.org, 2001.
[9] SOAP: Simple Object Access Protocol 1.1. Available at http://www.w3.org/TR/soap.
[10] B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke, Data management and transfer in high-performance computational grid environments, Parallel Computing, 28(5):749-771, 2002.
[11] K. Ranganathan and I. Foster, Design and evaluation of dynamic replication strategies for a high performance data grid, in International Conference on Computing in High Energy and Nuclear Physics, 2001.
[12] S. Vazhkudai, S. Tuecke and I. Foster, Replica selection in the Globus Data Grid, in 1st International Symposium on Cluster Computing and the Grid, pages 106-113, 2001.
[13] J. Feng and M. Humphrey, Eliminating Replica Selection - Using Multiple Replicas to Accelerate Data Transfer on Grids, in 10th International Conference on Parallel and Distributed Systems (ICPADS 2004), pages 359-366, 2004.
[14] C.-T. Yang, S.-Y. Wang, C.-H. Lin, M.-H. Lee and T.-Y. Wu, Cyber-Transformer: A Toolkit for Files Transfer with Replica Management in Data Grid Environments, in the 2nd Workshop on Grid Technologies and Applications (WoGTA '05), Taiwan, 2005.
[15] C. Adams and S. Lloyd, Understanding Public-Key Infrastructure: Concepts, Standards, and Deployment Considerations, New Riders Publishing, 1999.
[16] I. Foster and C.
Kesselman, Globus: A Metacomputing Infrastructure Toolkit, The International Journal of Supercomputer Applications and High Performance Computing, Vol. 11, No. 2, pp. 115-128, 1997.
[17] B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke, Data Management and Transfer in High-Performance Computational Grid Environments, Parallel Computing, 2001.