TSRR: A Software Resource Repository for Trustworthiness Resource Management and Reuse

Junfeng Zhao 1,2, Bing Xie 1,2, Yasha Wang 1,2, Yongjun Xu 3
1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China
2 School of Electronics Engineering and Computer Science, Peking University, China
3 Digital China Limited
{zhaojf, xiebing, wangys}@sei.pku.edu.cn, xuyja@digitalchina.com

Abstract

Software reuse is a key technology for improving software quality and productivity. Software resource repositories, which provide the management mechanism for software resources, are part of the infrastructure of software reuse. An abundance of software resources in a repository makes successful reuse possible. At the same time, assuring the quality of those resources is critical to maintaining users' confidence in reusing them. In this paper we present TSRR, a software resource management system that provides not only effective software resource management but also an Internet software resource search engine and trustworthiness management for software resources. The search engine enables TSRR to acquire different types of software resources and organize them for better retrieval. Trustworthiness management, which includes evidence gathering, evidence trust management and trustworthiness evaluation, provides a mechanism for users to use trustworthy resources. The case study shows that TSRR can effectively help users select software resources.

1. Introduction

Software reuse is a key technology for improving software quality and productivity. Software resource repositories, which provide the management mechanism for software resources, are part of the infrastructure of software reuse.
A software resource is, broadly speaking, any cohesive collection of artifacts that solves a specific problem or set of problems encountered in the software development lifecycle. It may be any work product of the software development lifecycle or of software-related activities, such as components, patterns, tools, Web Services, frameworks, solutions, documents, test cases or scripts.

An abundance of software resources in a repository is a key factor in successful software reuse. Currently, user submission is the main way to enrich a software resource repository. However, some users may be reluctant to submit software resources, so user submission may not be a reliable way for a repository to obtain abundant resources. A more convenient and automatic way to collect software resources on the Internet is therefore needed. In addition, existing software resource repositories lack mechanisms for guaranteeing the quality of the software resources they provide, which undermines users' confidence in reusing those resources. Hence, it becomes important for software resource repositories to find a way to assure users that the provided software resources are trustworthy.

To address these two problems and provide developers with a more convenient and trustworthy platform for resource reuse, we have developed the Trustie Software Resource Repository (abbreviated as TSRR), based on the Trustie (Trust integrated environment) platform. TSRR provides a mechanism to describe, collect, evaluate, classify and manage the trustworthiness of software resources in order to support trustworthy software development. It not only supports the software reuse process but also provides a resource sharing platform among projects. To enrich the resources in TSRR, we developed an Internet software resource search engine. In addition, TSRR provides mechanisms to manage trustworthy software resources and their evaluation information.
Thus it enables developers and end users to make full use of software resources to build good-quality applications of their own. TSRR is an open system that supports the whole process of software development using existing software resources. It provides Web Service APIs, which can support SOA-based applications. TSRR is implemented on JOnAS [1] and MySQL [2], which makes it an open repository to support software reuse. Moreover, TSRR provides an Eclipse plug-in, which

can be integrated with software development environments to support better software reuse and sharing.

2. Related work

Besides TSRR, there are other software resource repositories, such as REBOOT [3], Agora [4], CodeBroker [5], OSOR.EU [6], SourceForge.Net [7], ComponentSource [8], and Download.com [9].

REBOOT (Reuse Based on Object Oriented Techniques) is a well-known repository from the 1990s. It aims to advance the research and development of software reuse. It comprises a library that stores components and a series of tools that support component publishing, retrieval, classification, selection and evaluation. Agora is a component search engine developed by CMU. It is designed for searching components (such as JavaBeans, ActiveX and CORBA components) on the Internet. CodeBroker is a repository prototype that realizes seamless integration between a repository and a code editing tool, and can thus provide active query services. CodeBroker mainly stores Java classes to support Java-related software development. The OSOR.EU project has implemented a repository and a collaborative development environment. It aims to provide a library in which software, documentation and knowledge are easily accessible according to a specific taxonomy. ComponentSource, SourceForge.Net and Download.com are all commercial software resource repositories that provide business component trading through their web sites.

Although all of these repositories address the management of software resources, only TSRR supports the management of trustworthiness for software resources. Moreover, TSRR provides an Internet software resource search engine, which allows it to accumulate many different types of software resources.

3. The Framework of TSRR

TSRR aims to provide a software resource management mechanism and a software resource sharing environment. It provides the following functionalities:
Software resource acquisition, organization and management on the Internet.
Mechanisms to describe, collect, evaluate, classify and manage the trustworthiness of software resources, to support trustworthy software development.
A platform and a series of APIs to support software reuse, especially SOA-based software development.

Fig. 1 depicts the framework of TSRR. It has three layers: the storage layer, the function layer and the interface layer.

The bottom layer is the storage layer, which stores information about software resources, including code, services and software tools. It provides a series of storage security mechanisms, such as backup, recovery and access control. The storage layer also stores all kinds of evidence used to evaluate the trustworthiness of software resources. The evidence can be submitted by users or by tools, such as source code analysis tools and Web Service QoS (Quality of Service) acquisition tools.

Fig. 1. The framework of TSRR

The middle layer is the function layer, which provides the management of trustworthy software resources. The Internet software resource search engine performs wide-ranging searches for different types of software resources and related information. Some of the results can be published to TSRR as software resources; the others, which are closely related to those resources (documents, feedback and so on), can be saved as evidence for trustworthiness evaluation. The publishing mechanism provides functions for publishing resource descriptions, trust evidence, resource entities and so on. To improve retrieval accuracy, the classification module lets publishers describe a resource accurately using keyword classification, facet classification, enumeration classification and other classification methods. The software resource retrieval mechanism

provides different ways to find software resources that meet user requirements. For example, it lets users search for resources using different trustworthiness criteria. The user management module manages all information about users, including access control information. Trustworthiness evaluation for software resources aims to assess whether a resource is trustworthy enough for users to use, according to the evidence submitted by users or collected at run-time (such as Web Service QoS). The results of trustworthiness evaluation help users select appropriate software resources.

The interface layer provides different access interfaces for users to publish, retrieve, classify, evaluate and manage software resources. Users can access the repository not only via the web but also through Web Service APIs. Thus the repository can be integrated with other software development tools or platforms.

4. TSRR

TSRR provides a management mechanism that supports different software resources, such as source code, Web Services and documents. Furthermore, it provides an Internet software resource search engine and trustworthy software resource management to improve both the quantity and the quality of the software resources in TSRR. We present them separately in the following subsections.

4.1 Software resource search engine on the Internet

To provide a large number of software resources for developers, we developed a software resource search engine to collect software resources. A great number of software resources are currently available on the Internet. However, these resources are not well organized and managed, so developers spend a lot of time acquiring the resources they want. The software resource search engine harvests software resources on the Internet and organizes them for retrieval. The framework of the software resource search engine is depicted in Fig. 2.
In the resource harvesting phase, we propose a resource harvesting mechanism that relies on both web search engines and spiders that concentrate on specific sites. With this mechanism, many resources and much related information can be obtained. We define a series of file formats to clean out redundant information. In addition, we build a feedback module to collect user information that helps us extract useful information. Based on the information obtained, we build an extensible model to describe both the resources and the related information. This model can be extended for different scenarios.

Fig. 2. The framework of the software resource search engine

In the resource organization phase, we use a group of algorithms that identify association relationships among different kinds of information, including the example code recommending algorithm, the similar resource discovery algorithm, the algorithm that links resources with their developers, and the algorithm that discovers similar texts. We also provide retrieval support that takes result ranking into account and improves precision and recall. The example code recommending algorithm extracts related code from the obtained resources, clusters the code by its usage of the component, and finally ranks the clustered results to provide examples for developers. The similar component discovery algorithm leverages the cooperation relationships among components and uses latent semantic analysis (LSA) to calculate the similarity between two components. In this way, we can greatly reduce the effort developers spend acquiring the components they want and improve the efficiency of software reuse.

4.2 Trustworthy software resource management

Trustworthy software resource management comprises four parts: evidence model customization, evidence gathering, evidence trustworthiness management, and trustworthiness evaluation and

classification. Fig. 3 depicts trustworthy software resource management in TSRR.

Fig. 3. Trustworthy software resource management in TSRR

Evidence model customization is the basis for evidence gathering. It determines what evidence needs to be collected and how it is organized. As shown in Fig. 4, the model is a hierarchical model like the ISO/IEC 9126 quality model, and each node is linked with an evidence type that indicates the source of the evidence. The evidence type may be user feedback, resource test data, code analysis data, Web Service QoS, and so on. Because there may be several evidence models, and different models may contain properties with the same meaning, we build synonymous relationships between evidence model properties. This makes full use of the evidence and avoids collecting the same evidence repeatedly. The evidence gathering module collects and stores various kinds of evidence for trustworthy software resource evaluation.

Fig. 4. Evidence model with synonymous relationship

By establishing trust relationships between users and evidence submitters, the evidence trustworthiness management module handles cases in which collected evidence is false or inaccurate. If a user feels that a piece of subjective evidence (such as user feedback) cannot be fully trusted, he or she can assign a percentage degree to this evidence. This degree indicates the extent to which the evidence is taken into account in trustworthiness evaluation. Note that users cannot assign a degree to objective evidence such as Web Service QoS.

Trustworthiness evaluation and classification is the core of trustworthy software resource management. It uses the gathered evidence and a user-defined expectation model to evaluate the trustworthiness of software resources. Because specific domains may have different requirements and restrictions, the expectation model is defined by users to describe their expectations of software resources.
The expectation model is also a hierarchical model that describes user requirements. We build a mapping from each user expectation model to one or more evidence models. The mapping defines two things. The first is the inclusion relation between properties in the user expectation model and properties in the evidence models. The relation means that one user-expected property may be calculated or derived from one or more properties in an evidence model. For example, security can be seen as the combination of availability, integrity and confidentiality. The second is the weight of each relation. The weight represents the importance that the user attaches to the relation. Each weight lies between 0 and 1, and the weights sum to 1.

To give users a simple and convenient indication of which resources are better and more trustworthy, we establish a software trustworthiness classification specification to classify resources. At present, we divide software into five levels (from 1 to 5, where a higher level means higher trustworthiness), and each resource in the repository is marked with a trustability level according to the results of trustworthiness evaluation.

Because some evidence is subjective, such as user satisfaction, we adopt the fuzzy comprehensive evaluation method to decide the trustability level of a resource. The first step is to normalize the value of every characteristic and attribute in the evidence model. Then we create the factor set, the evaluation set and the weight set to compute the trustability level. The factor set contains every quality factor related to the resource; the value of each factor is obtained from the evidence collected by TSRR. The evaluation set is the set of grades from 1 to 5 that indicate the trustability level of a resource. The weight set is the weight distribution over the elements of the factor set, as defined by the domain or user expectation model. From these we obtain a fuzzy matrix, compute the result, and recommend to the user which resource is appropriate to use.
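The evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in TSRR. The triangular membership functions, the weighted-average composition operator and the names `membership` and `fuzzy_evaluate` are all assumptions made for the example, since the text does not specify them. Factor values are assumed to be already normalized to [0, 1], with larger values meaning better quality.

```python
LEVELS = (1, 2, 3, 4, 5)  # evaluation set V: trustability levels


def membership(x, levels=LEVELS):
    """Degree to which a normalized value x in [0, 1] belongs to each level.

    Assumption: triangular membership functions whose peaks are evenly
    spaced over [0, 1] (level 1 peaks at 0.0, level 5 peaks at 1.0).
    """
    n = len(levels)
    step = 1.0 / (n - 1)
    return [max(0.0, 1.0 - abs(x - i * step) / step) for i in range(n)]


def fuzzy_evaluate(factor_values, weights, levels=LEVELS):
    """Return (level, result_vector) for one resource.

    factor_values: normalized values of the factor set U, one per factor.
    weights: weight set A, one weight per factor, summing to 1.
    """
    if len(factor_values) != len(weights):
        raise ValueError("one weight is required per factor")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    # Fuzzy matrix R: one row of level memberships per factor.
    matrix = [membership(v, levels) for v in factor_values]

    # Composition B = A . R using the weighted-average operator (assumed).
    result = [
        sum(w * row[j] for w, row in zip(weights, matrix))
        for j in range(len(levels))
    ]

    # Maximum-membership principle; ties resolve to the lower level.
    best = max(range(len(levels)), key=result.__getitem__)
    return levels[best], result


level, vector = fuzzy_evaluate([0.75, 0.75, 0.5], [0.4, 0.4, 0.2])
```

Other composition operators (for example max-min) or differently shaped membership functions would fit the same structure. Only `membership` and the line that computes `result` would change.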

In this way, we can greatly reduce the effort developers spend acquiring the resources they want and improve the efficiency and quality of reuse.

5. Case study

The scenario of the case is as follows. A service user needs a service that looks up information about a book by its ISBN. The user wants the service to have trustability level 4, and assigns weights of 0.4, 0.4 and 0.2 to availability, response time and satisfaction, respectively. TSRR has already found six Web Services whose function is similar to what the user requires. In addition, TSRR has collected QoS data and user evaluation information for these services with its evidence collection tools. Table 1 shows part of the evidence for these services; their quality clearly differs. The question is: which service best meets the user's needs?

Table 1. The information and QoS of six ISBN web services

ID | Service Provider | Service access address | Availability | Response Time | Satisfaction
WS_ISBN_1 | daehosting.com | http://webservices.daehosting.com/services/isbnservice.wso | 91.10% | 840ms | 92.5%
WS_ISBN_2 | webservicex.com | http://www.webservicex.com/isbn.asmx?wsdl | 75.5% | 810ms | 89.5%
WS_ISBN_3 | booksprice.com | http://www.booksprice.com/isbnconverter.jws?wsdl | 99.11% | 1060ms | 87.5%
WS_ISBN_4 | wou.edu | http://studentnt.wou.edu/rwessel/chaPTER18/ISBN.asmx?WSDL | 98.81% | 1900ms | 85.5%
WS_ISBN_5 | xmlme.com | http://www.xmlme.com/WSAmazonBox.asmx?wsdl | 99.7% | 285ms | 91.5%
WS_ISBN_6 | pickabook.co.uk | http://services.pickabook.co.uk/service.asmx?wsDL | 99.72% | 368ms | 93.7%

Using the approach described in the previous sections, we obtain the user expectation model and the evidence model with their weight relationship, as shown in Fig. 5. We then use fuzzy comprehensive evaluation to compute the trustability level of these services. First we use the Gaussian normalization method to normalize the evidence data. The factor set is formed from the evidence values, that is, U={Availability, Response Time, Satisfaction}.
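As a rough cross-check of the data in Table 1, the following Python sketch normalizes each evidence column and combines the columns with the user's weights (0.4, 0.4, 0.2). It rests on several assumptions. "Gaussian normalization" is read here as z-score standardization, response time is negated so that larger values always mean better quality, and the final score is a plain weighted sum, not the fuzzy comprehensive evaluation used in the paper. The names `z_scores` and `weighted_scores` are hypothetical.

```python
from statistics import mean, pstdev

# Evidence from Table 1: (availability %, response time ms, satisfaction %).
SERVICES = {
    "WS_ISBN_1": (91.10, 840.0, 92.5),
    "WS_ISBN_2": (75.5, 810.0, 89.5),
    "WS_ISBN_3": (99.11, 1060.0, 87.5),
    "WS_ISBN_4": (98.81, 1900.0, 85.5),
    "WS_ISBN_5": (99.7, 285.0, 91.5),
    "WS_ISBN_6": (99.72, 368.0, 93.7),
}
WEIGHTS = (0.4, 0.4, 0.2)                # user-defined weights from the scenario
HIGHER_IS_BETTER = (True, False, True)   # response time: lower is better


def z_scores(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def weighted_scores(services, weights, higher_is_better):
    """Weighted sum of standardized factors for every service."""
    names = list(services)
    # Transpose rows (one per service) into columns (one per factor).
    columns = list(zip(*(services[name] for name in names)))
    normalized = []
    for column, better in zip(columns, higher_is_better):
        z = z_scores(column)
        normalized.append(z if better else [-v for v in z])
    return {
        name: sum(w * col[i] for w, col in zip(weights, normalized))
        for i, name in enumerate(names)
    }


scores = weighted_scores(SERVICES, WEIGHTS, HIGHER_IS_BETTER)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions the sketch places WS_ISBN_5 and WS_ISBN_6 ahead of the other four services. It produces an ordering only and does not assign trustability levels.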
The evaluation set is V={1,2,3,4,5} and the weight set is A={0.4,0.4,0.2}. From these we obtain a fuzzy matrix and compute each web service's trustability level. The result is that WS_ISBN_5 and WS_ISBN_6 are level 4, WS_ISBN_1 is level 3, and the others are level 2. We can therefore recommend WS_ISBN_5 and WS_ISBN_6 to the user.

Fig. 5. User expectation model and evidence model with weight relationship

6. Conclusion

This paper introduces a software resource repository that supports software reuse and sharing. The repository provides publishing, retrieval, classification, evaluation and management functions to support trustworthy software development. Moreover, it provides an Internet search engine to find more resources for reuse. The case study shows that TSRR effectively helps users select the resources they want. Future work is to study an automatic and more accurate method for calculating the trustability level of software resources.

7. Acknowledgement

This work is supported by the National Basic Research Program of China under Grant No. 2009CB320703, the High-Tech Research and Development Program of China under Grant Nos. 2007AA010301 and 2009AA010307, the Science Fund for Creative Research Groups of China under Grant No. 60821003, and the National Natural Science Foundation of China under Grant Nos. 60803011 and 60803010.

References

1. http://www.ow2.org/view/activitiesdashboard/jonas
2. http://www.mysql.com
3. J. M. Morel, J. Faget, The REBOOT Environment, BULL S.A., Rue Jean Jaures, F-78340 Les Clayes-sous-Bois, France.
4. Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau, Agora: A Search Engine for Software Components, IEEE Internet Computing, November/December 1998, pp. 62-70.
5. Yunwen Ye, An Active and Adaptive Reuse Repository System, Proceedings of the 34th Hawaii International Conference on System Sciences (HICSS-34), Software Technology Track, Maui, HI, IEEE Press, 2001.
6. http://www.openworldforum.org
7. http://sourceforge.net/
8. http://www.componentsource.com/index.html
9. http://download.cnet.com/windows/