Ontology-Based Metamodel for Storage and Retrieval of Software Components

Cristiane A. Yaguinuma
Department of Computer Science
Federal University of São Carlos (UFSCar)
P.O. Box São Carlos SP Brazil

Marilde T. P. Santos
Department of Computer Science
Federal University of São Carlos (UFSCar)
P.O. Box São Carlos SP Brazil

Marina T. P. Vieira
Methodist University of Piracicaba (UNIMEP)
Rod. Açúcar, Km Piracicaba SP Brazil

Abstract

This paper presents a metamodel for storage and retrieval of software components that considers domain semantic information based on ontologies. In contrast to most existing repositories, which only retrieve a limited set of components, the proposed metamodel incorporates ontology characteristics and thus makes it possible to recommend interrelated components. A behavior analysis of the proposed metadata is presented, involving the semantics of a multimedia application domain and components developed in the context of the DBCM Project.

1. Introduction

Reuse is a key concept in software development, as it reduces development effort, time and cost. Component-Based Software Engineering proposes the reuse of software components, which can be retrieved and assembled into applications of specific domains. In order to build these applications successfully, it is fundamental to choose appropriate components from a collection of available components. Thus, it is desirable to have a repository that supports the storage, query and retrieval of components and makes reuse possible. Most existing repositories only retrieve a limited set of components and some do not satisfy user queries.
Interrelated components may exist and would be useful, but the user either does not know about them or is unable to retrieve them because the query is defined too narrowly. The schema of the repository itself often does not consider semantic relationships among components and thus omits important retrieval information. An approach to component repositories is needed that provides the retrieval and recommendation of semantically interrelated components.

This paper presents an ontology-based metamodel for storage and retrieval of components. Section 2 describes the main approaches to component retrieval; Section 3 places the work in the context of the DBCM Project; Sections 4 and 5 respectively present the proposed metamodel and an analysis of its behavior; and Section 6 contains the conclusions and recommendations for future work.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.

2. Existing approaches to component retrieval

Mili et al. classify some of the existing proposals for component search and retrieval in four categories: Keyword Search, Faceted Classification, Signature Matching and Behavioral Matching. Keyword Search uses search engines to compare a set of terms specified by the user to terms related to components. This can result in too many or too few retrieved components, since only keywords are used in the search without any consideration of semantics. The Faceted Classification approach attempts to classify the objects in the repository based on predefined taxonomies.
Although this approach is useful for objects that fall into such categories, there are problems regarding objects for which classification is not explicit. Signature Matching compares names, parameters and return types of component methods to the user query. However, the information contained in signatures and comments of operations has little semantic content and hinders the adequate retrieval of components. Behavioral Matching executes each component with input data vectors in order to retrieve components that present the expected behavior. However, it is difficult
to determine the expected behavior in query terms, and inadequate input data can impair the retrieval process, thereby reducing query effectiveness.

In general, the traditional categories Mili et al. describe are restricted in that they do not consider semantic relationships among components or relevant information on specific domains when processing queries. Addressing such restrictions, Sugumaran and Storey have proposed an approach based on ontologies and domain models in order to increase search effectiveness and provide information on components interrelated to those that were retrieved. The ontologies Sugumaran and Storey use contain a set of related terms that describe some knowledge area, though they do not consider formal axioms. According to Noy and Hafner, formal axioms represent more information on concepts and their relationships as well as restrictions related to properties and concept values. Therefore, axioms play a significant role in domain semantics and can be relevant to component retrieval.

As retrieval based on semantic information has several advantages, the present work has adopted the ontology-based approach. Unlike the approach of Sugumaran and Storey, however, formal axioms are considered in the representation of knowledge domains, as they capture richer semantics and contribute toward retrieving interrelated software components. As a result, an ontology-based metamodel for storage and retrieval of software components was developed. The proposed metadata belong to the set of metadata modeled in the context of the DBCM Project, described in Section 3.

3. Metadata for storage and retrieval of software components

The work proposed in this paper is part of the Multimedia Component-Based Software Development Project (DBCM), which aims to define strategies for the software development process based on multimedia components. Specifically, this work belongs to the module related to storage and retrieval of components in databases.
A set of metadata was modeled to store relevant component information and to lend adequate support to the retrieval process. Figure 1 displays the categories of the identified metadata.

[Figure 1: diagram of the metadata categories: Structural Metadata, Semantic Metadata (Content, Ontology) and Pattern Metadata.]
Figure 1. Metadata categories for storage and retrieval of software components.

Structural Metadata correspond to information on the component structure, describing its attributes, interfaces, methods and other characteristics such as name, authors, creation date, company, etc. Semantic Metadata concern the component semantics and are divided into Content Metadata, which contain information automatically extracted from the source code or from component documentation, and Ontology Metadata, which present information on the application domain to which the components belong. Pattern Metadata consider information on possible software patterns identified from component reuse. This paper focuses on Ontology Metadata, detailing the metamodel built by incorporating ontology principles into the set of metadata modeled in the DBCM Project.

4. Ontology-based metamodel for storage and retrieval of software components

From an analysis of studies related to ontologies, it was observed that some ontology characteristics are suitable for component retrieval, as they allow capturing domain semantics and recommending interrelated components. Thus, ontology-based metadata were incorporated into the set of metadata presented in Section 3. The incorporated metadata were identified from elements belonging to the ontology creation language Web Ontology Language (OWL) and were also based on the domain layer model of the ODEd ontology editor. Both ODEd and OWL support basic ontology elements and allow the definition of formal axioms that provide richer semantics to ontologies. The ontology principles that were considered relevant to storage and retrieval of components were modeled in the metamodel presented in Figure 2.
As the metamodel shows, a Domain has the usual attributes name and description, as well as a modeling attribute that graphically describes how the domain is organized according to the elements belonging to the metamodel. This modeling can be represented by a UML Class Diagram. A domain is composed of Entities, which refer to the main concepts of the knowledge domain. Entities have attributes, represented by the class Property_Attribute, and can be related to other entities through Associations, which have minimum and maximum cardinality as well as some attributes. Following the OWL language, the classes Property_Attribute and Association are both generalized by the class Property. The class Component is not detailed in the metamodel because it belongs to Structural Metadata, which are outside the scope of this paper.

[Figure 2: UML class diagram with the classes Domain (name : String, description : String, modeling : Blob), Entity (name : String, description : String), Property (name : String, description : String), Property_Attribute (type : String), Association (min_cardinality : int, max_cardinality : int), Whole_part, Composition, Aggregation and Component; association roles superentity, subentity, disjoint_with and inverse_of; multiplicities 1..n.]
Figure 2. Ontology-based metamodel for storage and retrieval of software components.

In order to provide relationships with richer semantics, axioms that would contribute toward component retrieval were investigated. A series of axioms is presented in the ontology literature, some of which are considered relevant, namely generalization/specialization, disjunction, inverse and whole-part associations. Thus, entities can have superentities and subentities, and can be disjoint with other entities. Inverse associations (inverse_of) indicate whether the relationship is bidirectional, allowing navigation in both directions. An example is the pair of inverse associations between the Media and Scene entities: Media composes Scene, and Scene is composed of Media. Whole-part associations include axioms such as irreflexivity, antisymmetry and transitivity, and are classified as Aggregations (parts compose the whole, but not exclusively) and Compositions (parts exclusively compose the whole). Through these axioms, it is possible to present more information on the domain semantics and also to infer knowledge in order to recommend interrelated components.

The captured domain information should be related to the components through an analysis of their purposes and functionalities. Thus, it is possible to relate components to the corresponding associations and entities in the domain semantics. Therefore, the elements belonging to the metamodel permit retrieving and recommending components based on the analysis of semantic information.

5. Behavior analysis of the proposed metamodel

A prototype for storage and retrieval of components was built so that the behavior of the incorporated metadata could be analyzed. The prototype is called Ontology Based Component Repository (OBCR) and manipulates a database whose structure follows the proposed metamodel.
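As an illustration only, the classes of the metamodel in Figure 2 could be mirrored in Python roughly as follows. This is a minimal sketch under stated assumptions, not the schema of the OBCR database: the `source` and `target` fields, the default values and the helpers `add_subentity` and `link_inverse` are hypothetical names introduced here, and Component is reduced to a plain name because it belongs to Structural Metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Property:
    """Generalization of attributes and associations (name, description)."""
    name: str
    description: str = ""


@dataclass(eq=False)
class PropertyAttribute(Property):
    """Attribute of an entity; the type is kept as a string, as in Figure 2."""
    type: str = "String"


@dataclass(eq=False)
class Association(Property):
    """Relationship between entities with minimum and maximum cardinality."""
    source: Optional["Entity"] = None      # assumption: ends are not named in Figure 2
    target: Optional["Entity"] = None      # assumption
    min_cardinality: int = 0
    max_cardinality: int = 1
    inverse_of: Optional["Association"] = None


@dataclass(eq=False)
class WholePart(Association):
    """Whole-part association (irreflexive, antisymmetric, transitive)."""


@dataclass(eq=False)
class Aggregation(WholePart):
    """Parts compose the whole, but not exclusively."""


@dataclass(eq=False)
class Composition(WholePart):
    """Parts exclusively compose the whole."""


@dataclass(eq=False)
class Entity:
    """Main concept of a knowledge domain."""
    name: str
    description: str = ""
    attributes: List[PropertyAttribute] = field(default_factory=list)
    superentities: List["Entity"] = field(default_factory=list)
    subentities: List["Entity"] = field(default_factory=list)
    disjoint_with: List["Entity"] = field(default_factory=list)
    components: List[str] = field(default_factory=list)  # component names only


@dataclass(eq=False)
class Domain:
    """Knowledge domain; `modeling` stands in for the Blob holding the diagram."""
    name: str
    description: str = ""
    modeling: bytes = b""
    entities: List[Entity] = field(default_factory=list)


def add_subentity(parent: Entity, child: Entity) -> None:
    """Record a generalization/specialization axiom in both directions."""
    parent.subentities.append(child)
    child.superentities.append(parent)


def link_inverse(first: Association, second: Association) -> None:
    """Mark two associations as inverses of each other."""
    first.inverse_of = second
    second.inverse_of = first


# Example from the text: Media composes Scene / Scene is composed of Media.
media, scene, text = Entity("Media"), Entity("Scene"), Entity("Text")
add_subentity(media, text)
composes = Association("composes", source=media, target=scene)
composed_of = Association("is composed of", source=scene, target=media)
link_inverse(composes, composed_of)
multimedia = Domain("Multimedia", entities=[media, scene, text])
```

Identity-based equality (`eq=False`) is chosen here because entities refer to one another in both directions, and a field-by-field comparison would have to walk those mutual references.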
5.1 The OBCR prototype

In order to evaluate the effectiveness of the metamodel, components developed in the context of the DBCM Project were stored. These components belong to the distance learning (DL) and multimedia application domains, but this section specifically considers the components developed for the multimedia domain, which have been described elsewhere.

Initially, the semantic information on the multimedia domain must be acquired so that it can be stored in the repository. The captured semantics were based on the analysis of the multimedia domain developed in the scope of the DBCM Project [3, 16]. From this analysis, it was possible to identify concepts, relationships and relevant axioms. The information captured represents multimedia applications as scene compositions (Scene Composition entity). Each scene (Scene entity) is composed of a set of media (Media entity) synchronized in a sequential (Sequential Synchronization association), parallel (Parallel Synchronization association) or composed manner (Composed Synchronization association). Important axioms of the domain were identified:
- Media is the superentity of the Text, Image, Audio and Video entities;
- The Scene Composition entity aggregates the Scene entity, which is composed of the Scene Media entity (representing the media of a specific scene).
These axioms permit the inference of knowledge and, therefore,
the recommendation of components that might interest the user. For example, when components associated with the Media entity are retrieved, components related to its subentities can be recommended, and the process can continue if the subentities have subentities of their own. Components related to the Scene and Scene Media entities can also be recommended when users query for components associated with the Scene Composition entity, as it aggregates the Scene entity, which in turn is composed of the Scene Media entity. This is possible because the transitivity axiom, present in aggregations and compositions, allows such inferences to be made.

With the domain information stored, the next step considers component relationships with entities and associations. Table 1 presents the association of the analyzed multimedia components with the domain semantics. From the information presented in this table, the OBCR prototype can retrieve and recommend multimedia components based on semantic analysis.

For component retrieval, an ontology-based search was implemented. In this scenario, the user browses the domain entities and retrieves the related components. The user may then be interested in interrelated components. Therefore, the recommendation of interrelated components was also implemented, in which components semantically related to those that were retrieved are recommended.

Figure 3 shows the OBCR user interface. When using the prototype, the user should first choose the domain to be explored. It is possible to view how the domain is organized by means of a UML Class Diagram containing the entities, associations and attributes of the chosen domain. In order to browse the domain entities and view their information, the user should select the link corresponding to the name of the required entity. In Figure 3, the attributes, subentities and associations of the Media entity are shown, as well as links to semantically related components.
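The inference described in this section, in which subentities are followed and the whole-part axiom is applied transitively, can be sketched in Python as shown here. This is a hedged illustration rather than the OBCR implementation: the dictionaries, the function names `reachable`, `retrieve` and `recommend`, and the breadth-first traversal are assumptions made for the example, and only a simplified subset of the component links from Table 1 is used.

```python
from collections import deque

# Generalization axiom of the multimedia domain: superentity -> subentities.
SUBENTITIES = {"Media": ["Text", "Image", "Audio", "Video"]}

# Whole-part axioms: Scene Composition aggregates Scene,
# which is composed of Scene Media.
PARTS = {"Scene Composition": ["Scene"], "Scene": ["Scene Media"]}

# Simplified subset of the component/entity links (assumed for illustration).
COMPONENTS = {
    "Media": ["MediaBean"],
    "Audio": ["AudioBean"],
    "Image": ["ImageBean"],
    "Text": ["TextBean"],
    "Video": ["VideoBean"],
    "Scene": ["SceneBean"],
}


def reachable(start, edges):
    """Return the entities reachable from `start` through one or more edges.

    Following the edges repeatedly is what applies transitivity: parts of
    parts (or subentities of subentities) are included in the result.
    """
    found = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, []):
            if neighbour != start and neighbour not in found:
                found.append(neighbour)
                queue.append(neighbour)
    return found


def retrieve(entity):
    """Components directly related to the browsed entity."""
    return list(COMPONENTS.get(entity, []))


def recommend(entity):
    """Components related to subentities and (transitive) parts of the entity."""
    related = reachable(entity, SUBENTITIES) + reachable(entity, PARTS)
    recommended = []
    for name in related:
        for component in COMPONENTS.get(name, []):
            if component not in recommended:
                recommended.append(component)
    return recommended


print(retrieve("Media"))               # ['MediaBean']
print(recommend("Media"))              # ['TextBean', 'ImageBean', 'AudioBean', 'VideoBean']
print(recommend("Scene Composition"))  # ['SceneBean']
```

In this sketch, browsing Media retrieves MediaBean directly and recommends the components of its subentities, while browsing Scene Composition reaches Scene and then Scene Media through the two whole-part links.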
It is possible to retrieve components related to each entity by analyzing the domain semantics. Also in Figure 3, the component TextBean was retrieved because it is related to the Text entity, which is a subentity of the Media entity. The axiom of generalization/specialization allowed the retrieval of this component when considering the Media entity. Finally, the component can be obtained through the Source Code link. If the user is interested in components interrelated to TextBean, it is possible to retrieve them through the links corresponding to their names located in Related Components.

Table 1. Component relationships with multimedia domain information.

Component              Domain information (Entity / Association)
MediaBean              Media; Scene Media
AudioBean              Audio
ImageBean              Image
TextBean               Text
VideoBean              Video
SceneBean              Scene; Scene Media
SceneCompositionsBean  Scene Composition
SequentialSynBean      Sequential
ParallelSynBean        Parallel
ComposedSynBean
RetrievalBean          Media; Scene; Scene Composition; Sequential; Parallel

5.2 Experiment with users

An experiment with users was performed considering ontologies of the distance learning and multimedia domains. Twenty-six Enterprise JavaBeans components belonging to these domains were also used, all of which were developed within the scope of the DBCM Project.

The applied tests were based on previously described experiments related to the search and retrieval of components that satisfy a given set of requirements. Two distinct requirement specifications were elaborated (Teacher Application and Student Application), based on the functionalities performed by the stored components. Users were separated into two groups, Group 1 and Group 2. Each group was composed of a system analyst and two MSc candidates, all of whom had experience in the development and reuse of software components. The Teacher Application and Student Application were respectively assigned to Group 1 and Group 2.
Each user received textual instructions to retrieve components that support the requirements corresponding to his/her group. The users then filled out a questionnaire evaluating ease of use and process satisfaction:

Ease of use:
- The prototype was / was not easy to use.
- In order to use the prototype, I need / do not need more training.
Process satisfaction:
- The recommendation of interrelated components helped / did not help to retrieve required components.
- The sequence of web pages was / was not helpful to component retrieval.

Figure 3. OBCR user interface.

Once the users had completed the tasks, the following measures were applied to the results: recall, precision and search effort (measured in terms of the number of components inspected and the time the user took to complete the search). Table 2 presents the results obtained for each requirement specification (Teacher Application and Student Application).

Table 2. Experiment results.

Application  Recall   Precision  Search Effort
Teacher      86.33%   100%       1.24
Student      95.83%   92%        1.04

The recall obtained in both applications was satisfactory (above 86%). As expected, the recall of the Teacher Application was lower, as it contained more complex requirements. In terms of precision, the results were very satisfactory, even though the Student Application, which was considered less complex, presented a lower degree of precision. This could be explained by the fact that users were confused about some concepts of the multimedia domain, resulting in the retrieval of unnecessary components. Hence, it is important to use well-elaborated ontologies to avoid an ambiguous understanding of the domain semantics and the consequent retrieval of incorrect components.

The search effort results presented an average of one component inspected per minute in both applications. The Teacher Application demanded a greater effort from the users because it required more components than the Student Application. One of the probable reasons why the search effort was not lower is related to the ease of use of the prototype, as 60% of the users found the prototype hard to use and thought they needed more training. Therefore, the prototype interface should be reformulated in order to improve the ontology-based search and consequently reduce the effort in component retrieval.
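For reference, the three measures can be computed as in the following Python sketch. The standard set-based definitions of recall and precision are used; reading search effort as components inspected per minute is an assumption based on the description above, and the component sets and numbers in the example are hypothetical rather than data from the experiment.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant (required) components that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved components that are relevant (required)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def search_effort(components_inspected, minutes):
    """Components inspected per minute (assumed reading of search effort)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return components_inspected / minutes


# Hypothetical session: four components are required, three are retrieved.
required = {"TextBean", "ImageBean", "SceneBean", "RetrievalBean"}
retrieved = {"TextBean", "ImageBean", "SceneBean"}

print(recall(retrieved, required))     # 0.75
print(precision(retrieved, required))  # 1.0
print(search_effort(5, 4))             # 1.25
```

In this hypothetical session no unnecessary component is retrieved, so precision is 1.0, while one required component is missed, so recall is 0.75.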
Finally, in terms of process satisfaction, 100% of the users agreed that the recommendation of interrelated components helped to retrieve the required components
and 80% considered that the sequence of web pages contributed to component retrieval.

6. Conclusion and future work

The use of semantic information of domains in component repositories makes it possible to decrease the number of retrieved components that are unnecessary to the queries, which translates into greater precision. It also permits the recommendation of interrelated components that might be of interest to the user.

The ontology principles incorporated into the set of metadata presented in Section 3 played an important role, as they represented richer domain semantics and allowed knowledge inference. From the incorporated metadata, it was possible to analyze entities, associations and axioms to perform the recommendation of semantically interrelated components, which is not covered in most traditional approaches. These metadata also contributed toward understanding how the information was organized in the repository and consequently helped users find the required components.

The experiment results show that the adopted approach has adequate search precision, avoiding the retrieval of non-relevant components. Users said that the recommendation of components helped them find the components more quickly, which was considered the main contribution of this work. However, users suggested improvements to the user interface.

As the relationships of components with entities and associations were defined manually, it is necessary to investigate ways to automate (or semi-automate) this process. An automatic way of obtaining ontologies to feed the repository is also needed. Enhancements to the prototype interface should be considered too, such as graphically performing the ontology-based search through UML Class Diagrams.

7. Acknowledgments

We would like to thank CNPq/Brazil for supporting the Multimedia Component-Based Software Development Project (DBCM) Program CTINFO Process number /

References

[1] Lucrédio, D., et al. Component Retrieval using Metric Indexing. In IEEE International Conference on Information Reuse and Integration, IRI Las Vegas, USA.
[2] Sugumaran, V. and V.C. Storey, A semantic-based approach to component retrieval. SIGMIS Database, (3): p
[3] Vieira, M.T.P., et al., DBCM: Desenvolvimento Baseado em Componentes Multimídia. 2004, CNPq.
[4] Mili, H., et al. Automating the Indexing and Retrieval of Reusable Software Components. In 6th International Workshop NLDB' Madrid, Spain.
[5] Seacord, R., S. Hissam, and C. Wallnau, AGORA: A Search Engine for Software Components, IEEE Internet Computing p
[6] Prieto-Díaz, R., Implementing faceted classification for software reuse. Communications of the ACM, (5): p
[7] Vitharana, P., F. Zahedi, and H.K. Jain, Knowledge Based Repository Scheme for Storing and Retrieving Business Components: A Theoretical Design and an Empirical Analysis. IEEE Transactions on Software Engineering, (7): p
[8] Ye, Y. and G. Fischer. Information Delivery in Support of Learning Reusable Software Components on Demand. In International Conference on Intelligent User Interface IUI' San Francisco, USA.
[9] Hall, R.J. Generalized Behavior-Based Retrieval. In 15th International Conference on Software Engineering Baltimore, USA.
[10] Noy, N.F. and C.D. Hafner, The State of the Art in Ontology Design, AI Magazine p
[11] Guarino, N. Formal Ontology and Information Systems. In International Conference on Formal Ontologies in Information Systems Trento, Italy.
[12] Mian, P.G. and R.A. Falbo, Supporting Ontology Development with ODEd. Journal of the Brazilian Computer Society, (2): p
[13] Prieto-Díaz, R. A faceted approach to building ontologies. In IEEE International Conference on Information Reuse and Integration Las Vegas, USA.
[14] Smith, M.K., C. Welty, and D.L. McGuinness, W3C Proposed Recommendation: OWL Web Ontology Language Guide Available in <www.w3.org/tr/2004/recowlguide >. Accessed in May
[15] Staab, S. and A. Maedche. Ontology Engineering beyond the Modeling of Concepts and Relations. In ECAI'2000 Workshop on Applications of Ontologies and Problem-Solving Methods Berlin, Germany.
[16] Vieira, M.T.P., et al. Reuse of Multimedia Components in the Development of Distance Learning Applications. In IEEE International Conference on Advanced Learning Technologies (ICALT 2005) Kaohsiung, Taiwan.