A Framework for Ontology-Based Knowledge Management System

Jiangning WU
Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China
E-mail: jnwu@dlut.edu.cn

Abstract

Knowledge management is a crucial activity in organizations, since knowledge is considered the most important asset that enables sustainable competitive advantage in very dynamic and competitive markets. The development of effective knowledge management systems (KMSs) has become an important issue in applied domains. In this paper, we present a framework for an ontology-based KMS that focuses on matching projects with domain experts, addressing system architecture, ontology building, and semantic similarity calculation in turn. Finally, a simple experiment is carried out to evaluate the effectiveness of the proposed ontology-based KMS.

Keywords: Knowledge management system, Ontology, Semantic similarity, Matching

1. Introduction

Knowledge management is a crucial activity in organizations, since knowledge is considered the most important asset that enables sustainable competitive advantage in very dynamic and competitive markets. The development of effective knowledge management systems (KMSs) has become an important issue in applied domains. The goal of a general KMS is to provide the right knowledge to the right people at the right time and in the right format. Through KMSs, users can access and utilize the rich sources of data, information and knowledge stored in different forms. Furthermore, KMSs help people share knowledge and hence create new knowledge. Traditional KMSs are based on existing data repositories and users' needs. For knowledge discovery, users submit queries to the system and receive knowledge by keyword matching. But keyword-based systems cannot understand the meaning of data. They are inflexible and stifle knowledge creation.
Fortunately, emerging ontology-based KMSs can find the content-oriented knowledge that people really want, because a domain ontology is powerful in knowledge representation and associated inference. Ontologies are meant to provide an understanding of static domain knowledge that facilitates knowledge retrieval, storage, sharing, and dissemination. For KMSs, an ontology can be regarded as a classification of knowledge [1]. That is to say, an ontology defines a shared vocabulary for facilitating knowledge communication, storage, search and sharing in knowledge management systems.

In this paper, we propose a framework for an ontology-based KMS that focuses on matching projects with domain experts. In project management, it is not easy to choose an appropriate domain expert for a given project if the experts' research areas and the contents of the projects are not well understood. Matching projects with domain experts is also hard work when the number of projects is large. So there is a great need for effective technologies that can capture the knowledge involved in both domain experts and projects. The ontology-based KMS proposed in this paper tries to solve this problem. The main idea is that both the experts' research areas and the contents of the projects are represented by separate ontologies based on the same standard subject category of China. The matching problem is thus transformed into calculating the semantic similarities between ontologies. Once the similarity values are worked out, the matched results can be obtained and ranked accordingly. The two main barriers facing our KMS are ontology building and similarity calculation. In the following sections we present our approaches to solving these two problems in detail.

2. Ontologies in Knowledge Representation

Research on knowledge representation has been a focus of the AI and IS disciplines for a number of years. Much contemporary research extends the seminal work within the AI discipline, of which research in ontology has been one of the beneficiaries. Research in computational ontology has traditionally sought to develop structure for the purpose of knowledge subsumption. The goal of such research is to develop generic, reusable representations of domain ontology. Much of ontology research considers a deep development approach necessary to provide the extensive knowledge and reasoning required for expert-level queries [3].

In [4], T.R. Gruber pointed out: An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of existence. For knowledge-based systems, what exists is exactly that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge.
Thus, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names are meant to denote, and formal axioms that constrain the interpretation and well-formed use of these terms. In short, an ontology is a vocabulary of entities, classes, properties, functions and their relationships.

Domain ontologies are thought to be capable of significantly improving knowledge management practices. They provide conceptual abstraction and differentiated relationships, specifically separating concepts from lexicalizations, and thereby better reflect the structure of human understanding of a domain. In ontologies, the semantics are developed by ensuring that each concept within the domain is uniquely and precisely defined and by specifying elaborated relationships among the concepts. The relationships in an ontology are explicitly named and developed with specification of rules and constraints so that they reflect the context of the domain for which the knowledge is modeled.

In our work, the ontology is a collection of concepts and their relationships, and serves as a conceptualized vocabulary to describe an application domain. It is created by means of Protégé (http://protege.stanford.edu), which was developed by Stanford University. The initial concepts in our ontology are broadly extracted from the standard subject category of China. To make the selected concepts more suitable for the projects and domain experts concerned, a tool called Concept Filler was developed. It is simply an interface that helps domain experts assign proper concepts and weights manually (see Figure 1). When a concept is specified, a corresponding weight value ranging from 0 to 1 is also assigned to it to distinguish its importance: the bigger the value, the more important the concept.

Fig. 1. The interface for specifying concepts by the Concept Filler

As to relationships between concepts, many types can be found in ontology construction, such as the IS-A relation, Kind-of relation, Part-of relation, Substance-of relation, and so on. Since the IS-A (hyponym/hypernym) relation is the most common concern in ontology representation, only this kind of relation is introduced in our research for simplification. This restriction is also required by the calculation of similarity between concepts presented below. After the concepts and their relationships are specified based on the contextual knowledge involved in projects and domain experts, the hierarchical ontology-concept tree can be built with the Protégé tool. The remaining work is to calculate the similarity between concept trees and realize the matching process.

3. Similarity Calculating and Matching Process

As mentioned above, there are many kinds of relationships between concepts in ontology creation. Calculating the similarity between concepts based on complex relationships is challenging work. Unfortunately, no method can deal with this problem effectively up to now. Considering that some similarity calculation methods have been developed based on the simplest relation, the IS-A relation [8], only this kind of relation is retained in our study. We leave the other relations as future research topics.
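Because only IS-A relations are retained, each ontology reduces to a tree of weighted concepts. As a minimal sketch, and not the Protégé representation used in our system, such a tree could be held in memory as follows. The class name `Concept`, its fields, and the sample concept names and weights are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A node in an IS-A concept tree (illustrative structure)."""
    name: str
    weight: float = 0.0  # importance in [0, 1], assigned by a domain expert
    # The parent link is excluded from repr/compare to avoid cyclic recursion.
    parent: Optional["Concept"] = field(default=None, repr=False, compare=False)
    children: List["Concept"] = field(default_factory=list)

    def add_child(self, name: str, weight: float = 0.0) -> "Concept":
        """Attach a more specific concept below this one (child IS-A parent)."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        child = Concept(name, weight, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> List["Concept"]:
        """Return the IS-A chain from the direct parent up to the root."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


# Hypothetical concepts and weights, not taken from the subject category.
root = Concept("Computer Science")
ai = root.add_child("Artificial Intelligence", weight=0.6)
kr = ai.add_child("Knowledge Representation", weight=0.9)
ancestor_names = [c.name for c in kr.ancestors()]
```

Keeping a parent link on every node makes it cheap to walk upward, which is the operation the path-based similarity measures in this section rely on.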
3.1 Node-based Approach and Edge-based Approach

Here we discuss two types of methods for calculating semantic similarities between concepts: the node-based method and the edge-based method [8].

Resnik used information content to measure the similarity [9, 10]. His point is that the more information content two concepts share, the more similar they are. The similarity of two concepts $c_1$ and $c_2$ is quantified as

$$\mathrm{sim}(c_1, c_2) = \max_{c \in \mathrm{Sup}(c_1, c_2)} [-\log p(c)], \qquad (1)$$

where $\mathrm{Sup}(c_1, c_2)$ is the set of concepts whose child concepts contain $c_1$ and $c_2$, $p(c)$ is the probability of encountering an instance of concept $c$, and

$$p(c) = \frac{\mathrm{freq}(c)}{N}, \qquad (2)$$

where $\mathrm{freq}(c)$ is simply the statistical frequency of concept $c$, and $N$ is the total number of concepts contained in the given document. Considering that many inherited concepts may have more than one sense, the similarity calculation should be modified as

$$\mathrm{sim}(c_1^{t_1}, c_2^{t_2}) = \max_{c_1 \in \mathrm{sen}(t_1),\, c_2 \in \mathrm{sen}(t_2)} [\mathrm{sim}(c_1, c_2)], \qquad (3)$$

where $\mathrm{sen}(t)$ means the set of possible different concepts denoted by the same term $t$.

Another important method to quantify the similarity is the edge-based approach. Leacock and Chodorow summed up the shortest path length and converted this statistical distance to the similarity measure [11]:

$$\mathrm{sim}(c_1^{t_1}, c_2^{t_2}) = -\log\left[\frac{\min_{c_1 \in \mathrm{sen}(t_1),\, c_2 \in \mathrm{sen}(t_2)} \mathrm{len}(c_1, c_2)}{2\, d_{\max}}\right], \qquad (4)$$

where $\mathrm{len}(c_1, c_2)$ is the number of edges along the shortest path between concepts $c_1$ and $c_2$, and $d_{\max}$ is the maximum depth of the ontology hierarchy.

3.2 An Integrated and Improved Approach

By analyzing both the node-based method and the edge-based method, three main shortcomings can be found.

1. Both node-based and edge-based methods consider only two concepts in the same concept tree, without extending to two lists of concepts in different concept trees. However, when we describe different documents in the same domain using ontology structures, homogeneous but heteromorphic concept trees are often formed. The matching problem to be solved here is calculating the similarity between two different concept trees, not between two concepts in the same tree. So we have to develop a new method that can calculate the similarities between two lists of concepts in different trees, so that the quantified similarity value shows how similar the documents are.

2. The node-based method does not consider the distance between concepts. Take the four-layer concept tree shown in Figure 2 as an example. If concepts $C_{21}$, $C_{31}$ and $C_{36}$ have one sense each and the same frequency, which determines the same information content, the node-based method gives

$$\mathrm{sim}(C_{21}, C_{31}) = \mathrm{sim}(C_{21}, C_{36}). \qquad (5)$$

However, it is obvious from Figure 2 that concepts $C_{21}$ and $C_{31}$ are more similar, since $C_{31}$ is the direct inheritor of $C_{21}$.

[Figure 2: concept tree. Layer 1: C11. Layer 2: C21, C22, C23. Layer 3: C31, C32, C33, C34, C35, C36. Layer 4: C41, C42, C43, C44.]

Fig. 2. An example of a four-layer concept tree

3. In contrast to the node-based method, the edge-based method considers only the relationships between concepts and ignores the weights of concepts. For example, in Figure 2, concepts $C_{31}$ and $C_{32}$ are each one edge away from $C_{21}$. According to the edge-based method, the same similarity value is obtained. That is,

$$\mathrm{sim}(C_{31}, C_{21}) = \mathrm{sim}(C_{32}, C_{21}). \qquad (6)$$

But if $C_{31}$ has a bigger weight than $C_{32}$, $C_{31}$ is considered more important and the corresponding similarity value between $C_{31}$ and $C_{21}$ should be greater.

To overcome the shortcomings of both node-based and edge-based methods, a new integrated method is proposed in this paper to calculate the similarity between two documents. Before the proposed method is applied, the documents related to projects and domain experts should be formalized, which results in two vectors containing the concepts with their frequencies. Suppose Doc(i) describes the ith project and Doc(j) describes the jth domain expert. The formalization results are:

$$\mathrm{Doc}(i) = \{c_{i1}, c_{i2}, \ldots, c_{im}\}, \qquad (7)$$

$$\mathrm{Doc}(j) = \{c_{j1}, c_{j2}, \ldots, c_{jn}\}, \qquad (8)$$

with their corresponding frequencies:

$$W(i) = \{w_{i1}, w_{i2}, \ldots, w_{im}\}, \qquad (9)$$

$$W(j) = \{w_{j1}, w_{j2}, \ldots, w_{jn}\}. \qquad (10)$$

For each pair of concepts $(c_{is}, c_{jt})$ in the concept tree, there must exist a concept $c_p$ for which both $c_{is}$ and $c_{jt}$ are child concepts and the path length is minimum. Concept $c_p$ is the nearest parent concept of both $c_{is}$ and $c_{jt}$. The similarity between $c_{is}$ and $c_{jt}$ can be calculated by

$$\mathrm{sim}(c_{is}, c_{jt}) = \log\left(\frac{w_{is}}{\mathrm{len}(c_{is}, c_p) + 1} + \frac{w_{jt}}{\mathrm{len}(c_{jt}, c_p) + 1}\right), \qquad (11)$$

where $\mathrm{len}(c_{is}, c_p)$ is the path length between $c_{is}$ and $c_p$, and likewise for $\mathrm{len}(c_{jt}, c_p)$.
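Equation (11) can be illustrated with a short Python sketch. The parent links below cover only part of Figure 2, and the placement of C36 under C23 is assumed. The weights are hypothetical, and a concept is treated as its own nearest parent when it is an ancestor of the other concept, which is one possible reading of the definition above.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed parent links for part of the tree in Figure 2.
PARENT: Dict[str, Optional[str]] = {
    "C11": None,
    "C21": "C11", "C22": "C11", "C23": "C11",
    "C31": "C21", "C32": "C21", "C36": "C23",
}


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def nearest_common_parent(a: str, b: str) -> Tuple[str, int, int]:
    """Return (c_p, len(a, c_p), len(b, c_p)) for the closest shared ancestor."""
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            return node, steps_from_a, steps_from_b[node]
    raise ValueError("concepts are not in the same tree")


def integrated_sim(a: str, w_a: float, b: str, w_b: float) -> float:
    """Equation (11): log of the two weights, each damped by path length."""
    _, len_a, len_b = nearest_common_parent(a, b)
    return math.log(w_a / (len_a + 1) + w_b / (len_b + 1))


# Shortcoming 2: equal weights, different distances from C21.
near = integrated_sim("C21", 0.8, "C31", 0.8)
far = integrated_sim("C21", 0.8, "C36", 0.8)

# Shortcoming 3: equal distances from C21, different weights.
heavy = integrated_sim("C31", 0.9, "C21", 0.8)
light = integrated_sim("C32", 0.5, "C21", 0.8)
```

With equal weights the measure places C31 closer to C21 than C36, and with equal path lengths it places the more heavily weighted of C31 and C32 closer to C21. Note that the logarithm is undefined when both weights are zero, and that the value is negative whenever the damped weights sum to less than 1.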
Considering multiple senses of the concepts, we improve the calculation equation as:

$$\mathrm{sim}(c_{is}^{t_s}, c_{jt}^{t_t}) = \max_{c_{is} \in \mathrm{sen}(t_s),\, c_{jt} \in \mathrm{sen}(t_t)} [\mathrm{sim}(c_{is}, c_{jt})], \qquad (12)$$

where $t_s$ and $t_t$ are the senses of concepts $c_{is}$ and $c_{jt}$. Then we calculate the maximum similarity value among all candidate concepts:

$$\mathrm{SIM} = \max [\mathrm{sim}(c_{is}^{t_s}, c_{jt}^{t_t})]. \qquad (13)$$

Thus, the similarity between two documents can be calculated by using the following formula:

$$\mathrm{sim}(\mathrm{Doc}(i), \mathrm{Doc}(j)) = \frac{\sum_{s=1}^{m} \sum_{t=1}^{n} \dfrac{\mathrm{sim}(c_{is}^{t_s}, c_{jt}^{t_t})}{\mathrm{SIM}}}{m \times n}. \qquad (14)$$

Once we get the similarity values between each pair of documents in the project collection and the domain expert collection, the matched results are ranked and returned to end users.

4. A Framework of Ontology-based KMS

Our ontology-based KMS encompasses four main modules, as shown in Figure 3: Ontology Building, Document Formalization, Similarity Calculation and User Interface.

Ontology Building: We adopt Protégé, developed by Stanford University, to build our domain ontologies. The concepts and relations are from the standard subject category of China.

Document Formalization: Using the ontologies that we have built, we formalize the documents containing information about projects and domain experts in terms of ontology concepts.

Similarity Calculation: By applying the proposed integrated method to the concept trees corresponding to projects and domain experts respectively, we calculate the similarities between them and then rank the candidate domain experts. As a result, the most appropriate domain expert can be obtained.

User Interface: This matching system implements the typical client-server paradigm. End users can access and query the system from the Internet, while domain experts or system administrators can manipulate the formalization and ontology building process.

[Figure 3: architecture diagram. Components: Expert Documents, Project Documents, Ontologies Building, Documents Formalization, Expert Concept Trees, Project Concept Trees, Ontology Library, Similarity Calculation, Database, Result List, User Interface, Internet Users.]

Fig. 3. The architecture for ontology-based KMS
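The Similarity Calculation module can be sketched in Python as below, following Equations (13) and (14). This is an illustration under assumptions rather than the system's implementation: `pair_sim` stands in for the concept-level measure of Equation (12), SIM is taken over all concept pairs between the project and the candidate experts (one reading of Equation (13)), the similarity values are assumed to be positive, and the toy concepts and scores are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PairSim = Callable[[str, str], float]


def doc_similarity(doc_i: Sequence[str], doc_j: Sequence[str],
                   pair_sim: PairSim, sim_max: float) -> float:
    """Equation (14): mean of concept-pair similarities scaled by SIM."""
    if not doc_i or not doc_j:
        raise ValueError("each document needs at least one concept")
    if sim_max <= 0:
        raise ValueError("SIM must be positive for the scaling to make sense")
    total = sum(pair_sim(cs, ct) / sim_max for cs in doc_i for ct in doc_j)
    return total / (len(doc_i) * len(doc_j))


def rank_experts(project: Sequence[str],
                 experts: Dict[str, Sequence[str]],
                 pair_sim: PairSim) -> List[Tuple[str, float]]:
    """Score every expert against one project and sort best-first."""
    # Equation (13), read here as the maximum over all candidate pairs.
    sim_max = max(pair_sim(cs, ct)
                  for concepts in experts.values()
                  for cs in project
                  for ct in concepts)
    scored = [(name, doc_similarity(project, concepts, pair_sim, sim_max))
              for name, concepts in experts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_sim(a: str, b: str) -> float:
    """Hypothetical stand-in for the concept-level similarity."""
    return 1.0 if a == b else 0.2


project = ["ontology", "retrieval"]
experts = {
    "expert_a": ["ontology", "retrieval"],
    "expert_b": ["databases", "networks"],
    "expert_c": ["ontology", "networks"],
}
ranking = rank_experts(project, experts, toy_sim)
```

Passing the concept-level measure in as a function keeps the ranking step independent of how the ontology is stored, so the same code would work with any of the three measures compared in the evaluation.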

5. Evaluation

We carry out a series of experiments to compare and evaluate the edge-based method, the node-based method and our integrated method. Generally, two measures, precision and recall, are used to evaluate the effectiveness of an information retrieval system. In our research, we also use these two measures to verify our ontology-based KMS. Let $R$ be the set of relevant documents, and $A$ be the answer set of documents. Precision and recall are defined as follows:

$$\mathrm{Precision} = \frac{|A \cap R|}{|A|} \times 100\%, \qquad (21)$$

$$\mathrm{Recall} = \frac{|A \cap R|}{|R|} \times 100\%. \qquad (22)$$

In the experiment, we collect around 300 domain experts (including professors, engineers, researchers, etc.) and over 500 projects within the domain of computer science and engineering. Table 1 shows the precision and recall results of the three methods for different numbers of projects. The comparison charts are given in Figures 4 and 5 respectively.

Table 1. Precision and recall comparison. E-based denotes the edge-based approach, N-based the node-based approach, and Integrated the integrated approach.

No.  Projects  Precision (%)                  Recall (%)
               E-based  N-based  Integrated   E-based  N-based  Integrated
1    100       20.65    25.99    30.71        30.56    32.28    39.04
2    200       22.32    25.85    28.93        31.00    33.98    34.73
3    300       27.55    19.32    32.79        23.56    30.46    42.92
4    400       20.38    27.61    31.59        30.87    35.43    32.96
5    500       23.40    23.44    29.63        33.70    43.75    49.74

[Figure 4: Precision Comparison Chart. Series: E-based, N-based, Integrated, plotted over test groups 1 to 5; vertical axis from 0.00% to 40.00%.]

Fig. 4. A comparison of precision among three different methods
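The precision and recall measures of Equations (21) and (22) can be computed as in the following Python sketch. The function name and the sample identifiers are hypothetical and are not data from the experiment.

```python
from typing import Set, Tuple


def precision_recall(relevant: Set[str], answer: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) in percent for one query."""
    hits = len(answer & relevant)  # |A intersect R|
    # An empty denominator is reported as 0.0 rather than raising (a choice of this sketch).
    precision = 100.0 * hits / len(answer) if answer else 0.0
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical expert identifiers for a single project.
relevant_experts = {"e1", "e2", "e3", "e4"}
returned_experts = {"e1", "e2", "e5"}
precision, recall = precision_recall(relevant_experts, returned_experts)
```

Here two of the three returned experts are relevant and two of the four relevant experts are returned, so precision is two thirds and recall is one half, both expressed in percent.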

[Figure 5: Recall Comparison Chart. Series: E-based, N-based, Integrated, plotted over test groups 1 to 5; vertical axis from 0.00% to 60.00%.]

Fig. 5. A comparison of recall among three different methods

6. Conclusions

In this paper, we present an ontology-based method to match projects and domain experts. The prototype system we developed contains four modules: Ontology Building, Document Formalization, Similarity Calculation and User Interface. Specifically, we discuss node-based and edge-based approaches to computing semantic similarity, and propose an integrated and improved approach to calculating the semantic similarity between two documents. The experimental results in Table 1 show that, for matching projects and domain experts, our integrated method reaches higher precision than the node-based and edge-based methods in all five test groups and higher recall in four of them.

As mentioned previously, only the simplest relation, the IS-A relation, is considered in our study. When dealing with a more complex ontology whose concepts are restricted by logic or axioms, our method is not powerful enough to describe the real semantic meaning by merely considering the hierarchical structure. So future work should focus on the other kinds of relations that are used in ontology construction. In other words, it will be exciting and challenging work for us to compute semantic similarity over various relations in the future.

References:

[1] N. Guarino, Understanding, Building, and Using Ontologies: A Commentary to Using Explicit Ontologies in KBS Development, International Journal of Human and Computer Studies, 46: 293-310, 1997.
[2] D.E. O'Leary, Enterprise Knowledge Management, Computer, 31(3): 54-61, 1998.
[3] M.S. Fox and M. Gruninger, Enterprise Modeling, AI Magazine, 19(3): 109-121, 1998.
[4] T.R. Gruber, A translation approach to portable ontologies, Knowledge Acquisition, 5(2): 199-220, 1993.
[5] S. Staab, H.-P. Schnurr, R. Studer, Y. Sure, Knowledge processes and ontologies, IEEE Intelligent Systems, 16(1): 26-34, 2001.
[6] N. Guarino, P. Giaretta, Ontologies and knowledge bases: Towards a terminological clarification, In N.J.I. Mars (Ed.), Towards Very Large Knowledge Bases, IOS Press, 1995.
[7] P. Resnik, Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language, Journal of Artificial Intelligence Research, 11: 95-130, 1999.
[8] J. Jiang and D. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, In Proceedings on International Conference on Research in Computational Linguistics, Taiwan, 1997, pp. 19-33.
[9] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), Canada, 1995, pp. 448-453.
[10] P. Resnik, Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language, Journal of Artificial Intelligence Research, 11: 95-130, 1999.
[11] C. Leacock and M. Chodorow, Filling in a sparse training space for word sense identification, ms, 1994.