White Paper

K@
A collaborative platform for knowledge management

Quinary SpA
www.quinary.com
via Pietrasanta 14
20141 Milano
Italia
t +39 02 3090 1500
f +39 02 3090 1501

Copyright 2004 Quinary SpA
Index

1. Summary
2. K@ main features
2.1 Document Management
2.2 User Tracking
2.3 Ontologies and Annotations
3. Technology
1. Summary

K@ (to be read "kat") is a collaborative web-based platform for knowledge management. With K@, users can access and share a common repository of documents, web links and notes, while the system keeps track of user interactions. It supports organisation along user-defined hierarchies of categories (the Knowledge Area Tree, KAT hereafter), against which documents are classified (manually, or automatically by means of external classifier engines), and provides a set of tools for users to browse and modify the KAT and to insert and search for documents.

It supplies the functionality of a standard document management system, in which documents can be uploaded or referenced by a URL, and also provides freely editable text areas for sharing comments and ideas in the form of Wiki pages.

K@ can maintain the association between documents and semantic annotations with respect to a formal ontology, according to Semantic Web standards.

It provides a number of tools that track users' actions and behaviour in order to provide a better user experience and to facilitate sharing.

K@ is actively maintained and extended, and is used both within Quinary and at selected end-user sites.
2. K@ main features

2.1 Document Management

K@ manages several KATs (classification trees), with versions; it supports multilingual names and descriptions and provides security management. It works with different document sources and files. It supports classification in more than one category, with a ranking parameter. It provides interfaces enabling the integration of external classifiers.

The search engine, based on Apache Lucene, is in charge of indexing documents of various formats, including MS Word, RTF and PDF. Searches can be filtered by context; a search engine manager has been integrated to provide concurrent searches on external engines.

Kzilla, an addon for Mozilla browsers, is also provided: it is a sidebar that communicates with K@ while you are browsing the web. It provides a list of related documents, enriching your browsing context with information from your internal repository.

2.2 User Tracking

K@ stores static and dynamic information about users and makes it available to support collaboration and ease of use.

A profiling system is in charge of tracking users' behaviour to create a better user experience: favourite areas and documents are rendered dynamically in context, giving fast access to a large repository; users can see who is doing what and look for areas and documents starting from other users' experience.

We are currently experimenting with a Collaborative Filtering Engine based on [CoFE], which uses data from user tracking to generate suggestions of interest ("this new document may be of interest for you"). Please note that, unlike conventional recommendation systems, we do not use explicit ratings but rather implicit ones derived from user behaviour.

2.3 Ontologies and Annotations

To allow connecting annotations to documents, the SemantiK plugin is provided. SemantiK is a platform featuring integration, searching, presentation and editing of knowledge expressed through the RDF language. It stores the whole set of RDF triples in the RDF repository [Sesame] in order to obtain inferencing capabilities.

Typically the ontology is domain specific and must be expressed in RDFS. Java classes reflecting the ontology can be defined in order to specify a particular rendering, searching or knowledge integration behaviour for instances of the corresponding RDF Class.

Semantic annotations can then be inserted manually by means of a web-based user interface, featuring an ontology-driven search engine that tries to lessen the burden of finding resources in the Knowledge Base (KB). For end users, the process consists of adding metadata to documents through smart web forms.

Annotations may also be created by means of external knowledge brokers, such as Information Extraction tools, dedicated services accessing external databases, or wrappers for HTML scrapers: SemantiK can push a document's content to the knowledge broker and store the RDF answer in the KB. Experimental activities with Knowledge Brokers are currently ongoing in the framework of the EC-sponsored DotKom project.

In any case, the Knowledge Integration layer of SemantiK is responsible for checking data being stored in the KB in order to avoid duplication of resources; it relies on a fuzzy measure of closeness between instances.

Annotations can be exploited in many ways: they can be browsed and queried through the K@ web interface; they enrich the document repository with a Semantic Web layer for external agents to access; moreover, a similarity measure based on semantic features can be computed between documents, and a classification engine based on it can be plugged into K@.

3. Technology

K@ is a standard J2EE web application, deployed in a Tomcat application server. It relies on several Java open source components. It requires a SQL DBMS; Oracle, MySQL and Firebird are currently supported.
The document repository is stored in the filesystem and is indexed by a search engine derived from Apache Lucene. K@ integrates JSPWiki via the XML-RPC protocol. The Semantic Annotation Module, called SemantiK, runs within K@ and communicates with Sesame through HTTP calls. Sesame stores RDF triples in an RDBMS.
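To give a feel for the kind of inferencing an RDF repository adds on top of stored triples, the following is a minimal in-memory sketch, not Sesame's API or K@ code. It applies two standard RDFS entailment rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, propagation of rdf:type to superclasses) until no new triple can be derived. The class and method names (RdfsSketch, Triple, infer) and the http://example.org/kb# namespace are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Illustrative sketch only: a tiny RDFS subclass reasoner, unrelated to Sesame's real API. */
final class RdfsSketch {

    // Standard RDF and RDFS property URIs.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf";

    /** One RDF statement (subject, predicate, object); hypothetical helper type. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) other;
            return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    /**
     * Returns the asserted triples plus every triple derivable with
     * rdfs11 (A subClassOf B, B subClassOf C => A subClassOf C) and
     * rdfs9 (x type A, A subClassOf B => x type B).
     */
    static Set<Triple> infer(Set<Triple> asserted) {
        Set<Triple> closure = new LinkedHashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            // Collect derivations separately so the set is not modified while iterating.
            List<Triple> derived = new ArrayList<>();
            for (Triple a : closure) {
                for (Triple b : closure) {
                    boolean chains = b.predicate.equals(RDFS_SUBCLASS_OF) && a.object.equals(b.subject);
                    if (!chains) {
                        continue;
                    }
                    if (a.predicate.equals(RDFS_SUBCLASS_OF) || a.predicate.equals(RDF_TYPE)) {
                        derived.add(new Triple(a.subject, a.predicate, b.object));
                    }
                }
            }
            // addAll reports whether anything new was added; stop at the fixpoint.
            changed = closure.addAll(derived);
        }
        return closure;
    }

    public static void main(String[] args) {
        String ex = "http://example.org/kb#"; // hypothetical namespace
        Set<Triple> asserted = new LinkedHashSet<>(Arrays.asList(
                new Triple(ex + "WhitePaper", RDFS_SUBCLASS_OF, ex + "Report"),
                new Triple(ex + "Report", RDFS_SUBCLASS_OF, ex + "Document"),
                new Triple(ex + "doc42", RDF_TYPE, ex + "WhitePaper")));
        Set<Triple> all = infer(asserted);
        // doc42 was only asserted to be a WhitePaper, yet it is found as a Document.
        System.out.println(all.contains(new Triple(ex + "doc42", RDF_TYPE, ex + "Document"))); // prints true
    }
}
```

With such a closure available, a query for all instances of a general class also returns resources annotated with any of its subclasses, which is the practical benefit of keeping annotations in a repository with inferencing rather than in plain tables. A production reasoner would handle more rules and use indexes instead of the nested loop shown here.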
K@ is known to work effectively with any recent browser (MS Internet Explorer, Firefox, Mozilla, Safari).

K@ supports output as RSS feeds and provides an interface to create custom feeds. XML import and output modules for KATs, documents and users are provided.

Figure 1 - K@ Architecture
[Diagram not reproduced. Labelled components: RSS Aggregator; Browser; Knowledge addon (Kzilla); Information Extraction Tools (Gate); semantic web; recommender system (CoFE); external classifier; K@; user module; search engine (Lucene); SemantiK; RDF repository/reasoner (Sesame); J2EE App. Server (Tomcat); Wiki (JSPWiki); document filters; filesystem (search indexes); filesystem (docs repository); RDBMS (metadata, users db, Sesame triples); filesystem (wiki repository).]

References

CoFE - COllaborative Filtering Engine: http://eecs.oregonstate.edu/iis/cofe
DotKom: http://www.dot-kom.org
Firebird: http://firebird.sourceforge.net/
Gate - General Architecture for Text Engineering: http://gate.ac.uk
JSPWiki: http://www.jspwiki.org/
Lucene: http://jakarta.apache.org/lucene
MySQL: http://www.mysql.com
Quinary, K@, SemantiK: http://www.quinary.com
Semantic Web, RDF, RDFS: http://www.w3.org/2001/sw/
Tomcat: http://jakarta.apache.org/tomcat