Ontology Based Knowledge Discovery in Social Networks


JRC Joint Research Center
European Commission, Institute for the Protection and Security of the Citizen (IPSC)
Support to External Security (SES), T.P.267, I - 21020 Ispra (VA)
Web Technology Sector - EMM (Europe Media Monitor) Action http://wt.jrc.it/
Language Technology Activities in the Web Technology Sector http://www.jrc.cec.eu.int/langtech/index.html

Ontology Based Knowledge Discovery in Social Networks
Pinar Oezden Wennerberg
Final Report
Traineeship Start: 15. April 2005
Traineeship End: 30. September 2004
Scientific Supervisor: Clive Best

Abstract

Ontology is a term that appears in contexts as diverse as computer science, linguistics and philosophy. In computer science, ontologies are understood as devices that bring a machine-readable conceptual structure to a domain of interest. As such, they form the logical component of so-called "knowledge bases". Ontologies have recently become popular in the context of the Web, where they are expected to provide the current Web with "meaning" and so help create the future "Semantic Web". A social network is a model in which social entities such as people and organisations, happenings such as events, and locations are connected to each other by certain relationships at a certain time. Ontology based social network models help make explicit the relationships between these entities that may not be obvious at first glance, thereby enabling so-called knowledge discovery. The knowledge discovered can then be stored in a knowledge base, which can be used to track, among other things, security issues on the (Semantic) Web. This report summarizes such a system: an ontology based knowledge base that has been developed for the purpose of tracking terrorism related data on the Web.

Contents

1. A Brief Introduction to Ontological Engineering
1.1. How do we machine-process information? Statistical vs. Knowledge Representation Approaches
1.2. Ontologies
1.3. Knowledgebases vs. Databases
1.4. Languages for Ontology Construction and Editing
1.5. Environments for Building Ontologies
1.6. Some Application Areas of Ontologies: The Semantic Web Vision
2. Social Network Analysis
2.1. What is social network analysis?
2.2. Why is social network analysis necessary?
2.3. Ontologies and Social Networks
3. Security Knowledge Base Extension for the Europe Media Monitor (EMM)?
3.1. The ontology component
3.2. The data component
3.3. The System Design
3.4. The application
4. Research Challenges
Appendix

1. A Brief Introduction to Ontological Engineering

1.1. How do we machine-process information? Statistical vs. Knowledge Representation Approaches

There are several approaches to machine-processing information, among them the statistical approaches and the knowledge representation approaches. The major difference between the two concerns how they handle incoming or already existing data. The statistical approach makes no use of structural information. Systems developed according to this approach, such as those that involve machine learning methods, analyse a given piece of data and try to learn rules from the regularities in it. In a next step, the learned rules are applied to new incoming data. The knowledge representation approach, on the other hand, does make use of structural representation. Systems based on this approach have an internal representational structure of what the data to be processed ought to look like, and incoming data is processed according to this structure. The second approach therefore implies knowing beforehand, whereas the first implies learning.

Both approaches have advantages and disadvantages, and the choice depends on the purpose of the intended application, the context of application, user requirements etc. Deciding for one approach does not necessarily exclude considering the other. Statistical approaches have the advantage that they can handle any type of information, they scale, and they are language independent because they are free of any structure. Yet they are error prone and ambiguous, and they cannot deduce additional information from existing information, again because they lack structure and the inference mechanism that will be described soon. In contrast, knowledge representation approaches are very accurate and explicit, and they can deduce additional information from existing information as a consequence of having structure.
For the same reason, however, they hardly scale, are time and effort consuming, and are language dependent.

1.2. Ontologies

An ontology is a specification of a conceptualization [1], whereby a conceptualization is a collection of objects, concepts and other entities that are presumed to exist in some domain and that are tied together by some relationships. A conceptualization is a simplified view of the world, a way of thinking about some domain. Ontologies belong to the knowledge representation approaches discussed above, and they aim to provide a shared understanding of a domain for both computers and humans. An ontology describes a domain of interest in such a formal way that it can be processed by computers. The outcome is that the computer system knows about this domain. An ontology is a formal classification schema, which has a hierarchical order and which is related to some domain.

An ontology forms the logical component of a knowledge base. Typically, a knowledge base consists of an ontology, some data and an inference mechanism. The ontology, as the logical component, defines rules that formally describe what the field of interest looks like. The data can be any data related to this field of interest that is extracted from various resources such as databases, document collections, the Web etc. The inference mechanism applies rules in the form of axioms, restrictions, logical consequences and various other methods, based on the formal definitions in the ontology, to the actual data in order to produce more information from the existing information. For example, the ontology component of a family knowledge base may say that a mother is a person who has at least 2 children. The actual data, say extracted from some database, may contain the information that Queen Silvia of Sweden has 2 children. On the basis of this data and the formal definition of mother, the knowledge base would infer that Queen Silvia of Sweden is a mother.

1.3. Knowledgebases vs. Databases

Even though knowledge bases and databases have certain similarities, they also differ significantly. One major difference is that knowledge bases include inference mechanisms, which gain information from the information already present in the storage by running rules over the stored data. Databases lack this facility, as they do not include any rules. Another difference is that knowledge bases for Web applications usually make the open world assumption, in contrast to the closed world assumption of databases. According to the open world assumption, if some data is not found in the knowledge base, we are not supposed to assume that the data does not exist at all. This assumption conforms to the very nature of the Web. On today's Web there are billions of documents. It may not be possible for every Web application to be aware of each existing document, but this does not imply that the documents do not exist. The closed world assumption, on the other hand, asserts that if some data is not found in the storage, then this data does not exist.

1.4. Languages for Ontology Construction and Editing

Ontologies are formal theories about a specific domain; therefore they require a formal logical language to express them.
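To give a rough idea of what a formal definition buys, the mother rule from Section 1.2 and the open versus closed world readings from Section 1.3 can be sketched in a few lines of Python. This is only an illustration: the property name `numberOfChildren`, the fact store and both functions are assumptions made for the example, not part of the system described in this report.

```python
# Hypothetical fact store: (subject, property) -> value, as might come from a database.
facts = {("Queen Silvia of Sweden", "numberOfChildren"): 2}


def is_mother(person, facts):
    """Apply the rule 'a mother is a person who has at least 2 children'.

    Open world reading: when no child count is stored, the answer is
    None (unknown) rather than False. A stored count is taken as complete,
    which is a simplification made for this sketch.
    """
    children = facts.get((person, "numberOfChildren"))
    if children is None:
        return None
    return children >= 2


def is_mother_closed_world(person, facts):
    """Closed world reading: whatever cannot be derived is treated as false."""
    return is_mother(person, facts) is True


known = is_mother("Queen Silvia of Sweden", facts)
unknown_open = is_mother("Jane Doe", facts)
unknown_closed = is_mother_closed_world("Jane Doe", facts)
print(known, unknown_open, unknown_closed)
```

For a person the store knows nothing about, the open world function answers "unknown", whereas the closed world function answers "no".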
Most languages for formalizing ontologies seem to have emerged from one of two approaches: first-order predicate logic (FOL) and XML-RDF. Languages based on the former are more generic, whereas XML-RDF based languages such as OWL (Web Ontology Language, http://www.w3.org/2002/07/owl) are specific to the development of Web ontologies. Even though RDF (Resource Description Framework, http://www.w3.org/rdf/) is commonly referred to as a language, it is rather a data model that is independent of any domain or implementation [2], [3]. It has been developed to provide meaning to Web documents by decorating them with metadata in order to achieve terminological consensus on the Web. XML fell short for this purpose, as it only helps with the syntactic structuring of documents. Researchers therefore set out to develop languages that support semantics and that build on XML to benefit from its advantages, such as its syntactic structure. The RDFS and OWL languages are outcomes of this effort. Both are based on RDF.

As a data model, RDF is graph based and consists of nodes and edges. Nodes correspond to objects or resources, and edges correspond to properties. The labels on the nodes and on the edges are Uniform Resource Identifiers (URIs). Resources are all things being described by RDF expressions. A resource may be an HTML document, a part of a Web page (e.g. a specific HTML or XML element within the document source) or a collection of pages (e.g. an entire Web site). Properties are specific attributes that describe resources, and they have a defined meaning. A property together with its value for a specific resource makes a statement about that resource. A statement thus consists of a specific resource, a named property and the value of that property for that resource. An RDF statement is therefore a triple, whose parts are the subject, the predicate and the object. The object of a statement, that is the property value, can be another resource specified by a URI, or it can be a literal, for example a simple string or some other primitive datatype defined by XML. Reification is possible in RDF, so statements can be made about statements. Detailed documentation of RDF can be found in the World Wide Web Consortium (W3C) RDF Primer [3].

RDF itself does not define any primitives for creating ontologies, but it provides the basis for several ontology definition languages such as RDFS. RDF Schema or RDFS [4] has been developed in order to define the vocabulary used in RDF data models by specifying which kinds of properties apply to which kinds of objects, what values the objects can take and what kinds of relations exist between those objects. RDFS is therefore considered a first move towards an ontology language for the Web. RDFS offers a fixed set of modelling primitives, such as rdfs:Class, rdf:Property or the rdfs:subClassOf relationship, to define RDF vocabularies for a specific application. In RDFS it is possible to define classes of classes, classes of properties, classes of literals (strings, integers, booleans and so forth) and classes of statements.
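The triple structure described above can be made concrete with a minimal Python sketch. The `http://example.org/` namespace, the resource names and the `match` helper are hypothetical and chosen only for illustration. Only the `rdf:type` URI is taken from the RDF vocabulary.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace for the example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(NamedTuple):
    """A literal object value, kept distinct from URI strings."""
    value: str


# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "index.html", EX + "creator", EX + "staff/85740"),    # object is a resource
    (EX + "index.html", EX + "title", Literal("Example page")),  # object is a literal
    (EX + "staff/85740", RDF_TYPE, EX + "Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


about_page = match(triples, s=EX + "index.html")
typed = match(triples, p=RDF_TYPE)
print(len(about_page), typed)
```

Asking for all statements about one subject, or for all statements with a given predicate, amounts to following the edges of the RDF graph from or through a node.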
Using the properties rdf:type, rdfs:subClassOf and rdfs:subPropertyOf, it is possible to define, respectively, the instance-of relationship between resources and classes, the subsumption relationship between classes and the subsumption relationship between properties. Using the rdfs:domain and rdfs:range properties, it is possible to restrict the resources that can be subjects or objects of a property.

As mentioned, RDFS is regarded as only a first move towards an ontology language, because it is not considered expressive enough to qualify as a full ontology language. There are a number of things that cannot be said in RDFS. For example, disjoint, union, intersection and complement classes cannot be defined, cardinality restrictions are not present, and properties cannot be declared as transitive, symmetric or inverse of each other. Yet researchers have determined that such features are essential for an ontology language if it is to provide efficient reasoning support. They have therefore set out to develop a more expressive ontology language, and OWL is the result. It is an outcome of the collaborative efforts of US American and European researchers, whose goal has been to develop an ontology language other than RDFS that can be commonly adopted and that will facilitate semantic interoperability on the Web. The Web Ontology Working Group of the World Wide Web Consortium (W3C) describes OWL as a language designed for use by applications that need to process the content of information instead of just presenting information to humans [5].

OWL ontologies have three components: classes, individuals (also called instances) and properties. In other formalisms, properties are sometimes called roles, relations or attributes. OWL classes are interpreted as sets that contain individuals. Classes can be organised into a superclass-subclass hierarchy.
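How rdf:type and rdfs:subClassOf interact in such a hierarchy can be sketched as follows. The class names (`ex:Mother`, `ex:Parent`, `ex:Person`) and both helper functions are hypothetical. The sketch covers only the propagation of class membership along subclass links, not the full RDFS or OWL semantics.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical vocabulary and data, written with prefixed names for brevity.
triples = {
    ("ex:Mother", SUBCLASS_OF, "ex:Parent"),
    ("ex:Parent", SUBCLASS_OF, "ex:Person"),
    ("ex:anna", TYPE, "ex:Mother"),
}


def superclasses(cls, triples):
    """Collect every class reachable from cls over rdfs:subClassOf links."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def inferred_types(resource, triples):
    """Direct types of a resource plus all superclasses of those types."""
    direct = {o for s, p, o in triples if s == resource and p == TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result


anna_types = inferred_types("ex:anna", triples)
print(sorted(anna_types))
```

Only the membership in `ex:Mother` is stated explicitly; the memberships in `ex:Parent` and `ex:Person` follow from the hierarchy.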
When a class is declared to be a subclass of another, every instance of the first class is also an instance of the second. In OWL DL, the superclass-subclass relationships can be computed automatically by an inference mechanism. Classes can be declared to be union, intersection and complement classes. They can also be equivalent to each other. Finally, there are enumerated classes in OWL, which are defined by listing precisely the individuals that are the members of the class. Exactly these individuals make up the class. For example, the class Kansas City Jazz Musicians can be defined as being made up of exactly the individuals Count Basie and Dizzy Gillespie.

OWL individuals are the objects of the domain we are interested in. In the example above, Count Basie and Dizzy Gillespie are some of the individuals of our domain, say, the domain of Jazz Musicians. Further individuals could be Billie Holiday, Miles Davis, Thelonious Monk, Duke Ellington and so forth.

OWL properties are binary relations on individuals, i.e. they link two individuals together. There are two types of properties in OWL. Object properties relate objects to other objects, as in Chet Baker plays Instrument Trumpet. Datatype properties relate objects to datatype values, as in Chet Baker died at the Age of 59. As in RDFS, properties in OWL have domains and ranges. Like classes, OWL properties may have subproperties, so that it is possible to form hierarchies of properties. For example, the property is Jazz Musician may have the more specific property is West Coast Jazz Musician as its subproperty.

Restrictions in OWL are the quantifier restrictions, the has-value restriction and the cardinality restrictions. The quantifier restrictions are declared using the two OWL constructs owl:allValuesFrom (semantically equivalent to the universal quantifier ∀) and owl:someValuesFrom (semantically equivalent to the existential quantifier ∃). The has-value restriction is declared using the construct owl:hasValue (∋).
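The three restriction constructs just named can be read as membership tests, as the following Python sketch suggests. The class `BrassInstrument`, the property `playsInstrument`, the data and the function names are assumptions made for the example. The checks run over the known data only, so they illustrate the meaning of the quantifiers rather than what an OWL reasoner does under the open world assumption.

```python
# Hypothetical class memberships and property links, loosely following the jazz examples.
members = {"BrassInstrument": {"Trumpet", "Trombone"}}
links = {
    ("Chet Baker", "playsInstrument"): {"Trumpet"},
    ("Duke Ellington", "playsInstrument"): {"Piano"},
}


def values(individual, prop):
    """Known values of a property for an individual (empty set if none)."""
    return links.get((individual, prop), set())


def some_values_from(individual, prop, cls):
    """Existential reading: at least one known value belongs to cls."""
    return any(v in members[cls] for v in values(individual, prop))


def all_values_from(individual, prop, cls):
    """Universal reading: every known value belongs to cls (true if there are none)."""
    return all(v in members[cls] for v in values(individual, prop))


def has_value(individual, prop, value):
    """Has-value reading: the given value is among the known values."""
    return value in values(individual, prop)


chet_some = some_values_from("Chet Baker", "playsInstrument", "BrassInstrument")
duke_all = all_values_from("Duke Ellington", "playsInstrument", "BrassInstrument")
nobody_all = all_values_from("Miles Davis", "playsInstrument", "BrassInstrument")
print(chet_some, duke_all, nobody_all)
```

Note that the universal check succeeds for an individual with no recorded values at all, which is the usual vacuous truth of the universal quantifier.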
owl:hasValue is a restriction on the value that some property can take, made by specifying exactly what that value is. For example, is the city of Olympic Games 2004 owl:hasValue Athens. Using the cardinality restrictions on properties, we can describe the class of individuals that have at least (≥), at most (≤) or exactly (=) a specified number of relationships with other individuals or datatype values. Properties in OWL can be declared to be transitive, like the is Older than property, symmetric, like the is Married To property, or functional, which states that a property has at most one value, such as the property age.

One benefit of writing ontologies in OWL is that they can be processed by an inference mechanism, i.e. by a reasoner. A reasoner can check for subsumption relations in OWL ontologies and compute the inferred class hierarchy. It can also check OWL ontologies for consistency and determine whether or not it is possible for a class to have any instances.

1.5. Environments for Building Ontologies

Protégé 2000, which has been developed by Stanford's Medical Informatics Section in the USA, is one of the most commonly used editors. It is compatible with the latest standards in the field of ontological engineering. Protégé allows direct editing in OWL and has a well developed import and export mechanism for OWL and for other recent ontology languages. It is open source and can be freely obtained from the World Wide Web. It can be used on various operating systems such as Windows and Unix/Linux.

Protégé 2000 is a computer program that is installed on the local computer, and it can be downloaded as freeware from the Protégé 2000 Website (http://protege.stanford.edu). It is available on different platforms such as Windows, Mac OS, Solaris, Linux and Unix, and its capabilities can be extended by downloading various plug-ins designed for the tool. Classes (or concepts) of the domain to be modelled are visualized in a taxonomic hierarchy in Protégé. It is possible to define the instances of the model, so that for each class the associated instances can be created directly in the model. The instances automatically become related to their classes by the instance-of relationship. Slots in Protégé describe properties of classes and instances. Facets specify constraints on allowed slot values. Axioms and rules cannot be represented explicitly; extra plug-ins need to be downloaded for this. Protégé does not allow synchronous editing of an ontology by multiple users, yet it is possible to import and export ontologies in different formats such as text files, database tables and RDF files.

Since OWL has become the standard ontology language for the Web, Protégé supports the editing of OWL ontologies through an OWL plug-in. This can be downloaded separately and integrated into the editor. The primitives of the OWL language thus become available in Protégé for producing OWL ontologies. The reasoner RACER provides reasoning support for Protégé. This tool can be downloaded separately to the local computer. When it is run, it checks the consistency of the ontologies created with Protégé and infers the classification tree of the ontology based on the subclass-superclass relationships. Several mailing lists exist, such as protégé-users, protégé discussion and protégé-beta, which are very active and helpful for developers.

Installation of Protégé is fairly easy; the instructions are given on the Protégé website. Once the program is installed, double click on the icon to start it. In order to work with OWL models, the Beta version of Protégé needs to be installed. The program installs automatically and comes with all the necessary plug-ins and extensions. To edit OWL ontologies, choose the create OWL model option. This activates additional facilities that make use of the OWL language.

1.6. Some Application Areas of Ontologies: The Semantic Web Vision

The future Semantic Web is envisioned as an extension of the current Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [6]. As such, the Semantic Web should be a place where information can be better discovered, automatically processed, integrated and shared across various applications. The precondition for the Semantic Web is seen as providing the documents on today's Web with machine processable content. In other words, Web documents should be furnished with information whose context dependent meaning can be interpreted by software programs and applications. There is a list of expectations of tomorrow's Semantic Web. It should understand meaning and user background, it should enable interoperability between heterogeneous applications, and it should provide a platform for intelligent web agents and adaptive web systems to operate on. Eventually, it should require less human intervention. The ultimate goal set for the Semantic Web is thus that it should assist human users in their daily online activities by exhibiting a higher level of intelligence.

2. Social Network Analysis

2.1. What is social network analysis?

Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, animals, computers or other information/knowledge processing entities. The nodes in the network are the people and groups, while the links show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships [7].

Examples of such relations among people:
- Kinship such as mother of, wife of
- Other role-based such as boss of, teacher of, friend of
- Cognitive such as knows, aware of
- Affective such as likes, trusts
- Interactions such as give advice, talks to, fights with etc.

Examples of such relations between people and organisations:
- Direct relations such as buy from / sell to, leases, owns shares of, subsidiary of, regulates, organizes, is leader of, is founder of etc.
- Relations via their members such as belongs to, is affiliated to, is member of etc.

Typically, events, organizations and people are related to locations: people reside at or travel to places, organizations are located at places, events happen at places etc. Hence, in our model of social networks, locations are also part of the network model. Another important aspect of the network design is that it needs to take the notion of time into account. Events happen at a given time, and states of affairs evolve over time: locations present now may not exist later, people who are alive today may be dead tomorrow, and organizations that did not exist before may come into being today.

2.2. Why is social network analysis necessary?

The relationships between the social entities, events and locations may not be obvious at first glance.
Social network analysis helps to discover such (hidden) relationships by stating them explicitly in the form of a network model. For example, from distributed resources we may collect the following information:

Abdullah Al Reshood roommateof Aafia Siddique
Aafia Siddique ismemberof Al Queda
Aafia Siddique traveledto Pakistan
Mounir-al Motassad traveledto Pakistan

If we model this information in a social network model, we may come up with possible relationships that could be tracked further, such as:

Abdullah Al Reshood isrelatedto Al Queda?
Al Queda isrelatedto Pakistan?
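The step from collected statements to candidate relationships can be sketched in Python as follows. The two rules encoded in `candidate_relations` are assumptions made for this illustration (they are one possible way to arrive at the two questions above), and the results are candidates for further tracking, not established facts.

```python
# The collected statements, written as (subject, relation, object) triples.
facts = [
    ("Abdullah Al Reshood", "roommateof", "Aafia Siddique"),
    ("Aafia Siddique", "ismemberof", "Al Queda"),
    ("Aafia Siddique", "traveledto", "Pakistan"),
    ("Mounir-al Motassad", "traveledto", "Pakistan"),
]


def candidate_relations(facts):
    """Propose isrelatedto links from two-step paths in the network.

    Both rules are hypothetical examples of discovery rules.
    """
    candidates = set()
    for a, p1, b in facts:
        for c, p2, d in facts:
            # Rule 1: a roommate of a member may be related to the member's organisation.
            if p1 == "roommateof" and p2 == "ismemberof" and b == c:
                candidates.add((a, "isrelatedto", d))
            # Rule 2: an organisation may be related to places its members travel to.
            if p1 == "ismemberof" and p2 == "traveledto" and a == c:
                candidates.add((b, "isrelatedto", d))
    return candidates


candidates = candidate_relations(facts)
for triple in sorted(candidates):
    print(triple)
```

The statement about Mounir-al Motassad produces no candidate here, because neither rule connects a traveller to an organisation without a membership statement.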

Hence, knowledge discovered in this fashion, i.e. Knowledge Discovery, could be used as input to discover further information related, but not limited, to security issues such as terrorism on the Web. Going one step further, the information discovered can be visualized using relevant software to provide convenient and user friendly navigation of the whole model.

Figure 1: Possible visualization of the social networks, e.g. http://www.trackingthethreat.com/flash2/flash.html

2.3. Ontologies and Social Networks

Modeling of social networks can be aided by ontologies for several reasons. First, ontologies are commonly deployed for the specification and explication of concepts and relationships in a given domain. Social networks have the same purpose, but with the focus on social relations and entities; hence domain ontologies related to social entities and relations can be designed and deployed. Second, through reasoning and inference, ontologies do not allow the modeling of contradictory or inconsistent information. Modeling social networks via ontologies therefore helps to ensure the validity of the information encoded. Third, ontologies, together with the inference mechanism, enable information gain by deploying rules to infer new information. The inference mechanism can be run over ontology based social networks to derive new relations and concepts from those already existing between the social entities, i.e. people, organisations, events and locations.

3. Security Knowledge Base Extension for the Europe Media Monitor (EMM)?

The idea is to extend the EMM with a knowledge base for the security domain, starting with terrorism related data. EMM offers a vast collection of Web news articles and is thus a very valuable resource for discovering knowledge about, among other things, terrorism. The knowledge base shall include data about people, events, locations and organizations related to terrorism, and as such it shall offer reference services for the users. That is, while reading the online news texts collected by EMM, users should be offered the opportunity to learn more about an entity of interest. For instance, when they come across named entities in the text such as Al-Qaeida, Al Zarkawi, Bagdad etc., users shall have the opportunity to learn what these entities are (people, organizations, events or locations), what they do, in what ways they are related to each other etc. Hence, starting from, say, one name in an online news article, users shall be able to enter the knowledge base and explore the entire social network.

Below is the documentation of the knowledge base that has been initiated for the purposes discussed. The knowledge base deploys four ontologies encoded in OWL to model the social network of people, organisations, events and locations. The actual data is extracted from an MSAccess database that includes tables related to people, events, locations, organizations and the relations between them. The inference mechanism of the knowledge base deploys backward chaining inferencing to produce new knowledge from the ontologies plus the database. The application is implemented using the powl - Semantic Web Development Plattform (http://powl.sourceforge.net/) and the RAP - RDF API for PHP V0.9.2 (http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/), using PHP/5.0.4 (http://www.php.net/).

3.1. The ontology component

There are four ontologies that separately model the four components of the social network, i.e. there is one ontology for people, one for events, one for organizations and one for locations. Each ontology is encoded in OWL using Protégé 2000 and has been checked for consistency using RACER. Please refer to Appendix A for the full documentation of the four ontologies. Each ontology contains classes, relationships and attributes related to its subdomain. New classes, relationships and attributes can be added to each ontology, e.g. by using Protégé 2000, so each model can be extended with as many relationships and classes as needed. The ontologies are located under C:\MyOntologies\PeopleOntologyPackage. Additionally, in each ontology, every class and every property is explained through comments. These comments give information about the purpose and characteristics of each class and property.

3.2. The data component

The actual data is extracted from an MSAccess database. The database holds several tables related to people, locations, events and organizations, and several other tables about the relationships between them. The database is queried using SQL to extract parts of the information.

3.3. The System Design

The entire system consists of several steps, each realized by a PHP script. Each script has its own task and typically produces output that serves as the input to another script. The knowledgebase resides on an Apache HTTP Server, where C:\webs\test is the Web directory. All PHP scripts, all input/output files and the MS Access database also reside in this directory. A README file in the directory explains the steps of the whole application and the function of every script.

Figure 2: The knowledge base resides on localhost, which is an Apache HTTP server. Technical data: Apache/2.0.54 (Win32) PHP/5.0.4 Server at localhost Port 80.

The system starts with a PHP script that reads both the ontologies and the MS Access database into a single memory model. This way the formal specification of each component (people, events, locations, organisations) and the actual data from the database are combined in the same model, which is an RDF model. To achieve this, the PHP script converts the data in MS Access to RDF. There is no need to convert the ontologies encoded in OWL to RDF, as every OWL model is by its nature an RDF model. Finally, the script outputs RDF files that include data both from the ontologies and from the database.
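The conversion step of the first script can be illustrated with a minimal sketch. It is written in Python rather than the PHP and RAP API used by the system, and it is not the authors' code: the namespace `http://example.org/kb#`, the column names, the `id` key column and the sample row are all hypothetical. It only shows the general idea of turning one relational row into subject-predicate-object statements and merging them with ontology statements in one model.

```python
# Minimal sketch of "database rows -> RDF triples", merged with ontology triples.
# BASE, the column names and the sample row are hypothetical, not from the paper.
BASE = "http://example.org/kb#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def row_to_triples(row, class_name, key_column="id"):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    subject = BASE + f"{class_name}_{row[key_column]}"
    triples = [(subject, RDF_TYPE, BASE + class_name)]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is part of the subject URI; NULL values yield no statement
        triples.append((subject, BASE + column, str(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (minimal escaping only)."""
    lines = []
    for s, p, o in triples:
        if o.startswith("http://"):
            obj = f"<{o}>"  # treat the object as a resource
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'  # treat the object as a plain literal
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# One statement standing in for the ontology, one row standing in for the database.
ontology = {(BASE + "Terrorist", RDFS_SUBCLASS_OF, BASE + "Person")}
rows = [{"id": 7, "has_alias": "Alias A", "has_date_of_death": None}]
data = {t for row in rows for t in row_to_triples(row, "Terrorist")}

model = ontology | data  # schema and instance data share a single model
print(to_ntriples(sorted(model)))
```

Because both the ontology and the converted rows are plain sets of triples, combining them is a set union, which mirrors the point made above that the OWL files need no conversion of their own.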

Figure 3: The RDF triples stored in the memory model created by the PHP API. The model includes the formal definitions from the ontologies and the data from the Access database, converted to RDF.

The second PHP script reads the RDF files output by the first script, but this time it holds them in a memory model with a backward chaining inference facility. This is the step where the formal definitions in the ontologies, i.e. the rules, are run over the actual data from the database to infer additional knowledge. This second script includes RDQL queries, which are SQL-like queries, and runs them against the inference model. Hence, the queries extract the parts of the information we are interested in and, at the same time, trigger the rules to see what additional information can be derived. The script finally outputs smaller RDF files that contain the extracted information, for example all data about a given terrorist, all data about a given terrorist event or all data about a given location.
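The idea of backward chaining, where rules are applied only when a query needs them, can be sketched as follows. This is a Python illustration under stated assumptions, not RAP's inference engine and not RDQL. The property characteristics are taken from Appendix A (member_of is transitive, sibling_of and spouse_of are symmetric, parent_of and child_of are inverses); the function name `holds` and the sample facts are hypothetical.

```python
# Goal-directed (backward chaining) proof of a single triple.
# Property characteristics follow Appendix A; names of individuals are made up.
SYMMETRIC = {"sibling_of", "spouse_of"}
TRANSITIVE = {"member_of"}
INVERSE = {"parent_of": "child_of", "child_of": "parent_of"}


def holds(facts, s, p, o, _seen=None):
    """Return True if (s, p, o) is stated in facts or derivable from the rules."""
    seen = _seen if _seen is not None else set()
    goal = (s, p, o)
    if goal in seen:
        return False  # goal already visited in this proof attempt; avoids endless loops
    seen.add(goal)
    if goal in facts:
        return True  # stated explicitly, no rule needed
    if p in SYMMETRIC and holds(facts, o, p, s, seen):
        return True  # p(s, o) follows from p(o, s)
    if p in INVERSE and holds(facts, o, INVERSE[p], s, seen):
        return True  # p(s, o) follows from inverse_p(o, s)
    if p in TRANSITIVE:
        # p(s, o) follows from a stated p(s, x) together with a provable p(x, o)
        for fs, fp, fx in facts:
            if fs == s and fp == p and holds(facts, fx, p, o, seen):
                return True
    return False


facts = {
    ("person_1", "member_of", "cell_a"),
    ("cell_a", "member_of", "org_x"),
    ("person_1", "sibling_of", "person_2"),
    ("person_3", "parent_of", "person_1"),
}

print(holds(facts, "person_1", "member_of", "org_x"))    # derived via transitivity
print(holds(facts, "person_2", "sibling_of", "person_1"))  # derived via symmetry
print(holds(facts, "person_1", "child_of", "person_3"))   # derived via the inverse
```

In contrast to forward chaining, nothing is materialized in advance: each query starts from the goal and works backwards to the stated facts, which is why the inference step and the query step coincide in the second script.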

Figure 4: The results of the RDQL queries as RDF triples. The triples are first stored in a memory model with a backward chaining inference facility created by the PHP API. RDQL queries are then run on the model to return the results above.

The third PHP script reads the output RDF files from their respective directories and serializes them to strings. This serialization is necessary to make the application more user-friendly: common users are not familiar with RDF models, and we believe RDF syntax is not convenient to read. Therefore this PHP script converts the RDF model into a string, which feeds into the query interface to deliver the information the user asked for in an easy-to-read format. A fourth, optional script inserts all the RDF triples into a MySQL database for storage purposes. Once an empty database has been created and the script is called, the tables that hold the triples of the model are created automatically.
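The storage step can be sketched in a few lines. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `statements` and its columns are assumptions chosen for illustration; the actual tables created by the fourth script may be laid out differently.

```python
import sqlite3

# Stand-in for the MySQL storage step; table and column names are hypothetical.


def store_triples(conn, model_name, triples):
    """Create the statement table if needed and insert the triples of one model."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS statements (
               model     TEXT NOT NULL,
               subject   TEXT NOT NULL,
               predicate TEXT NOT NULL,
               object    TEXT NOT NULL,
               UNIQUE (model, subject, predicate, object)
           )"""
    )
    conn.executemany(
        "INSERT OR IGNORE INTO statements VALUES (?, ?, ?, ?)",
        [(model_name, s, p, o) for s, p, o in triples],
    )
    conn.commit()


def find(conn, model_name, subject=None, predicate=None, obj=None):
    """Return the triples of a model that match the given pattern (None = wildcard)."""
    sql = "SELECT subject, predicate, object FROM statements WHERE model = ?"
    params = [model_name]
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    return conn.execute(sql + " ORDER BY subject, predicate, object", params).fetchall()


conn = sqlite3.connect(":memory:")
triples = [
    ("person_1", "member_of", "org_x"),
    ("person_1", "has_alias", "Alias A"),
    ("org_x", "has_location", "city_1"),
]
store_triples(conn, "people", triples)
store_triples(conn, "people", triples)  # inserting again does not duplicate rows
print(find(conn, "people", subject="person_1"))
```

Storing one statement per row keeps the schema independent of the ontologies: new classes or properties need no schema change, only new rows.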

Figure 5: mysql is the name of the database that holds the RDF models created by the previous scripts.


Figure 6: The RDF triples of each model are inserted into tables in subject-predicate-object form.

3.4. The Application

The application starts with the query interface. It is a simple HTML form that consists of four option menus corresponding to the four ontologies: people, events, organizations and locations. The options in each menu reflect the classes of the corresponding ontology, so that when an option is selected the instances of that class are delivered. Upon selecting an option, the user enters the knowledgebase through an instance, e.g. through the information about one terrorist or one event. The information is encoded in HTML files, one per instance. In other words, for every instance (every terrorist, event, location or organization) there exists an HTML file that holds all the necessary information related to that instance. In each of these HTML files the classes, attributes and relationships are highlighted. The classes are additionally converted from simple strings to hyperlinks in order to provide navigation to the more generic and more specific classes of the current ontology, as well as to the classes of the other ontologies, which can again be specific or generic. In this way the user can navigate the whole knowledgebase, exploring the connections between the ontologies that represent the components of the social network, i.e. people, events, locations and organizations. The following figures demonstrate the separate steps of the application:

Figure 7: People options menu of the query interface, which allows users to enter the knowledgebase through a selected person. The menu options correspond to the classes of the People ontology.

Figure 8: Events options menu of the query interface, which allows users to enter the knowledgebase through a selected event. The menu options correspond to the classes of the Events ontology.

Figure 9: Locations options menu of the query interface, which allows users to enter the knowledgebase through a selected location. The menu options correspond to the classes of the Locations ontology.

Figure 10: Organisations options menu of the query interface, which allows users to enter the knowledgebase through a selected organisation. The menu options correspond to the classes of the Organisations ontology.

Once an option from the menu (i.e. a class from the ontology) is selected, the application delivers the instances of that class. Here all instances of the class Terrorists are delivered.

Every single terrorist instance delivers information about itself. The same applies to the events, organisations and locations menus. The properties (or attributes) are highlighted in blue. The purple highlights indicate hyperlinks that lead to classes of the other ontologies or to more generic classes of the same ontology. This way the entire knowledgebase can be navigated.
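As a rough illustration of how such an instance page could be produced, the following Python sketch renders one instance as an HTML fragment in which class names become hyperlinks and property names are wrapped in a highlighted element. It is not the application's actual PHP code: the function name `render_instance`, the `<class name>.html` link scheme, the `property` CSS class and the sample data are all assumptions.

```python
from html import escape

# Hypothetical rendering of one instance page; file naming and markup are assumptions.


def render_instance(name, class_chain, properties):
    """Render an instance as HTML.

    class_chain: class names from most specific to most generic.
    properties:  list of (property name, value) pairs.
    """
    links = " &gt; ".join(
        f'<a href="{escape(c)}.html">{escape(c)}</a>' for c in class_chain
    )
    rows = "\n".join(
        f'<li><span class="property">{escape(p)}</span>: {escape(v)}</li>'
        for p, v in properties
    )
    return f"<h1>{escape(name)}</h1>\n<p>Classes: {links}</p>\n<ul>\n{rows}\n</ul>"


page = render_instance(
    "Example Person",
    ["Terrorist", "Person", "People"],
    [("has_alias", "Alias A"), ("member_of", "Org <X>")],
)
print(page)
```

Linking every class name to its own page is what lets a reader move from one instance up to a more generic class and from there into another ontology.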

4. Research Challenges

There are several research challenges that need to be overcome. One of the most challenging is how to get from the text to the conceptual categories/classes of the ontology. The ambiguity of natural language and the computer's lack of common sense are two major obstacles, among others. Efficient named entity recognition may help to overcome some aspects of this problem. Once names of entities such as organizations, people, common events and places are recognized as such by linguistic tools, they may be stored in an explicit and unambiguous way to provide a reference for text processing programs. At the very least, these entities could then be linked to the conceptual categories of the ontologies comparatively easily.

A second challenge concerns keeping track of time. Keeping track of time is necessary for several purposes, such as trend analysis. It is also necessary in order to be aware of evolving states of affairs: for instance, the physical condition of people may change, places that exist today may not exist tomorrow, and relationships between entities may be established or terminated. Ideally the knowledgebase should be aware of all these changes and be able to update itself.

REFERENCES

[1] T. R. Gruber. A Translation Approach to Portable Ontologies. Knowledge Acquisition, 5(2), 1993, pp. 199--220.
[2] G. Antoniou, F. van Harmelen. Semantic Web Primer. In Cooperative Information Systems. MIT Press, Cambridge, April 2004.
[3] E. Miller, F. Manola. RDF Primer. Tech. report, World Wide Web Consortium (W3C) Recommendation, 10 February 2004, http://www.w3.org/tr/rdf-primer
[4] F. Yergeau, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler. Extensible Markup Language (XML) 1.0. Tech. report, World Wide Web Consortium (W3C), 4 February 2004, http://www.w3.org/tr/2004/rec-xml-20040204/
[5] G. Schreiber, M. Dean. OWL Web Ontology Language Reference. Tech. report, World Wide Web Consortium (W3C) Recommendation, 10 February 2004, http://www.w3.org/tr/owl-ref/
[6] O. Lassila, T. Berners-Lee, J. Hendler. The Semantic Web. Scientific American, 184(5), 2001, pp. 34--43.
[7] Orgnet.com. Social network analysis software and services for organizations and their consultants. http://www.orgnet.com/

APPENDIX A

[Figures: Main classes of the Events.owl Ontology; Properties of the Events.owl Ontology]

[Figures: Main classes of the Locations.owl Ontology; Properties of the Locations.owl Ontology]

[Figures: Main classes of the Organisations.owl Ontology; Properties of the Organisations.owl Ontology]

[Figures: Main classes of the People.owl Ontology; Properties of the People.owl Ontology]

ARCHITECTURE OF THE MODULAR ONTOLOGY MODEL

[Figure: the four components Organizations, People, Locations and Events, connected by relations such as belong_to, located_at, reside_in, involved_in and happen_at, and linked to Meta Information via bookkeeping_informationabout]

Meta Information:
1. When is the data entered? (timestamp)
2. When is the data modified? (timestamp)
3. When is the data deleted? (timestamp)
4. Language?
5. Who entered the data?

Classes in People.owl:
Thing, People, Enumeration, Gender, MaritalStatus, PhysicalCondition, Person, PersonOfNoRelevance, Politician, President, CurrentPresident, FormerPresident, Spokesperson, Terrorist

Properties and Relations in People.owl:
has_alias, has_date_of_arrestion, has_date_of_birth, has_date_of_death, has_description, has_gender, has_marital_status, has_nationality, has_physical_condition, has_place_of_birth, has_relation_to_event, present_at, author_of, victim_of, killed_at, planner_of, actor_in, has_relation_to_location, has_current_residence_in, has_previous_residence_in, has_relation_to_organisation, deputy_of, ideologist_for, member_of (transitive), financier_of, founder_of, leader_of, has_relation_to_person, parent_of (inverse child_of), follower_of, sibling_of (symmetric), child_of (inverse parent_of), cousin_of, collegue_of (symmetric), spouse_of (symmetric), friend_of (symmetric), partofpattern, partofenumeratedclass

Main Classes in Events.owl:
Thing, Events, Enumeration, Recurrence, Target, InstantEvent, Assasination, BookOrManifesto, CoupAttempt, Execution, FailedTerrorPlot, HostageTaking, TerroristAttack, RecurringEvent, Election, Meeting

Properties and Relations in Events.owl:
has_description, has_duration, has_end_date, has_location, has_number_of_killed, has_organizer, has_recurrence, has_start_date, has_target, partofpattern, partofenumeratedclass

Main Classes in Locations.owl:
Thing, Locations, City, CapitalCity, Country, Province, State

Properties and Relations in Locations.owl:
city_is_in_country (inverse country_has_city), city_is_in_province (inverse province_has_city), city_is_in_state (inverse state_has_city), country_has_city (inverse city_is_in_country), has_capital_city, country_has_province (inverse province_is_in_country), country_has_state (inverse state_is_in_country), is_location_of, province_has_city (inverse city_is_in_province), province_is_in_country (inverse country_has_province), state_has_city (inverse city_is_in_state), state_is_in_country (inverse country_has_state)

Main Classes in Organisations.owl:
Thing, Organisations, NGONonGovernmental, Political, Governmental, Religious, Terrorist

Properties and Relations in Organisations.owl:
has_current_status, has_description, has_foundation_date, has_leader, has_location, has_member, is_affiliated_to, is_inconflict_with, is_organizer_of, is_suborganization_of, is_target_of