Approval of the Graduate School of Natural and Applied Sciences

Prof. Dr. İbrahim Akman
Director

I certify that this thesis satisfies all the requirements as a thesis for the degree of Master of Science.

Prof. Dr. İbrahim Akman
Head of Department

This is to certify that I have read this thesis and that in my opinion it is fully adequate, in scope and quality, as a thesis for the degree of Master of Science.

Asst. Prof. Dr. Çiğdem Turhan
Supervisor

Examining Committee Members

Prof. Dr. Ali Yazıcı
Prof. Dr. İbrahim Akman
Assoc. Prof. Dr. Nazife Baykal
Asst. Prof. Dr. Çiğdem Turhan
Asst. Prof. Dr. Nevzat Sezer

ABSTRACT

SEMANTIC WEB APPLICATION: ONTOLOGY-DRIVEN RECIPE QUERYING

Kalem, Güler
M.S., Computer Engineering Department
Supervisor: Asst. Prof. Dr. Çiğdem Turhan

June 2005, 102 pages

Currently, the information presented on the Internet has only static content that carries meaning in certain contexts, and these documents cannot be used effectively by different systems. However, presenting information with well-defined meaning will enable different computer systems to process and reason about the information at the semantic level, whereas present systems process information only at the syntax level. The Semantic Web approach will drastically change the effectiveness of the Internet, enable the reuse of information, and increase the representative power of information. It will be possible to combine information from different locations and process it together, since it is defined in a standard way.

In this thesis, concepts such as representing knowledge with a Semantic Web language, ontology processing, and reasoning and querying on ontologies have been implemented to realize a Semantic Web application: Ontology-driven Recipe Querying. As the domain, a Web-based application dealing with food recipes has been chosen. All the information and application logic have been moved into an OWL (Web Ontology Language) ontology file, which controls all the content and the structure of the application and makes it possible to reason on the provided information to create new facts from already given logic statements. In the application, the user can enter queries made up of arbitrary elements when querying for available food recipes. The application is capable of responding meaningfully no matter how the queries are constructed.

Keywords: Semantic Web, ontology, ontology querying, ontology management, ontology-driven knowledge management, knowledge representation, Internet.

ÖZ

SEMANTIC WEB APPLICATION: ONTOLOGY-DRIVEN RECIPE QUERYING

Kalem, Güler
M.S., Computer Engineering Department
Thesis Supervisor: Asst. Prof. Dr. Çiğdem Turhan

June 2005, 102 pages

Today, all the information on the Internet has static content, and it is quite difficult for these documents to be used effectively by different systems. However, presenting information with a well-defined meaning will enable different computer systems to process the information and draw inferences about it at the semantic level, while existing systems process information only at the syntax level. The Semantic Web approach will greatly increase the effectiveness of the Internet, enable the reuse of information, and increase the representative power of information. Since information is defined according to a standard, it will be possible to combine information from different locations and process it together.

In this thesis, to realize a Semantic Web application, Ontology-driven Recipe Querying, the concepts of representing knowledge with a Semantic Web language, ontology processing, and reasoning and querying over ontologies have been implemented. As the domain, a Web-based food recipe application has been chosen. All the information and application logic have been moved into an OWL (Web Ontology Language) ontology file that controls the content and structure of the application, and this file makes it possible to reason over existing logical statements to derive new information. In the application, the user can query the available food recipes by entering ingredients of his or her own choosing. Furthermore, the application returns meaningful answers no matter how the queries are constructed.

Keywords: Semantic Web, ontology, ontology querying, ontology management, ontology-driven knowledge management, knowledge representation, Internet.

ACKNOWLEDGMENTS

I express sincere appreciation to my supervisor Asst. Prof. Dr. Çiğdem Turhan for sharing her knowledge with me and guiding me throughout my thesis. Without her project proposal, support, encouragement, guidance and persistence this thesis would never have happened. I should also express my appreciation to examination committee members Prof. Dr. Ali Yazıcı, Prof. Dr. İbrahim Akman, Assoc. Prof. Dr. Nazife Baykal and Asst. Prof. Dr. Nevzat Sezer for their valuable suggestions and comments. In addition, I would like to thank my parents Nafiye and Recep Kalem and my sister Ayşegül for their unlimited patience, support and love during the course of the study.

TABLE OF CONTENTS

ABSTRACT
ÖZ
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
2. OVERVIEW OF THE WEB
   2.1 Web Languages
   2.2 Information Management on the Web
   2.3 Information Retrieval on the Web
3. SEMANTIC WEB
   3.1 Overview of the Semantic Web
   3.2 Information Retrieval with Semantic Web
   3.3 Semantic Web Tools and Languages
       3.3.1 SGML (Standard Generalized Markup Language)
       3.3.2 XML (eXtensible Markup Language)
       3.3.3 RDF (Resource Description Framework)
       3.3.4 RDFS (RDF Schema)
       3.3.5 OIL (Ontology Inference Layer)
       3.3.6 DAML+OIL (DARPA Agent Markup Language - OIL)
       3.3.7 OWL (Web Ontology Language)
4. ONTOLOGY, ONTOLOGY EDITORS AND QUERY LANGUAGES
   4.1 Ontology Editors
   4.2 Ontology Management System
   4.3 Ontology Query Languages
5. DESIGN OF THE SEMANTIC WEB APPLICATION: ONTOLOGY-DRIVEN RECIPE QUERYING
   5.1 Overview of the System
   5.2 System Domain
   5.3 System Specifications
   5.4 System Design
   5.5 OWL Ontology Design
   5.6 Technical Specification
6. IMPLEMENTATION
   6.1 OWL Query Server
   6.2 Web Interface
   6.3 Implementing the Ontology with Protégé
7. CONCLUSION

REFERENCES

APPENDICES
A. Ontology Editor Survey Results
B. Sample Queries
C. OWL Model
D. Class Hierarchy for foodreceipts Project

LIST OF FIGURES

FIGURE
1. Structure of the System
2. Properties and Relations of the System
3. OWLQueryServer UML diagram
4. RequestHandler UML diagram
5. FoodEntry UML diagram
6. Search Interface
7. Sample Search (A)
8. Sample Search (B)
9. Sample Search (C)
10. Sample Search (D)
11. Ingredients and Recipe of Körili Pilav
12. Source of Körili Pilav
13. Search Interface
14. Selecting Ingredients from Categorized Box
15. Sample Search (E)
16. Sample Search (F)
17. Sample Search (G)
18. Category of Pilavlar
19. Help Interface of the System
20. Protégé Ontology Editor

CHAPTER 1

INTRODUCTION

During the last fifteen years, the Internet has stepped into our lives to stay permanently. Today, almost everything in our lives is connected to the Web in one way or another. The Internet has become one of the most important platforms for e-commerce, communication, entertainment, business, education, and knowledge sharing. Looking at the many fields and platforms involved, it is easy to see that the Internet is not just a modern way of doing things; it is a de facto situation that will not fade away but will continue to grow into the way we live. It is an entire concept surrounding and reshaping our life style.

While the Internet is changing our way of living, it is also changing and evolving within itself. A new phase is needed in which information on the Internet is given well-defined meaning, enabling computers and people to work in cooperation. Currently, the information presented on the Internet has only static content that carries meaning in some target environments or contexts. The Internet contains billions of such documents, which in general cannot be used effectively by different systems. However, presenting information in a well-defined format using shared standards will enable computer systems to process information at the semantic level, whereas present systems process information only at the syntax level. Presenting information with well-defined meaning will enable different computer systems to process and reason about the information presented. This approach will drastically change the effectiveness of the Internet, enable the reuse of information, and increase the representative power of information. It will be possible to combine information from different locations and process it together, since it is defined in a standard way. The Semantic Web is a new way of representing information that enables it to be defined and presented at the semantic level, better enabling computer systems to process this information. A possible realization of the above-mentioned process, if not the only one, is to use Semantic Web languages enabling the semantic definition of information.

In this thesis, all the concepts involved in the Semantic Web have been studied, and it has been shown how different solutions can be combined to realize such applications. Concepts such as representing knowledge with a Semantic Web language, ontology processing, and reasoning and querying on ontologies have been applied successfully in the developed application. The main purpose of the thesis is to investigate the Semantic Web concept and gain a solid understanding of it, together with its difficulties, its problems, and its applicability to real-world applications. In developing the Semantic Web application, the following practical problems arise:

- to process data defined with the Semantic Web language OWL (Web Ontology Language) [34] [35] [37] [38] [50] [62] [73],
- to execute queries on OWL ontologies,
- to use meaning when applied within applications,
- to combine and process, on a single system, information located at different systems.

The implementation part of the thesis mostly deals with the problems mentioned in the above list. As the domain, a Web-based application dealing with food recipes has been chosen. Instead of building all the application logic into static standard HTML with a scripting language, all the information and application logic have been moved into an OWL ontology file. For a more effective system, as much of the data and application logic as possible should reside in the OWL Web ontology.

Specifically, the OWL ontology controls all the content and the structure of the application. It makes it possible to reason on the provided information and create new facts from already given logic statements. In the application, functionality is provided so that an end user can enter queries for food recipes through the Web interface. All the data needed for user queries is provided from the information stored in the ontology itself. It is also possible for the user to enter queries made up of arbitrary elements when querying for available food recipes. The application is able to respond meaningfully no matter how the queries are constructed.

This document has been divided into chapters, each dealing with specific parts of the Semantic Web concept and the implemented application. The following chapter presents background information about the concept of the Web, its general problems, and information management on the current Web. Then, in Chapter 3, the history of the Semantic Web, the domain of the Semantic Web, and Semantic Web tools and languages are presented. Chapter 4 is about ontologies, ontology editors, and query languages. In Chapter 5, the system design is covered, and the system domain and specifications are explained. Chapter 6 presents the implemented application, discussing the user interface and system structure in detail. Finally, the Conclusion chapter explains possible extensions to the thesis and future work on the Semantic Web subject.

CHAPTER 2

OVERVIEW OF THE WEB

At the very beginning, when the Web first emerged, a few computers were connected to each other in order to work together and share the necessary data between them [13] [14] [42]. Over time, the Web started to grow, and intranets and LANs came onto the scene. But the explosion of personal computers, mobile devices, and major advances in the field of telecommunications were the actual triggers of the Web as we know it today.

The growth of the Web has been impressive for the past few years. It is a phenomenon that cannot be defined and described over a fixed period of time because of its potential to change and to fit into our lives. The interaction between the Web and the way we live involves both sides equally: as the Web changes according to our needs, the ways human beings work, study, and communicate with each other are also being reshaped. This interaction still has great potential to move far beyond our imagination.

At the first stage of the Web, it was thought of as an exchange platform for documents and data, and a communication medium for work collaboration. It was meant to be a big network of workstations where programs and databases could share their knowledge and work together in collaboration. But with the enormous explosion of media programs, video games, films, music, pictures, etc., the present Web is used almost exclusively by humans and not by machines. The content is mainly targeted for human consumption. The information meant to be processed by computing systems is generally defined by custom standards, which is a handicap for a broader and more extended use of the provided information.

Specifically, the main problem of the present Web is that, in most cases, the information is written only for human consumption. Machines cannot understand the meaning of online information. Enormous amounts of pictures, drawings, and movies of all kinds of media types, together with information presented in natural-language format, populate the actual Web. As a result, this meaningless information is not useful at all to the machines, because they cannot process the data as a context-aware system; they only present the data to the user in a specific format. On the other hand, finding the right piece of information is often a nightmare on the present Web. Search results are in most cases imprecise, often yielding matches to thousands of pages. Human searching is often a difficult task, takes too much time, and has several limitations. Moreover, users face the task of reading all the documents retrieved in order to extract the information that is actually desired. Today's search engines are not context-aware; rather, they perform searches with text-match-based methods. A related problem is that the maintenance of Web sources has become very difficult. The burden on users to maintain consistency is often overwhelming. This has resulted in a vast number of sites containing inconsistent and contradictory information.

2.1 Web Languages

There are many languages used to publish data on the current Web [13] [15]. Some of these languages are HTML, PHP, JSP, and ASP, as well as some media-oriented Web languages such as Flash. However, these scripting and markup languages are only meant to handle the business logic of applications and the visual presentation of the information they deal with. Markup languages such as HTML do not care about what the information is; they only control the layout and appearance of the given information. Server-side Web scripting languages such as PHP are generally targeted at the dynamic behavior of Web applications and their business logic. The languages mentioned above all share a common shortcoming: they provide no way to attach and process semantic meaning bound to information. They treat data as plain text without any meaning; that is, such Web languages are not aware of the information they are dealing with.
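For illustration, consider an HTML fragment of the kind such languages produce (a constructed example, not taken from any particular site):

    <h1>Körili Pilav</h1>
    <p><b>Ingredients:</b> rice, curry, butter</p>

The tags state only how the text is to be rendered: a heading and some bold text. Nothing in the markup tells a machine that Körili Pilav is a recipe or that rice is one of its ingredients.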

2.2 Information Management on the Web

The incredible progress of the Web is a direct consequence of a big explosion of all kinds of online Web documents. Information storage and collection on the Web works as follows: the information is generally stored in large databases kept on servers, and the programs running on the servers generate the requested Web documents on the fly, based on the data needed at some state. Most of these dynamically generated online documents are made only for human consumption, and it is impossible for machines to understand their meaning. Such Web documents are difficult to reuse and to make available to other parties, because they are not permanent but are generated for specific requests, without any well-defined meaning.

2.3 Information Retrieval on the Web

Information retrieval on the Web [15] refers to the act of recovering information from the vast number of online Web documents: getting the desired documents and presenting them to the user. This is the classic and most widely used way of obtaining information from the Web. With this approach, a user does not extract any information from a document; the user just picks some documents from among all the available documents on the Web. The user will get a document or a set of documents and will have to analyze them to find the desired information, if it exists. Actually, in this approach, only a portion of the computational power exposed by computers is used to fetch the desired information. The computing systems involved are responsible only for transferring the document and presenting it to the user. No processing power is used to retrieve directly relevant information through context-aware processes and methods.

The problems associated with the retrieval of quality information from the Internet are many. We can consider the Internet as a connected undirected graph with many nodes, where the connections are the edges. In this perspective, the nodes are distributed across the world without regard for cultures or time zones, and the idea of a connected undirected graph captures elegantly the idea of the Internet. One problem in traversing the Internet to retrieve information is that the data, like the nodes of the Internet, is spread across the whole world. It is also obvious that the Internet is changing very fast and the data is volatile: every six months the Internet's nodes and connections double, in a topology that is not predefined. The data is redundant and stored in an unstructured way, and data on the Web is duplicated in many instances across mirror sites. Furthermore, the quality of the data is often poor, and the volume of data to be searched on the Web is growing at an exponential rate. Not all the data is in the same language, because the Web is a reflection of the real world in that it is multicultural and multilingual. New media types are appearing at a fast rate, particularly where audio-visual or multimedia files are concerned. Many Web pages' contents are created dynamically on demand. On the Web, unstructured markup languages make it difficult for humans, and even more difficult for machines, to locate and acquire the desired information.

To retrieve information on the Web, the current methods are browsing and keyword-based searching. Both methods have several limitations.

Browsing: Browsing the Web refers to the act of retrieving a Web document by means of its URI (Uniform Resource Identifier) and displaying it in the local client browser to view its content. The user often has to traverse from one link to another in order to reach the desired information, if that ever happens. Anybody familiar with the Web knows the drawbacks of looking for information by browsing:

- It is very time consuming.
- It is not always possible to reach the desired information even though it exists somewhere on the Web.
- It is very easy to get lost and disoriented following all the links the user might find relevant, suffering from what is called the lost-in-hyperspace syndrome.

Keyword Searching: Keyword searching is an easier way to retrieve information compared with browsing Web documents through Web links. Keyword searching on the Web refers to the act of looking for information using some words to guide the search. The keywords the user wants to search for are entered into a search engine, which performs the search on the Web cache it has stored and indexed locally. Beforehand, search engines continually traverse all the links available on the Web, caching and indexing all the Web documents they reach. The search engines search this reduced copy of the Web, following the links and trying to match the input words with the words found in their index tables. When a match occurs, the links pointed to by the index tables are returned to the user. Keyword searching is more useful than just browsing when looking for information, since the user does not need to know the exact URI of the desired Web document; however, this approach still has some disadvantages:

- The user must be aware of the available search engines and choose the one that fits his/her needs.
- The keywords entered by a user are the ones the user considers most relevant for the information he/she wants to look for, which is a very subjective decision.
- The entered keywords have to exactly match the words present in the Web documents; even a slight variation is not tolerated.
- Keyword searching normally returns vast amounts of useless document references/links that the user has to filter by hand.

Although search engines index much of the Web's content, they have little ability to select the pages that a user really wants or needs [66].

CHAPTER 3

SEMANTIC WEB

The Web has dramatically changed the accessibility of electronically available information. The Web currently contains about 3 billion static documents, which are accessed by over 500 million users from all around the world [5] [12] [67]. With this huge amount of data, and since the information content is presented primarily in natural language, it has become increasingly difficult to find, access, present, and maintain relevant information. As a result, a wide gap has opened between the information available to tools and the information maintained in human-readable form.

In response to this problem, many new research initiatives and commercial enterprises have been set up to enrich available information with machine-processable semantics. One example of this recent research is the Semantic Web, which aims to provide intelligent access to heterogeneous, distributed information, enabling software products (agents) to mediate between user needs and the available information resources. This support is essential for bringing the Web to its full potential. Tim Berners-Lee [66], Director of the World Wide Web Consortium and inventor of the World Wide Web, foresees a number of ways in which developers can use self-descriptions and other techniques so that context-understanding programs can selectively find what users want. Berners-Lee referred to the future of the current Web as the Semantic Web: an extended Web of machine-readable information and automated services that amplify the Web far beyond its current capabilities.

The explicit representation of the semantics underlying data, programs, Web documents, and all kinds of information-related Web resources will enable a knowledge-based Web that provides a qualitatively new level of service and a new way of processing data. Computing systems and automated services will improve in their capacity and ability to assist humans in achieving their goals, by understanding more of the information presented on the Web and thus providing more accurate filtering, categorizing, and searching of these information sources. This process will ultimately lead to an extremely knowledgeable system featuring various specialized reasoning services, thus extending the representational power of the available information. As Berners-Lee summarized [5] [67]: "The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a Web of data that can be processed directly or indirectly by machines."

3.1 Overview of the Semantic Web

The purpose of this new phase in Web technology is to make machines capable of understanding the semantics of the information presented on the Web; to be able to read and understand the Web as a human being does. For this purpose, many different approaches have been formulated by a large number of researchers, organizations, and universities. Most of these methods are explained in detail in this thesis.

The Semantic Web is not a separate Web [2] [11]; rather, it can be regarded as an extension of the current Web toward meaning. The main difference between the Semantic Web and the Web is that the Semantic Web is supposed to provide machine-accessible meaning for its constructs, whereas on the Web this meaning is provided by external mechanisms. In order to determine the meaning of a collection of documents, it is necessary to use only the meaning determined by the formal language specifications of the Semantic Web, currently the RDF (Resource Description Framework) model theory and the OWL model theory.

The Semantic Web aims for meaningful and machine-understandable Web resources, whose information can then be shared and processed both by automated tools, such as search engines, and by human beings [5] [9]. The consumers of Web resources, whether automated tools or human beings, are referred to as agents. This sharing of information between different agents requires semantic mark-up, for example, an annotation of a Web page with information on its content that is understood by the agents searching the Web. Such an annotation will be given in some standardized, expressive language (which, e.g., provides predicate logic and some form of quantification) and will make use of certain terms or classes (like "Human", "Plant", etc.). To make sure that different agents have a common understanding of these terms, we need ontologies in which these terms are described and which thus establish a joint terminology between the agents. Basically, a Web ontology is a collection of definitions of concepts, and the shared understanding comes from the fact that all the agents interpret the concepts with respect to the same ontology. Using the same standards enables the reuse of the defined information: the information is not annotated for a specific system; instead, the annotation relies on shared standards, which makes it possible for different computer systems to recognize it.

What the Semantic Web is NOT

The Semantic Web is not Artificial Intelligence: The concept of machine-understandable documents does not imply some magical artificial intelligence that allows machines to comprehend human words and fully understand them as human beings do [16]. The Semantic Web only denotes a machine's ability to solve a well-defined problem by performing well-defined operations on existing well-defined data. Instead of asking machines to deduce people's language, it asks people to make the extra effort so that machines are able to process the data in some specific way. Even though it is simple to define information with languages such as RDF, at the level of power required by a Semantic Web these languages will be complete languages, capable of expressing paradoxes and tautologies, and it will be possible to phrase questions whose answers would require a machine to search the entire Web and take an unpredictable amount of time to find. This should not keep us from making these languages complete. Each mechanical application relying on such languages will use a schema to restrict its use to an intentionally limited language. However, when links are made between Webs relying on such languages, the result will be an expression of a large amount of information. It is obvious that, because the Semantic Web must be able to include all kinds of data to represent the world, the languages must be completely expressive.

A Semantic Web will not require every application to use expressions of arbitrary complexity: Even though the languages used to define information allow expressions of arbitrary complexity and computability, applications that generate semantically defined information will in practice be limited to generating simple expressions such as access control lists, privacy preferences, and search criteria.

A Semantic Web will not require proof generation to be useful; proof validation will be enough: Although access control on Web sites involves validation of a previously prepared proof, there is no requirement to answer an arbitrary question or to find the path and construction of a valid proof. It is well known that searching for the answer to an arbitrary question and generating a proof for it is typically an intractable process, as are many other real-world problems, and a Semantic Web language does not require this (unsolvable) problem to be solved in order to be useful.

A Semantic Web is not an exact rerun of a previous failed experiment: Other concerns have been raised against the Semantic Web concept, such as its relation to Knowledge Representation systems. More or less, such systems have tried to achieve results similar to what the Semantic Web concept is trying to achieve; KIF [97] and CYC [98] [99] are some examples. However, the success or failure of such systems should not be a threshold or limit for the Semantic Web concept/project. A more constructive approach would be to feed the Semantic Web with their design experience, and the Semantic Web may in turn provide a source of data for reasoning engines developed in similar projects, such as those that utilize Knowledge Representation systems.

3.2 Information Retrieval with Semantic Web

Machine to Human: The addition of semantic annotations to Web documents would improve information retrieval in ways yet unimagined. As Tim Bray said, search engines "do the equivalent of going through the library, reading every book, and allowing us to look things up based on the words found in some text" [66]. If more descriptive metadata were available, one would not have to rely, as when using Web search engines, on the popularity of a resource as an assurance of its relevancy. How can we be sure that information frequently accessed in response to some query is actually relevant to it? We cannot be sure that such relations always hold. Librarians, who often act as human mediators between the complex relations of structured information and the often unformulated queries of the information seeker, know that information retrieval is often incomplete even when information is organized well. When it is organized badly or not at all, the consequence is failure in retrieving information.

Human to Machine: Tim Berners-Lee discussed, as illustrated in reference [42], how content-aware agents using semantic information could carry out everyday research tasks such as investigating health care provider options, prescription treatments, or available appointment times. Each of these tasks is now usually conducted by a human researcher assigned to it. If one is planning a trip, he/she must investigate the best price for an airplane ticket (even though some of this information is already collected) and match the information about available flights with available times from a personal calendar. This sort of research is conducted daily, and one takes for granted the mental and representational systems needed to ask a question, investigate an answer, pull related information together, select the information relevant to the inquiry, and initiate another set of actions based on this selection. Artificial-intelligence researchers [68] [69] have been working on methods to automate these kinds of tasks and processes for many years, and have developed several approaches that may in the future be applicable to the Semantic Web.

3.3 Semantic Web Tools and Languages

During the last few years, several ontology languages [4] [17] [21] [71] [72] have been developed. All of these languages are based on XML [23] syntax, such as XOL [25] (Ontology Exchange Language), SHOE [26] (Simple HTML Ontology Extension), which was previously based on HTML, and OML (Ontology Markup Language), whereas RDF [27] [28] [29] (Resource Description Framework) and RDFS [30] (RDF Schema) are languages created by W3C (World Wide Web Consortium) group members. Two additional languages have been built on top of the union of RDF and RDF Schema with the objective of improving their features: OIL (Ontology Inference Layer) and DAML+OIL [32] (DARPA Agent Markup Language - OIL).

Semantic Web languages such as XML, RDF, RDFS, DAML+OIL, and OWL [33] [34] [35] are used to organize, integrate, and navigate the Web, while at the same time allowing content documents to be linked and grouped in a logical and relevant manner. With the information environment these standards can create, users can search and browse information resources in an intuitive way with the help of content-aware machines/computing systems. All the languages oriented toward creating the Semantic Web are structured languages, and thanks to this feature they can carry meaning besides giving structure to the text. They also have different characteristics compared to one another: some are relatively new, and the newer languages aim to make progress over the previous ones, evolving and improving their characteristics to support the Semantic Web concept. The semantic power reached is at different levels; some languages provide meaning to the text/information, while others go further and also make assertions and the inference of knowledge and facts possible. Some important languages, in chronological order, are [17] [18] [20]:

- Standard Generalized Markup Language (SGML)
- eXtensible Markup Language (XML)
- Resource Description Framework (RDF)
- DARPA Agent Markup Language - Ontology Inference Layer (DAML+OIL)
- Web Ontology Language (OWL)

In the context of the Semantic Web, a major effort is devoted to the realization of machine-processable semantic meaning, expressed in meta-models such as RDF, OIL, OWL, and DAML+OIL, and based on shared ontologies. Still, these approaches rely on common ontologies that can be merged and to which existing information sources can be related by proper annotation. This is an extremely important development, but its success will rely heavily on wide standardization, acceptance of the different languages, and adoption of common ontologies or schemas. In the Semantic Web, all the necessary information resources (data, documents, and programs) will be made available along with various kinds of descriptive information and annotations, i.e., metadata. Clearly defined knowledge about the meaning, usage, accessibility, or quality of Web resources will considerably facilitate automated processing of all the available Web content/services. The Semantic Web will allow both human beings and machines to query the Internet as if it were a huge database. To realize such a concept, besides the Web languages, different tools also have to be developed in order to infer information from the Web. Inference depends not only on the languages but also on the different tools currently being developed around them.

3.3.1 SGML (Standard Generalized Markup Language)

SGML is a system for organizing and tagging the elements of a document. It was developed and standardized by the International Organization for Standardization (ISO) in 1986 [70].

3.3.2 XML (eXtensible Markup Language)

XML [14] [19] is a meta-language for defining application-specific markup tags, and it is the universal format for structuring Web documents and data on the Web, also proposed by the W3C. The main contribution of XML is providing a common and communicable syntax for Web documents. XML itself is not an ontology language, but XML Schemas [24], which define the structure, constraints, and semantics of XML documents, can be used to specify ontologies. However, since the aim of XML Schema is the verification of XML documents, and its modeling primitives and these tasks are more application-oriented than concept-oriented, XML Schema will not be considered an ontology language here. The only reasonable interpretation is that XML code contains named entities with sub-entities and values; that is, every XML document forms an ordered, labeled tree, which is the cause of both XML's strength and its weakness. It is possible to encode all kinds of data structures in an unambiguous syntax, but XML does not specify the data's use and semantics. The groups that use XML for their data exchange must agree beforehand on the vocabulary, its use, and its meaning.
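For illustration, the following well-formed XML fragment (a constructed example; the tag names are arbitrary) is a perfectly unambiguous labeled tree, yet it carries no semantics of its own:

    <recipe>
      <name>Körili Pilav</name>
      <ingredient>rice</ingredient>
      <ingredient>curry</ingredient>
    </recipe>

A receiving system can parse this structure, but whether <ingredient> denotes a cooking ingredient or something else entirely is a convention the exchanging parties must agree on outside the document.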

Why Metadata Is Not Enough: XML metadata is a form of description of the data available within some document or information resource. It describes the purpose or meaning of raw data via a text format, in order to more easily enable information exchange, interoperability, and application/platform independence [5]. As a description, the general rule is accepted to be that more is better; metadata increases the usability and granularity of the defined data. The way to think about the current state of metadata is that words (or labels) are attached to data values in order to describe them. While the evolution of metadata will not follow natural-language description, the analogy holds in that words alone are not enough. The motivation for providing richer data description is to move data processing from being static and mechanistic to being dynamic and adaptive. For example, we may enable our systems to respond in real time to a location-aware cell phone customer who is walking through a store outlet. If a system could match the consumer's needs or past buying habits to current sale merchandise, revenue would increase. Additionally, computers should be able to support that sale with just-in-time inventory by automating the supply chain with its partners. The general rule is: the more computers understand, the more effectively they can handle complex tasks.

All the possible ways a semantically aware computing system can drive new business and decrease operating costs have not yet been invented. However, to get there, we must push beyond simple metadata modeling to knowledge modeling and standard knowledge processing. There are three emerging steps beyond simple metadata: semantic levels, rule languages, and inference engines. These are the backbones of the Semantic Web.

3.3.3 RDF (Resource Description Framework)

RDF is a document structure for the encoding, exchange, and reuse of structured metadata, also proposed by the W3C [14] [19]. RDF provides a standard form for representing metadata in XML. The RDF data model consists of three object types:

Resources: All things being described by RDF expressions are called resources. A resource could be an entire Web document, such as a well-known HTML document. A resource may be a part of a Web page, e.g., a specific element within the document source of an HTML or XML Web document. A resource may also be a large collection of Web documents, e.g., an entire Web site. A resource could also be a Web object not directly presented on the Web, e.g., a printed book.

Properties: A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning and defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, one should refer to the RDF Schema specification.

Statements: A specific resource together with a named property and the value of that property for that resource is called an RDF statement. These three individual parts of a statement are called the subject, the predicate, and the object, respectively. The object of a statement (i.e., the property value) can be another resource or a literal value; i.e., a resource (specified by some URI) or a simple string or any other primitive data type defined by XML. Speaking in RDF terms, a literal may have content that is XML markup, but it is not further evaluated by the RDF processor.

RDF does not have any specific mechanisms to define relationships between these object types, but the RDF Schema (RDFS) specification language does. Although the main intention of RDFS is not ontology specification, RDFS can be used directly to describe ontologies. RDFS provides a standard set of modeling primitives for defining an ontology (class, resource, property, is-a and element-of relationships, etc.) and a standard way to encode them into XML. But since axioms cannot be defined directly, RDFS has rather limited expressive power. Also, the relation between ontology and RDF(S) is much closer than that between ontology and XML. Basically, the RDF data model consists of statements about resources, encoded as object-attribute-value triples, where the objects are resources, the attributes are properties, and the values are resources or strings. For example, to state that Zeynep is the author of the article at a specific URL (Uniform Resource Locator), one would use the triple (article, has author, "Zeynep"). Attributes such as has author, introduced in the previous example, are called properties.
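One possible RDF/XML serialization of this triple is sketched below; the namespace and the hasAuthor property name are illustrative assumptions rather than part of any standard vocabulary:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/terms#">
      <!-- subject: the article; predicate: ex:hasAuthor; object: the literal "Zeynep" -->
      <rdf:Description rdf:about="http://example.org/articles/sw-article">
        <ex:hasAuthor>Zeynep</ex:hasAuthor>
      </rdf:Description>
    </rdf:RDF>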

3.3.4 RDFS (RDF Schema)

The important feature of RDFS where ontologies are concerned is that RDFS expresses class-level relations describing acceptable instance-level relations. RDF Schema is a language layered on top of the RDF language. This layered approach has been presented by the W3C and Tim Berners-Lee as the Semantic Web Stack: layers of different languages or concepts all related to each other [30] [71] [72]. The base layer of the stack consists of the concepts of universal identification (URI) and a universal character set (Unicode). Above those concepts, the XML syntax is layered (elements, attributes, and angle brackets), together with namespaces to avoid vocabulary conflicts, so that every domain need only keep names unique within the local domain. The layers above XML are the triple-based assertions of the RDF model and syntax discussed in the previous section. If a triple is used to denote a class, a class property, and a value, it becomes possible to create class hierarchies for the classification and description of different objects. This is the goal of RDF Schema.

The data model expressed by RDF Schema is the same data model used by object-oriented paradigms, e.g., programming languages like Java. The data model for RDF Schema allows creating classes of information within a domain. A class is defined as a group of things with distinct features and some common characteristics. In object-oriented programming (OOP), a class is defined as a template or type definition for an object (instance) composed of characteristics (also called data members or fields) and behaviors (also called methods or functions). An object is a single instance of a specific class. Object-oriented languages also allow classes to inherit characteristics and behaviors from a parent class (also called a super class). All these concepts are more or less very similar to the model used by RDF Schema.
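As a minimal sketch (the class names are constructed for illustration), the following RDFS fragment declares a class hierarchy in much the same way a subclass extends a super class in OOP:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdfs:Class rdf:ID="Dish"/>
      <!-- RiceDish inherits from Dish, like a subclass inheriting from a super class -->
      <rdfs:Class rdf:ID="RiceDish">
        <rdfs:subClassOf rdf:resource="#Dish"/>
      </rdfs:Class>
    </rdf:RDF>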

Above RDF Schema resides the ontology layer, and above ontologies, logic rules can be added about the things defined in the ontology. A rule language makes it possible to infer new knowledge and make decisions; additionally, the rules layer provides a standard way to query and filter data out of RDF. The rules layer is a sort of introductory logic capability, while the actual logic framework above it provides more advanced logic and allows formal logic proofs to be shared. Lastly, with such proofs, it becomes possible to establish a trust layer for levels of application-to-application trust. This Web of trust forms the third and final Web in Tim Berners-Lee's three-part vision, expressed as the collaborative Web, the Semantic Web, and the Web of trust.

3.3.5 OIL (Ontology Inference Layer)

OIL was developed in the OnToKnowledge project [17] [19] [41] and is both a representation and exchange language for creating Web ontologies. The language combines primitive elements from frame-based languages with the formal semantics and reasoning services of description logics. To enable the use of OIL on the Web, it is based on the W3C standards XML and RDF(S).

The ontology description is divided into three different layers: the object level (concrete instances), the first meta-level (ontological definitions), and the second meta-level (describing features of the ontology). The OIL ontology language provides definitions for classes and class relations, and a limited set of axioms enabling the representation of different classes and their properties. Relations (also called slots) are treated as first-class citizens and can be represented in different hierarchies. Although it has some limitations, OIL can provide precise semantic meaning, which enables reasoning systems to process the defined information effectively. As mentioned above, OIL is built on top of RDF(S) and has the following layers: Core OIL groups the OIL elements/primitives that have a direct mapping to RDF(S) elements/primitives; Standard OIL is the complete OIL model with all its features, using more primitives than the ones defined in RDF(S); Instance OIL adds instances of different concepts, classes, and roles to the previous model; and Heavy OIL has been designed as the layer for future extensions of the OIL language. OILEd, Protégé-2000, and WebODE are some powerful ontology editors that can be used to author OIL ontologies (as well as other Web ontologies). Another feature of OIL is that its syntax can also be expressed in an ASCII form that is not XML-compliant.

3.3.6 DAML+OIL (DARPA Agent Markup Language - OIL)

DAML+OIL combines two lines of work, DAML and OIL, into an XML- and Web-based language supporting the development of the Semantic Web. DAML+OIL [11] is a descriptive semantic markup language for Web resources which is built on top of the earlier languages RDF and RDF Schema and extends them with richer modeling primitives, enabling reasoning systems to process it more effectively. DAML+OIL was developed by the Defense Advanced Research Projects Agency (DARPA) [20] under the DARPA Agent Markup Language (DAML) Program. With DAML+OIL, in order to make information/data yet more expressive and powerful, it is possible to use description logic to describe the data, enabling it to be processed by reasoning systems.

In this way, not only will the explicitly given data be available, but new facts and conclusions about the provided data will be available as well. DAML+OIL is a suitable language for achieving this because of its expressiveness through description logic; it is, in effect, a description logic language disguised in an XML format. DAML extends RDFS in the following ways:

- Support for XML Schema data types rather than just string literals, including primitive data types such as dates, integers, decimals, etc.
- Restrictions on properties, such as cardinality constraints (see the sketch after this list).
- Definition of classes by enumeration of their instances.
- Definition of classes in terms of other classes and properties. To enable definitions from other classes, different expressions have been defined, such as unionOf, intersectionOf, complementOf, hasClass, and hasValue, some of which have their roots in classic set theory.
- Ontology and instance mappings (sameClassAs, samePropertyAs, sameIndividualAs, differentIndividualFrom), permitting translation between ontologies.
- Additional hints to reasoning systems, such as disjointWith, inverseOf, TransitiveProperty, and UnambiguousProperty.

DAML is not completely developed yet. Even though it was at one point the ontology language recommended by the World Wide Web Consortium, a new project, the Web Ontology Language (OWL), has been developed to replace DAML. The OWL project has removed some of the requirements specified for the DAML language, as rules, queries, and services are still under development.
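As a brief sketch of one of these primitives (constructed for illustration; the class and property names are assumptions, not taken from any published ontology), a DAML+OIL cardinality restriction can state that every recipe has exactly one name:

    <daml:Class rdf:ID="Recipe"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:daml="http://www.daml.org/2001/03/daml+oil#">
      <rdfs:subClassOf>
        <!-- restriction: exactly one value for the name property -->
        <daml:Restriction daml:cardinality="1">
          <daml:onProperty rdf:resource="#name"/>
        </daml:Restriction>
      </rdfs:subClassOf>
    </daml:Class>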

Description Logics

Description logics (DLs) [9] [39] are a family of knowledge representation languages that can be used to represent the knowledge of an application domain, and they are very well suited to providing structure to information. Description logic is a subset of first-order logic which is non-functional and does not allow explicit variables; it is less expressive in favor of greater decidability when processed by inference procedures. Description logics differ from predecessors such as semantic networks and frames in that they are equipped with a formal, logic-based semantics. High-quality Web ontologies are necessary for the Semantic Web to be successful, and their construction, integration, and evolution depend greatly on the availability of well-defined semantics and powerful reasoning systems. Since DLs provide these aspects, they are ideal candidates for creating and developing ontology languages. That much was already clear ten years ago, but at that time there was a fundamental mismatch between the expressive power and reasoning efficiency that DL systems provided and the expressivity and large knowledge bases that ontologists needed. Through basic research in DLs over the last 10 to 15 years, the gap between the needs of ontologists and the systems DL researchers provide has finally become narrow enough to build stable bridges.

3.3.7 OWL (Web Ontology Language)

OWL Ontology: Ontology is a term borrowed from philosophy, where it refers to the science of describing the kinds of entities in the world and how they are related to each other. An ontology created with OWL may include descriptions of classes, their instances, and properties. Given such an ontology, the formal semantics of OWL specifies how to derive logical meaning that is not given explicitly, i.e., facts that are not present in the ontology but are derived by the semantics. These derivations may be based on a single OWL document or on multiple distributed documents that have been combined with OWL mechanisms allowing such extendable ontologies.

The Web Ontology Language is developed and produced by the W3C Web Ontology Working Group (WebOnt). OWL [11] [38] is a semantic markup language for publishing, extending, and sharing ontologies through the Web. OWL is developed as a vocabulary extension of RDF and is derived from the DAML+OIL Web ontology language, adding some extra features and discarding some of the specifications intended for DAML+OIL. It is a revision of DAML+OIL that incorporates lessons learned from the design and application of the DAML+OIL ontology language.

OWL can be used to explicitly represent the exact semantics of classes within some domain and the relationships between those classes (and instances). OWL has more expressive semantic power than XML, RDF, and RDFS, and thus goes beyond these languages in its ability to represent machine-readable content on the Web. In comparing OWL to XML and XML Schema, two points must be mentioned.

First, an ontology differs from an XML Schema in that an ontology is a knowledge representation, not a message format. Most industry Web standards consist of a combination of message formats and protocol specifications. These formats have been given an operational semantics, such as, "Upon receipt of this PurchaseOrder message, transfer Amount dollars from AccountFrom to AccountTo and ship the product purchased." That is, each of the steps in the semantics is precisely defined. However, this kind of specification is not designed to support reasoning outside the transaction context; it is fixed on well-defined steps. For example, in general there is no mechanism to conclude that, because the product is a type of Chardonnay, it must also be a white wine. Such reasoning and conclusions are essential in the Semantic Web.

Second, one advantage of OWL ontologies is the availability of different tools that can reason about them (for example Racer, which reasons on OWL ontologies and derives new facts from given statements). Such tools provide generic support that is not specific to a particular domain, which would be the case if one were to build a system to reason about a specific industry-standard XML Schema. Developing a useful reasoning system is not a simple task; developing an ontology is much more tractable and feasible.
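A brief constructed sketch of this kind of modeling (the class names are illustrative, not the thesis ontology itself): given the axiom below, a reasoner such as Racer can conclude that every individual of type Chardonnay is also a WhiteWine, even though that fact is never stated for the individual itself:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xmlns:owl="http://www.w3.org/2002/07/owl#">
      <owl:Class rdf:ID="WhiteWine"/>
      <owl:Class rdf:ID="Chardonnay">
        <!-- every Chardonnay is entailed to be a WhiteWine -->
        <rdfs:subClassOf rdf:resource="#WhiteWine"/>
      </owl:Class>
    </rdf:RDF>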

The OWL language provides three increasingly expressive sublanguages designed for different users in specific communities.

OWL Lite: OWL Lite is targeted at users needing only simple constraint features and classification hierarchies. For example, although OWL Lite supports cardinality constraints, the cardinality values are restricted: only the values 0 and 1 are allowed. It is much simpler to provide tool support for OWL Lite than for its more expressive relatives, which allows easy migration to OWL Lite from the different ontology languages in use.

OWL DL: OWL DL supports users who want maximum expressiveness without losing the computational completeness (all entailments are guaranteed to be computed) and decidability (all computations finish in finite time) of reasoning systems. OWL DL includes all OWL language constructs, with restrictions such as type separation (a class cannot also be an individual or a property; a property cannot also be an individual or a class), enabling distinct definitions. It is named OWL DL because of its correspondence to Description Logic [39], a field of research that has studied a decidable fragment of first-order logic. OWL DL was designed to have desirable computational properties for reasoning systems.

OWL Full: OWL Full is targeted at users who want maximum expressiveness and the syntactic freedom of RDF, with no computational guarantees. Decidability and completeness are not preserved as they are in OWL DL, and type separation is not as strict. For example, in OWL Full a defined class can simultaneously be treated as a collection of different individuals and as an individual in its own right. Another important difference from OWL DL is that in OWL Full an owl:DatatypeProperty can be marked as an owl:InverseFunctionalProperty. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support every feature of OWL Full.

Each of the sublanguages mentioned above is an extension of its simpler predecessor, both in what can be legally expressed and in what can be validly concluded. The following relations hold, but their inverses do not:

- Every legal OWL Lite ontology is a legal OWL DL ontology.

- Every legal OWL DL ontology is a legal OWL Full ontology.
- Every valid OWL Lite conclusion is a valid OWL DL conclusion.
- Every valid OWL DL conclusion is a valid OWL Full conclusion.

Ontology developers should consider which of these species best suits their needs when choosing an OWL language. In choosing between OWL Lite and OWL DL, the question is whether the users need the more expressive restriction constructs provided by OWL DL. Reasoning systems for OWL Lite will have desirable computational properties; reasoners for OWL DL will be subject to higher worst-case complexity because of its greater expressiveness. In choosing between OWL DL and OWL Full, the question is mainly the extent to which users require the meta-modeling facilities of RDF Schema (i.e., defining classes of classes). Reasoning support for OWL Full is less predictable than for OWL DL.

Moreover, OWL makes an open-world assumption; that is, descriptions of resources are not bounded to a single file or scope. While a class C1 may be defined originally in ontology O1, it can also be extended in other ontologies, and the consequences of these additional propositions about C1 cannot be reversed: new information cannot retract previous information. New information from reasoning can be contradictory, but facts and entailments can only be added, never deleted. It is the responsibility of the ontology designer to take the possibility of such contradictions into consideration; it is expected that tool support will help detect such cases.

In order to write an ontology that can be interpreted unambiguously and used by software agents, a syntax and formal semantics for OWL are required. In addition, OWL is a vocabulary extension of RDF [37].
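A small constructed example of the type-separation difference described above: the statements below use the same name, Wine, both as a class and as an individual (an instance of an assumed GrapeProduct class), which is legal in OWL Full but violates OWL DL's type separation:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:owl="http://www.w3.org/2002/07/owl#">
      <owl:Class rdf:about="#Wine"/>
      <!-- Wine also appears as an individual: permitted only in OWL Full -->
      <rdf:Description rdf:about="#Wine">
        <rdf:type rdf:resource="#GrapeProduct"/>
      </rdf:Description>
    </rdf:RDF>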

CHAPTER 4

ONTOLOGY, ONTOLOGY EDITORS AND QUERY LANGUAGES

In short, an ontology [44] [45] [47] is a specification of a conceptualization: a detailed description of certain concepts, or things, and the relations among them, where the concepts are defined within a specific domain. The usage of an ontology is consistent with this definition, since an ontology is broken into simpler sets of such concept definitions and relations when it is processed. Even though the word ontology has its origin in philosophy, here it is understood in a quite different sense. Ontologies are designed for the purpose of defining knowledge, and for reusing and sharing it effectively.

An ontology is a set of definitions written in a formal vocabulary, and it is a formal expression of an ontological commitment, so that different parties can participate by relying on the same definitions and vocabularies. This is the main approach used to specify a conceptualization, because it has properties that enable AI processing systems to share knowledge among each other. In other words, an ontological commitment is an agreement between different domain specifications to use a specific vocabulary when defining concepts. Processing systems are built so that they can participate in such commitments; that is, they can be connected to an ontology without any conflicts with respect to the definitions and the vocabularies used. An ontology is built so that such systems can commit to it and share knowledge with other systems.

Given a specific domain, an ontology defined for that domain is the base for the knowledge representation of that domain: it enables the definition of a vocabulary for expressing the knowledge of the domain.

Without a defined vocabulary it is not possible to share knowledge among different systems and agents; simply said, there would be no common ground for such systems to exist on and share knowledge. A domain is a specific area of a subject or an area of knowledge, such as medicine, economy, a specific field of research, etc. Ontologies are used by systems and agents such as databases, application programs, or anything else that needs to share knowledge.

Ontologies are built up from basic concepts and the relations between them. The definitions of these concepts and relations are computer usable, so that computing systems can process them. Simply said, defining an ontology is similar to defining a data set with all its properties so that other programs can use the data. Computing systems such as domain-independent applications and software agents use ontologies, as well as knowledge bases built on top of sets of ontologies.

Class definitions are the most common approach when defining a domain in an ontology, as they are suitable for defining and describing the different concepts within the domain. For example, a class defining a pizza represents all the different pizza instances that exist: any pizza is an instance of the class defining and describing a pizza. Classes can have inheritance relations between them, enabling the definition of more specific classes from a given class; the definition of more general classes is also possible. For example, the class pizza can have the subclasses spicy pizza and non-spicy pizza, where the class pizza is a super-class of these two classes.
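As an aside, such a hierarchy could be written down programmatically with the ontology API of Jena, a Java framework discussed later in this chapter. The following is only a sketch, and the namespace used is hypothetical:

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class PizzaHierarchy {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel();
        String ns = "http://example.org/pizza#"; // hypothetical namespace
        OntClass pizza = m.createClass(ns + "Pizza");
        OntClass spicy = m.createClass(ns + "SpicyPizza");
        OntClass nonSpicy = m.createClass(ns + "NonSpicyPizza");
        pizza.addSubClass(spicy);    // SpicyPizza rdfs:subClassOf Pizza
        pizza.addSubClass(nonSpicy); // NonSpicyPizza rdfs:subClassOf Pizza
        // A concrete pizza is then an instance of one of the classes
        Individual p = spicy.createIndividual(ns + "pizza1");
    }
}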

An ontology supports software agents and, in general, all computer systems that need to share and reuse domain knowledge. The important features of an ontology are listed below [47]:

Ability to reuse domain knowledge.
Making domain assumptions explicit.
Separation of operational knowledge and domain knowledge.
Sharing formal definitions and vocabularies when describing a concept.
Analysis of domain knowledge.

There are many conflicting definitions of ontology, especially in the AI world. An ontology is not directly a knowledge base, although there is a thin line between the definitions of these two concepts. The definition of some knowledge for a domain, together with the classes and the instances of those classes, constitutes a knowledge base; an ontology, on the other hand, is not much concerned with the individual instances. For example, for an ontology the number of spicy pizzas is not important; rather, the definition of a pizza is what is essential. The definition of knowledge is what ontologies are more concerned with.

What can ontologies be used for? Below is a list of major use cases of ontologies identified by the Web Ontology Working Group at W3C [16] [33] [47]:

Controlled vocabulary.
Web site or document organization and navigation support.
Browsing support.
Search support (semantic search).
Generalization or specialization of search.
Sense "disambiguation" support.
Consistency checking (use of restrictions).
Auto-completion.
Interoperability support (information/process integration).
Support for validation and verification testing.
Configuration support.
Support for structured, comparative, and customized search.

How are ontologies different from relational databases? Although databases and ontologies have some similarities, they differ in many important respects. First of all, an ontology is not storage for data but a defining model for the data, whereas a relational database is a data repository. An ontology can be used as a filter or a framework to access and manipulate data, while a database can be used to store the different data instances defined by the ontology.

Another important difference is querying. When queries are made against a relational database, the returned data is the same data stored previously, just matching some conditions. When a query is made against an ontology, however, together with some reasoning process, the returned data can be inferred data which was not stored previously but is generated from the facts represented by the ontology. In ontologies, queries can also be made about specific relations, which is not possible with ordinary relational databases.

How are ontologies different from object-oriented modeling? An ontology is also different from the object-oriented paradigm, even though the two have a lot in common, especially when it comes to modeling real life with class definitions. First of all, the whole concept of ontologies has its theoretical roots in logic; because of this, ontologies allow reasoning systems to perform automated reasoning on the knowledge represented by the ontology. Another important difference is the definition of properties: in an ontology, properties are treated as first-class citizens, while in the object-oriented paradigm they are internal to class definitions. In an ontology it is possible to define multiple inheritance, while this is generally not the case in the object-oriented paradigm; object-oriented modeling typically allows only single inheritance between classes, because of the overlapping method signatures that different super-classes could contribute in a multiple inheritance relationship. Ontologies also allow property inheritance, which is not possible in object-oriented modeling, and they allow user-defined relations between different classes, whereas object-oriented modeling restricts relations to the class/subclass concept.

However, because of the wide acceptance and use of object-oriented modeling and UML, these are accepted as practical notations when modeling ontologies. Because of the lack of logic capabilities in the object-oriented modeling approach, the two concepts cannot be fully combined and made productive as they are defined today. Currently there is an ongoing effort to add logic capability to object-oriented modeling, represented by OCL (Object Constraint Language).
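To make the querying difference above concrete, the following sketch (again using the Jena framework discussed later in this chapter, with a hypothetical namespace) asserts an individual only as a SpicyPizza; a model with an attached reasoner nevertheless returns it when asked for all instances of Pizza, an entailed rather than stored fact:

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import java.util.Iterator;

public class InferredAnswer {
    public static void main(String[] args) {
        // Ontology model with a built-in rule reasoner attached
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        String ns = "http://example.org/pizza#"; // hypothetical namespace
        OntClass pizza = m.createClass(ns + "Pizza");
        OntClass spicy = m.createClass(ns + "SpicyPizza");
        pizza.addSubClass(spicy);
        spicy.createIndividual(ns + "pizza1"); // asserted only as a SpicyPizza
        // pizza1 is listed although it was never stored as a Pizza
        for (Iterator it = m.listIndividuals(pizza); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}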

Some important aspects of ontologies are explained below [46]:

Kinds of ontologies: Ontologies may differ with respect to aspects such as their implementation, content, level of description and the structure of their knowledge modeling.

Level of description: Ontologies can be built in several ways, and the same knowledge domain can be described in different ways. There is no unique perception of a knowledge domain that results in one specific description; it depends entirely on the practitioners. The vocabularies, terms and taxonomies used can be given distinguishing properties, and these properties make it possible to define new concepts that have named relationships with other concepts.

Conceptual scope: The scope and purpose of the concepts can also differ between ontologies. The clearest difference can be seen between ontologies modeling specific fields of knowledge, such as medicine, and more high-level ontologies describing the basic concepts and relationships used when domain knowledge is expressed in natural language.

Instantiation: All ontologies have a terminological component, which is analogous to the schema of an XML document. This terminological component defines the vocabulary and structure of the domain the ontology is intended to model. The second part, the assertional part, populates the ontology with individual instances created on the ground established by the vocabulary and structure of the ontology. This second part can be separated from the ontology implementation and maintained in a knowledge base, with access to that knowledge base controlled by the ontology itself. However, whether an instance is treated as an individual or as a concept is entirely determined by the specific way the ontology is defined.

Building ontologies: An ontology can be built in several ways, depending on the practitioners and the domain to be modeled. The main ontology building steps are listed below.

1. Acquiring domain knowledge: Assembling all the information resources that will consistently define the terms used to formally describe the things in the given domain. This information, with its concepts and relations, must be collected so that it can be described in a chosen language.

2. Organizing the ontology: Designing the overall conceptual structure of the domain. This involves identifying the domain's specific concepts and properties, the relationships between the concepts, and the concepts that have individual instances.

3. Building detailed descriptions for the ontology: Adding concepts, properties, relations and individuals according to the needs of the domain being modeled.

4. Verifying the ontology: Checking for inconsistencies among the ontology elements in their syntactic, logical and semantic properties. This can also be based on automatic classification that defines new concepts from existing concepts, class relations and properties.

5. Committing the ontology: Final verification of the ontology, followed by its commitment through deployment into a target environment.

Why are ontologies important in computing? Building systems that rely on ontologies shows great potential to make software more efficient, adaptive and intelligent, and it is one of the most promising areas of Web technology, one that may enable the next breakthrough in the Web. It is not yet widely accepted and deployed, but it has already been adopted by some industries. For example, parts of the medical industry use ontologies heavily and contribute to their development; the medical community has produced the powerful ontology editor Protégé [50], which allows the management and development of ontologies. However, ontologies are still not used by the majority of mainstream users, because applying them in software systems dealing with knowledge is not a straightforward process: there is no standard way of doing things.

However, it is only a matter of time before more such techniques gain attention, based on the experience gathered in the different subject fields that use ontologies as an information representation technology. Meanwhile, the Semantic Web community has changed its vision of the ontology landscape in order to make ontologies a more widely applied technology: great effort is spent on developing standard semantic markup languages based on XML, ontology management systems and ontology management tools, to make it easier to adopt ontologies and to integrate them into computer systems. The use of ontologies is newly being discovered in important applications that deal heavily with information and with the integration of different processes with information. Ontologies are slowly making their way into the software world as their usefulness becomes clearer over time.

Ontology Tools

Effective and efficient work with the Semantic Web must be supported by advanced tools that enable the full power of this technology. The following list contains the important elements needed to make use of the Semantic Web efficiently and effectively:

Ontology editors, to easily create and manipulate ontologies.
Annotation tools, to link information sources together with different structures.
Reasoning services, to enable advanced query services and to map between ontologies with different terminologies.
Ontology library systems and ontology environments, to create and reuse ontologies. Such systems should in general allow merging different ontologies sharing the same terminology.

Inference engines can be used to reason about ontologies and the instances defined by those ontologies, and to create new knowledge from existing knowledge. Inference engines are similar to the SQL (Structured Query Language) query engines running against databases, but they provide stronger support for rules which cannot be represented in the relational databases known today. An example inference engine is Ontobroker [57], which is now a commercial product; Ontobroker can automatically derive new concepts in a given concept hierarchy when reasoning over the concepts of an ontology. Another well-known inference engine is Racer [49], which can be used to implement industrial-strength projects that make use of ontologies created with OWL/RDF.

Ontology Libraries and Environments

If we assume access to various well-defined ontologies, creating a new ontology is only a matter of merging existing ontologies and adding new concepts: instead of building ontologies from scratch, it becomes possible to reuse existing ones. For this, two types of tools are mainly needed:

1. Tools to store and access existing ontologies.
2. Tools to manipulate and manage existing ontologies.

Creating and managing ontologies in a way that makes them reusable is far from easy; this is why ontology libraries are important. An ontology library makes it easy to re-organize ontologies, group them and merge them together so that they can be reused, managed and integrated with existing systems. In order to support ontology reuse, a system must support the following properties:

Ontology reuse through identification, versioning and open storage, to enable access to ontologies.
Ontology reuse through support for specific task-oriented fields, to easily adapt the stored ontologies.

Ontology reuse through the construction of ontologies that fully support the available standards: providing access to high-level ontologies and standard representation languages is an important issue if reuse is to reach its full potential.

Some examples of existing ontology library systems are WebOnto [74] [84], Ontolingua [75] [85], the DAML Ontology library system [76], SHOE [26] [86], Ontology Server [77], IEEE Standard Upper Ontology [78], Sesame [79], OntoServer [80], and ONIONS [81]. ONIONS, a methodology enabling the integration of existing ontologies, has been implemented in several medical ontology library systems [83]. Comparisons and detailed descriptions of these library systems can be found in the article by Ding & Fensel [82].

4.1 Ontology Editors

Most of the existing ontology editors [46] are sufficiently general purpose to allow the construction of ontologies targeting a specific domain. Some of these tools lack useful ontology export capabilities because they use an object-oriented specification language to model the information in a domain; independent tools to convert between different specifications, such as UML and DAML+OIL, are currently being developed.

Tools for ontology design and management: Today, more than 90 tools are available for ontology development, from both non-commercial organizations and commercial software vendors [47] [87]. Most of them are tools for designing and editing ontology files; some also provide certain capabilities for analyzing, modifying and maintaining ontologies over time, in addition to the editing capabilities. One of the more popular editing tools is Protégé, developed by the Stanford University School of Medicine [88]. Other tools are SemTalk [89], OilEd [90], Unicorn [91], Jena [92], and Snobase [93], to name a few. Some of the available tools can be integrated with each other, enabling a more complete development environment; for example, the ontology editor Protégé can communicate with an inference engine to perform reasoning and consistency checking on the ontology being built.

A detailed survey of different ontology editors is provided in Appendix A.

Protégé: Protégé is a free, open-source, integrated and platform-independent system for the development and maintenance of ontologies [19] [50]. The tool, currently at version 3.0, was developed by Stanford Medical Informatics. Protégé has a frame-based knowledge model which is completely compatible with OKBC (the Open Knowledge Base Connectivity protocol), enabling interoperability with other knowledge-representation systems. Protégé provides a development environment supported by a number of third-party plug-ins targeted at the needs of specific knowledge domains. It is also an ontology development platform which can easily be extended with various graphical components such as graphs and tables, media such as sound, images and video, and various storage formats such as OWL, RDF, XML, and HTML.

Ontolingua: The Ontolingua system [46] [75] provides users with the ability to manage, share and reuse ontologies stored on a remote ontology server. The system was developed at the Knowledge Systems Laboratory at Stanford University in the early 90s [19]. Ontolingua supports a wide range of translations, while most ontology editors support only a limited range; it can easily import and export ontologies in newer languages such as DAML+OIL and OWL.

WebOnto: WebOnto [74] is a Web-based tool for browsing, editing and managing ontologies constructed in OCML. It was developed at the Knowledge Media Institute at the Open University, as part of several European research projects, in the late 90s [19]. It is basically a Java-based client application connected to a specific Web server that has access to ontologies constructed with OCML.

WebODE: WebODE is a workbench for managing ontologies on the Web. It was developed by the Ontology and Knowledge Reuse Group at the Technical University of Madrid [19]. It is built on a three-tier architecture: the user interface, the application server and the database management system.

The main elements of the WebODE knowledge model are concepts, groups of concepts, relations, constants and instances of specific definitions.

OntoEdit: OntoEdit was developed by the Knowledge Management Group of the University of Karlsruhe [19]. It is an ontology design and management tool whose knowledge model is related to frame-based languages, and it supports multilingual development.

OilEd: OilEd [90] is a development environment for ontologies constructed with the OIL and DAML+OIL languages. It can be integrated with a reasoner (FaCT) and extends the expressiveness of frame-based tools. OilEd is a simple tool aimed at demonstrations, and it leaves aside broader ontology services and flexibility.

4.2 Ontology Management System

An ontology management system is to ontologies what a database management system (DBMS) is to relational databases [47]. A DBMS allows an application to access data stored in a database via a standard interface; the techniques for storing and structuring the data are left to the DBMS itself, so that the application does not have to consider these issues. The DBMS lets the application access the stored data with a query language (SQL), taking care of everything related to data storage, data indexing and data file management.

An ontology management system allows access to ontologies in a similar way. An application making queries on an ontology through an ontology management system does not have to worry about the underlying processes of data storage or about how the structuring of the data is done. Ontology editing capabilities are not central to an ontology management system, although some systems may provide capabilities to edit ontologies programmatically through a programming interface. Where such editing capabilities are not provided, developers can use a graphical editing environment such as Protégé.

Snobase (Semantic Network Ontology Base) Ontology Management System

Snobase [47] [93] is an ontology management system that can load ontology files locally or through any URL (Uniform Resource Locator) for files stored on a Web server anywhere in the world. It is possible to create, modify and store locally created ontologies. With Snobase, queries can be run against the loaded ontology through a well-defined programming interface, and applications can access ontologies expressed in standard ontology languages such as RDF, DAML+OIL, and OWL. The system provides persistent storage for ontologies, a built-in inference engine, a local ontology directory and source connectors to application programs. Snobase is a Java package providing capabilities similar to JDBC (Java Database Connectivity), and it returns query results similar to the result sets returned by queries against a relational database. Snobase currently supports a variant of the OWL Query Language (OWL-QL) [94] for queries against an ontology model loaded into its persistent storage; OWL-QL is the ontological equivalent of SQL for the Snobase ontology management system.

Jena Semantic Web Framework

Jena [92] is a Java framework for building Semantic Web applications programmatically. Jena provides a programmatic environment for RDF, RDFS and OWL ontologies, including a rule-based inference engine. Given an ontology and a model, Jena's inference engine can reason so that additional statements which the model does not express explicitly can be derived. Jena provides several reasoner types to work with different kinds of ontologies. Some important capabilities of Jena are listed below:

Provides an RDF Application Programming Interface (API).
Reads and writes RDF in RDF/XML, N3 and N-Triples.
Provides an OWL API.
Provides both in-memory and persistent-storage ontology models.
Provides support for RDQL, a query language for RDF.
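A minimal sketch of this programmatic environment is shown below: it loads an RDF/OWL file from a URL into an in-memory model and prints all its triples. The URL matches the one used for the thesis ontology elsewhere in this document; the plain default model (rather than one of Jena's reasoning models) is an assumption made for brevity.

import com.hp.hpl.jena.rdf.model.*;

public class JenaSketch {
    public static void main(String[] args) {
        // Plain in-memory RDF model; an OntModel with a reasoner could be used instead
        Model model = ModelFactory.createDefaultModel();
        model.read("http://localhost/localontologies/foodreceipts.owl");
        // Iterate over all subject-predicate-object statements in the model
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement st = it.nextStatement();
            System.out.println(st.getSubject() + " " + st.getPredicate() + " " + st.getObject());
        }
    }
}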

Jena is an open source project whose development started at the HP Labs Semantic Web Programme.

4.3 Ontology Query Languages

A query can be thought of as an assertion, or some restrictive statement, whose result is to be returned [7]. RDF at the logic level is enough to express such assertions. In practice, however, a query engine has specific algorithms and indices available to work with, and can therefore answer only specific sorts of queries. A query language can be developed in either of the following ways: allowing query types to be expressed succinctly with mathematically uncomplicated algorithms, or allowing certain constrained queries with certain computability properties to be expressed. SQL, for example, is a query language with both of these properties.

It is important that a query language targeted at ontologies can be defined in terms of RDF logic. For example, to query an ontology, an assertion could have the form "x is the author of p1" for some x. To ask for a list of all authors, it would be asserted that all the members of the matching set should be authors and that all the authors should be in the set, and so on.

In practice, the different algorithms and mathematical foundations behind the various search engines, and the different algorithms in local logical systems, suggest that there will exist different forms of query agents capable of providing results for different forms of queries. A useful step would be to restrict queries in some common-sense way, so that specifications for query engines and languages could be defined from them. The experience gained from the query languages currently in use will enable such common specifications, so that it will become possible to chain different search engines together and make them perform inference through intermediate query engines.
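As a concrete illustration, the assertion "x is the author of p1" could be phrased in RDQL (described below) and run with Jena. The document URI, the use of the Dublin Core creator property, and the details of the early Jena RDQL API are all assumptions of this sketch:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.rdql.*;

public class AuthorQuery {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // ... statements about http://example.org/p1 would be read or added here ...
        // RDQL: every x such that x is the author (dc:creator) of p1
        Query query = new Query(
            "SELECT ?x WHERE (<http://example.org/p1>, " +
            "<http://purl.org/dc/elements/1.1/creator>, ?x)");
        query.setSource(model);
        QueryResults results = new QueryEngine(query).exec();
        while (results.hasNext()) {
            ResultBinding binding = (ResultBinding) results.next();
            System.out.println(binding.get("x"));
        }
        results.close();
    }
}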

OWL-QL (OWL Query Language)

OWL-QL [47] [94] is a query language and protocol supporting agent-to-agent query-answering dialogues over knowledge represented in the OWL language. The semantic relationship among a query, a query answer and the ontologies used to produce the answer is exactly specified. OWL-QL also provides a dialogue with the query engine, so that the engine can use automated reasoning methods to derive answers to queries. The query engine may need extra information from the querying agent in order to produce an answer, so a dialogue can take place between the two parties; this is why OWL-QL has the properties of a protocol. In this setting, the set of answers to a query may be of unpredictable size and may require an unpredictable amount of time to compute, since the domain is not totally restricted: multiple knowledge bases can be involved in the dialogue between query agents. The following quote is from the OWL-QL specification: an OWL-QL query contains a query pattern that is a collection of OWL sentences in which some literals and/or URI-refs have been replaced by variables. A query answer provides bindings of terms to some of these variables such that the conjunction of the answer sentences produced by applying the bindings to the query pattern and considering the remaining variables in the query pattern to be existentially quantified is entailed by a knowledge base (KB) called the answer KB.

OWL-QL is relatively simple and expressive. To make a query, a querying agent simply describes what is being searched for, indicating the variables and their matching concepts for the answer being queried. An advantage of the OWL-QL query language is that the underlying mechanism is easily adaptable to different ontology representation languages.

RDQL (RDF Data Query Language)

RDQL [95] was developed by HP and submitted to W3C as a possible recommendation for a query language. It is an implementation of an SQL-like query language designed for RDF. RDQL treats RDF as data and provides querying with constraints on the triple patterns exposed by the RDF model.

RDQL is intended to be used at a higher level than the RDF API itself. RDQL queries provide a way for query statements to be written in a more declarative and intuitive manner, in terms of the answers expected from the query answering system.

RQL (A declarative query language for RDF)

RQL [10] is a typed query language that relies on a functional approach: it is defined by a set of basic queries and iterators that can be used to build new queries through functional composition. In addition, RQL supports generalized path expressions, featuring variables on labels for both nodes (i.e., classes) and edges (i.e., properties). The smooth combination of RQL schema and data path expressions is a key feature for satisfying the needs of Semantic Web applications such as knowledge portals and e-marketplaces. The online documentation presents the complete RQL syntax, together with the formal semantics and the type inference rules of the language.

CHAPTER 5

DESIGN OF THE SEMANTIC WEB APPLICATION: ONTOLOGY-DRIVEN RECIPE QUERYING

This chapter goes through the design and specifications of the Semantic Web application implemented for this thesis. Specifications regarding the choice of domain, services, user facilities, etc. are discussed in detail.

As the Internet has gone through a period of rapid growth, the need for applications making use of both machine- and human-consumable data has come onto the scene as a promising candidate for a new breakthrough in information presentation and processing. Various kinds of technologies have been developed, along with different standards and techniques proposed by the different communities making use of the Web. The Semantic Web project is moving forward to become an important actor in the mainstream: applications, tools and Semantic Web languages are constantly being developed, creating a solid background for future Semantic Web developments, and a valuable pool of experience is being gained from the effort spent on these developments.

The purpose of this thesis project is to explore the potential advantages of Semantic Web ontologies and to demonstrate how different technologies can be combined to create applications primarily based on ontologies. Technologies directly related to the Semantic Web have been used, along with more general Web-targeted technologies not directly related to Web ontologies or to the Semantic Web as a whole. The choice of the different technologies (tools, ontology language, programming platform, etc.) is discussed in detail at the end of this chapter.

5.1 Overview of the System

As explained in the previous chapters, Web ontologies bring several benefits in the Semantic Web context. The Web application developed for this thesis project makes use of Semantic Web technologies to show the benefits of such technologies. The overall structure of the system is illustrated in Figure-1 below.

Figure - 1: Structure of the System

The application is mainly a Web-based interface for accessing and querying content stored in an OWL ontology, which can be located on the local system or on any Web server on the Web. The targeted content domain is food recipes, whose data has been collected from various Web sites giving access to food recipes.

The ontology being processed is not only used to retrieve the data of the various food recipes but also structures the general view and behavior of the Web interface, such as categorizing the recipes and displaying them under a navigation menu. All the information and Web content presented at the front end of the application is extracted from the OWL ontology.

The actual ontology processing task is done by a separate OWL server implemented for this project. The Web interface retrieves the necessary data from this server through a TCP connection. Loading an OWL ontology and creating a model from it, recipe category extraction, recipe querying and recipe content extraction are all done by the OWL server working in the background of the application. As mentioned before, the Web interface is a separate module interacting with the OWL server only to accept user input and to present the data returned from the server. The OWL server implements its processing tasks by making use of the ontology management system Snobase; the server creates a persistent model of the ontology and loads it into system memory for fast access.

For constructing the OWL ontology, a separate ontology editor has been used. The ontology construction and management part has been kept external to the project implementation because of the powerful editors already available today. As functionality, the Web interface provides an intuitive and easy-to-use interface allowing users to browse through the recipes, with search capabilities so that users can easily find a food recipe matching a certain specification.

5.2 System Domain

Food recipes have been selected as the information domain of the project because of their various interesting properties. Information on food recipes is widely presented on a large number of Web sites, which makes it easy to find information related to recipes and to use this data when constructing the ontology.

Food recipes provide a useful context and structure for a system relying on structured data. It is easy to classify the existing data and to present this classification with an ontology: concepts such as classes, subclasses, properties and relations can easily be applied and demonstrated within this domain.

The poor quality of the existing food recipe Web sites and portals was also one of the main reasons for choosing this domain for the thesis application. The lack of easy content access when looking for relevant information is one of the general problems in this domain, and such problems have been kept in mind when developing this application.

The constructed ontology contains a large number of food recipes whose data has been collected from currently existing Web sites publishing food recipes. Because of the common information structure of the recipes, it has been easy to create a common format and structure for storing these recipes in the constructed ontology file: properties such as cooking time, preparation time, preparation instructions and vegetarian information are common to all food recipes. In addition, the ingredients used to prepare the dishes are common to the domain, allowing them to be reusable definitions instead of being distinct to each and every recipe.

Some of the Web sites used to obtain information on published food recipes are listed below:

All of the Web sites in the list above, together with many other Web sites not mentioned here, have no powerful features for extracting the relevant recipe for users. In general, they lack search facilities that would spare users from browsing all the recipes in order to find the desired one. It has not been possible to find Web sites which publish their recipe content in any form of ontology: all the available data was marked up with HTML (Hyper Text Markup Language), which makes it almost impossible for other systems to access the data and make use of it effectively.

When providing data for the ontology constructed for the application, no automated process could be used. Extracting information from such sites has been done by copying and pasting from the pages where the recipe information is presented visually. It is out of the scope of this thesis project to develop an automated system to extract information from the HTML markup in which the recipe information is presented. Besides, it would most probably be an unsuccessful project and a waste of time and effort to create such an automated system, since the markup used by the different sites is not identical, so no general pattern can be constructed to extract the necessary information and store it in ontologies. This is a different subject with no direct relation to the Semantic Web.

Before designing the system, a detailed investigation of the different Web sites was performed. The main focus was on how different users can access the relevant content as fast as possible. None of the mentioned Web sites has advanced search capabilities, except for a few which allow recipe search based on ingredients. Most of the sites only make it possible to view the recipes in different categories, so users can only browse and search for them manually. Some of the sites provide search capabilities based only on keyword search, where only the recipe titles are used for keyword matching.

The content stored at the various Web sites is not structured in a way that makes it accessible to other systems. That is, the content is not reusable and is just waiting for users who have plenty of time and passion to seek it out. The only way of retrieving the stored information is by manually copying and pasting it so that it can be used for other purposes.

Even the Web sites providing search with respect to the ingredients of the recipe being searched for are not powerful enough to return the most relevant recipes to the user. A recipe can be classified in various ways and has different properties and relations; when performing a search based on ingredients alone, none of these notions is considered, although they provide useful information for retrieving the exact information desired. For example, a search mechanism for food recipes could consider properties such as the preparation time given for a recipe. Whether a dish is vegetarian or not could also be a useful criterion when performing a search. Other common criteria could be mentioned, such as the food category, the level of difficulty, the country/region of origin, etc. However, none of these properties has been used on the Web sites visited. As mentioned above, the only search mechanism other than direct keyword matching is based on ingredients, and it has been implemented on only a very few Web sites.

5.3 System Specifications

Storage and representation of information: The domain information is represented with an ontology. All the data related to food recipes, including classifications, properties and relations, is stored in the ontology file.

Ontology language: The ontology language used to construct the ontology was specified as OWL, because it is currently the most powerful ontology language, with greater representational power than other ontology construction languages. The OWL sublanguage has been specified as OWL DL, because some of the more advanced class-related constructs, such as subClassOf and disjointWith, were used in constructing the ontology.

Ontology processing: Ontology processing is performed by a separate server application making use of the ontology management system Snobase. It is implemented using the API provided by Snobase, together with some classes implemented to handle the communication with the Web client.

Web interface: The Web interface is a Web-based application that handles the visual presentation of the recipe content and provides navigation through the different categories of food recipes. It provides an easy-to-use search interface allowing users to construct queries with different criteria.

Application development platform: The OWL server responsible for ontology processing has been developed in the object-oriented programming language Java; the Snobase API is provided as a Java package. The Web interface is separated from the server application itself, although it could have been developed on the same platform. It is implemented in the widely used server-side scripting language PHP, and can be installed and run on any properly configured Web server.

5.4 System Design

The ontology-driven recipe querying application developed for this thesis is built up of three main parts: the OWL ontology, the OWL server, and the Web interface used to interact with the system. The constructed ontology is the only information resource used by the application. All data, such as the text representing a recipe, the recipe category names, etc., is stored in the constructed ontology file; even the link names appearing in the menu displayed on the Web interface are stored in and retrieved from the ontology file, as shown in Figure-1.

The OWL server is a Java-based client-server application which acts as a bridge between the constructed ontology and the Web interface. Taking a given URL as a parameter, it loads the ontology file from any location accessible from the Web and creates an internal model on which queries can be executed. The server creates a persistent model of the loaded ontology in a local directory, so that it does not have to rely on the network connection during execution once the ontology has been loaded. After loading the OWL ontology, the server is ready to accept requests from the Web interface. The communication between the server and the Web interface is based on a simple ad-hoc protocol enabling the two parts to exchange information: whenever the server receives a request, it validates the format of the request, performs the requested task and sends the results back to the requesting Web interface. Four main types of requests can be made from the Web interface:

Request for the recipe categories.
Request for all recipes under a specific category, given a resource id for the category.
Request for a food recipe, given a specific resource id for the recipe.
Perform a search, given some query.

The Web interface is a PHP (PHP Hypertext Preprocessor) application which only handles the user interaction with the OWL ontology server/model. The Web interface does not deal with any data processing other than making requests to the OWL server and presenting the responses as HTML-formatted pages. It is responsible for accepting user input, creating a request message and sending it to the OWL server; when the server returns the corresponding response, the Web interface simply displays it to the user. The different parts of the interface, such as the category-based navigation menu, the available selectable ingredients and the food recipes being displayed, are all retrieved from the OWL server dynamically on each page request. Whenever the category structure or the content of the ontology file is changed, all the changes are reflected in the Web interface and made available to the user without any modification to the code of the PHP Web application.
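From the client's point of view, the exchange can be sketched as follows (in Java for brevity, although the actual interface is written in PHP). The request header GET_MENU is one of the headers used in the implementation chapter; the host, the port and the one-line-per-message framing are assumptions of this sketch:

import java.io.*;
import java.net.*;

public class MenuRequest {
    public static void main(String[] args) throws IOException {
        // Host and port are assumptions; the ad-hoc protocol is
        // simplified here to one line per message.
        Socket socket = new Socket("localhost", 9000);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
        out.println("GET_MENU");           // ask for the recipe categories
        System.out.println(in.readLine()); // category list returned by the server
        socket.close();
    }
}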

The communication between the different parts of the system is based on commonly used network communication techniques. The server fetches the ontology stored on some Web server using the HTTP protocol, whereas the communication between the OWL server and the Web interface is a TCP/IP socket connection implementing an ad-hoc communication protocol. All three parts of the system can be located remotely, on any machines having access to the Internet.

5.4.1 OWL Ontology Design

The Web ontology file has been constructed to reflect the food recipe domain in detail. However, the ontology has not been made too fine grained, as it would then be difficult to provide specific information for all the details built into the ontology; information such as the country and region of origin, for example, has been discarded. The constructed ontology takes advantage of concepts such as class hierarchies, class relations and properties. In order to model the information domain in a realistic manner, proper information description methods have been used together with the powerful logic descriptors provided by OWL DL and OWL Full. The following part explains the different classes that have been defined, together with their relations and properties. Please refer to Appendix D for a part of the class hierarchy of the foodreceipts project, which is automatically generated as an HTML file.

OWL Classes:

Food: The Food class is the super-class of all the classes defined for specific food types, such as Çorba and Kebap. It is the base class for all the food instances defined in the ontology. It is defined as disjoint from all classes other than those inheriting from it.

Ingredients: The Ingredients class (<owl:Class rdf:ID="Ingredients">) is the super-class of the ingredient classes defined for each type of ingredient; every specific ingredient class inherits from this class. It is defined as disjoint from all classes except those directly inheriting from it.

DifficultyLevel: The DifficultyLevel class is an enumerated class with three defined instances: Easy, Normal and Difficult. This class is also disjoint from the other classes defined in the domain.

PreparationTime: The PreparationTime class is an enumerated class with 17 pre-defined instances representing preparation times from 10 minutes to 90 minutes at proper intervals. Instances of this class are used to state the time it takes to prepare a dish.
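For illustration, the skeleton of these classes could be built programmatically with the Jena ontology API as in the hedged sketch below. This is not the procedure followed for the thesis (the ontology was built in Protégé), though the namespace matches the one appearing in the sample query of chapter 6:

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class RecipeClasses {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
        String ns = "http://localhost/localontologies/foodreceipts.owl#";
        OntClass food = m.createClass(ns + "Food");
        OntClass corba = m.createClass(ns + "Corba"); // a category such as Çorba
        food.addSubClass(corba);
        OntClass ingredients = m.createClass(ns + "Ingredients");
        food.addDisjointWith(ingredients);
        OntClass level = m.createClass(ns + "DifficultyLevel");
        level.createIndividual(ns + "Easy");      // the enumerated members
        level.createIndividual(ns + "Normal");
        level.createIndividual(ns + "Difficult");
    }
}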

Properties and Relations: All properties and relations, including their domain and range relationships, are shown in Figure-2 and explained one by one in the following paragraphs.

Figure - 2: Properties and Relations of the System

HasCalory: The HasCalory datatype property (owl:DatatypeProperty) is a relation between instances of class Food and a string representing the calorie value of a food instance.

HasDifficultyLevel: The HasDifficultyLevel object property (owl:ObjectProperty) is a relation between instances of class Food and an instance of the enumerated class DifficultyLevel.

HasIngredient: The HasIngredient object property (owl:ObjectProperty) is a relation between instances of type Food and instances of type Ingredient. Through this property, a Food instance can be in several relations with different instances of type Ingredient.

HasPreparationTime: The HasPreparationTime object property is a relation between instances of type Food and instances of type PreparationTime. This property is used to attach a PreparationTime instance to a food instance.

HasWebSourceURL: The HasWebSourceURL datatype property (owl:DatatypeProperty) is a relation between instances of class Food and a literal representing the URL of the Web source the recipe data has been retrieved from.

IsIngredientOf: The IsIngredientOf object property is defined as the inverse (owl:inverseOf) of the object property HasIngredient; its domain/range relationship is the reverse of that of HasIngredient.

HasReceipt: This is the datatype property which binds a recipe text to an instance of type Food. The range type is a string (XMLLiteral) containing lightly HTML-formatted text representing the recipe of a particular food instance.
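Continuing the hedged class sketch above, the central properties could be declared with the Jena API as follows; again, this is only an illustration, not the construction procedure actually used:

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.vocabulary.XSD;

public class RecipeProperties {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
        String ns = "http://localhost/localontologies/foodreceipts.owl#";
        OntClass food = m.createClass(ns + "Food");
        OntClass ingredients = m.createClass(ns + "Ingredients");
        // Object property relating a Food to its Ingredients
        ObjectProperty hasIngredient = m.createObjectProperty(ns + "HasIngredient");
        hasIngredient.addDomain(food);
        hasIngredient.addRange(ingredients);
        // Its inverse, declared with owl:inverseOf
        ObjectProperty isIngredientOf = m.createObjectProperty(ns + "IsIngredientOf");
        isIngredientOf.addInverseOf(hasIngredient);
        // Datatype property holding the calorie value as a string
        DatatypeProperty hasCalory = m.createDatatypeProperty(ns + "HasCalory");
        hasCalory.addDomain(food);
        hasCalory.addRange(XSD.xstring);
    }
}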

5.5 Technical Specification

Protégé Ontology Editor

Protégé, developed by Stanford Medical Informatics, has been selected as the ontology editor for the thesis because it suits the project in many ways. It is a widely used free ontology editor, especially for constructing and maintaining OWL ontologies. OWL differs from other ontology languages in that it supports a richer set of operators, such as AND, OR and negation, and Protégé supports all the advanced properties of the OWL language and provides all the functionality needed to maintain OWL ontologies. Because of its wide popularity, it was easy to obtain support for it: plenty of tutorials, documentation and support forums are available for the editor. The functionality of Protégé can be extended by the various plugins available online on the Protégé home page, and different wizards are provided to ease the work. Its internal logical model allows it to interact with external reasoning services such as Racer, for example to compute inferred types and to perform consistency checking on the ontology being constructed. Please refer to Appendix A for a detailed survey of different ontology editors.

Racer (Renamed ABox and Concept Expression Reasoner)

Racer is a Semantic Web inference engine for Web ontologies. It currently supports a wide range of inference services for ontologies specified in the ontology language OWL. The services are made available through a network-based API, so that different agents can make use of them; for example, Protégé can interact with the reasoner and use its reasoning services. For this thesis, Racer has been used to check the consistency of the developed ontology and to compute derived types through its reasoning services. For example, the constructed ontology contains a class named NonVegetarianFood for computing the non-vegetarian food instances. Its type definition asserts that the instances of this class contain some ingredient of the type Meat; the reasoning service of Racer can then find all the Food instances having ingredients of type Meat, even though these instances have not been created directly with the type NonVegetarianFood.
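The NonVegetarianFood definition corresponds to an owl:someValuesFrom restriction. The hedged Jena sketch below pictures the idea (the actual class was defined in Protégé and classified by Racer):

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class NonVegetarianDefinition {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
        String ns = "http://localhost/localontologies/foodreceipts.owl#";
        OntClass meat = m.createClass(ns + "Meat");
        ObjectProperty hasIngredient = m.createObjectProperty(ns + "HasIngredient");
        // Anonymous restriction: things with some ingredient of type Meat
        Restriction someMeat =
            m.createSomeValuesFromRestriction(null, hasIngredient, meat);
        OntClass nonVeg = m.createClass(ns + "NonVegetarianFood");
        // The equivalence makes this a defined class: a reasoner such as
        // Racer can classify any Food with a Meat ingredient under it.
        nonVeg.addEquivalentClass(someMeat);
    }
}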

CHAPTER 6

IMPLEMENTATION

The application mainly consists of three parts: the OWL ontology, the OWL query server and the Web interface. They are implemented with different technologies on different development platforms.

The OWL query server is a Java application built on the ontology processing API provided by the Snobase ontology management system. The server loads a given ontology and creates an internal model so that it can be processed. It provides a network interface allowing client systems to connect and pass requests to the server through a simple protocol. The server processes each request, runs the necessary queries on the loaded ontology model and returns the answer to the client system through the same network interface; in this case, the client system is the Web interface.

The client Web interface provides the functionality to send requests to the OWL server and to display the results in a human-consumable style. The requests sent to the OWL server differ depending on the information needed from the ontology: a user may construct a query in order to search for a food recipe, or may click on a particular link in order to view the list of recipes under a category, etc. The form of the requests sent to the server depends on these different functionalities provided by the interface.

The OWL ontology described in the previous chapter has been developed using the Protégé ontology editor (version 3), with the assistance of Racer to check consistency throughout the development procedure. The detailed steps of constructing the ontology are not provided in this chapter, since the procedure is straightforward and simple using the Protégé editor. However, some screenshots of the editor environment are provided, giving a visual view of the ontology design discussed in chapter 5.

6.1 OWL Query Server

As mentioned above, the server is a Java application providing a network interface that accepts requests from the Web interface. The network interface is implemented with simple TCP socket connections. The main process of the server listens for incoming requests and passes each one to a separate thread which processes it further, creates a response and sends the response back to the client process. Each request received from the Web client is handled by a separate thread (RequestHandler). Depending on the type of request, the RequestHandler thread creates a response and sends it back to the client; the server responds differently to the different types of requests made by the client process. A simple protocol governs the communication between the client and the server. The request headers are as follows:

GET_MENU: When the server receives a request with this header, it performs a query against the loaded ontology model and retrieves all the existing food categories; that is, it retrieves all the subclasses of the class Food and returns them to the client. This request is made by the client interface in order to create the navigation menu for the different food categories available.

GET_CATEGORY: This is the header for requests containing a food category id, e.g. Çorbalar. The server queries the ontology model and retrieves all the instances under the given category. (Each subclass of class Food is considered a food category.)

GET_FOOD: This header is used to retrieve a particular food from the loaded ontology. Given a food instance id, the server retrieves the information of the food with that id and returns it to the client.

GET_INGREDIENTS: When a request with this header is made, the server returns all instances of class Ingredient to the client. The client uses this information to create a suitable search environment in which the user can make a selection among the different ingredients and build an ingredient-based search query.

GET_RESULT: This request header is accompanied by a query string containing the query constructed by the user. The server processes this string, performs a search and returns the results to the client.

Details of how the internal process works are discussed in the following sections, where each class is explained in detail. The server is made up of three classes: OWLQueryServer, RequestHandler and FoodEntry.

Class: OWLQueryServer

This is the main class of the server. OWLQueryServer starts by loading the OWL ontology file from a Web server at a given URL and creating an OWL model. The OWL model API is provided by Snobase, making it possible to represent the ontology in system memory; this API also makes it possible to run RDQL queries against the model. After loading the model, a query is initially executed against it to retrieve all the food instances represented in the OWL model. These instances are stored in a Hashtable as FoodEntry objects. This hash table is the internal cache storing all the available food entries, used for fast access to the information of a food instance: instead of executing a query against the model each time some information about a food instance is needed, it is much easier to fetch it from the hash table containing all the instances, and search operations also become easier when all the food instances are available as a collection.

After the food instances have been retrieved, similar work is performed for the ingredient instances represented in the OWL model. This time, instead of a hash table, a string containing all the ingredients is created. This string is used by the Web interface when creating the list of ingredients from which the user can select to construct an ingredient-based query. Instead of running queries on each client request, it is much faster to create this string beforehand and simply send it whenever the client requests the available ingredients.

Finally, the main process of OWLQueryServer creates a server socket and waits for incoming client requests. Whenever it receives a request, it hands the socket connection over to a RequestHandler object and continues its loop, waiting for new requests. The RequestHandler reads in the request string, processes it and returns a response to the client, finishing its job. A RequestHandler object is created for each request and finishes its execution after handling the request and sending back the created response. The UML diagram of OWLQueryServer is illustrated in Figure-3.

Figure - 3: OWLQueryServer UML diagram

Class: RequestHandler

The RequestHandler class is where the actual processing is done. It processes a single request and finishes its execution. Depending on the request received, it invokes the proper method to generate the response and sends it back to the client via the socket connection. For example, whenever it receives a request of the form GET_FOOD{food}foodInstanceID{food}, it retrieves the food instance with id foodInstanceID from the previously constructed hash table and sends the information back to the client.
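The accept loop and the hand-over to RequestHandler can be sketched as follows. Only the request headers themselves are fixed by the protocol described above; the port number, the line-based framing and the stubbed handler body are assumptions of this sketch:

import java.io.*;
import java.net.*;

public class ServerLoop {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(9000); // port is an assumption
        while (true) {
            Socket client = server.accept();
            // Each request is handled by its own RequestHandler thread
            new Thread(new RequestHandler(client)).start();
        }
    }
}

class RequestHandler implements Runnable {
    private final Socket socket;
    RequestHandler(Socket socket) { this.socket = socket; }
    public void run() {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()));
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            String request = in.readLine();
            if (request.startsWith("GET_MENU")) {
                out.println("...");  // query the model for the subclasses of Food
            } else if (request.startsWith("GET_FOOD")) {
                out.println("...");  // look the instance up in the hash table
            }
            socket.close();
        } catch (IOException e) {
            // log and drop the connection
        }
    }
}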

For some requests, the RequestHandler generates a query and executes it against the OWL model previously created by the main server class OWLQueryServer. For example, when a request is made for the names of the food instances under a specific category, an RDF query is created with the parameters provided by the request string and executed against the model. The returned response contains the label of each instance instead of the raw OWL id, because the label attached to each instance is intended for humans: instead of returning #somefood, the server returns the label, which probably has the form Some Food and is more suitable to present to a human user. As an example, the sample query below returns all the elements of type Food (executeQuery stands in here for the corresponding Snobase query-execution call, whose exact name is not shown).

Sample query:

// Variable definition
RDFVariable x1 = model.createRDFVariable("?x1");
// Query statement: match everything whose rdf:type is Food
RDFStatement queryStatement = model.createRDFStatement(
    x1,
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://localhost/localontologies/foodreceipts.owl#Food");
// Executing the query against the OWL model (hypothetical method name)
RDFResultSet resultSet = model.executeQuery(queryStatement);

This query returns all the elements (matching ?x1) of type Food. For more queries, please refer to Appendix B, which contains sample queries (Java code) illustrating how they are constructed and executed against the OWL model.

The most important requests the RequestHandler deals with are search requests. A search can be made with different criteria together with a set of specified ingredients. Criteria such as preparation time, vegetarian or not, calorie amount, etc. are all optional elements of a query constructed by the user through the search interface provided by the Web client. Each request for a recipe search must contain a set of specified ingredients.

The different elements/restrictions of a recipe search that can be constructed by a user through the Web interface are listed below.

- The ingredients that the recipes searched for should contain.
- Minimum value for food calorie.
- Maximum value for food calorie.
- Minimum value for preparation time.
- Maximum value for preparation time.
- Whether the food is vegetarian or not.
- A specific category (subclass of Food).
- The difficulty level of the recipe being searched for.

When a search operation starts, the RequestHandler performs an iterative operation on the previously constructed Hashtable (searchTable) storing all the food instances as FoodEntry objects. The search operation is done in several steps in order to return the most relevant results to the user requesting recipes.

Initially, when a request is received, the RequestHandler simply selects all the food entries matching all the specified criteria. It starts with the ingredients criterion and creates a set of food entries containing those ingredients. It then moves on to the other restrictions and performs a selective iteration on the previously created set containing the specified ingredients. This operation continues until all the specified restrictions have been checked. Finally, when this procedure is finished, the set of food entries (or recipes) meeting all the criteria is sent back to the client.

In case the above procedure results in an empty set, a different procedure is started. When no suitable recipes can be found by the first procedure, the RequestHandler does not simply return a NO_RESULT response; instead it performs a new search over different combinations of the given set of ingredients. Initially, one ingredient is removed from the specified set, and afterwards a search procedure identical to the one above is made: the recipes matching the remaining ingredients are selected, and then a similar selective operation with respect to the other restrictions is performed. This operation, where a single ingredient is removed, is done for all possible removals of different ingredients. If the resulting set contains some elements, it is sent back to the client side.

In case the removal of a single ingredient does not return any result, the procedure above is continued, but this time two ingredients are removed from the ingredient set, and the search is performed for all ingredient sets resulting from the removal of two ingredients. The combination resulting in the maximum number of recipes is selected and sent back to the client.

When none of the procedures above returns any results, the restrictions other than the ingredients are removed. That is, all the above-mentioned procedures are repeated without considering the extra restrictions such as preparation time, calorie amount etc.

The different search procedures explained above return the most relevant results to the user requesting recipes through the search interface. Instead of seeing "Number of results: 0" for a query, the user will most probably receive a result relevant to the original query he/she has made. An informative message describing the search actually performed is displayed with the results, so the user is informed of what exactly has been done behind the scenes.
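The staged relaxation described above can be summarized with the following hypothetical Java sketch. FoodEntry and its containsIngredients method come from the thesis; everything else (the class and method names, the Predicate-based representation of the extra restrictions, and picking the largest result set in every relaxation pass) is an illustrative assumption.

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class RecipeSearch {

    // Full procedure: exact match first, then relaxed ingredient sets, and
    // as a last resort the same passes without the extra restrictions.
    public List<FoodEntry> search(Collection<FoodEntry> allFoods,
                                  Set<String> ingredients,
                                  Predicate<FoodEntry> extraRestrictions) {
        List<FoodEntry> result = searchOnce(allFoods, ingredients, extraRestrictions);
        if (result.isEmpty()) {
            // Repeat everything ignoring preparation time, calories etc.
            result = searchOnce(allFoods, ingredients, food -> true);
        }
        return result;
    }

    private List<FoodEntry> searchOnce(Collection<FoodEntry> allFoods,
                                       Set<String> ingredients,
                                       Predicate<FoodEntry> extra) {
        // Pass 1: all ingredients plus all extra restrictions.
        List<FoodEntry> hits = filter(allFoods, ingredients, extra);
        if (!hits.isEmpty()) return hits;

        // Passes 2 and 3: retry with one, then two, ingredients removed,
        // keeping the combination that yields the most recipes.
        for (int removed = 1; removed <= 2; removed++) {
            List<FoodEntry> best = new ArrayList<>();
            for (Set<String> subset : subsetsRemoving(ingredients, removed)) {
                List<FoodEntry> candidate = filter(allFoods, subset, extra);
                if (candidate.size() > best.size()) best = candidate;
            }
            if (!best.isEmpty()) return best;
        }
        return new ArrayList<>();
    }

    private List<FoodEntry> filter(Collection<FoodEntry> foods,
                                   Set<String> ingredients,
                                   Predicate<FoodEntry> extra) {
        List<FoodEntry> result = new ArrayList<>();
        for (FoodEntry food : foods) {
            if (food.containsIngredients(ingredients) && extra.test(food)) {
                result.add(food);
            }
        }
        return result;
    }

    // All subsets of the ingredient set obtained by removing exactly k
    // elements (the thesis only ever removes one or two).
    private List<Set<String>> subsetsRemoving(Set<String> ingredients, int k) {
        List<String> items = new ArrayList<>(ingredients);
        List<Set<String>> subsets = new ArrayList<>();
        if (k == 1) {
            for (String drop : items) {
                Set<String> s = new HashSet<>(ingredients);
                s.remove(drop);
                subsets.add(s);
            }
        } else {
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    Set<String> s = new HashSet<>(ingredients);
                    s.remove(items.get(i));
                    s.remove(items.get(j));
                    subsets.add(s);
                }
            }
        }
        return subsets;
    }
}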

The UML diagram of class RequestHandler is illustrated in Figure-4.

Figure - 4: RequestHandler UML diagram

Class: FoodEntry

The FoodEntry class is defined so that it can represent a food instance with all its data, such as id, label, preparation time, calorie value etc. Instances of this class are used to create a collection of the OWL food instances for easy manipulation. As mentioned before, the main process in OWLQueryServer initially creates a Hashtable (searchTable) containing all the food instances represented in the OWL model, and each of these food instances is stored in it as a FoodEntry object.

The FoodEntry class implements methods that make it easy to use the data of each food instance. Proper getter and setter methods are implemented, together with some extra methods that simplify the search operations. One such method is containsIngredients(ingredients), which checks whether the given ingredients exist in the recipe of the food instance on which it has been called.
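A minimal sketch of what FoodEntry may look like is given below. The field list follows the data mentioned in the text; the exact field names, types and units are assumptions.

import java.util.Collection;
import java.util.Set;

public class FoodEntry {

    private String id;             // raw OWL instance id, e.g. #someFood
    private String label;          // human-readable label, e.g. Some Food
    private int preparationTime;   // assumed to be stored in minutes
    private int calorieValue;
    private boolean vegetarian;
    private String difficulty;
    private Set<String> ingredients;

    // Getter and setter methods exist in the thesis code; omitted here.

    // True when every queried ingredient occurs in this recipe; used by
    // the search procedure in RequestHandler.
    public boolean containsIngredients(Collection<String> queried) {
        return ingredients.containsAll(queried);
    }
}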

The UML diagram of class FoodEntry is illustrated in Figure-5.

Figure - 5: FoodEntry UML diagram

6-2 Web Interface

The Web interface is a PHP Web application providing an HTML user interface for the user. It relies on socket connections when communicating with the OWLQueryServer. The client creates different requests to send to the server and processes the responses accordingly. Some of the requests are made in order to create the navigation menu providing navigation through the different classes/categories of food instances represented in the OWL ontology. Such a request string looks like GET_MENU. The response to such a request is a string with category OWL ids, each appended with the instance ids belonging to that specific category/food class.
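Although the actual client is written in PHP, the request/response exchange can be illustrated with the short Java sketch below. The host, port and line-based framing of the protocol are assumptions made for the example.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class MenuRequestExample {

    public static void main(String[] args) throws IOException {
        // Host and port are assumptions; the server address is configurable.
        try (Socket socket = new Socket("localhost", 8888);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {

            // Ask the server for the navigation menu.
            out.println("GET_MENU");

            // The response lists category OWL ids together with the
            // instance ids belonging to each category.
            System.out.println(in.readLine());
        }
    }
}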

Whenever the user navigates through the different categories, the Web interface makes the proper request to the server in order to populate the page with the content requested by the user. For example, when the user selects a link for a food instance among the food instances listed under a specific category, the interface immediately makes a request for that food entry and displays the response sent back by the OWLQueryServer.

The most important part of the interface is the search page, where different ingredients can be selected/added, together with the different restrictions, to the query being constructed. The user interface provides a search pane where the user can directly enter the ingredients and supply values for the optional extra restrictions such as calorie amount, preparation time, difficulty level etc., as shown in Figure-6. Additionally, two different search interfaces, which allow the user either to type the ingredients himself/herself or to select the ingredients from a categorized box, and some example searches are shown in the screenshots in the following figures.

Figure - 6: Search Interface 1

Figure - 7: Sample Search (A)

Figure - 8: Sample Search (B)

Figure - 9: Sample Search (C)

Figure - 10: Sample Search (D)

When a search has been completed, the user can click on one of the results, e.g. Körili Pilav, to view its ingredients and recipe as shown in Figure-11.

Figure - 11: Ingredients and Recipe of Körili Pilav

When the user clicks on the link indicated as Kaynak at the bottom of each recipe, the source of that recipe can be viewed in a new web page, as illustrated in Figure-12.

Figure - 12: Source of Körili Pilav

The other search interface, shown in Figure-13, lets users select from among the ingredients defined in the OWL ontology file.

Figure - 13: Search Interface 2

This second search interface provides the functionality to make selections among the ingredients defined in the OWL model itself when creating the set of ingredients on which the search should be based. The user can select the ingredients from an easy-to-use list in which the ingredients are presented, and view the selected ingredients in a separate list in preparation for the query. Figure-14 illustrates the search mode in which users select ingredients from among those defined in the OWL ontology. Please refer to Appendix - C for a portion of the model that is automatically generated from the source code of the application.

Figure - 14: Selecting Ingredients from Categorized Box

Figure - 15: Sample Search (E)

Figure - 16: Sample Search (F)

Figure - 17: Sample Search (G)

When the user clicks on the Pilavlar link on the left side of the window, the list of Pilavlar can be viewed as shown in Figure-18.

Figure - 18: Category of Pilavlar

A help interface is also available to the user when the help button is clicked, as illustrated in Figure-19.

Figure - 19: Help Interface of the System

For comparison, some searches were performed with three popular search engines, with the following results. When a search is made with the ingredients pirinç soğan tavuk, Google returns 6,020 results, Yahoo returns 14,600 results and Altavista returns 14,500 results. When a search is made with the ingredients pirinç soğan tavuk patates bezelye domates, Google returns 1,150 results, Yahoo returns 3,490 results and Altavista returns 3,500 results. Working through such a huge number of results is a hard and time-consuming task. Additionally, when the returned links are analyzed, many of them turn out to point to the same website, which is another drawback of the search engines. These searches clearly show how difficult it is to obtain effective results with conventional search engines.

6-3 Implementing the Ontology with Protégé

After designing the OWL ontology, it was a straightforward process to construct the ontology file with the help of the powerful tools and wizards the Protégé development environment provides. It allows creating and populating classes and concepts with simple clicks, while providing a visual overview of the created classes and instances through class hierarchies and instance diagrams. Creating properties and relations (object properties, data properties) is done similarly, by defining the name of the property and specifying the domain-range relation of the constituent elements/definitions.

While implementing the ontology, the reasoning services provided by the Racer reasoning system have been used continuously to assist and guide the implementation process. Racer has mostly been used to check ontology consistency while adding constraints and assertions for the different classes and relations being created. The Protégé development environment is illustrated in Figure-20.

Figure - 20: Protégé Ontology Editor
