THE SEMANTIC WEB AND ITS APPLICATIONS




Proceedings of the International Conference on Information Technologies (InfoTech-2011)
15-16 September 2011, Bulgaria

THE SEMANTIC WEB AND ITS APPLICATIONS

Dimitar Vuldzhev
National High School of Mathematics and Science, Sofia, Bulgaria
e-mail: vouldjeff@gmail.com

Abstract: This paper gives a brief introduction to the concept of the Semantic Web and to the idea of using ontologies. The main objective of the project is to implement such an application, relying on the studies conducted. Some of the problems that occurred in the course of realizing these aims are also discussed.

Key words: semantic web, intelligent systems, ontologies, collaborative database

1. INTRODUCTION

Have you ever carried out research and had to collect and process an enormous amount of data? At some point you want the work made easier; you need the computer's help. The Semantic Web (Daconta et al., 2003) is a concept that enables machines to understand the meaning of data that already exists on the web. In essence it is nothing but a unified method for storing information using metadata (data about the data), which we could also call a machine-readable representation.

The aim of this project is to present an application of this kind, with data in Bulgarian, which collects information from various sources and helps us find just the right data, filter it, and so on. In the course of realizing the objectives, the following main problems were solved:
- detailed study of the existing standards and similar applications;
- choosing technologies;
- consideration of automated methods for data extraction;
- implementing a priority queue for delayed jobs;
- adding a reliable system for tracking data changes;
- optimization of the database schema and backend.

There are a few Semantic Web applications; most of them have taken a particular segment of the market, while several others aim to be a "Semantic Wikipedia" (Auer et al., 2007). Unfortunately, besides the fact that the information there is only in English, they do not offer instruments for playing with the data (unless you are a programmer).

2. PROBLEM DEFINITION

For the Semantic Web to become a reality, a huge amount of data in a standardized format must exist. What is more, not only access to the data is needed, but also the relations between them, because in that way one resource leads us to another. All the interconnected collections of data are called Linked Data. Linked Data relies on two fundamental web technologies: URI and HTTP. Although a URI is commonly thought of as the web address of a document, its actual purpose is to give a unique identity to every resource. Tim Berners-Lee, the creator of the WWW, defines Linked Data (Berners-Lee et al., 2009) by giving the following four rules:
- use URIs to name things;
- use HTTP URIs so that people can look those resources up;
- when someone dereferences a URI, provide useful information using standards;
- include links to other URIs so that people can discover new things.

2.1. Resource Description Framework

The web space we are so used to consists of interconnected documents. In the Semantic Web we call things resources: Shakespeare and Stratford are both examples of resources. That is why the fundamental technology is called the Resource Description Framework. RDF is not a complex concept; it is just a way of serializing statements. Consider the following example: "The author of this project is Dimitar." Every RDF statement consists of three parts: a subject (this project), an object (Dimitar) and a predicate (author). With this in mind, and following the rules of Tim Berners-Lee, we can build a graph.
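The statement above can be sketched as a triple in code. The following is a minimal illustration in Ruby (the language the application is built in), using no RDF library; the subject URI is a hypothetical identifier chosen for the example, while the predicate reuses the well-known Dublin Core creator term.

```ruby
# A minimal RDF triple in plain Ruby: subject, predicate, object.
Triple = Struct.new(:subject, :predicate, :object) do
  # Serialize the statement in N-Triples syntax (object as a plain literal).
  def to_ntriple
    "<#{subject}> <#{predicate}> \"#{object}\" ."
  end
end

statement = Triple.new(
  "http://example.org/resource/this-project",  # subject: the project (hypothetical URI)
  "http://purl.org/dc/elements/1.1/creator",   # predicate: "author"
  "Dimitar"                                    # object: a literal
)

puts statement.to_ntriple
```

Serialized this way, the statement is one edge of an RDF graph; many such triples together form the graph described above.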

Fig. 1: A simple graph visualizing the aforementioned statement.

Every such statement is called an RDF triple, and the structure of the predicates is called an ontology. It may look too simple, but precisely because of that simplicity RDF is such an important building block. All the RDF statements together form a graph, and people in the field of computer science can tell us a lot about the efficiency of graphs.

2.2. Ontologies

All the predicates that describe a certain kind of subject from the real world are collectively called an ontology. Examples of ontologies are Person, Animal, Place, etc. The reason ontologies are so important is that they define a standard for the predicates' names: if each of us named a predicate however we liked, the whole concept of the Semantic Web would lose its purpose, because there would be no communication between different systems.

3. PROBLEM SOLUTION

3.1. Architecture

The architecture of the proposed application is three-tier: User, Business Logic, Database. For a database management system we use the so-called document-oriented database MongoDB. Unlike typical relational databases, which keep data in many tables with relations between them, a document-oriented database saves everything about a certain resource in a single document (JSON-formatted). Some of the major advantages that influenced this choice are:
- document-oriented, without a strict schema;
- support for arrays, hashes and embedded documents;
- support for indexes;
- availability of so-called atomic updates;

- a simple but powerful query language, plus MapReduce;
- built-in methods for easy scaling.

For developing the application itself we have chosen the programming language Ruby and the framework Rails, a very powerful, agile and popular combination. At its core lies the MVC pattern, which divides the application into three parts: model (the database layer), view (the user interface) and controller (the business logic).

3.2. Automatic Information Extraction

A large amount of data must be available for the application to be useful, and collecting it is not within the reach of one person, at least not within a reasonable period of time. The module for automatic information extraction was therefore not an easy task. To extract information, you provide the resource's name and say whether you want it translated. The process is as follows:
1. The resource is searched for in Freebase and the result is loaded.
2. If translation is enabled, the result is passed to Google Translate.
3. A check is made for an already existing resource with that key. If one exists, only the new data will be saved.
4. For every ontology in the result, a check for existence is made. If the ontology does not exist, a new one is constructed.
5. Every property from the ontology is processed and filled in on the resource. If the ontology is newly created, the property is added to the schema.
6. After completion, a flag is added to the new resource.
7. Extra information is extracted from Twitter, IMDb and other sources.
8. If an error occurs during any of the steps, an exception is thrown.
9. The newly extracted resource is now available for viewing!

The module also offers rollback functionality: everything the extractor has changed is restored to its previous state.

3.3. Delayed Jobs

Operations such as automatic information extraction require more system resources, load the machines and take longer to execute.
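Operations like these are handed off to background jobs. A minimal sketch of the priority-queue-and-worker pattern this section describes follows, in Ruby; all names here (Job, JobQueue, work_one) are hypothetical illustrations, and the in-memory queue and log stand in for the application's real persistent ones.

```ruby
# Jobs carry a name, a priority and a unit of work (a lambda).
Job = Struct.new(:name, :priority, :work)

class JobQueue
  def initialize
    @jobs = []   # pending jobs (stand-in for the persistent priority queue)
    @log  = []   # results of finished jobs (stand-in for the persistent log)
  end

  attr_reader :log

  def push(job)
    @jobs << job
  end

  # One worker step: take the highest-priority job, run it,
  # and record success or failure in the log.
  def work_one
    return if @jobs.empty?
    job = @jobs.max_by(&:priority)
    @jobs.delete(job)
    begin
      job.work.call
      @log << "#{job.name}: success"
    rescue StandardError => e
      @log << "#{job.name}: failed (#{e.message})"
    end
  end
end

queue = JobQueue.new
queue.push(Job.new("extract:shakespeare", 5, -> { :done }))
queue.push(Job.new("broken",              1, -> { raise "boom" }))
queue.work_one   # runs the higher-priority extraction job
queue.work_one   # runs the failing job; the error is caught and logged
p queue.log
```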
Therefore, executing them during a standard user request is very inefficient and degrades the operation of the system as a whole. Such operations will be called jobs. Jobs have certain parameters and are added to a priority queue. Separate system processes called workers take one

task from the queue and start its execution. After success or failure, the result is saved in a log.

3.4. Tracking Changes

In a system where everybody has the right to edit information, abuse is possible, and it would be a pity for painstakingly collected data about a resource to disappear just like that. That is the reason for implementing a change-tracking module, which allows reverting to previous versions. There are several known approaches to this task: keep the whole resource after every edit, or keep only the edit itself. Unfortunately, both options have drawbacks: the first takes a lot of storage space, and with the second you have to merge the changes in order to read a resource. The approach used in the application is somewhere in the middle: the latest version is kept in the database, along with the old values of only the fields that changed. In this way we do not have to merge while reading, and it does not take a lot of space. Example:

  { title: "test", description: "description" }

After editing, the stored documents are:

  { title: "test", description: "description1", extra: 123 }

and

  { description: "description", added: ["extra"], version: 1 }

4. CONCLUSION

In this project a brief introduction into the world of the Semantic Web has been made. The second part presents a semantic application and describes its architecture: database and platform. From the Eleventh Students' Conference in January 2011 to the present day the system has changed a lot: the module for automatic extraction works stably, a system for tracking changes has been implemented, and much more. As for future plans, the presented application could be developed further in some of the following areas:
- creating a powerful module for ontology editing;
- collecting data from other sources;
- using different algorithms for manipulating the existing information.
The author hopes that, given the nature of the problem, the application will provoke interest and prove useful to the public.

REFERENCES

Auer S., Bizer C., Kobilarov G., Lehmann J., Cyganiak R., Ives Z. (2007). DBpedia: A Nucleus for a Web of Open Data. http://www.informatik.uni-leipzig.de/~auer/publication/dbpedia.pdf

Berners-Lee T., Bizer C., Heath T. (2009). Linked Data: The Story So Far. http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

Daconta M., Obrst L., Smith K. (2003). The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management. Wiley.