JOURNAL OF COMPUTER SCIENCE AND ENGINEERING



Exploration on Service Matching Methodology Based On Description Logic using Similarity Performance Parameters

K. Jayasri, Final Year Student, IFET College of Engineering, nishajayasri@gmail.com
R. Rajmohan, Assistant Professor, IFET College of Engineering, rjmohan89@gmail.com
D. Dinagaran, Assistant Professor, IFET College of Engineering, ddinagaran@gmail.com

Abstract: Extracting accurate information for the user input is the main drawback of information retrieval systems on the web. To overcome the matching problem, a description logic based matching technique is proposed. The description logic consists of a TBox and an ABox, which separate the universally quantified and the individual properties. The user input is matched against an ontology base, which allows us to formalize domain information as classes and instances. The DL-structured data is processed through Hadoop to obtain the optimal matching service platform. The unstructured information is stored as files in the Hadoop Distributed File System, and a MapReduce operation is performed for simplified development of programs. The SPARQL query is used for querying data stored in Mongodb. Mongodb holds a collection of RDF data sets, which are processed for matching using the SPARQL query. The resultant set for the user input is returned in the form of a text document. The performance of the matching system is improved in terms of precision and recall. Comparison with the existing system shows that the proposed system provides an optimal solution with minimal search time and maximum accuracy.

Keywords: ontology matching; description logic; hadoop; mongodb; SPARQL query language.

I. INTRODUCTION

The Web Ontology Working Group of W3C found that RDF cannot fully satisfy all the requirements of the semantic web. To overcome this problem, efficient processing of an ontology model is used. The ontology formalizes knowledge of the domain and describes its relations. The OWL format captures the domain knowledge and is well suited to describing the semantic web. The user input is processed through the ontology base, and a structure is created based on the resources in the OWL model. The obtained semantic knowledge is matched with the set of bases in the ontology base. Each base holds a collection of information about the domain, and the matched information for the user input is obtained in structured format. The OWL-DL format is obtained using description logic to build the RDF framework model. To reduce the data sets stored in HDFS (Hadoop Distributed File System), the MapReduce framework is used.

II. RDF AND SPARQL

The Resource Description Framework (RDF) provides an efficient model for storing and representing data and plays an important role in retrieving it. To convert structured data into machine readable form, the RDF framework [1] is used. RDF is a directed, labeled graph model used for semantic web data management. Each RDF statement has a subject, a predicate and an object, together denoted as a triple. Resources are described by RDF statements, and every resource has a URI for its unique representation. An RDF graph representation of the resources is used to create the SPARQL query. SPARQL [2] is the query language used to retrieve data from an RDF graph. The SPARQL query is created and matched against the URIs, generating a new RDF graph. For the new RDF graph, the SPARQL query is created and MapReduce jobs are performed.
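To make the triple structure and the shape of a SELECT query concrete, the following minimal Java sketch (plain Java, no external libraries; the class name and the example resources such as geog:india are purely illustrative and are not part of the system described in this paper) represents one RDF statement as a (subject, predicate, object) triple and prints a query of the same form as Q1 in Table 1 later in the paper.

// Minimal illustration of an RDF triple and a SPARQL SELECT query (plain Java, no external libraries).
public class TripleExample {

    // One RDF statement: subject, predicate and object together form a triple.
    static final class Triple {
        final String subject, predicate, object;
        Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
        public String toString() { return subject + " " + predicate + " " + object + " ."; }
    }

    public static void main(String[] args) {
        // Illustrative resources in the geog: vocabulary used by the query sets of Section VIII.
        Triple t = new Triple("geog:india", "geog:hascapital", "geog:newdelhi");
        System.out.println(t);                       // prints the triple in N-triple style

        // A SELECT query of the same shape as Q1 in Table 1: find every resource that has a capital.
        String sparql = "SELECT ?x WHERE { ?x geog:hascapital ?y }";
        System.out.println(sparql);
    }
}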

The Hadoop Distributed File System (HDFS) stores the RDF data [6] using two kinds of nodes, the Data Node and the Name Node. In our approach, the DL format is based on RDF, and the RDF data cannot be stored in HDFS directly. To overcome this problem, the N-triple format is taken [3] and split into predicate files based on the type of the object; the data is stored as text files in HDFS according to this schema. An important feature of HDFS is replication: each file is divided into blocks and replicated among the nodes. The files hold the collections of data and are stored on the Data Nodes. The Name Node describes the location of each file using its URI and manages the file namespace. The SPARQL query [15] is created for the data stored in HDFS, and the MapReduce operation is performed.

Figure 1. Resource Description Framework (RDF) graph (subject, predicate, object), illustrated with a geographical area example (land area, water area, island, city, salt water, fresh water).

III. DESCRIPTION LOGIC

Description Logic [7] is a family of knowledge representation languages. The knowledge of an application domain is represented by first defining the relevant concepts of the domain and then using these concepts to specify properties of the objects and individuals occurring in the domain. DL consists of two components, a TBox and an ABox. The TBox describes a set of universally quantified assertions stating general properties of concepts and roles. The ABox comprises assertions on individual objects. In our approach, the structured data is processed through description logic to obtain the OWL-DL [9] language format. The OWL-DL language builds on RDF to describe the semantic meaning. The proposed logic includes syntax and semantics for converting OWL axioms to DL axioms, and an RDF framework [8] converts the structured data to machine readable form. The RDF model is mainly used to describe the conversion of individuals as rdf:description, rdf:type and rdf:property.

IV. HADOOP

In order to effectively handle and store large amounts of RDF data, the Hadoop framework is used. Hadoop [11] is an open source implementation that enables the distributed processing of large data sets across clusters of commodity servers. It is capable of connecting and coordinating thousands of nodes inside a cluster. The Hadoop framework has two components, HDFS [5] and MapReduce [12]: HDFS is designed for storage and MapReduce for processing.

V. MAPREDUCE

MapReduce is used to execute the SPARQL query and provides parallel processing over a large number of nodes to simplify the processing of the data [10]. The SPARQL query is created using the Data Node, matched with the Name Node, and the RDF graph is generated. Finally, the SPARQL query is created for this RDF graph and the data is retrieved from HDFS. The Map step involves two kinds of nodes, a master node and worker nodes [4]. The master node splits the data and assigns the key-value pairs. The master node picks idle workers and assigns each one a map task that produces intermediate key-value pairs and sends them back to the master node. The master node stores the details about the location of the data. The Reduce function takes the intermediate key-value pairs and reduces them to a smaller solution. Based on this smaller solution, the SPARQL query is created again.

Figure 2. HDFS and MapReduce flow.
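To illustrate the Map and Reduce steps sketched in Figure 2, the following hedged Java example groups N-triple lines by subject: the mapper emits (subject, "predicate object") pairs and the reducer collects every statement about one subject in a single place, which is the same kind of grouping used later (Section IX) to join related RDF data sets in one reducer. It is written against the current org.apache.hadoop.mapreduce API rather than the Hadoop 0.18.10 release named in Section VII, and the class names are illustrative only, not taken from the paper's implementation.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Mapper: one N-triple line in, (subject, "predicate object") out. */
public class TripleGroupingMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // An N-triple line has the form "<subject> <predicate> <object> ."
        String[] parts = line.toString().trim().split("\\s+", 3);
        if (parts.length < 3) return;                              // skip malformed lines
        String object = parts[2].replaceAll("\\s*\\.\\s*$", "");   // drop the trailing " ."
        context.write(new Text(parts[0]), new Text(parts[1] + " " + object));
    }
}

/** Reducer: all statements about one subject arrive together and are concatenated. */
class TripleGroupingReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text subject, Iterable<Text> statements, Context context)
            throws IOException, InterruptedException {
        StringBuilder joined = new StringBuilder();
        for (Text s : statements) {
            if (joined.length() > 0) joined.append(" ; ");
            joined.append(s.toString());
        }
        context.write(subject, new Text(joined.toString()));
    }
}

Grouping by subject in the reducer is one simple way to realize the join described in Section IX; the paper's actual query plan may differ.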
VI. PROPOSED ARCHITECTURE

The proposed architecture presents a systematic study of the performance of the service matching methodology based on description logic.

Our architecture consists of several components for storing and querying RDF data. The description logic converts structured data into the DL format based on the RDF framework. The resultant data is processed through the Hadoop component to generate the SPARQL query. The query is matched against Mongodb, which produces the output. In the architecture, the unstructured user input is converted to a structured format by matching it against the ontology base and formalizing the information as classes and instances. The description language converter uses the TBox and ABox to describe the concept definitions, inclusions and assertions. The OWL-DL axioms build on the RDF framework to formalize the data in machine understandable form. To store and query the RDF data, the Hadoop framework is used; this component has two subcomponents, the Hadoop Distributed File System (HDFS) and MapReduce. The DL-format data is processed through the Hadoop component, stored as text files in HDFS, and the MapReduce operation is performed. The reduced data set is stored in HDFS and the SPARQL query is created. Mongodb holds the collection of RDF data; it is matched with the SPARQL query and gives the output in the form of a text document.

Figure 3. Architecture of semantic similarity based matching (user input, ontology database, description language converter with TBox/ABox and DL format, SPARQL over MapReduce on Hadoop, and Mongodb document store).

VII. EXPERIMENTAL SETUP

Our experiments use the Windows 7 operating system on a Pentium 4 processor with a clock speed of 2.7 GHz and 1 GB of RAM. The capacity of the hard disk drive is 200 GB. The tools and bases, Hadoop 0.18.10 and Mongodb, are installed on the system. Our approach is implemented in Core Java (JDK 1.7) and runs in the NetBeans IDE 7.2.1. The RDF data sets are stored in Mongodb and retrieved using the SPARQL query language.

VIII. DATASETS AND QUERIES

The experiments were based on geographical data sets. We generated the geographical data sets and expressed them in OWL format, with the ontology typed in the RDF model. We collected 9 RDF classes and 14 properties, and the RDF triples were collected for the data sets to express the OWL-DL format. The queries from the user are expressed in OWL to find the semantic relation. For our experiments we use the SPARQL query language, because the experiments tested the performance of extracting accurate information for the queries using ontology matching. Sample queries used for our evaluation are shown in Table 1.

Table 1: Query Sets

ID  Shape  Query
Q1  Point  SELECT ?x WHERE { ?x geog:hascapital ?y }
Q2  Point  SELECT ?s ?p WHERE { ?s geog:islowestpointof geog:assam . }
Q3  Graph  SELECT ?m ?p WHERE { ?p rdf:type ex:park . ?m rdf:type ex:monument . ?m geog:within ?p }
Q4  Graph  SELECT ?p WHERE { ?p rdf:type ex:park ; geog:hasgeometry ?pgeog . ex:newyorkmonument geog:hasgeometry ?mgeog . FILTER ( geogf:distance(?pgeog, ?mgeog, units:m) < 3000 ) }
Q5  Graph  SELECT ?m ?p WHERE { ?m rdf:type ex:monument ; geog:hasgeometry ?mgeog . ?p rdf:type ex:park ; geog:hasgeometry ?pgeog . ?m geog:within ?pgeog . }
IX. EXPERIMENTATION

Our experiments are based on semantic search for Information Retrieval (IR). In a search engine, keyword-based IR returns less relevant documents. The proposed search methodology uses the ontology matching technique and MapReduce to develop an efficient query plan for information retrieval. The ontology model gives the relevant RDF data sets based on semantic similarity. By applying MapReduce, the relevant RDF data sets can be joined in a specific reducer. Therefore, the relevant information is retrieved accurately by the search engine for the user input.
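As a sketch of the storage and retrieval step on the Mongodb side (Sections VI and VII), the following Java example stores RDF statements as (subject, predicate, object) documents and retrieves every statement matching one predicate, mirroring a simple triple pattern such as Q2 in Table 1. The connection string, the database and collection names, the placeholder resource ex:place1, and the use of the current MongoDB Java driver are assumptions made for this illustration rather than details taken from the paper.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class RdfMongoExample {
    public static void main(String[] args) {
        // Connect to a local Mongodb instance (the connection string is an assumption for this sketch).
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> triples =
                    client.getDatabase("rdfstore").getCollection("triples");

            // Store one RDF statement as a (subject, predicate, object) document.
            // ex:place1 is a placeholder resource, not a fact from the paper's data set.
            triples.insertOne(new Document("subject", "ex:place1")
                    .append("predicate", "geog:islowestpointof")
                    .append("object", "geog:assam"));

            // Retrieve every statement with the given predicate, mirroring the
            // triple pattern { ?s geog:islowestpointof geog:assam . } of query Q2.
            for (Document d : triples.find(Filters.eq("predicate", "geog:islowestpointof"))) {
                System.out.println(d.getString("subject") + " "
                        + d.getString("predicate") + " "
                        + d.getString("object"));
            }
        }
    }
}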

X. PRECISION AND RECALL

Precision and recall [13] are commonly used to measure the accuracy of an information retrieval system. Precision is the fraction of the retrieved webpages that are relevant to the search; recall is the fraction of the relevant webpages that are successfully retrieved. They are computed as

Precision = |relevant webpages ∩ retrieved webpages| / |retrieved webpages|
Recall = |relevant webpages ∩ retrieved webpages| / |relevant webpages|

Precision and recall are combined to calculate the F-measure. The F-measure [14] is their harmonic mean and is used in IR to measure search efficiency:

F-measure = 2 · (precision · recall) / (precision + recall)

In Figure 4, keyword-based Information Retrieval (IR) gives less relevant webpages for the queries from the user. Here, the ranking is analyzed accurately using the ontology matching model and MapReduce for the user input.

Figure 4. Experimental results for keyword based search.

XI. EVALUATION

The OWL-DL model and MapReduce are used to develop the query plan for the user input. The semantic relations in the query plan are used to extract the accurate webpages in the search engine. The queries are based on the geographical data sets, and the precision, recall and F-measure are computed.

Figure 5. Experimental results for semantic similarity.

From Figure 5, the semantic similarity based search using the ontology matching model gives an efficient query plan and retrieves accurate information using the SPARQL query.

XII. CONCLUSION AND FUTURE WORK

The proposed information retrieval system overcomes the existing matching problem with the help of description logic. The TBox and ABox components are used to formalize the semantic information from the user input with the help of the ontology. The MapReduce operation is performed in Hadoop to simplify the querying and matching process. The SPARQL query is used for querying the DL-structured data stored in Mongodb. The performance of the proposed system exceeds the existing one in terms of performance measures such as precision, recall and F-measure. Our future work includes extending our DL pattern based matching by integrating Hyperclique and Lattice patterns.

References

[1] Wangchao Le, Feifei Li, Anastasios Kementsietsidis, and Songyun Duan, Scalable Keyword Search on Large RDF Data, IEEE Trans. Knowledge and Data Eng., vol. 26, no. 11, 2014.
[2] Liu Chang, Wang Haofen, Yu Yong, and Xu Linhao, Towards Efficient SPARQL Query Processing on RDF Data, vol. 15, no. 6, 2010, pp. 613-622.
[3] Craig Franke, Samuel Morin, Artem Chebotko, John Abraham, and Pearl Brazier, Efficient Processing of Semantic Web Queries in HBase and MySQL Cluster, 2013.
[4] Dawei Jiang, Anthony K. H. Tung, and Gang Chen, MAP-JOIN-REDUCE: Toward Scalable and Efficient Data Analysis on Large Clusters, IEEE Trans. Knowledge and Data Eng., vol. 23, no. 9, 2011.

[5] Praveen Kumar and Vijay Singh Rathore, Efficient Capabilities of Processing of Big Data using Hadoop Map Reduce, vol. 3, issue 6, 2014.
[6] M. Husain, J. McGlothlin, M. M. Masud, and L. Khan, Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing, Knowledge and Data Eng., vol. 23, issue 9, 2011.
[7] M. Krotzsch, F. Simancik, and I. Horrocks, Description Logics, Intelligent Systems, vol. 29, issue 1, 2014.
[8] Jeff Z. Pan and Ian Horrocks, RDFS(FA): Connecting RDF(S) and OWL DL, Knowledge and Data Eng., vol. 19, no. 2, 2007.
[9] J. Z. Pan, A Flexible Ontology Reasoning Architecture for the Semantic Web, Knowledge and Data Eng., vol. 19, no. 2, 2007.
[10] Bo Liu, Keman Huang, Jianqiang Li, and MengChu Zhou, An Incremental and Distributed Inference Method for Large-Scale Ontologies Based on MapReduce Paradigm, Cybernetics, vol. 45, issue 1, 2015.
[11] Jun Liu, Feng Liu, and N. Ansari, Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop, Network, vol. 28, issue 4, 2014.
[12] Cheng Chen, Zhong Liu, Wei-Hua Lin, Shuangshuang Li, and Kai Wang, Distributed Modeling in a MapReduce Framework for Data-Driven Traffic Flow Forecasting, Intelligent Transportation Systems, vol. 14, issue 1, 2013.
[13] Viviana Mascardi, Angela Locoro, and Paolo Rosso, Automatic Ontology Matching via Upper Ontologies: A Systematic Evaluation, Knowledge and Data Eng., vol. 22, no. 5, 2010.
[14] Pavel Shvaiko and Jerome Euzenat, Ontology Matching: State of the Art and Future Challenges, Knowledge and Data Eng., vol. 25, no. 1, 2013.
[15] R. Valencia-Garcia, F. Garcia-Sanchez, D. Castellanos-Nieves, and J. T. Fernandez-Breis, OWLPath: An OWL Ontology-Guided Query Editor, Systems, Man and Cybernetics, vol. 41, issue 1, 2011.