A Relation Extraction Method between Related Concepts using Web Search

Similar documents
THE SEMANTIC WEB AND IT`S APPLICATIONS

INTERNATIONAL JOURNAL FOR TRENDS IN ENGINEERING & TECHNOLOGY VOLUME 3 ISSUE

Discovering and Querying Hybrid Linked Data

Integrating FLOSS repositories on the Web

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

Subtask Mining from Search Query Logs for How-Knowledge Acceleration

Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Towards the Integration of a Research Group Website into the Web of Data

Monitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control. Phudinan Singkhamfu, Parinya Suwanasrikham

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Intui2: A Prototype System for Question Answering over Linked Data

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

LiDDM: A Data Mining System for Linked Data

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Querying DBpedia Using HIVE-QL

Semantic Content Management with Apache Stanbol

Evaluation of Retrieval Systems

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

Representing Specialized Events with FrameBase

Social Business Intelligence Text Search System

Semi-Supervised Learning for Blog Classification

Cross-lingual Knowledge Linking Across Wiki Knowledge Bases

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

Forum of International Development Studies 21 (Mar. 2002)

Natural Language to Relational Query by Using Parsing Compiler

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

Archivage du contenu éphémère du Web à l aide des flux Web *

The Path is the Destination Enabling a New Search Paradigm with Linked Data

Knowledge Continuous Integration Process (K-CIP)

Developing Semantic Classifiers for Big Data

Exploiting Linked Open Data as Background Knowledge in Data Mining

Knowledge Continuous Integration Process (K-CIP)

2-3 Automatic Construction Technology for Parallel Corpora

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

A Bi-Dimensional User Profile to Discover Unpopular Web Sources

A Survey on Product Aspect Ranking Techniques

Converging Web-Data and Database Data: Big - and Small Data via Linked Data

Survey Results: Requirements and Use Cases for Linguistic Linked Data

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

Semantic annotation of requirements for automatic UML class diagram generation

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services

A Case Study of Question Answering in Automatic Tourism Service Packaging

Automated FAQ Answering with Question-Specific Knowledge Representation for Web Self-Service

Recent Topics of Research around the YAGO Knowledge Base

Domain Classification of Technical Terms Using the Web

Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure

LOCATION-AWARE MOBILE LEARNING OF SPATIAL ALGORITHMS

Keywords: Information Retrieval, Vector Space Model, Database, Similarity Measure, Genetic Algorithm.

ASKWIKI : SHALLOW SEMANTIC PROCESSING TO QUERY WIKIPEDIA. Felix Burkhardt and Jianshen Zhou. Deutsche Telekom Laboratories, Berlin, Germany

Optimized Mobile Search Engine

Wikidata. Semantic Web in Libraries December A Free Collaborative Knowledge Base. Markus Krötzsch TU Dresden

A MULTILINGUAL AND LOCATION EVALUATION OF SEARCH ENGINES FOR WEBSITES AND SEARCHED FOR KEYWORDS

Sentiment analysis for news articles

Active Learning SVM for Blogs recommendation

An Approach to support Web Service Classification and Annotation

Using Data Mining for Mobile Communication Clustering and Characterization

Automatic ontology-based User Profile Learning from heterogeneous Web Resources in a Big Data Context

Curriculum Vitae Ruben Sipos

Linked Open Government Data Analytics

Impact of Financial News Headline and Content to Market Sentiment

A Hybrid Approach for Multi-Faceted IR in Multimodal Domain

DBpedia: A Multilingual Cross-Domain Knowledge Base

Type Inference on Noisy RDF Data

LDA Based Security in Personalized Web Search

Identifying Focus, Techniques and Domain of Scientific Papers

SemWeB Semantic Web Browser Improving Browsing Experience with Semantic and Personalized Information and Hyperlinks

Mining the Web of Linked Data with RapidMiner

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

External Semantic Annotation of Web-Databases

LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

Word Taxonomy for On-line Visual Asset Management and Mining

Sheeba J.I1, Vivekanandan K2

Profile Based Personalized Web Search and Download Blocker

Evaluation of Bayesian Spam Filter and SVM Spam Filter

Linking Search Results, Bibliographical Ontologies and Linked Open Data Resources

A Framework for Ontology-Based Knowledge Management System

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)

Using Semantic Data Mining for Classification Improvement and Knowledge Extraction

Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach -

Recovering Traceability Links between Requirements and Source Code using the Configuration Management Log *

Query Recommendation employing Query Logs in Search Optimization

Creating an RDF Graph from a Relational Database Using SPARQL

Analysis One Code Desc. Transaction Amount. Fiscal Period

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

Harvesting and Structuring Social Data in Music Information Retrieval

SINAI at WEPS-3: Online Reputation Management

Automatic Annotation Wrapper Generation and Mining Web Database Search Result

Big Data in The Web. Agenda. Big Data Asking the Right Questions Wisdom of Crowds in the Web The Long Tail Issues and Examples Concluding Remarks

Site-Specific versus General Purpose Web Search Engines: A Comparative Evaluation

Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies

Statistical Analyses of Named Entity Disambiguation Benchmarks

An Information Retrieval System for Expert and Consumer Users

Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Enhancing Requirement Traceability Link Using User's Updating Activity

Performance Evaluation Techniques for an Automatic Question Answering System

Transcription:

DEIM Forum 2010 C1-2 Web 565-0871 1-5 113-8656 7-3-1 E-mail: {shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, nakayama@cks.u-tokyo.ac.jp, eiji.aramaki@gmail.com Web Wikipedia is-a a-part-of Wikipedia Web Wikipedia Web Wikipedia Abstract A Relation Extraction Method between Related Concepts using Web Search Masumi SHIRAKAWA, Kotaro NAKAYAMA, Eiji ARAMAKI, Takahiro HARA, and Shojiro NISHIO Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan The Center for Knowledge Structuring, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan E-mail: {shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, nakayama@cks.u-tokyo.ac.jp, eiji.aramaki@gmail.com Construction of a huge scale ontology covering many named entities, domain specific terms and relations among these concepts is one of the essential technologies in the next generation Web based on semantics. In this work, we aim at automated ontology construction with a wide coverage of concepts and these relations by combining information on the Web with Wikipedia. In this paper, we propose a relation extraction method which gets pairs of co-related concepts from an association thesaurus extracted from Wikipedia and extracts relations by using Web search. Key words relation extraction, natural language processing (NLP), thesaurus, Wikipedia 1

1. Web [1] [2] OpenCyc 1 WordNet [3] EDR 2 is-a a-part-of Web Wikipedia Wikipedia Wikipedia Web Wikipedia [4], [5] Web Web Web 2. Wikipedia Web Wikipedia Wiki Web Web 1 http://www.opencyc.org/ 2 http://www2.nict.go.jp/r/r312/edr/j_index.html 1 Fig. 1 Ontology reconstruction from Case Frames 300 60 2009 12 Wikipedia URL [6] Wikipedia DBpedia [7] YAGO [8], [9] DBpedia Wikipedia RDF (Resource Description Framework) YAGO WordNet Wikipedia WordNet [10] [11] Wikipedia is-a SVM (Support Vector Machine) Wikipedia Wikipedia Web is-a a-part-of [12], [13] 1 2

1 R Table 1 A list of link texts of Softbank and reliability R 2 Wikipedia Fig. 2 An example of Wikipedia Thesaurus [14] Web [15] Web 3. Web 3. 1 Web Wikipedia [4], [5] Wikipedia Wikipedia 2 iphone ipod touch 0.016 iphone BlackBerry 0.0053 Wikipedia Web Wikipedia 371 1 17 0.479 SoftBank 4 0.234 4 0.234 3 0.186 SOFTBANK 3 0.186 Softbank 3 0.186 2 0.117 2 0.117 2 0.117 1 0 1 0 1 0 3. 2 Web Web Web Web 3. 2. 1 Wikipedia [16] Wikipedia Wikipedia 1 1 c L c l i L c M c (l i ) l i R c (l i ) 3

3 Web Fig. 3 An example of query generation with case particles Fig. 4 4 An example of relation extraction by analyzing responses R c (l i ) = ln(m c(l i)) ln(max(m c(l k )), l k L c (1) R c (l i ) l i c 1 M c (l i ) 1 l i Wikipedia URL R c 3. 2. 3 I R c 3. 2. 2 Web Web [14] Web Web AND Web Web Web 3 Web AND N Web H H 3. 2. 3 I 3. 2. 3 N Web 3. 2. 1 SoftBank 1 CaboCha [17] MeCab [18] v v c 1 c 2 l i l j p 1 p 2 F (p 1, p 2, v) F (p 1, p 2, v) = R c 1 (l i ) + R c2 (l j ) 2 F l i l j 1 c 1 c 2 1 0.5 I(p 1, p 2, v) ( ) H(p1, p 2 ) I(p 1, p 2, v) = max, 1 N (2) F (p 1, p 2, v) (3) H(p 1, p 2) p 1 p 2 N Web (3) Web I 4. Web 4

4. 1 Wikipedia 306 20 Web 5 Web 50 250 5 (H) (A) (B) Web [14] 1 (CFR) 2 Mean Reciprocal Rank (MRR) 0 3 Discounted Cumulative Gain (DCG) Jarvelin [19] g(1) if i = 1 dcg(i) = dcg(i 1) + g(i) otherwise log c (i) h g(i) = a b if r(i) H if r(i) A if r(i) B r(i) i g(i) r(i) dcg(i) i [14], [20] (h, a, b) = (3, 2, 0) c = 2 0 CFR MRR DCG 5 3 4. 2 (4) (5) 2 3 4 2 Table 2 Evaluation (all concept pairs) CFR 0.141 0.258 CFR 0.0817 0.160 MRR 0.145 0.272 MRR 0.0866 0.181 DCG 5 0.547 1.27 DCG 3 0.519 1.14 3 Table 3 Evaluation (concept pairs relation extracted between) 49 101 CFR 0.878 0.782 CFR 0.510 0.485 MRR 0.908 0.825 MRR 0.541 0.549 DCG 5 3.42 3.86 DCG 3 3.24 3.44 4 Table 4 Analysis data in the experiment Web 72473 37019 1650 11197 140 953 0.0848 0.0851 49 101 93 263 0.304 0.859 1.90 2.60 CFR 306 2 1 2 2 MRR CFR 2 CFR MRR MRR 5

Table 5 5 An example of extracted relations Table 6 6 An example of extracted wrong relations A B Berryz A B NEWS A B ZARD A B A B A B A B A B A B A B A B A B A B A B SMAP A B 0 1.90 2.60 DCG DCG 1 5 3 DCG Web Web 50 Web Web 4. 3 5 5 A B MBS A B A B F A B A B A B A B A B A B A B 2010 1 SMAP 2010 1 NEWS 2010 1 6 MBS MBS MBS MBS MBS MBS MBS 306 205 is-a a-part-of 6

is-a a-part-of 8! 5. Wikipedia Web Wikipedia Wikipedia [4], [5] Web 2 C(20500093) B(21300032) CORE [1] vol.20 no.6 pp.707 714 Nov. 2005. [2] vol.19 no.2 pp.172 186 Mar. 2004. [3] G.A. Miller, WordNet: A Lexical Database for English, Communications of the ACM (CACM), vol.38, no.11, pp.39 41, Nov. 1995. [4] Wikipedia vol.47 no.10 pp.2917 2928 Oct. 2006. [5] K. Nakayama, T. Hara, and S. Nishio, Wikipedia Mining for An Association Web Thesaurus Construction, Proceedings of International Conference on Web Information Systems Engineering (WISE), pp.322 334, Dec. 2007. [6] : Wikipedia vol.22 no.5 pp.693 701 Sept. 2007. [7] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z.G. Ives, DBpedia: A Nucleus for a Web of Open Data, Proceedings of International Semantic Web Conference, Asian Semantic Web Conference (ISWC/ASWC), pp.722 735, Nov. 2007. [8] F.M. Suchanek, G. Kasneci, and G. Weikum, YAGO: A Core of Semantic Knowledge, Proceedings of International Conference on World Wide Web (WWW), pp.697 706, May 2007. [9] F.M. Suchanek, G. Kasneci, and G. Weikum, YAGO: A Large Ontology from Wikipedia and WordNet, Journal of Web Semantics, vol.6, no.3, pp.203 217, Sept. 2008. [10] Wikipedia 14 pp.769 772 Mar. 2008. [11] Wikipedia 18 Web SIG-SWO-A801-06 July 2008. [12] vol.12 no.2 pp.109 131 Mar. 2005. [13] D. Kawahara, and S. Kurohashi, Case Frame Compilation from the Web using High-Performance Computing, Proceedings of International Conference on Language Resources and Evaluation (LREC), May 2006. [14] D vol.j91-d no.3 pp.619 627 Mar. 2008. [15] Web : vol.47 no.sig19(tod32) pp.98 112 Dec. 2006. [16] K. Nakayama, T. Hara, and S. Nishio, A Thesaurus Construction Method from Large Scale Web Dictionaries, Proceedings of IEEE International Conference on Advanced Information Networking and Applications (AINA), pp.932 939, May 2007. [17] T. Kudo, and Y. Matsumoto, Fast Methods for Kernel- Based Text Analysis, Proceedings of Meeting on Association for Computational Linguistics (ACL), pp.24 31, July 2003. [18] T. Kudo, K. Yamamoto, and Y. Matsumoto, Applying Conditional Random Fields to Japanese Morphological Analysis, Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.230 237, July 2004. [19] K. Järvelin, and J. Kekäläinen, IR Evaluation Methods for Retrieving Highly Relevant Documents, Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.41 48, July 2000. [20] K. Eguchi, Overview of the Topical Classification Task at NTCIR-4 WEB, Working Notes of the 4th NTCIR Meeting, Supplement, vol.1, no.48 55, June 2004. 7