Semantically enhanced Information Retrieval: an ontology-based approach
1 Semantically enhanced Information Retrieval: an ontology-based approach Miriam Fernández Sánchez under the supervision of Pablo Castells Azpilicueta Departamento de Ingeniería Informática Escuela Politécnica Superior Universidad Autónoma de Madrid
2 Table of contents Motivation Part I. Analyzing the state of the art What is semantic search? Part II. The proposal An ontology-based IR model Semantic retrieval on the Web Part III. Extensions Semantic knowledge gateway Coping with knowledge incompleteness Conclusions
3 Motivation (I) How to find and manage massive-scale content stored and shared on the Web and other large document repositories? Using search engines (Google, Yahoo, MSN) Do users always manage to find the information they are looking for? Example: systems that recommend books
4 Motivation (II)
Problem: current content description and query processing techniques for IR are based on keywords
  Limited capabilities to grasp and exploit the conceptualizations involved in user needs and content meanings
  Relations between search terms: "books about recommender systems" vs. "systems that recommend books"
  Polysemy: "jaguar" as animal vs. "jaguar" as car
  Synonymy: "movies" vs. "films"
  Documents about individuals where query keywords do not appear: query "English banks", individual "Abbey"
  Inference: "water sports in Mediterranean coast" → windsurf, scuba diving, etc., in Valencia, Alicante, etc.
Potential solution: semantic search
  Search by meanings rather than literal strings
  If machines are able to understand the semantics behind user needs and contents, they will retrieve more accurate results!
5 Motivation (III) Semantic search: different perspectives Information Retrieval (IR) The level of conceptualization is often shallow and sparse, especially at the level of relations Semantic-based knowledge technologies (SW) Use of higher levels of conceptualization but are more focused on data retrieval and not directly applied to unstructured information objects carrying free text or multimedia content Goal: The realization of a novel semantic retrieval model Exploit deep levels of conceptualization Support search in large, open and heterogeneous repositories of unstructured information
6 Research questions Q1: What do we understand by semantic search? Q2: Where are we standing in the progress towards semantic information retrieval? Q3: Can the achievements in semantic retrieval from different research fields be combined and give rise to enhanced retrieval models thereupon? Q4: Can semantic retrieval models be scaled to open, massive, heterogeneous environments such as the World Wide Web? Q5: How to standardize the evaluation of semantic retrieval systems? Q6: How to deal with knowledge incompleteness?
7 Part I. Analyzing the state of the art What is semantic search? IR vs. semantic-based knowledge technologies perspectives A global classification Drawbacks and limitations
8 What is semantic search?
Raising the representation of content meanings to a higher level above plain keywords, in order to enhance current mainstream Information Retrieval (IR) technologies
Goal: reduce the distance between the logic representation of the IR systems and the real one in the user's mind with regard to the formulation of queries and the understanding of documents
Barriers
  The bag-of-words approach is pervasively adopted in currently deployed IR technologies
  Conceptual representations are difficult and costly to create and maintain
9 Semantic search: IR and SW perspectives
Semantic search: IR perspective
  Early 80s: elaboration of conceptual frameworks and their introduction in IR models
    Taxonomies (categories + hierarchical relations), e.g., Linnaean taxonomy
    Thesauri (categories + fixed hierarchical & associative relations), e.g., WordNet (used by linguistic approaches)
    Algebraic methods such as LSA
  Main limitations: the level of conceptualization is often shallow (relations)
Semantic search: semantic-based technologies perspective
  Late 90s: introduction of ontologies as conceptual framework (classes + instances (KBs) + arbitrary semantic relations + rules)
  Main limitations:
    Semantic search is understood as a data retrieval task (e.g., semantic portals)
    Sometimes it makes only partial use of the expressive power of an ontology-based representation
10 Semantic search: a global classification
Criteria and the approaches found for each:
  Semantic knowledge representation: linguistic conceptualization; Latent Semantic Analysis; ontology-based information retrieval
  Scope: Web search; limited domain repositories; desktop search
  Goal: data retrieval; information retrieval
  Query: keyword query; natural language query; controlled natural language query; structured query based on ontology query languages
  Content retrieved: pieces of ontological knowledge; XML documents; text documents; multimedia documents
  Content ranking: no ranking; keyword-based ranking; semantic-based ranking
11 Semantic search: identified limitations
Table columns: Criteria; Limitations; IR; Semantic. Where a row carries a single mark, the transcript does not show which of the two columns it belongs to.
  Semantic knowledge representation: do not exploit the full potential of an ontological language, beyond those that could be reduced to conventional classification schemes. IR: X. Semantic: (partially).
  Scope: do not scale to large and heterogeneous repositories of documents. Mark: X.
  Goal: are based on Boolean retrieval models where the information retrieval problem is reduced to a data retrieval task. Mark: X.
  Query: limited usability. Mark: X.
  Content retrieved: focused on textual content, unable to manage different formats (multimedia). IR: (partially). Semantic: (partially).
  Content ranking: lack of semantic ranking criteria; the ranking (if provided) relies on keyword-based approaches. IR: X. Semantic: X.
Additional limitations
  Coverage: knowledge incompleteness. IR: (partially). Semantic: X.
  Evaluation: lack of standard evaluation frameworks. Mark: X.
12 Part II. The proposal Our proposal towards semantic search: an ontology-based IR model Semantic retrieval framework Semantic indexing Query processing Searching and ranking An example Evaluation Results Conclusions Semantic retrieval on the Web Limitations of semantic retrieval in the Web environment Semantic retrieval framework extensions Semantic indexing Query processing Searching and ranking Evaluation Results Conclusions
13 Semantic retrieval framework
Adaptation of the classic keyword-based IR model
Semantic knowledge representation: the bag of words is replaced by an ontology and its corresponding KB
[Figure: framework architecture. A SPARQL editor produces a SPARQL query; query processing over the semantic knowledge (ontology + KB) returns semantic entities; searching uses the semantic index (weighted annotations), built by indexing the document corpus, to retrieve unordered documents; ranking produces ranked documents.]
14 Semantic indexing
Adaptation of the classic inverted index
  Concepts instead of keywords are associated to documents
  Annotation weights are computed using an adaptation of the TF-IDF algorithm
[Figure: annotation schema. Classes shown: Topic and MetaConcept (label, keyword, classification: ODP, IPTC, SRS), DomainConcept (label, keyword; instanceOf), Annotation (weight), Document (url, title, author, date; Text Document, Media Document). An Annotation links a DomainConcept to a Document and is created by automatic or manual annotation. Concepts come from upper ontologies and domain ontologies.]
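15 Querying, searching and ranking (I)
Adapting the vector-space IR model
  Keyword-based IR model: query keyword-vector $q$, document keyword-vector $d$, $\mathrm{ksim}(d, q) = \cos\alpha$, with $\{k_1, k_2, k_3\}$ = set of all keywords
  Semantic IR model: result-set concept-vector $q$, document concept-vector $d$, $\mathrm{sim}(d, q) = \cos\alpha$, with $\{x_1, x_2, x_3\}$ = set of semantic entities
[Figure: two three-dimensional vector spaces, one with keyword axes $k_1, k_2, k_3$ and one with semantic-entity axes $x_1, x_2, x_3$; in each, $\alpha$ is the angle between the vectors $d$ and $q$.]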
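The cosine measure used by both models can be sketched as follows. This is a minimal illustration, assuming vectors are stored sparsely as dictionaries from dimension name (keyword or semantic entity) to weight; the function name `cosine` and the sample vectors are hypothetical, not taken from the slides.

```python
import math


def cosine(d, q):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(dim, 0.0) for dim, w in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_d * norm_q)


# Hypothetical document annotated with entities x1 and x2; the query returns x1 only.
doc = {"x1": 1.0, "x2": 1.0}
query = {"x1": 1.0}
print(round(cosine(doc, query), 3))  # 1/sqrt(2), i.e. 0.707
```

The same function serves the keyword-based and the semantic model; only the meaning of the dimensions changes.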
16 Querying, searching and ranking (II)
Building the query vector
  Execute the query (e.g. SPARQL) → result set $R \subseteq O^{|V|}$, where $O$ is the set of semantic entities and $V$ the set of query variables
  Variable weights: for each variable $v \in V$ in the query, $w_v \in [0,1]$
  For each $x \in O$: $q_x = w_v$ if $x$ instantiates $v$ in some tuple in $R$, and $q_x = 0$ otherwise
Building the document vector
  Map concepts to keywords
  Weight for an instance $x \in O$ that annotates a document $d$ (TF-IDF):
  $d_x = \dfrac{freq_{x,d}}{\max_{y \in O} freq_{y,d}} \cdot \log \dfrac{N}{n_x}$
  $freq_{x,d}$ = number of occurrences of keywords of $x$ in $d$
  $n_x$ = number of documents annotated by $x$
  $N$ = total number of documents
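A minimal sketch of the two vector constructions on this slide. The data layout (result tuples as dictionaries, plain dictionaries for frequencies) and the function names are assumptions made for illustration. One more assumption: when an entity instantiates several variables, the largest weight is kept, since the slide does not say how that case is resolved.

```python
import math


def query_vector(result_set, variable_weights):
    """q_x = w_v if entity x instantiates variable v in some result tuple."""
    q = {}
    for row in result_set:  # each row maps variable name -> entity
        for var, entity in row.items():
            w = variable_weights.get(var, 0.0)
            # Assumption: keep the largest weight if x binds several variables.
            q[entity] = max(q.get(entity, 0.0), w)
    return q


def document_vector(freq_in_doc, docs_annotated_by, total_docs):
    """d_x = freq(x,d) / max_y freq(y,d) * log(N / n_x)."""
    max_freq = max(freq_in_doc.values(), default=0)
    if max_freq == 0:
        return {}
    return {
        x: (f / max_freq) * math.log(total_docs / docs_annotated_by[x])
        for x, f in freq_in_doc.items()
    }


# Result tuples in the style of the basketball example (player, team).
rows = [
    {"player": "Aaron Jordan Bramlet", "team": "Caprabo Lleida"},
    {"player": "Derrick Alston", "team": "Caprabo Lleida"},
    {"player": "Venson Hamilton", "team": "DKV Joventut"},
    {"player": "Jamie Arnold", "team": "DKV Joventut"},
]
q = query_vector(rows, {"player": 1.0, "team": 0.5})
print(q)

# Hypothetical frequencies and annotation counts, for illustration only.
d = document_vector({"a": 2, "b": 1}, {"a": 10, "b": 100}, total_docs=1000)
print(d)
```

With player weight 1.0 and team weight 0.5, the four players get weight 1 and the two teams get weight 0.5.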
17 An example
Query: players from USA playing in basketball teams of Catalonia
SPARQL query (the prefix URIs are truncated in the transcript):
  PREFIX rdf: <...>
  PREFIX kb: <...>
  SELECT ?player ?team
  WHERE { ?player rdf:type kb:sportsplayer .
          ?player kb:plays kb:basketball .
          ?player kb:nationality kb:usa .
          ?player kb:playsin ?team .
          ?team kb:locatedin kb:catalonia . }
Results
  Player (w=1.0): Aaron Jordan Bramlet. Team (w=0.5): Caprabo Lleida
  Player (w=1.0): Derrick Alston. Team (w=0.5): Caprabo Lleida
  Player (w=1.0): Venson Hamilton. Team (w=0.5): DKV Joventut
  Player (w=1.0): Jamie Arnold. Team (w=0.5): DKV Joventut
Query vector: (..., 1, 1, 1, 1, 0.5, 0.5, ...)
Found documents: 66 news articles ranked from 0.1 to [upper value missing in the transcript]
E.g., 1st result: "Johnny Rogers and Berni Tamames went yesterday through the medical revision required at the beginning of each season, which consisted of a thorough exploration and several cardiovascular and stress tests, that their team mates had already passed the day before. Both players passed without major problems the examinations carried through by the medical team of the club, which is now awaiting the arrival of the Northamericans Bramlett and Derrick Alston to conclude the revisioning."
Document vector: (..., 1.73, ..., 1.65, ...)
Semantic rank value: cos(d, q) = 0.88
Keyword rank value: cos(d, q) = 0.06
Combined rank value: 0.47
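The slide does not state how the combined rank value is obtained. The three numbers are consistent with an equal-weight average of the semantic and keyword values, so the sketch below assumes a linear combination with a hypothetical parameter `alpha` set to 0.5; this is an illustration, not necessarily the authors' exact formula.

```python
def combined_rank(semantic, keyword, alpha=0.5):
    """Linear combination of the semantic and keyword similarity values."""
    return alpha * semantic + (1 - alpha) * keyword


# Values from the slide: semantic 0.88, keyword 0.06.
print(round(combined_rank(0.88, 0.06), 2))  # 0.47
```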
18 Evaluation Evaluation benchmark Document collection: news articles from the CNN Web Site 145,316 documents (445 MB) from the CNN (NewsArticle TextDocument) Domain ontology and KB: KIM with minor extensions and adjustments 281 domain classes, 138 properties, in several domains 35,689 instances, 465,848 sentences, (71MB in RDF text format) Queries A set of twenty queries was prepared manually. w v = 1 for all v in the SPARQL queries Judgments: Manual judgement of documents from 0 to 5 Experimental conditions Keyword-based search (Lucene) Ontology-only search Semantic search
19 Results
[Figure: precision vs. recall curves for semantic search, keyword-based search and ontology-only search. Panels: "News about banks that trade on NASDAQ, with fiscal net income greater than two billion dollars"; "News about insurance companies in USA"; "News about telecom companies"; Average.]
20 Initial model conclusions
Better precision by using structured semantic queries (more precise information needs)
  E.g. a football player playing in the Juventus vs. playing against the Juventus
Better recall when querying for instances by class (query expansion)
  E.g. news about companies quoted on NASDAQ
Better recall by using inference
  E.g. water sports in Spain → scuba diving, windsurf, etc. in Cadiz, Valencia, Alicante, etc.
Better precision by using query variable weights
  E.g. news articles about car models released this year, where the release date is not necessarily mentioned
Ambiguity is easier to deal with at the level of concepts
  Property domain/range, topic-based classification, etc.
Conditions on concepts and conditions on documents
  E.g. film review published by Le Monde within the last 7 days about sci-fi movie
21 Part II. The proposal Our proposal towards semantic search: an ontology-based IR model Semantic retrieval framework Semantic indexing Query processing, Searching and ranking An example Evaluation Results Conclusions Semantic retrieval on the Web Limitations of semantic retrieval in the Web Semantic retrieval framework extensions Semantic indexing Query processing Searching and ranking Evaluation Results Conclusions
22 Limitations of semantic retrieval on the Web Applying semantic retrieval on a decentralized, heterogeneous and massive repository of content such as the Web is still an open problem Heterogeneity: Web contents span a potentially unlimited number of domains. Impossible to fully cover with a predefined set of ontologies and KBs Proposal: generation of a SW gateway that collects and gives access to semantic metadata available online. Scalability: Scaling our model to the Web environment involves exploiting all the semantic metadata available online and to manage huge amounts of information in the form of unstructured content Proposal: creation of scalable and flexible semantic indexing (annotation) methods. Usability: Provide users with usable query interface Proposal: support natural language
23 Semantic retrieval framework extensions
Queries are expressed in natural language
A SW gateway is integrated to collect, store and give fast access to the online metadata
[Figure: extended architecture. An NL interface produces an NL query; query processing, backed by the Semantic Web gateway (pre-processed semantic knowledge collected from the Semantic Web), returns semantic entities; searching uses the semantic index (weighted annotations), built by indexing unstructured Web contents, to retrieve unordered documents; ranking produces ranked documents.]
24 Semantic indexing (I)
Two different semantic indexing (annotation) methodologies are proposed
  Annotation based on NLP
  Annotation based on contextual semantic information
Common requirements
  Identify ontology entities (classes, properties, instances or literals) within the documents to generate new annotations
  Do not populate ontologies, but identify already available semantic knowledge within the documents
  Support annotation in open domain environments (any document can be associated or linked to any ontology without any predefined restriction). This brings scalability limitations. To solve them we propose:
    Generation of ontology indices
    Generation of document indices
    Construction of an annotation database which stores non-embedded annotations (columns: Entity ID, Doc ID, Weight)
25 Semantic indexing (II)
Annotation based on NLP
Pipeline, with the example data shown on the slide:
  1. HTML Parser. Input: <head> <body> <p> Schizophrenia patients whose medication couldn't stop the imaginary voices in their heads </p> </body> </head>. Output: "Schizophrenia patients whose medication couldn't stop the imaginary voices in their heads"
  2. NLP processing. Output: <body> <p> <s> <w c="w" pos="nnp" stem="schizophrenia">schizophrenia</w> <w c="w" pos="nns" stem="patient">patients</w> <w c="w" pos="wp$">whose</w> <w c="w" pos="nn" stem="medication">medication</w> <w c="w" pos="md">could</w><w c="w" pos="rb">not</w> <w c="w" pos="vb" stem="stop">stop</w> <w c="w" pos="dt">the</w> <w c="w" pos="jj">imaginary</w> <w c="w" pos="nns" stem="voice">voices</w> <w c="w" pos="in">in</w> <w c="w" pos="prp$">their</w> <w c="w" pos="nns" stem="head">heads</w> </s> </p> </document>
  3. Tokens filter. Output: Schizophrenia, Patient, Medication, Stop, voice, head
  4. Index Searcher. Keyword → ontology entities:
       Schizophrenia → E1, E4, E80
       Patient → E45
       head → E2, E7, E123
  5. Frequency counter. Ontology entity → document frequencies:
       E1 → D1(2), D4(3)
       E45 → D1(1), D25(7), D34(1)
  6. Annotations creator. Weighted annotations (ontology entity, document, weight):
       E1, D1, 0.5
       E45, D... (row truncated in the transcript)
Ambiguities: exploit the PoS to reduce ambiguities, creating groups of words that can potentially express a concept (e.g., noun + noun: "tea cup")
Annotation weights: use of TF-IDF + PoS to include pronouns when computing frequencies
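Steps 3 to 5 of the NLP-based annotation pipeline can be sketched as below. This is a simplified illustration under stated assumptions: the PoS tags kept by the filter (`KEPT_POS`) are guessed from the slide's filtered output, the ontology index holds only the three entries visible on the slide, and the function names are hypothetical. Disambiguation and weighting are left out.

```python
from collections import Counter

# Ontology index entries visible on the slide: keyword stem -> ontology entities.
ONTOLOGY_INDEX = {
    "schizophrenia": ["E1", "E4", "E80"],
    "patient": ["E45"],
    "head": ["E2", "E7", "E123"],
}

# Assumption: tags that survive the tokens filter (nouns and verbs).
KEPT_POS = {"nnp", "nns", "nn", "vb"}


def filter_tokens(tagged_tokens):
    """Step 3: keep the stems of tokens whose PoS tag is in KEPT_POS."""
    return [stem for stem, pos in tagged_tokens if pos in KEPT_POS]


def entity_frequencies(stems, index):
    """Steps 4-5: look each stem up in the index and count entity hits."""
    counts = Counter()
    for stem in stems:
        for entity in index.get(stem, []):
            counts[entity] += 1
    return counts


# (stem or surface form, PoS tag) pairs from the slide's NLP output.
tagged = [
    ("schizophrenia", "nnp"), ("patient", "nns"), ("whose", "wp$"),
    ("medication", "nn"), ("could", "md"), ("not", "rb"), ("stop", "vb"),
    ("the", "dt"), ("imaginary", "jj"), ("voice", "nns"), ("in", "in"),
    ("their", "prp$"), ("head", "nns"),
]
stems = filter_tokens(tagged)
freqs = entity_frequencies(stems, ONTOLOGY_INDEX)
print(stems)
print(dict(freqs))
```

The per-document counts produced this way would then feed the TF-IDF style weighting mentioned on the slide.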
26 Semantic indexing (III)
Annotation based on contextual semantic information
Pipeline, with the example data shown on the slide (ontology o1):
  1. Select the next semantic entity: E1 = Individual: Maradona, Labels = { Maradona, Diego Armando Maradona, pelusa }
  2. Search the terms in the document index:
       Maradona → D1, D2, D87
       Pelusa → D95, D140
     Potential documents to annotate = {D1, D2, D87, D95, D140}
  3. Select the semantic context:
       E34 = Class: football_player, Labels = { football player }
       I22 = Individual: Argentina, Labels = { Argentina }
  4. Search contextualized terms in the document index:
       football_player → D87, D61, D44, D1
       Argentina → D43, D32, D2
     Contextualized documents = {D1, D2, D32, D43, D44, D61, D87}
  5. Select the semantic contextualized docs: documents to annotate = {D1, D2, D87}
  6. Annotations creator. Weighted annotations (ontology entity, document, weight):
       E1, D1, 0.5
       E1, D2, 0.2
       E1, D... (row truncated in the transcript)
Ambiguities: exploit ontologies as background knowledge (increasing precision but reducing the number of annotations)
Annotation weights: computed from document ranking scores, $P \cdot S_d + (1 - P) \cdot C_d$
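The document selection and weighting of the contextual method can be sketched as follows. The document sets are those of the Maradona example; the scores are invented for illustration. The sketch assumes that $S_d$ is the document's ranking score for the entity's own labels and $C_d$ its score for the context labels, which the slide does not spell out, and the function name is hypothetical.

```python
def contextual_annotations(entity_scores, context_scores, p=0.5):
    """Annotate only documents matching both the entity and its context.

    entity_scores:  doc -> S_d (assumed: score for the entity's labels)
    context_scores: doc -> C_d (assumed: score for the context labels)
    Returns doc -> P * S_d + (1 - P) * C_d for the selected documents.
    """
    selected = entity_scores.keys() & context_scores.keys()
    return {
        doc: p * entity_scores[doc] + (1 - p) * context_scores[doc]
        for doc in selected
    }


# Document sets from the slide; the score values are made up.
entity_scores = {"D1": 0.6, "D2": 0.3, "D87": 0.4, "D95": 0.2, "D140": 0.1}
context_scores = {"D1": 0.4, "D2": 0.1, "D32": 0.5, "D43": 0.2,
                  "D44": 0.3, "D61": 0.2, "D87": 0.6}
annotations = contextual_annotations(entity_scores, context_scores, p=0.5)
print(sorted(annotations))  # ['D1', 'D2', 'D87']
```

Documents D95 and D140 mention "Pelusa" but none of the context terms, so they are dropped. This is the precision-for-coverage trade mentioned under "Ambiguities".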
27 Query processing
Integration of PowerAqua as query processing module
  Input: natural language query
  Output: list of semantic entities retrieved from different ontologies and KBs
Components
  Linguistic component: translates the query into its linguistic triple form, e.g. "which are the members of the rock group Nirvana?" = <what-is, members, rock group nirvana>
  PowerMap: maps the terms of each linguistic triple to semantically relevant ontology entities
  Triple similarity service: selects the ontological triples that best represent the user's query
28 Searching and ranking
Construction of the query vector:
  The weights are computed considering the set of answers retrieved by PowerAqua
  The weight of each entity in the query vector is computed as $1/|S_i|$, where $S_i$ is the set of semantic entities retrieved for query condition $i$
  E.g., "symptoms and treatments of Parkinson disease" (this query has two conditions)
Construction of the document vector:
  Document vectors are computed using semantic entities from different ontologies and KBs
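A minimal sketch of the $1/|S_i|$ weighting. The entity names are hypothetical placeholders for what PowerAqua might return for the Parkinson example. Summing the weights of an entity that answers more than one condition is an assumption; the slide does not cover that case.

```python
def condition_query_vector(conditions):
    """Give each entity retrieved for condition i the weight 1 / |S_i|."""
    q = {}
    for entities in conditions:
        if not entities:
            continue  # a condition with no answers contributes nothing
        weight = 1.0 / len(entities)
        for entity in entities:
            # Assumption: weights add up when an entity answers several conditions.
            q[entity] = q.get(entity, 0.0) + weight
    return q


# Two conditions: symptoms and treatments (entity names are made up).
symptoms = {"tremor", "rigidity"}
treatments = {"levodopa", "physiotherapy", "surgery", "dopamine_agonist"}
q = condition_query_vector([symptoms, treatments])
print(q["tremor"], q["levodopa"])  # 0.5 0.25
```

Under this scheme every condition contributes the same total weight, so a condition with many answers does not dominate the query vector.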
29 Evaluation
Evaluation benchmark
  Document collection: TREC WT10G
  Queries and judgments
    TREC 9 and TREC 2001 test corpora (100 queries with their corresponding judgments)
    20 queries selected and adapted to be used by PowerAqua (our QA query processing module)
  Ontologies
    40 public ontologies covering a subset of the TREC domains and queries (370 files comprising 400MB of RDF, OWL and DAML)
    100 additional repositories (2GB of RDF and OWL) stored and indexed with the SW gateway
  Knowledge bases
    Some of the 40 selected ontologies have been semi-automatically populated from Wikipedia
Experimental conditions
  Keyword-based search (Lucene)
  Semantic-based search
  Best TREC automatic search
  Best TREC manual search
30 Results (I)
[Tables: per-topic MAP and P@10 for semantic retrieval, Lucene, best TREC automatic and best TREC manual, each with a Mean row. The numeric values are missing from the transcript.]
MAP: mean average precision
P@10: precision at 10
Figures in bold correspond to the best result for each topic, excluding the best TREC manual approach (because of the way it constructs the query)
Annotation based on contextual semantic information is used for this experiment
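The two measures named here can be computed as in the sketch below, which uses the standard definitions of precision at k and (mean) average precision. Function names and sample data are illustrative. Documents absent from the relevance judgments count as non-relevant, which is the usual convention and the source of the bias discussed on the later results slides.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked list, set of relevant docs), one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]   # hypothetical ranking for one query
relevant = {"d1", "d3"}             # hypothetical judgments
print(precision_at_k(ranked, relevant, k=2))          # 0.5
print(round(average_precision(ranked, relevant), 4))  # 0.8333
```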
31 Results (II)
By P@10, the semantic retrieval outperforms the other two approaches
  It provides maximal quality for 55% of the queries and it is only outperformed by both Lucene and TREC in one query (511)
  Semantic retrieval provides better results than Lucene for 60% of the queries and equal for another 20%
  Compared to the best TREC automatic engine, our approach improves 65% of the queries and produces comparable results in 5%
By MAP, there is no clear winner
  The average performance of TREC automatic is greater than that of semantic retrieval
  Semantic retrieval outperforms TREC automatic in 50% of the queries and Lucene in 75%
Bias in the MAP measure
  More than half of the documents retrieved by the semantic retrieval approach have not been rated in the TREC judgments
  The annotation technique used for the semantic retrieval approach is very conservative (missing potential correct annotations)
32 Results (III)
For some queries for which the keyword search (Lucene) approach finds no relevant documents, the semantic search does: queries 457 (Chevrolet trucks), 523 (facts about the five main clouds) and 524 (how to erase scar?)
In the queries in which the semantic retrieval did not outperform the keyword baseline, the semantic information obtained by the query processing module was scarce. Still, overall, the keyword baseline only rarely provides significantly better results than semantic search
TREC Web search evaluation topics are conceived for keyword-based search engines. With complex structured queries (involving relationships), the performance of semantic retrieval would improve significantly compared to the keyword-based approach
The full capabilities of the semantic retrieval model for formal semantic queries were not exploited in this set of experiments
33 Results (IV)
Studying the impact of retrieved non-evaluated documents
  66% of the results returned by semantic retrieval were not judged
  P@10: not affected. Results in the first positions have a higher probability of being evaluated
  MAP: evaluating the impact
    Informal evaluation of the first 10 unevaluated results returned for every query
    89% of these results occur in the first 100 positions for their respective query
    A significant portion, 31.5%, of the documents we judged turned out to be relevant
    Even though this cannot be generalized to all the unevaluated results returned by the semantic retrieval approach (the probability of being relevant drops around the first 100 results and then varies very little), we believe that the lack of evaluations for all the results returned by the semantic retrieval impairs its MAP value
34 Extended model conclusions
Construction of a complete semantic retrieval approach
  Input: natural language queries
  Output: specific answers in the form of ontology entities; semantically ranked documents
Addressing challenges of the Web environment
  Heterogeneity
    The system can potentially cover a large number of domains by reusing the ontologies and KBs available online
    Semantic coverage enhancement would directly result in retrieval performance improvement
  Scalability
    The proposed semantic indexing (annotation) methods are able to manage large amounts of unstructured content and semantic metadata without any predefined restriction
    Need to study in more detail the trade-offs between the quantity and quality of annotations
  Usability
    Use of PowerAqua as query processing module. Queries are expressed in NL
  Knowledge incompleteness
    If the query processing module does not find any answer, the ranking module ensures that the system degrades gracefully to behave as a traditional keyword-based retrieval approach
35 Part III. Extensions Semantic knowledge gateway (WebCORE) Collects, stores and provides access to the semantic content Ontology Indexing module Multi-ontology accessing module Ontology evaluation and selection module Content-based ontology evaluation techniques Collaborative ontology evaluation techniques Coping with knowledge incompleteness Recall and precision of keyword-based search shall be retained when ontology information is not available or incomplete Making use of rank fusion strategies to combine the results coming from our ontology-based retrieval model and the results returned by traditional keyword-based techniques Proposing a novel score normalization approach based on the behavioral patterns of the search engines (drawn from long-term observations)
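The idea of combining semantic and keyword results by rank fusion can be illustrated with a standard technique. The sketch below uses min-max score normalization followed by CombSUM; this is a stand-in for the novel normalization based on behavioral patterns proposed on this slide, which the transcript does not describe. Function names and sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(semantic, keyword):
    """Fuse two result lists by summing their normalized scores (CombSUM)."""
    s, k = min_max(semantic), min_max(keyword)
    fused = {doc: s.get(doc, 0.0) + k.get(doc, 0.0) for doc in s.keys() | k.keys()}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# With no semantic results the fusion falls back to the keyword ranking.
print(comb_sum({}, {"a": 2.0, "b": 1.0}))  # [('a', 1.0), ('b', 0.0)]
```

When the ontology-based model returns nothing, the fused list preserves the keyword order, which matches the requirement that keyword-based recall and precision be retained when ontology information is missing.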
36 Contributions Study and comparison of the different views and approximations to the notion of semantic search from the IR and semantic technologies fields, identifying fundamental limitations in the state of the art Definition, development and formal evaluation of a novel semantic retrieval model with deep levels of conceptualization to improve semantic retrieval in large repositories of unstructured information Steps towards semantic retrieval in the Web environment Creation of semantic retrieval evaluation benchmarks
37 Discussion and future work
Semantic resources
  Take in larger amounts of online available semantic metadata (Watson)
  Further study on the trade-off between the quality and quantity of annotations
Extensions of the model
  Personalization
  Contextualization
  Recommendation
38 Thank you!
A Semantic Portal for the International Affairs Sector
A Semantic Portal for the International Affairs Sector Contreras, Benjamins, Blazquez, Losada, Salle, Sevilla, Navaro, Casillas, Mompo, Paton, Corcho (isoco) www.esperonto.net MCYT, PROFIT Tena, Martos
» A Hardware & Software Overview. Eli M. Dow
» A Hardware & Software Overview Eli M. Dow Overview:» Hardware» Software» Questions 2011 IBM Corporation Early implementations of Watson ran on a single processor where it took 2 hours
Semantic EPC: Enhancing Process Modeling Using Ontologies
Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI), Saarland University (Institut für Wirtschaftsinformatik im DFKI Saarbrücken) Semantic EPC: Enhancing
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel. Treparel Information Solutions BV Delftechpark
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Overview clustering vs. classification supervised
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Customer Intentions Analysis of Twitter Based on Semantic Patterns
Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun [email protected] Mohamed Salah Gouider [email protected] Lamjed Ben Said [email protected] ABSTRACT
Semantic Interoperability
Ivan Herman Semantic Interoperability Olle Olsson Swedish W3C Office Swedish Institute of Computer Science (SICS) Stockholm Apr 27 2011 Background Trends: from
Text Mining and Analysis
Text Mining and Analysis Practical Methods, Examples, and Case Studies Using SAS Goutam Chakraborty, Murali Pagolu, Satish Garla From Text Mining and Analysis. Full book available for purchase here. Contents
Constructing a Generic Natural Language Interface for an XML Database. Rohit Paravastu
Constructing a Generic Natural Language Interface for an XML Database Rohit Paravastu Motivation Ability to communicate with a database in natural language regarded as the ultimate goal for DB query interfaces
Knowledge Management
Knowledge Management INF5100 Autumn 2006 Outline Background Knowledge Management (KM) What is knowledge KM Processes Knowledge Management Systems and Knowledge Bases Ontologies What is an ontology Types
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 Background Joint work with Czech Institute of Informatics, Robotics and
Similarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
Text Analytics Software Choosing the Right Fit
Text Analytics Software Choosing the Right Fit Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Text Analytics World San Francisco, 2013 Agenda Introduction Text Analytics Basics
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
Semantic Stored Procedures Programming Environment and performance analysis
Semantic Stored Procedures Programming Environment and performance analysis Marjan Efremov 1, Vladimir Zdraveski 2, Petar Ristoski 2, Dimitar Trajanov 2 1 Open Mind Solutions Skopje, bul. Kliment Ohridski
11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
Evaluation experiment for the editor of the WebODE ontology workbench
Evaluation experiment for the editor of the WebODE ontology workbench Óscar Corcho, Mariano Fernández-López, Asunción Gómez-Pérez Facultad de Informática. Universidad Politécnica de Madrid Campus de Montegancedo,
Enterprise Search Solutions Based on Target Corpus Analysis and External Knowledge Repositories
Enterprise Search Solutions Based on Target Corpus Analysis and External Knowledge Repositories Thesis submitted in partial fulfillment of the requirements for the degree of M.S. by Research in Computer
ELPUB Digital Library v2.0. Application of semantic web technologies
ELPUB Digital Library v2.0 Application of semantic web technologies Anand BHATT a, and Bob MARTENS b a ABA-NET/Architexturez Imprints, New Delhi, India b Vienna University of Technology, Vienna, Austria
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Information Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
Task Description for English Slot Filling at TAC-KBP 2014
Task Description for English Slot Filling at TAC-KBP 2014 Version 1.1 of May 6th, 2014 1. Changes 1.0 Initial release 1.1 Changed output format: added provenance for filler values (Column 6 in Table
Learn to Personalized Image Search from the Photo Sharing Websites
Learn to Personalized Image Search from the Photo Sharing Websites ABSTRACT: Increasingly developed social sharing websites, like Flickr and Youtube, allow users to create, share, annotate and comment
Weblogs Content Classification Tools: performance evaluation
Weblogs Content Classification Tools: performance evaluation Jesús Tramullas a,, and Piedad Garrido a a Universidad de Zaragoza, Dept. of Library and Information Science. Pedro Cerbuna 12, 50009 [email protected]
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
How To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY'S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
Web 2.0-based SaaS for Community Resource Sharing
Web 2.0-based SaaS for Community Resource Sharing Corresponding Author Department of Computer Science and Information Engineering, National Formosa University, [email protected] doi : 10.4156/jdcta.vol5.issue5.14
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Gallito 2.0: a Natural Language Processing tool to support Research on Discourse
Presented in the Twenty-third Annual Meeting of the Society for Text and Discourse, Valencia from 16 to 18, July 2013 Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Guillermo
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
IT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www.bioinfo.in/contents.php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Improving EHR Semantic Interoperability Future Vision and Challenges
Improving EHR Semantic Interoperability Future Vision and Challenges Catalina MARTÍNEZ-COSTA a,1 Dipak KALRA b, Stefan SCHULZ a a IMI,Medical University of Graz, Austria b CHIME, University College London,
Sentiment Analysis on Big Data
SPAN White Paper Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people's opinions on the products and services of various companies. Social
Putting IBM Watson to Work In Healthcare
Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research [email protected] Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or
Information Technology for KM
On the Relations between Structural Case-Based Reasoning and Ontology-based Knowledge Management Ralph Bergmann & Martin Schaaf University of Hildesheim Data- and Knowledge Management Group www.dwm.uni-hildesheim.de
Exam in course TDT4215 Web Intelligence - Solutions and guidelines -
English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed
A Comparative Approach to Search Engine Ranking Strategies
A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh College of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab
Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1.1. What is the World Wide Web? 1.2. A Brief History of the Web
Semantic Information on Electronic Medical Records (EMRs) through Ontologies
Semantic Information on Electronic Medical Records (EMRs) through Ontologies Suarez Barón M. J. Researcher, Research Center at Colombian School of Industrial Careers [email protected] Bogotá,
