Question Answering. Chin-Yew Lin Senior Researcher Knowledge Mining Group Microsoft Research Asia

1 Question Answering Chin-Yew Lin Senior Researcher Knowledge Mining Group Microsoft Research Asia

2 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

3 Taylor's 4 Levels of Information Need. Visceral (本能的) need: actual but unexpressed. Conscious (自觉的) need: formulated description of the need within the brain. Formalised (形式化的) need: expressed statement of the need. Compromised (折衷的) need: question as presented to the information system. For the compromised need: support question refinement and reformulation. * Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3). * Adapted from Tetsuya Sakai's presentation on a related topic.

4 What is Question Answering? A natural way to: request information that we do not know; check information that we are not sure about; interact with people around us; fulfill our visceral need. A research field: develops automatic systems to answer questions; involves IR, NLP, and AI. A better way to find what we want than search?

5

6 Critics Powerset has a lot of skeptics (BW 09/17/07): Search expert Danny Sullivan, editor-in-chief of the online news site Search Engine Land, noted that no claims of the superiority of natural-language search have ever held up. And he disputed the idea that most people would rather ask questions than simply type in a few words, noting Google didn't train people to query that way but simply responded to the way users were already conducting searches. "Linguistics will not solve most search problems," adds Apostolos Gerasoulis, executive vice-president of search technology at Ask.com, the search engine unit of IAC/InterActiveCorp (IACI).

7 Open-Domain Question Answering (ODQA) Question Answering Domain specific Domain independent Structured data Free text Web Fixed set of collections Single document

8 Sample Questions. 9: How far is Yaroslavl from Moscow? 15: When was London's Docklands Light Railway constructed? 22: When did the Jurassic Period end? 29: What is the brightest star visible from Earth? 30: What are the Valdez Principles? 73: Where is the Taj Mahal? (From TRECs 8 and 9.)

9 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

10 Apple's Knowledge Navigator (1987)

11 Things to Think About. What types of technologies are necessary to build an Apple Knowledge Navigator? Is QA part of the puzzle? Do you think that we already have all the pieces together? If you were to design a modern Apple Knowledge Navigator, what would it be like? Do you want one?

12 What Have We Learned? NLP & Summarization: lexical analysis (词法分析), including segmentation (分词), tokenization (符号化), and POS tagging (词性标注); syntactic analysis (句法分析), i.e. parsing; semantic analysis (语义分析), including WSD (词义消歧) and reference resolution (指代消解); discourse analysis (话语分析). Challenges: ambiguity (歧异性) and variations (多样性). Search Engine Overview: search engine architecture; crawler; web page parser; index builder (inverted index, signature file, suffix tree); web graph builder; link analysis (PageRank, HITS); query analysis; indexing & ranking (Relevance(Q, D), IR models, top-K query & index pruning); caching (80% of queries are cached); user interface.

13 [Diagram] Layers: Apps, Serve, Index, Acquisition, Data. Apps: Web Search, News Search, MM Search. Index: Term Based. Data: Web text.

14 [Diagram] Layers: Apps, Serve, Index, Acquisition, Data. Apps: Web Search, News Search, MM Search. Index: Term Based. Data: unstructured, semi-structured, and structured data.

15 [Diagram] Layers: Apps, Serve, Index, Acquisition, Data. Apps: Web search, News search, MM search. Index: Term Based, Entity Based. Acquisition: Structured Data Ingester. Data: unstructured, semi-structured, and structured data.

16 [Diagram] Layers: Apps, Serve, Index, Acquisition, Data. Apps: Web search, News search, MM search, QA, Entity-based Search, ... Index: Term Based, Entity Based. Acquisition: Structured Data Ingester, Semi-Structured Data Ingester, Unstructured Data Ingester. Other component: KGraph. Data: unstructured, semi-structured, and structured data.

17 [Diagram] Same as slide 16, with further data ingesters indicated (...).

18 Architecture of a Typical Search Engine. Online part: Query, User Interface, Caching, Indexing and Ranking. Offline part: Crawler, Web Page Parser, Index Builder, Link Analysis, Web Graph Builder. Data stores: Inverted Index, Page Ranks, Cached Pages, Page & Site Statistics, Pages, Links & Anchors, Link Map, Web Graph. Source: the Web. * Ji-Rong Wen, Search Engine Overview
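
The Index Builder and Inverted Index boxes in this architecture can be illustrated with a very small sketch. It assumes whitespace tokenization and Boolean AND retrieval, both simplifications chosen for illustration; the function names `build_inverted_index` and `search` are hypothetical and not from the lecture.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercased term to the set of page ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of pages containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Toy pages, made up for this example.
pages = {1: "Taj Mahal Agra", 2: "Mahal palace", 3: "Moscow"}
index = build_inverted_index(pages)
```

A real engine would add ranking (for example Relevance(Q, D) and link analysis scores) on top of the retrieved set; this sketch stops at retrieval.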

19 Architecture of a Typical QA Engine. Online part: Question, User Interface, Question Analysis, Caching, Indexing and Ranking, Answer Reranking. Offline part: Web Crawler, Web Page Parser, Annotation, Index Builder, Link Analysis, Web Graph Builder. Data stores: Inverted Index, Page Ranks, Cached Pages, Page & Site Statistics, Pages, Links & Anchors, Link Map, Web Graph, KB, Ontology. Source: the Web. * Adapted from Ji-Rong Wen, Search Engine Overview

20 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

21 QA Terminology Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

22 Terminology: Question Phrase. The part of the question that says what is being sought. Wh-words: who, what, which, when, where, why, and how. Wh-words + nouns, adjectives, or adverbs: what company, which president, how long, how fast.

23 Terminology: Question Type. Question category for distinguishing different processing strategies. FACTOID: How far is it from Earth to Mars? LIST: List the names of chewing gums. DEFINITION: Who is Vlad the Impaler? RELATIONSHIP: What is the connection between Valentina Tereshkova and Sally Ride? SUPERLATIVE: What is the largest city on Earth? YES-NO: Is Osama bin Laden alive? OPINION: What do most Americans think of gun control? CAUSE & EFFECT: Why did Iraq invade Kuwait?

24 Terminology: Answer Type. The class of object sought by the question: Person (from "Who"), Place (from "Where"), Date (from "When"), Number (from "How many"), Explanation (from "Why"), Method (from "How"). See the UIUC question classification and the USC/ISI Question Answer Typology for more. Hermjakob, U., Parsing and Question Classification for Question Answering, Workshop on Open-domain QA in ACL-2001.

25 Terminology: Question Focus & Topic. Question focus is the property or entity that is being sought by the question. Examples: McCarren Airport is located in what city? What is the population of Japan? What color is yak milk? The focus gives a hint about the answer type. Question topic is the object or event that the question is generally about. Examples: What is the height of Mt. Everest? Where on the body is a mortarboard worn? Answer passages would most likely contain the question topic.

26 Terminology: Candidate Passage & Answer. A candidate passage is a text passage retrieved by a search engine given a question. A candidate answer is a small piece of text ranked according to its likelihood of being an answer to a question. Examples of candidate answers: 50; Queen Elizabeth II; September 8, 2003; by baking a mixture of flour and water.

27 Terminology: Authority List. A collection of instances of an answer type of interest, used to test a term for class membership. Examples: days of the week (Sun, Mon, Tue, ...), planets, elements, states/provinces/counties/countries, animals, plants, colors, people, organizations. Existing databases or lists: movies (IMDB), books (Amazon), Freebase, Sempute NeedleSeek.
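
A minimal sketch of testing a term for class membership against authority lists. The lists below are tiny hand-written stand-ins (an assumption for illustration), not the contents of IMDB, Freebase, or NeedleSeek, and the names `AUTHORITY_LISTS`, `is_instance`, and `answer_types_of` are hypothetical.

```python
# Hypothetical, hand-made authority lists; a real system would load them
# from existing databases or lists.
AUTHORITY_LISTS = {
    "DAY_OF_WEEK": {"sun", "mon", "tue", "wed", "thu", "fri", "sat"},
    "PLANET": {"mercury", "venus", "earth", "mars",
               "jupiter", "saturn", "uranus", "neptune"},
    "COLOR": {"red", "green", "blue", "white", "black", "yellow"},
}


def is_instance(term, answer_type):
    """Test one term for membership in one answer type."""
    return term.strip().lower() in AUTHORITY_LISTS.get(answer_type, set())


def answer_types_of(term):
    """Return every answer type whose authority list contains the term."""
    key = term.strip().lower()
    return sorted(t for t, members in AUTHORITY_LISTS.items() if key in members)
```

Exact lowercase matching is the simplest possible membership test; real lists usually also need alias handling and disambiguation.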

28 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

29 Inside a QA System Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

30 A Simplified View of a QA System Question Question Analysis Keyword Query Search Web or Corpus Answer Type Documents or Passages Answer Extraction Search Engine Answer(S) * John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

31 Search Engine. Review Prof. Xiaojun Wan's lecture. Build your own search engine. The easy way: Lucene, Indri. Ask TA for Chinese-specific SE.

32 Named Entity Tagger. Read Daniel Bikel's paper and build your own named entity tagger: Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel, An Algorithm that Learns What's in a Name, in the Machine Learning Journal Special Issue on Natural Language Learning. The easy way: Stanford Named Entity Recognizer (NER), Apache OpenNLP. Ask TA for Chinese-specific SE.

33 Answer Type Inventory. Decide on a set of answer types that covers the majority of questions in a chosen domain. Hint: use a query log; check Dr. Wei Wu's lecture. Factors to consider: question analysis must predict answer types (Mountain: What mountain/peak; Organization: What organization/company/group/agency); NER must recognize instances of answer types. Resources: WordNet, Wikipedia, UIUC question classes, USC/ISI question answer typology, MSRA Sempute NeedleSeek, Freebase, Yago, and so on. Ask TA for Chinese-specific SE.

34 Question Classification. Part of question analysis, to determine the expected answer type. Approaches. Manually: Person: Who/whom/whose; The name of the person who. Distance: How far/wide/broad/narrow/tall/high; What is the distance/height/breadth of. Machine learning: see the UIUC question classifier; Li & Roth, Learning question classifiers: The role of semantic information, Journal of Natural Language Engineering, vol. 12, no. 3, pp * How to deal with unknown answer types? Ask TA for Chinese-specific SE.
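
The manual approach can be sketched as a small ordered rule table. The Person and Distance rules follow the cue phrases on the slide; the Date and Place rules, the type labels, and the `classify` function are illustrative assumptions. Questions that match no rule come back as "UNKNOWN", which is one simple way to surface the unknown-answer-type problem raised above.

```python
import re

# Hand-written rules, checked in order. Person and Distance cues come from
# the slide; Date and Place are added here as assumptions.
RULES = [
    ("DISTANCE", re.compile(r"^how (far|wide|broad|narrow|tall|high)\b", re.I)),
    ("DISTANCE", re.compile(r"^what is the (distance|height|breadth)\b", re.I)),
    ("PERSON", re.compile(r"^(who|whom|whose)\b", re.I)),
    ("PERSON", re.compile(r"^the name of the person who\b", re.I)),
    ("DATE", re.compile(r"^when\b", re.I)),
    ("PLACE", re.compile(r"^where\b", re.I)),
]


def classify(question):
    """Return the expected answer type, or "UNKNOWN" if no rule fires."""
    q = question.strip()
    for answer_type, pattern in RULES:
        if pattern.search(q):
            return answer_type
    return "UNKNOWN"
```

A machine-learned classifier would replace `RULES` with a trained model over question features; the interface (question in, answer type out) can stay the same.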

35 Query Generation. Part of question analysis, to generate a query for the search engine to retrieve candidate passages. Goal: retrieve all documents containing candidate answers and no others. Factors to consider. Drop counterproductive words: What organization => drop "what" and "organization". Keep critical words: Who is the CEO of Microsoft? => keep "CEO" and "Microsoft" (but drop "who"). Expand critical words: 北大在哪里? (Where is Beida?) => expand 北大 (Beida) to 北京大学 (Peking University). Iterate and use feedback from previous retrieval. Ask TA for Chinese-specific SE.
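
A minimal sketch of the drop / keep / expand steps. The stop list is a tiny made-up one, the expansion table holds only the slide's 北大 example, and `generate_query` is a hypothetical name. Input is assumed to be already segmented into words, which matters for Chinese.

```python
import re

# Illustrative assumptions: small word lists and a one-entry expansion table.
QUESTION_WORDS = {"who", "what", "which", "when", "where", "why", "how"}
STOP_WORDS = {"is", "are", "was", "the", "a", "an", "of", "in", "did", "do"}
EXPANSIONS = {"北大": ["北京大学"]}


def generate_query(question, answer_type_words=()):
    """Drop counterproductive words, keep the rest, expand known terms."""
    tokens = re.findall(r"\w+", question)
    drop = QUESTION_WORDS | STOP_WORDS | {w.lower() for w in answer_type_words}
    keywords = [t for t in tokens if t.lower() not in drop]
    expanded = []
    for term in keywords:
        expanded.append(term)
        expanded.extend(EXPANSIONS.get(term, []))
    return expanded
```

The iterate-with-feedback step is not shown; one option is to call `generate_query` again with a relaxed drop list when the first retrieval returns too few passages.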

36 Answer Extraction. Heuristic (*): approximate matching between the question/query and the candidate passages, using various heuristic features to compute scores (bag-of-words approach). Radev et al., Ranking suspected answers to natural language questions using predictive annotation, in ANLP 2000. Pattern-based (*): When did Mozart die? => Mozart expired in 1791. When did X die? => X expired in <Date>. When did Beethoven die? => Beethoven expired in <Date>; <Date> = 1827. Ravichandran & Hovy, Learning surface text patterns for a question answering system, in ACL 2002. Relationship-based: take advantage of relationships among words. Who wrote the Declaration of Independence? => [X, write], [write, Declaration of Independence]. Jefferson wrote the Declaration of Independence. => [Jefferson, write], [write, Declaration of Independence]. Cui et al., Unsupervised learning of soft patterns for generating definitions from online news, in WWW 2004. Logic-based: convert the question to a goal and apply theorem proving to determine whether the goal is true. Moldovan and Rus, Logic form transformation of WordNet and its applicability to question answering, in ACL 2001. Ask TA for Chinese-specific SE.
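
The pattern-based approach can be sketched with one surface pattern for "When did X die?". Treating <Date> as a four-digit year and also accepting the verb "died" are simplifying assumptions made here; `extract_death_year` is a hypothetical helper, and the passages are made up.

```python
import re


def extract_death_year(name, passages):
    """Apply the surface pattern "X expired in <Date>" to candidate passages.

    <Date> is approximated as a four-digit year (an assumption).
    """
    pattern = re.compile(re.escape(name) + r" (?:expired|died) in (\d{4})")
    for passage in passages:
        match = pattern.search(passage)
        if match:
            return match.group(1)
    return None


# Toy candidate passages for illustration.
passages = [
    "Mozart was born in 1756.",
    "Mozart expired in 1791 in Vienna.",
]
```

In Ravichandran & Hovy's setting the patterns are learned from text rather than written by hand; this sketch only shows how a learned pattern would be applied.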

37 A Simplified View of a QA System Question Question Analysis Keyword Query Search Web or Corpus Answer Type Documents or Passages Answer Extraction Search Engine Answer(S) * John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

38 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

39 Evaluation * Some of these slides are included with the permission of Tetsuya Sakai

40 QA: Community-wide Efforts. TREC QA Track. CLEF MLQA, ResPubliQA Track. NTCIR QAC, CLQA, ACLIA. Conferences: ACL, IJCNLP, NAACL, EACL, COLING, SIGIR, WSDM, WWW, CLEF, LREC, AAAI, IJCAI.

41 Information Access Evaluation Workshops/Forums. Collaboration (constructing shared data) and competition (what approaches perform best?). TREC (USA, 1992-). Cross-Language Evaluation Forum (Europe, 2000-). NII Test Collection for Information Retrieval systems (Asia, 1999-): question answering, cross-language retrieval, patent processing, opinion analysis, ... ACLIA = Advanced Cross-Lingual Information Access.

42 Information Retrieval for Question Answering (IR4QA). ACLIA = Advanced Cross-lingual Information Access (Japanese, Simplified/Traditional Chinese, English). Pipeline: question, then question classification (produces the question type), then document retrieval (produces a ranked list of documents), then answer extraction (produces the answers). IR4QA = the document retrieval task in the context of QA.

43 Constructing Test Collections via Pooling. Inputs: topics (search requests) and the target documents (several million). Participating teams submit runs, i.e. a ranked list per topic. For each topic, the ranked lists are merged into a pool of several hundred documents. Manual relevance assessment (creating the right answers) then labels each pooled document: highly relevant, relevant, ..., nonrelevant.
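
A minimal sketch of pool construction for one topic. It assumes depth-k pooling (take the top `depth` documents of every run), which is a common choice but is not stated on the slide; `build_pool` and the document ids are hypothetical.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents of every submitted run for a topic."""
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:depth])
    return pool


# Three toy runs (ranked lists of document ids) for a single topic.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d5"],
    ["d6", "d1", "d7"],
]
pool = build_pool(runs, depth=2)
```

Only documents in the pool are judged by assessors, so the pool depth directly controls assessment cost.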

44 Relevance assessment is costly and time-consuming. Timeline: organizers release topics; participants develop algorithms and systems, tune them, and submit runs; organizers then do pooling, relevance assessment, ranking of runs, and double-checking, while participants sit in idle time (e.g. 4 months) = no experiments = no progress; organizers release evaluation results; participants start working again.

45 NTCIR-8 ACLIA = IR4QA + CCLQA. ACLIA: Advanced Cross-lingual Information Access

46 ACLIA Tasks English to Japanese CLQA (with J to J as a subtask) English to Chinese CLQA (CS or CT, with C to C as a subtask) English to Japanese CLIR (embedded in E-J CLQA) English to Chinese CLIR (embedded in E-C CLQA)

47 Question & Answering Roadmap 2001 * Burger et al. Issues, Tasks and Program Structures to Roadmap Research in Question & Answering (Q&A), 2001

48 Open Advancement of QA 2008 * Challenge Set Profile * Ferrucci et al. Towards the Open Advancement of Question Answering Systems

49 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

50 IBM Watson

51 Falcon QA Architecture (diagram, including a Traditional IR Module). A QA system gives direct answers to a question instead of documents. Falcon QA system (LCC): Moldovan et al., ACL 2000; Surdeanu et al., IEEE Trans. PDS 2002. Best QA system in TREC 8 & 9. Average question answering time: TREC 8: 48 seconds; TREC 9: 94 seconds.

Falcon QA system module analysis: processing time
Module   TREC 8              TREC 9
QP       1.1%                1.2%
PR       44.4% (21.3 sec)    26.5% (24.9 sec)
PS       5.4%                2.2%
PO       0.1%                0.1%
AP       48.7% (23.4 sec)    69.7% (65.5 sec)
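
The per-module seconds on this slide follow from the average answering time and each module's share. A quick arithmetic check, using only the figures given above; the helper `module_seconds` is a hypothetical name.

```python
def module_seconds(total_seconds, share_percent):
    """Average time spent in one module, in seconds, rounded to 0.1."""
    return round(total_seconds * share_percent / 100.0, 1)


# Totals (48 s, 94 s) and shares are the numbers reported on the slide.
trec8 = {"PR": module_seconds(48, 44.4), "AP": module_seconds(48, 48.7)}
trec9 = {"PR": module_seconds(94, 26.5), "AP": module_seconds(94, 69.7)}
```

The check shows that the time spent in AP grows from about half of the total in TREC 8 to roughly 70% in TREC 9, while PR stays near 21 to 25 seconds.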

52 IBM Watson. Hardware: cluster of 90 IBM Power 750 servers + IO + network in 10 racks; 2, GHz POWER7 processor cores (8 cores per processor); 16 TB of RAM; cost about USD 3 million; content is stored in RAM. Software: Java and C++, Apache Hadoop, Apache UIMA, IBM DeepQA software, SUSE Linux Enterprise Server 11; more than 100 different techniques are used. Data: encyclopedias, dictionaries, thesauri, newswire articles, and literary works; databases, taxonomies, and ontologies (DBpedia, WordNet, and Yago); 200 million pages of structured and unstructured content on 4 TB of disk. People: led by Dr. David Ferrucci with his 46-person research, PM, annotation, system, and strategy team. References: Ferrucci, D., et al. (2010), "Building Watson: An Overview of the DeepQA Project", AI Magazine 31 (3); IBM Journal of Research and Development: This is Watson.

53 Behind IBM Watson

54 IBM Watson Videos Final Jeopardy and the Future of Watson The Science behind an Answer detailpage&v=dywo4zksfxw

61 Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

62 Question Answering The Easy Way? Community QA

63 Search vs. Question Answering (QA) User intention Understanding what users want is difficult!

64 Scalable Question Answering & Distillation Goal Create a web-scale QA repository and service Key idea Leverage existing knowledge in the QA forms Methods Extract and aggregate QA pairs in web-scale Learn user intents through analysis of QA repository Serve enriched answers instead of 10 blue links

65 Yahoo! Crawl Status 03/04/2009. [Chart: SQuAD crawled vs. Y! Answers remaining.] Total: 55,554,314; Crawled: 43,852,589; Crawled/Total: 78.94%

66 Community Question and Answering

67 Community QnA in Details Topic Context 1 Context 2

68 Online Discussion Forum topic

69 FAQ Context dependent About 28,424,184 results on Live Search using query: FAQ travel (Google: about 64,200,000)

70 Challenges Question Mining Answer Summarization Question Answering Question Generation Question Utility Question Search & Recommendation

71 List of Related Papers. Using Graded-Relevance Metrics for Evaluating Community QA Answer Selection, Sakai et al., WSDM 2011. Comparable Entity Mining from Comparative Questions, Li et al., ACL 2010. Learning to Recommend Questions Based on User Rating, Sun et al., CIKM 2009. A Structural Support Vector Method for Extracting Contexts and Answers of Questions from Online Forums, Yang et al., EMNLP 2009. Recommending Questions Using the MDL-based Tree Cut Model, Cao et al., WWW 2008. Searching Questions by Identifying Question Topic and Question Focus, Duan et al., ACL 2008. Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums, Ding et al., ACL 2008. Finding Question Answer Pairs from Online Forums, Cong et al., SIGIR 2008. Question Utility: A Novel Static Ranking of Question Search, Song et al., AAAI 2008. Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services, Liu et al., COLING 2008. Automatic Question Generation from Queries, Lin, NSF Workshop on Question Generation Shared Task and Evaluation Challenge 2008.

72 Question Mining & Answering (ACL 2008 & SIGIR 2008). Extract question and answer pairs. Community QnA (Live QnA, Yahoo! Answers, Baidu Zhidao): create a resolved question list; extract and index the question, the best answer, and other answers. Forum: extract and index threads and postings; find questions and their answers.

73 QA Pairs in Online Forums

74 Question Search & Recommendation (ACL 2008 & WWW 2008). Query: We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs. Question search: Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter? Question recommendation: Would you get a lower price by not needing a guide for the Forbidden City and etc? Can anybody recommend a budget hotel near Forbidden City? Question = Topic + Focus + Others (TFO). Search: same topic, similar foci. Recommend: same topic, different foci.

75 Identifying Topic and Focus. Category tree: Answers > Asia Pacific > China, Japan; Answers > Europe. China: 1. Anyone know where to see the Dragon Boat Festival in Beijing? 2. Where is a good (Less expensive) place to shop in Beijing? 3. What's the cheapest way to get from Beijing to Hong Kong? Europe: 1. How far is it from Berlin to Hamburg? 2. What is the cheapest way from Berlin to Hamburg? 3. Where to see between Hamburg and Berlin? 4. How long does it take from Hamburg to Berlin? Specificity: the inverse of the entropy of the topic term's distribution over the sub-categories. Order topic terms by their specificity.
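
A minimal sketch of the specificity measure: the inverse of the entropy of a term's distribution over sub-categories. The small `epsilon` added to the entropy is an assumption made here to avoid division by zero when a term occurs in only one category; the counts are read off the example questions above, and `specificity` is a hypothetical name.

```python
import math


def specificity(category_counts, epsilon=1e-3):
    """Inverse entropy of a term's distribution over sub-categories.

    `epsilon` is a smoothing constant assumed for this sketch.
    """
    total = sum(category_counts.values())
    entropy = 0.0
    for count in category_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 / (entropy + epsilon)


# How often each term occurs in the China and Europe example questions.
term_counts = {
    "Beijing": {"China": 3},
    "Berlin": {"Europe": 4},
    "cheapest": {"China": 1, "Europe": 1},
}
# Order terms by specificity, most specific first; ties broken by name.
ordered = sorted(term_counts, key=lambda t: (-specificity(term_counts[t]), t))
```

Terms concentrated in one sub-category ("Beijing", "Berlin") get high specificity and are likely topic terms; terms spread evenly across categories ("cheapest") get low specificity and are more likely to belong to the focus.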

76 Question Utility (AAAI 2008). Motivation: How useful is a question? How should we rank questions without queries? Definition: How likely is a question to be asked again? Ranking for a query $Q'$: $\arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} \frac{p(Q)\, p(Q' \mid Q)}{p(Q')} = \arg\max_{Q} p(Q)\, p(Q' \mid Q)$, with $p(Q' \mid Q) = \prod_{w \in Q'} p(w \mid Q)$. $p(Q)$: the prior probability of question $Q$, reflecting a static rank of the question, i.e. question utility. $p(Q' \mid Q)$: the probability of generating query $Q'$ from question $Q$ (relevance score).
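
A minimal sketch of ranking questions by prior (utility) times relevance, computed in log space. The unigram model with add-alpha smoothing, the toy questions, and the prior values are all assumptions for illustration, not the estimates used in the AAAI 2008 paper; `relevance` and `rank` are hypothetical names.

```python
import math
from collections import Counter


def relevance(query_words, question_words, vocab_size, alpha=1.0):
    """log p(Q'|Q): unigram model with add-alpha smoothing (an assumption)."""
    counts = Counter(question_words)
    total = len(question_words)
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in query_words
    )


def rank(query, questions, priors):
    """Order questions by log p(Q) + log p(Q'|Q), best first."""
    vocab = {w for q in questions for w in q.split()} | set(query.split())
    scored = []
    for q in questions:
        score = math.log(priors[q]) + relevance(query.split(), q.split(), len(vocab))
        scored.append((score, q))
    return [q for _, q in sorted(scored, reverse=True)]


# Toy questions with made-up utility priors p(Q).
questions = ["where to stay in paris", "where to eat in paris", "how far is berlin"]
priors = dict(zip(questions, [0.3, 0.5, 0.2]))
```

With an empty query the relevance term vanishes and the ranking falls back to the static utility prior alone, which is the "rank questions without queries" case.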

77 Answer Summarization (COLING 2008) Example: Where to stay in Paris? 2,645 answers (Yahoo! Answers 03/04/09) Is the best answer the best answer? Question clustering Find similar questions Answer summarization Aggregate answers for a question cluster Answer Taxonomy Question Taxonomy

78 Mixed Mode Question Answering: Knowledge Distillation & Dissemination. Mixed mode scalable question answering and distillation across sources: FAQ (highly structured QnA), QnA (structured QnA), Forum (semi-structured QnA), Web (unstructured QnA).

79 Q&A = Knowledge = Power. Q&A is a complement to web keyword search. Q&A can enhance existing QnA and search services. Leverage existing knowledge in question and answer forms and their authors. Acquire or elicit human knowledge automatically.

80 Discussion


More information

Interactive Chinese Question Answering System in Medicine Diagnosis

Interactive Chinese Question Answering System in Medicine Diagnosis Interactive Chinese ing System in Medicine Diagnosis Xipeng Qiu School of Computer Science Fudan University xpqiu@fudan.edu.cn Jiatuo Xu Shanghai University of Traditional Chinese Medicine xjt@fudan.edu.cn

More information

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

Anotaciones semánticas: unidades de busqueda del futuro?

Anotaciones semánticas: unidades de busqueda del futuro? Anotaciones semánticas: unidades de busqueda del futuro? Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding

More information

Shallow Parsing with Apache UIMA

Shallow Parsing with Apache UIMA Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded

More information

TREC 2003 Question Answering Track at CAS-ICT

TREC 2003 Question Answering Track at CAS-ICT TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China changyi@software.ict.ac.cn http://www.ict.ac.cn/

More information

Andre Standback. IT 103, Sec. 001 2/21/12. IBM s Watson. GMU Honor Code on http://academicintegrity.gmu.edu/honorcode/. I am fully aware of the

Andre Standback. IT 103, Sec. 001 2/21/12. IBM s Watson. GMU Honor Code on http://academicintegrity.gmu.edu/honorcode/. I am fully aware of the Andre Standback IT 103, Sec. 001 2/21/12 IBM s Watson "By placing this statement on my webpage, I certify that I have read and understand the GMU Honor Code on http://academicintegrity.gmu.edu/honorcode/.

More information

Fast Data in the Era of Big Data: Twitter s Real-

Fast Data in the Era of Big Data: Twitter s Real- Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

More information

Analysis of Web Archives. Vinay Goel Senior Data Engineer

Analysis of Web Archives. Vinay Goel Senior Data Engineer Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

Incorporating Participant Reputation in Community-driven Question Answering Systems

Incorporating Participant Reputation in Community-driven Question Answering Systems Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem,

More information

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure

Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,

More information

SINAI at WEPS-3: Online Reputation Management

SINAI at WEPS-3: Online Reputation Management SINAI at WEPS-3: Online Reputation Management M.A. García-Cumbreras, M. García-Vega F. Martínez-Santiago and J.M. Peréa-Ortega University of Jaén. Departamento de Informática Grupo Sistemas Inteligentes

More information

Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

Text Mining and Analysis

Text Mining and Analysis Text Mining and Analysis Practical Methods, Examples, and Case Studies Using SAS Goutam Chakraborty, Murali Pagolu, Satish Garla From Text Mining and Analysis. Full book available for purchase here. Contents

More information

The University of Lisbon at CLEF 2006 Ad-Hoc Task

The University of Lisbon at CLEF 2006 Ad-Hoc Task The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports

More information

An Overview of Computational Advertising

An Overview of Computational Advertising An Overview of Computational Advertising Evgeniy Gabrilovich in collaboration with many colleagues throughout the company 1 What is Computational Advertising? New scientific sub-discipline that provides

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 Automatic Knowledge Base Construction Systems Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 1 Text Contains Knowledge 2 Text Contains Automatically Extractable Knowledge 3

More information

Special Topics in Computer Science

Special Topics in Computer Science Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

More information

An Integrated Knowledge-based and Machine Learning Approach for Chinese Question Classification

An Integrated Knowledge-based and Machine Learning Approach for Chinese Question Classification An Integrated Knowledge-based and Machine Learning Approach for Chinese Question Classification Min-Yuh DAY 1,2, Cheng-Wei LEE 1, Shih-Hung WU 3, Chorng-Shyong ONG 2, Wen-Lian HSU 1 1 Institute of Information

More information

Subordinating to the Majority: Factoid Question Answering over CQA Sites

Subordinating to the Majority: Factoid Question Answering over CQA Sites Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

More information

Putting IBM Watson to Work In Healthcare

Putting IBM Watson to Work In Healthcare Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research marty.kohn@us.ibm.com Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Mining Opinion Features in Customer Reviews

Mining Opinion Features in Customer Reviews Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu

More information

Development of Framework System for Managing the Big Data from Scientific and Technological Text Archives

Development of Framework System for Managing the Big Data from Scientific and Technological Text Archives Development of Framework System for Managing the Big Data from Scientific and Technological Text Archives Mi-Nyeong Hwang 1, Myunggwon Hwang 1, Ha-Neul Yeom 1,4, Kwang-Young Kim 2, Su-Mi Shin 3, Taehong

More information

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...

More information

A Survey on Product Aspect Ranking Techniques

A Survey on Product Aspect Ranking Techniques A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com

More information

Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Cross-Lingual Concern Analysis from Multilingual Weblog Articles Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/

More information

A Comparative Study on Sentiment Classification and Ranking on Product Reviews

A Comparative Study on Sentiment Classification and Ranking on Product Reviews A Comparative Study on Sentiment Classification and Ranking on Product Reviews C.EMELDA Research Scholar, PG and Research Department of Computer Science, Nehru Memorial College, Putthanampatti, Bharathidasan

More information

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009 Computational Advertising Andrei Broder Yahoo! Research SCECR, May 30, 2009 Disclaimers This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc or any other

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

TREC 2007 ciqa Task: University of Maryland

TREC 2007 ciqa Task: University of Maryland TREC 2007 ciqa Task: University of Maryland Nitin Madnani, Jimmy Lin, and Bonnie Dorr University of Maryland College Park, Maryland, USA nmadnani,jimmylin,bonnie@umiacs.umd.edu 1 The ciqa Task Information

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation

Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Semantic Relatedness Metric for Wikipedia Concepts Based on Link Analysis and its Application to Word Sense Disambiguation Denis Turdakov, Pavel Velikhov ISP RAS turdakov@ispras.ru, pvelikhov@yahoo.com

More information

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services

On the Feasibility of Answer Suggestion for Advice-seeking Community Questions about Government Services 21st International Congress on Modelling and Simulation, Gold Coast, Australia, 29 Nov to 4 Dec 2015 www.mssanz.org.au/modsim2015 On the Feasibility of Answer Suggestion for Advice-seeking Community Questions

More information

A QoS-Aware Web Service Selection Based on Clustering

A QoS-Aware Web Service Selection Based on Clustering International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

SEO Techniques for various Applications - A Comparative Analyses and Evaluation

SEO Techniques for various Applications - A Comparative Analyses and Evaluation IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727 PP 20-24 www.iosrjournals.org SEO Techniques for various Applications - A Comparative Analyses and Evaluation Sandhya

More information

An Ontology Framework based on Web Usage Mining

An Ontology Framework based on Web Usage Mining An Ontology Framework based on Web Usage Mining Ahmed Sultan Al-Hegami Sana'a University Yemen Sana'a Mohammed Salem Kaity Al-andalus University Yemen Sana'a ABSTRACT Finding relevant information on the

More information

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

More information

A Comparative Approach to Search Engine Ranking Strategies

A Comparative Approach to Search Engine Ranking Strategies 26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab

More information

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1

More information

Web Database Integration

Web Database Integration Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

More information

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告 SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:

More information

Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering

Comparing IPL2 and Yahoo! Answers: A Case Study of Digital Reference and Community Based Question Answering Comparing and : A Case Study of Digital Reference and Community Based Answering Dan Wu 1 and Daqing He 1 School of Information Management, Wuhan University School of Information Sciences, University of

More information

Frontera: open source, large scale web crawling framework. Alexander Sibiryakov, October 1, 2015 sibiryakov@scrapinghub.com

Frontera: open source, large scale web crawling framework. Alexander Sibiryakov, October 1, 2015 sibiryakov@scrapinghub.com Frontera: open source, large scale web crawling framework Alexander Sibiryakov, October 1, 2015 sibiryakov@scrapinghub.com Sziasztok résztvevők! Born in Yekaterinburg, RU 5 years at Yandex, search quality

More information

A Framework of User-Driven Data Analytics in the Cloud for Course Management

A Framework of User-Driven Data Analytics in the Cloud for Course Management A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer

More information

IBM Watson Ecosystem. Getting Started Guide

IBM Watson Ecosystem. Getting Started Guide IBM Watson Ecosystem Getting Started Guide Version 1.1 July 2014 1 Table of Contents: I. Prefix Overview II. Getting Started A. Prerequisite Learning III. Watson Experience Manager A. Assign User Roles

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Web Information Mining and Decision Support Platform for the Modern Service Industry

Web Information Mining and Decision Support Platform for the Modern Service Industry Web Information Mining and Decision Support Platform for the Modern Service Industry Binyang Li 1,2, Lanjun Zhou 2,3, Zhongyu Wei 2,3, Kam-fai Wong 2,3,4, Ruifeng Xu 5, Yunqing Xia 6 1 Dept. of Information

More information