Question Answering. Chin-Yew Lin Senior Researcher Knowledge Mining Group Microsoft Research Asia


Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

Taylor's 4 Levels of Information Need
Visceral (本能的): actual but unexpressed need
Conscious (自觉的): formulated description of the need within the brain
Formalised (形式化的): expressed statement of the need
Compromised (折衷的): question as presented to the information system
Support question refinement and reformulation of the compromised (折衷的) need.
* Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194.
* Adapted from Tetsuya Sakai's presentation on a related topic (http://research.microsoft.com/en-us/people/tesakai/).

What is Question Answering? A natural way to: request information that we do not know; check information that we are not sure about; interact with people around us; fulfill our visceral need. A research field that: develops automatic systems to answer questions; involves IR, NLP, and AI. A better way to find what we want than search?

Critics Powerset has a lot of skeptics (BW 09/17/07): Search expert Danny Sullivan, editor-in-chief of the online news site Search Engine Land, noted that no claims of the superiority of natural-language search have ever held up. And he disputed the idea that most people would rather ask questions than simply type in a few words, noting Google didn't train people to query that way but simply responded to the way users were already conducting searches. "Linguistics will not solve most search problems," adds Apostolos Gerasoulis, executive vice-president of search technology at Ask.com, the search engine unit of IAC/InterActiveCorp (IACI).

Open-Domain Question Answering (ODQA) Question Answering Domain specific Domain independent Structured data Free text Web Fixed set of collections Single document

Sample Questions (from TREC-8 and TREC-9) 9: How far is Yaroslavl from Moscow? 15: When was London's Docklands Light Railway constructed? 22: When did the Jurassic Period end? 29: What is the brightest star visible from Earth? 30: What are the Valdez Principles? 73: Where is the Taj Mahal?

Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

Apple's Knowledge Navigator (1987)

Things to Think About. What types of technologies are necessary to build an Apple Knowledge Navigator? Is QA part of the puzzle? Do you think that we already have all the pieces? If you were to design a modern Apple Knowledge Navigator, what would it be like? Do you want one?

What Have We Learned?
NLP & Summarization
Lexical analysis (词法分析): segmentation (分词), tokenization (符号化), POS tagging (词性标注)
Syntactic analysis (句法分析): parsing
Semantic analysis (语义分析): WSD (词义消歧), reference resolution (指代消解)
Discourse analysis (话语分析)
Challenges: ambiguity (歧义性), variations (多样性)
Search Engine Overview
Search engine architecture
Crawler
Web page parser
Index builder: inverted index, signature file, suffix tree
Web graph builder
Link analysis: PageRank, HITS
Query analysis
Indexing & ranking: Relevance(Q, D), IR models, top-K query & index pruning
Caching: 80% of queries are cached
User interface

[Figure, built up over five slides: a layered search stack with the layers Data, Acquisition, Index, Serve, and Apps. Data: web text, divided into unstructured, semi-structured, and structured data. Acquisition: unstructured, semi-structured, and structured data ingesters. Index: term-based and entity-based (KGraph). Apps: web search, news search, MM search, QA, and entity-based search.]

Architecture of a Typical Search Engine [Figure. Online part: user interface (query), caching, indexing and ranking. Offline part: crawler, web page parser, index builder, web graph builder, link analysis. Data stores: inverted index, page ranks, cached pages, page & site statistics, pages, links & anchors, link map, web graph. Source: the Web.] * Ji-Rong Wen, Search Engine Overview

Architecture of a Typical QA Engine [Figure. Online part: user interface (question), question analysis, caching, indexing and ranking, answer reranking. Offline part: web crawler, web page parser, annotation, index builder, web graph builder, link analysis. Data stores: inverted index, page ranks, cached pages, page & site statistics, pages, links & anchors, link map, web graph, KB, ontology. Source: the Web.] * Adapted from Ji-Rong Wen, Search Engine Overview

Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

QA Terminology Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

Terminology: Question Phrase. The part of the question that says what is being sought. Wh-words: who, what, which, when, where, why, and how. Wh-words + nouns, adjectives, or adverbs: what company, which president, how long, how fast.

Terminology: Question Type
Question category for distinguishing different processing strategies
FACTOID: How far is it from Earth to Mars?
LIST: List the names of chewing gums.
DEFINITION: Who is Vlad the Impaler?
RELATIONSHIP: What is the connection between Valentina Tereshkova and Sally Ride?
SUPERLATIVE: What is the largest city on Earth?
YES-NO: Is Osama bin Laden alive?
OPINION: What do most Americans think of gun control?
CAUSE & EFFECT: Why did Iraq invade Kuwait?

Terminology: Answer Type. The class of object sought by the question: Person (from "Who"), Place (from "Where"), Date (from "When"), Number (from "How many"), Explanation (from "Why"), Method (from "How"). See the UIUC question classification for more: http://cogcomp.cs.illinois.edu/data/qa/qc/ See the USC/ISI Question Answer Typology for more: http://www.isi.edu/naturallanguage/projects/webclopedia/taxonomy/taxonomy_toplevel.html Hermjakob, U., Parsing and Question Classification for Question Answering, Workshop on Open-domain QA at ACL-2001 (http://www.isi.edu/~ulf/papers/acl01-qa-parsing.pdf)

Terminology: Question Focus & Topic. The question focus is the property or entity that is being sought by the question, as in: McCarren Airport is located in what city? What is the population of Japan? What color is yak milk? The focus gives a hint about the answer type. The question topic is the object or event that the question is generally about, as in: What is the height of Mt. Everest? Where on the body is a mortarboard worn? Answer passages are most likely to contain the question topic.

Terminology: Candidate Passage & Answer. A candidate passage is a text passage retrieved by a search engine given a question. A candidate answer is a small piece of text ranked according to its likelihood of being an answer to a question. Examples of candidate answers: "50"; "Queen Elizabeth II"; "September 8, 2003"; "by baking a mixture of flour and water".

Terminology: Authority List. A collection of instances of an answer type of interest, used to test a term for class membership. Examples: days of the week (Sun, Mon, Tue, ...), planets, elements, states/provinces/counties/countries, animals, plants, colors, people, organizations. Existing databases or lists: movies (IMDB), books (Amazon), Freebase, Sempute NeedleSeek (http://needleseek.msra.cn).
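
The membership test described on this slide can be sketched as a set lookup. This is only an illustration: the list names (`DAY_OF_WEEK`, `PLANET`, `COLOR`), their contents, and the function `answer_types_of` are hypothetical, and a real system would load its lists from resources such as the databases named above.

```python
from typing import Dict, List, Set

# Hypothetical authority lists (illustrative contents, not from the slides).
AUTHORITY_LISTS: Dict[str, Set[str]] = {
    "DAY_OF_WEEK": {"sun", "mon", "tue", "wed", "thu", "fri", "sat"},
    "PLANET": {"mercury", "venus", "earth", "mars",
               "jupiter", "saturn", "uranus", "neptune"},
    "COLOR": {"red", "green", "blue", "white", "black", "yellow"},
}


def answer_types_of(term: str) -> List[str]:
    """Return every answer type whose authority list contains the term."""
    key = term.strip().lower()
    return sorted(name for name, members in AUTHORITY_LISTS.items()
                  if key in members)
```

A term found in no list returns an empty result, which a QA system could treat as "not an instance of any known answer type".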

Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

Inside a QA System Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

A Simplified View of a QA System [Figure: Question -> Question Analysis -> Keyword Query and Answer Type. Keyword Query -> Search Engine (over the Web or a corpus) -> Documents or Passages. Documents or Passages + Answer Type -> Answer Extraction -> Answer(s).] * John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)

Search Engine. Review Prof. Xiaojun Wan's lecture. Build your own search engine. The easy way: Lucene (http://lucene.apache.org/), Indri (http://www.lemurproject.org/indri/). Ask TA for Chinese-specific SE.

Named Entity Tagger. Read Daniel Bikel's paper and build your own named entity tagger: Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning Journal, Special Issue on Natural Language Learning. The easy way: Stanford Named Entity Recognizer (NER) (http://nlp.stanford.edu/software/crf-ner.shtml), Apache OpenNLP (http://incubator.apache.org/opennlp/documentation.html).

Answer Type Inventory. Decide on a set of answer types that covers the majority of questions in a chosen domain. Hint: use a query log; check Dr. Wei Wu's lecture. Factors to consider: question analysis must predict the answer types (Mountain: "What mountain/peak ..."; Organization: "What organization/company/group/agency ..."); NER must recognize instances of the answer types. Resources: WordNet (http://wordnet.princeton.edu/), Wikipedia, UIUC question classes, USC/ISI question answer typology, MSRA Sempute NeedleSeek, Freebase, Yago, and so on.

Question Classification. Part of question analysis; determines the expected answer type. Approaches: (1) Manual rules. Person: "Who/whom/whose ..."; "The name of the person who ...". Distance: "How far/wide/broad/narrow/tall/high ..."; "What is the distance/height/breadth of ...". (2) Machine learning. See the UIUC question classifier (http://cogcomp.cs.illinois.edu/data/qa/qc/); Li & Roth, Learning question classifiers: The role of semantic information, Journal of Natural Language Engineering, vol. 12, no. 3, pp. 229-249. Open question: how to deal with unknown answer types?
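
The manual approach can be sketched with a few regular expressions. This is a minimal sketch, not the UIUC classifier: the Person and Distance rules follow the cue phrases on the slide, while the Date, Place, and Number rules, the label names, and the `UNKNOWN` fallback are assumptions added for illustration.

```python
import re

# Ordered (label, pattern) rules; the first matching rule wins.
RULES = [
    ("PERSON", re.compile(
        r"^(who|whom|whose)\b|\bthe name of the person who\b", re.I)),
    ("DISTANCE", re.compile(
        r"^how (far|wide|broad|narrow|tall|high)\b"
        r"|\bwhat is the (distance|height|breadth) of\b", re.I)),
    ("NUMBER", re.compile(r"^how many\b", re.I)),  # assumed rule
    ("DATE", re.compile(r"^when\b", re.I)),        # assumed rule
    ("PLACE", re.compile(r"^where\b", re.I)),      # assumed rule
]


def classify(question: str) -> str:
    """Return the expected answer type, or UNKNOWN if no rule fires."""
    text = question.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "UNKNOWN"
```

Questions that match no rule fall through to `UNKNOWN`, which makes the slide's open question about unknown answer types concrete: the system still has to decide what to do with them.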

Query Generation. Part of question analysis; generates a query for the search engine to retrieve candidate passages. Goal: retrieve all documents containing candidate answers and no others. Factors to consider: (1) Drop counterproductive words: "What organization ..." => drop "what" and "organization". (2) Keep critical words: "Who is the CEO of Microsoft?" => keep "CEO" and "Microsoft" (but drop "who"). (3) Expand critical words: 北大在哪里? ("Where is Beida?") => expand 北大 to 北京大学 (Peking University). (4) Iterate and use feedback from previous retrievals.
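
The drop, keep, and expand steps can be sketched as follows. Everything here is a simplifying assumption: the word lists are tiny and hand-picked, Latin-script tokens are found with a regular expression, and expansions are matched as substrings because this sketch does no Chinese word segmentation.

```python
import re

# Illustrative word lists (assumptions, not from the slides).
QUESTION_WORDS = {"who", "what", "which", "when", "where", "why", "how"}
STOP_WORDS = {"is", "are", "was", "the", "a", "an", "of", "in", "did", "do"}
ANSWER_TYPE_NOUNS = {"organization", "company", "city", "person"}
DROP = QUESTION_WORDS | STOP_WORDS | ANSWER_TYPE_NOUNS

# Hypothetical expansion table: short form -> full forms.
EXPANSIONS = {"北大": ["北京大学"]}


def generate_query(question: str) -> list:
    """Drop counterproductive words, keep the rest, add expansions."""
    keywords = [tok for tok in re.findall(r"[A-Za-z0-9]+", question)
                if tok.lower() not in DROP]
    for short, full_forms in EXPANSIONS.items():
        if short in question:
            keywords.append(short)
            keywords.extend(full_forms)
    return keywords
```

For example, `generate_query("Who is the CEO of Microsoft?")` keeps only `CEO` and `Microsoft`. The iterate-with-feedback step is not modeled here.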

Answer Extraction
Heuristic (*): approximate matching between the question/query and the candidate passages, using various heuristic features to compute scores (bag-of-words approach). Radev et al., Ranking suspected answers to natural language questions using predictive annotation, in ANLP 2000.
Pattern-based (*): When did Mozart die? => "Mozart expired in 1791". When did X die? => "X expired in <Date>". When did Beethoven die? => "Beethoven expired in <Date>"; <Date> = 1827. Ravichandran & Hovy, Learning surface text patterns for a question answering system, in ACL 2002.
Relationship-based: take advantage of relationships among words. Who wrote the Declaration of Independence? => [X, write], [write, Declaration of Independence]. "Jefferson wrote the Declaration of Independence." => [Jefferson, write], [write, Declaration of Independence]. Cui et al., Unsupervised learning of soft patterns for generating definitions from online news, in WWW 2004.
Logic-based: convert the question to a goal and apply theorem proving to determine whether the goal is true. Moldovan and Rus, Logic form transformation of WordNet and its applicability to question answering, in ACL 2001.
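
The pattern-based idea can be sketched with hand-written surface patterns for the question template "When did X die?". Ravichandran & Hovy learn such patterns automatically, whereas the three patterns and the function `extract_death_year` below are hypothetical and written by hand purely for illustration.

```python
import re

# Hand-written surface patterns; {x} is replaced by the question's subject.
DEATH_PATTERNS = [
    r"{x} expired in (?P<date>\d{{4}})",
    r"{x} died in (?P<date>\d{{4}})",
    r"{x} \(\d{{4}}\s*-\s*(?P<date>\d{{4}})\)",
]


def extract_death_year(question, passages):
    """Return the first year matched by a pattern, or None."""
    match = re.match(r"When did (?P<x>.+?) die\?$", question.strip(), re.I)
    if match is None:
        return None  # the question does not fit the template
    subject = re.escape(match.group("x"))
    for passage in passages:
        for template in DEATH_PATTERNS:
            hit = re.search(template.format(x=subject), passage)
            if hit:
                return hit.group("date")
    return None
```

The sketch returns the first match. A fuller system would collect every match across passages and rank the candidate answers, for example by how reliable each pattern has proven.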


Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

Evaluation * Some of these slides are included with the permission of Tetsuya Sakai

QA: Community-wide Efforts. TREC QA Track (1999-2007): http://trec.nist.gov/data/qamain.html CLEF MLQA, ResPubliQA Track (2003-2011): http://www.clef-campaign.org/ NTCIR QAC, CLQA, ACLIA (2002-2011): http://research.nii.ac.jp/ntcir/index-en.html Conferences: ACL, IJCNLP, NAACL, EACL, COLING, SIGIR, WSDM, WWW, CLEF, LREC, AAAI, IJCAI

Information Access Evaluation Workshops/Forums: collaboration (constructing shared data) and competition (what approaches perform best?). TREC (USA, 1992-). CLEF: Cross-Language Evaluation Forum (Europe, 2000-). NTCIR: NII Test Collection for Information Retrieval systems (Asia, 1999-). Tasks include question answering, cross-language retrieval, patent processing, opinion analysis, and others. ACLIA = Advanced Cross-Lingual Information Access.

Information Retrieval for Question Answering (IR4QA) Task @ ACLIA. ACLIA = Advanced Cross-lingual Information Access (Japanese, Simplified/Traditional Chinese, English). [Figure: pipeline from question to answers with three stages: question classification (producing the question type), document retrieval (producing a ranked list of documents), and answer extraction (producing the answers).] IR4QA = the document retrieval task in the context of QA.

Constructing Test Collections via Pooling. Participating teams submit runs: for each topic (search request; 50-100 topics in total), a run is a ranked list of documents retrieved from the target documents (several million). For each topic, the ranked lists from all runs are merged into a pool (several hundred documents per topic). Assessors then judge each pooled document manually (creating the right answers): highly relevant, relevant, ..., nonrelevant.
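
Pool construction for one topic can be sketched as the union of the top-ranked documents from every submitted run. The function name `build_pool`, the `depth` parameter, and the fixed-depth cutoff are assumptions, since the slide does not say how many documents each run contributes.

```python
def build_pool(runs, depth):
    """Merge the top `depth` documents of each run into one pool.

    runs: dict mapping a run id to its ranked list of document ids
          for a single topic (best first).
    """
    pool = set()
    for ranked_list in runs.values():
        pool.update(ranked_list[:depth])
    return sorted(pool)  # sorted only to give assessors a stable order
```

Only pooled documents are judged, so a document that no run ranked within `depth` is normally treated as nonrelevant.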

Relevance assessment is costly and time-consuming. [Figure: timeline. Organizers: release topics; then pooling, relevance assessment, ranking runs, and double-checking; then release evaluation results. Participants: develop and tune algorithms and systems; submit runs; idle time (e.g. 4 months) = no experiments = no progress; start working again.]

NTCIR-8 ACLIA = IR4QA + CCLQA ACLIA: Advanced Cross-lingual Information Access * http://aclia.lti.cs.cmu.edu/ntcir8/

ACLIA Tasks English to Japanese CLQA (with J to J as a subtask) English to Chinese CLQA (CS or CT, with C to C as a subtask) English to Japanese CLIR (embedded in E-J CLQA) English to Chinese CLIR (embedded in E-C CLQA)

Question & Answering Roadmap 2001 * Burger et al. Issues, Tasks and Program Structures to Roadmap Research in Question & Answering (Q&A), 2001

Open Advancement of QA 2008 * Challenge Set Profile * Ferrucci et al. Towards the Open Advancement of Question Answering Systems

Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

IBM Watson

Falcon QA Architecture (figure; labeled component: Traditional IR Module)
A QA system gives direct answers to a question instead of documents.
Falcon QA system (LCC): Moldovan et al., ACL 2000; Surdeanu et al., IEEE Trans. PDS 2002. Best QA system in TREC 8 & 9.
Average question answering time: TREC 8: 48 seconds; TREC 9: 94 seconds.
Falcon QA system module analysis: share of processing time
Module   TREC 8              TREC 9
QP       1.1%                1.2%
PR       44.4% (21.3 sec)    26.5% (24.9 sec)
PS       5.4%                2.2%
PO       0.1%                0.1%
AP       48.7% (23.4 sec)    69.7% (65.5 sec)

IBM Watson
Hardware: cluster of 90 IBM Power 750 servers + IO + network in 10 racks; 2,880 3.5 GHz POWER7 processor cores (8 cores per processor); 16 TB of RAM; cost about USD 3 million; content is stored in RAM.
Software: Java and C++, Apache Hadoop, Apache UIMA, IBM DeepQA software, SUSE Linux Enterprise Server 11; more than 100 different techniques are used.
Data: encyclopedias, dictionaries, thesauri, newswire articles, and literary works; databases, taxonomies, and ontologies (DBPedia, WordNet, and Yago); 200 million pages of structured and unstructured content on 4 TB of disk.
People: led by Dr. David Ferrucci, with a 46-person team covering research, PM, annotation, systems, and strategy.
References: Ferrucci, D., et al. (2010), "Building Watson: An Overview of the DeepQA Project", AI Magazine 31(3), http://www.stanford.edu/class/cs124/aimagzine-deepqa.pdf; IBM Journal of Research and Development: "This is Watson", http://ieeexplore.ieee.org/xpl/tocresult.jsp?reload=true&isnumber=6177717

Behind IBM Watson

IBM Watson Videos. Final Jeopardy and the Future of Watson: http://www.youtube.com/watch?v=li-m7o_brng The Science behind an Answer: http://www.youtube.com/watch?feature=player_detailpage&v=dywo4zksfxw


Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)

Question Answering The Easy Way? Community QA

Search vs. Question Answering (QA) User intention Understanding what users want is difficult!

Scalable Question Answering & Distillation. Goal: create a web-scale QA repository and service. Key idea: leverage existing knowledge in question-and-answer form. Methods: extract and aggregate QA pairs at web scale; learn user intents through analysis of the QA repository; serve enriched answers instead of 10 blue links.

Yahoo! Answers Crawl Status, 03/04/2009 (SQuAD). [Chart: crawled vs. remaining Y! Answers questions.] Total: 55,554,314; Crawled: 43,852,589; Crawled/Total: 78.94%

Community Question and Answering

Community QnA in Details Topic Context 1 Context 2

Online Discussion Forum topic

FAQ Context dependent About 28,424,184 results on Live Search using query: FAQ travel (Google: about 64,200,000)

Challenges Question Mining Answer Summarization Question Answering Question Generation Question Utility Question Search & Recommendation

List of Related Papers
Using Graded-Relevance Metrics for Evaluating Community QA Answer Selection. Sakai et al.; WSDM 2011
Comparable Entity Mining from Comparative Questions. Li et al.; ACL 2010
Learning to Recommend Questions Based on User Rating. Sun et al.; CIKM 2009
A Structural Support Vector Method for Extracting Contexts and Answers of Questions from Online Forums. Yang et al.; EMNLP 2009
Recommending Questions Using the MDL-based Tree Cut Model. Cao et al.; WWW 2008
Searching Questions by Identifying Question Topic and Question Focus. Duan et al.; ACL 2008
Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums. Ding et al.; ACL 2008
Finding Question Answer Pairs from Online Forums. Cong et al.; SIGIR 2008
Question Utility: A Novel Static Ranking of Question Search. Song et al.; AAAI 2008
Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services. Liu et al.; COLING 2008
Automatic Question Generation from Queries. Lin; NSF Workshop on Question Generation Shared Task and Evaluation Challenge 2008

Question Mining & Answering (ACL 2008 & SIGIR 2008). Extract question and answer pairs. Community QnA (Live QnA, Yahoo! Answers, Baidu Zhidao): create a resolved question list; extract and index the question, best answer, and other answers. Forums: extract and index threads and postings; find questions and their answers.

QA Pairs in Online Forums

Question Search & Recommendation (ACL 2008 & WWW 2008) Query We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs. Question search Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter? Question recommendation Would you get a lower price by not needing a guide for the Forbidden City and etc? Can anybody recommend a budget hotel near Forbidden City? Question = Topic + Focus + Others (TFO) Search: same topic similar foci Recommend: same topic different foci

Identifying Topic and Focus
[Figure: category tree. Travel @Yahoo! Answers > Asia Pacific > China, Japan; Travel @Yahoo! Answers > Europe.]
China: 1. Anyone know where to see the Dragon Boat Festival in Beijing? 2. Where is a good (Less expensive) place to shop in Beijing? 3. What's the cheapest way to get from Beijing to Hong Kong?
Europe: 1. How far is it from Berlin to Hamburg? 2. What is the cheapest way from Berlin to Hamburg? 3. Where to see between Hamburg and Berlin? 4. How long does it take from Hamburg to Berlin?
Specificity: the inverse of the entropy of the topic term's distribution over the sub-categories. Order topic terms by their specificity.
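
The specificity definition above can be sketched directly from category counts. The function name, the input format (how often a term occurs in each sub-category), and the small constant `eps`, which keeps the value finite when a term occurs in only one sub-category, are assumptions of this sketch.

```python
import math


def specificity(category_counts, eps=0.001):
    """Inverse entropy of a term's distribution over sub-categories.

    category_counts: dict mapping sub-category -> number of questions
                     in that sub-category that contain the term.
    """
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("term does not occur in any sub-category")
    entropy = -sum((n / total) * math.log(n / total)
                   for n in category_counts.values() if n > 0)
    return 1.0 / (entropy + eps)
```

A term concentrated in one sub-category (say, "Beijing" under China) has entropy near zero and high specificity, while a term spread evenly across sub-categories scores low. Sorting terms by this value gives the ordering the slide describes.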

Question Utility (AAAI 2008)
Motivation: How useful is a question? How should we rank questions without queries?
Definition: How likely is it that a question would be asked again?
Given a query $Q'$, rank questions $Q$ by
$$\arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} \frac{p(Q)\, p(Q' \mid Q)}{p(Q')}, \qquad p(Q' \mid Q) = \prod_{w \in Q'} p(w \mid Q)$$
$p(Q)$: the prior probability of question $Q$, reflecting a static rank of the question, i.e. question utility.
$p(Q' \mid Q)$: the probability of generating query $Q'$ from question $Q$ (relevance score).
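
Ranking by utility prior times relevance can be sketched in log space. This is a sketch under stated assumptions, not the method of the AAAI 2008 paper: the priors are supplied by the caller, the relevance term is a unigram model with additive smoothing (`alpha`), tokens come from whitespace splitting, and the name `rank_questions` is hypothetical.

```python
import math
from collections import Counter


def rank_questions(query, questions, alpha=1.0):
    """Rank questions by log p(Q) + sum over query words of log p(w|Q).

    questions: list of (text, prior) pairs, each prior in (0, 1].
    Returns the question texts, best first.
    """
    query_tokens = query.lower().split()
    vocab = set(query_tokens)
    for text, _ in questions:
        vocab.update(text.lower().split())

    def score(text, prior):
        tokens = text.lower().split()
        counts = Counter(tokens)
        denom = len(tokens) + alpha * len(vocab)
        # Smoothed unigram probability of each query word given Q.
        log_relevance = sum(math.log((counts[w] + alpha) / denom)
                            for w in query_tokens)
        return math.log(prior) + log_relevance

    ranked = sorted(questions, key=lambda q: score(q[0], q[1]), reverse=True)
    return [text for text, _ in ranked]
```

The denominator p(Q') is the same for every candidate question, so it is dropped. When two questions are equally relevant to the query, the prior alone decides their order, which is the "static rank" role of question utility.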

Answer Summarization (COLING 2008) Example: Where to stay in Paris? 2,645 answers (Yahoo! Answers 03/04/09) Is the best answer the best answer? Question clustering Find similar questions Answer summarization Aggregate answers for a question cluster Answer Taxonomy Question Taxonomy

Mixed Mode Question Answering: Knowledge Distillation & Dissemination. Mixed-mode scalable question answering and distillation across sources: FAQ (highly structured QnA), QnA sites (structured QnA), forums (semi-structured QnA), and the Web (unstructured QnA).

Q&A = Knowledge = Power. Q&A is a complement to web keyword search. Q&A can enhance existing QnA and search services. Leverage the existing knowledge in questions and answers and in their authors. Acquire or elicit human knowledge automatically.

Discussion