Question Answering Chin-Yew Lin Senior Researcher Knowledge Mining Group Microsoft Research Asia
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
Taylor's 4 Levels of Information Need: Visceral (本能的): the actual but unexpressed need; Conscious (自觉的): a formulated description of the need within the brain; Formalised (形式化的): an expressed statement of the need; Compromised (折衷的): the question as presented to the information system. Systems should support refinement and reformulation of the compromised need. * Taylor, R. S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194. * Adapted from Tetsuya Sakai's presentation on a related topic (http://research.microsoft.com/en-us/people/tesakai/)
What is Question Answering? A natural way to: request information that we do not know; verify information that we are not sure about; interact with people around us; fulfill our visceral need. A research field: develop automatic systems to answer questions; involves IR, NLP, and AI. A better way to find what we want than search?
Critics Powerset has a lot of skeptics (BW 09/17/07): Search expert Danny Sullivan, editor-in-chief of the online news site Search Engine Land, noted that no claims of the superiority of natural-language search have ever held up. And he disputed the idea that most people would rather ask questions than simply type in a few words, noting Google didn't train people to query that way but simply responded to the way users were already conducting searches. "Linguistics will not solve most search problems," adds Apostolos Gerasoulis, executive vice-president of search technology at Ask.com, the search engine unit of IAC/InterActiveCorp (IACI).
Open-Domain Question Answering (ODQA). Dimensions of question answering: domain specific vs. domain independent; structured data vs. free text; the web vs. a fixed set of collections vs. a single document.
Sample Questions 9: How far is Yaroslavl from Moscow? 15: When was London's Docklands Light Railway constructed? 22: When did the Jurassic Period end? 29: What is the brightest star visible from Earth? 30: What are the Valdez Principles? 73: Where is the Taj Mahal? - from TRECs 8 and 9
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
Apple s Knowledge Navigator 1987
Things to Think about What types of technologies are necessary to build an Apple Knowledge Navigator? Is QA part of the puzzle? Do you think that we already have all the pieces? If you were to design a modern Apple Knowledge Navigator, what would it be like? Would you want one?
What Have We Learned? NLP & Summarization: lexical analysis (词法分析): segmentation (分词), tokenization (符号化), POS tagging (词性标注); syntactic analysis (句法分析): parsing; semantic analysis (语义分析): WSD (词义消歧), reference resolution (指代消解); discourse analysis (话语分析). Challenges: ambiguity (歧异性), variations (多样性). Search Engine Overview: search engine architecture; crawler; web page parser; index builder (inverted index, signature file, suffix tree); web graph builder; link analysis (PageRank, HITS); query analysis; indexing & ranking (Relevance(Q, D), IR models, top-K query & index pruning, caching); 80% of queries are cached; user interface.
Term Based Apps Web Search...... News Search MM Search Serve Index Acquisition Data Web text
Term Based Apps Serve Index Acquisition Data Web Search Unstructured data...... News Search Semi-structured data MM Search Structured data
Term Based Entity Based Apps Serve Index Acquisition Data Web search Unstructured data News search MM search Semi-structured data Structured Data Ingester Structured data
Apps Serve Index Acquisition Data Web search Unstructured data Term Based News search MM search QA Semi-structured data Structured Data Ingester Entity Based Entity-based Search... Semi-Structured Data Ingester KGraph Unstructured Data Ingester Structured data
Architecture of a Typical Search Engine Query User Interface Online Part Caching Indexing and Ranking Inverted Index Index Builder Page Ranks Link Analysis Cached Pages Page & Site Statistics Web Page Parser Pages Crawler Links & Anchors Link Map Web Graph Builder Offline Part Web Graph Web * Ji-Rong Wen, Search Engine Overview
Architecture of a Typical QA Engine Question User Interface Online Part Question Analysis Caching Indexing and Ranking Answer Reranking Inverted Index Index Builder Page Ranks Link Analysis Cached Pages Page & Site Statistics Annotation Web Page Parser Pages Web Crawler Links & Anchors Link Map KB Web Graph Builder Ontology Offline Part Web Graph * Adapted from Ji-Rong Wen, Search Engine Overview
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
QA Terminology Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)
Terminology: Question Phrase The part of the question that says what is being sought: wh-words ( who, what, which, when, where, why, and how ) or wh-words plus nouns, adjectives, or adverbs ( what company, which president, how long, how fast ).
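A minimal sketch of pulling out the question phrase, assuming the simple heuristic that the phrase is the leading wh-word plus (optionally) the next content word; a real system would use POS tags to decide whether the following word belongs to the phrase.

```python
import re

WH_WORDS = r"(?:who|whom|whose|what|which|when|where|why|how)"

def question_phrase(question):
    """Return the leading wh-phrase of a question, or None if there is none."""
    m = re.match(rf"({WH_WORDS})\b(?:\s+(\w+))?", question.strip(), re.IGNORECASE)
    if not m:
        return None
    wh, follower = m.group(1), m.group(2)
    # Keep the follower only when it plausibly narrows the phrase
    # (e.g. "what company", "how long"); skip auxiliaries like "is".
    if follower and follower.lower() not in {"is", "are", "was", "were",
                                             "do", "does", "did"}:
        return f"{wh} {follower}"
    return wh
```

For example, "How long ..." yields the two-word phrase while "Who is ..." yields only the wh-word.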
Terminology: Question Type Question category for distinguishing different processing strategies FACTOID: How far is it from Earth to Mars? LIST: List the names of chewing gums. DEFINITION: Who is Vlad the Impaler? RELATIONSHIP: What is the connection between Valentina Tereshkova and Sally Ride? SUPERLATIVE: What is the largest city on Earth? YES-NO: Is Osama bin Laden alive? OPINION: What do most Americans think of gun control? CAUSE & EFFECT: Why did Iraq invade Kuwait?
Terminology: Answer Type The class of object sought by the question: Person (from Who ), Place (from Where ), Date (from When ), Number (from How many ), Explanation (from Why ), Method (from How ). See the UIUC question classification for more: http://cogcomp.cs.illinois.edu/data/qa/qc/ See the USC/ISI Question Answer Typology for more: http://www.isi.edu/naturallanguage/projects/webclopedia/taxonomy/taxonomy_toplevel.html Hermjakob, U., Parsing and Question Classification for Question Answering, Workshop on Open-domain QA at ACL 2001 (http://www.isi.edu/~ulf/papers/acl01-qa-parsing.pdf)
Terminology: Question Focus & Topic The question focus is the property or entity being sought by the question ( McCarren Airport is located in what city? What is the population of Japan? What color is yak milk? ); it gives a hint about the answer type. The question topic is the object or event that the question is generally about ( What is the height of Mt. Everest? Where on the body is a mortarboard worn? ); answer passages will most likely contain the question topic.
Terminology: Candidate Passage & Answer A candidate passage is a text passage retrieved by a search engine given a question. A candidate answer is a small piece of text ranked according to its likelihood of being an answer to a question, e.g.: 50 ; Queen Elizabeth II ; September 8, 2003 ; by baking a mixture of flour and water .
Terminology: Authority List A collection of instances of an answer type of interest used to test a term for class membership Days of week (Sun, Mon, Tue, ) Planets Elements States/Provinces/Counties/Countries Animals Plants Colors People Organizations Existing databases or lists Movies (IMDB) Books (Amazon) Freebase Sempute NeedleSeek (http://needleseek.msra.cn)
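The membership test an authority list supports can be sketched as a simple set lookup; the tiny hand-written lists below are illustrative assumptions, standing in for the databases the slide mentions (IMDB, Freebase, NeedleSeek, and so on).

```python
# Authority lists: instances of an answer type, used to test a
# candidate term for class membership.
AUTHORITY_LISTS = {
    "DAY_OF_WEEK": {"sun", "mon", "tue", "wed", "thu", "fri", "sat"},
    "PLANET": {"mercury", "venus", "earth", "mars", "jupiter",
               "saturn", "uranus", "neptune"},
    "COLOR": {"red", "orange", "yellow", "green", "blue",
              "purple", "white", "black"},
}

def answer_types_of(term):
    """Return every answer type whose authority list contains the term."""
    t = term.strip().lower()
    return [atype for atype, members in AUTHORITY_LISTS.items() if t in members]
```

A candidate answer such as "Mars" can then be checked against the expected answer type before it is returned.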
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
Inside a QA System Material in this section is based on: John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)
A Simplified View of a QA System: Question -> Question Analysis -> (Keyword Query -> Search over Web or Corpus via Search Engine -> Documents or Passages; Answer Type) -> Answer Extraction -> Answer(s). * John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)
Search Engine Review: see Prof. Xiaojun Wan's lecture. Build your own search engine, or take the easy way: Lucene (http://lucene.apache.org/), Indri (http://www.lemurproject.org/indri/). Ask the TA for a Chinese-specific SE.
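The "build your own" route can be sketched as a tiny in-memory inverted index with TF-IDF ranking; this is a simplified scoring assumption standing in for what Lucene or Indri provide, not their actual formulas.

```python
import math
import re
from collections import defaultdict

class TinyIndex:
    """A minimal inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query, k=10):
        """Return up to k doc ids ranked by summed tf-idf."""
        n = len(self.docs)
        scores = defaultdict(float)
        for term in re.findall(r"\w+", query.lower()):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(n / len(docs))  # rarer terms weigh more
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

Usage: `idx.add(1, "Beijing travel guide")` then `idx.search("beijing travel")` returns doc ids, best first.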
Named Entity Tagger: read Daniel Bikel's paper and build your own named entity tagger: Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning Journal, Special Issue on Natural Language Learning. The easy way: Stanford Named Entity Recognizer (NER) (http://nlp.stanford.edu/software/crf-ner.shtml), Apache OpenNLP (http://incubator.apache.org/opennlp/documentation.html).
Answer Type Inventory: decide on a set of answer types that covers the majority of questions in a chosen domain (hint: use a query log; see Dr. Wei Wu's lecture). Factors to consider: question analysis predicts answer types (Mountain: What mountain/peak ; Organization: What organization/company/group/agency ); the NER must recognize instances of the answer types. Resources: WordNet (http://wordnet.princeton.edu/), Wikipedia, UIUC question classes, USC/ISI question answer typology, MSRA Sempute NeedleSeek, Freebase, Yago, and so on.
Question Classification: the part of question analysis that determines the expected answer type. Approaches: manual rules (Person: Who/whom/whose , The name of the person who ; Distance: How far/wide/broad/narrow/tall/high , What is the distance/height/breadth of ) and machine learning; see the UIUC question classifier (http://cogcomp.cs.illinois.edu/data/qa/qc/) and Li & Roth, Learning question classifiers: The role of semantic information, Journal of Natural Language Engineering, vol. 12, no. 3, pp. 229-249. * How to deal with unknown answer types?
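The manual approach can be sketched as an ordered list of (pattern, answer type) rules in the spirit of the slide's examples; the rule set below is an illustrative assumption, not the UIUC taxonomy.

```python
import re

# Ordered rules: more specific patterns (e.g. "how many") must come
# before more general ones (bare "how").
RULES = [
    (r"^\s*(who|whom|whose)\b", "PERSON"),
    (r"^\s*where\b", "PLACE"),
    (r"^\s*when\b", "DATE"),
    (r"^\s*how (many|much)\b", "NUMBER"),
    (r"^\s*how (far|wide|broad|narrow|tall|high|long)\b", "DISTANCE"),
    (r"^\s*why\b", "EXPLANATION"),
    (r"^\s*how\b", "METHOD"),
]

def classify(question):
    """Return the expected answer type, or UNKNOWN if no rule fires."""
    q = question.lower()
    for pattern, atype in RULES:
        if re.search(pattern, q):
            return atype
    return "UNKNOWN"
```

Questions like "What are the Valdez Principles?" fall through to UNKNOWN, which is exactly the open problem the slide flags.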
Query Generation: the part of question analysis that generates a query for the search engine to retrieve candidate passages. Goal: retrieve all documents containing candidate answers and none others. Factors to consider: drop counterproductive words ( What organization => drop what and organization ); keep critical words ( Who is the CEO of Microsoft? => keep CEO and Microsoft, but drop who ); expand critical words ( 北大在哪里? Where is Peking University? => expand 北大 to 北京大学 ); iterate and use feedback from previous retrieval.
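The drop/keep/expand steps can be sketched as below; the stopword list and the abbreviation table are illustrative assumptions, and the Chinese expansion example additionally assumes the question has already been word-segmented.

```python
import re

# Wh-words, auxiliaries, and generic answer-type nouns to drop.
DROP = {"who", "what", "which", "when", "where", "why", "how",
        "is", "are", "was", "the", "a", "an", "of", "organization"}
# Abbreviation -> full form (e.g. 北大 -> 北京大学 after segmentation).
EXPAND = {"北大": "北京大学"}

def make_query(question):
    """Turn a question into a keyword query for the search engine."""
    terms = []
    for w in re.findall(r"\w+", question):
        if w.lower() in DROP:
            continue
        terms.append(EXPAND.get(w, w))
    return " ".join(terms)
```

So "Who is the CEO of Microsoft?" becomes the query "CEO Microsoft", keeping the critical words and dropping the rest.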
Answer Extraction. Heuristic (*): approximate matching between the question/query and the candidate passages, using various heuristic features to compute scores (a bag-of-words approach); Radev et al., Ranking suspected answers to natural language questions using predictive annotation, ANLP 2000. Pattern-based (*): from When did Mozart die? => Mozart expired in 1791 , learn When did X die? => X expired in <Date> ; apply to When did Beethoven die? => Beethoven expired in <Date> ; <Date> = 1784; Ravichandran & Hovy, Learning surface text patterns for a question answering system, ACL 2002. Relationship-based: take advantage of relationships among words; Who wrote the Declaration of Independence? => [X, write], [write, Declaration of Independence ]; Jefferson wrote the Declaration of Independence. => [Jefferson, write], [write, Declaration of Independence ]; Cui et al., Unsupervised learning of soft patterns for generating definitions from online news, WWW 2004. Logic-based: convert the question to a goal and apply theorem proving to decide whether the goal is true; Moldovan and Rus, Logic form transformation of WordNet and its applicability to question answering, ACL 2001.
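The pattern-based approach can be sketched as applying a learned surface pattern to candidate passages; the single hand-written pattern below is an assumption standing in for the many patterns Ravichandran & Hovy learn automatically from seed question-answer pairs.

```python
import re

# Surface patterns learned for "When did X die?"; {X} is the slot for
# the question topic, the capture group is the <Date> slot.
# Double braces escape regex repetition braces from str.format.
PATTERNS = [
    r"{X} (?:expired|died) in (\d{{3,4}})",
]

def extract_date_of_death(name, passage):
    """Apply each learned pattern to the passage; return the first <Date>."""
    for template in PATTERNS:
        pattern = template.format(X=re.escape(name))
        m = re.search(pattern, passage)
        if m:
            return m.group(1)
    return None
```

Applied to a passage such as "Mozart expired in 1791 in Vienna.", the pattern fills the <Date> slot with "1791".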
A Simplified View of a QA System: Question -> Question Analysis -> (Keyword Query -> Search over Web or Corpus via Search Engine -> Documents or Passages; Answer Type) -> Answer Extraction -> Answer(s). * John Prager, Open-Domain Question-Answering, Foundations and Trends in Information Retrieval 1:2 (2006)
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
Evaluation * Part of these slides is included with the permission of Tetsuya Sakai
QA: Community-wide Efforts TREC QA Track (1999-2007) http://trec.nist.gov/data/qamain.html CLEF MLQA, ResPubliQA Track (2003-2011) http://www.clef-campaign.org/ NTCIR QAC, CLQA, ACLIA (2002-2011) http://research.nii.ac.jp/ntcir/index-en.html Conferences: ACL, IJCNLP, NAACL, EACL, COLING, SIGIR, WSDM, WWW, CLEF, LREC, AAAI, IJCAI
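The early TREC QA tracks ranked factoid systems mainly by mean reciprocal rank (MRR); a minimal sketch of the metric, where the run and judgment structures are assumptions for illustration:

```python
def mean_reciprocal_rank(ranked_answers, judge):
    """MRR over questions: 1/rank of the first correct answer, else 0.

    `ranked_answers` maps question id -> ranked list of candidate answers;
    `judge(qid, answer)` returns True iff the answer is correct.
    """
    total = 0.0
    for qid, candidates in ranked_answers.items():
        for rank, ans in enumerate(candidates, start=1):
            if judge(qid, ans):
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_answers)
```

A system that puts the right answer first on one question and second on another scores (1 + 0.5) / 2 = 0.75.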
Information Access Evaluation Workshops/Forums: TREC (USA, 1992-); CLEF = Cross-Language Evaluation Forum (Europe, 2000-); NTCIR = NII Test Collection for Information Retrieval systems (Asia, 1999-). Goals: collaboration (constructing shared data) and competition (what approaches perform best?). Tasks include question answering, cross-language retrieval, patent processing, opinion analysis, etc. ACLIA = Advanced Cross-Lingual Information Access.
Information Retrieval for Question Answering (IR4QA) Task@ACLIA. ACLIA = Advanced Cross-lingual Information Access (Japanese, Simplified/Traditional Chinese, English). Pipeline: question -> question classification -> question type -> document retrieval -> ranked list of documents -> answer extraction -> answers. IR4QA = the document retrieval task in the context of QA.
Constructing Test Collections via Pooling: 50-100 topics (search requests) are run against the target documents (several million) by participating teams; each team submits runs (ranked lists); the top documents from all runs are pooled (several hundred documents per topic); assessors manually judge the pool (highly relevant / relevant / nonrelevant) to create the right answers.
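The pooling step can be sketched as a depth-k pool: per topic, the union of the top-k documents from every submitted run, which is all that goes to the human assessors.

```python
def build_pools(runs, k):
    """Depth-k pooling.

    `runs` is a list of runs, each mapping topic_id -> ranked doc-id list;
    returns {topic_id: set of doc ids to assess}.
    """
    pools = {}
    for run in runs:
        for topic, ranking in run.items():
            pools.setdefault(topic, set()).update(ranking[:k])
    return pools
```

Documents outside the pool are treated as nonrelevant, which is why depth and run diversity matter.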
Evaluation campaign timeline: organizers release topics; participants develop and tune algorithms and systems, then submit runs; organizers do pooling, relevance assessment (costly and time-consuming), run ranking, and double-checking, then release the evaluation results; participants start working again. The assessment period (e.g. 4 months) is idle time for participants: no experiments, no progress.
NTCIR-8 ACLIA = IR4QA + CCLQA ACLIA: Advanced Cross-lingual Information Access * http://aclia.lti.cs.cmu.edu/ntcir8/
ACLIA Tasks English to Japanese CLQA (with J to J as a subtask) English to Chinese CLQA (CS or CT, with C to C as a subtask) English to Japanese CLIR (embedded in E-J CLQA) English to Chinese CLIR (embedded in E-C CLQA)
Question & Answering Roadmap 2001 * Burger et al. Issues, Tasks and Program Structures to Roadmap Research in Question & Answering (Q&A), 2001
Open Advancement of QA 2008 * Challenge Set Profile * Ferrucci et al. Towards the Open Advancement of Question Answering Systems
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
IBM Watson
Falcon QA Architecture. A QA system gives direct answers to a question instead of documents. Falcon QA system (LCC): Moldovan et al. ACL 2000; Surdeanu et al. IEEE Trans. PDS 2002. Best QA system in TREC 8 & 9. Average question answering time: TREC 8: 48 seconds; TREC 9: 94 seconds. Module analysis, share of processing time (the traditional IR module included):
Module | TREC8 | TREC9
QP | 1.1% | 1.2%
PR | 44.4% (21.3 sec) | 26.5% (24.9 sec)
PS | 5.4% | 2.2%
PO | 0.1% | 0.1%
AP | 48.7% (23.4 sec) | 69.7% (65.5 sec)
IBM Watson Hardware: a cluster of 90 IBM Power 750 servers plus I/O and network in 10 racks; 2,880 3.5 GHz POWER7 processor cores (8 cores per processor); 16 TB of RAM (content is stored in RAM); cost about USD $3 million. Software: Java and C++, Apache Hadoop, Apache UIMA, IBM DeepQA software, SUSE Linux Enterprise Server 11; more than 100 different techniques are used. Data: encyclopedias, dictionaries, thesauri, newswire articles, and literary works; databases, taxonomies, and ontologies (DBpedia, WordNet, and Yago); 200 million pages of structured and unstructured content on 4 TB of disk. People: led by Dr. David Ferrucci with his 46-person research, PM, annotation, system, and strategy team. References: Ferrucci, D., et al. (2010), Building Watson: An Overview of the DeepQA Project , AI Magazine 31(3), http://www.stanford.edu/class/cs124/aimagzine-deepqa.pdf; IBM Journal of Research and Development: This is Watson , http://ieeexplore.ieee.org/xpl/tocresult.jsp?reload=true&isnumber=6177717
Behind IBM Watson
IBM Watson Videos Final Jeopardy and the Future of Watson http://www.youtube.com/watch?v=li-m7o_brng The Science behind an Answer http://www.youtube.com/watch?feature=player_ detailpage&v=dywo4zksfxw
Agenda What is Question Answering Why Question Answering QA Terminology Inside a QA System Evaluation IBM Watson Community Question Answering (SQuAD)
Question Answering The Easy Way? Community QA
Search vs. Question Answering (QA) User intention Understanding what users want is difficult!
Scalable Question Answering & Distillation Goal Create a web-scale QA repository and service Key idea Leverage existing knowledge in the QA forms Methods Extract and aggregate QA pairs in web-scale Learn user intents through analysis of QA repository Serve enriched answers instead of 10 blue links
Yahoo! Crawl Status 03/04/2009 (SQuAD crawl of Yahoo! Answers; chart shows crawled vs. remaining). Total: 55,554,314; Crawled: 43,852,589; Crawled/Total: 78.94%
Community Question Answering
Community QnA in Detail (screenshot annotated with the question topic and answer contexts 1 and 2)
Online Discussion Forum (screenshot annotated with the thread topic)
FAQ: context dependent. About 28,424,184 results on Live Search using the query FAQ travel (Google: about 64,200,000).
Challenges Question Mining Answer Summarization Question Answering Question Generation Question Utility Question Search & Recommendation
List of Related Papers Using Graded-Relevance Metrics for Evaluating Community QA Answer Selection Sakai et al.; WSDM 2011 Comparable Entity Mining from Comparative Questions Li et al.; ACL 2010 Learning to Recommend Questions Based on User Rating Sun et al.; CIKM 2009 A Structural Support Vector Method for Extracting Contexts and Answers of Questions from Online Forums Yang et al.; EMNLP 2009 Recommending Questions Using the MDL-based Tree Cut Model Cao et al.; WWW 2008 Searching Questions by Identifying Question Topic and Question Focus Duan et al.; ACL 2008 Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums Ding et al.; ACL 2008 Finding Question Answer Pairs from Online Forums Cong et al.; SIGIR 2008 Question Utility: A Novel Static Ranking of Question Search Song et al.; AAAI 2008 Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services Liu et al.; COLING 2008 Automatic Question Generation from Queries Lin; NSF Workshop on Question Generation Shared Task and Evaluation Challenge 2008
Question Mining & Answering (ACL 2008 & SIGIR 2008) Extract question and answer pairs. Community QnA (Live QnA, Yahoo! Answers, Baidu Zhidao): create a resolved question list; extract & index the question, best answer, and other answers. Forum: extract and index threads and postings; find questions and their answers.
QA Pairs in Online Forums
Question Search & Recommendation (ACL 2008 & WWW 2008) Query We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs. Question search Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter? Question recommendation Would you get a lower price by not needing a guide for the Forbidden City and etc? Can anybody recommend a budget hotel near Forbidden City? Question = Topic + Focus + Others (TFO) Search: same topic similar foci Recommend: same topic different foci
Identifying Topic and Focus Travel @Yahoo! Answers Travel @Yahoo! Answers Asia Pacific Asia Pacific China Japan Europe Europe China Japan China 1. Anyone know where to see the Dragon Boat Festival in Beijing? 2. Where is a good (Less expensive) place to shop in Beijing? 3. What's the cheapest way to get from Beijing to Hong Kong? Europe 1. How far is it from Berlin to Hamburg? 2. What is the cheapest way from Berlin to Hamburg? 3. Where to see between Hamburg and Berlin? 4. How long does it take from Hamburg to Berlin? Specificity: the inverse of the entropy of the topic term s distribution over the sub-categories Order topic terms by their specificity
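The specificity score described above can be sketched directly from its definition: a topic term concentrated in few sub-categories has low entropy, hence high specificity. Taking specificity as the negated entropy is a simplifying assumption here ("the inverse of the entropy" in the slide); any monotone-decreasing transform of entropy would order terms the same way.

```python
import math

def specificity(term_category_counts):
    """Negated entropy of a term's distribution over sub-categories.

    `term_category_counts` maps sub-category -> occurrence count of the
    term; a higher return value means a more specific topic term.
    """
    total = sum(term_category_counts.values())
    entropy = 0.0
    for count in term_category_counts.values():
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return -entropy
```

A term like "Beijing" that occurs only under the China sub-category scores higher than a term spread evenly across China and Europe.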
Question Utility (AAAI 2008). Motivation: How useful is a question? How should we rank questions without queries? Definition: How likely is it that the question will be asked again? Q* = argmax_Q p(Q | Q') = argmax_Q p(Q) p(Q' | Q) / p(Q') ∝ argmax_Q p(Q) ∏_{w ∈ Q'} p(w | Q). Here p(Q), the prior probability of question Q, reflects a static rank of the question, i.e., its question utility; p(Q' | Q) = ∏_{w ∈ Q'} p(w | Q) is the probability of generating query Q' from question Q (the relevance score).
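The utility ranking can be sketched as scoring each archived question Q for a new query Q' by p(Q) · ∏_{w ∈ Q'} p(w | Q); the add-one smoothing and the fixed vocabulary size below are illustrative assumptions, not the paper's exact estimator.

```python
import re

def lm_prob(word, question_words, vocab_size):
    """Add-one smoothed unigram probability p(w | Q)."""
    return (question_words.count(word) + 1) / (len(question_words) + vocab_size)

def utility_score(query, question, prior, vocab_size=1000):
    """Score question Q for query Q': prior p(Q) times prod p(w | Q)."""
    score = prior
    q_words = re.findall(r"\w+", question.lower())
    for w in re.findall(r"\w+", query.lower()):
        score *= lm_prob(w, q_words, vocab_size)
    return score
```

With equal priors, an archived question that shares the query's words outranks one that does not; the prior then breaks ties in favor of questions likely to be asked again.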
Answer Summarization (COLING 2008) Example: Where to stay in Paris? 2,645 answers (Yahoo! Answers 03/04/09) Is the best answer the best answer? Question clustering Find similar questions Answer summarization Aggregate answers for a question cluster Answer Taxonomy Question Taxonomy
Mixed Mode Question Answering: Knowledge Distillation & Dissemination. Scalable question answering and distillation across sources: FAQ (highly structured QnA), QnA sites (structured QnA), forums (semi-structured QnA), and the web (unstructured QnA).
Q&A = Knowledge = Power. Q&A is complementary to web keyword search. Q&A can enhance existing QnA and search services: leverage existing knowledge in question-and-answer form and from its authors; acquire or elicit human knowledge automatically.
Discussion