Building Watson A Brief Overview of the DeepQA Project Eddie Epstein, Manager of Scaleout DeepQA Team @ IBM Research 1
Want to Play Chess or Just Chat? Chess A finite, mathematically well-defined search space Limited number of moves and states All the symbols are completely grounded in the mathematical rules of the game Human Language Words by themselves have no meaning Only grounded in human cognition Words navigate, align and communicate an infinite space of intended meaning Computers can not ground words to human experiences to derive meaning
The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Answering along 5 Key Dimensions Broad/Open Domain Complex Language High Precision Accurate Confidence High Speed $200 If you're standing, it's the direction you should look to check out the wainscoting. $600 In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus $1000 Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that s farthest north
Broad Domain We do NOT attempt to anticipate all questions and build specialized databases. In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurring <3% of the time. The distribution has a very long tail. And for each these types 1000 s of different things may be asked. Even going for the head of the tail will barely make a dent *13% are non-distinct (e.g., it, this, these or NA) Our Focus is on reusable NLP technology for analyzing volumes of as-is text. Structured sources (DBs and KBs) are used to help interpret the text. 4
The Best Human Performance: Our Analysis Reveals the Winner s Cloud Each dot represents an actual historical human Jeopardy! game Winning Human Performance Grand Champion Human Performance 2007 QA Computer System 5 More Confident Less Confident
Not Just for Fun A long, tiresome speech delivered by a frothy pie topping Category: Edible Rhyme Time 6
Not Just for Fun Some s require and Synthesis A long, tiresome speech delivered by a frothy pie topping. Diatribe. Harangue. Meringue. Whipped Cream. Answer: Meringue Harangue Category: Edible Rhyme Time 7
Divide and Conquer (Typical in Final Jeopardy!) When "60 Minutes" premiered this man was U.S. president.? 8
Divide and Conquer (Typical in Final Jeopardy!) Must identify and solve sub-questions from different sources to answer the top level question Lyndon B Johnson In 1968 this man was U.S. president. When "60 Minutes" premiered this man was U.S. president. 9
The Missing Link Shirts, TV remote controls, Telephones
The Missing Link Buttons Shirts, TV remote controls, Telephones
The Missing Link Buttons Shirts, TV remote controls, Telephones On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.
The Missing Link Buttons Shirts, TV remote controls, Telephones Mt Everest Edmund Hillary He was first On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.
Watson s Knowledge for Jeopardy! Watson has analyzed and stored the equivalent of about 1 million books (e.g., encyclopedias, dictionaries, news articles, reference texts, plays, etc) Watson also uses structured sources such as WordNet and DBpedia
Automatic Learning From Reading Sentence Parsing Generalization & Statistical Aggregation Volumes of Text Syntactic Frames subject verb object Semantic Frames Inventors patent inventions (.8) Officials Submit Resignations (.7) People earn degrees at schools (0.9) Fluid is a liquid (.6) Liquid is a fluid (.5) Vessels Sink (0.7) People sink 8-balls (0.5) (in pool/0.8)
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink)
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) Primary Search Primary Search Results
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) Primary Search Primary Search Results Candidate Answer Generation Isaac Newton Wilhelm Tempel HMS Paramour Christiaan Huygens Halley s Comet Edmond Halley Pink Panther Peter Sellers
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) Primary Search Primary Search Results Candidate Answer Generation Isaac Newton Wilhelm Tempel HMS Paramour Christiaan Huygens Halley s Comet Edmond Halley Pink Panther Peter Sellers Evidence Retrieval
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) Primary Search Primary Search Results Candidate Answer Generation Isaac Newton Wilhelm Tempel HMS Paramour Christiaan Huygens Lexical Taxonomic Spatial [0.58 0-1.3 0.97] [0.71 1 13.4 0.72] [0.12 0 2.0 0.40] [0.84 1 10.6 0.21] Temporal Halley s Comet [0.33 0 6.3 0.83] Edmond Halley [0.21 1 11.1 0.92] Pink Panther [0.91 0-8.2 0.61] Peter Sellers Evidence Retrieval [0.91 0-1.7 0.60] Evidence Scoring
How Watson Processes a IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE Analysis Keywords: 1698, comet, paramour, pink, AnswerType(comet discoverer) Date(1698) Took(discoverer, ship) Called(ship, Paramour Pink) Primary Search Primary Search Results Candidate Answer Generation Isaac Newton Wilhelm Tempel HMS Paramour Christiaan Huygens Lexical Taxonomic Spatial [0.58 0-1.3 0.97] [0.71 1 13.4 0.72] [0.12 0 2.0 0.40] [0.84 1 10.6 0.21] Temporal Calculate Rank & Confidence Halley s Comet [0.33 0 6.3 0.83] Edmond Halley Pink Panther Peter Sellers Evidence Retrieval [0.21 1 11.1 0.92] [0.91 0-8.2 0.61] [0.91 0-1.7 0.60] Evidence Scoring 1) Edmond Halley (0.85) 2) Christiaan Huygens (0.20) 3) Peter Sellers (0.05)
Evidence Profiles: Time, Popularity, Source, etc. Clue: You ll find Bethel College and a Seminary in this holy Minnesota city. Positive Evidence Negative Evidence
Opportunities for Parallel Processing Primary Searches are independent, and each Search Result is analyzed independently to generate candidate answers Candidate Generation Isaac Newton Primary Search Results Wilhelm Tempel Christiaan Huygens Wilhelm Tempel Edmond Halley Halley s Comet Edmond Halley HMS Paramour Pink Panther Peter Sellers
Opportunities for Parallel Processing Evidence is Retrieved for each Candidate Answer independently Evidence scoring is done independently Evidence Retrieval Evidence Scoring [0.58 0-1.3 0.97] Christiaan Huygens Edmond Halley [0.71 1 13.4 0.72] [0.12 0 2.0 0.40] [0.84 1 10.6 0.21] 1) Edmond Halley (0.85) 2) Christiaan Huygens (0.20) 3) Peter Sellers (0.05)
DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture Case/ / Topic Data /Case Analysis Multiple Interpretations 100s sources Fact, Finding and 100s Possible Answers Hypothesis Generation 1000 s of Pieces of Evidence Hypothesis and Evidence Scoring 100,000 s scores from many simultaneous Text Analysis Algorithms Synthesis Final Confidence Merging & Ranking Hypothesis Generation Hypothesis and Evidence Scoring... Confidence- Weighted Answers
DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture Case/ / Topic Data /Case Analysis Multiple Interpretations 100s sources Fact, Finding and 100s Possible Answers Hypothesis Generation 1000 s of Pieces of Evidence Hypothesis and Evidence Scoring 100,000 s scores from many simultaneous Text Analysis Algorithms Synthesis Final Confidence Merging & Ranking Hypothesis Generation Hypothesis and Evidence Scoring... Confidence- Weighted Answers
DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture Case/ / Topic Data /Case Analysis Multiple Interpretations 100s sources Fact, Finding and 100s Possible Answers Hypothesis Generation 1000 s of Pieces of Evidence Hypothesis and Evidence Scoring 100,000 s scores from many simultaneous Text Analysis Algorithms Synthesis Final Confidence Merging & Ranking Hypothesis Generation Hypothesis and Evidence Scoring... Confidence- Weighted Answers
DeepQA on Apache UIMA Aggregate Analysis Engine Analysis Engine Analysis Engine Analysis Engine Analysis Engine Primary Candidate Answer Analysis Searches Generation Scoring Analysis Engine Supporting Analysis Engine Deep Evidence Analysis Engine Final Evidence Search Scoring Merger Flow Controller
Multipliers used to Organize Analysis Data Aggregate Analysis Engine Analysis Engine Analysis Engine Analysis Engine Analysis Engine Primary Candidate Answer Analysis Searches Generation Scoring Analysis Engine Analysis Engine Analysis Engine Supporting Deep Evidence Final Flow Controller Evidence Search Scoring Merger
An Aggregate AE in Core UIMA is Single Threaded Aggregate Analysis Engine Analysis Engine 1 Annotator Analysis Engine Annotator Flow Controller
UIMA-AS Enables Multithreaded Operation Async Aggregate Analysis Engine 5 Input Queue 4 In Queue Flow Controller Analysis Engine 1 2 3 Annotator In Queue Analysis Engine Analysis Engine Analysis Engine Annotator
Selected Watson Processes Distributed Search Daemons Evidence Evidence Control Evidence Primary Control Searches Control 1..N Evidence Evidence Control Evidence Supporting Control Evidence Control Search Hypothesis Hypothesis Hypothesis Hypothesis Generation Hypothesis Generation Hypothesis Generation Hypothesis Generation Hypothesis Generation Hypothesis Generation Candidate Generation Generation Hypothesis Answer Generation Generation Generation Scoring Deep Deep Evidence Retrieval Deep Evidence Evidence Retrieval and Deep Retrieval and Scoring Deep Evidence and Scoring Deep Evidence Deep Scoring Evidence Deep Scoring Evidence Deep Scoring Evidence Scoring Evidence Scoring Scoring Evidence Evidence Control Evidence Supporting Control Evidence Control Scaleout Top level Aggregate & Topic Analysis Scaleout Control Final Confidence Merging & Ranking Answer & Confidence
Watson Inter-Process Data Flow Hypothesis Hypothesis Generation Hypothesis Generation Hypothesis Generation Hypothesis Generation Candidate Generation Generation Hypothesis Hypothesis Generation Hypothesis Generation Hypothesis Generation Generation Hypothesis Answer Generation Scoring Deep Deep Evidence Retrieval Deep Evidence Evidence Retrieval and Deep Retrieval and Scoring Deep Evidence and Scoring Deep Evidence Deep Scoring Evidence Deep Scoring Evidence Deep Scoring Evidence Scoring Evidence Scoring Scoring Evidence Evidence Control Evidence Supporting Control Evidence Control Search JMS Broker Primary Primary Search Sup Evid Search Scaleout Distributed Search Daemons Top Level Aggregate Content & Prismatic Servers Broker-based transactions Indri transactions CS & Prismatic transactions
Development System Timing Before Scale Out Single-threaded computation Search indexes on disk Remote Sesame server
After first 8 months of Scaleout Work Move everything into RAM Scale out components with UIMA-AS Distribute search
12 more months of Scaleout Work Pre-compute deep NLP analysis of entire text corpus Hammer on every computation outlier Expand cluster
Technology Classics The Great Outdoors Speak of the Dickens Mind Your Manners Before and After $200 $200 $200 $200 $200 $200 $400 $400 $400 $400 $400 $400 $600 $600 $600 $600 $600 $600 $800 $800 $800 $800 $800 $800 $1000 $1000 $1000 $1000 $1000 $1000 Jeopardy! Game Control System Real-Time Game Configuration Used in Sparring and Exhibition Games Human Player 1 Human Player 2 Strategy & Text-to-Speech Watson s Game Controller Clue & Category Answers & Confidences Watson s QA Engine 400 Processes 90 IBM Power 750s 15 TB RAM 2 TB Disk Clues, Scores & Other Game Data Analysis of natural language content equivalent to 1 Million Books
Some Growing Pains (early answers) THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them
Some Growing Pains (early answers) THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence Apollo 11 moon landing MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang THE QUEEN'S ENGLISH Give a Brit a tinkle when you get into town & you've done this
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence Apollo 11 moon landing MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang THE QUEEN'S ENGLISH Give a Brit a tinkle when you get into town & you've done this urinate
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence Apollo 11 moon landing MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang THE QUEEN'S ENGLISH Give a Brit a tinkle when you get into town & you've done this Call on the phone urinate FATHERLY NICKNAMES This Frenchman was "The Father of Bacteriology"
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence Apollo 11 moon landing MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang THE QUEEN'S ENGLISH Give a Brit a tinkle when you get into town & you've done this Call on the phone urinate FATHERLY NICKNAMES This Frenchman was "The Father of Bacteriology" How Tasty Was My Little Frenchman
Some Growing Pains (early answers) the People THE AMERICAN DREAM Decades before Lincoln, Daniel Webster spoke of government "made for", "made by" & "answerable to" them No One WW I NEW YORK TIMES HEADLINES An exclamation point was warranted for the "end of" this! In 1918 a sentence Apollo 11 moon landing MILESTONES In 1994, 25 years after this event, 1 participant said, "For one crowning moment, we were creatures of the cosmic ocean the Big Bang THE QUEEN'S ENGLISH Give a Brit a tinkle when you get into town & you've done this Call on the phone urinate Louis Pasteur FATHERLY NICKNAMES This Frenchman was "The Father of Bacteriology" How Tasty Was My Little Frenchman
In-Category Learning CELEBRATIONS OF THE MONTH What the Jeopardy! Clue is asking for is NOT always obvious Watson can try to infer the type of thing being asked for from the previous answers. In this example after seeing 2 correct answers Watson starts to dynamically learn a confidence that the question is asking for something that it can classify as a month. Clue Type Watson s Answer Correct Answer D-DAY ANNIVERSARY & MAGNA CARTA DAY NATIONAL PHILANTHROPY DAY & ALL SOULS' DAY NATIONAL TEACHER DAY & KENTUCKY DERBY DAY ADMINISTRATIVE PROFESSIONALS DAY & NATIONAL CPAS GOOF-OFF DAY NATIONAL MAGIC DAY & NEVADA ADMISSION DAY day Runnymede June day Day/month(.2) Day of the Dead Churchill Downs November May day / month(.6) April April day / month(.8) October October
DeepQA: Incremental Progress in Answering Precision on the Jeopardy Challenge: 6/2007-11/2010 IBM Watson Playing in the Winners Cloud v0.8 11/10 V0.7 04/10 v0.6 10/09 v0.5 05/09 v0.4 12/08 v0.3 08/08 v0.2 05/08 v0.1 12/07 Baseline 12/06
Watson Workload Optimized System (Power 750) 90 x IBM Power 750 servers 32 POWER7 3.55 GHz cores each 10 Gb Ethernet network 15 Terabytes of total memory 2 Terabyte filesystem for Jeopardy! application Linux provides a cost-effective open platform which is performance-optimized to exploit POWER 7 systems 10 racks include servers, networking, shared disk system, cluster controllers
Precision, Confidence & Speed Deep Analytics Combining many analytics in a novel architecture, we achieved very high levels of Precision and Confidence over a huge variety of as-is content. Speed By optimizing Watson s computation for Jeopardy! on over 2,800 POWER7 processing cores we went from hours per question on a single CPU to an average of under 3 seconds. Results in 2010 Watson played 55 real-time sparring games against former Tournament of Champion Players, playing competitively in all games -- and placing 1 st in 71% of them!
Potential Business Applications Healthcare / Life Sciences: Diagnostic Assistance, Evidenced- Based, Collaborative Medicine Tech Support: Help-desk, Contact Centers Enterprise Knowledge Management and Business Intelligence Government: Improved Information Sharing and Security
Evidence Profiles from disparate data is a powerful idea Each dimension contributes to supporting or refuting hypotheses based on Strength of evidence Importance of dimension for diagnosis (learned from training data) Evidence dimensions are combined to produce an overall confidence Overall Confidence Positive Evidence Negative Evidence
The Core Technical Team* Researchers and Engineers in NLP, ML, IR, KR&R and CL at IBM Labs and a growing number of universities Siddarth Patwardhan There is a broader team that contributed to delivering Watson for the Stage, to compete in Jeopardy Games 56 IBM Confidential *NOT full-time Equivalents. Names listed if contributed some time to that part of project.
THANK YOU