1 Paul J. Ledak, Vice President, IBM Research. © 2011 IBM Corporation
2 Watson answers a grand challenge
Can we design a computing system that rivals a human's ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of knowledge in real time?
3 Want to Play Chess or Just Chat?
Chess: A finite, mathematically well-defined search space. Limited number of moves and states. All the symbols are completely grounded in the mathematical rules of the game.
Human Language: Words by themselves have no meaning; they are grounded only in human cognition. Words navigate, align and communicate an infinite space of intended meaning. Computers cannot ground words to human experiences to derive meaning.
4 Easy Questions?
Arithmetic: ln((12,546,798 * π)) ^ 2 / 34, =
Structured query: Select Payment where Owner = David Jones and Type(Product) = Laptop
Table columns (Owner, Serial Number): David Jones, AK
Table columns (Invoice #, Vendor, Payment): INV10895, MyBuy, $
Table columns (Serial Number, Type, Invoice #): AK, LapTop, INV10895
Name matching: David Jones David Jones = Dave Jones David Jones
IBM Confidential
5 Hard Questions?
Computer programs are explicit, fast and exacting with numbers and symbols. But natural language is highly contextual, ambiguous and often imprecise, using puns, slang, jargon, acronyms, and misused words.
Structured: Where was X born?
Unstructured: One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein's birthplace.
Structured: X ran this?
Unstructured: If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.
6 The Jeopardy! Challenge: A compelling way to conclusively demonstrate the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open Domain; Complex Language; High Precision; Accurate Confidence; High Speed
$200: If you are standing, this is the direction you should look to see the baseboard.
$600: In cell division, mitosis splits the nucleus and cytokinesis splits this liquid cushioning the nucleus.
$1000: The first person mentioned by name in The Man in the Iron Mask is this hero of an earlier book by the same author.
$2000: Of the 4 countries in the world with which the United States does not maintain diplomatic relations, which is the one located farthest north?
7 Watson: an IBM Power 750 System
Watson consists of 2,880 POWER7 processor cores.
One question can take 2 hours on a single processor, but Watson must answer in 2-4 seconds.
15 terabytes of RAM.
500 gigabytes of accumulated human knowledge in natural language text.
No connection to the internet.
80 teraflops.
80 kW power consumption, with 25 tons of air conditioning.
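The two figures on this slide (2 hours on one processor, 2,880 cores) can be checked against the 2-4 second target with a little arithmetic. The sketch below assumes perfectly even work splitting with zero coordination overhead, which is our simplification and not a claim about how Watson actually schedules work; the function name is hypothetical.

```python
# Figures taken from the slide; the perfect-scaling model is an assumption.
SINGLE_CORE_SECONDS = 2 * 60 * 60   # "2 hours on a single processor"
CORES = 2880                        # "2,880 ... processor cores"


def ideal_parallel_seconds(serial_seconds: float, cores: int) -> float:
    """Best-case wall-clock time if the work divides evenly with no overhead."""
    return serial_seconds / cores


latency = ideal_parallel_seconds(SINGLE_CORE_SECONDS, CORES)
print(f"{latency:.2f} s")  # 2.50 s
```

Under that idealized assumption the result falls inside the 2-4 second window, which suggests the core count was sized to the response-time budget. Real systems lose some of this to communication and uneven work.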
8 Broad Domain
We do NOT attempt to anticipate all questions and build specialized databases.
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurred less than 3% of the time; the distribution has a very long tail. And for each of these types, 1000s of different things may be asked.
Even going for the head of the tail will barely make a dent.
*13% are non-distinct (e.g., it, this, these or NA)
Our focus is on reusable Natural Language Processing technology for analyzing volumes of as-is text.
9 DeepQA: The Technology Behind Watson
Massively Parallel Probabilistic Evidence-Based Architecture
Built on: Natural Language Text Analytics; Hypothesis and Evidence Profiles; Machine Learning.
Pipeline: Question -> Question & Topic Analysis (multiple interpretations) -> Question Decomposition -> Hypothesis Generation -> Hypothesis and Evidence Scoring -> Synthesis -> Final Confidence Merging & Ranking -> Answer & Confidence.
Hypothesis Generation: Primary Search over 100s of answer sources; Candidate Answer Generation yields 100s of possible answers.
Hypothesis and Evidence Scoring: Answer Scoring; Evidence Retrieval from evidence sources gathers 1000s of pieces of evidence; Deep Evidence Scoring produces 100,000s of scores from many deep analysis algorithms.
Balance & Combine: learned models help combine and weigh the evidence.
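The stages named on this slide can be illustrated with a toy pipeline: generate candidates, score each one with several independent evidence scorers, then merge the scores with weights into a ranked confidence. Everything below is a hypothetical sketch, not IBM's implementation: the class and function names, the two toy scorers, their hand-picked scores and the weights are all assumptions made for illustration. In DeepQA the weights would come from learned models rather than being typed in.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    answer: str
    scores: dict = field(default_factory=dict)  # scorer name -> score in [0, 1]
    confidence: float = 0.0


def generate_candidates(question, sources):
    """Hypothesis generation: every source proposes answers; duplicates merge."""
    seen = {}
    for source in sources:
        for answer in source(question):
            seen.setdefault(answer, Candidate(answer))
    return list(seen.values())


def score_evidence(question, candidates, scorers):
    """Evidence scoring: each scorer judges each candidate independently."""
    for cand in candidates:
        for name, scorer in scorers.items():
            cand.scores[name] = scorer(question, cand.answer)


def merge_and_rank(candidates, weights):
    """Final merging: weighted average of scores, highest confidence first."""
    total_weight = sum(weights.values())
    for cand in candidates:
        weighted = sum(weights[name] * s for name, s in cand.scores.items())
        cand.confidence = weighted / total_weight
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


# Toy run on the explorer clue used later in the deck (all numbers invented).
question = "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
sources = [lambda q: ["Gary", "Vasco da Gama"]]
scorers = {
    "keyword": lambda q, a: {"Gary": 0.9, "Vasco da Gama": 0.3}[a],
    "temporal": lambda q, a: {"Gary": 0.0, "Vasco da Gama": 1.0}[a],
}
weights = {"keyword": 0.2, "temporal": 0.8}  # stand-in for learned weights

candidates = generate_candidates(question, sources)
score_evidence(question, candidates, scorers)
ranked = merge_and_rank(candidates, weights)
print([(c.answer, round(c.confidence, 2)) for c in ranked])
```

Keeping scorers independent of one another is what makes the architecture easy to parallelize: each (candidate, scorer) pair can run on a different core.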
10 Automatic Learning From Reading
Volumes of Text -> Sentence Parsing -> Syntactic Frames (subject, verb, object) -> Generalization & Statistical Aggregation -> Semantic Frames
Example semantic frames with confidence:
Inventors patent inventions (0.8)
Officials submit resignations (0.7)
People earn degrees at schools (0.9)
Fluid is a liquid (0.6)
Liquid is a fluid (0.5)
Vessels sink (0.7)
People sink 8-balls (0.5) (in pool: 0.8)
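One simple way to picture the "statistical aggregation" step is to count parsed subject-verb-object triples and turn the counts into relative frequencies. The slide does not say how its confidence values were computed, so the relative-frequency estimate, the function name and the toy counts below are all assumptions for illustration only.

```python
from collections import Counter


def aggregate_frames(triples):
    """Estimate P(object | subject, verb) as a relative frequency over triples."""
    pair_counts = Counter((s, v) for s, v, _ in triples)
    frame_counts = Counter(triples)
    return {
        frame: count / pair_counts[frame[:2]]
        for frame, count in frame_counts.items()
    }


# Hypothetical parser output: 9 sentences say people earn degrees, 1 says money.
triples = [("people", "earn", "degree")] * 9 + [("people", "earn", "money")]
frames = aggregate_frames(triples)
print(frames[("people", "earn", "degree")])  # 0.9
```

With enough text, frequent frames rise to high confidence while parsing errors and one-off usages stay in the low-confidence tail.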
11 Evaluating Possibilities and Their Evidence
Clue: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.
Candidate answers: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood
Is(Cytoplasm, liquid) = 0.2
Is(organelle, liquid) = 0.1
Is(vacuole, liquid) = 0.2
Is(plasma, liquid) = 0.7
Many candidate answers (CAs) are generated from many different searches. Each possibility is evaluated according to different dimensions of evidence. Just one piece of evidence is whether the candidate is of the right type; in this case, a liquid.
"Cytoplasm is a fluid surrounding the nucleus"
WordNet: Is_a(Fluid, Liquid)? Learned: Is_a(Fluid, Liquid): yes.
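The type check on this slide can be sketched as a lookup that also follows one learned is-a link. The direct scores are the ones shown on the slide, and the 0.6 for "fluid is a liquid" is the value from the learned frames on slide 10. Treating the passage "Cytoplasm is a fluid" as certain (1.0), multiplying along the chain and taking the maximum are all our assumptions, not a description of Watson's actual type scorer.

```python
# Direct type scores as shown on the slide.
DIRECT_IS = {
    ("cytoplasm", "liquid"): 0.2,
    ("organelle", "liquid"): 0.1,
    ("vacuole", "liquid"): 0.2,
    ("plasma", "liquid"): 0.7,
}
# Assumption: the passage "Cytoplasm is a fluid ..." is taken at face value.
ASSERTED_IS = {("cytoplasm", "fluid"): 1.0}
# Learned frame "Fluid is a liquid (.6)" from slide 10.
LEARNED_IS_A = {("fluid", "liquid"): 0.6}


def type_score(candidate: str, target: str) -> float:
    """Best of the direct score and any one-hop chain through a learned is-a."""
    best = DIRECT_IS.get((candidate, target), 0.0)
    for (cand, middle), p_asserted in ASSERTED_IS.items():
        if cand != candidate:
            continue
        p_learned = LEARNED_IS_A.get((middle, target), 0.0)
        best = max(best, p_asserted * p_learned)
    return best


for name in ("cytoplasm", "plasma", "organelle"):
    print(name, type_score(name, "liquid"))
```

In this toy setup the learned link lifts cytoplasm from 0.2 to 0.6. Plasma still scores higher on type alone, which is consistent with the slide's point that type is just one piece of evidence among many.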
12 Keyword Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.
Keyword matches (clue / passage): In May 1898 / In May; celebrated / celebrated; 400th anniversary / anniversary; Portugal / in Portugal; arrival in / arrived in; India / India; explorer / Gary.
This evidence suggests Gary is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
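A minimal keyword-overlap scorer shows why this kind of evidence is misleading here. The tokenizer, the stopword list and the scoring formula (fraction of clue keywords found in the passage) are assumptions chosen for illustration; the second passage is the Vasco da Gama sentence from the next slide.

```python
import re

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"in", "the", "of", "this", "he", "his", "after", "a", "s"}


def keywords(text: str) -> set:
    """Lowercase alphanumeric tokens with stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def keyword_overlap(clue: str, passage: str) -> float:
    """Fraction of the clue's keywords that also appear in the passage."""
    clue_kw = keywords(clue)
    return len(clue_kw & keywords(passage)) / len(clue_kw)


clue = "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
gary = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
vasco = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

overlap_gary = keyword_overlap(clue, gary)
overlap_vasco = keyword_overlap(clue, vasco)
print(round(overlap_gary, 3), round(overlap_vasco, 3))
```

The irrelevant Gary passage shares five of the clue's nine keywords, while the passage that actually contains the answer shares only one. A system that relied on this scorer alone would pick the wrong answer.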
13 Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.
Search far and wide. Explore many hypotheses. Find and judge evidence. Many inference algorithms.
Temporal Reasoning (date math): May 1898, 400th anniversary / 27th May 1498
Statistical Paraphrasing (paraphrases): arrival in / landed in
GeoSpatial Reasoning (Geo-KB): India / Kappad Beach
explorer / Vasco da Gama
Stronger evidence can be much harder to find and score. The evidence is still not 100% certain.
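The date-math step on this slide reduces to one subtraction: a 400th anniversary celebrated in 1898 points to an event in 1498. The function names and the all-or-nothing score below are hypothetical; a real temporal scorer would presumably tolerate partial or uncertain dates.

```python
def anniversary_year(celebration_year: int, ordinal: int) -> int:
    """Year of the original event given an Nth anniversary celebrated later."""
    return celebration_year - ordinal


def temporal_score(clue_year: int, ordinal: int, passage_year: int) -> float:
    """1.0 if the passage year matches the implied event year, else 0.0."""
    return 1.0 if anniversary_year(clue_year, ordinal) == passage_year else 0.0


print(anniversary_year(1898, 400))      # 1498
print(temporal_score(1898, 400, 1498))  # 1.0
```

This is the kind of evidence keyword matching cannot see: the strings "1898" and "1498" do not match, yet together with "400th anniversary" they agree exactly.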
14 Evidence: Time, Popularity, Source, Classification etc.
Clue: You'll find Bethel College and a Seminary in this holy Minnesota city.
There's a Bethel College and a Seminary in both cities. The system is not weighting location evidence highly enough to give St. Paul the edge.
15 Evidence: Puns
Clue: You'll find Bethel College and a Seminary in this holy Minnesota city.
Humans may get this based on the pun, since St. Paul is a "holy" city. We added a Pun Scorer that discovers and scores pun relationships.
16 DeepQA: Incremental Progress in Answering Precision on the Jeopardy Challenge: 6/ /2010
[Chart: IBM Watson playing in the Winners Cloud. Plotted versions: Baseline 12/06, v0.1 12/07, v0.2 05/08, v0.3 08/08, v0.4 12/08, v0.5 05/09, v0.6 10/09, v0.7 04/10, v0.8 11/10.]
17 Potential Business Applications
Healthcare / Life Sciences: Diagnostic Assistance, Evidence-Based, Collaborative Medicine
Tech Support: Help-desk, Contact Centers
Enterprise Knowledge Management and Business Intelligence
Government: Improved Information Sharing and Security
18 DeepQA in Continuous Evidence-Based Diagnostic Analysis
Considers and synthesizes a broad range of evidence, improving quality and reducing cost.
Evidence inputs: Symptoms, Family History, Patient History, Medications, Tests/Findings, Notes/Hypotheses, plus huge volumes of texts, journals, references, DBs etc.
Diagnosis models report confidence per candidate diagnosis (renal failure, UTI, diabetes, influenza, hypokalemia, esophagitis), broken down by evidence dimension: Findings, Meds, History, Family, Symptoms.
Most Confident Diagnosis: Diabetes, UTI, Influenza
Most Confident Diagnosis: Diabetes and Esophagitis
19 Imagine if you had all the answers you need to win. IBM Business Analytics and Optimization solutions can ensure that you do.
Used by Watson / Related Innovations:
IBM Content Analytics: Natural Language Processing and content analysis leveraging UIMA
InfoSphere BigInsights: Big Data analysis (Hadoop)
IBM Power Systems: Thousands of parallel processes
InfoSphere Warehouse, DB2, Informix, Netezza: Aggregating and storing data and content
InfoSphere Streams: Massively parallel analysis
Business Analytics: BI, Predictive Analytics and more
ECM Solutions: IBM eDiscovery Analyzer, IBM Classification Module, IBM OmniFind Enterprise Search
Workload Optimized Systems: Integrated, Optimized by Workload
IBM Global Business Services: Research, expertise and analytical assets
20 IBM Business Analytics and Optimization Solutions enable you to apply capabilities like those in Watson to optimize your business
IBM is uniquely qualified to help you:
Plan an Information Agenda to align with your business strategy
Master Your Information to ensure it is accurate, relevant and governed
Apply Business Analytics to anticipate and shape business outcomes