Paul J. Ledak Vice President, IBM Research. 2011 IBM Corporation





Watson answers a grand challenge
Can we design a computing system that rivals a human's ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of knowledge in real time?

Want to Play Chess or Just Chat?
Chess
A finite, mathematically well-defined search space
Limited number of moves and states
All the symbols are completely grounded in the mathematical rules of the game
Human Language
Words by themselves have no meaning
Only grounded in human cognition
Words navigate, align and communicate an infinite space of intended meaning
Computers cannot ground words to human experiences to derive meaning

Easy Questions?
(ln(12,546,798 * π))^2 / 34,567.46 = 0.00885
Select Payment where Owner = "David Jones" and Type(Product) = "Laptop"
Owner         Serial Number
David Jones   45322190-AK
Invoice #     Vendor   Payment
INV10895      MyBuy    $104.56
Serial Number   Type     Invoice #
45322190-AK     LapTop   INV10895
David Jones = David Jones
Dave Jones ≠ David Jones
IBM Confidential
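The two "easy" questions on this slide can be checked directly. The following is a minimal sketch, assuming the three small tables are held as Python lists of dictionaries; the names `owners`, `products`, `invoices` and `payments_for` are illustrative and do not come from the slides. It evaluates the arithmetic expression and runs the structured lookup as an exact-match join.

```python
import math

# The arithmetic question from the slide: (ln(12,546,798 * pi))^2 / 34,567.46
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46
print(round(value, 5))  # 0.00885

# The three tables from the slide, as plain records (illustrative layout).
owners = [{"owner": "David Jones", "serial": "45322190-AK"}]
products = [{"serial": "45322190-AK", "type": "LapTop", "invoice": "INV10895"}]
invoices = [{"invoice": "INV10895", "vendor": "MyBuy", "payment": 104.56}]


def payments_for(owner, product_type):
    """Join owner -> serial number -> invoice -> payment using exact matches."""
    serials = {o["serial"] for o in owners if o["owner"] == owner}
    invoice_ids = {
        p["invoice"]
        for p in products
        if p["serial"] in serials and p["type"].lower() == product_type.lower()
    }
    return [i["payment"] for i in invoices if i["invoice"] in invoice_ids]


print(payments_for("David Jones", "Laptop"))  # [104.56]
print(payments_for("Dave Jones", "Laptop"))   # [] (the nickname is not an exact match)
```

Both answers are exact and fast because every symbol is well defined. The empty result for "Dave Jones" hints at where the difficulty starts once names and language vary.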

Hard Questions?
Computer programs are explicit, fast and exacting with numbers and symbols. But natural language is highly contextual, ambiguous and often imprecise, using puns, slang, jargon, acronyms and misused words.
Structured: Where was X born?
Unstructured: One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein's birthplace.
Structured: X ran this?
Unstructured: If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.

The Jeopardy! Challenge: A compelling way to conclusively demonstrate the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open Domain
Complex Language
High Precision
Accurate Confidence
High Speed
$200: If you are standing, it is the direction you should look to see the baseboard.
$600: In cell division, mitosis splits the nucleus and cytokinesis splits this liquid cushioning the nucleus.
$1000: The first person mentioned by name in The Man in the Iron Mask is this hero of an earlier book by the same author.
$2000: Of the 4 countries in the world with which the United States does not maintain diplomatic relations, the one that is farthest north.

Watson: an IBM Power 750 System
Watson consists of 2,880 POWER7 processor cores
One question can take 2 hours on a single processor, but Watson must answer in 2-4 seconds
15 terabytes of RAM
500 gigabytes of accumulated human knowledge in natural language text
No connection to the internet
80 teraflops
80 kW power consumption
25 tons of air conditioning
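The figures on this slide fit together with simple arithmetic. As a rough sketch, assuming the work for one question divides perfectly evenly across all cores (an idealization introduced here, not a claim from the slides), the 2-hour single-processor time shrinks to a response time inside the 2-4 second window:

```python
# Figures taken from the slide.
single_core_seconds = 2 * 60 * 60   # "2 hours on a single processor"
cores = 2880                        # "2,880 POWER7 processor cores"

# ASSUMPTION: perfect linear scaling with no coordination overhead.
ideal_seconds = single_core_seconds / cores
print(ideal_seconds)  # 2.5

# Speedup needed to meet each end of the 2-4 second target.
speedup_for_4s = single_core_seconds / 4
speedup_for_2s = single_core_seconds / 2
print(speedup_for_4s, speedup_for_2s)  # 1800.0 3600.0
```

Under that idealized assumption the core count sits between the speedups needed for the slow and fast ends of the target, which is consistent with the emphasis on massive parallelism.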

Broad Domain
We do NOT attempt to anticipate all questions and build specialized databases.
In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent type occurs less than 3% of the time, so the distribution has a very long tail. And for each of these types, thousands of different things may be asked.
Even going for the head of the tail will barely make a dent.
*13% are non-distinct (e.g., it, this, these or NA)
Our focus is on reusable Natural Language Processing technology for analyzing volumes of as-is text.

DeepQA: The Technology Behind Watson
Massively Parallel Probabilistic Evidence-Based Architecture
Built on: Natural Language Text Analytics; Solution Hypothesis and Evidence Profiles; Machine Learning
Architecture diagram, in order of flow:
Question
Question & Topic Analysis (Multiple Interpretations)
Question Decomposition
Hypothesis Generation: Primary Search over Answer Sources (100s of sources), then Candidate Answer Generation (100s of possible answers)
Hypothesis and Evidence Scoring: Answer Scoring, Evidence Retrieval from Evidence Sources (1000s of pieces of evidence), Deep Evidence Scoring (100,000s of scores from many Deep Analysis Algorithms)
Synthesis
Final Confidence Merging & Ranking: Learned Models help balance, combine and weigh the evidence
Answer & Confidence
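The flow on this slide (generate many hypotheses, score each against evidence, then merge the scores into a confidence) can be sketched in a few lines. This is a toy illustration only: the function names, the two stand-in sources and scorers, the weights, and the use of a logistic function for merging are all assumptions made for the example, not a description of IBM's implementation.

```python
import math


def generate_candidates(question, sources):
    """Hypothesis generation: collect every answer that any source proposes."""
    candidates = set()
    for source in sources:
        candidates.update(source(question))
    return sorted(candidates)


def score_evidence(question, candidate, scorers):
    """Evidence scoring: build one evidence profile (scorer name -> score)."""
    return {name: scorer(question, candidate) for name, scorer in scorers.items()}


def merge_and_rank(profiles, weights, bias=0.0):
    """Merging and ranking: weighted sum of scores squashed into a 0-1 confidence."""
    ranked = []
    for candidate, profile in profiles.items():
        z = bias + sum(weights[name] * score for name, score in profile.items())
        ranked.append((candidate, 1.0 / (1.0 + math.exp(-z))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# HYPOTHETICAL sources and scorers standing in for search and deep analysis.
sources = [lambda q: ["Vasco da Gama", "Gary"], lambda q: ["Vasco da Gama"]]
scorers = {
    "keyword": lambda q, c: 0.9 if c == "Gary" else 0.6,
    "temporal": lambda q, c: 1.0 if c == "Vasco da Gama" else 0.0,
}
weights = {"keyword": 0.5, "temporal": 2.0}  # in a real system these would be learned

question = "Which explorer's arrival in India did Portugal celebrate in May 1898?"
candidates = generate_candidates(question, sources)
profiles = {c: score_evidence(question, c, scorers) for c in candidates}
ranking = merge_and_rank(profiles, weights, bias=-1.0)
print(ranking[0][0])  # Vasco da Gama
```

The design point is that no single scorer decides the answer. Each contributes one dimension of an evidence profile, and the weights determine how much each dimension counts.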

Automatic Learning From Reading
Volumes of Text -> Sentence Parsing -> Syntactic Frames (subject, verb, object) -> Generalization & Statistical Aggregation -> Semantic Frames
Example semantic frames with their scores:
Inventors patent inventions (0.8)
Officials submit resignations (0.7)
People earn degrees at schools (0.9)
Fluid is a liquid (0.6)
Liquid is a fluid (0.5)
Vessels sink (0.7)
People sink 8-balls (0.5) (in pool: 0.8)
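The statistical aggregation step can be illustrated with a toy count over parsed triples. The slides do not say how the frame scores are computed, so this sketch assumes one simple definition: the score of a frame is the fraction of times a given subject and verb appear with that object. The triples and the `frame_scores` helper are invented for the example.

```python
from collections import Counter

# HYPOTHETICAL parser output: (subject, verb, object) triples from many sentences.
triples = (
    [("person", "earn", "degree")] * 9
    + [("person", "earn", "money")] * 1
    + [("inventor", "patent", "invention")] * 4
    + [("inventor", "patent", "process")] * 1
)


def frame_scores(triples):
    """Score each frame as count(subject, verb, object) / count(subject, verb)."""
    pair_counts = Counter((s, v) for s, v, _ in triples)
    frame_counts = Counter(triples)
    return {
        frame: count / pair_counts[frame[:2]]
        for frame, count in frame_counts.items()
    }


scores = frame_scores(triples)
print(scores[("person", "earn", "degree")])          # 0.9
print(scores[("inventor", "patent", "invention")])   # 0.8
```

Frames that recur across large volumes of text end up with high scores, while rare or accidental co-occurrences stay low.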

Evaluating Possibilities and Their Evidence
Clue: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus.
Candidate answers: Organelle, Vacuole, Cytoplasm, Plasma, Mitochondria, Blood
Is(Cytoplasm, liquid) = 0.2
Is(organelle, liquid) = 0.1
Is(vacuole, liquid) = 0.2
Is(plasma, liquid) = 0.7
Many candidate answers (CAs) are generated from many different searches.
Each possibility is evaluated according to different dimensions of evidence.
Just one piece of evidence is whether the candidate is of the right type, in this case a liquid.
"Cytoplasm is a fluid surrounding the nucleus"
WordNet: Is_a(Fluid, Liquid)?
Learned: Is_a(Fluid, Liquid) yes.
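A small sketch shows why type evidence is only one dimension. The type scores below are the Is(candidate, liquid) values from the slide. The passage-support scores and the weights are hypothetical numbers chosen for illustration, standing in for evidence such as a passage describing cytoplasm as a fluid surrounding the nucleus.

```python
# Type-match scores taken from the slide: Is(candidate, liquid).
type_score = {"cytoplasm": 0.2, "organelle": 0.1, "vacuole": 0.2, "plasma": 0.7}

# HYPOTHETICAL scores for how well supporting passages match the rest of the clue.
passage_score = {"cytoplasm": 0.9, "organelle": 0.2, "vacuole": 0.2, "plasma": 0.1}

# HYPOTHETICAL weights; a real system would learn these from training questions.
WEIGHTS = {"type": 0.4, "passage": 0.6}


def combined(candidate):
    """Weighted sum over the two evidence dimensions."""
    return (WEIGHTS["type"] * type_score[candidate]
            + WEIGHTS["passage"] * passage_score[candidate])


best_by_type = max(type_score, key=type_score.get)
best_combined = max(type_score, key=combined)
print(best_by_type)   # plasma
print(best_combined)  # cytoplasm
```

Taken alone, the type check favors plasma. Once a second dimension of evidence is weighed in, cytoplasm comes out ahead.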

Keyword Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.
Keyword matching (clue term / passage term):
celebrated / celebrated
In May 1898 / In May
400th anniversary / anniversary
Portugal / in Portugal
arrival in / arrived in
India / India
explorer / Gary
This evidence suggests "Gary" is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
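A minimal keyword-overlap scorer makes the weakness concrete. This sketch uses the clue and both passages quoted in these slides. The tokenizer, the stopword list and the scoring formula (shared keywords divided by clue keywords) are simplifications assumed for the example.

```python
import re

STOPWORDS = {"in", "the", "of", "this", "s", "on", "he", "his", "after", "a"}


def keywords(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_score(clue, passage):
    """Fraction of the clue's keywords that also appear in the passage."""
    clue_terms = keywords(clue)
    return len(clue_terms & keywords(passage)) / len(clue_terms)


clue = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
gary = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
gama = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach."

gary_score = keyword_score(clue, gary)
gama_score = keyword_score(clue, gama)
print(sorted(keywords(clue) & keywords(gary)))
# ['anniversary', 'celebrated', 'india', 'may', 'portugal']
print(gary_score > gama_score)  # True
```

The passage about Gary shares five of the nine clue keywords, while the passage about Vasco da Gama shares only one, so surface matching alone points to the wrong answer.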

Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach.
Search far and wide. Explore many hypotheses. Find and judge evidence. Many inference algorithms.
Deeper matching (clue term / passage term, with the reasoning that links them):
May 1898, 400th anniversary / 27th May 1498 (Temporal Reasoning, Date Math)
arrival in / landed in (Statistical Paraphrasing, Paraphrases)
India / Kappad Beach (GeoSpatial Reasoning, Geo-KB)
explorer / Vasco da Gama
celebrated and Portugal have no direct match in the passage
Stronger evidence can be much harder to find and score.
The evidence is still not 100% certain.
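The date math on this slide is a one-line calculation once the clue has been analyzed. As a sketch, assuming question analysis has already extracted the celebration date and the anniversary count from the clue, and the event date from the passage (the function and variable names are illustrative):

```python
from datetime import date


def anniversary_origin_year(celebration_year, nth_anniversary):
    """Year of the original event implied by an nth anniversary."""
    return celebration_year - nth_anniversary


# Values extracted from the clue: "In May 1898 ... the 400th anniversary".
clue_year, clue_month, nth = 1898, 5, 400

# Date extracted from the passage: "On the 27th of May 1498".
passage_event = date(1498, 5, 27)

origin_year = anniversary_origin_year(clue_year, nth)
temporal_match = (origin_year == passage_event.year
                  and clue_month == passage_event.month)
print(origin_year)     # 1498
print(temporal_match)  # True
```

No keyword in the clue mentions 1498, so this agreement is only found by reasoning over the dates, which is why it carries more weight than surface matching.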

Evidence: Time, Popularity, Source, Classification etc.
Clue: You'll find Bethel College and a Seminary in this holy Minnesota city.
There's a Bethel College and a Seminary in both cities. The system is not weighing location evidence high enough to give St. Paul the edge.

Evidence: Puns
Clue: You'll find Bethel College and a Seminary in this holy Minnesota city.
Humans may get this based on the pun, since St. Paul is a holy city. We added a Pun Scorer that discovers and scores pun relationships.

DeepQA: Incremental Progress in Answering Precision on the Jeopardy! Challenge: 6/2007-11/2010
Chart labels: IBM Watson, Playing in the Winners Cloud
Versions plotted:
v0.8 11/10
v0.7 04/10
v0.6 10/09
v0.5 05/09
v0.4 12/08
v0.3 08/08
v0.2 05/08
v0.1 12/07
Baseline 12/06

Potential Business Applications
Healthcare / Life Sciences: Diagnostic Assistance, Evidence-Based, Collaborative Medicine
Tech Support: Help-desk, Contact Centers
Enterprise Knowledge Management and Business Intelligence
Government: Improved Information Sharing and Security

DeepQA in Continuous Evidence-Based Diagnostic Analysis
Considers and synthesizes a broad range of evidence, improving quality and reducing cost
Evidence inputs: Symptoms, Family History, Patient History, Medications, Tests/Findings, Notes/Hypotheses
Diagnosis Models: confidence by evidence dimension (Findings, Meds, History, Family, Symptoms)
Candidate diagnoses: Renal failure, UTI, Diabetes, Influenza, hypokalemia, esophagitis
Most Confident Diagnosis: Diabetes, UTI, Influenza
Most Confident Diagnosis: Diabetes and Esophagitis
Huge Volumes of Texts, Journals, References, DBs etc.

Imagine if you had all the answers you need to win
IBM Business Analytics and Optimization solutions can ensure that you do
Used by Watson / Related Innovations
IBM Content Analytics: Natural Language Processing and content analysis leveraging UIMA
InfoSphere BigInsights: Big Data analysis (Hadoop)
IBM Power Systems: Thousands of parallel processes
InfoSphere Warehouse, DB2, Informix, Netezza: Aggregating and storing data and content
InfoSphere Streams: Massively parallel analysis
Business Analytics: BI, Predictive Analytics and more
ECM Solutions: IBM eDiscovery Analyzer, IBM Classification Module, IBM OmniFind Enterprise Search
Workload Optimized Systems: Integrated, Optimized by Workload
IBM Global Business Services: Research, expertise and analytical assets

IBM Business Analytics and Optimization Solutions enable you to apply capabilities like those in Watson to optimize your business
IBM is uniquely qualified to help you:
Plan an Information Agenda to align with your business strategy
Master Your Information to ensure it is accurate, relevant and governed
Apply Business Analytics to anticipate and shape business outcomes

THANK YOU