Project Lab, SS 2016, 9CP April 22, 2016, TU Darmstadt, Germany Question Answering Technologies Behind (and with) IBM Watson Steffen Remus, Chinara Mammadova, Chris Biemann chinara_mammadova@yahoo.com remus@lt.informatik.tu-darmstadt.de
Cognitive Computing. Eras of Computing: Tabulating era: computers are designed to count. Programmable era: all functions are programmed and tightly controlled. Cognitive era: systems that learn and get smarter over time (adaptivity, interactivity, iterativity, contextuality). "Surely, it will be hard to understand such a system in detail. But who would want to meticulously control every piece of such a system, when one can simply let it emerge?" -- Chris Biemann. Language Technology is a natural spearhead of cognitive computing: Language is too variable, too volatile and too situational to be covered with static logic. Language is a natural interface to humans, allowing natural interactions with laypeople and thus yielding a lot of data for learning.
Natural Language Understanding: the key to intelligent behavior. Most information and knowledge is encoded in unstructured form in natural language. When humans learn about a new topic, they read about it; machines should do the same. Natural language content on the internet is growing constantly. Natural language is evolving, and natural language processing should account for that.
Outline. This week: General introduction; Demo of technologies (BlueMix Services, Watson Private Instance, 2015 Lab App: Simpsons Q&A); Presentation of possible projects; Discussion, group finding, initial project distribution. Next week: Fixing groups and project distribution; Distribution of logins, technical infrastructure etc.
Watson in Teaching and Education @ TUDA. Summer 2013: Watson Tutorial by Alfio Gliozzo @ TUDA; Seminar Knowledge Engineering for Question Answering Systems (with J. Fürnkranz). Summer 2014: Watson Tutorial 2013; Seminar Knowledge Engineering for Question Answering Systems (with J. Fürnkranz); Project lab Question Answering Systems; Project: Semantic Technologies in IBM Watson. Summer 2015: Project lab Question Answering Technologies Behind IBM Watson; 1st Student Lab in Europe with access to private instances of IBM Watson. Goals: Understand the state of the art in question answering; Hands-on experience with language technology in the context of QA building Watson.
Example 1: Location Questions. http://maggie.lt.informatik.tu-darmstadt.de/lqa/ Simple yet complete QA system on unstructured data: Translate the NL query into an index query to Wikipedia; Name tagging on result pages; Rank locations based on relevance to the query.
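The ranking step of such a location QA pipeline can be sketched as follows. The slide does not say how the demo scores locations, so the scheme here (each tagged location collects the retrieval score of every page it appears on) is an assumption, and `rank_locations` and the sample data are hypothetical.

```python
from collections import defaultdict

def rank_locations(pages):
    """pages: list of (retrieval_score, [location names tagged on that page]).

    Assumed scoring scheme, not the demo's documented method.
    """
    totals = defaultdict(float)
    for score, locations in pages:
        # Each distinct location on a page receives that page's score once.
        for loc in set(locations):
            totals[loc] += score
    # Highest accumulated score first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented retrieval results for illustration only.
pages = [
    (2.0, ["Darmstadt", "Frankfurt"]),
    (1.5, ["Darmstadt"]),
    (0.5, ["Berlin", "Berlin"]),
]
ranking = rank_locations(pages)
```

In a real system the scores would come from the index query and the location lists from the name tagger; both are stubbed out here.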
Example 2: QA over structured data. http://maggie.lt.informatik.tu-darmstadt.de/pal-server/ Parse the question and translate it to a SPARQL query; Run it on public SPARQL endpoints to obtain answers.
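A minimal sketch of the question-to-SPARQL step, assuming a purely pattern-based translation. The demo's actual parser is not described on the slide; the patterns, the `question_to_sparql` helper, and the DBpedia-style `rdfs:`/`dbo:` prefixes (assumed to be predefined by the endpoint) are illustrative only.

```python
import re

# Hypothetical question patterns paired with SPARQL templates.
PATTERNS = [
    (re.compile(r"^what is the capital of (?P<x>.+?)\??$", re.IGNORECASE),
     'SELECT ?answer WHERE {{ ?place rdfs:label "{x}"@en . '
     '?place dbo:capital ?answer . }}'),
    (re.compile(r"^what is the population of (?P<x>.+?)\??$", re.IGNORECASE),
     'SELECT ?answer WHERE {{ ?place rdfs:label "{x}"@en . '
     '?place dbo:populationTotal ?answer . }}'),
]

def question_to_sparql(question):
    """Return a SPARQL query string, or None if no pattern matches."""
    for pattern, template in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            # Note: no escaping of quotes in the entity name; sketch only.
            return template.format(x=match.group("x"))
    return None

query = question_to_sparql("What is the capital of Germany?")
```

The resulting string could then be sent to a public SPARQL endpoint; sending the request is left out here.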
Example 3: Watson-powered Q&A game. http://maggie.lt.informatik.tu-darmstadt.de/simpsons-demo/ Feed data; Create questions and answers; Get and process answers; Present as a game.
2016 Project lab: using Watson Experience Manager...
2016 Project lab: ... to build a cognitive app powered by QA API and BlueMix Services. Pic: http://www.weevermedia.com/app-marketing/users-download-apps
Course Requirements: Demonstrate a cool prototype in a final meeting. Report: How did you achieve the prototype, which techniques did you use? For grading individual members of groups: indicate who did what. Supervision: Support with Watson Q&A; Support with BlueMix services; General support.
Project Lab, SS 2016 Question Answering Technologies Behind IBM Watson So... What is Watson?
Watson as a collection of services
Alchemy API included http://www.alchemyapi.com/products/demo/alchemylanguage
Watson API Example Dialog Service
Bluemix Console
Demo / References: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html http://quappjs.eu-gb.mybluemix.net/ https://www.lt.informatik.tu-darmstadt.de/de/teaching/lectures-and-classes/summer-term-2015/question-answering-technologies-behind-ibm-watson/ http://maggie.lt.informatik.tu-darmstadt.de/watsondemo
Watson Private Instance: One of the Watson cloud services. Configured and accessed through Watson Experience Manager (WEM), a browser-based tool to develop powered-by-Watson apps. Not accessible with a standard BlueMix account: by invitation only.
Watson Experience Manager (WEM)
Watson Private Instance. Prerequisites: Corpus Selection, Data Collection, Data Preparation. WEM (Watson Experience Manager), the interface to Watson: Manage Corpus, Train Watson, Configure Watson, Test Watson.
Corpus Selection: What makes a good use case? 1. Question / Answer Patterns 2. Data with unstructured information 3. Need for evidence and confidence
Best Practices: Narrow the scope. Watson can process the file types doc, html and pdf. Keep the total uploaded file size under 10GB. Rules of thumb: Avoid duplicate documents. A few high-quality documents are more valuable than many redundant low-quality documents.
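These rules of thumb can be checked before uploading a corpus. The following is a minimal sketch, not part of WEM: `check_corpus` is a hypothetical helper, the extension list is taken from this slide, and "10GB" is interpreted as binary gigabytes, which is an assumption.

```python
import hashlib
from pathlib import PurePath

ALLOWED = {".doc", ".html", ".pdf"}   # file types named on the slide
MAX_TOTAL_BYTES = 10 * 1024**3        # "under 10GB"; binary GB assumed

def check_corpus(files, allowed=ALLOWED, max_total=MAX_TOTAL_BYTES):
    """files: dict mapping file name -> bytes content. Returns a report."""
    seen = {}  # content hash -> first file name with that content
    report = {"unsupported": [], "duplicates": [], "total_bytes": 0}
    for name, content in files.items():
        report["total_bytes"] += len(content)
        if PurePath(name).suffix.lower() not in allowed:
            report["unsupported"].append(name)
        # Byte-identical files are flagged as duplicates of the first copy.
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            report["duplicates"].append((name, seen[digest]))
        else:
            seen[digest] = name
    report["within_size_limit"] = report["total_bytes"] < max_total
    return report

# Invented file contents for illustration only.
report = check_corpus({"a.html": b"x", "b.pdf": b"x", "c.png": b"y"})
```

Hashing only catches exact duplicates; near-duplicate documents would need a similarity measure instead.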
Curating Input Data. Prerequisites: Well-segmented documents (titles, sections, paragraphs). Watson understands html, pdf, txt. Titles are very important. Tables / nested tables should be avoided.
WEM Components
WEM Corpus Management
User Roles
Creating Training Data: Create the training data that represents the kinds of questions that users in a production environment will ask and the kinds of responses they will receive. Prepare Watson for training; Collect representative questions; Expert Training; Manage the process of questions and answers.
Representative Questions: Questions that users might ask; they help to train IBM Watson. Two ways to add questions: Question Input tool, Expert Training tool. They help to identify the content. Domain experts review questions. An answer should exist in the documents to match a question. Watson learns from the content where answers are found.
Watson Question Input Tool
Expert Training: Add Question
Expert Training: Explicitly link expected input (the question) to expected output (the answer). Find answers: by matching the question to a similar question (question cluster); by matching the question to a complete answer from a list of formatted answers; by specifying one or more correct answer passages from a separate list of documents.
Expert Training: Match Question
Expert Training: Match an Answer
Expert Training: Specify Answer
Expert Training: Question Review
WEM Corpus Statistics
Test and Deploy
RESTful API
JWatson: Java Watson REST API wrapper https://github.com/tudarmstadt-lt/jwatson
Project Lab Task 1 Get the Answer
Problem Description: For how much does Bart sell his soul to Milhouse? Where is the correct answer? Watson isn't able to extract the answer from this comprehensive paragraph.
Expected Result: For how much does Bart sell his soul to Milhouse?
Task Description. Goal: Provide an appropriate single answer (for example an entity name, number, or date) using the Watson response. Tasks: Post/pre-processing and answer extraction; Prepare an interface to display the answer and evidence; Add documents; Train Watson by adding new questions to the ground truth; Finally, present your work in a great presentation.
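One very simple way to approach the answer extraction step is to map a cue in the question to the kind of answer expected and to pull the first matching span out of the returned passage. This is only a sketch under that assumption: the cue table, the `extract_answer` helper, and the sample passage are invented for illustration and are not Watson output.

```python
import re

# Rough, hypothetical mapping from question cue to expected answer shape.
ANSWER_PATTERNS = {
    "how much": re.compile(
        r"\$\s?\d+(?:[.,]\d+)?|\d+(?:[.,]\d+)?\s?(?:dollars|euros|cents)",
        re.IGNORECASE),
    "how many": re.compile(r"\b\d+(?:[.,]\d+)?\b"),
    "when": re.compile(r"\b(?:1[89]|20)\d{2}\b"),  # four-digit years only
}

def extract_answer(question, passage):
    """Return the first span in passage matching the expected answer type."""
    q = question.lower()
    for cue, pattern in ANSWER_PATTERNS.items():
        if cue in q:
            match = pattern.search(passage)
            return match.group(0) if match else None
    return None  # no known cue: fall back to showing the whole passage

# Invented passage standing in for a paragraph returned by Watson.
passage = "Bart sells his soul to Milhouse for $5 and soon regrets it."
answer = extract_answer(
    "For how much does Bart sell his soul to Milhouse?", passage)
```

A project would likely replace the regular expressions with proper named entity recognition and use Watson's confidence scores to choose among several passages.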
Project Lab: Task 2 Information Network
Example: News Explorer http://www.news-explorer.com
Task Description. Goal: Develop a web app that connects relevant information and presents connected / related information, e.g. named entities. Tasks: Extract key information from texts; Show related information; Prepare an interface to display the answer and evidence; Extract, curate and add documents; Train Watson by adding new questions to the ground truth; Finally, present your work in a great presentation.
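Connecting related information can start from a co-occurrence network: two entities are linked whenever they are mentioned in the same document. The sketch below assumes entity extraction has already been done by some service; `cooccurrence_edges` and the sample entity lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs_entities):
    """docs_entities: list of entity-name lists, one list per document.

    Returns a Counter mapping (entity_a, entity_b) -> number of documents
    in which both occur; pairs are stored in alphabetical order.
    """
    edges = Counter()
    for entities in docs_entities:
        # set() so repeated mentions in one document count only once.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Invented entity lists for illustration only.
docs = [["Arya", "Jon", "Sansa"], ["Jon", "Sansa", "Jon"], ["Arya"]]
edges = cooccurrence_edges(docs)
```

The edge weights can be fed directly into a graph visualization, with heavier edges drawn thicker.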
GoT Network http://gameofthrones.wikia.com
Project Lab: Task 3 Social Watson
Watson in Social Interactions
Task Description. Goal: Develop a mobile application that allows Watson to analyze conversations and present facts with evidence. Tasks: Create a stunning mobile app using IBM Watson; Prepare an interface to display the answer and evidence; Extract, curate and add documents; Train Watson by adding new questions to the ground truth; Finally, present your work in a great presentation.
Project Lab: Task 4 Your Personal Assistant Answers according to individual settings
"Hello Watson, what are the highlights nearby?" "Based on Darmstadt as your location and your preference to visit museums, I recommend going to Hessisches Landesmuseum. I have 5 more recommendations, do you want another one?"
Travel Corpus* Darmstadt
Task Description. Goal: Use personal information of users to improve Watson's answers. Tasks: Focus on travelling assistance; Post/pre-processing and answer extraction; Create user management; Reformulate questions / re-rank answers based on a user's profile / preferences; Prepare an interface to display the answer and evidence; Train Watson by adding questions to the ground truth; Finally, present your work in a great presentation. * http://sirius.clarity-lab.org/
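Re-ranking answers by user preferences could be sketched as adding a fixed bonus to an answer's confidence for every preference it matches. The answer structure (`text`, `confidence`, `tags`), the `rerank` helper, and the boost value are assumptions for illustration, not the format Watson returns.

```python
def rerank(answers, preferences, boost=0.2):
    """Sort answers by confidence plus a bonus per matching preference tag.

    answers: list of dicts with "text", "confidence" and "tags" keys
             (hypothetical structure).
    preferences: set of tags taken from the user's profile.
    """
    def score(answer):
        overlap = len(set(answer["tags"]) & set(preferences))
        return answer["confidence"] + boost * overlap
    return sorted(answers, key=score, reverse=True)

# Invented candidate answers for illustration only.
answers = [
    {"text": "Herrngarten", "confidence": 0.6, "tags": ["park"]},
    {"text": "Hessisches Landesmuseum", "confidence": 0.5, "tags": ["museum"]},
]
ranked = rerank(answers, preferences={"museum"})
```

With a preference for museums, the museum overtakes the answer that had the higher raw confidence; how large the boost should be is something to tune per project.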
Project Lab: Task 5 Expert Finder
Find experts for a particular field of interest http://aminer.org
Task Description. Goal: Search for a topic, find experts in the field. Tasks: Crawl webpages of TU Darmstadt; Identify persons and fields of expertise; Generate pseudo documents and feed them to Watson; Prepare an interface to display the answer and evidence; Finally, present your work in a great presentation.
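Generating pseudo documents might look like the sketch below: everything found about one person is merged into a single HTML document whose title is the person's name, in line with the earlier advice that titles matter and documents should be well segmented. The input format, the `build_pseudo_documents` helper, and the sample names are hypothetical.

```python
from collections import defaultdict
from html import escape

def build_pseudo_documents(mentions):
    """mentions: iterable of (person, field_of_expertise, snippet) tuples.

    Returns a dict mapping person -> one HTML pseudo document.
    """
    by_person = defaultdict(list)
    for person, field, snippet in mentions:
        by_person[person].append((field, snippet))

    docs = {}
    for person, items in by_person.items():
        # One section per field of expertise, with the crawled snippet.
        sections = "".join(
            f"<h2>{escape(field)}</h2><p>{escape(snippet)}</p>"
            for field, snippet in items
        )
        docs[person] = (
            f"<html><head><title>{escape(person)}</title></head>"
            f"<body><h1>{escape(person)}</h1>{sections}</body></html>"
        )
    return docs

# Invented crawl results for illustration only.
docs = build_pseudo_documents([
    ("Jane Doe", "Question Answering", "Works on QA over text corpora."),
    ("Jane Doe", "Semantics", "Teaches a seminar on semantics."),
    ("John Roe", "Robotics", "Leads a robotics lab."),
])
```

Each resulting string can be written to an `.html` file and uploaded as one document of the corpus.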
Project Lab: Task 6 Technical Assistance
Task Description. Goal: Ask questions about technical details. Tasks: Analyze technical documentation and FAQs; Identify concepts; Detect tones in a user's question and respond to the question appropriately; Prepare an interface to display the answer and evidence; Finally, present your work in a great presentation.
Project Lab: Task X Tell us what you'd like to have. Pic: http://www.weevermedia.com/app-marketing/users-download-apps
Further information: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html https://console.ng.bluemix.net http://www.ibm.com/watsonacademy http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/apis/ http://ibm.co/1ycv7qz https://www-304.ibm.com/connections/communities/service/html/communityview?communityuuid=f5d2b281-cc69-4de0-b85e-cd332acc74a4 http://watson4all.blogspot.de/ https://www.youtube.com/playlist?list=plzdyxllnkry-2oGTHZrKILIDz7u4-p3dR
How to Proceed. Next week: Fixing groups and project distribution; Distribution of logins, technical infrastructure etc. About every two weeks: Meeting with student assistant; Report progress; Fix issues and setbacks. About once a month: Meet with lab coordinators and student assistants; Report progress; Collect feedback and set next goals.
Brainstorming: 1) Expert finder: crawl TU DA pages, identify both persons and content, set up a QA system for finding experts within TU Darmstadt. 2) News Tracker (English): extract and visualize the network of the day/week from English daily news, like NoD. Services: Relationship Extraction, Alchemy API, IBM Graph?