Giuseppe Riccardi, Marco Ronchetti
University of Trento
Outline
  - Searching Information
  - Next Generation Search Interfaces
  - Needle
      - E-learning Application
      - Multimedia Docs: Indexing, Search and Presentation
  - Demo
  - Conclusion
Searching Information (Past)
  - Query model
      - State of the art -> bag of words
  - Loyal users
  - Indexing
      - Web pages, pdf, doc...
      - State of the art -> index data structures
  - Evaluation
      - $$$
      - Increasing the quality of the retrieved docs through ranking
  - Crawling
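The bag-of-words query model and the index data structures named on this slide can be illustrated with a minimal sketch. This is not any particular engine's implementation: the function names (`tokenize`, `build_index`, `search`), the whitespace tokenizer, and the AND-only query semantics are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Bag of words: lowercase, split on whitespace, ignore word order.
    return text.lower().split()


def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return the documents that contain every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because the query is treated as an unordered bag of words, "new operator" and "operator new" retrieve the same documents; ranking and crawling are outside the scope of this sketch.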
Where is Mississippi (11/30/2006)
Microsoft Reorg a bulwark against Google Outline (11/30/2006)
Searching Information (Present)
  - Meta-engines
      - Docs are retrieved by de facto standard search engines
  - Query-answer pair extraction (e.g. ask.com)
  - Docs are clustered (many-to-one) (e.g. vivisimo.com)
  - Docs are visualized via multiple views (many-to-many)
Microsoft Reorg a bulwark against Google Outline (11/30/2006)
Companies Microsoft Reorg a bulwark against Google
Topics Microsoft Reorg a bulwark against Google
Stories Microsoft Reorg a bulwark against Google
Read/Center Story Microsoft Reorg a bulwark against Google
Microsoft Reorg a bulwark against Google
What is next?
  - Typical tasks of web users
      - Transactional
      - Navigational
      - Informational
  - Task-driven search
      - Information search is part of a given task
      - Business intelligence (e.g. decision making)
      - E-learning (e.g. student e-tutoring)
  - Vertical search engines
Next Generation Information Search Interfaces
  - Search
      - Multimedia documents
      - Indexing, ranking
      - Large scale
      - Real-time
  - Query
      - Multimodal input (text, gesture, speech)
  - Vertical engine
      - Limited domain (e.g. business, education)
      - Structured & annotated content (not free!)
      - Certification of the results
          - What is the success rate of medication X?
  - Results
      - Multimedia presentation
      - Bandwidth
Needle Research Program
  - Search
      - Multimedia content: audio, video, metadata streams
  - Indexing
      - Video
      - Automatic speech recognition (Unlimited-CSR)
      - Semantic segmentation
      - Topic segmentation
      - Domain ontology
  - Input
      - Natural language query (spoken, text, or multimodal)
  - Presentation
      - Multimedia presentation
      - Usability
Needle: E-Learning Domain
  - Large amount of different kinds of educational resources
      - Video lectures
      - Books
      - Slides
      - Interactive whiteboard streams
  - Goal: information search interfaces for e-learning
Where is the content?
  - Domain: Education
  - MSRI
      - Math-CS research/advanced topics (skewed)
      - Video/audio lectures
      - Presentation viewgraphs from video close shots
  - MIT Courseware (syllabus, lecture notes)
      - Video/audio lectures
      - Wide range of topics
  - University of Trento
      - Video/audio lectures
      - Synchronized PowerPoint presentation, video, and audio
      - Skewed topics (CS & other)
System Components
  - LODE: content creation. Acquisition of video lectures, their synchronization with the learning materials, and their playback in a web browser.
  - Needle: interface for searching through the multimedia content and generating the multimedia documents.
LODE
  - LODE is software for low-cost acquisition of lectures
  - No special requirements for the end user
  - Good-quality audio and video, plus images of the slides projected in class
  - Tools for navigating the lecture (by section title, by other indexes, or through a time slider)
  - Annotation of video lectures with documents
  - One DVD for a 50-hour class (MP4)
  - Delivery: streaming, download, or off-line (DVD)
System Architecture
  [Diagram labels: Transcripts, Slides, Interactive Whiteboard, Forums, Meeting Recordings, Audio, Video]
DB Structure
  - 5 main entities:
      - Actor (the teacher)
      - Event (a lecture)
      - Series (a course)
      - View (part of a document)
      - Document (an MS PowerPoint presentation)
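One possible way to picture the five entities is as plain Python dataclasses. Only the entity names and their glosses come from the slide; every field name and every relationship shown here (a Series holding Events, an Event referencing an Actor and Documents, a Document holding Views, a View carrying a start time) is a hypothetical choice for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    """The teacher."""
    name: str


@dataclass
class View:
    """Part of a document, e.g. one slide (fields are hypothetical)."""
    title: str
    start_seconds: float  # assumed: when the view appears in the lecture


@dataclass
class Document:
    """An MS PowerPoint presentation."""
    filename: str
    views: List[View] = field(default_factory=list)


@dataclass
class Event:
    """A lecture."""
    title: str
    actor: Actor
    documents: List[Document] = field(default_factory=list)


@dataclass
class Series:
    """A course."""
    title: str
    events: List[Event] = field(default_factory=list)
```

A course would then be built by nesting these objects, for example a `Series` titled "Programmazione 2" containing one `Event` per lecture.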
Multimedia Database (2003-Present)
  - Lecture topics: Computer Science 86%, Meteorology 8%, Sociology 6%
  - Languages: Italian 76%, English 24%
Database Statistics
  Course [year] | English speakers | English hours | Italian speakers | Italian hours
  HTL06 WeeNet Summer School [2006] | 10 | - | - | -
  SSSW06 [2006] | 10 | - | - | -
  SSSW05 [2005] | 10 | - | - | -
  Distributed Systems - Design [2005] | 1 | 40 | - | -
  Machine Learning [2006] | 1 | 40 | - | -
  Corso Meteorologia [2005] | 2 | 6 | 8 | 24
  Programmazione 2 [2006] | - | - | 1 | 40
  Science Faculty Seminars [2005] | - | - | 3 | 3
  Architettura degli Elaboratori [2004] | - | - | 1 | 40
  Ingegneria del Software [2004] | - | - | 1 | 40
  Lab. di Algoritmi e Strutture Dati [2004] | - | - | 1 | 15
  Lab. Sistemi Operativi [2004] | - | - | 1 | 15
  Lab. Programmazione 2 [2003] | - | - | 1 | 40
  Programmazione 2 [2003] | - | - | 1 | 40
  Sociologia del Turismo [2003] | - | - | 1 | 20
  Total | 34 | 86 | 19 | 277 | 416
Utterance Length Statistics (in words)
  - Min: 1
  - Max: 78
  - Average: 19.9
  [Histogram: frequency vs. number of words per utterance]
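Statistics of this kind can be computed from a list of transcribed utterances with a few lines of Python. The sketch below is illustrative only: the function name `utterance_length_stats` and the definition of a word as a whitespace-separated token are assumptions, and the slide does not say how the original figures were computed.

```python
from collections import Counter


def utterance_length_stats(utterances):
    # Length of each utterance in words (whitespace tokenization is assumed).
    lengths = [len(u.split()) for u in utterances]
    if not lengths:
        raise ValueError("no utterances given")
    return {
        "min": min(lengths),
        "max": max(lengths),
        "average": sum(lengths) / len(lengths),
        # Histogram data: utterance length -> number of utterances.
        "histogram": Counter(lengths),
    }
```

The `histogram` entry holds the data behind a frequency-versus-words plot like the one on this slide.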
Multimedia Indexing (speech driven)
  [Diagram: the example term "operatore new" (Italian for "new operator") aligned across the Speech, Video, and Slides streams along a time axis]
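The idea of speech-driven indexing, locating a spoken term on the time axis and using that time to select the matching point in the other streams, can be sketched as follows. This is an assumption-laden illustration rather than Needle's implementation: the input format (time-ordered `(start_seconds, token)` pairs from a recognizer, and `(start_seconds, slide_title)` pairs for the slides) and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_time_index(words):
    # words: time-ordered list of (start_seconds, token) from the recognizer.
    # Index: token -> positions in the word list where it occurs.
    index = defaultdict(list)
    for position, (_, token) in enumerate(words):
        index[token.lower()].append(position)
    return index


def find_phrase(words, index, phrase):
    # Return the start time of every occurrence of the phrase in the speech.
    tokens = phrase.lower().split()
    if not tokens:
        return []
    hits = []
    for pos in index.get(tokens[0], []):
        window = [t.lower() for _, t in words[pos:pos + len(tokens)]]
        if window == tokens:
            hits.append(words[pos][0])
    return hits


def slide_at(slide_starts, t):
    # slide_starts: sorted list of (start_seconds, slide_title).
    # Return the slide on screen at time t, or None before the first slide.
    times = [start for start, _ in slide_starts]
    i = bisect_right(times, t) - 1
    return slide_starts[i][1] if i >= 0 else None
```

A query such as "operatore new" is first resolved to timestamps in the speech stream; each timestamp then selects the slide (and, by the same lookup, the video position) shown at that moment.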
Multimedia Indexing (metadata driven)
  [Diagram: the example term "operatore new" (Italian for "new operator") aligned across the Slides and Video streams along a time axis]
Prototype
  - Multimedia data streams (audio, video, ASR, metadata)
  - Indexing
  - Multimedia docs search
  - Present & browse
Demo
  With Angela Fogarolli, Alessandro Bertacco (UNITN)
E-learning evaluation: Kirkpatrick's 4 levels
  - Level 1: Reactions (qualitative)
      - Did they like it? Was the material relevant to their work?
  - Level 2: Learning (quantitative)
      - From formal to informal testing, to team assessment and self-assessment
  - Level 3: Behavior (qualitative)
      - Are the newly acquired skills, knowledge, or attitude being used in the everyday environment of the learner?
  - Level 4: Results (quantitative)
      - Measures the success of the program in terms that managers and executives can understand
  Kirkpatrick, D.L. (1994). Evaluating Training Programs: The Four Levels. San Francisco, CA: Berrett-Koehler.
Future Research
  - Multimedia database with other kinds of resources (interactive whiteboard recordings, discussions, recordings of real and virtual meetings)
  - Ontology linking to offer knowledge-supported search
  - Training of Unlimited-ASR
      - Portable (domains)
  - Spoken language understanding (query/doc)
  - Semantic indexing
  - Evaluation
      - E-learning domain
  - Content creation
      - Inter-university collaborative efforts
Conclusion
  - Information search: past & present
  - Next generation information search
  - Needle
      - Multimedia documents
      - Indexing
      - Search
      - Presentation
  - Content creation
      - Inter-university collaborative efforts