Visual search: what's next? Cees Snoek, University of Amsterdam and Euvision Technologies, The Netherlands.


1 Visual search: what's next?
Cees Snoek, University of Amsterdam, The Netherlands; Euvision Technologies, The Netherlands
Problem statement
Diagram labels: US flag, Tree, Aircraft, Humans, Dog, Smoking, Building, Basketball, Table, Multimedia Archives, Tennis, Mountain, Machine, Fire

2 The science of labeling
To understand anything in science, things need a name that is universally recognized.
Categories: living organisms, chemical elements, human genome.
Worldwide endeavor in naming visual information
How difficult is the problem?
Human vision consumes 50% of brain power (Van Essen, Science).

3 Naming visual information (slide credit: Andrew Zisserman)
Concept detection (focus of today's talk): Does the image contain an airplane?
Object localization: Where is the airplane (if any)?
Object segmentation: Which pixels are part of an airplane (if any)?
Concept detection in a nutshell (visualization by Jasper Schulte)

4 Variation in appearance
So many images of one thing, due to minor differences in illumination, background, occlusion, viewpoint, ...
This is the sensory gap.
MediaMill concept detection

5 Step 1: sampling points (standard) (Mikolajczyk, IJCV 2005; Lazebnik, CVPR 2006; Zhang, IJCV 2007)
Orientation and scale of concepts change.
Salient point methods robustly detect regions.
Preferred for robust concept detection.
Harris-Laplace, dense sampling, spatial pyramid.
Step 2: point description (Burghouts, CVIU 2009; van de Sande, PAMI 2010)
Lowe's SIFT descriptor measures intensity only, but the illumination of concepts changes, so detection suffers from unstable region description.
Color descriptors: increase illumination invariance, increase discriminative power.
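Steps 1 and 2 can be illustrated with a minimal sketch. It assumes a regular grid for dense sampling (the 6-pixel step is a hypothetical value, not taken from the slides) and uses the standard opponent color transform as one example of a color representation with some illumination invariance. The function names are illustrative only, and no SIFT computation is shown.

```python
import math


def dense_grid(width, height, step=6):
    """Return (x, y) sample points on a regular grid, `step` pixels apart.

    The step size is an assumed value chosen for illustration.
    """
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def rgb_to_opponent(r, g, b):
    """Map one RGB pixel to the opponent color space (O1, O2, O3).

    O1 and O2 carry color information, O3 carries intensity.
    """
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3
```

Adding the same offset to R, G and B (a light intensity shift) cancels out in O1 and O2, so only the intensity channel O3 changes. For example, `rgb_to_opponent(10, 20, 30)` and `rgb_to_opponent(15, 25, 35)` agree in their first two components.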

6 Illumination invariance (Van de Sande, PAMI 2010)
Invariance properties of the descriptors used. Entries per descriptor are listed in this column order: light intensity change; light intensity shift; light intensity change and shift; light color change; light color change and shift. A "?" marks an entry that is not legible here.
SIFT: ? ? ? ? ?
OpponentSIFT: +/- + +/- +/- +/-
C-SIFT: ? ? ? +/- +/-
rgSIFT: ? + + +/- +/-
RGB-SIFT: + + + + +
Step 3: descriptor quantization (Leung and Malik, IJCV 2001; Sivic and Zisserman, ICCV 2003; van Gemert, CVIU 2010)
Codebook model: create a codeword vocabulary; discretize image with codewords; represent image as codebook histogram.
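The codebook model of Step 3 can be sketched as follows. This is a minimal illustration, assuming the codeword vocabulary has already been created (for instance by clustering, which is not shown) and assuming hard assignment of each descriptor to its nearest codeword. The function names are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def codebook_histogram(descriptors, codebook):
    """Represent an image as an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

With a toy codebook `[(0, 0), (10, 10)]` and descriptors `[(1, 1), (9, 9), (8, 10), (0, 2)]`, two descriptors fall on each codeword. The resulting fixed-length histogram is what a classifier in Step 4 would consume, regardless of how many points were sampled in the image.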

7 Step 4a: Efficient classification (Maji et al., CVPR 2008)
For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). Saves time and space!
Step 4b: GPU-empowered classification (Van de Sande, TMM 2011)
[Plot: time (s) versus total feature vector length. Legend: 1x CPU Opteron 250 (2.4 GHz); 1x CPU Core i7 920 (2.66 GHz); 4x CPU Opteron 250 (2.4 GHz); 4x CPU Core i7 920 (2.66 GHz); 25x CPU Opteron 250 (2.4 GHz); ?x CPU Opteron 250 (2.4 GHz); 1x GPU GeForce GTX260 (27 cores)]
GPU speed-up: ?x over a single CPU, 10x over a quad core, 2x over a 49-CPU cluster.
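The idea in Step 4a can be sketched in a few lines. The sketch assumes the usual form of an intersection-kernel classifier, in which the decision function splits into one function per feature dimension, h_i(s) = sum over support vectors of c_l * min(s, x_l,i). That function is piecewise linear in s, so it can be sampled once at uniformly spaced points and evaluated later by interpolation. All names, and the choice of table size, are illustrative assumptions.

```python
def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def h_i(s, coeffs, sv_values):
    """Exact per-dimension function; cost grows with the number of support vectors."""
    return sum(c * min(s, v) for c, v in zip(coeffs, sv_values))


def build_table(coeffs, sv_values, lo, hi, segments):
    """Sample h_i at segments + 1 uniformly spaced points in [lo, hi]."""
    step = (hi - lo) / segments
    return [h_i(lo + k * step, coeffs, sv_values) for k in range(segments + 1)]


def h_i_approx(s, table, lo, hi):
    """Approximate h_i(s) by linear interpolation in the precomputed table."""
    segments = len(table) - 1
    t = (min(max(s, lo), hi) - lo) / (hi - lo) * segments
    k = min(int(t), segments - 1)
    frac = t - k
    return table[k] + frac * (table[k + 1] - table[k])
```

Evaluating `h_i_approx` costs a constant amount of work per dimension, independent of the number of support vectors, and the table replaces the stored support vector values. That is the time and space saving the slide refers to.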

8 MediaMill concept detection
Evaluation best treated by TRECVID
Situation in 2000: various concept definitions; specific and small data sets; hard to compare methodologies.
Diagram labels: Researchers, NIST
Since 2001: worldwide evaluation by NIST (TRECVID benchmark).

9 NIST TRECVID benchmark
Promote progress in video retrieval research: provide common dataset; challenging tasks; independent evaluation protocol; forum for researchers to compare results.
TRECVID 2009 results (Snoek et al., TRECVID 2009): best performer for 10 out of 20 concepts; best overall performer.

10 MediaMill video search engine (Snoek, TMM 2007)
CrossBrowser combines query results and time (axes: time, ranked results).
TRECVID 2010: Internet Archive web videos.

11 TRECVID 2010 results (Snoek et al., TRECVID 2010)
Best performer for 6 out of 30 concepts; best overall performer.
Are we making progress? (Snoek et al., TRECVID 04-10)
>1000 other detection systems

12 Community myths or facts?
Chua et al., ACM Multimedia 2007: Video search is practically solved and progress has only been incremental.
Yang and Hauptmann, ACM CIVR 2008: Current solutions are weak and generalize poorly.
We have done an experiment with two video search engines, from 2006 and 2009: the MediaMill Challenge 2006 system and the MediaMill TRECVID 2009 system.
How well do they detect 36 LSCOM concepts?

13 Four video data set mixtures
Training: TRECVID 2005 (broadcast news), TRECVID 2007 (documentary video).
Testing: broadcast news, documentary video.
Mixtures: within domain, cross domain.
Snoek & Smeulders, IEEE Computer 2010: Performance doubled in just 3 years for 36 concept detectors, even when using training data of different origin. Vocabulary still limited.

14 What's next? TOWARDS A HUMAN-SIZE VOCABULARY
How to obtain massive amounts of labeled examples?
5 billion, but only human experts provide good quality examples.

15 Learning social tag relevance by neighbor voting (Xirong Li et al., TMM 2009)
Exploit consistency in the tagging behavior of different users for visually similar images.
Updated tag relevance: objective tags are identified and reinforced.
Based on 3.5 million images downloaded from Flickr.
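Neighbor voting can be sketched as a vote count with a correction for how common a tag is. The sketch below is one reading of the idea, not a reproduction of the published system: it assumes the k visually nearest neighbors of an image have already been retrieved (the visual search itself is not shown), and it subtracts the number of votes a tag would be expected to receive from k randomly drawn images. Function and parameter names are hypothetical.

```python
def tag_relevance(tag, neighbor_tag_sets, tag_frequency, collection_size):
    """Score a tag by neighbor votes minus its expected votes by chance.

    neighbor_tag_sets: tag sets of the k visually nearest neighbors
                       (assumed precomputed).
    tag_frequency:     number of images in the collection labeled with `tag`.
    collection_size:   total number of images in the collection.
    """
    k = len(neighbor_tag_sets)
    votes = sum(1 for tags in neighbor_tag_sets if tag in tags)
    prior = k * tag_frequency / collection_size
    return votes - prior
```

A tag such as "bridge" that is rare overall but shared by most visual neighbors gets a high score, while a very frequent, subjective or non-visual tag (a year, for example) gets few votes relative to its prior and is pushed down. This matches the slide's remark that objective tags are identified and reinforced.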

16 What's next? TOWARDS MORE PRECISE RECOGNITION
Visual synonyms for landmark images (Efstratios Gavves et al., CVIU 2012)
Word pair types: homonyms and metonyms (same word, consistent location; same word, inconsistent location); synonyms (different word, consistent location); random pairs (different word, inconsistent location).
Legend: color = visual word a feature is mapped to; shape = location and scale of feature.

17 All Souls College, Oxford: query and retrieval results (Efstratios Gavves et al., CVIU 2012)
Baseline (200,000 words); synonym baseline (2,000 words).
Face recognition too (Snoek 2011, patent pending).

18 Complex events feasible?
Flash mob gathering; changing a vehicle tire; attempting a board trick.
How to recognize these in a collection of 100,000 Internet clips?
What's next? APPLICATIONS

19 Recent applications
SenseCam lifelogging (Byrne et al., MMTA 2010)
TV archive search (Huurnink et al., TMM 2012)
Crowdsourcing online concert video (Freiburg et al., MM 2011)
Mobile video retrieval interfaces (Hürst et al., MMM 2011)
Conclusion
Visual search is maturing quickly. What's next? A human-size vocabulary, more precise recognition, and great application potential.

20 Thank you
dr. Cees Snoek, twitter.com/cgmsnoek, cgmsnoek
