Total Cluster: A person agnostic clustering method for broadcast videos

Size: px

Start display at page:

Download "Total Cluster: A person agnostic clustering method for broadcast videos"

Darcy Allen
7 years ago
Views:

1 Total Cluster: A person agnostic clustering method for broadcast videos Makarand Tapaswi, Omkar M. Parkhi, Esa Rahtu, Eric Sommerlade, Rainer Stiefelhagen, Andrew Zisserman

2 Face track clustering Input Video with face tracks Output Face track clusters 2

3 Why cluster face tracks? Large number of videos Good basis for manual or automatic labelling Facilitates video retrieval or summarization 3

4 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

, 2007) (Cinbis et al., 2011) (Khoury et al., 2013) (Wu et al.

5 Related work Person Identification in TV series (Everingham et al., 2006) (Cour et al., 2009) (Bojanowski et al., 2013) (Ramanathan et al., 2014) Face Clustering (Ramanan et al., 2007) (Cinbis et al., 2011) (Khoury et al., 2013) (Wu et al., 2013, 2013b) Novelty of the current work Error free clustering Utilize video-editing structure Use state-of-the-art face track descriptor 5

6 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

7 Video editing overview Episode Scenes Threads Shots 7

8 Shots and Threads shot thread (n.) (n.) A sequence set of shots of taken uninterrupted from the frames same camera for a short with the period same of viewing time angle and content A B A C D C Scrubs S01 E01 D 8

9 Scenes scene (n.) A sequence of shots in a video with the same characters, at the same location and time Scene 4 Scene 5 Scene 6 Scene boundary detection (Tapaswi et al. 2014) Maximize within-scene visual similarity Don t break shot threads 9

10 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Face tracking Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

11 Face tracks Face tracking Face track descriptor (Parkhi et al. 2014) frontal+profile VJ detector multi-pose head detector KLT tracking false positive removal dense SIFT Fisher encoding discriminative dimensionality reduction 11

12 Negative constraints do NOT merge these pairs! Tracks in the same frame (Cinbis et al. 2011, Wu et al. 2013) Tracks in a thread A B A C B 12

13 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

14 Within scene clustering for tracks not in negative pairs Strong face similarity T1 T2 In a thread, area overlap relaxed face similarity 128D T4 T3 T1 T2 Dr. Kelso Turk 14

15 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

16 Cluster descriptor compare frontal vs. frontal, profile vs. profile Frontal Frontal descriptor Mixed Both descriptors Profile Profile descriptor 16

17 Full episode clustering Exemplar SVMs Cluster across scenes one positive cluster against negative clusters + YouTube Faces score clusters with e-svm merge high-scoring 17

18 Recap Video editing structure A B A B Do not merge track pairs T1 T2 Scene level clustering Turk Episode level clustering 18

19 Overview Editing structure in videos Shots, threads and scenes Using editing structure for clustering Negative pairs Scene level clustering Episode level clustering Dataset and evaluation Scrubs, Buffy Weighted clustering purity Clustering results T1 T4 T2 T3

20 Data set episodes 1..5, 23 ~20 minutes each sitcom, interns at a hospital episodes 1..6 ~40 minutes each supernatural drama, action average episode SCRUBS BUFFY # named characters # tracks # tracks in thread # do not merge pairs in shots in threads

21 Evaluation metrics Goal: minimize #clusters, don t make errors! WCP: weighted clustering purity Cluster 1: Purity p 1 = 3/3 Cluster 2: Purity p 2 = 4/5 WCP: 1 N c n c p c = Trade-off WCP vs. #clusters Cluster 1 Purity: 1 Cluster 2 Purity: Purity # clusters 21

22 #Clusters Within scene clustering #Clusters SCRUBS Tracks Best HAC Ours BUFFY SCR-1 SCR-2 SCR-3 SCR-4 SCR BF-1 BF-2 BF-3 BF-5 BF-6 Lower number of purity = 1 Prevent errors 22

23 #Clusters Full episode clustering #Clusters SCRUBS Tracks Best HAC Ours BUFFY 100 SCR-1 SCR-2 SCR-3 SCR-4 SCR BF-1 BF-2 BF-3 BF-5 BF-6 Full episode clustering hard to choose threshold for HAC proposed method reduces purity 1 23

24 Summary 1 Purity Minimize #clusters at very high purity Affords manual and automatic annotation Leverage video editing structure Face tracks are not independent data points 4x number of do not merge track pairs A B # clusters A Stage-wise clustering Small number of characters in one scene At purity 1, #clusters is about half #tracks 24

25 Thank you! Questions? Intro, Related work, Video-editing, Negative pairs Scene clustering, Episode clustering, Dataset, Evaluation

Interactive person re-identification in TV series

Interactive person re-identification in TV series Mika Fischer Hazım Kemal Ekenel Rainer Stiefelhagen CV:HCI lab, Karlsruhe Institute of Technology Adenauerring 2, 76131 Karlsruhe, Germany E-mail: {mika.fischer,ekenel,rainer.stiefelhagen}@kit.edu