Big Data: Image & Video Analytics
How it could support Archiving & Indexing & Searching
Dieter Haas, IBM Deutschland GmbH
The Big Data Wave
- 60% of internet traffic is multimedia content (images and videos)
- 100 hours of video are uploaded to YouTube every minute
- 500 billion consumer photos are taken every year
- 100 terabytes of wide area imagery are recorded every day
- Exabyte trend: petabytes of broadcast assets, thousands of hours of productions every year
Massive Multimedia is the Biggest Wave of All!
Multimedia is used across Industries for various Purposes
[Chart: data volume (Giga, Tera, Peta, Exa) and expressiveness (Low, Med, High) by data type (Structured data, Text, Audio, Image, Video) from the 1990s to the 2020s]
- Safety / Security: tens of millions of cameras
- Medical: 1B medical images
- Customer: 1B camera phones
- Media: 100++ video hrs/min
- Wide Area Imagery: 100s of TB per day
- Digital Marketing: 12% of video views
- Enterprise Video: used by 1/3 of enterprises
Multiple Industries Require Large-Scale Image and Video Analytics

Safety and Security: IBM Intell. Video Analytics (IVA)
- Volume: 10,000s of managed cameras per city
- Velocity: real-time alerts, 20M video events/day
- Variety: street scenes, rail stations, crowds, people, environmental conditions
- Veracity: analysis of complex activities (trip wires, abandoned objects)

Medical Imaging and Healthcare: Cognitive Systems
- Volume: 1B medical images per year (growing 20-40% per year)
- Velocity: 50K radiology images per day per radiology dept.
- Variety: images, video, text, patent records, cases, scientific literature, ontologies/semantics
- Veracity: subjective interpretation across millions of categories (modalities, body views, organ systems, pathologies, anomalies)

IBM Enterprise Content Management: IMARS*
- Volume: 70 PB broadcast/yr, 40K hrs per news archive
- Velocity: 72 video hrs/min to YouTube
- Variety: mobile, user generated, professional
- Veracity: robust content extraction for objects, places, scenes, activities, people
*IBM Multimedia Analysis and Retrieval System

Retail, Consumer and Mobile Commerce: System V
- Volume: 500B consumer photos/yr
- Velocity: 100M customers per week for large retailers
- Variety: transient and dynamic content
- Veracity: predicting consumer attributes from diverse sources including visual data (images and video)

[Chart: applications along a time scale from data-in-motion to data-at-rest (µs, ms, sec, min, hr, day, wk, mo, yr)]
- Images: Content Filtering, Content Classification, Content-based Search
- Video: Real-time Alerting, Real World Events, Cross-Camera Mining
- Multimedia: Broadcast Monitoring, Behavior Analysis, Activity Based Intelligence
Multimedia Semantic Analysis Challenge: Bridging the Semantic Gap
[Figure: the semantic gap between multimedia content and semantics]
- Example semantics: X-ray, Enlarged Heart, Shale boundary, How-to, Tsunami, Designer shoes, Abandoned Bag, Family member, Funny, Home run, Dancing
- Domains: Medical, Education, Scientific, Enterprise, News, Surveillance, Retail, Social Media, Entertainment, Sports
A Multi-layer Learning Architecture for Image and Video Analysis
[Figure: layered architecture, from multimedia data at the bottom to semantics at the top]
- Semantics: Scenes, Objects, Actions, Activities, Locations, Living, Vehicles, Places, Settings, Animals, Cars, Behaviors, People, Faces, Events
- Models: Clustering, Expectation Maximization, Nearest Neighbor, Ensemble Classifiers, SVMs, K-means, Regression, Decision Tree, Factor Graph, Bayes Net, GMM, Active Learning, AdaBoost, Markov Model, Neural Net, Deep Belief Nets
- Training data: Unlabeled Data; Labeled Data (negative examples N, positive examples P)
- Features: Color, Texture, Edges, Shape, Energy, Zero crossings, Frequencies, Spectrum, Motion, Background, Regions, Tracks, Camera Motion, Shot Boundaries, Moving Objects, Scene Dynamics
- Multimedia Data
Need to learn effective semantic classifiers using a wide diversity of audio-visual features and models
Need to design a rich space of semantic concepts that captures multiple facets of audio-visual content
It requires a Large Library of Visual and Spatial Feature Extractions for Representing Diverse Visual Contents
[Figure: visual features (ordered by complexity) against spatial granularities (ordered by spatial information); architecture layers shown: Semantics, Models, Features]
- Color (Pixels): Thumbnail Image, Color Correlogram, Color Moments, Color Histogram, Dominant Colors, Image Statistics, Image Type
- Edges/Shape: Siftogram, Shape Moments, Fourier Shape, Edge Histogram, Local Binary Patterns, Hough Circle
- Spatial-Frequency Information: Scale-Orientation, Max-Response Filters, Curvelets
- Texture: Tamura Texture, Interest Points, Color Wavelet, Color Wavelet Texture, Wavelet Texture
- Spatial granularities: Global, Center, Cross, Grid, Layout, Horiz. Parts, Horizontal, Vertical, Pyramid (spatial scales, local spatial relation, distribution)
Feature Combinations = Visual Features x Spatial Granularities
Pyramid3 Concatenated Features: [ [ 1 ], [ 2 ], [ 3 ] ]
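The "Visual Features x Spatial Granularities" idea can be illustrated with a minimal sketch: one visual feature (a color histogram) is computed at several spatial granularities and the results are concatenated. This is not IMARS code. The function names, the bin count, and the choice of 1x1, 2x2 and 4x4 grids for a three-level pyramid are assumptions made for illustration only.

```python
def color_histogram(pixels, bins=4):
    """Per-channel histogram over (r, g, b) pixels, normalised to sum to 1."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an 8-bit channel value (0..255) to one of `bins` buckets.
            hist[channel * bins + value * bins // 256] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def grid_cells(image, n):
    """Split an image (list of rows of pixels) into n x n cells of pixels."""
    height, width = len(image), len(image[0])
    cells = []
    for i in range(n):
        for j in range(n):
            rows = range(i * height // n, (i + 1) * height // n)
            cols = range(j * width // n, (j + 1) * width // n)
            cells.append([image[y][x] for y in rows for x in cols])
    return cells


def pyramid_feature(image, levels=3, bins=4):
    """Concatenate histograms of every cell at every pyramid level.

    Assumption: level k uses a 2**k x 2**k grid (1x1, 2x2, 4x4).
    """
    feature = []
    for level in range(levels):
        for cell in grid_cells(image, 2 ** level):
            feature.extend(color_histogram(cell, bins))
    return feature


# Hypothetical 8 x 8 test image with a horizontal and a vertical gradient.
image = [[((x * 32) % 256, (y * 32) % 256, 128) for x in range(8)]
         for y in range(8)]
feature = pyramid_feature(image)
print(len(feature))  # (1 + 4 + 16) cells x 12 bins per cell
```

Swapping `color_histogram` for another descriptor, or `grid_cells` for another layout, yields a different cell in the feature-by-granularity matrix without changing the concatenation step.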
IBM Multimedia Analysis and Retrieval System (IMARS)
- IMARS is a trainable system for classifying images and video automatically based on visual contents
- IMARS creates classifiers from training examples using visual feature extraction and machine learning
- IMARS provides a large number of built-in visual feature representations that enable learning of highly effective semantic classifiers
- Can be trained and adapted for a variety of domains: natural photos, Web video, social media, medical images
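The train-from-examples workflow described above can be sketched as follows. The slides do not say which learner IMARS uses for a given classifier, so this sketch substitutes a nearest-centroid classifier as a deliberately simple stand-in. The feature vectors, labels and function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def train(examples):
    """examples: dict mapping a label to its training feature vectors."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(model, feature):
    """Return the best label and a score per label (negative distance)."""
    scores = {label: -math.dist(feature, center)
              for label, center in model.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 2-D features; a real system would use extracted visual features.
training = {
    "skiing": [[0.9, 0.1], [0.8, 0.2]],
    "sailing": [[0.1, 0.9], [0.2, 0.8]],
}
model = train(training)
label, scores = classify(model, [0.85, 0.1])
print(label)
```

Adapting to a new domain in this sketch means supplying a different `training` dictionary; the feature extraction and the learner stay the same.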
IMARS Classification of Activities and Sports for Photos and Videos
Accurate recognition of 150 sports and activity categories (results on 23K photos)
[Example categories shown: Hot Air Ballooning, Hang Gliding, Figure Skating, Skiing, Equestrian, Softball]
IMARS Semantic Classification of Activities on PASCAL VOC 13
[Example categories shown: Concert, Sailing]
IMARS Semantic Classification of Activities in Video
[Example categories shown: Dancing, Performance, Celebration]
IMARS Semantic Index per Video Scene 1
IMARS Semantic Index per Video Scene 2
IMARS Image Similarity Analysis
The system can perform image similarity analysis at different levels:
- DUPLICATES: copies of exactly the same picture. Based on hashing.
- NEAR-DUPLICATES: images that are very similar, but not necessarily the same picture. Based on visual descriptor similarity.
- CLUSTERING: images that are visually similar (more general than near-duplicates). Based on visual descriptor similarity.
- SEMANTIC SIMILARITY: images that are semantically similar. Based on distinctive classifier scores.
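Two of the levels above can be sketched in a few lines: exact duplicates via hashing of the file bytes, and near-duplicates via a similarity threshold on descriptor vectors. The slides name neither the hash nor the similarity measure, so SHA-256 and cosine similarity are assumptions here, as are the function names, file names and the threshold value.

```python
import hashlib
import math


def exact_duplicates(files):
    """files: dict name -> bytes. Group names whose contents hash identically."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_pairs(vectors, threshold):
    """Return all name pairs whose vectors have cosine similarity >= threshold."""
    names = sorted(vectors)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if cosine(vectors[a], vectors[b]) >= threshold]


# Hypothetical data: raw file contents and 2-D visual descriptors.
files = {"a.jpg": b"\x01\x02", "b.jpg": b"\x01\x02", "c.jpg": b"\x03"}
descriptors = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}

print(exact_duplicates(files))
print(similar_pairs(descriptors, threshold=0.95))
```

The same `similar_pairs` function could be applied to vectors of classifier scores instead of visual descriptors, which corresponds to the semantic similarity level.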
Visual Recognition Service on Watson Developer Cloud
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html
Applying Image & Video Analytics in Media
How can Image & Video Analytics be of value for media companies?
- Search raw material for semantic content based on classifiers
- Archive and index video material based on classifiers
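Classifier-based indexing and search can be sketched as an inverted index from semantic labels to video scenes. This is an illustration, not the IMARS or AREMA data model. The score dictionary, the file names, the 0.5 cut-off and the function names are assumptions.

```python
from collections import defaultdict


def build_index(scene_scores, min_score=0.5):
    """Build label -> [(score, scene), ...], best score first.

    scene_scores: dict mapping (video, scene_number) to {label: score}.
    Labels scoring below min_score are not indexed.
    """
    index = defaultdict(list)
    for scene, scores in scene_scores.items():
        for label, score in scores.items():
            if score >= min_score:
                index[label].append((score, scene))
    for postings in index.values():
        postings.sort(reverse=True)
    return dict(index)


def search(index, label):
    """Return the scenes indexed under a label, highest score first."""
    return [scene for _, scene in index.get(label, [])]


# Hypothetical classifier output per scene.
scene_scores = {
    ("wedding.mp4", 1): {"dancing": 0.9, "concert": 0.2},
    ("wedding.mp4", 2): {"celebration": 0.8, "dancing": 0.6},
    ("regatta.mp4", 1): {"sailing": 0.95},
}
index = build_index(scene_scores)
print(search(index, "dancing"))
```

Archiving then amounts to storing the index alongside the material, so that a later search for a label returns scenes directly without re-running the classifiers.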
Combining AREMA with IMARS on WATSON
[Figure: IMARS]
Summary and Links
https://www.ibm.com/developerworks/community/alphaworks/tech/imars
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html
Dieter Haas
Analytics Solution Sales, Social Analytics & Consumer Insight
Technical Leader Media Industry
mobile: +49 171 3391182
e-mail: dhaas@de.ibm.com