Food brand image (Logos) recognition

Ritobrata Sur (rsur@stanford.edu), Shengkai Wang (sk.wang@stanford.edu)
Mentor: Hui Chao (huichao@qti.qualcomm.com)
Final Report, March 19, 2014

1. Introduction

Food label brand image (logo) recognition has many useful applications, such as displaying nutritional information and advertising. Various methods based on local and global feature matching have been proposed, but the problem remains challenging. Unlike natural scene images, which are rich in texture detail, brand images often lack texture variation and therefore provide fewer keypoints for matching. In addition, reference and test images may be acquired at different resolutions, sizes, quality levels, and illumination conditions. These factors combine to make logo detection more challenging.

Fig. 1 illustrates the basic idea. Suppose a customer with certain dietary needs or restrictions goes to a grocery store to pick up some food items, and the shelves hold numerous items. The customer takes a picture with their phone. The program is supposed to identify the food labels and quickly guide the user to where the desired items are located. Grocery stores can also use the program to keep track of misplaced items.

Fig. 1. Example illustrating the concept of logo recognition

The logo recognition problem has been studied since the 1990s. Since then there have been mainly five methods, falling into two categories: local and global. The local methods compute a number of statistical and morphological shape features for each connected component of the image foreground and background. This category includes graphical distribution features [1] and local invariants [2,3]. More recently, the scale-invariant feature transform (SIFT) [4] has been used for more effective feature extraction and mapping, and has been applied successfully to vehicle logo recognition [5]. The global methods try to identify the structure of the logo image as a whole. This category covers the wavelet decomposition method [6] and neural-network-based architecture recognition [7].

For the purposes of this class, we trained our program to identify 6 different classes (logos), viz. Coca Cola, Pepsi, Dole, Quaker, Special K and Nestle. The code-words of the images were detected using difference of Gaussian (DoG) detectors and described using the Scale-Invariant Feature Transform (SIFT) descriptor. A dictionary of code-words was then created using two alternative methods: Bag of Words (BoW) and Spatial Pyramid Matching (SPM). The results using BoW were poor with a small number of training examples but improved significantly as the training set size increased. Object identification was done by two distinct methods: the sliding window method, and keypoint matching by homography transformation with RANSAC to remove outliers. The following sections describe the methods in greater detail.

2. Methods

The algorithm involves three main steps (Fig. 2): (a) feature extractor, (b) logo classifier and (c) automatic detector. The training step uses presorted pictures corresponding to different logos to train the classifiers. During detection, the features are mapped to the classifiers to obtain matches within the training set. The functions and methods behind each of these steps are described in the following sections.

Fig. 2. The three main steps of the logo identification algorithm

(a) Feature extractor

The code-word blobs were detected using the difference of Gaussian model. The code-words thus obtained were described using SIFT [4] descriptors. An online library for the SIFT detector and descriptor, vl_sift [8], was used for this project.
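
As a rough illustration of the difference of Gaussian detection step, the following is a minimal NumPy sketch, not the vl_sift implementation. It works on a single octave only and omits sub-pixel refinement and the edge-response test. The function names, the default scales, and the `peak_thresh` parameter (a stand-in for a contrast threshold) are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma, normalised to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge padding (rows first, then columns)."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(image, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def dog_stack(image, sigma0=1.6, levels=5, scale_step=2 ** 0.5):
    """Blur at geometrically spaced scales and subtract neighbouring levels."""
    blurred = [gaussian_blur(image, sigma0 * scale_step ** i) for i in range(levels)]
    return np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])

def dog_extrema(dog, peak_thresh=0.01):
    """Return (level, row, col) of 3x3x3 local extrema with |DoG| >= peak_thresh."""
    keypoints = []
    n, h, w = dog.shape
    for s in range(1, n - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[s, y, x]
                if abs(v) < peak_thresh:
                    continue  # low-contrast point, discard
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():
                    keypoints.append((s, y, x))
    return keypoints
```

Calling `dog_extrema(dog_stack(image))` on a grayscale image stored as a 2-D float array returns candidate blob locations. A production detector would add octaves and the further filtering that the library performs.
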
A comprehensive and detailed description of the methods used in the library can be found on the website [8]. A brief discussion is presented here to explain the specific parameters used and the reasoning behind them. As discussed in D. Lowe's article [4], stable keypoint locations in scale space can be extracted efficiently and reliably by taking the difference of Gaussian-convolved images at nearby scales. The vl_sift function [8] uses the same procedure. Once the features are extracted, low-contrast points are filtered out: a point is discarded if the difference of Gaussian value at its local extremum falls below a threshold [4, section 4]. The best value for this threshold was found by trial and error. To identify meaningful keypoints, it was also crucial to increase the threshold on the r factor that sets the edge threshold, as described in section 4.1 of D. Lowe's article [4]. This is because logos very commonly have a large principal curvature across an edge but a small one in the perpendicular direction, and such points are otherwise rejected by the edge threshold. The factor still had to be tuned so that noisy keypoints were removed. A sample of the keypoint extraction is shown in Fig. 3.

Fig. 3. Sample SIFT blob detection. The circles indicate the blobs; the lines show the dominant direction in the SIFT descriptor.

The SIFT descriptor [4] is built from histograms of the spatial gradients in 8 directions, computed over a 4 × 4 grid of spatial bins around each extracted feature. The resulting 8 × 4 × 4 = 128 values form the SIFT descriptor. The magnification factor (which determines the size of the code-word relative to the blob size) and the Gaussian window size (which reduces the influence of gradients farther from the keypoint center) were the key parameters for describing the code-words effectively. The same descriptor parameters were used in the training and test phases.

(b) Logo classifier

A dictionary of code-words was constructed by two methods: i. Bag of Words and ii. Spatial Pyramid Matching.

i. Bag of Words (BoW): A BoW dictionary containing 256 code-words (e.g. Fig. 4) was trained by K-means clustering from over 30000 features extracted from 150 training images of 6 different logo classes. The size of the dictionary was optimized for speed and accuracy.
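
The dictionary training described above can be sketched with a plain Lloyd's-algorithm K-means in NumPy. This is a minimal illustration under stated assumptions, not the code used in the project: the function names, the random initialisation, and the fixed iteration count are all hypothetical choices.

```python
import numpy as np

def assign_words(descriptors, centers):
    """Index of the nearest code-word (Euclidean distance) for each descriptor."""
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(centers, dtype=float)
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2, avoiding an N x K x D array.
    d2 = (d ** 2).sum(axis=1)[:, None] - 2.0 * d @ c.T + (c ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)

def build_dictionary(descriptors, n_words=256, n_iter=20, seed=0):
    """Cluster N x 128 SIFT descriptors into n_words code-words."""
    d = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centers from randomly chosen descriptors.
    centers = d[rng.choice(len(d), n_words, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_words(d, centers)
        for j in range(n_words):
            members = d[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers
```

With roughly 30000 descriptors and 256 code-words, as in the text, each iteration needs a 30000 × 256 distance matrix, which is small enough to hold in memory.
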

Fig. 4. Constructing a bag of words dictionary for logo identification

For classification, the following steps are implemented to identify specific logo classes:
1. Extract SIFT features.
2. Project each SIFT feature onto the dictionary space to get its code-word.
3. Populate a feature histogram by counting the matched code-words.
4. Run classification on the histograms, using either nearest neighbor (NN) or SVM.

It was found that the BoW model needs a large training set to suppress spurious features. Using 5 sample images for each logo, the leave-one-out cross-validation accuracy was only 33%. When trained with 25 images for each class, the cross-validation accuracy improved to 93%.

ii. Spatial Pyramid Matching (SPM): The BoW model completely ignores the spatial location of the code-words in relation to each other, which can be vital in identifying complete logos. This problem is tackled with the spatial pyramid matching technique [9], in which the picture is divided into successively finer grids, a code-word histogram is computed for each grid cell, and the histograms are combined as a weighted sum (Fig. 5).

Fig. 5. Spatial pyramid matching for generating code-words
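
The histogram steps and the spatial pyramid can be sketched as follows. This is an illustrative sketch, not the project code: it assumes the code-word index and (x, y) position of every keypoint are already available, the level weights follow the scheme in [9], and all function names are hypothetical.

```python
import numpy as np

def bow_histogram(words, n_words):
    """Normalised count of each code-word index (steps 2 and 3 above)."""
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

def spatial_pyramid(words, xy, width, height, n_words, levels=2):
    """Concatenate weighted per-cell histograms for pyramid levels 0..levels."""
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid at this level is cells x cells
        # Weights as in [9]: 1/2^L at level 0, 1/2^(L-l+1) at level l >= 1.
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        cx = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                feats.append(weight * np.bincount(words[mask], minlength=n_words))
    vector = np.concatenate(feats).astype(float)
    return vector / max(vector.sum(), 1.0)

def nearest_neighbor(hist, train_hists, train_labels):
    """Label of the training histogram closest to hist (the NN option in step 4)."""
    dists = np.linalg.norm(np.asarray(train_hists, dtype=float) - hist, axis=1)
    return train_labels[int(dists.argmin())]
```

With a 256-word dictionary and `levels=2`, the pyramid vector has 256 × (1 + 4 + 16) = 5376 entries, compared with 256 for plain BoW. That larger vector is one reason the pyramid variant is slower.
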

SPM significantly improves the performance of the training step even with a smaller number of examples. However, it introduces a dependence on the orientation of the images and slows the algorithm down. It is therefore necessary to generate an artificially perturbed set of images [10], varying orientation, aspect ratio, blur, shake, illumination, etc. (Fig. 6), to train a set of code-words that is robust to these variations.

Fig. 6. Training data orbit for robust training

(c) Automatic detector

Object identification was done by two distinct methods: i. keypoint matching by homography transformation with RANSAC to remove outliers, and ii. the sliding window method.

i. Keypoint matching by homography transformation and using RANSAC [10]: A fast method for matching logos is to pair each test-image keypoint with the closest training-image keypoint and compute the homography that maps the training image onto the test image. RANSAC is used to remove spurious keypoints by maximizing the number of matching pairs between the two images. The logo bounding box in the test image is then calculated using Hough voting from the matched keypoints. This method is fast, but it becomes erroneous in the presence of multiple instances of the same logo. An example is shown in Fig. 7.
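
The RANSAC filtering step can be sketched in NumPy as follows. This is a minimal illustration, not the project's implementation: it assumes the putative keypoint pairs are already given as two N × 2 arrays, estimates the homography with an unnormalised direct linear transform, and leaves out the Hough voting step. The function names, the iteration count, and the 3-pixel tolerance are assumptions made for the example.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H with dst ~ H @ src, from at least 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project(H, pts):
    """Apply homography H to an N x 2 array of points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, n_iter=500, tol=3.0, seed=0):
    """Return (H, inlier_mask), keeping the model with the most inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 4, replace=False)  # minimal sample
        H = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("not enough inliers to estimate a homography")
    H = fit_homography(src[best], dst[best])  # refit on all inliers
    return H / H[2, 2], best
```

Pairs flagged `False` in the returned mask are the spurious matches. Because a single homography is fitted, the sketch shares the limitation noted above when the same logo appears several times in one image.
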

Fig. 7. Logo matching using homography mapping and RANSAC filtering (label shown in the figure: Quaker)

ii. Sliding window [11]: To detect multiple logos in the same image, the sliding window method was used. As the name suggests, a rectangular box is slid across the extent of the test image. At each window position, the keypoints inside the box are classified using their code-words, and the match is determined by a score within the window. This is demonstrated in Fig. 8. This method is extremely slow in finding the match because it also iterates over the box size. When multiple logos of the same class lie close together, it may also be hard to pinpoint the location of each specific occurrence of the logo.

Fig. 8. Logo matching using sliding window method

3. Performance analysis

The different variations of the algorithm yield different results, as shown in Fig. 9. The general trend is a better success rate with a bigger training set and an improved classification scheme (e.g. SPM vs BoW). For training sets larger than 10 samples per class, traditional BoW works fairly well. For smaller training sets, improvements such as SPM and the training sample orbit are necessary.

[Figure: accuracy (%, 0 to 100) versus number of training samples per class (0 to 50) for SIFT/NN, SIFT/SVM, SPM/NN, SPM/SVM, Orbit/NN and Orbit/SVM]

Fig. 9. Accuracy of different classification methods as a function of training set size

4. Conclusions

We experimented with developing a pipeline for automatic food logo detection and classification in photo images. A three-step approach was proposed and implemented, using DoG + SIFT for feature extraction, variations of BoW for classification, and sliding windows and Hough voting with RANSAC pre-filtering for detection. Despite the limited size of the current training data, promising results have been achieved, including 93% cross-validation accuracy for BoW with 25 training images per class.

5. References

[1] Kato, T., 1992. Database architecture for content-based image retrieval. Proceedings of the SPIE, Image Storage and Retrieval Systems, Vol. 1662, San Jose, CA, February 1992, pp. 112-123.
[2] Doermann, D., Rivlin, E., Weiss, I., Applying algebraic and differential invariants for logo recognition, Machine Vision and Applications, 9 (2) (1996), pp. 73-86.
[3] Kliot, M., Rivlin, E., Shape retrieval in pictorial databases via geometric features. Technical Report CIS9701, Technion IIT, Computer Science Department, Haifa, Israel, 1997.
[4] Lowe, D. G., Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2) (2004), pp. 91-110.
[5] Psyllos, A. P., Anagnostopoulos, C. N., Kayafas, E. (2010). Vehicle logo recognition using a SIFT-based enhanced matching scheme. IEEE Transactions on Intelligent Transportation Systems, 11(2), pp. 322-328.
[6] Jaisimha, M. Y., Wavelet features for similarity based retrieval of logo images. Proceedings of the SPIE, Document Recognition III, Vol. 2660, San Jose, CA, January 1996, pp. 89-100.

[7] Cesarini, F., Francesconi, E., Gori, M., Marinai, S., Sheng, J. Q., Soda, G., 1997. A neural-based architecture for spot-noisy logo recognition. Proceedings of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, August 1997, pp. 175-179.
[8] http://www.vlfeat.org/matlab/vl_sift.htm
[9] Lazebnik, S., Schmid, C., Ponce, J., Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006.
[10] Constantinopoulos, C., Meinhardt-Llopis, E., Liu, Y., Caselles, V., A robust pipeline for logo detection, IEEE International Conference on Multimedia and Expo (ICME), 2011.
[11] Viola, P., Jones, M., Rapid Object Detection using a Boosted Cascade of Simple Features, Computer Vision and Pattern Recognition, 2001.