The Stanford Mobile Visual Search Data Set
Vijay Chandrasekhar 1, David M. Chen 1, Sam S. Tsai 1, Ngai-Man Cheung 1, Huizhong Chen 1, Gabriel Takacs 1, Yuriy Reznik 2, Ramakrishna Vedantham 3, Radek Grzeszczuk 3, Jeff Bach 4, Bernd Girod 1

1 Information Systems Laboratory, Stanford University, Stanford, CA
2 Qualcomm Inc., San Diego, CA
3 Nokia Research Center, Palo Alto, CA
4 Navteq, Chicago, IL

ABSTRACT

We survey popular data sets used in the computer vision literature and point out their limitations for mobile visual search applications. To overcome many of these limitations, we propose the Stanford Mobile Visual Search data set. The data set contains camera-phone images of products, CDs, books, outdoor landmarks, business cards, text documents, museum paintings and video clips. It has several key characteristics lacking in existing data sets: rigid objects, widely varying lighting conditions, perspective distortion, foreground and background clutter, realistic ground-truth reference data, and query data collected with heterogeneous low- and high-end camera phones. We hope that the data set will help push research forward in the field of mobile visual search.

1. INTRODUCTION

Mobile phones have evolved into powerful image and video processing devices, equipped with high-resolution cameras, color displays, and hardware-accelerated graphics. They are also equipped with GPS and connected to broadband wireless networks. All this enables a new class of applications that use the camera phone to initiate search queries about objects in visual proximity to the user (Fig. 1). Such applications can be used, e.g., for identifying products, comparison shopping, or finding information about movies, CDs, buildings, shops, real estate, print media or artworks. First commercial deployments of such systems include Google Goggles and Google Shopper [11], Nokia Point and Find [22], Kooaba [15], Layar [16], Ricoh icandy [7] and Amazon SnapTell [1].
Mobile visual search applications pose a number of unique challenges. First, the system latency has to be low to support interactive queries, despite stringent bandwidth and computational constraints. One way to reduce system latency significantly is to carry out feature extraction on the mobile device and transmit compressed feature data across the network [10]. State-of-the-art retrieval systems [14, 23] typically extract affine-covariant features (e.g., Maximally Stable Extremal Regions (MSER) or Hessian-affine points) from the query image. This can take several seconds on the mobile device. For feature extraction on the device to be effective, we need fast and robust interest point detection algorithms and compact descriptors. There is growing industry interest in this area, with MPEG recently launching a standardization effort [19]. It is envisioned that the standard will specify the bitstream syntax of descriptors, along with the parts of the descriptor-extraction process needed to ensure interoperability. Next, camera-phone images tend to be of lower quality than digital camera images. Images that are degraded by motion blur or poor focus pose difficulties for visual recognition. However, image quality is rapidly improving with higher resolution, better optics and built-in flashes on camera phones.

Figure 1: A snapshot of an outdoor visual search application. The system augments the viewfinder with information about the objects it recognizes in the camera-phone image.

Contact Vijay Chandrasekhar at [email protected].
Outdoor applications pose additional challenges. Current retrieval systems work best for highly textured rigid planar objects taken under controlled lighting conditions. Landmarks, on the other hand, tend to have fewer features, exhibit repetitive structures and their 3-D geometric distortions are not captured by simple affine or projective transformations. Ground truth data collection is more difficult, too. There are different ways of bootstrapping databases
for outdoor applications. One approach is to mine data from online collections like Flickr. However, these images tend to be poorly labelled and include a lot of clutter. Another approach is to harness data collected by companies like Navteq, Google (Street View) or Earthmine. In this case, the data are acquired by powerful vehicle-mounted cameras with wide-angle lenses that capture spherical panoramic images. In both cases, visual recognition is challenging because the camera-phone query images are usually taken under very different lighting conditions than the reference database images. Buildings and their surroundings (e.g., trees) tend to look different in different seasons. Shadows, pedestrians and foreground clutter are some of the other challenges in this application domain.

OCR on mobile phones enables another dimension of applications, from text input to text-based queries to a database. OCR engines work well on high-quality scanned images. However, the performance of mobile OCR drops rapidly for images that are out of focus or blurry, have perspective distortion, or are taken under non-ideal lighting conditions.

To improve the performance of mobile visual search applications, we need good data sets that capture the most common problems encountered in this domain. A good data set for visual search applications should have the following characteristics:

- Good ground-truth reference images
- Query images captured with a wide range of camera phones (flash/no-flash, auto-focus/no auto-focus)
- Query images collected under widely varying lighting conditions
- Typical perspective distortions, motion blur, and foreground and background clutter common to mobile visual search applications
- Several different categories (e.g., buildings, books, CDs, DVDs, text documents, products)
- Rigid objects, so that a transformation can be estimated between the query and reference database image
We surveyed popular data sets in the computer vision literature and observed that they are all limited in different ways. To overcome many of the limitations of existing data sets, we propose the Stanford Mobile Visual Search (SMVS) data set, which we hope will help move research forward in this field. In Section 2, we survey popular computer vision data sets and point out their limitations. In Section 3, we propose the SMVS data set for different mobile visual search applications.

2. SURVEY OF DATA SETS

Popular computer vision data sets for evaluating image retrieval algorithms consist of a set of query images and their ground-truth reference images. The number of query images typically ranges from a few hundred to a few thousand. The scalability of the retrieval methods is tested by retrieving the query images in the presence of distractor images, i.e., images that do not belong to the data set [14, 23]. The distractor images are typically obtained by mining Flickr or other photo-sharing websites. Here, we survey popular data sets in the computer vision literature and discuss their limitations for our application. See Fig. 2 for examples from each data set, and Tab. 1 for a summary of the different data sets.

ZuBuD. The Zurich Building (ZuBuD) data set [12] consists of 201 buildings in Zurich, with 5 views of each building. There are 115 query images which are not contained in the database. Query and database images differ in viewpoint, but variations in illumination are rare because the different images of the same building are taken at the same time of day. ZuBuD is considered an easy data set, with close to 100% accuracy reported in several papers [13, 25]. Simple approaches like color histograms and DCT-based descriptors [25] yield high performance on this data set.

Oxford Buildings. The Oxford Buildings data set [23] consists of 5062 images collected from Flickr by searching for particular Oxford landmarks.
The collection has been manually annotated to generate a comprehensive ground truth for 11 different landmarks, each represented by 5 possible queries. This gives only a small set of 55 queries. Another problem with this data set is that completely different views of the same building are labelled with the same name. Ideally, different façades of each building should be distinguished from each other when evaluating retrieval performance.

INRIA Holidays. The INRIA Holidays data set [14] contains personal holiday photos of the authors of [14]. It includes a large variety of outdoor scene types (natural, man-made, water and fire effects). The data set contains 500 image groups, each of which represents a distinct scene or object. It contains perspective distortions and clutter. However, variations in lighting are rare, as the pictures of each location are taken at the same time. Also, the data set contains scenes of many non-rigid objects (fire, beaches, etc.), which will not produce repeatable features if images are taken at different times.

University of Kentucky. The University of Kentucky (UKY) data set [21] consists of 2550 groups of 4 images each, of objects like CD covers, lamps, keyboards and computer equipment. Like the ZuBuD and INRIA data sets, this data set offers little variation in lighting conditions. Further, there is no foreground or background clutter; only the object of interest is present in each image.

ImageNet. The ImageNet data set [6] consists of images organized by nouns in the WordNet hierarchy [8]. Each node of the hierarchy is depicted by hundreds to thousands of images; e.g., Fig. 2 illustrates some images for the word "tiger". Such a data set is useful for testing classification algorithms, but less so for testing retrieval algorithms.

We summarize the limitations of the different data sets in Tab. 1.
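The scalability protocol mentioned above — padding the database with distractor images and checking whether each query still retrieves its ground-truth reference — can be sketched as follows. This is an illustrative toy, not evaluation code from any of the surveyed papers: random vectors stand in for global image descriptors, and the function name `recall_at_1` is our own.

```python
import numpy as np

def recall_at_1(queries, refs, distractors):
    """Fraction of queries whose nearest neighbor in the padded
    database (references + distractors) is the correct reference.
    Query i is assumed to depict the same object as reference i."""
    db = np.vstack([refs, distractors])
    hits = 0
    for i, q in enumerate(queries):
        dists = np.linalg.norm(db - q, axis=1)
        hits += int(np.argmin(dists) == i)
    return hits / len(queries)

# Toy run: 100 reference "descriptors", noisy queries, and a growing
# pool of unrelated distractors mimicking mined Flickr images.
rng = np.random.default_rng(0)
refs = rng.normal(size=(100, 64))
queries = refs + 0.05 * rng.normal(size=refs.shape)
for n_distractors in (1_000, 10_000):
    distractors = rng.normal(size=(n_distractors, 64))
    print(n_distractors, recall_at_1(queries, refs, distractors))
```

In a real evaluation the distractor pool grows to 100K or 1M images and nearest-neighbor search is replaced by an indexed retrieval system; the point here is only the accounting, not the matcher.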
Figure 2: Limitations of popular data sets in computer vision (rows: ZuBuD, Oxford, INRIA, UKY, ImageNet). The leftmost image in each row is the database image, and the other 3 images are query images. ZuBuD, INRIA and UKY consist of images taken at the same time and location. ImageNet is not suitable for image retrieval applications. The Oxford data set has different façades of the same building labelled with the same name.

To overcome the limitations of these data sets, we propose the Stanford Mobile Visual Search (SMVS) data set.

3. STANFORD MOBILE VISUAL SEARCH DATA SET

We present the SMVS (version 0.9) data set in the hope that it will be useful for a wide range of visual search applications like product recognition, landmark recognition, outdoor augmented reality [27], business card recognition, text recognition, video recognition and TV-on-the-go [5]. We collect data for several different categories: CDs, DVDs, books, software products, landmarks, business cards, text documents, museum paintings and video clips. Sample query and database images are shown in Fig. 4. Current and subsequent versions of the data set will be available at [3]. The number of database and query images for each category is shown in Tab. 2. We provide a total of 3300 query images for 1200 distinct classes across 8 image categories. Typically, a small number of query images (on the order of 1000s) suffices to measure the performance of a retrieval system, as the rest of the database can be padded with distractor images. Ideally, we would like to have a large distractor set for each query category. However, it is challenging to collect distractor sets per category. Instead, we plan to release two distractor sets upon request: one containing Flickr images, and the other containing building images from Navteq. The distractor sets will be available in sets of 1K, 10K, 100K and 1M images. Researchers can test scalability using these distractor sets, or the ones provided in [23, 14]. Next, we discuss how the query and reference database images are collected, as well as evaluation measures that are particularly relevant for mobile applications.

Reference Database Images. For product categories (CDs, DVDs and books), the references are clean versions of images obtained from the product websites.
For landmarks, the reference images are obtained from data collected by Navteq's vehicle-mounted cameras. For video clips, the reference images are key frames from the reference video clips. The videos contain diverse content such as movie trailers, news reports, and sports. For text documents, we collect (1) reference images from [20], a website that mines the front pages of newspapers from around the world, and (2) research papers. For business cards, the reference image is obtained from a high-quality upright scan of the card. For museum paintings, we collect data from the Cantor Arts Center at Stanford University for different genres: history, portraits, landscapes and modern art. The reference images are obtained from artists' websites like [24] or other online sources. All reference images are high-quality JPEG-compressed color images. The resolution of the reference images varies for each category.

Query Images. We capture query images with several different camera phones, including some digital cameras. The list of manufacturers and models used is as follows: Apple (iPhone 4), Palm (Pre), Nokia (N95, N97, N900, E63, N5800, N86), Motorola (Droid), Canon (G11) and LG (LG300). For product categories like CDs, DVDs, books, text documents and business cards, we capture the images indoors under widely varying lighting conditions over several days. We include foreground and background clutter that would typically be present in the application; e.g., a picture of a CD might have other CDs in the background. For landmarks, we capture images of buildings in San Francisco. We collected query images several months after the reference data was collected. For video clips, the query images were taken from laptop, computer and TV screens to include typical specular distortions. Finally, the paintings were captured at the Cantor Arts Center at Stanford University under controlled lighting conditions typical of museums. The resolution of the query images varies for each camera phone. We provide the original high-quality JPEG-compressed color images obtained from the camera. We also provide auxiliary information such as phone model number and GPS location, where applicable.

Table 1: Comparison of different data sets. Classes refers to the number of distinct objects in the data set. Rigid refers to whether or not the objects in the database are rigid. Lighting refers to whether or not the query images capture widely varying lighting conditions. Clutter refers to whether or not the query images contain foreground/background clutter. Perspective refers to whether the data set contains typical perspective distortions. Camera-phone refers to whether the images were captured with mobile devices. SMVS is a good data set for mobile visual search applications.

As noted in Tab.
1, the SMVS query data set has the following key characteristics that are lacking in other data sets: rigid objects, widely varying lighting conditions, perspective distortion, foreground and background clutter, realistic ground-truth reference data, and query images from heterogeneous low- and high-end camera phones.

Table 2: Number of query and database images in the SMVS data set for each of the categories: CDs, DVDs, books, video clips, landmarks, business cards, text documents and paintings.

Evaluation Measures. A naive retrieval system would match all database images against each query image. Such a brute-force matching scheme provides an upper bound on the performance that can be achieved with the feature matching pipeline. Here, we report results for brute-force pairwise matching for different interest point detectors and descriptors using the ratio test [17] and RANSAC [9]. For RANSAC, we use affine models with a minimum threshold of 10 matches post-RANSAC for declaring a pair of images a valid match. In Fig. 3, we report results for 3 state-of-the-art schemes: (1) the SIFT Difference-of-Gaussian (DoG) interest point detector and SIFT descriptor (code: [28]), (2) the Hessian-affine interest point detector and SIFT descriptor (code: [18]), and (3) the Fast Hessian blob interest point detector [2], sped up with integral images, combined with the recently proposed Compressed Histogram of Gradients (CHoG) descriptor [4]. We report the percentage of images that match, the average number of features, and the average number of features that match post-RANSAC for each category. First, we note that indoor categories are easier than outdoor categories; e.g., categories like CDs, DVDs and book covers achieve over 95% accuracy. The most challenging category is landmarks, as the query data was collected several months after the database. Second, we note that option (1), the SIFT interest point detector and descriptor, performs the best.
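The pairwise matching baseline above — nearest-neighbor descriptor matching with the ratio test [17], followed by affine-model RANSAC [9] and a 10-inlier threshold — can be sketched in a few lines. This is a simplified illustration on synthetic correspondences, not the authors' evaluation code: function names and parameter defaults are ours, and a real pipeline would operate on detected interest points and their extracted descriptors.

```python
import numpy as np

def ratio_test_matches(desc_q, desc_d, ratio=0.8):
    """Lowe's ratio test: keep a nearest-neighbor match only if it is
    clearly closer than the second-nearest neighbor."""
    matches = []
    for i, d in enumerate(desc_q):
        dists = np.linalg.norm(desc_d - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

def ransac_affine(pts_q, pts_d, iters=500, tol=3.0, min_inliers=10, seed=0):
    """Fit a 2-D affine model to putative matches by random sampling and
    declare the image pair a valid match if at least `min_inliers`
    correspondences agree with the best model (the paper's threshold)."""
    n = len(pts_q)
    if n < 3:
        return False, 0
    rng = np.random.default_rng(seed)
    src_h = np.hstack([pts_q, np.ones((n, 1))])  # homogeneous coordinates
    best = 0
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)
        # Minimal sample: 3 correspondences determine an affine map.
        A, *_ = np.linalg.lstsq(src_h[idx], pts_d[idx], rcond=None)
        err = np.linalg.norm(src_h @ A - pts_d, axis=1)
        best = max(best, int(np.sum(err < tol)))
    return best >= min_inliers, best

# Synthetic sanity check: database keypoints are an affine transform of
# the query keypoints, and descriptors are near-duplicates.
rng = np.random.default_rng(1)
pts_q = rng.uniform(0, 100, size=(30, 2))
pts_d = pts_q @ np.array([[1.1, 0.1], [-0.05, 0.9]]) + np.array([5.0, -3.0])
desc_q = rng.normal(size=(30, 64))
desc_d = desc_q + 0.01 * rng.normal(size=desc_q.shape)
matches = ratio_test_matches(desc_q, desc_d)
qi = np.array([m[0] for m in matches])
di = np.array([m[1] for m in matches])
valid, inliers = ransac_affine(pts_q[qi], pts_d[di])
```

With an exact affine relation between the keypoint sets, every putative match survives verification, so the pair is (correctly) declared a valid match.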
However, option (1) is computationally complex and is not suitable for implementation on mobile devices. Third, we note that option (3) comes close to the performance of (1), with worse performance (a 10-20% drop) for some categories. The performance hit is incurred by the Fast Hessian interest point detector, which is not as robust as the DoG interest point detector. One reason for the lower robustness is observed in [26]: the fast box-filtering step causes the interest point detection to lose rotation invariance, which affects rotated query images. The CHoG descriptor used in option (3) is a low-bitrate 60-bit descriptor that has been shown to perform on par with the 128-dimensional, 1024-bit SIFT descriptor in the extensive evaluation of [4]. We note that option (3) is the most suitable for implementation on mobile devices, as the Fast Hessian interest point detector is an order of magnitude faster than SIFT DoG, and CHoG descriptors generate an order of magnitude less data than SIFT descriptors for efficient transmission [10].

Figure 3: Results for each category for DoG + SIFT, Hessian Affine + SIFT, and Fast Hessian + CHoG: (a) image match accuracy (%), (b) average number of features, (c) average number of matches post-RANSAC. We note that indoor categories like CDs are easier than outdoor categories like landmarks. Books, CD covers, DVD covers and video clips achieve over 95% accuracy.

Finally, we list aspects critical for mobile visual search applications. A good image retrieval system should exhibit the following characteristics when tested on the SMVS data set:

- High precision-recall as the size of the database increases
- Low retrieval latency
- Fast pre-processing algorithms for improving image quality
- Fast and robust interest point detection
- Compact feature data for efficient transmission and storage

4. SUMMARY

We survey popular data sets used in the computer vision literature and note that they are limited in many ways. We propose the Stanford Mobile Visual Search data set to overcome several of the limitations of existing data sets. The SMVS data set has several key characteristics lacking in existing data sets: rigid objects, several categories of objects, widely varying lighting conditions, perspective distortion, typical foreground and background clutter, realistic ground-truth reference data, and query data collected with heterogeneous low- and high-end camera phones. We hope that this data set will help push research forward in the field of mobile visual search.

5. REFERENCES

[1] Amazon. SnapTell.
[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In Proc. of European Conference on Computer Vision (ECCV), Graz, Austria, May.
[3] V. Chandrasekhar, D. M. Chen, S. S. Tsai, N. M. Cheung, H. Chen, G. Takacs, Y. Reznik, R. Vedantham, R. Grzeszczuk, J. Bach, and B. Girod. Stanford Mobile Visual Search Data Set.
[4] V. Chandrasekhar, G. Takacs, D. M. Chen, S. S. Tsai, R. Grzeszczuk, Y. Reznik, and B. Girod. Compressed Histogram of Gradients: A Low Bitrate Descriptor.
In International Journal of Computer Vision, Special Issue on Mobile Vision, under review.
[5] D. M. Chen, N. M. Cheung, S. S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod. Dynamic Selection of a Feature-Rich Query Frame for Mobile Video Retrieval. In Proc. of IEEE International Conference on Image Processing (ICIP), Hong Kong, September.
[6] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida, June.
[7] B. Erol, E. Antúnez, and J. Hull. Hotpaper: multimedia interaction with paper using mobile phones. In Proc. of the 16th ACM Multimedia Conference, New York, NY, USA.
[8] C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books.
[9] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6).
[10] B. Girod, V. Chandrasekhar, D. M. Chen, N. M. Cheung, R. Grzeszczuk, Y. Reznik, G. Takacs, S. S. Tsai, and R. Vedantham. Mobile Visual Search. In IEEE Signal Processing Magazine, Special Issue on Mobile Media Search, under review.
[11] Google. Google Goggles.
[12] H. Shao, T. Svoboda, and L. Van Gool. ZuBuD — Zürich buildings database for image based recognition. Technical Report 260, ETH Zürich.
[13] Š. Obdržálek and J. Matas. Sub-linear indexing for large scale object recognition. In Proc. of British Machine Vision Conference (BMVC), Oxford, UK, June.
[14] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Proc. of European Conference on Computer Vision (ECCV), Berlin, Heidelberg.
[15] Kooaba. Kooaba.
[16] Layar. Layar.
[17] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110.
[18] K. Mikolajczyk.
Software for computing Hessian-affine interest points and SIFT descriptor.
[19] MPEG. Requirements for compact descriptors for visual search. ISO/IEC JTC1/SC29/WG11/W11531, Geneva, Switzerland, July.
[20] Newseum. todaysfrontpages/hr.asp?fpvname=ca_mss&ref_pge=gal&b_pge=1.
[21] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, USA, June.
[22] Nokia. Nokia Point and Find.
[23] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object Retrieval with Large Vocabularies and Fast Spatial Matching. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, Minnesota.
[24] W. T. Richards. William Trost Richards: The Complete Works.
[25] Š. Obdržálek and J. Matas. Image retrieval using local compact DCT-based representation. In Proc. of the 25th DAGM Symposium, Magdeburg, Germany, September.
[26] G. Takacs, V. Chandrasekhar, H. Chen, D. M. Chen, S. S. Tsai, R. Grzeszczuk, and B. Girod. Permutable Descriptors for Orientation Invariant Matching. In Proc. of SPIE Workshop on Applications of Digital Image Processing (ADIP), San Diego, California, August.
[27] G. Takacs, V. Chandrasekhar, N. Gelfand, Y. Xiong, W. Chen, T. Bismpigiannis, R. Grzeszczuk, K. Pulli, and B. Girod. Outdoors augmented reality on mobile phone using loxel-based visual feature organization. In Proc. of ACM International Conference on Multimedia Information Retrieval (ACM MIR), Vancouver, Canada, October.
[28] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms.
Figure 4: The Stanford Mobile Visual Search (SMVS) data set (categories shown: CDs, DVDs, books, landmarks, video clips, cards, print, paintings). The data set consists of images for many different categories captured with a variety of camera phones, and under widely varying lighting conditions. Database and query images alternate in each category.
False alarm in outdoor environments
Accepted 1.0 Savantic letter 1(6) False alarm in outdoor environments Accepted 1.0 Savantic letter 2(6) Table of contents Revision history 3 References 3 1 Introduction 4 2 Pre-processing 4 3 Detection,
Face Model Fitting on Low Resolution Images
Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com
Introduction. Selim Aksoy. Bilkent University [email protected]
Introduction Selim Aksoy Department of Computer Engineering Bilkent University [email protected] What is computer vision? What does it mean, to see? The plain man's answer (and Aristotle's, too)
Edge tracking for motion segmentation and depth ordering
Edge tracking for motion segmentation and depth ordering P. Smith, T. Drummond and R. Cipolla Department of Engineering University of Cambridge Cambridge CB2 1PZ,UK {pas1001 twd20 cipolla}@eng.cam.ac.uk
AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION
AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION Saurabh Asija 1, Rakesh Singh 2 1 Research Scholar (Computer Engineering Department), Punjabi University, Patiala. 2 Asst.
Digital Image Processing EE368/CS232
Digital Image Processing EE368/CS232 Bernd Girod, Gordon Wetzstein Department of Electrical Engineering Stanford University Digital Image Processing: Bernd Girod, 2013-2014 Stanford University -- Introduction
Segmentation of building models from dense 3D point-clouds
Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute
Similarity Search in a Very Large Scale Using Hadoop and HBase
http:/cedric.cnam.fr Similarity Search in a Very Large Scale Using Hadoop and HBase Rapport de recherche CEDRIC 2012 http://cedric.cnam.fr Stanislav Barton (CNAM), Vlastislav Dohnal (Masaryk University),
Local features and matching. Image classification & object localization
Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to
Camera geometry and image alignment
Computer Vision and Machine Learning Winter School ENS Lyon 2010 Camera geometry and image alignment Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,
Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
Android Ros Application
Android Ros Application Advanced Practical course : Sensor-enabled Intelligent Environments 2011/2012 Presentation by: Rim Zahir Supervisor: Dejan Pangercic SIFT Matching Objects Android Camera Topic :
Open issues and research trends in Content-based Image Retrieval
Open issues and research trends in Content-based Image Retrieval Raimondo Schettini DISCo Universita di Milano Bicocca [email protected] www.disco.unimib.it/schettini/ IEEE Signal Processing Society
Situated Visualization with Augmented Reality. Augmented Reality
, Austria 1 Augmented Reality Overlay computer graphics on real world Example application areas Tourist navigation Underground infrastructure Maintenance Games Simplify Qualcomm Vuforia [Wagner ISMAR 2008]
FACE RECOGNITION BASED ATTENDANCE MARKING SYSTEM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 2, February 2014,
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS Norbert Buch 1, Mark Cracknell 2, James Orwell 1 and Sergio A. Velastin 1 1. Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE,
What is a DSLR and what is a compact camera? And newer versions of DSLR are now mirrorless
1 2 What is a DSLR and what is a compact camera? And newer versions of DSLR are now mirrorless 3 The Parts Your camera is made up of many parts, but there are a few in particular that we want to look at
Augmented Reality Tic-Tac-Toe
Augmented Reality Tic-Tac-Toe Joe Maguire, David Saltzman Department of Electrical Engineering [email protected], [email protected] Abstract: This project implements an augmented reality version
Data Mining With Big Data Image De-Duplication In Social Networking Websites
Data Mining With Big Data Image De-Duplication In Social Networking Websites Hashmi S.Taslim First Year ME Jalgaon [email protected] ABSTRACT Big data is the term for a collection of data sets which
CARDA: Content Management Systems for Augmented Reality with Dynamic Annotation
, pp.62-67 http://dx.doi.org/10.14257/astl.2015.90.14 CARDA: Content Management Systems for Augmented Reality with Dynamic Annotation Byeong Jeong Kim 1 and Seop Hyeong Park 1 1 Department of Electronic
The Delicate Art of Flower Classification
The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC [email protected] Note: The following is my contribution to a group project for a graduate machine learning
CS231M Project Report - Automated Real-Time Face Tracking and Blending
CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, [email protected] June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android
Surgical Tools Recognition and Pupil Segmentation for Cataract Surgical Process Modeling
Surgical Tools Recognition and Pupil Segmentation for Cataract Surgical Process Modeling David Bouget, Florent Lalys, Pierre Jannin To cite this version: David Bouget, Florent Lalys, Pierre Jannin. Surgical
Applications of Deep Learning to the GEOINT mission. June 2015
Applications of Deep Learning to the GEOINT mission June 2015 Overview Motivation Deep Learning Recap GEOINT applications: Imagery exploitation OSINT exploitation Geospatial and activity based analytics
Very Low Frame-Rate Video Streaming For Face-to-Face Teleconference
Very Low Frame-Rate Video Streaming For Face-to-Face Teleconference Jue Wang, Michael F. Cohen Department of Electrical Engineering, University of Washington Microsoft Research Abstract Providing the best
UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003. Yun Zhai, Zeeshan Rasheed, Mubarak Shah
UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003 Yun Zhai, Zeeshan Rasheed, Mubarak Shah Computer Vision Laboratory School of Computer Science University of Central Florida, Orlando, Florida ABSTRACT In this
CS 534: Computer Vision 3D Model-based recognition
CS 534: Computer Vision 3D Model-based recognition Ahmed Elgammal Dept of Computer Science CS 534 3D Model-based Vision - 1 High Level Vision Object Recognition: What it means? Two main recognition tasks:!
A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA
A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA N. Zarrinpanjeh a, F. Dadrassjavan b, H. Fattahi c * a Islamic Azad University of Qazvin - [email protected]
Vision based Vehicle Tracking using a high angle camera
Vision based Vehicle Tracking using a high angle camera Raúl Ignacio Ramos García Dule Shu [email protected] [email protected] Abstract A vehicle tracking and grouping algorithm is presented in this work
Data Cleansing for Remote Battery System Monitoring
Data Cleansing for Remote Battery System Monitoring Gregory W. Ratcliff Randall Wald Taghi M. Khoshgoftaar Director, Life Cycle Management Senior Research Associate Director, Data Mining and Emerson Network
automatic road sign detection from survey video
automatic road sign detection from survey video by paul stapleton gps4.us.com 10 ACSM BULLETIN february 2012 tech feature In the fields of road asset management and mapping for navigation, clients increasingly
Image Classification for Dogs and Cats
Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science [email protected] Abstract
Introduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
High Level Describable Attributes for Predicting Aesthetics and Interestingness
High Level Describable Attributes for Predicting Aesthetics and Interestingness Sagnik Dhar Vicente Ordonez Tamara L Berg Stony Brook University Stony Brook, NY 11794, USA [email protected] Abstract
Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006
Practical Tour of Visual tracking David Fleet and Allan Jepson January, 2006 Designing a Visual Tracker: What is the state? pose and motion (position, velocity, acceleration, ) shape (size, deformation,
Make and Model Recognition of Cars
Make and Model Recognition of Cars Sparta Cheung Department of Electrical and Computer Engineering University of California, San Diego 9500 Gilman Dr., La Jolla, CA 92093 [email protected] Alice Chu Department
Template-based Eye and Mouth Detection for 3D Video Conferencing
Template-based Eye and Mouth Detection for 3D Video Conferencing Jürgen Rurainsky and Peter Eisert Fraunhofer Institute for Telecommunications - Heinrich-Hertz-Institute, Image Processing Department, Einsteinufer
Distinctive Image Features from Scale-Invariant Keypoints
Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe Computer Science Department University of British Columbia Vancouver, B.C., Canada [email protected] January 5, 2004 Abstract This paper
PHYSIOLOGICALLY-BASED DETECTION OF COMPUTER GENERATED FACES IN VIDEO
PHYSIOLOGICALLY-BASED DETECTION OF COMPUTER GENERATED FACES IN VIDEO V. Conotter, E. Bodnari, G. Boato H. Farid Department of Information Engineering and Computer Science University of Trento, Trento (ITALY)
REPRESENTATION, CODING AND INTERACTIVE RENDERING OF HIGH- RESOLUTION PANORAMIC IMAGES AND VIDEO USING MPEG-4
REPRESENTATION, CODING AND INTERACTIVE RENDERING OF HIGH- RESOLUTION PANORAMIC IMAGES AND VIDEO USING MPEG-4 S. Heymann, A. Smolic, K. Mueller, Y. Guo, J. Rurainsky, P. Eisert, T. Wiegand Fraunhofer Institute
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK
A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK Cameron Schaeffer Stanford University CS Department [email protected] Abstract The subject of keypoint
Text Localization & Segmentation in Images, Web Pages and Videos Media Mining I
Text Localization & Segmentation in Images, Web Pages and Videos Media Mining I Multimedia Computing, Universität Augsburg [email protected] www.multimedia-computing.{de,org} PSNR_Y
BRIEF: Binary Robust Independent Elementary Features
BRIEF: Binary Robust Independent Elementary Features Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua CVLab, EPFL, Lausanne, Switzerland e-mail: [email protected] Abstract.
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning By: Shan Suthaharan Suthaharan, S. (2014). Big data classification: Problems and challenges in network
Geo-Services and Computer Vision for Object Awareness in Mobile System Applications
Geo-Services and Computer Vision for Object Awareness in Mobile System Applications Authors Patrick LULEY, Lucas PALETTA, Alexander ALMER JOANNEUM RESEARCH Forschungsgesellschaft mbh, Institute of Digital
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal
Context-aware Library Management System using Augmented Reality
International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 9 (2014), pp. 923-929 International Research Publication House http://www.irphouse.com Context-aware Library
Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance
2012 IEEE International Conference on Multimedia and Expo Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance Rogerio Feris, Sharath Pankanti IBM T. J. Watson Research Center
IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS
IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS Alexander Velizhev 1 (presenter) Roman Shapovalov 2 Konrad Schindler 3 1 Hexagon Technology Center, Heerbrugg, Switzerland 2 Graphics & Media
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles [email protected] and lunbo
Statistical Modeling of Huffman Tables Coding
Statistical Modeling of Huffman Tables Coding S. Battiato 1, C. Bosco 1, A. Bruna 2, G. Di Blasi 1, G.Gallo 1 1 D.M.I. University of Catania - Viale A. Doria 6, 95125, Catania, Italy {battiato, bosco,
