Web 3.0 image search: a World First


The digital age has provided a virtually free worldwide digital distribution infrastructure through the internet. Many areas of commerce, government and academia have digitised, and continue to digitise, their visual image assets to take advantage of this low-cost distribution and communication vehicle, reaching a broader audience and giving that audience much deeper access to information, products and services.

Invisible images

Whilst the internet is the perfect distribution architecture for bringing digital images to the user, a distinct bottleneck remains in accessing them. Legacy image search systems rely on the producers of images to annotate each one with captions and keywords describing its content; only then can current technology run a text search against that annotation to retrieve the image. Without caption and keyword information, an image is effectively invisible to existing search methods. Some typical examples of such annotations:

Annotation: Colour image, Photography, Horizontal, Mountains, Snow, Trees, Water, Lake, Forest, Rock, Sky, Landscape

Annotation: Colour image, Photography, Horizontal, Grandfather, Elderly Male, Child, Baby

Annotation: Colour image, Photography, Horizontal, Group, 2 Couples, 2 Mid Adult Male Caucasian, 2 Mid Adult Female, Man, Woman, Trees

Cost of production

The high cost of tagging images with words means that traditionally only professional image producers and aggregators could carry the burden of priming images with captions and keywords for search. The average cost is between US$1 and US$3 per image, depending on the level of keywording and on whether the work is outsourced as part of a bulk collection to a professional keywording company, carried as an internal cost by a production company, or borne as an opportunity cost by the individual photographer.
With billions of archived images waiting to be digitised and shared, one of the main barriers to digitisation is the cost of keywording images to make them visible to today's search algorithms.

Making the invisible visible

From our inception, the imense vision has been to make all the world's digital images searchable, independent of keywords. To carry out this grand vision, imense has created a unique portfolio of products combining many years of research in computer vision, machine learning, natural language processing and probabilistic inference. The imense portfolio ushers in a new era of image search and classification. Our products provide an efficient means of searching images that have no keywords or tags, and, when combined with any existing keywords and tags, deliver unparalleled search results.

Web 3.0 image search: combining content & keywords

A world first, the imense Web 3.0 image search platform allows users to describe the content they require in text and retrieve accurate results whether or not the images carry keywords.

Technology features overview

Automatic Image Classification: creates a combined visual content & metadata index for image search
Semantic Search: understands the syntax and meaning of search queries for more accurate retrieval
Statistical Ranking of Concepts: adds a relevance weighting to each concept within an image for more accurate search results
Ontological Reasoning: reasons about visual content when keywords or specific classifiers are not available
Spatial Search: lets users query for concepts in particular areas of an image (e.g. man on left, copy space on right)

Automatic Image Classification

As an image passes through the classification process, the system automatically identifies regions, scenes, objects and facial aspects, together with the spatial positions of those regions, objects and faces within the image. As part of this process, each attribute is given a statistical relevance based on how well it typifies its concept. All of this information is then stored mathematically, as vectors of many hundreds of dimensions, in an index that is independent of language.
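As an illustration only, the kind of concept/probability index described above could be sketched along these lines. All identifiers and figures here are hypothetical; this is not imense's actual implementation, merely the shape of the idea:

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    image_id: str
    # concept -> statistical relevance in [0, 1], as produced by classifiers
    concepts: dict[str, float] = field(default_factory=dict)

class ConceptIndex:
    """A language-independent index keyed on concepts, not keywords."""

    def __init__(self):
        self.records: dict[str, ImageRecord] = {}

    def add(self, record: ImageRecord) -> None:
        self.records[record.image_id] = record

    def score(self, image_id: str, concept: str) -> float:
        # a concept the classifiers never detected simply scores zero
        return self.records[image_id].concepts.get(concept, 0.0)

index = ConceptIndex()
index.add(ImageRecord("img-001", {"beach": 0.9, "sky": 0.9, "water": 0.9}))
index.add(ImageRecord("img-002", {"people": 0.95, "beach": 0.02}))
```

Because the stored values are probabilities attached to abstract concepts rather than caption text, the same index can serve queries arriving in any language once the query terms are mapped to concepts.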

Automatic classification of image content lends itself to many applications. Combining it with existing metadata allows users to search more accurately, and for many more things within an image, while making images with poor or non-existent keywords visible for the first time, at a dramatically reduced cost compared with manual keywording.

Semantic Search

The second part of the system is a unique retrieval architecture that understands the syntax and meaning of a user's query and uses a linguistic ontology to translate it into a query against both the visual ontology index and any metadata or keywords associated with the image. For example, in a traditional system, if a user queries beach without people, the text engine looks for the words beach and people and does not understand the meaning of without.

imense: beach without people
Google: beach without people

The imense system, however, understands the meaning of the phrase and delivers only images with a very low probability of people appearing within them. This level of semantic understanding, combined with the capability to understand the content of an image, allows users to express their content wishes more fully, for a more effective and more satisfying search experience.
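The negation handling in the beach without people example can be sketched as follows. This is a deliberately naive token-level illustration with made-up names and thresholds; the real system reasons over a full linguistic ontology rather than splitting strings:

```python
def parse_query(text: str) -> tuple[list[str], list[str]]:
    """Split a query into required and excluded concepts on negation words."""
    required, excluded, target = [], [], None
    for token in text.lower().split():
        if token in ("without", "no"):
            target = excluded          # subsequent tokens are exclusions
        elif target is not None:
            target.append(token)
        else:
            required.append(token)
    return required, excluded

def search(images: dict[str, dict[str, float]], text: str,
           min_p: float = 0.5, max_excluded_p: float = 0.1) -> list[str]:
    required, excluded = parse_query(text)
    hits = []
    for image_id, concepts in images.items():
        # exclusion is tested against the concept's probability in the image,
        # not against the presence of a keyword
        if all(concepts.get(c, 0.0) >= min_p for c in required) and \
           all(concepts.get(c, 0.0) <= max_excluded_p for c in excluded):
            hits.append(image_id)
    return hits

images = {
    "empty-beach":   {"beach": 0.9, "people": 0.02},
    "crowded-beach": {"beach": 0.8, "people": 0.95},
}
print(search(images, "beach without people"))  # ['empty-beach']
```

The key point mirrored here is that "without people" becomes a constraint on the statistical rating of the people concept, which a keyword match could never express.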

Statistical Ranking of Concepts

As part of the classification process, a statistical weighting is added to each identified region, object, scene or facial characteristic within an image, reflecting how relevant it is to the image as a whole. This dramatically improves the quality of search results compared with systems that rely on keywords alone.

Image 1: 5 people, Group, 3 females, 2 males, Mid Adults, African male, Caucasian male, Asian female, Outside, Summer, Sky, Water, Rocks, Sand, Beach
Image 2: Blue, Yellow, Clouds, Sky, Water, Sand, Beach, Waves, Horizon

For example, here we have two images: one is a group of people with a small amount of sand and water in the background; the other is a typical beach scene with around 50% sand, some water and sky. Using a traditional image search system, and assuming each image had been annotated, both would carry beach in their annotation, so a query for beach against a traditional text index would return both images with the same ranking.

Image 1, weighted: 5 people 95%, Group 94%, 3 females 95%, 2 males 95%, Mid Adults 80%, African male 78%, Caucasian male 78%, Asian female 70%, Outside 90%, Summer 90%, Sky 80%, Water 20%, Rocks 20%, Sand 5%, Beach 2%
Image 2, weighted: Blue 90%, Yellow 90%, Clouds 90%, Sky 90%, Water 90%, Sand 90%, Beach 90%, Waves 80%, Horizon 80%, Copy space top left, Copy space top right

In contrast, the imense system automatically understands that the beach scene is 90% relevant to the query against only 2% for the group shot, and ranks the results accordingly. Because the imense system is based on statistical relevance rather than keywords alone, the more images there are in a search index, the better the results become. This is the converse of traditional keyword systems, where larger collections mean more competing keywords and hence progressively poorer results. Many organisations struggle to refine metadata structures and revise controlled vocabularies in an effort to improve search results.
Whilst organisations with large budgets have been able to do this in the past, as image archives grow the task becomes increasingly complex and expensive. Automated statistical weighting of the relevance of each concept within an image brings an accuracy to search results that keywording can never hope to achieve.
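The ranking idea can be illustrated with the two-image example, using hypothetical identifiers and only a few of the quoted weights:

```python
# Per-image concept weights from classification (hypothetical data
# mirroring the beach-scene / group-shot example above).
weights = {
    "beach-scene": {"beach": 0.90, "sand": 0.90, "sky": 0.90},
    "group-shot":  {"beach": 0.02, "sand": 0.05, "people": 0.95},
}

def rank(query_concept: str) -> list[tuple[str, float]]:
    """Order images by the concept's statistical relevance, highest first."""
    scored = [(img, c.get(query_concept, 0.0)) for img, c in weights.items()]
    # images where the concept has zero weight drop out entirely
    return sorted(((i, s) for i, s in scored if s > 0), key=lambda x: -x[1])

print(rank("beach"))  # beach-scene (0.90) ahead of group-shot (0.02)
```

A plain keyword index would treat both images as equal matches for beach; ordering by the stored weight is what separates them.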

Ontological Reasoning: linguistic & visual ontologies

Ontological reasoning is the cornerstone of the semantic web: a vision of a future in which machines can reason about the available information to produce more comprehensive and semantically relevant results to search queries. Rather than simply matching keywords, the web of the future will use ontologies to understand the relationships between disparate pieces of information in order to analyse and retrieve it more accurately.

As part of the classification process, image content is classified into regions, scenes, objects and facial aspects such as gender, ethnicity and age. These are then stored mathematically as dimensional vectors in a visual ontology, which maps the relationships between particular concepts. For example, if a region is classified as a connected object with 2 to 5 sub-objects and a fur texture, this has a relationship to the concept animal. We may then identify regions of brown and black, which may also be associated with the overall object. All of this is stored mathematically as a visual ontology of the image; in other words, a map of the relationships between the attributes within the image.

The linguistic ontology then allows a user to type in something like Alsatian. The system may not have a specific classifier for Alsatian, but the linguistic ontology understands that an Alsatian is a four-legged animal, mainly brown and black, and this information can be used to interrogate the visual ontology for the most accurate result. Similarly, if a user queries Camel, we may have neither the keyword Camel in the metadata nor a specific classifier for it. Here the linguistic ontology tells us that a camel is a large, desert-dwelling animal with yellowish fur, so we use the visual ontology to look for large animals in the desert with yellowish fur.

Spatial Search

As part of the classification process, the spatial context of identified regions, objects, scenes and faces is encoded within the index. This means the system can return semantically accurate results for queries involving spatial prepositions such as with, next to, on, beside and against, in addition to queries for properties located in the top, bottom, centre, left or right of an image, such as copy space. For example, a user can type one woman and specify copy space on the left or right.

Summary

Automatic Image Classification creates a combined visual content & metadata index for visual search. This not only allows images that have never been tagged or keyworded to be searched, but also delivers unparalleled search results for those that have. The Semantic Search capabilities then allow users to pose queries that traditional search engines cannot understand, such as beach without people, for more accurate results. Statistical Ranking brings an accuracy to search results that keywording can never hope to achieve, by automatically understanding how relevant a particular concept is to an image and ordering the results accordingly. The system also automatically recognises when a query matches no keywords or known concepts; at that point it reasons about content through ontological analysis and retrieves results by that method. Finally, the imense system lets users pose previously impossible queries, such as woman with copy space on right or left or couple on left with trees on right.

The imense Web 3.0 search system dramatically reduces the cost of digitisation by removing much of the need for keywords and tags, whilst at the same time providing an unparalleled search experience for any organisation using digital images as part of its workflow.

For more information, please contact sales@imense.com
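As a closing illustration, the spatial predicates described in the Spatial Search section could be sketched as follows, with invented names, detections and coordinates standing in for the real encoded spatial context:

```python
def region_side(box: tuple[float, float, float, float]) -> str:
    """Classify a bounding box (x0, y0, x1, y1) in 0..1 image coordinates
    as 'left' or 'right' by the horizontal position of its centre."""
    cx = (box[0] + box[2]) / 2
    return "left" if cx < 0.5 else "right"

# Hypothetical per-image detections: concept -> bounding box.
detections = {
    "portrait":  {"woman": (0.55, 0.1, 0.95, 0.9),
                  "copy space": (0.0, 0.0, 0.45, 1.0)},
    "landscape": {"trees": (0.6, 0.2, 1.0, 0.8)},
}

def spatial_search(concept: str, side: str) -> list[str]:
    """Return images where the concept was detected on the requested side."""
    return [img for img, regions in detections.items()
            if concept in regions and region_side(regions[concept]) == side]

print(spatial_search("copy space", "left"))  # ['portrait']
```

A query such as one woman with copy space on the left then reduces to intersecting concept constraints with spatial predicates of this kind.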