Web 3.0 image search: a World First


The digital age has provided a virtually free worldwide digital distribution infrastructure through the internet. Many areas of commerce, government and academia have digitised, and continue to digitise, their visual image assets to take advantage of this low-cost distribution and communication vehicle, reaching a broader audience and giving that audience much deeper access to information, products and services.

Invisible images

Whilst the internet is the perfect distribution architecture for bringing digital images to the user, a distinct bottleneck remains in accessing them. Legacy image search systems rely on the producers of images to annotate each one with captions and keywords describing its content; only then can current technology run a text search against that annotation to retrieve the image. Without caption and keyword information, an image is effectively invisible to existing search methods. Some typical examples of such annotations:

Annotation: Colour image, Photography, Horizontal, Mountains, Snow, Trees, Water, Lake, Forest, Rock, Sky, Landscape

Annotation: Colour image, Photography, Horizontal, Grandfather, Elderly Male, Child, Baby

Annotation: Colour image, Photography, Horizontal, Group, 2 Couples, 2 Mid Adult Male Caucasian, 2 Mid Adult Female, Man, Woman, Trees

Cost of production

The high cost of tagging images with words means that traditionally only professional image producers and aggregators could carry the burden of priming images with captions and keywords for search. The average cost is between US$1 and US$3 per image, depending on the level of keywording and on whether the work is outsourced as part of a bulk collection to a professional keywording company, carried as an internal cost by a production company, or borne as an opportunity cost by the individual photographer.
With billions of archived images waiting to be digitised and shared, one of the main barriers to digitisation is the cost of keywording images to make them visible to today's search algorithms.

Making the invisible visible

From our inception, the imense vision has been to make all the world's digital images searchable, independent of keywords. To carry out this grand vision, imense has created a unique portfolio of products combining many years of research in computer vision, machine learning, natural language processing and probabilistic inference. The imense portfolio ushers in a new era of image search and classification. Our products provide an efficient means of searching images that have no keywords or tags, and, when combined with any existing keywords and tags, deliver unparalleled search results.

Web 3.0 image search: combining content & keywords

A world first, the imense Web 3.0 image search platform allows users to describe the content they require in text and retrieve accurate results whether or not the images carry keywords.

Technology features overview

Automatic Image Classification: creates a combined visual content & metadata index for image search
Semantic Search: understands the syntax and meaning of search queries for more accurate retrieval
Statistical Ranking of Concepts: adds a relevance weighting to each concept within an image for more accurate search results
Ontological Reasoning: reasons about visual content when keywords or specific classifiers are not available
Spatial Search: lets users query for concepts in particular areas of an image (e.g. man on left, copy space on right)

Automatic Image Classification

As an image passes through the classification process, the system automatically identifies regions, scenes, objects and facial aspects, together with the spatial positions of those regions, objects and faces within the image. As part of this process, each attribute is given a statistical relevance based on how well it typifies its concept. All of this information is then stored mathematically, as vectors of many hundreds of dimensions, in an index that is independent of language.
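As an illustration only, the kind of concept/probability index described above could be sketched along these lines. All identifiers and figures here are hypothetical; this is not imense's actual implementation, merely the shape of the idea:

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    image_id: str
    # concept -> statistical relevance in [0, 1], as produced by classifiers
    concepts: dict[str, float] = field(default_factory=dict)

class ConceptIndex:
    """A language-independent index keyed on concepts, not keywords."""

    def __init__(self):
        self.records: dict[str, ImageRecord] = {}

    def add(self, record: ImageRecord) -> None:
        self.records[record.image_id] = record

    def score(self, image_id: str, concept: str) -> float:
        # a concept the classifiers never detected simply scores zero
        return self.records[image_id].concepts.get(concept, 0.0)

index = ConceptIndex()
index.add(ImageRecord("img-001", {"beach": 0.9, "sky": 0.9, "water": 0.9}))
index.add(ImageRecord("img-002", {"people": 0.95, "beach": 0.02}))
```

Because the stored values are probabilities attached to abstract concepts rather than caption text, the same index can serve queries arriving in any language once the query terms are mapped to concepts.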

Automatic classification of image content lends itself to many applications. Combining it with existing metadata allows users to search more accurately, and for many more things within an image, while making images with poor or non-existent keywords visible for the first time, at a dramatically reduced cost compared with manual keywording.

Semantic Search

The second part of the system is a unique retrieval architecture that understands the syntax and meaning of a user's query and uses a linguistic ontology to translate it into a query against both the visual ontology index and any metadata or keywords associated with the image. For example, in a traditional system, if a user queries beach without people, the text engine looks for the words beach and people and does not understand the meaning of without.

imense: beach without people
Google: beach without people

The imense system, however, understands the meaning of the phrase and delivers only images with a very low probability of people appearing within them. This level of semantic understanding, combined with the capability to understand the content of an image, allows users to express their content wishes more fully, for a more effective and more satisfying search experience.
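The negation handling in the beach without people example can be sketched as follows. This is a deliberately naive token-level illustration with made-up names and thresholds; the real system reasons over a full linguistic ontology rather than splitting strings:

```python
def parse_query(text: str) -> tuple[list[str], list[str]]:
    """Split a query into required and excluded concepts on negation words."""
    required, excluded, target = [], [], None
    for token in text.lower().split():
        if token in ("without", "no"):
            target = excluded          # subsequent tokens are exclusions
        elif target is not None:
            target.append(token)
        else:
            required.append(token)
    return required, excluded

def search(images: dict[str, dict[str, float]], text: str,
           min_p: float = 0.5, max_excluded_p: float = 0.1) -> list[str]:
    required, excluded = parse_query(text)
    hits = []
    for image_id, concepts in images.items():
        # exclusion is tested against the concept's probability in the image,
        # not against the presence of a keyword
        if all(concepts.get(c, 0.0) >= min_p for c in required) and \
           all(concepts.get(c, 0.0) <= max_excluded_p for c in excluded):
            hits.append(image_id)
    return hits

images = {
    "empty-beach":   {"beach": 0.9, "people": 0.02},
    "crowded-beach": {"beach": 0.8, "people": 0.95},
}
print(search(images, "beach without people"))  # ['empty-beach']
```

The key point mirrored here is that "without people" becomes a constraint on the statistical rating of the people concept, which a keyword match could never express.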

Statistical Ranking of Concepts

As part of the classification process, a statistical weighting is added to each identified region, object, scene or facial characteristic within an image, reflecting how relevant it is to the image as a whole. This dramatically improves the quality of search results compared with systems that rely on keywords alone.

Image 1: 5 people, Group, 3 females, 2 males, Mid Adults, African male, Caucasian male, Asian female, Outside, Summer, Sky, Water, Rocks, Sand, Beach
Image 2: Blue, Yellow, Clouds, Sky, Water, Sand, Beach, Waves, Horizon

For example, here we have two images: one is a group of people with a small amount of sand and water in the background; the other is a typical beach scene with around 50% sand, some water and sky. Using a traditional image search system, and assuming each image had been annotated, both would carry beach in their annotation, so a query for beach against a traditional text index would return both images with the same ranking.

Image 1, weighted: 5 people 95%, Group 94%, 3 females 95%, 2 males 95%, Mid Adults 80%, African male 78%, Caucasian male 78%, Asian female 70%, Outside 90%, Summer 90%, Sky 80%, Water 20%, Rocks 20%, Sand 5%, Beach 2%
Image 2, weighted: Blue 90%, Yellow 90%, Clouds 90%, Sky 90%, Water 90%, Sand 90%, Beach 90%, Waves 80%, Horizon 80%, Copy space top left, Copy space top right

In contrast, the imense system automatically understands that the beach scene is 90% relevant to the query against only 2% for the group shot, and ranks the results accordingly. Because the imense system is based on statistical relevance rather than keywords alone, the more images there are in a search index, the better the results become. This is the converse of traditional keyword systems, where larger collections mean more competing keywords and hence progressively poorer results. Many organisations struggle to refine metadata structures and revise controlled vocabularies in an effort to improve search results.
Whilst organisations with large budgets have been able to do this in the past, as image archives grow the task becomes increasingly complex and expensive. Automated statistical weighting of the relevance of each concept within an image brings an accuracy to search results that keywording can never hope to achieve.
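The ranking idea can be illustrated with the two-image example, using hypothetical identifiers and only a few of the quoted weights:

```python
# Per-image concept weights from classification (hypothetical data
# mirroring the beach-scene / group-shot example above).
weights = {
    "beach-scene": {"beach": 0.90, "sand": 0.90, "sky": 0.90},
    "group-shot":  {"beach": 0.02, "sand": 0.05, "people": 0.95},
}

def rank(query_concept: str) -> list[tuple[str, float]]:
    """Order images by the concept's statistical relevance, highest first."""
    scored = [(img, c.get(query_concept, 0.0)) for img, c in weights.items()]
    # images where the concept has zero weight drop out entirely
    return sorted(((i, s) for i, s in scored if s > 0), key=lambda x: -x[1])

print(rank("beach"))  # beach-scene (0.90) ahead of group-shot (0.02)
```

A plain keyword index would treat both images as equal matches for beach; ordering by the stored weight is what separates them.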

Ontological Reasoning: linguistic & visual ontologies

Ontological reasoning is the cornerstone of the semantic web: a vision of a future in which machines can reason about the available information to produce more comprehensive and semantically relevant results to search queries. Rather than simply matching keywords, the web of the future will use ontologies to understand the relationships between disparate pieces of information in order to analyse and retrieve it more accurately.

As part of the classification process, image content is classified into regions, scenes, objects and facial aspects such as gender, ethnicity and age. These are then stored mathematically as dimensional vectors in a visual ontology, which maps the relationships between particular concepts. For example, if a region is classified as a connected object with 2 to 5 sub-objects and a fur texture, this has a relationship to the concept animal. We may then identify regions of brown and black, which may also be associated with the overall object. All of this is stored mathematically as a visual ontology of the image; in other words, a map of the relationships between the attributes within the image.

The linguistic ontology then allows a user to type in something like Alsatian. The system may not have a specific classifier for Alsatian, but the linguistic ontology understands that an Alsatian is a four-legged animal, mainly brown and black, and this information can be used to interrogate the visual ontology for the most accurate result. Similarly, if a user queries Camel, we may have neither the keyword Camel in the metadata nor a specific classifier for it. Here the linguistic ontology tells us that a camel is a large, desert-dwelling animal with yellowish fur, so we use the visual ontology to look for large animals in the desert with yellowish fur.

Spatial Search

As part of the classification process, the spatial context of identified regions, objects, scenes and faces is encoded within the index. This means the system can return semantically accurate results for queries involving spatial prepositions such as with, next to, on, beside and against, in addition to queries for properties located in the top, bottom, centre, left or right of an image, such as copy space. For example, a user can type one woman and specify copy space on the left or right.

Summary

Automatic Image Classification creates a combined visual content & metadata index for visual search. This not only allows images that have never been tagged or keyworded to be searched, but also delivers unparalleled search results for those that have. The Semantic Search capabilities then allow users to pose queries that traditional search engines cannot understand, such as beach without people, for more accurate results. Statistical Ranking brings an accuracy to search results that keywording can never hope to achieve, by automatically understanding how relevant a particular concept is to an image and ordering the results accordingly. The system also automatically recognises when a query matches no keywords or known concepts; at that point it reasons about content through ontological analysis and retrieves results by that method. Finally, the imense system lets users pose previously impossible queries, such as woman with copy space on right or left or couple on left with trees on right.

The imense Web 3.0 search system dramatically reduces the cost of digitisation by removing much of the need for keywords and tags, whilst at the same time providing an unparalleled search experience for any organisation using digital images as part of its workflow.

For more information, please contact sales@imense.com
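As a closing illustration, the spatial predicates described in the Spatial Search section could be sketched as follows, with invented names, detections and coordinates standing in for the real encoded spatial context:

```python
def region_side(box: tuple[float, float, float, float]) -> str:
    """Classify a bounding box (x0, y0, x1, y1) in 0..1 image coordinates
    as 'left' or 'right' by the horizontal position of its centre."""
    cx = (box[0] + box[2]) / 2
    return "left" if cx < 0.5 else "right"

# Hypothetical per-image detections: concept -> bounding box.
detections = {
    "portrait":  {"woman": (0.55, 0.1, 0.95, 0.9),
                  "copy space": (0.0, 0.0, 0.45, 1.0)},
    "landscape": {"trees": (0.6, 0.2, 1.0, 0.8)},
}

def spatial_search(concept: str, side: str) -> list[str]:
    """Return images where the concept was detected on the requested side."""
    return [img for img, regions in detections.items()
            if concept in regions and region_side(regions[concept]) == side]

print(spatial_search("copy space", "left"))  # ['portrait']
```

A query such as one woman with copy space on the left then reduces to intersecting concept constraints with spatial predicates of this kind.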