A Content-Based Image Meta-Search Engine using Relevance Feedback


Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang
Department of Electrical Engineering & New Media Technology Center
Columbia University, New York, NY
{ana, mandis, sfchang}@ee.columbia.edu

ABSTRACT

Search engines are the most powerful resources for finding information on the rapidly expanding World-Wide Web. Finding the desired search engines and learning how to use them, however, can be very time consuming. Meta-search engines, which integrate a group of such search tools, enable users to access information across the world in a transparent and more efficient manner. The recent emergence of visual information retrieval (VIR) systems on the Web is leading to the same efficiency problem. This paper describes MetaSEEk, a meta-search engine used for retrieving images based on their visual content on the Web. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. MetaSEEk has been developed to explore the issues involved in querying large, distributed, on-line visual information system sources. We compare MetaSEEk with the previous version of the system and a base line meta-search engine. The base line system does not use the feedback of previous searches in recommending target search engines for future queries.

Keywords: meta-search engine, content-based visual query, visual information retrieval system, performance monitoring, relevance feedback, image clustering, World Wide Web

1. INTRODUCTION

Over the past few years, the World-Wide Web has experienced an explosive growth in terms of the volume of content and users. One of the key factors for the Web's increasing popularity has been the simultaneous development of catalogs and search engines that assist users in navigating the Web.
These powerful resources try to index either the entire plethora of documents on the World-Wide Web or specialized sections of content. The former are known as large-scale search engines. Because of their generality, they usually fail to discriminate between desired data and unneeded information. The latter are known as specialty search engines. They are more effective in finding data in special areas, but they cannot be applied to general topics. In order to locate information of interest, an experienced user of the Internet would begin by querying the appropriate specialty search engines, and continue with general search engines only when the specialty ones fail to yield satisfactory information. Nevertheless, the proliferation of individual search engines has replaced the problem of finding information on the Internet with the problem of knowing where search engines are, what their strengths are, and how to use them. Consequently, searching the Web for specific information has become a very time consuming and inefficient task, even for experienced users.

This situation has motivated the recent research and development in integrated search or meta-search engines [8]. Meta-search engines serve as common gateways, which automatically link users to multiple and possibly competing search engines. They accept requests from users, sometimes along with user-specified query plans to select the target search engines. The meta-search engines may also keep track of the past performance of each search engine and use it in selecting target search engines for future queries. Many approaches have been proposed for meta-searching. Section 2 presents an overview of these approaches, the majority of which have been designed for text databases.

Digital images and video are becoming an integral part of human communications [5]. The ease of creating and capturing digital imagery has triggered the recent development of visual information retrieval (VIR) systems on the Web [2][10][18][20][21]. These systems usually provide methods for retrieving digital images by using examples and/or visual sketches. In order to query the visual repositories, the visual features of the imagery, such as colors, textures, and shapes, are used in combination with text and other related information. Users are now finding new VIR systems on-line, which leads, once again, to the problem of efficiently and effectively retrieving the information of interest from the appropriate visual information repositories.

We have developed a prototype content-based meta-search engine for images, MetaSEEk [4], to investigate the issues involved in efficiently querying large, distributed on-line visual information sources. MetaSEEk adopts the principle that Web resources should be used efficiently. For each query, MetaSEEk selects and queries the target engines that may provide the desired results by weighing the search tools' previous successes and failures in similar query conditions. The system keeps track of each target search engine's performance by integrating user feedback. An important issue examined in MetaSEEk is the reliability of the selection and ranking of the target search engines for different types of queries. Another important technical aspect is the heterogeneity among the different target search engines and the possible technical approaches to enhancing interoperability.

Implementation of the meta-search engine is described in section 3.
Section 4 describes the experiments, the evaluation measures and the comparison between the MetaSEEk prototype, the previous version of the system [4], and a base line search engine that randomly selects the search engines to send the queries to. Open issues for future research are described in section 5. Finally, section 6 summarizes the results of this paper.

2. RELATED RESEARCH

Meta-search engines serve as common gateways linking users to multiple search engines in a transparent manner. Working meta-search engines usually include three basic components, as depicted in Figure 1 [8]. The dispatching component selects the target search engines for each query. The query translator translates the user-specified query into scripts compatible with each target search engine. The display interface component merges the query results from each search engine, removes duplicates, and displays them to the user in a uniform format.

Meta-search tools can be classified, in the first instance, as having automatic or manual query dispatchers. Automatic ones decide which search engines to query with no direct control from the users. They may either query all supported target search engines, or try to select the most suitable search engines for each specific query based on some sort of knowledge. The latter option is preferable because it minimizes resource consumption while maintaining a satisfactory search quality. In manual meta-search engines, by contrast, the selection process is entirely up to the user. In most manual meta-search engines, only one search engine is activated at a time, and the resulting data is displayed in the original format of the search engine that produced it.

The number of meta-search engines on the WWW is still growing, and many approaches have been proposed for meta-searching. We will overview only a few of these efforts.
The GlOSS (Glossary-of-Servers Server) project [13] uses a meta-index to estimate which databases are potentially most useful for a given query. This meta-index is constructed by integrating the indexes of the target databases. For each database and each word, the number of documents containing that word is included in the meta-index. This approach has two main drawbacks: first, it requires each of the search engines to cooperate with the meta-searcher by supplying up-to-date indexing information; second, as the number of databases increases, the complexity may become prohibitive.

Figure 1: Basic components of a meta-search engine. (Diagram labels: Client; Display Interface with Result Merger and Format Converter; Meta Search Engine with Query Dispatcher, Performance Monitor and Performance Database; one Query Translator for each of Target Search Engine 1, 2, ..., n.)

The Harvest system [3] was developed by the Internet Research Task Force Research Group on Resource Discovery (IRTF-RD). Harvest is an integrated set of tools to gather, extract, organize, search, cache, and replicate relevant information across the Internet. Its two main subsystems are the Gatherer and the Broker. The Gatherer collects information from one or more designated sites. The Broker collects information from one or more Gatherers or other Brokers, indexes the retrieved data and provides a WWW query interface to them. This system is intended to be a scalable form of infrastructure for building and distributing content, indexing information, and accessing Web information.

Wide Area Information Servers (WAIS) [14] is an Internet system in which specialized subject databases are created at multiple server locations, kept track of by a directory of servers at one location, and made accessible for keyword searching by users. Given a query, the directory of servers is searched and the query is then forwarded to selected databases. The system returns a list of documents, ordered from most to least relevant based on keyword frequency.

MetaCrawler [17] is a meta-search service that does not maintain any local database; rather, it relies on the databases of various general Web-based sources. When a query is submitted, MetaCrawler dispatches queries to each one of those search engines, retrieves the HTML source of all the returned documents, and applies further analysis to clean up unavailable links and irrelevant documents. MetaCrawler obtains high accuracy, but at the cost of high network bandwidth.
Other automated Web meta-searchers are Dogpile and Metafind [11]. These systems basically dispatch queries to each one of the search engines that they target and present the returned documents to the user in a uniform format.

Many manual query dispatch search engines are also available on the Web. Tools such as All-in-One [6], CUSI [15], search.com [7], Metasearch [1], and InterNIC [12] are pages of forms that forward queries to a number of different search engines selected by the user.

The SavvySearch meta-search tool [8] employs a meta-index approach for selecting relevant search engines based on the terms in a user's query. Previous experience about query successes and failures is tracked to enhance selection quality. This system combines automatic and manual query dispatching. When a query is submitted, SavvySearch proposes a search plan that arranges the target search engines into ordered steps, which the user can either follow or disregard. Their experimental findings suggest that a meta-index approach can be effective in making search engine selection decisions. However, the potentially large amount of knowledge required to make decisions raises some questions about the overall efficiency of the system.

The ProFusion system [16] is a Web meta-search engine that supports both manual and automatic query dispatch. In automatic query dispatch, ProFusion analyzes the incoming queries, categorizes them, and automatically picks the best search engines for the query based on a priori knowledge (confidence factors) that represents the suitability of each search engine for each category. It uses these confidence factors to merge the search results into a re-weighted list of the returned documents, removes duplicates and, optionally, broken links, and presents the final rank-ordered list to the user. ProFusion's performance has been compared to individual search engines and other meta-searchers, demonstrating its ability to retrieve more relevant information and present fewer duplicate pages.

3. METASEEK

MetaSEEk is a content-based meta-search engine for images on the Web, which automatically links users to multiple on-line image search engines. We have developed MetaSEEk to investigate the issues involved in efficiently querying large, distributed on-line visual information sources.
The overall architecture of MetaSEEk is shown in Figure 1. The three main components of the system are the query translator, the query dispatcher, and the display interface component. Upon receiving a query, the dispatcher selects the target search engines to be queried by consulting the performance database at the MetaSEEk site. This database contains performance scores of past query successes and failures for each supported search option. The query translators then translate the user query into suitable scripts conforming to the interfaces of the selected search engines. Finally, the display component merges and ranks the results from each search option, and presents them to the user. MetaSEEk evaluates the quality of the results returned by each search option based on the user feedback. This information is used to modify the corresponding entries in the performance database. Each of these components is described in detail in the following sections.

In designing MetaSEEk, we adopt the principle that Web resources should be used as efficiently as possible while still providing quality results. This has been the basis for many implementation decisions, such as querying only those search engines that provided good results in similar query conditions in the past, and avoiding downloading images whenever there was a feasible alternative.

Queries can be submitted to MetaSEEk at [URL missing]. The underlying system is implemented in C and currently runs on an HP platform. MetaSEEk uses socket programming to communicate with the individual target search engines in a manner transparent to the user. HTTP commands are sent to the remote search engines to pose queries and download their results, in a similar manner to Web browsers such as Netscape and Mosaic.
The remainder of this section is structured as follows: section 3.1 introduces the concept behind content-based visual query; section 3.2 presents the query interface of MetaSEEk; section 3.3 describes the query dispatcher and the performance database; finally, section 3.4 describes the display interface component.

3.1 Content-based visual query

Content-based visual queries (CBVQ) have emerged as a challenging research area in the past few years due to the ease of creating, capturing, and collecting digital imagery. Many approaches have been proposed for content-based image retrieval by extraction of significant features from the visual data, such as color, texture, shape, structure, and even semantic meaning [2][10][18][20]. While text document retrieval is based on the words composing each document, there is no such clear searching unit for visual information; additional processing is required to obtain adequate and comparable representations of its visual content. In this framework, the image data can be viewed as arrays of feature vectors that try to characterize its visual content. A distance function is then required to compute the closeness of feature vectors or, in other words, an approximate measure of the visual similarity among images.

Content-based visual queries are usually initiated by selecting a query image from an initial set of random sample images or a set of images returned by a keyword query, or by inputting a URL to an external image. Given the input image, the search engine computes the visual features of the image (if necessary) and uses them to retrieve matching or most similar images from its database. Keyword-based search may be used to match images with particular subjects (e.g., nature, people) and narrow down the search scope.

To efficiently handle visual query retrieval in large visual databases, issues such as clustering and indexing on the feature vectors have been addressed. Images in the database may first be classified into various categories on the basis of content; the decision criteria could be based on generic classification methods or special learning techniques. Indexing techniques are then applied to the image feature vectors in each category in order to support efficient access to the database. Special indexing algorithms have been proposed in the context of content-based image retrieval [9].
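The idea of representing an image as a feature vector and comparing vectors with a distance function can be illustrated with a small sketch. It assumes a coarse RGB color histogram as the feature and the Euclidean distance as the similarity measure; the function names, the bin count, and the pixel format are hypothetical choices made for illustration and are not taken from any of the systems discussed here.

```python
import math


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB histogram for an iterable of (r, g, b) tuples.

    Each channel value is expected in 0..255. The bin layout is an
    illustrative assumption, not the layout used by any real VIR system.
    """
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two tiny synthetic "images": one all red, one all blue.
red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
print(euclidean_distance(red, red))
print(euclidean_distance(red, blue))
```

A smaller distance is read as higher visual similarity. Real systems differ in the features they extract and in how they weight them, which is one source of the heterogeneity a meta-search engine has to cope with.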
Each visual information retrieval system supports a specific (and often proprietary) set of visual features with particular feature combinations, extraction methods, distance metrics, clustering strategies, and indexing algorithms. Some of these visual repositories allow visual queries to combine low-level visual features (e.g. color, texture) with text and other related information. MetaSEEk provides a unique integrating tool in the heterogeneous environment of on-line VIR systems.

3.2 Query interface

MetaSEEk currently supports the following four on-line search engines: VisualSEEk [18], WebSEEk [21], QBIC [10], and Virage [2]. Each of these target VIR systems has its own functionalities and limitations. VisualSEEk, QBIC and Virage provide methods for retrieving digital images on the basis of visual features by using examples. QBIC and VisualSEEk support customized searches by using visual sketches or by specifying an external image, whereas Virage permits the user to specify the importance weight of each feature in the search. QBIC also offers image retrieval based on keywords. WebSEEk, on the other hand, is a semi-automatic image search and cataloging engine. It supports text-based and content-based searching; currently, however, MetaSEEk only makes use of its text-based searching capabilities. Image retrieval based on visual content usually returns a ranked list of the images that have the highest similarity to the query input.

In the current version of MetaSEEk, the user interface allows for browsing random images, and for retrieving images from the remote search engines based on visual content and on keywords. When querying based on visual content, the user can select any of the example images from any of the supported databases, or input the URL to an external image. Two visual features are available for the user to choose from: color and texture. These two features can be selected individually or they can be combined.
Figure 2 illustrates the user interface for the MetaSEEk search engine. As noted above, not all the target search engines share the same options. The query dispatcher of MetaSEEk is aware of each database's searching capabilities and decides which search engines the queries should be sent to. The dispatch mechanism is explained in section 3.3. In addition, one target search engine may provide different search options (e.g. color only, texture only, or combined). MetaSEEk allows multiple search options from the same target search engine to be selected at the same time.

The user can specify the number of search options to be searched simultaneously, the value for the maximum waiting time, and the category of interest by using pull-down menus. The user can adjust the number of queries to be sent to the individual search engines depending on the network load. During periods of low network traffic, the user may increase the querying concurrency. The maximum waiting time prevents the query system from stalling if a search engine happens to be down, unreachable, or experiencing significant delays. When selecting a category other than All, the user is expected to submit input images that belong to that category (e.g. nature, cars).

The dispatching component uses this information to improve its mechanism for recommending target search engines.

Figure 2: The query interface of MetaSEEk.

3.3 Query dispatcher and performance database

Upon receiving a query, the dispatcher selects the target search engines and the target search options to be queried. A search option is a query method on a specific search engine. For example, querying the VisualSEEk search engine based on texture is one search option. The dispatcher makes the decision based on the type of query submitted to the meta-search engine, and the database containing the performance of past queries. If the user requests random image samples or a keyword query, the system simply poses queries to the search engines that allow these actions (QBIC, Virage and VisualSEEk support random samples; QBIC and WebSEEk support keyword queries).

For content-based visual queries, the procedure of selecting target search engines is based on the performance database. This database contains performance scores indicating how well or how poorly each search option of the target search engines has performed in the past on every query image. A visual query is specified by a query image, a group of features, and a semantic category (section 3.3.1). Upon receiving a visual query, MetaSEEk searches the performance database and retrieves the query image's performance scores. In the performance database structure (section 3.3.2), each row corresponds to a query image seen before, and each performance score column corresponds to one search option. The dispatch component queries the search options with the highest scores that agree with the user's feature and semantic category selection. MetaSEEk evaluates the quality of the results returned by each search option based on user feedback (section 3.3.3). An automatic off-line probing procedure is executed to establish some initial performance scores, so that a non-trivial performance database can be built from training examples (section 3.3.4).

What about new images that have not been queried before? No performance scores are recorded for them in the performance database. The simplest solution would be to randomly select the search options to be queried. Instead, we consider an alternative approach: recommending remote search options by relating new queries to past ones for which we have performance information. Query images in the performance database are clustered into several classes on the basis of visual content.
We maintain a cluster structure for each feature query: color, texture and both features. When the user issues a query with a new query image, the system downloads the image, and matches it against the corresponding clustering structure in order to obtain a list of the most similar clusters. Selected images from the few closest clusters are presented to the user. The dispatcher is then able to recommend suitable search engines based on the average performance scores of the cluster selected by the user. Finally, the new image is added to the performance database for future queries.

Many approaches have been proposed recently for clustering visual data to support efficient image retrieval. We decided on the K-means clustering algorithm because of its simplicity and its low computational cost. The clustering algorithm is not executed every time a new image is added to the database, but once every 10 new images. Color and texture are the features extracted locally by MetaSEEk for the clustering. The Tamura algorithm [19] is used for computing the texture feature vectors. For color, the feature vectors are calculated using the color histogram algorithm. The distance between two feature vectors is calculated using the Euclidean distance. Note that other feature vectors can be used if necessary. Our performance monitoring and search engine recommendation framework is general and can accommodate different feature vectors.

3.3.1 Semantic categorization

We use semantic categorization to constrain the search scope. Visual databases usually contain special types of images. As an example, we have observed that the QBIC database includes a great number of images of people, whereas the Virage database has more images in other categories. If a user is interested in finding the picture of a baby, it is more likely that QBIC will yield more appropriate results in less time.
We try to take advantage of this fact by offering the possibility of searching within the scope of a specific category. If the user selects a category other than All, he or she is expected to query only images included in that category. Separate performance databases and clustering structures (mentioned above) are maintained for the different categories. User collaboration is required in selecting the appropriate category for each search to improve the recommendation mechanism. In the current version of MetaSEEk, the user can select among ten different categories (animals, art, buildings, flowers, geography, landscapes, nature, objects, people, and transport).

3.3.2 Database structure

The meta-search database contains the feature vectors (color and texture), the performance scores, the clustering class, and the semantic category for all the images that have been queried on MetaSEEk. All this information is needed by the dispatcher to recommend suitable search engines for incoming queries. The database is organized into a hierarchical structure. The images are first classified according to their semantic meaning (e.g. animals, people), which is obtained from user guidance. The K-means clustering algorithm is then used to cluster the images in each semantic category into classes on the basis of color, texture or both features. Figure 3 presents the conceptual structure of the database.

Figure 3: Conceptual structure of the meta-search database. (Diagram levels, top to bottom: Semantic Categories: Animals, Art, ..., Transport; Feature Groupings: Color, Texture, Color and Texture; Clustering Classes: C1, C2, ..., Cn; Query Images: Img 1, Img 2, ..., Img m.)

At the lowest level of the database, the image entries contain the information shown in Table 1. The image name is the complete URL address of the image, which may be located on a local or remote site. The performance scores are used by the query dispatcher to decide on the target search engines. These scores are updated, depending on the user feedback, every time the image is queried. The feature vectors for color and texture are calculated using the color histogram and Tamura algorithms, respectively. The K-means clustering algorithm makes use of these feature vectors in organizing the images based on visual content.

Table 1: Example of image entry in the meta-search database.

Image URL
Performance Score Vector (one entry per search option)
    QBIC color percentages          7
    QBIC color layout               9
    QBIC texture                    3
    VisualSEEk color percentages   -1
    VisualSEEk color layout         3
    VisualSEEk texture              6
    Virage color                   NA
    Virage composition             NA
    Virage texture                 NA
Color Feature Vector      (0.0000, , , )
Texture Feature Vector    (0.3456, , )
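How the dispatcher might use such an entry can be sketched as follows. This is a minimal illustration, not the MetaSEEk implementation (which is written in C): the function names and the dictionary layout are hypothetical, NA is modeled as `None`, and the filtering by user-selected feature and semantic category is left out for brevity. The scores are the example values from Table 1.

```python
NA = None  # marks a search option that cannot take this image as query input


def select_options(scores, max_options):
    """Return up to max_options option names, highest score first.

    Options marked NA are skipped. Ties keep their original order.
    """
    usable = [(name, s) for name, s in scores.items() if s is not NA]
    usable.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in usable[:max_options]]


def average_cluster_scores(entries):
    """Average the score vectors of the images in one cluster.

    NA values are ignored, so an option is averaged only over the
    images for which it has a recorded score.
    """
    totals, counts = {}, {}
    for scores in entries:
        for name, s in scores.items():
            if s is NA:
                continue
            totals[name] = totals.get(name, 0) + s
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


# Example score vector taken from Table 1.
table1 = {
    "QBIC color percentages": 7,
    "QBIC color layout": 9,
    "QBIC texture": 3,
    "VisualSEEk color percentages": -1,
    "VisualSEEk color layout": 3,
    "VisualSEEk texture": 6,
    "Virage color": NA,
    "Virage composition": NA,
    "Virage texture": NA,
}
print(select_options(table1, 3))
```

For a query image already in the database, `select_options` is applied to its own score vector. For a new image, the same selection could be applied to the averaged scores of the cluster chosen by the user, as computed by `average_cluster_scores`.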

A score of NA means that the corresponding search option cannot accept the image as query input. Some search engines do not accept external images as query input (e.g. URL search). Therefore, they can only be queried when an image from their own database is selected. For all the images that do not belong to such a search engine's database, the scores of the search options of that search engine are set to NA.

3.3.3 Performance monitoring

The dispatcher of the meta-search engine determines which search options on the target search engines should be executed. As mentioned earlier, in content-based visual queries, the decision is made based on the performance metrics calculated from prior query experience, and on whether or not the target engine supports the requested search. These performance metrics try to keep track of the effectiveness of each search option in each search engine for different types of queries. The performance of individual search engines is expected to change over time as their algorithms are improved and new images are collected for their databases. For this reason, we decided to construct the performance metrics from accumulated user feedback.

For every image that has been queried, the performance database keeps a vector of performance indexes, one index for each search option supported by the individual search engines. The performance of each search option is a signed integer, where a positive number indicates good performance and a negative number indicates poor performance. The performance metric of a result image is updated every time the user submits that image as the query input of a content-based visual query. The visit of an image increments by one the performance metric of the search option that returned the visited image. If an image is not selected (i.e. no visit), the performance score remains unchanged.
The user can also specify whether he or she likes or dislikes a particular result image, which will in turn increment or decrement the performance metric of the corresponding search engines. These modifications of the database are shown in Table 2.

Table 2: The assigned values for the performance metrics.

    Event       Score
    Visit       +1
    No visit     0
    Like        +1
    Dislike     -1

MetaSEEk removes all the duplicate images returned by different search options from the same search engine. If the user, however, clicks on the like/dislike button of an image that has a duplicate, MetaSEEk will increment/decrement the performance scores of all the search options that returned that image, even though the duplicates are not displayed to the user.

3.3.4 Bootstrapping the performance database

The above method of acquiring knowledge about the performance of different search engines requires a continuous process of iterative query sessions with users to accumulate results. Non-trivial entries in the database can be obtained through this process. Before a non-trivial performance database is available, some training periods are needed. An interesting issue arises in selecting target search engines in this initial process. One solution is to use all search engines or randomly selected search engines in the initial training period. This will likely result in low efficiency or unreliable training results. In view of this, we propose to use initial performance scores based on an automatic off-line probing process. The probing procedure is carried out before the training period of MetaSEEk. In other words, the initial probing process establishes the machine-generated performance scores to be used in the training period of MetaSEEk. The performance scores based on user feedback from the training period are used for the later on-line operation of MetaSEEk. During on-line operation, the performance database is continuously updated through user feedback.
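The feedback-driven update follows the values in Table 2 and can be sketched in a few lines. The function and variable names below are hypothetical, and the score vector is modeled as a plain dictionary with `None` standing in for NA; the sketch only illustrates the rule that every search option that returned the image receives the update, including options whose duplicate copy was not displayed.

```python
# Score changes from Table 2.
SCORE_DELTA = {"visit": 1, "no_visit": 0, "like": 1, "dislike": -1}


def apply_feedback(scores, event, returned_by):
    """Update a performance score vector in place for one feedback event.

    scores      -- dict mapping search option name to int score (None = NA)
    event       -- one of the keys of SCORE_DELTA
    returned_by -- every search option that returned the image, including
                   options whose duplicate copy was hidden from the user
    """
    delta = SCORE_DELTA[event]
    for option in returned_by:
        # Options with no numeric score (NA or unknown) are left untouched.
        if scores.get(option) is not None:
            scores[option] += delta
    return scores


scores = {"QBIC texture": 3, "VisualSEEk texture": 6, "Virage texture": None}
apply_feedback(scores, "like", ["QBIC texture", "VisualSEEk texture"])
apply_feedback(scores, "dislike", ["QBIC texture"])
print(scores)
```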

Experimental findings in meta-search engines for text databases suggest that a performance index approach can be effective in making search engine selection decisions [8]. However, the potentially large amount of knowledge required to make these decisions raises questions about the overall efficiency of the system. MetaSEEk deals with this problem by automatically probing the search engines with a set of sample images. This procedure is carried out before the meta-search engine is available to take queries from users. It provides some base performance knowledge to start the training of the meta-search engine. The system performance database is not built over the collected data, but rather separately. Once enough knowledge has been accumulated from training MetaSEEk, the dispatcher will discard this artificial knowledge and base its search engine selection decisions exclusively on the accumulated training performance.

The strategy for probing the search engines and collecting initial performance data tries to emulate users' judgment of result images as relevant or non-relevant. First, forty probe images are selected for each semantic category to form the sample probe. Every probe image in this set is then posed to all search options in all search engines, and the results are downloaded for further analysis. For each result image within a visual distance α of the probe image, the performance score of the search option that returned the image is increased. Conversely, for each result image not within a visual distance δ of the probe image, the performance score of the search option that returned the image is decreased. When the probing is finished, the clustering algorithm is applied to the images in the sample probe based on color, texture, and both features. We determine the thresholds α and δ through subjective evaluation of query results and their feature distances.
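The probing rule can be sketched as a small scoring function. This is a hedged illustration: the source does not state by how much a score is increased or decreased, so the step of 1 is an assumption, as are the function name, the requirement that α does not exceed δ, and the example threshold values.

```python
def probe_score(distances, alpha, delta, step=1):
    """Machine-generated score for one search option and one probe image.

    distances -- visual distances between the probe image and each result
                 image returned by the search option
    alpha     -- results at distance <= alpha count as relevant
    delta     -- results at distance > delta count as non-relevant
    step      -- amount added or subtracted per result (assumed to be 1)

    Results whose distance lies between alpha and delta leave the score
    unchanged.
    """
    if alpha > delta:
        raise ValueError("expected alpha <= delta")
    score = 0
    for d in distances:
        if d <= alpha:
            score += step
        elif d > delta:
            score -= step
    return score


# Four result images: two close to the probe, one in between, one far away.
print(probe_score([0.1, 0.2, 0.5, 0.9], alpha=0.3, delta=0.8))
```

With two separate thresholds, results that are neither clearly similar nor clearly dissimilar do not affect the score, which is analogous to the "no visit" case in the user feedback scheme.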
In contrast, the strategy for training the meta-search engine begins with the selection of twenty training images for each semantic category. After posing each image in this set to MetaSEEk, an expert user evaluates the results returned from each search option as relevant or non-relevant. The set of probing images and the set of training images are completely independent.

Display component

Once the results are retrieved from each individual search engine, they are organized and presented to the user by the display component. This process depends on the type of action requested by the user: random samples, keyword search or image search. The results from a random image or a keyword search are merged randomly and presented to the user; the order of these results is not important. In content-based visual queries, the display mechanism is more elaborate. An image search based on visual content returns a list of images ranked from the most to the least similar to the query image. MetaSEEk performs additional ranking of these images by using the scores in the performance database. The result images returned by each query option are interleaved before being displayed to the user. The performance scores for the query image determine the display order and the number of images in each interleaved group of results from each search option. For example, suppose that images are retrieved from two search options with performance scores of 2 and 1. The display component will display 2 images from the search option with a score of 2, then 1 image from the search option with a score of 1, and repeat this until all the returned images are displayed. The use of this specific merging algorithm is not meant to replace the ranking algorithms used in each target search engine. Instead, it is used to cope with the heterogeneity among the different algorithms used in different search engines.
Unlike the consistent distance metrics used in text search engines, each visual search engine uses different algorithms and metrics. In order to evaluate the similarities of the returned images from different engines with the query input, a common set of features could be computed locally in the meta-search engine for all the returned images and be compared with the query inputs. This option, however, would be too costly for the network since most of the images would have to be downloaded in order for the feature vectors to be computed. The above selected merging method avoids this problem by simply rank interleaving the images and ignoring the actual distances between the images. Therefore it is possible for an image with a lower similarity measure to the input query to be displayed before an image with a higher similarity. Note that MetaSEEk shows the icons of the result images only. Full-resolution images are not required if new feature extraction processes are not needed at MetaSEEk.
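The rank interleaving described above can be sketched in a few lines. The function name `interleave` and the data layout are hypothetical, and the handling of non-positive scores (treating them as a group size of one) is an assumption made only so that the loop terminates; the paper does not say how such scores are displayed. Duplicate removal is left out.

```python
def interleave(results, scores):
    """Merge ranked result lists in rounds, without comparing distances.

    results: dict mapping search option -> list of images, best match first
    scores:  dict mapping search option -> performance score for the query
    In each round every option contributes a group of `score` images,
    options with higher scores first.
    """
    order = sorted(results, key=lambda option: scores[option], reverse=True)
    position = {option: 0 for option in order}
    merged = []
    while any(position[o] < len(results[o]) for o in order):
        for option in order:
            size = max(1, scores[option])  # assumption: at least one image per round
            start = position[option]
            group = results[option][start:start + size]
            merged.extend(group)
            position[option] = start + len(group)
    return merged


merged = interleave(
    {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2", "b3"]},
    {"A": 2, "B": 1},
)
```

For the two search options with scores 2 and 1 used in the example in the text, the merged order is a1, a2, b1, a3, a4, b2, b3.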

4. EXPERIMENTS AND EVALUATION

This version of MetaSEEk has been developed with the primary objective of investigating whether or not our recommendations of search engines for incoming queries are appropriate. Another important aspect is to determine how useful the semantic and clustering categorization is in improving the performance of the meta-search engine. A set of experiments was conducted to evaluate the performance of MetaSEEk: the reliability of the selection and ranking of the remote search engines for different user queries, and the benefit of the semantic categorization in that task.

We selected twelve images of the semantic category Animals as the set of target images for our experiment. Beginning with random samples, we tried to find each target image by successively querying the meta-search engine based on visual content, selecting one of the result images from the previous search as the query input. In this process, we monitored the number of queries necessary to find each target image and the average recommendation precision of these queries. The experiment was carried out twice (two passes) to evaluate the improvement of the recommendation mechanism with experience (time). In other words, after finding all the target images, we tried to find them again following the same procedure.

The recommendation precision is a measure of the accuracy of the search option recommendation. It is calculated as the number of relevant recommended search options over the total number of recommended search options for a given query. We calculated the precision of all queries by querying a maximum of four search options simultaneously (the top-ranked search options). Search options with a positive user judgement in a given query (number of liked images minus the number of disliked images) are considered relevant to the query.
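The recommendation precision defined above can be written as a small function. The names and the dictionary layout are hypothetical, and returning 0.0 when no option was recommended is an assumption, since that case is not discussed in the text.

```python
def recommendation_precision(recommended, likes, dislikes):
    """Fraction of recommended search options that were relevant to a query.

    recommended: list of the search options queried (at most four in the experiments)
    likes, dislikes: dicts mapping search option -> number of liked/disliked images
    An option is relevant when likes minus dislikes is positive.
    """
    if not recommended:
        return 0.0  # assumption: undefined case mapped to zero
    relevant = [
        option for option in recommended
        if likes.get(option, 0) - dislikes.get(option, 0) > 0
    ]
    return len(relevant) / len(recommended)


precision = recommendation_precision(
    ["a", "b", "c", "d"],
    likes={"a": 3, "b": 1, "c": 0},
    dislikes={"b": 1, "c": 2},
)
```

Here only option a has a positive judgement (3 likes, no dislikes), so one of the four recommended options is relevant and the precision is 0.25.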
We collected data and compared three different systems: the current version of MetaSEEk (the category MetaSEEk), the previous version of the system [4] (the plain MetaSEEk, without semantic categorization), and a base line meta-search engine (without performance monitoring and recommendation). The category MetaSEEk prototype includes all the advanced features described in section 3. The plain MetaSEEk prototype lacks only the semantic and clustering categorization. When the plain MetaSEEk receives a new query not encountered before, the dispatcher recommends target search engines based on the average performance score of the images in the database that are closest to the query image in visual feature distance. The base line meta-search engine does not use any past performance of the different search engines in selecting the target search engines. Upon receiving a query, it randomly selects a set of possible search options and queries them.

When a query is submitted to the plain or the category MetaSEEk, the system only queries those search options that have provided the most desirable results in the past, according to the information in the performance database. Furthermore, the scores in the performance database are updated for every query using user feedback. For these two systems, the number of queries needed to find an image is expected to decrease as the system accumulates more knowledge. In the base line system, however, where the performance database is not used to select search options, the number of queries is expected to vary randomly. The precision is expected to follow a similar pattern, with increasing values.

The experimental results for the three systems are reported in Figures 4 and 5. The search number indicates the target image that we were trying to locate. Each search number corresponds to the same image in all of the graphs.
Table 3 shows the mean number of queries and the mean precision for the first and second passes of the experiment, together with the percentage of searches that produced better results (a lower number of queries or a higher precision) from the first to the second pass. For these experiments, both the plain MetaSEEk and the category MetaSEEk were trained with the same set of training images. The set of training images and the set of target images had no images in common.

The category MetaSEEk offers better overall performance than the other two systems. The number of queries needed to find the desired image tends to decrease considerably as the knowledge of the system increases, and the recommendation precision clearly tends to increase with time. The plain MetaSEEk, in turn, outperforms the base line system, with a higher mean precision and a lower mean number of queries needed to find an image. The category MetaSEEk uses a more elaborate and precise method than the plain MetaSEEk to relate new queries to old ones, so it was expected to provide better results than the plain MetaSEEk.

The same experiments were conducted for a set of twelve target images of the semantic category Landscape. The experimental results obtained were very similar to the ones presented in the paper.

Table 3: Summary of the experimental results for category Animals. Columns, given for both the number of queries and the precision: 1st pass mean, 2nd pass mean, and improvement from the 1st to the 2nd pass (%). Rows: category MetaSEEk, plain MetaSEEk, and base line meta-search engine.

Figure 4: a) Number of queries till target images are found for category MetaSEEk. Axes: number of queries against search number; series: first pass and second pass.

Figure 4: b) Number of queries till target images are found for plain MetaSEEk. Axes: number of queries against search number; series: first pass and second pass.

Figure 4: c) Number of queries till target images are found for base line meta-search engine. Axes: number of queries against search number; series: first pass and second pass.

Figure 5: a) Precision trend for category MetaSEEk. Axes: precision against search number; series: first pass and second pass.

Figure 5: b) Precision trend for plain MetaSEEk. Axes: precision against search number; series: first pass and second pass.

Figure 5: c) Precision trend for base line meta-search engine. Axes: precision against search number; series: first pass and second pass.

5. FUTURE WORK

The most interesting future research issues that we are considering for MetaSEEk are presented in this section.

5.1. User feedback and performance monitoring

The experimental results have demonstrated that the performance of a meta-search engine can be improved if search engines and search options are intelligently selected by monitoring their performance using user feedback. Given the importance of user feedback and performance monitoring in the meta-search engine, we are considering more sophisticated approaches to implement these two stages. In any case, poor results are guaranteed if the system cannot count on the collaboration of its users.

5.2. Customized search

The MetaSEEk search engine can be further improved by adding capabilities such as support for customized search. QBIC and VisualSEEk allow the user to customize the search by manually specifying visual sketches as query input. The customized search on these two systems is supported for color percentages and color layout, which allow the user to manually specify the amounts of different colors or the locations of different colors, respectively. A customized search would require additional user interface programming and is not currently supported by MetaSEEk. The approach we have in mind will create an artificial image with the visual content that the user specifies in the visual sketch. This image will then be submitted as the input image to visual content queries on search engines that support URL search.

5.3. Visual features

In the current version of MetaSEEk, the visual features of each image queried on the system are computed using color histogram and Tamura texture algorithms. These feature representations were selected for simplicity and execution efficiency rather than optimality. Furthermore, although almost a dozen representations have been proposed for the texture feature alone, none has been accepted as the ultimate solution. Indeed, such acceptance is very unlikely, because human perception of low-level visual features such as color and texture is subjective.
For this reason, we want to explore a system that uses a larger set of features and adapts it semi-automatically to user preferences. The new system would recommend not only search engines but also feature representations, probably based on training examples.

The K-means clustering algorithm is used to categorize the collected images into classes based on the color and texture representations mentioned above. This clustering algorithm lacks scalability: when a new image is added to the database, the algorithm has to be applied to all the images again to obtain a new database structure. Scalability is a very important property because of the rapid growth of databases. We try to minimize the inconvenience by waiting for ten new images to be added to the local database before computing a new database structure. Nevertheless, we are already considering a new categorization mechanism that would permit extending the database without changing its structure.

5.4. Addition of new databases

At present, new target search engines can only be added to MetaSEEk manually. We would like to automate this procedure in future versions of the search engine. We are beginning to investigate a registration program with which willing on-line search engines could register with the meta-search engine. The registration program could also help to register less cooperative search engines. In both cases, we believe that the interaction between a new target database and the meta-searcher should not begin blindly. Knowledge is very expensive. When subscribing, a search engine could provide statistical data that reflects the content of its visual database. This data would not need to be resubmitted, because of the adaptive capabilities of MetaSEEk. Non-cooperative search engines could be probed with a set of sample images.

MetaSEEk's database accumulates the location and the feature vectors of the queried images. Incidentally, we are thereby creating a potential visual information retrieval system in itself.
We think that we should take advantage of this fact by considering our local database as a possible target search engine for incoming queries. The latest version of MetaSEEk also accumulates semantic information about the images with the user's help. Being an academic project, MetaSEEk will not exploit this advantage over the other search engines, which lack semantic-level classification. This is, undoubtedly, an extremely attractive way to improve the performance of meta-searchers in commercial systems.

5.5. Local network of individual search engines

MetaSEEk currently supports four on-line search engines: VisualSEEk, WebSEEk, QBIC and Virage. VisualSEEk and WebSEEk are academic visual information systems built at Columbia University, whereas QBIC and Virage are on-line demos of commercial systems. For the latter two systems, we neither know the design choices (feature extraction methods and clustering strategies, among others) nor have unrestricted access to the visual databases. At this point, we feel that we need further control over the individual search engines to evaluate and compare the effectiveness of different approaches to some of the design choices we face. We are creating our own network of individual search engines to solve this problem. Our network of search engines will try to emulate the heterogeneity of the real-world search engines mentioned earlier. It will allow us to run more extensive experiments and to reach better conclusions about the suitability of different design options without inundating commercial VIR demos with our queries. The most successful approaches will nonetheless be integrated into the current version of MetaSEEk, which queries on-line search engines for images.

5.6. Learning algorithm

Machine learning addresses the question of how to build computer programs that improve their performance at some task through experience. Machine learning algorithms have proven to be very useful in a variety of application domains. A well-defined learning problem requires a well-specified task, performance metric, and source of training experience. MetaSEEk is a system designed to learn to recommend the most suitable search engines for incoming user queries. It may improve its performance, as measured by its ability to retrieve better results in less time, through experience obtained from user feedback. Clearly, the task addressed by MetaSEEk can be cast as a machine learning problem.
At present, we are reviewing approaches in machine learning taken in related research areas.

6. SUMMARY

The proliferation of text search engines on the Web has motivated the recent research in meta-search engines. Following the same trend, and impelled by the growing wealth of VIR systems on the Web, we have developed a prototype meta image search engine, MetaSEEk, to explore the issues involved in querying large, distributed, on-line visual information system sources. Our goal is to investigate novel techniques for enhancing the interoperability of distributed VIR systems, rather than to rank the performance of individual systems.

MetaSEEk uses performance scores to recommend the target search engines and search options to which the content-based visual query is sent. These performance scores are constructed by accumulating the successes and failures of the search options in past queries on the basis of user feedback. When a query is submitted to MetaSEEk, the most relevant search options and search engines are selected by weighing their performance scores to predict the ones that are likely to produce relevant results. The performance scores are also used to merge the result images efficiently, with minimal resource requirements and enhanced interoperability. If MetaSEEk receives a new query, not encountered before, the system matches it against the content of the database in order to obtain a list of the most similar clusters of past query images. Sample images from the few closest clusters are then presented to the user. Considering the average performance scores of the cluster selected by the user, MetaSEEk is able to recommend suitable search options. The system also offers the possibility of narrowing down the search scope by selecting a category for the search. Categorization takes advantage of the specialization of the databases in particular types of images to improve the recommendation mechanism.
The performance scores, the feature clustering and the semantic category of all the images that have been queried on MetaSEEk are stored in the hierarchical structure of the meta-search database. Before a non-trivial performance database can be built from training examples, an automatic off-line probing procedure is carried out to establish some initial performance scores. The machine-generated performance scores are used in the training period of the system, whereas the performance scores based on user feedback from the training period are used for the later operation of MetaSEEk. This bootstrapping procedure provides efficient and reliable training results on which to build the performance database. Experimental results have demonstrated that the performance of a meta-search system can be greatly improved if search engines are intelligently integrated and selected by ranking their performance for different classes of user queries, and by using user feedback in the ranking refinement.

7. REFERENCES

[1] Scott Banister, "MetaSearch."
[2] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. C. Jain, and C. Shu, "Virage image search engine: an open framework for image management," Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases IV, IS&T/SPIE '96, Feb.
[3] Mic Bowman, Peter B. Danzig, Udi Manber, Michael F. Schwartz, Darren R. Hardy, and Duane P. Wessels, "Harvest: A scalable, customizable discovery and access system," Technical report, University of Colorado-Boulder.
[4] Mandis Beigi, Ana B. Benitez, and Shih-Fu Chang, "MetaSEEk: A Content-Based Meta-Search Engine for Images," Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases VI, IS&T/SPIE '98, Jan.
[5] Shih-Fu Chang, John R. Smith, Mandis Beigi, and Ana B. Benitez, "Visual Information Retrieval from Large Distributed On-line Repositories," Communications of the ACM, Vol. 40, No. 12, Dec.
[6] William Cross, "All-In-One Search Page."
[7] C Net, "Search.com."
[8] Daniel Dreilinger and Adele E. Howe, "Experiences with Selecting Search Engines Using Meta-search," to appear in ACM Transactions on Information Systems.
[9] Christos Faloutsos, "Searching Multimedia Databases by Content," Kluwer Academic Publishers.
[10] Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker, "Query by Image and Video Content: The QBIC System," IEEE Computer Magazine, Vol. 28, No. 9, Sep.
[11] Aaron Flin, "Dogpile and Metafind."
[12] InterNIC Database Administration, "InterNIC meta-search engines."
[13] Luis Gravano, Hector Garcia-Molina, and Anthony Tomasic, "The Effectiveness of Gloss for the Text Database Discovery Problems," Proceedings of the ACM SIGMOD '94, Minneapolis, May 1994.
[14] B. Kahle and A. Medlar, "An Information System for Corporate Users: Wide Area Information Server," ConneXions - The Interoperability Report, Vol. 5, No. 11, pp. 2-9, Nov.
[15] Martin Koster, "Configurable Unified Search Engine."
[16] Center for Research, Inc., University of Kansas, "ProFusion meta-search engine."
[17] Erik Selberg and Oren Etzioni, "Multi-service search and comparison using the MetaCrawler," Proceedings of the 4th International World Wide Web Conference, Dec.
[18] John R. Smith and Shih-Fu Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," ACM Multimedia Conference, Boston, MA, Nov.
[19] H. Tamura, S. Mori, and T. Yamawaki, "Textural Features Corresponding to Visual Perception," IEEE Trans. on Systems, Man, and Cybernetics, Vol. 8, No. 6, Jun.
[20] Shih-Fu Chang, "Finding Images/Video from Large Distributed Information Services - Columbia's Content-Based Visual Query Project."
[21] John R. Smith and Shih-Fu Chang, "An Image and Video Search Engine for the World-Wide Web," Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases V, IS&T/SPIE, San Jose, CA, Feb.


More information

A Java Tool for Creating ISO/FGDC Geographic Metadata

A Java Tool for Creating ISO/FGDC Geographic Metadata F.J. Zarazaga-Soria, J. Lacasta, J. Nogueras-Iso, M. Pilar Torres, P.R. Muro-Medrano17 A Java Tool for Creating ISO/FGDC Geographic Metadata F. Javier Zarazaga-Soria, Javier Lacasta, Javier Nogueras-Iso,

More information

Colour Image Segmentation Technique for Screen Printing

Colour Image Segmentation Technique for Screen Printing 60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

More information

2Creating Reports: Basic Techniques. Chapter

2Creating Reports: Basic Techniques. Chapter 2Chapter 2Creating Reports: Chapter Basic Techniques Just as you must first determine the appropriate connection type before accessing your data, you will also want to determine the report type best suited

More information

Make search become the internal function of Internet

Make search become the internal function of Internet Make search become the internal function of Internet Wang Liang 1, Guo Yi-Ping 2, Fang Ming 3 1, 3 (Department of Control Science and Control Engineer, Huazhong University of Science and Technology, WuHan,

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Abstract 1. INTRODUCTION

Abstract 1. INTRODUCTION A Virtual Database Management System For The Internet Alberto Pan, Lucía Ardao, Manuel Álvarez, Juan Raposo and Ángel Viña University of A Coruña. Spain e-mail: {alberto,lucia,mad,jrs,avc}@gris.des.fi.udc.es

More information

Software Re-Engineering and Ux Improvement for ElegantJ BI Business Intelligence Suite

Software Re-Engineering and Ux Improvement for ElegantJ BI Business Intelligence Suite 2011 2012 2013 2014 Q1 Q2 Q3 Q4 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Sales Performance by Category 2014 Product

More information

Building a Database to Predict Customer Needs

Building a Database to Predict Customer Needs INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

More information

A Comparative Study of Database Design Tools

A Comparative Study of Database Design Tools A Comparative Study of Database Design Tools Embarcadero Technologies ER/Studio and Sybase PowerDesigner Usability Sciences Corporation 909 Hidden Ridge, Suite 575, Irving, Texas 75038 tel: 972-550-1599

More information

Content Filtering Client Policy & Reporting Administrator s Guide

Content Filtering Client Policy & Reporting Administrator s Guide Content Filtering Client Policy & Reporting Administrator s Guide Notes, Cautions, and Warnings NOTE: A NOTE indicates important information that helps you make better use of your system. CAUTION: A CAUTION

More information

Journal of Global Research in Computer Science RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM

Journal of Global Research in Computer Science RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM Volume 2, No. 5, May 2011 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM Sheilini

More information

ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES

ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES International Journal of Computer Engineering & Technology (IJCET) Volume 7, Issue 2, March-April 2016, pp. 24 29, Article ID: IJCET_07_02_003 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=7&itype=2

More information

Search engine ranking

Search engine ranking Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 417 422. Search engine ranking Mária Princz Faculty of Technical Engineering, University

More information

Concepts of digital forensics

Concepts of digital forensics Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence

More information

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence

More information

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications IBM Software Information Management Scaling strategies for mission-critical discovery and navigation applications Scaling strategies for mission-critical discovery and navigation applications Contents

More information

Web Application Hosting Cloud Architecture

Web Application Hosting Cloud Architecture Web Application Hosting Cloud Architecture Executive Overview This paper describes vendor neutral best practices for hosting web applications using cloud computing. The architectural elements described

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Collaboration Technology Support Center Microsoft Collaboration Brief

Collaboration Technology Support Center Microsoft Collaboration Brief Collaboration Technology Support Center Microsoft Collaboration Brief September 2005 HOW TO INTEGRATE MICROSOFT EXCHANGE SERVER INTO SAP ENTERPRISE PORTAL Authors Robert Draken, Solution Architect, Comma

More information

CA Nimsoft Monitor. Probe Guide for Active Directory Response. ad_response v1.6 series

CA Nimsoft Monitor. Probe Guide for Active Directory Response. ad_response v1.6 series CA Nimsoft Monitor Probe Guide for Active Directory Response ad_response v1.6 series Legal Notices This online help system (the "System") is for your informational purposes only and is subject to change

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

More information

Multi-agent System for Web Advertising

Multi-agent System for Web Advertising Multi-agent System for Web Advertising Przemysław Kazienko 1 1 Wrocław University of Technology, Institute of Applied Informatics, Wybrzee S. Wyspiaskiego 27, 50-370 Wrocław, Poland kazienko@pwr.wroc.pl

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Authoring Guide for Perception Version 3

Authoring Guide for Perception Version 3 Authoring Guide for Version 3.1, October 2001 Information in this document is subject to change without notice. Companies, names, and data used in examples herein are fictitious unless otherwise noted.

More information

Natural Language Querying for Content Based Image Retrieval System

Natural Language Querying for Content Based Image Retrieval System Natural Language Querying for Content Based Image Retrieval System Sreena P. H. 1, David Solomon George 2 M.Tech Student, Department of ECE, Rajiv Gandhi Institute of Technology, Kottayam, India 1, Asst.

More information

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

More information

Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis

Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Derek Foo 1, Jin Guo 2 and Ying Zou 1 Department of Electrical and Computer Engineering 1 School of Computing 2 Queen

More information

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript. Client-Side Dynamic Web Page Generation CGI, PHP, JSP, and ASP scripts solve the problem of handling forms and interactions with databases on the server. They can all accept incoming information from forms,

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Pivot Charting in SharePoint with Nevron Chart for SharePoint

Pivot Charting in SharePoint with Nevron Chart for SharePoint Pivot Charting in SharePoint Page 1 of 10 Pivot Charting in SharePoint with Nevron Chart for SharePoint The need for Pivot Charting in SharePoint... 1 Pivot Data Analysis... 2 Functional Division of Pivot

More information

A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS

A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS Caldas, Carlos H. 1 and Soibelman, L. 2 ABSTRACT Information is an important element of project delivery processes.

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

More information

Features. Emerson Solutions for Abnormal Situations

Features. Emerson Solutions for Abnormal Situations Features Comprehensive solutions for prevention, awareness, response, and analysis of abnormal situations Early detection of potential process and equipment problems Predictive intelligence network to

More information

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS. PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

More information

REVIEW PAPER ON PERFORMANCE OF RESTFUL WEB SERVICES

REVIEW PAPER ON PERFORMANCE OF RESTFUL WEB SERVICES REVIEW PAPER ON PERFORMANCE OF RESTFUL WEB SERVICES Miss.Monali K.Narse 1,Chaitali S.Suratkar 2, Isha M.Shirbhate 3 1 B.E, I.T, JDIET, Yavatmal, Maharashtra, India, monalinarse9990@gmail.com 2 Assistant

More information

Cybersecurity Analytics for a Smarter Planet

Cybersecurity Analytics for a Smarter Planet IBM Institute for Advanced Security December 2010 White Paper Cybersecurity Analytics for a Smarter Planet Enabling complex analytics with ultra-low latencies on cybersecurity data in motion 2 Cybersecurity

More information

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING Ahmet Selman BOZKIR Hacettepe University Computer Engineering Department, Ankara, Turkey selman@cs.hacettepe.edu.tr Ebru Akcapinar

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Migrating to vcloud Automation Center 6.1

Migrating to vcloud Automation Center 6.1 Migrating to vcloud Automation Center 6.1 vcloud Automation Center 6.1 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a

More information

Novell ZENworks 10 Configuration Management SP3

Novell ZENworks 10 Configuration Management SP3 AUTHORIZED DOCUMENTATION Software Distribution Reference Novell ZENworks 10 Configuration Management SP3 10.3 November 17, 2011 www.novell.com Legal Notices Novell, Inc., makes no representations or warranties

More information

HP IMC User Behavior Auditor

HP IMC User Behavior Auditor HP IMC User Behavior Auditor Administrator Guide Abstract This guide describes the User Behavior Auditor (UBA), an add-on service module of the HP Intelligent Management Center. UBA is designed for IMC

More information

Bandwidth Aggregation, Teaming and Bonding

Bandwidth Aggregation, Teaming and Bonding Bandwidth Aggregation, Teaming and Bonding The increased use of Internet sharing combined with graphically rich web sites and multimedia applications have created a virtually insatiable demand for Internet

More information

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches Concepts of Database Management Seventh Edition Chapter 9 Database Management Approaches Objectives Describe distributed database management systems (DDBMSs) Discuss client/server systems Examine the ways

More information

8 Email Strategies for 2008

8 Email Strategies for 2008 TM 8 Strategies for 2008 www.subscribermail.com This report is provided to you courtesy of SubscriberMail, an award-winning provider of email marketing services and technology that enable organizations

More information

A CLIENT-ORIENTATED DYNAMIC WEB SERVER. Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2. Abstract

A CLIENT-ORIENTATED DYNAMIC WEB SERVER. Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2. Abstract A CLIENT-ORIENTATED DYNAMIC WEB SERVER Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2 Abstract The cost of computer systems has decreased continuously in recent years, leading

More information

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

Federation and a CMDB

Federation and a CMDB BEST PRACTICES WHITE PAPER Client Solutions BSM e: bsm@clients.ie t: 01 620 4000 w: www.clients.ie/bsm Federation and a CMDB Table of Contents EXECUTIVE SUMMARY...1 WHAT IS FEDERATION?...2 Federation and

More information

Public Online Data - The Importance of Colorful Query Trainers

Public Online Data - The Importance of Colorful Query Trainers BROWSING LARGE ONLINE DATA WITH QUERY PREVIEWS Egemen Tanin * egemen@cs.umd.edu Catherine Plaisant plaisant@cs.umd.edu Ben Shneiderman * ben@cs.umd.edu Human-Computer Interaction Laboratory and Department

More information

Overall. Accessibility. Content. Marketing. Technology. Report for help.sitebeam.net. Page 1. 133 pages tested on 29th October 2012

Overall. Accessibility. Content. Marketing. Technology. Report for help.sitebeam.net. Page 1. 133 pages tested on 29th October 2012 Overall The overall score for this website. 133 pages tested on 29th October 2012 Accessibility How accessible the website is to mobile and disabled users. Content The quality of the content of this website.

More information

BUSINESS RULES AND GAP ANALYSIS

BUSINESS RULES AND GAP ANALYSIS Leading the Evolution WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Discovery and management of business rules avoids business disruptions WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Business Situation More

More information

ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E17087-01

ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E17087-01 ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E17087-01 FEBRUARY 2010 COPYRIGHT Copyright 1998, 2009, Oracle and/or its affiliates. All rights reserved. Part

More information

www.dotnetsparkles.wordpress.com

www.dotnetsparkles.wordpress.com Database Design Considerations Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions.

More information

THE BCS PROFESSIONAL EXAMINATION Professional Graduate Diploma. April 2001 EXAMINERS REPORT. User Interface Design

THE BCS PROFESSIONAL EXAMINATION Professional Graduate Diploma. April 2001 EXAMINERS REPORT. User Interface Design THE BCS PROFESSIONAL EXAMINATION Professional Graduate Diploma April 2001 EXAMINERS REPORT User Interface Design Candidates have continued to perform well on this paper, demonstrating knowledge of a range

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

Rotorcraft Health Management System (RHMS)

Rotorcraft Health Management System (RHMS) AIAC-11 Eleventh Australian International Aerospace Congress Rotorcraft Health Management System (RHMS) Robab Safa-Bakhsh 1, Dmitry Cherkassky 2 1 The Boeing Company, Phantom Works Philadelphia Center

More information

OpenIMS 4.2. Document Management Server. User manual

OpenIMS 4.2. Document Management Server. User manual OpenIMS 4.2 Document Management Server User manual OpenSesame ICT BV Index 1 INTRODUCTION...4 1.1 Client specifications...4 2 INTRODUCTION OPENIMS DMS...5 2.1 Login...5 2.2 Language choice...5 3 OPENIMS

More information

Cloud Storage-based Intelligent Document Archiving for the Management of Big Data

Cloud Storage-based Intelligent Document Archiving for the Management of Big Data Cloud Storage-based Intelligent Document Archiving for the Management of Big Data Keedong Yoo Dept. of Management Information Systems Dankook University Cheonan, Republic of Korea Abstract : The cloud

More information

1 How to Monitor Performance

1 How to Monitor Performance 1 How to Monitor Performance Contents 1.1. Introduction... 1 1.1.1. Purpose of this How To... 1 1.1.2. Target Audience... 1 1.2. Performance - some theory... 1 1.3. Performance - basic rules... 3 1.4.

More information

The TEK System: Browsing the Web in Low- Connectivity Communities

The TEK System: Browsing the Web in Low- Connectivity Communities The System: Browsing the Web in Low- Connectivity Communities Bill Thies, Libby Levison, Saman Amarasinghe MIT Laboratory for Computer Science http://cag.lcs.mit.edu/tek Web Browsing: Current Method Web

More information

Advanced Meta-search of News in the Web

Advanced Meta-search of News in the Web Advanced Meta-search of News in the Web Rubén Tous, Jaime Delgado Universitat Pompeu Fabra (UPF), Departament de Tecnologia, Pg. Circumval lació, 8. E-08003 Barcelona, Spain {ruben.tous, Jaime.delgado}@tecn.upf.es

More information

Web Browsing Quality of Experience Score

Web Browsing Quality of Experience Score Web Browsing Quality of Experience Score A Sandvine Technology Showcase Contents Executive Summary... 1 Introduction to Web QoE... 2 Sandvine s Web Browsing QoE Metric... 3 Maintaining a Web Page Library...

More information