Performance Analysis of a Query by Example Image Search Method in Peer to Peer Overlays
|
|
|
- Maria Blankenship
- 10 years ago
- Views:
Transcription
1 AGH University of Science and Technology Faculty of Electrical Engineering, Automatics, Computer Science and Electronics Ph.D. Thesis Michał Grega Performance Analysis of a Query by Example Image Search Method in Peer to Peer Overlays Supervisor: Prof. dr hab. inż. Zdzisław Papir
2 AGH University of Science and Technology Faculty of Electrical Engineering, Automatics, Computer Science and Electronics Department of Telecommunications Al. Mickiewicza 30, Kraków, Poland tel fax Copyright c Michał Grega, 2011 All rights reserved Printed in Poland
3 Acknowledgements It is a pleasure to thank the many people who made this thesis possible. First of all I would like to show my gratitude to my supervisor, Prof. Zdzisław Papir. I am grateful for his guidance, his valuable remarks and continuous support. Without the kind and warm atmosphere of knowledge exchange and support he created in our research team it would be much harder, if not impossible for me to finish this work. I am grateful to Dr. -Ing. Nicolas Liebau and his team from the Technische Universität Darmstadt, who advised and supported me during and after the CONTENT project. I would also like to wholeheartedly thank my colleagues, Mikołaj Leszczuk and Lucjan Janowski, who devoted their knowledge and time to discuss my work. Special credit goes also to my friends from our room 309, Kasia Kosek-Szott, Szymon Szott and Piotrek Romaniak. We went together through the PhD course supporting and helping each other not only in our professional career but also on the private side of our lives. My warm thoughts go to my parents and their enormous effort, which brought me to this point. Last, but not least, I would like to thank my wife, Kate, for her patience and loving words of support, which allowed me to advance in times of doubt. Thank you!
4
5 Abstract Nowadays the traditional division between content producers and consumers is becoming blurry. A new type of user emerges, the prosumer, who is at the same time the producer and the consumer of multimedia content. This is the direct result of easy access to content creation tools, such as digital cameras and camcoders, the popularity of content manipulation tools, such as open source image editing tools and open access to multimedia hosting services, such as Google Picasa, YouTube, Flickr and many others. This mass production and availability of content creates a new challenge. It is not enough to create content and make it available in the Internet. It is necessery to allow the content to be easily searched for and accessed by other users. We are used to searching for items in the network with sets of keywords. Users, however, have no incentive to tag each of their movies and photographs with keywords. This is where advanced multimedia search methods, such as Query by Example can be successfully utilized. Another emerging problem is the economical effort required to set up a new multimedia service. The volume of the multimedia data, requirements on storage space, computation power and bandwidth allow only the largest market vendors to easily introduce new multimedia services. An answer to this problem is the departure from the traditional client server architecture to a Peer-to-Peer network, in which network upkeep costs are shared among the network users. This dissertation presents the work in the cross-domain area of Peer-to-Peer networking and advanced multimedia search methods. The author identifies and solves the problems encountered during the application of the Query by Example search technique in both structured and unstructured Peer-to-Peer overlays. The author proves that implementation of Query by Example service in Peerto-Peer overlays is possible while maintaining the quality offered by centralized solution and the benefits of Peer-to-Peer overlays at the same time.
6
7 Contents Acknowledgements Abstract iii v 1 Introduction Motivation and Goal of the Research The Concept of the P2P QbE System Major Research Problems Research Approach Thesis Cooperation and Publications Structure of the Dissertation State of the Art P2P Architectures Client-Server Architecture Centralised P2P Unstructured P2P Hybrid P2P Structured P2P Summary of P2P Architectures The Metadata Management The Metadata Systems Dublin Core The Metadata Systems MPEG Search Methods and Architectures of Metadata Classification Based on Input Method Classification of Search Methods Based on Complexity Advanced Search Methods and Users Summary on Classification of Search Methods The Architectures of Metadata Search Benchmarking
8 viii CONTENTS The Benchmarked Parameters Search Accuracy Benchmarking Simulation Tools Related Work Conclusions Problem Approach Image Database Measurement of QbE Accuracy in Local Database Measurement Methodology QbE Methods Description of the Experiment Experiment Execution and Results Experiment Conclusions Application of QbE in an Unstructured P2P Overlay Implementation of the QbE Service File Distribution and Popularity Simulation Setup Analysys of the Results Conclusions Application of QbE in a Structured P2P Overlay Routing in CAN Overlay Implementation of the QbE Service in the CAN Overlay Simulation Setup Results Conclusions Comparision of QbE in Structured and Unstructured P2P Overlays Research Conclusions Summary 63
9 List of Abbreviations ANSI CAN DCEMS DCMI DHT DS ITU MP MPEG NISO OCR P2P PSTN QbE QoE SATIN TTL VoIP XM American National Standards Institute Content Addressable Network Dublin Core Metadata Element Set Dublin Core Metadata Initiative Distributed Hash Table Description Schemes International Telecommunications Union Mega Pixels Moving Picture Experts Group National Information Standards Organisation Optical Character Recognition Peer-to-Peer Public Switched Telephone Networks Query by Example Quality of Experience European Doctoral School on Advanced Topics In Networking Time To Live Voice over IP MPEG-7 Experimentation Model
10
11 Chapter 1 Introduction This introductory Chapter presents the goal of the research, the thesis of the dissertation and the methodological approach used to solve the problem stated in the thesis. A discussion of the relevant author s publications and a presentation of the overall dissertation structure is provided as well. 1.1 Motivation and Goal of the Research The goal of the presented research is to analyse the possibility of developing a content-based Query by Example (QbE) search system for a Peer-to-Peer (P2P) overlay. System performance will be assessed in terms of search accuracy and badwitdth utilization. A Peer-to-Peer system is a self organising system consisting of end-systems (called peers ) which form an overlay network over the existing network architecture. The structure of the P2P network can significantly differ from the structure of the underlying physical network (Figure 1.1). Peers offer and consume services and resources. They are also characterised by a significant amount of autonomy. Services are exchanged between any participating peers. Such networks are gaining more and more popularity and attention both from users and researchers. On one hand this growing interest can be explained by numerous P2P based applications, ranging from simple file sharing to more sophisticated services such as Voice over IP (VoIP) and online gaming. On the other hand, P2P networking is a challenging topic for researchers because of its distributed architecture, networked cooperation of the peers and lack of central authority (in most of P2P network architectures). This research has been focused on one of the most popular applications of P2P which is file sharing. File sharing in P2P networks became significantly popular since A study performed in 2004 by the CacheLogic company gave a
12 4 Introduction Figure 1.1: The concept of the overlay P2P network over an existing physical network conclusion that Traffic analysis conducted as a part of an European Tier 1 Service Provider field trail has shown that P2P traffic volumes are at least double that of http during the peak evening periods and as much as tenfold at other times [59]. Figure 1.2 shows that in 2006 P2P services in Europe were responsible for 70% of the overall global traffic. A more recent study [61] performed by the Canadian company Sandvine shows that P2P traffic is responsible for 35% of the downstream broadband traffic and over 75% of the upstream traffic in North America. A study performed by Cisco company in 2010 [2] shows, that the share of the network traffic caused by P2P applications is being surpassed by traffic caused by video streaming (YouTube like services), but still is and will be the second largest source of traffic in the networks. While the numbers vary between years and studies due to different test regions and methodologies the bottom line remains the same P2P is responsible for most of the current global network traffic. Rapid global growth in multimedia data has created new, previously unknown business opportunities. Data search itself has grown into a business sector which has allowed companies such as Google and Yahoo to become brands recognised worldwide. Multimedia is now rapidly moving into every-day life. Users are not only able to access media via radio, television and the Internet but now also create their own media content. Digital cameras, handheld camcoders, media enabled mobile
13 1.1 Motivation and Goal of the Research 5 Figure 1.2: The percentage of Internet traffic generated by the P2P applications [59] phones and the wide spread availability of content creation and editing software which were previously available only to professionals, all contribute to this data volume growth. The traditional division of users has to be extended by a new category the prosumers who are, at the at the same time the producers and consumers of multimedia. Data is also being shared within user communities via Web 2.0 services such as YouTube, Google Video and Flickr and also used in media industries. Usergenerated content is of growing importance to news agencies, as it is often the fastest way to come up with breaking news. Decentralised architectures are becoming more significant in sharing user created content. This growth of the amount of the multimedia data causes a new problem of effective search to emerge. Usually, published media are rarely accompanied by a comprehensive textual annotation. Most users have no incentive for adding even a few simple few-keyword metadata for their multimedia object. The usual keyword based search becomes ineffective and not possible in many unannotated repositories. This is the place where Query by Example (QbE) search may become useful as it requires no textual description of the multimedia object to work.
14 6 Introduction Another research area of growing importance is the Quality of Experience (QoE) which is a measure of a subjective quality experienced by the user. In the early days of the Internet the satisfaction of users was of negligible importance, although the complex problem of delivering high-quality service to the end-user was well known from the experience with Public Switched Telephone Networks (PSTN). Nowadays, when the user can choose between different service providers the service with the highest QoE level is the most successful one. In the presented research the proposed QbE solution will be assessed by QoE measurements. This research has been devoted to one type of multimedia content which are still images. Nevertheless, the research approach and the conclusions drawn from the presented results form a firm foundation for research on advanced search methods for other types of multimedia content such as audio, video and 3D objects. 1.2 The Concept of the P2P QbE System The basic concept of the P2P QBE system is to support content-based QbE retrieval of images stored in a distributed manner in a P2P file sharing network. A use case of such a system is presented in Figure 1.3. The mechanics of the generic QbE system are as follows: the user provides an example of a searched item to the system in order to retrieve similar objects from the repository. The system calculates, on the client side, the features which describe the example. The features extracted from the example are called low-level metadata. The concept and definition of metadata is provided in Chapter 2.2. Low-level metadata is a numeric vector which represents signal properties of the image. It can be, for example, a dominant colour of the image or an edge histogram. These low-level features can be easily sent over the network, as usually their size in bytes is much smaller than the size of the original media. The features of different media objects can be numerically compared in order to calculate the similarity between the media items. Figure 1.4 depicts the example of a QbE query. In this case a dominant colour descriptor, from the MPEG-7 descriptor set (Chapter 2.2.2) was used to retrieve similar images from a local database. As can be observed, all returned images have a similar dominant colour tone as the provided example, but the content is different. In the distributed scenario (as depicted in Figure 1.3) the low-level metadata is routed trough the P2P overlay to queried nodes. This is the significant difference between the solution proposed in the dissertation and searching within a local database of images. The queried side is a set of the P2P nodes selected by the routing algorithm of the P2P overlay. The
15 1.3 Major Research Problems 7 The user provides an example Calculation of the MPEG-7 descriptor values of images possesed Propagate the features of the example in the overlay Calculation of MPEG-7 descriptor values of the example Queried Node Search for similar images in possesion Calculate the distance of the example from the images in possession P2P Overlay Querying Node Propagate the distances and thumbnails to the querying node Gather the distances/ thumbnails and create a ranking of the results Based on the ranking and thumbnails the user chooses a file to download Figure 1.3: The use case of the QbE search system queried side stores the pre-calculated low-level metadata of the possessed media. Upon receiving the low-level metadata of the example the queried side calculates the distances of the metadata of the example from the stored metadata (under a predefined metric). The queried side returns the list of distances and thumbnails for the most similar objects to the querying side. 1.3 Major Research Problems The presented research integrates two recognised techniques the QbE method of search and the P2P content delivery method. This is the primary source of new research challenges. The QbE search process, when implemented in a P2P overlay, will require more bandwidth than the traditional keyword query. The extracted features of media items are stored and transmitted in the form of a vector of numbers. While the
16 8 Introduction Figure 1.4: An example of the QbE search principle volume of this vector is small while compared to the size of the media file it is still significantly larger than a few-byte keyword. This may cause the network to become congested with queries, especially in case of unstructured overlays. Another source of network congestion are the thumbnails of media files. In case of images the thumbnails are the images in a very low resolution. They allow the user to decide whether the resulting image fulfills their search criteria. It is critical to choose the proper overlay architecture for the search service. Different architectures have different properties and not each architecture may be able to provide the QbE service. The main difference between the QbE retrieval system for a local media database and the QbE search system in a P2P network is that in case of the local database search all media in the database are available during the search process. In case of the P2P search the available set of media is only a subset of all media existing in the network. Another effect is caused by the dynamic nature of the network the participants join and leave the P2P network during its operation. This property of a P2P network is called churn [67]. As churn adds another level of complexity to the presented research problems and it is not the most significant factor influencing the performance of the search service it was decided to analyse the performance of the QbE application in P2P overlay in the absence of churn. Content-based search is resource-consuming as it requires the analysis of the image itself. The extraction of the descriptor values can be done only once for each
17 1.3 Major Research Problems 9 image and stored for later use, but this can be a problem for large databases. The comparison of the descriptors requires large computing power as the descriptors are typically one dimensional vectors of as much as 200 values. A single comparison can be done almost instantly, but the problem is not scalable in case of large local repositories. Here, the great advantage of the distributed computing power of the P2P overlays can be utilised. A single node is unlikely to be in possession of more than several thousand images. Search in such a small repository can be performed very quickly and all the nodes can do it in parallel. Concluding, the following theoretical and practical problems arise and will be solved in order to introduce the QbE search technique into the P2P architecture. 1. It should be determined which metadata frameworks and QbE techniques are available and which are suitable for use in P2P overlays. 2. It should be recognised which P2P overlays are the best candidates for the introduction of such a service. 3. It should be decided how to compare the performance of the QbE search method in both centralised and P2P environments. Also, a sufficient number of practical experiments has to be conducted in order to analyse the performance of the QbE search in centralised and P2P environments. A comparison of the performance in different environments has to be conducted. The Search time also has to be taken into account. It is very difficult to assess the search time in the simulation environment. Even if such an assessment is made it will lack in precision. This is due to the nature of the simulation environment which simplifies the protocol stack and the behaviour of peers. The protocol stack implemented within the peers in the simulator is simplified when compared to a real implementation in order to allow for resource efficient simulations. The peers are implemented in a uniform way in the simulator, which means that each peer has similar resources in its disposal. This is very different from the real networking environment in which peers differ substantially between each other in terms of available resources and the way these resources are utilised. Also the network links, unlike in the simulator, differ between each other in terms of available bandwidth and latency. In the authors belief the only proper method of accurate search time assessment in this case is implementation and measurement in a real network environment. Such a task is out of the scope of the presented research and is taken into account as one of the possible future research directions.
18 10 Introduction 1.4 Research Approach The proposed methodological approach is depicted in Figure 1.5. In order to reach the research goal of designing the QbE search system for the P2P overlay seven research tasks need to be accomplished. 2. Development of measurement methodology for QbE performance 1. State of the Art analysis - P2P overlays - QbE techniques 3. QbE performance analysis in local database. Selection of QbE method for further research 4. Application of QbE in structured P2P overlay 5. Application of QbE in unstructured P2P 6. Performance analysis 7. Comparison and conclusions Figure 1.5: The research approach 1. The first research task is a detailed analysis of the existing state of the art of both P2P network architectures and QbE search. In the topic of the P2P networks the existing architectures (called overlays) will be analysed in order to specify the requirements for the QbE search system. The analysis of multimedia search methods will allow to get acquainted with the recent achievements in this research area. Special attention will be put on the analysis of the methods of Query by Example of multimedia in local databases. It is foreseen to adapt the methods used in such databases for the need of the designed system. In order to create a system based on extraction of metadata it is required to analyse the standards of metadata extraction, comparison and storage. Also special attention will be put on the analysis of existing, similar systems. The state of the art in both P2P systems and QbE systems are presented in Chapters 2.1, 2.2 and The second research task is to develop a method of measurement of the performance of a distributed QbE search service. Both objective and subjective (QoE) metrics will be taken into account. The main objective measures of search performance are the accuracy and search time. These aspects have been chosen during the preliminary research as the most relevant. The existing methods of analysis of the quality of search are dedicated to the multimedia databases. According to the current state of the art there is no popular,
19 1.4 Research Approach 11 versatile and widely used measurement environment for search services in the P2P networks. The methods of the subjective quality assessment for video and audio multimedia services are well-defined in the International Telecommunications Union (ITU) recommendations. For measurement of the QoE of other multimedia services, including search, there are no standards nor recommendations. A set of proposed measurement methods is described in Chapter There are many techniques used for image QbE search. Some of these techniques are defined within the MPEG-7 standard [37] and some, such as the PictureFinder software, are proprietary solutions developed by companies and research institutions. Those methods yield different results and offer different quality of search results. For the purpose of the research of the QbE technique in the P2P environment it is required to identify the method which offers the highest QoE to the service users and is applicable in scope of the presented research. To identify such a method a set of QbE methods will be implemented in a local database of images. The performance of these methods will be assessed and the best QbE method will be selected for implementation in the P2P overlay. The measurements of performance in the centralised database will also serve as a reference for comparison of search performance between centralised and distributed QbE systems. This problem is addressed in Chapter It is planned to implement the QbE method selected in Task 3 in two architectures of P2P overlays structured and unstructured overlays. This will allow for comparison of the performance of the service in these two, significantly different, P2P architectures. The unstructured overlays are of simple construction. No complex query routing algorithms are implemented and the search process is often based on flooding the network with search queries. Such networks are characterised by low search overhead at the cost of the lower performance. The experiment and its results are covered in Chapter Structured overlays are characterised by a more complex construction and implemented routing routines. Thanks to that higher performance is delivered. Most of the structured P2P overlays are based on the Distributed Hash Tables (DHT). An example of a structured P2P overlay is CAN (Content Addressable Network). For the research purpose the implementation of the QbE service will be done in the simulation environment. Simulation of the P2P overlays is the only feasible way to research these network architectures. As scalability is of utmost importance it is impossible to create a P2P overlay in a laboratory conditions. Even such structures as PlanetLab offers at most
20 12 Introduction approximately thousand of nodes, whereas real P2P overlays are composed often of millions of nodes. The simulation environment will be chosen after the analysis of the existing P2P simulation tools. The experiment and results are described in Chapter During the sixth stage of the research the performance of the QbE service implementations in the structured P2P overlay and unstructured P2P overlay will be analysed. It is foreseen to analyse the performance of the QbE service in terms of accuracy when compared to the centralized solution. Bandwidth consumption will also be covered. 7. The last stage of the research is to compare the performance of the QbE search service in the local image database (Task 3) and P2P overlays (Tasks 4 and 5). The comparison of the performance of a local and distributed QbE services will allow to prove the thesis of the dissertation. 1.5 Thesis The thesis of the dissertation is as follows: It is possible to use a Query by Example mechanism for a search of images based on their low-level description in the unstructured and structured P2P overlays at accuracy comparable to the centralized solutions. 1.6 Cooperation and Publications The research presented in this dissertation was partially funded by the CON- TENT (Content Networks and Services for Home Users) Network of Excellence (No ). Fruitful cooperation within the CONTENT NoE has resulted in the authors participation in the European Doctoral School on Advanced Topics In Networking (SATIN). The results of the research where presented and discussed during the SATIN meetings both with fellow PhD students as well as with senior researchers from European universities. The PhD thesis and research was supported by Nicolas Liebau, PhD Eng. from the Technical University Darmstadt, Germany. Furthermore, the PhD research and preparation of the thesis was funded by a national PhD grant (no. N N ). The subjective tests of the MPEG-7 descriptors were performed for the need of the econtentplus project GAMA.
21 1.6 Cooperation and Publications 13 During the research of the topics described in this dissertation the author has prepared several publications, which are listed and commented below, in their chronological order: 1. Implementation and Application of MPEG-7 Descriptors in Peer-to-Peer Networks for Search Quality Improvement - Introduction to Research [17] was presented during the CONTENT PhD workshop in Madrid, Spain. The paper presents the general idea of the system as well as the basic methodological approach to the problem. 2. Content-based Search for Peer-to-Peer Overlays [22] was presented during the PhD workshop of the MEDHOCNET conference in Corfu, Greece. It presented a detailed description of the methodology as well as a detailed approach to the problem of benchmarking and measurement of a distributed search service. 3. Quality of experience evaluation for multimedia services [21] was presented at a plenary session of the KKRRiT 2008 conference in Wrocław, Poland. The paper described the differences between objective and subjective evaluation methods of multimedia services. 4. Benchmarking of Media Search based on Peeer-to-Peer Overlay Networks [25] was presented during the INFOSCALE 2008 conference workshop in Vico Equense, Italy. The publication presents the outcomes of the P2P benchmarking activity which was carried out in the CONTENT NoE. 5. Advanced Multimedia Search in P2P Overlays [18] was presented during the Students Workshop during the IEEE INFOCOM 2009 conference in Rio de Janeiro, Brazil. The paper presents the basic concept of the P2P QbE system along with a brief summary of the results of the user test, which was performed in order to select the optimal QbE method for further research. 6. Ground-Truth-Less Comparision of Selected Content-Based Image Retrieval Measures [20] was presented during the UCMEDIA 2009 conference in Venice, Italy. The paper presents the comparison of different QbE methods assessed in a series of subjective experiments. The results obtained in these experiments are presented in Chapter Wyszukiwanie przez podanie przykładu w sieci nakładkowej protokołu Gnutella [19] was presented during the KKRRiT 2011 conference in Poznań, Poland. The paper was awarded the second best young author paper award. The paper presents the results of implementation of the QbE service in the Gnutella P2P overlay. The results are presented in Chapter 3.3.
22 14 Introduction 1.7 Structure of the Dissertation Chapter 2 presents the theoretical background of the dissertation. In this Chapter the existing P2P overlays are described. An overview of existing metadata management standards is provided. Afterwards existing P2P benchmarking methods are briefly described. The Chapter is concluded with an overview of P2P simulation tools and a description of similar research projects. Chapter 3 reports the results of the practical experiments. First, the performance of different QbE methods is assessed in subjective experiments. Afterwards the implementation and simulation results of the QbE service in an unstructured and a structured P2P are provided. A comparison of the two implementations is described. Chapter 4 wraps up the results and most significant conclusions. Suggestions for further research directions are provided as well.
23 Chapter 2 State of the Art The investigated search service is created by applying content based media retrieval techniques to the P2P file sharing overlay. Content based media retrieval is a popular research topic. Such services are typically created for centralized media databases. Much effort is also undertaken by the researchers in the area of the P2P overlays. The combination of both techniques will, on one hand, open new possibilities for content providers and consumers but, on the other hand, is an up-to-date research challenge. This Chapter describes basic concepts and the current state of the art in both research areas the P2P overlays and the content-based image retrieval. In-depth understanding of these research areas is required in order to identify new research challenges which result from combination of underlying technologies. The description of the architectures of the P2P networks is provided in Section 2.1. The existing systems of metadata storage are discussed in Section 2.2. In Section 2.3 the existing content-based retrieval systems are presented. Section 2.4 describes the benchmarking system used for the measurements of the system. It is followed up by Section 2.5 which presents the existing software for simulation of P2P overlays. Section 2.6 is devoted to the similar research and related work in the area. 2.1 P2P Architectures The problem how to store the data and how to find data exists since the first databases were created. This Chapter describes the problems and solutions associated with data storage and retrieval. The division of P2P systems presented in this Chapter is based on [65]. For the sake of simplicity the Chapter will focus on file sharing service, but all the assumptions and techniques described
24 16 State of the Art can by applied to any kind of services such as instant messaging, VoIP or social networking Client-Server Architecture The first approach to the mentioned problem is a client-server architecture. A file sharing service following the client-server paradigm consists of two kinds of nodes. A powerful server, which job is to store the files, keep an up-to-date record of them, reply to the queries and finally serve the clients with the file download functionality. The clients in this design are computers of very low resources available, when compared to the server. The sole role of the clients is to be an interface between the server and the user. The client-server architecture has numerous drawbacks. It is expensive to setup as it requires a powerful computer. It is also expensive to support as it requires a lot of bandwidth to deal with both queries and file downloads. The server creates a single point of failure for the file sharing service. The system does not scale without substantial hardware and bandwidth expenses. On the other hand, there are advantages of such set-up. The service, being operated from a single server, is easy to control. Due to full control over the service it is also easy to create an economic model for it Centralised P2P The costs and the scalability problems of the client-server architecture caused the first generation of P2P architecture to emerge. Historically the first solution are centralised P2P networks such as Napster [55]. The main goal of the creators of the centralised P2P were to remove the load caused by the file downloads from the central server. In the centralised P2P architecture service consists of a central server and nodes. The role of the central server is to maintain the index of the files available throughout the network. The files are kept in the nodes and the nodes are responsible for supporting the resources required for the download. If a node wants to share a file in the network it has to advertise the file and its metadata to the central server. If a node wishes to find and download a file, it sends a query to the central server. The server responds with the address (or addresses) of the node (nodes) which have the requested asset. The querying node downloads the file from the node that has it bypassing the central server. This architecture has benefits over the client-server architecture. It allows to distribute the most resource consuming part of the service, being the file download. The concept of the service is relatively simple as no sophisticated routing mechanisms are required. This results in a simple implementation. Thanks to the
25 2.1 P2P Architectures 17 central indexing server the network can be controlled and an economical model for the service can be created. On the other hand, the indexing server in the centralised P2P networks still is the single point of failure. Also the load balancing, while much better than in the client-server solution, still is far from being perfect. Nodes holding files being popular among users have to sustain larger load than nodes that with unpopular content Unstructured P2P The next step in the evolution of P2P overlays is the unstructured P2P architecture. In this solution there is no need for any central authority. Both the storage and search are distributed in the network. The network consists only of nodes, which are sometimes referred as to servents [10] (a word created from server and client ), as in one of the implementations of the unstructured P2P architecture - the Gnutella protocol in version 0.4. The node does not perform file upload in any form. The sole responsibility of the node in the unstructured P2P architecture is to maintain connection with other nodes and to respond the queries for searched files. Search process is based on flooding the network with the query and hoping that the query propagated within the network will eventually reach the nodes, which possess the searched file. This architecture was the first one to remove the single point of failure from the design. If a node disconnects from the network only its content becomes inaccessible and the network as a whole does not cease to function. The drawbacks of the unstructured P2P systems is a huge signalling overhead which is required to maintain the connectivity of the nodes. Also the search process is very ineffective as the query routing is based in the flooding principle. Due to the totally distributed nature of the unstructured P2P system the administrator of the network looses the control over it, which apart from creating legal problems with digital rights management (referred also as piracy ), renders creation of economic model for the service very challenging Hybrid P2P Hybrid P2P networks emerged as a natural combination of the design paradigms of the centralised P2P systems and the unstructured P2P architectures. In the hybrid P2P systems the nodes are formed into a two tier topology. The nodes that have an abundance of resources (computing power and bandwidth) are elevated to a status of super peers (which can be also understood as a kind of local servers).
26 18 State of the Art The super peers are responsible for maintaining the indexes of small chunks of the network (typically up to 100 nodes, as Gnutella version 0.6 [47]). When a node connects to a network it has to establish a connection with a super peer responsible for the part of the network the node is in. After the connection has been established the node advertises the metadata of the files it wants to share to the super peer. In this way the super peer holds a complete index of files which are stored in its part of the network. If a node wants to find a file it sends a query to its super peer. The super peer checks whether the file is available within the part of the network it has an index for. If so, it returns the address of the node which has the searched file in possession. If no result can be found in the super peer s index the query is propagated to other super peers by flooding just as in unstructured P2P systems. The benefit of the hybrid P2P architecture over a unstructured P2P is a reduced amount of signalling required to maintain the network. The benefit of removal of a single point of failure is partly covered. A disconnection of a peer does not influence the network, although the disconnection of a super peer requires either selection of a new super peer or the other super peers to take over the functions of the disconnected one. The main drawback of the hybrid P2P architecture is an uneven load offered to the nodes forming the network. The super peers need to withstand a much more traffic and have to commit more computing power than the regular nodes without any substantial benefit Structured P2P The structured P2P architecture is the most current approach to the design of the P2P system. The novelty of the concept is the introduction of a logical link between the data (a file) and the address in the addressing space in the P2P overlay. A single peer in the structured P2P overlay is responsible for a chunk of the addressing space. If a new file is to be placed in the network it has to be mapped to the addressing space of the network. The mapping is done with a hashing algorithm such as for example SHA1 [11]. After the file was mapped to the address, the node responsible for the part of the addressing space the file address belongs to, has to be informed. There are two general strategies of storage [65]. One is the direct storage. In this method the file is uploaded to the node, which is responsible for the address linked to the file. Second is the indirect storage. In this method the responsible node is informed only of the file metadata and location. Search in the structured P2P overlay is easy. If a node knows the hash of the file searched it automatically knows the part of the addressing space it needs to query.
27 2.1 P2P Architectures 19 The benefit of such architecture is a low search overhead. Also the problem of load balancing is solved by this design. If a given content is popular among the users, it is enough to increase the number of nodes responsible for that part of the addressing space to split the load between them. The main drawback of the structured P2P overlay is a complex construction which makes the design and deployment of hybrid P2P networks a challenging task. Nevertheless several structured P2P overlay designs have been proposed such as Chord [66] or CAN [60] Summary of P2P Architectures The P2P systems have been actively developed since at least 1999 as an answer to the challenge of load distribution from the server to the clients. Different architectural solutions offer a different level of tradeoff between the communication overhead and the storage cost per node (Figure 2.1) with client-server and unstructured P2P solutions on the opposite poles. Communication overhead per node Pure P2P Hybrid P2P DHT Centralised P2P Client-Server Storage cost per node Figure 2.1: Storage cost versus communication overhead (based on [65]) In order to analyse the behaviour and performance of the QbE service in P2P overlays two different P2P architectures were chosen for implementation. These are Gnutella v. 0.4, which represents the pure P2P paradigm and Content Addressable Network (CAN) which represents the DHT based P2P architecture.
28 20 State of the Art 2.2 The Metadata Management As the term of metadata was introduced without any formal definitions there are several ways to define, what metadata really is. The most common and informal understanding of metadata is data about data [68]. This definition provides the best description of the nature of the metadata. To give a better understanding of the concept of the metadata, two additional, complimentary definitions can be introduced. A summary report presented by the Committee on Cataloging: Description & Access defines metadata as structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities [68]. Additionally, Bulterman gives a following definition: [Metadata is a set of] optional structured descriptions that are publicly available to explicitly assist in locating objects [8]. Several metadata systems were introduced as standards. Two most popular will be presented here in more details the Dublin Core and the MPEG-7 standard The Metadata Systems Dublin Core The Dublin Core 1 metadata standard is a simple (referred as the metadata pidgin for digital tourists [30]) and straightforward standard for information resource description. The preliminary works on the proposal of the standard started in 1995 and are continued by the Dublin Core Metadata Initiative (DCMI). The standard proposal was accepted by the National Information Standards Organisation (NISO), which is an association accredited by the American National Standards Institute (ANSI) [56]. The Dublin Core has been also proposed as a Request for Comments (RFC) [70]. The goals of the Dublin Core Metadata Standard are as follows [30]: 1. The set of metadata for a media file is supposed to be kept as small and simple as possible. 2. The semantic used by the standard is supposed to be well understandable and defined. This makes the metadata human-readable. 3. The standard is supposed to be international, having numerous language versions. 4. The standard is supposed to be easily extensible. The standardising organisation is working on creating interfaces which will allow to incorporate other metadata sets to the standard. 1 The name Dublin Core refers to Dublin, Ohio, USA where the first initiative of a metadata standard emerged, not to Dublin, Ireland, which is a common misunderstanding.
29 2.2 The Metadata Management 21 There are two levels of the Dublin Core Metadata Element Set (DCEMS): the Simple DCEMS and the Qualified DCEMS. The Simple DCMES contains of a set of high-level 15 elements, such as Title, Subject or Creator. The Qualified DCEMS is an extension of the Simple DCEMS and introduces three new elements Audience, Provenance and RightsHolder. Additionally the Qualified DCEMS introduces a set of refinements or qualifiers which narrow the semantics of an element. There are four rules which define the relationship between metadata and its data which are applied in the Dublin Core standard [30]: 1. The standard defines a set of elements and element refinements, along with a formal registry. These should be used by the content managers as a best practise 2. The one-to-one principle. A Dublin Core compliant metadata describes a single instance of a media, and not the whole class of media. In other words each media manifestation has its own Dublin Core metadata. 3. Dumb-down principle. A media manager (either a piece of software or a human) should be able to ignore a qualifier and use its value only. It may lead to loss of accuracy, but such a case should be handled. 4. The metadata should be constructed in such a way that it can be parsed and used both by software and human content managers. The Dublin Core standard reveals many advantages such as overall user-friendliness, which allows the access and understanding of the metadata by the human content managers. It has, however, a disadvantage, which eliminates it as a candidate for the described search system. It does not define the methods of extraction and comparison of the lowlevel metadata. The standard focuses only on the high-level metadata. It is worth mentioning, that some of the existing P2P networks allow narrowing of the simple text-based query by input of a metadata. Such, existing implementations (e.g. the KAD implementation in the emule client) allow, for example, to define the author or the title of the searched media. Moreover it is also possible to narrow down the search by inputting some content based metadata to the search query such as bitrate. These content-based metadata fields are, unfortunately far too primitive to allow significant improvement in the search quality The Metadata Systems MPEG-7 The MPEG family of standards consists of five well established groups of standards [53]. MPEG-1 [34] and MPEG-4 [36] are standards for multimedia compression,
30 22 State of the Art storage, production and distribution. MPEG-2 [35] is a standard for audio and video transport for a broadcast-quality TV. MPEG-21 [38] is defined as an open framework for the multimedia content and is still under development. The MPEG-7 standard was proposed by the Moving Pictures Experts Group (MPEG) as an ISO/IEC standard in 2001 [37]. The formal name of the standard is Multimedia Content Description Interface. The main goal of the MPEG-7 standard can be described as to standardise a core set of quantitative measures of audio-visual features called Descriptors (D), and structures of descriptors and their relationships, called Description Schemes (DS) in MPEG-7 parlance [53]. The most important requirements, set for the MPEG-7, are as follows [50]: Applications the standard can be applied in many environments and to numerous tasks ranging from education and tourist information to biomedical applications and storage. One of the most important areas of application of the MPEG-7 standard are the search and retrieval services [53]. Media types the MPEG-7 standard addresses many media types, including images, video, audio and three dimensional (3D) items. This diversity makes the standard all-purpose and allow development of complex system, which integrate many types of media content. Media independence the MPEG-7 metadata can be separated from the media and stored in a different place, also in multiple copies. This feature of the standard makes it useful e.g. in the P2P environments. Object-based approach the MPEG-7 standard structure is object-oriented, which is a desired feature in case of development of advanced metadata systems. In such system object-based approach allows development of a hierarchy of descriptions in which one descriptions inherit parts of the structure from another. Abstraction levels the MPEG-7 standard covers all levels of abstraction of the description of the media. The lowest, machine levels of the description of the media include such signal-processing features of the media such as spectral features in case of audio media or color structure descriptors in case of images. These are utilised for QbE search. On the other hand, MPEG-7 supports the high, semantic level of description of the media. In the scope of this research the low level descriptions of the media will be utilised. Extensibility the structure of the MPEG-7 is open for extension of the basic set of descriptors. This feature of the standard is very valuable, as new types of media are emerging and can not be covered at the moment of the standardisation.
31 2.2 The Metadata Management 23 Several terms, introduced by the MPEG-7 standard that will be used in the scope of this dissertation are presented below and in Figure 2.2 [54]: Data all kind of the multimedia content which can be described with the use of the standardised metadata. Feature a significant extractable characteristic of the data. Descriptor defines the syntax and semantics of the representation of a feature. Descriptor value an instance of a descriptor for a particular feature and a particular piece of data. Description scheme provides information on structure and relation between its parts. The parts of a description scheme can be both descriptors and other description schemes. Description Definition Language, DDL is an interface which allows to extend the existing descriptors and create new ones. Defined in the standard Defined outside standard Description Definition Language Description Scheme Description Scheme Descriptor Description Scheme Descriptor Descriptor Descriptor Figure 2.2: The relations between the parts of the MPEG-7 standard [54] To allow better understanding of the above definitions, a following example can be given (based on [54]): The description scheme Shot Technical Aspects consists of two descriptors, Lens which gives the information on the focal length of the lens used and Sensor size, which gives information on the size of the camera sensor in MP (Mega Pixels). A descriptor value for the Lens descriptor can be, for example, 300 mm and the descriptor value for the Sensor size descriptor can be e.g. 8 MP. In case of the low-level descriptors the standard defines, in most cases, the way of the extraction of the descriptor value. The extracted descriptor has a form of
32 24 State of the Art a vector. The standard does not, except for few cases, define a metric in which the descriptor values can be compared. However, such metrics are known and methods for calculation of the distance between two low-level descriptor values are described in the literature [50]. The MPEG group, apart from providing the standard itself, provides a reference software, called the experimentation Model (XM). This software tool allows extraction and comparison of descriptor values. It can serve as a reference for testing of other MPEG-7 based tools. An interesting inconsistency in the standard can be observed. The MPEG-7 standard does not provide a method of the descriptor value comparison, whereas the XM, being a part of the standard, does so. As the MPEG-7 standard is one of the most advanced achievements towards the metadata management it is used in the presented research. 2.3 Search Methods and Architectures of Metadata This Chapter presents a classification of the search methods used for retrieval of the media. The overview of the classification is depicted in Figure Classification Based on Input Method Most of the existing search system are based on the text search. The user inputs a text string which is then searched in the repository. The search can be exact (where only identical hits are returned) or for similar hits (for example single words from the query). For some application it is useful to utilise a fuzzy textual search method. In this kind of search a distance is calculated from the query to the textual strings in the repository. This distance is calculated in a dedicated metric, e.g. Levensthein Edit Distance [44]. This approach allows for successful search in case when the user misspells the word. It can also be helpful if the words in the database are spelt incorrectly. The textual content in the repository may have different origins. The repository can simply consist of textual documents as in case of Internet search. In case of media repositories the textual search can be performed with use of the file names (most common case in the P2P networks) or the manual annotations assigned by the repository owners. Using this search method we can search for movies with a defined actor by typing the name and hoping, that the movies in the repository have been tagged with the actor s name. Therefore, the effectiveness of this way of media search depends on the accuracy of the textual description of the media
Figure 2.3: The classification of search methods (criteria: user input method: textual, Query by Example; matching method: exact matching, fuzzy matching; source data: low-level features, high-level features, semantic; source data origin: automatic annotation, manual annotation; complexity: single-stage, multi-stage)

and is independent from the media itself. To summarise, textual search for media is the easiest but also the most primitive form of search.

Content-based search allows media to be searched based on their content. A textual description can be extracted from the media automatically, for example by means of Optical Character Recognition (OCR), face recognition or speech recognition. Such content-based, automatically generated metadata can be accessed via the previously described textual search. Now a textual search providing the name of an actor can yield better results if the repository has been processed by a face recognition algorithm.

More advanced content-based search techniques utilise low-level features of the video, such as the dominant colour or a shape histogram. This gives an opportunity to perform search by providing an example. If the repository provides tools for low-level content-based search, then with a Query by Example (QbE) search method it is possible to search for videos or scenes which depict an actor by providing a visual example of that actor.

The most advanced search tools utilise semantic information extracted from the media. If a repository provided tools for extraction of and search within semantic data derived from the content, it would be possible to perform a textual query in the form of a sentence: "I am searching for love scenes with an actor talking to a woman."

From the above example it is easy to notice that the most versatile and useful search system should incorporate all of the above-mentioned search methods. If such a system existed, it would be possible to perform the query "I am looking for scenes from movies with the actor from the provided example, who is wearing a blue suit and talking to a woman."

Classification of Search Methods Based on Complexity

Search methods can be classified as single-stage and multi-stage methods. In single-stage search methods the user provides a query and the searching process is concluded when the user is provided with the list of relevant results. This method is simple and does not require much input from the user. In some systems the user, after receiving the results, has the possibility to apply filters or to perform a further search within the results. This filtering and/or search is performed locally.

The other category of search mechanisms are multi-stage search systems. In such systems the first part of the querying process is identical to the single-stage search. However, upon receiving the results the user may refine the query by utilising the received results. The refined query is executed again and is expected to yield better results than the initial one. Multi-stage search systems are supposed to allow for a more effective search. They do, however, require more effort and expertise from the user.

Advanced Search Methods and Users

The natural question which arises here concerns the usefulness of advanced search tools. We have become used to textual search, both for documents and for media, and may simply not feel the need for more advanced tools. Such tools, however, may be useful both for professionals and for prosumers. Two usage scenarios are provided below. These scenarios have been developed by the author in cooperation with content providers and specialists [24].

Alice, a 28-year-old history teacher, is preparing an ancient history lesson for her class. She wishes to show pictures from Pompeii with the Vesuvius volcano in the background. She has a problem, however: the only photos she finds on the Internet are thousands of family photos shot at Vesuvius. She knows that there are many good pictures in the network, but she cannot find an appropriate one straight away, only after browsing hundreds of pictures.

Juan is a 46-year-old violin player and composer. Today, on his way home on the train, he heard a fascinating Celtic tune as someone's mobile ring tone. He remembered the tune perfectly thanks to his musical talent. He would really like to know more about this tune he now hums all the time...

Alice runs a Peer-to-Peer client on her notebook and uploads her photo as a query example. In the returned results she quickly finds an appropriate photo for her lesson.

Juan runs the same client on his computer and hums the tune into the microphone to use it as a query over the content-aware, worldwide distributed multimedia P2P store. In the results he finds the full song and concert description, and the music file together with the DRM information needed to retrieve it. He also gets a peer-cast online radio station that is just playing the whole song or has played it within the last week. He realises that the peer-cast station plays the kind of Celtic music he likes all the time and finds out that it is run by a Celtic band that wishes to spread its music live to a wide audience over the network. After a few days of listening, he joins the Celtic band community that is running the peer-cast station and shares his own works.

Expert publications stress that users are becoming media producers and consumers at the same time ("prosumers"). Society is moving from a common, unified mainstream market to individual and fragmented tastes [23]. This trend can be depicted as a long-tail effect, shown in Figure 2.4. On the left side a vertical bar represents the standard model of multimedia production, where a few broadcast stations produce content for masses of consumers. On the right side of the figure the horizontal bar depicts the huge community of bloggers, photo-bloggers, video-bloggers and podcasters who offer their multimedia productions to a narrow group of focused consumers. These producers are at the same time consumers of the multimedia productions provided to them. This is a Web 2.0 community, which can be referred to as prosumers.

Figure 2.4: Popularity of content versus the number of the content consumers [23] (axes: number of content creators, number of content consumers; from professional one-to-many production to everybody-to-a-few blogging and photo sharing)

Summary on Classification of Search Methods

Search and retrieval mechanisms can be classified in several ways (Figure 2.3). The most important conclusion is that the diversity of search mechanisms is similar to the diversity of content types and, like the content, different retrieval methods are targeted at different end users. Moreover, thanks to easy access to the Internet and the growing popularity of media, the tastes and interests of consumers are becoming more diverse. Search and retrieval technology should keep up with this trend.

The Architectures of Metadata

The traditional approach to storing metadata is to keep it in a single repository along with the data itself. This approach guarantees quick access both to the metadata and to the content, as both are kept in a single repository. This feature makes the centralised approach useful for local use. When moving into the area of networking, however, this approach becomes ineffective. A user searching for a piece of content has at his disposal only one, local copy of the metadata accompanying the content itself. This is depicted in Figure 2.5, where {A, B, C, ...} represent pieces of content and {M_A, M_B, M_C, ...} the corresponding metadata.

The approach adopted in the presented research is to separate the metadata from the content while preserving the link between them. Moreover, metadata can be copied and distributed, which is not always possible with the content itself due to its high volume or copyright restraints. This leads to easier search and retrieval of the metadata and its linked content. The user has at his disposal a distributed and fuzzy repository of metadata stored in many P2P overlay nodes. The concept is depicted in Figure 2.6.

2.4 Search Benchmarking

According to the English dictionary [69], a benchmark is: "a standardized problem or test that serves as a basis for evaluation or comparison (as of computer system performance)".
Figure 2.5: The traditional approach to content and metadata storing

Figure 2.6: The distributed storage of media and metadata

Search tools, like any other systems, need to be measured in order to assess their broadly understood quality. Apart from assessing quality, the results of the benchmarking process allow the proposed or developed system to be compared against similar ones. The requirement for such a comparison is that all the compared systems are measured with the same measurement tools, called benchmarking frameworks.

In computer science benchmarking is performed according to a standard. Such a standard may be official, approved by one of the standardisation bodies. On the other hand, there is a variety of unofficial benchmarking frameworks, accepted by the community of users and developers.

The problem of measuring search quality emerged with the first search engines. The methods, mainly for assessment of search accuracy, became more sophisticated with the introduction of computerised media. The methodology for accuracy benchmarking in media databases is well established. Unfortunately, there are no standards or widely recognised methods for benchmarking media search in P2P overlays. The presented research proposes a benchmarking framework which can be used to measure multimedia search accuracy and time in P2P overlays.

The Benchmarked Parameters

There are several aspects of a search system that can be measured. The most important one is search accuracy, which can be defined as the ability of the system to find the desired results for a query. The second measurable attribute of a search system is search time. For database-based systems search time was not an issue, as the search was nearly instant thanks to the high performance and locality of database systems. According to works describing search benchmarking systems, speed is not of central concern [43]. Distributed P2P systems, however, are characterised by a considerable and varying delay in communications. Therefore the search time will also be discussed.

Although accuracy and time are not the only measurable characteristics of a search system, the author finds them the most important. The security aspects of such a system, including trust management, are out of the scope of the presented research. A trust system which can be easily adapted to ensure the security of a P2P system was developed by the author as a master's thesis [16] and presented in [26].

The resource consumption of the search system is another measurable feature. Resources are understood here as memory, processing power, disk space and bandwidth. Memory is consumed mainly by the routing operations of the overlay and can be treated as service-independent. Processing power is consumed mainly by the operations of query preparation and processing (described in detail in Chapter 1.2) and contributes to the search delay; therefore it will not be analysed separately. Disk space is required, apart from content storage, to contain the descriptors. Because the space consumed by the descriptors is minimal when compared with the disk space required for the media, it also will not be analysed in this dissertation. The bandwidth consumed by the search service will be analysed during the experiments. In a QbE search application bandwidth is consumed during the querying process due to the large (when compared to textual queries) size of descriptors and due to the transmission of thumbnails of the retrieved media items.

Search Accuracy Benchmarking

Assessment methods for search accuracy are well explored in the case of centralized repositories of media [64]. In the case of distributed repositories these methods require adaptation.

Annotated Image Databases

In order to benchmark the accuracy of a search system it is required to have a ground truth. It may be defined as full knowledge of all the data stored in the system and serves as a reference level for the benchmarking of accuracy. In other words, to assess search accuracy, defined as the ability of the system to find the desired results for a query, it is necessary to be able to confront the results with what is actually available in the repository.

In the case of multimedia benchmarking the ground truth is a collection of annotated media files. Such a collection is usually annotated manually to make the annotations accurate. In the case of images there are several requirements for the reference collection. First of all, although the search system may have access to an almost unlimited number of images, having a very large benchmark database is not critical [43]. According to the same author, the actual number of pictures in the reference database should range from 1,000 to 10,000 images.

The second requirement for the reference collection concerns the content of the images. To make the measurements coherent with the real environment of the application, the content of the images (in the case of image search: photographs) should be natural and complex. Complex may here be defined as semantically rich and containing multiple objects. The format of the images also needs to reflect the typical format used by users, which in the case of image sharing is a low-compressed JPEG.

Available sources of such annotated images can basically be divided into two groups. The first group contains annotated collections which are intended to be used in benchmarking. The number of files stored in such databases varies from hundreds [42] to tens of thousands [28]. The content of the databases also varies, from natural images [4] to strongly artificial multi-angle shots of single objects on a uniform background [14]. Such databases are typically available to the research community free of charge.

Another source of annotated images are Internet photo hosting services, in which users have the possibility to annotate images manually. Flickr and Corbis are examples of such services.
The greatest advantage of such a source of annotated images is the number of photographs hosted. By the end of 2010 Flickr hosted over 5 billion photographs, growing by 1 billion per year. The disadvantages are the uncontrollable quality of annotation and the difficulty of access to the repositories. A strong advantage is the natural character of such a source of images, which means that the photographs were taken by users in natural, everyday circumstances.

Evaluation Metrics

Existing evaluation metrics for search accuracy can be divided into quantitative and qualitative metrics. The former are measured objectively and refer to the broadly understood performance of the system. The latter refer to the subjective quality of the system and are assessed by the users. An example of a qualitative metric is the Mean Opinion Score (MOS) scale, which was initially standardised by the International Telecommunication Union [31]. In this metric the quality of the system is subjectively assessed by the users on a five-grade scale, where 5 denotes the best quality and 1 the worst quality. Another example of a qualitative measure is the R-factor. It may be utilised in a way similar to how it is used for subjective evaluation of speech quality in voice transmission systems [32]. One may also take into consideration that MOS may be derived from the R-factor (and vice versa) [33] if a proper mapping function is known.

There are numerous quantitative metrics for the assessment of search accuracy in the single-stage retrieval process; an overview is given in [64]. These metrics are applicable to binary classification problems, where all of the items can be classified as either relevant or irrelevant to the query. A query is sent to a system containing N items, a cut-off value k is set, and V_n is defined as the relevance (Boolean, 0 for irrelevant, 1 for relevant) of the n-th returned item. It has to be noted that the introduction of relevance (V_n) implies that a ground truth is known. Ground truth is a priori knowledge which allows the items in the database to be classified as relevant or irrelevant to a given query. Such a ground truth may not be available for all search scenarios (as will be shown later), in which case the metrics defined below become useless. However, for the sake of completeness of the discussion on benchmarking systems, the following metrics can be defined:

Detections, which is the number of relevant items detected, defined as (2.1).

    A_k = \sum_{n=0}^{k-1} V_n    (2.1)

False Alarms, which is the number of irrelevant items detected, defined as (2.2).

    B_k = \sum_{n=0}^{k-1} (1 - V_n)    (2.2)

Misses, which is the number of undetected relevant items, defined as (2.3).

    C_k = \sum_{n=0}^{N-1} V_n - A_k    (2.3)

Correct Dismissals, which is the number of irrelevant items not detected, defined as (2.4).

    D_k = \sum_{n=0}^{N-1} (1 - V_n) - B_k    (2.4)

Primary Recall, which is the number of detections divided by the total number of relevant items, defined as (2.5).

    R_k = \frac{A_k}{A_k + C_k}    (2.5)

Primary Precision, which is the number of detections divided by the total number of returned items, defined as (2.6).

    P_k = \frac{A_k}{A_k + B_k}    (2.6)

Fallout, which is the number of false alarms divided by the sum of false alarms and correct dismissals, defined as (2.7).

    F_k = \frac{B_k}{B_k + D_k}    (2.7)

Analogous metrics can be defined for multi-stage retrieval. As the designed benchmarking system is planned to be used for single-stage search techniques, metrics dedicated to multi-stage retrieval are out of the scope of this work.

Evaluation Methods

Simple calculation of a single metric defined by Formulas 2.1 to 2.7 does not allow conclusions to be drawn about the performance of a search system. A trade-off between some metrics is quite common, so selected metrics have to be compared against each other. In order to draw conclusions about the accuracy of the system, evaluation methods need to be defined. An overview of the recommended evaluation methods can be found in [64]:

Retrieval Effectiveness, defined as the comparison of precision (2.6) versus recall (2.5). It is a recommended evaluation method.

Receiver Operating Characteristics, defined as the comparison of detections (2.1) versus false alarms (2.2).

Relative Operating Characteristics, defined as the comparison of detections (2.1) versus fallout (2.7).

R-value, defined as the precision (2.6) at different cut-off values k.

3-point Average, defined as the average precision (2.6) at defined values of recall (2.5), typically R_k = 0.2; 0.5; 0.8.

11-point Average, defined as the average precision (2.6) at 11 defined values of recall (2.5).

A minimal computational sketch of these metrics and averages is given at the end of this Chapter, after the overview of existing benchmarking systems.

Existing Benchmarking Systems

Existing benchmarking systems for visual information retrieval focus mainly on preparing the annotated media for the benchmarks [58]. They also focus on media stored locally, whereas the proposed benchmarking system focuses on distributed storage.

The Information Access Division of the US National Institute of Standards and Technology is responsible for preparing a benchmarking system for the retrieval of video. The benchmarking system is called TRECVID [63]. It provides the video data as well as a set of topics (queries) to work with.

The Technical Committee 12 (TC-12) of the International Association for Pattern Recognition provides a set of images and reference queries for the testing of image search and retrieval [27]. The TC-12 benchmark provides four core elements:

- a set of images,
- a set of queries,
- a collection of ground truths associated with the images and queries,
- a set of measures of the retrieval performance.
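The sketch below is a minimal, illustrative implementation of the single-stage accuracy metrics of Formulas 2.1 to 2.7 and of the averaged precision values mentioned above. It is not part of the benchmarking framework developed in this dissertation; the function names, the shape of the relevance list V (Boolean relevance of all N items, ordered by the rank assigned by the search system) and the example values are assumptions made only for illustration.

```python
# Minimal sketch of the single-stage accuracy metrics (Formulas 2.1-2.7).
# Assumption: V is the Boolean relevance list of all N items, ordered by the
# rank assigned by the search system (V[n] = 1 if the n-th item is relevant).

def accuracy_metrics(V, k):
    """Detections, false alarms, misses, correct dismissals,
    recall, precision and fallout at cut-off k."""
    N = len(V)
    A = sum(V[:k])                      # detections (2.1)
    B = k - A                           # false alarms (2.2)
    C = sum(V) - A                      # misses (2.3)
    D = (N - sum(V)) - B                # correct dismissals (2.4)
    recall = A / (A + C) if (A + C) else 0.0      # primary recall (2.5)
    precision = A / (A + B) if (A + B) else 0.0   # primary precision (2.6)
    fallout = B / (B + D) if (B + D) else 0.0     # fallout (2.7)
    return {"A": A, "B": B, "C": C, "D": D,
            "recall": recall, "precision": precision, "fallout": fallout}

def n_point_average(V, recall_levels):
    """Average precision at the smallest cut-offs reaching the given recall
    levels (the 3-point and 11-point averages described above)."""
    precisions = []
    for level in recall_levels:
        for k in range(1, len(V) + 1):
            m = accuracy_metrics(V, k)
            if m["recall"] >= level:
                precisions.append(m["precision"])
                break
    return sum(precisions) / len(precisions) if precisions else 0.0

# Illustrative ranked relevance list for a repository of N = 20 items,
# 6 of which are relevant to the query.
V = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy_metrics(V, k=10))
print(n_point_average(V, (0.2, 0.5, 0.8)))   # 3-point average
```

The same helper can also be used to trace the precision-versus-recall curve (Retrieval Effectiveness) by evaluating it for every cut-off value k.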
Objective evaluation based on observation of user behaviour

The metrics and evaluation methods presented in the previous Chapters are useful in the case of centralized databases of media or for the evaluation of binary classification solutions. In the case of a QbE service a ground truth is not available, as it is impossible to make an arbitrary judgment over a query and a database to decide which items are similar to the query. This is due to the fact that the term similarity is ambiguous and is understood differently by different users. Therefore, in the case of a distributed search system a metric based on user behaviour is more useful. Image search system evaluation methods may also be based on the observation of user behaviour.

The evaluation of the accuracy of the QbE service in the P2P overlay in the presented work is based on a custom metric. This metric allows for an objective, quantitative assessment of the solution's accuracy while being based on observations derived from qualitative experiments.

The metric is based on several assumptions. First, it is designed to measure how the results obtained from the QbE service implemented in the P2P overlay differ from the results obtained from the QbE service implemented in a centralized database. We assume that the results obtained from the centralized solution are known and form the ground truth. Second, we assume that the user is interested in the first k results. Therefore, in the perfect situation, the first k results returned by the distributed solution would be identical to the first k results returned by the centralized one. This assumption is based on the fact that users tend to focus on the first few search results and would rather refine their query than browse lower-ranked results. Moreover, it is assumed that the order of the first k results does not matter. As image QbE results are presented in a visual form, the user is likely to observe all k results regardless of their order.

According to [13], approximately 95% of users focus on the first page of Google textual search results, which by default presents 10 results per page. As such a value is not available for image search, it was decided to use k = 10 for the presented research. In this case the formula for the value d_s of the metric can be defined as in Equation 2.8, where W denotes the set of images returned as the search result, |W| denotes the cardinality of the set W, and R denotes the set of the 10 best results provided by the centralized solution.

    d_s = |W \cap R|, \quad d_s \in [0; 10]    (2.8)

This metric compares the distributed solution against the centralized one while retaining the user-like behaviour of focusing on the first ten (10) results. In the ideal case the metric will have the value of ten (10) for a given query, which means that the distributed system performs as well as the centralised one.

2.5 Simulation Tools

The author of the presented research decided to perform the measurements in a simulation environment, as stated in Chapter 1.4. A good P2P simulator has to fulfil a set of requirements [41]. The architecture of the simulator should be modular in order to allow for easy modification and the development of plug-ins. It should simulate the underlying network while maintaining scalability. The simulator should also allow for the development of a model of the user's behaviour and actions, and it should take into account the peer's resources and the services the peer is offering. The simulator should be well documented to allow for an easy simulation setup. Finally, the overlay mechanisms implemented in the simulation environment should be easily convertible into a real implementation.

These requirements are well met by the PeerfactSim.KOM simulator developed at the Technische Universität Darmstadt, Germany. The author of the presented dissertation cooperated closely with the authors of the simulator within the CONTENT Network of Excellence (see Chapter 1.6) on the development of P2P application evaluation metrics.

There are numerous other simulators available to researchers. Unfortunately, the vast majority of them are designed and implemented specifically for the needs of a given research project performed by the simulator's author. A few of the general-purpose P2P simulators are presented here. P2PSim [45] offers a wide variety of implemented overlays, but its architecture allows for the simulation of structured overlays only. PeerSim [39] is well documented, but does not allow for modelling of the underlying network. Overlay Weaver [62] can be used for the development of P2P overlays and applications and can act as both a simulator and an emulator; unfortunately, as a simulator it has poor scalability. On the other hand, the NeuroGrid [40] simulator scales very well, but it models neither the underlying network, nor peer resources, nor churn. PlanetSim [12] allows for an easy transition from simulation to an experimental implementation. It scales well, but it also does not model the underlying network, the peers or the user behaviour. OverSim [7] also provides good scalability and offers the possibility of deploying experimental code, but does not model user behaviour or peer resources.

Evaluation of a system by simulation has its well-known weaknesses. A simulation will never ideally reflect the real network conditions, the behaviour of peers and the behaviour of users. The accuracy of the measurements will always depend on the accuracy of the model implemented in the simulator.

It has to be noted, however, that it is very difficult to perform reliable research on P2P overlays. It is impossible for a researcher to deploy a real implementation of a P2P overlay on tens or hundreds of thousands of nodes for scientific purposes only. Distributed research infrastructures such as PlanetLab can deliver, at best, thousands of nodes (1100 were available at the time of writing). That is why the research has to be performed in a simulation environment.

2.6 Related Work

Query by Example techniques for image search have been well known for years, and there are numerous services on the Internet offering such functionality, such as Google Image Search (QbE functionality introduced in 2011) or Tiltomo. These are all centralized solutions. There are also numerous successful implementations of P2P file sharing overlays. There is, however, little research effort put into the combination of both techniques.

Work similar to the presented one was published by Muller et al. [52]. In their well-written paper the authors compare four overlays (two structured and two unstructured). The paper lacks, however, an in-depth analysis of the available QbE methods. The authors also do not describe some of the effects typical for P2P overlays, such as file replication or popularity. The conclusions of the authors differ in some points from the conclusions of the presented dissertation. The authors conclude their paper by stating that the performance of unstructured systems is sufficient for the introduction of a QbE service and similar to that of structured systems. In the presented thesis the author shows that the performance of structured overlays is better in terms of bandwidth when compared to unstructured ones.

A comparison of structured and unstructured overlays is also performed by Yang et al. [71] for full-text search queries. The results presented by the authors show that both structured and unstructured overlays have their strengths and weaknesses, which is coherent with the conclusions of this dissertation. However, the application considered in [71] differs from the one presented here.

Novak and Zezula [57] proposed a modified version of the structured P2P overlay Chord for QbE in P2P overlays. The authors prove the usefulness of structured P2P overlays for similarity search. A good overview of other papers related to QbE in P2P overlays is presented by Li et al. in [46].

There is also an ongoing European research effort in this area. Projects such as SAPIR (Search in Audio Visual Content Using Peer-to-peer IR) [5] or VICTORY (Audio-Visual Content Search in a Distributed P2P Repository) [48] share similar research goals, but their objectives are limited to the development of new P2P overlays (in the case of SAPIR) or focus only on 3D data (in the case of VICTORY).

The presented selection of papers and research projects related to the research area of the dissertation shows that the topic is important and is being explored by researchers. This dissertation extends the state of the art in QbE for P2P systems further and opens new research directions.

2.7 Conclusions

Research on the ongoing trends in media production and distribution methods allows valuable observations to be made. First, the amount of data produced daily is likely to outgrow the storage and search capabilities of centralised solutions. The enormous server farms built by leading content providers such as Google back up this observation. Second, thanks to easy access to production tools, the profile of producers is changing rapidly and a new group of multimedia users is emerging: the prosumers. All these changes show that research on new methods of content storage, search and retrieval is an important and timely topic.

Over the years of development of P2P file sharing services many different architectures have emerged. Existing architectures differ in the level of complexity of the routing algorithms, in the performance of load distribution and in the effort required to implement the given architecture. In order to perform thorough research on a new service (in this case QbE), the most characteristic P2P overlays should be tested. In the presented dissertation the analysis of the state of the art has allowed two P2P architectures to be selected for further investigation. The unstructured P2P solution Gnutella v0.4 represents the most basic approach and is as close to the core concept of P2P computing as possible. The second of the chosen architectures, CAN, is a DHT solution which represents the most recent and most promising advances in P2P research.

The research on available metadata management frameworks allows the conclusion that the only standard solution allowing for low-level metadata extraction, management and comparison is the MPEG-7 standard. The MPEG-7 descriptors will be used in the presented research for the QbE search.

The research on methods of assessing the quality of search systems allowed three basic factors that influence the overall performance of a search system to be chosen. These factors are search accuracy, the user-perceived quality of the search service and resource consumption (especially bandwidth utilization). For the assessment of accuracy, QoE experiments will be conducted for the centralized QbE system, while the P2P overlays will be measured with the use of a custom metric.

Analysis of the available simulation tools and of their strengths and weaknesses allowed the most appropriate tool to be chosen. The implementation of the QbE service in the P2P overlays will be performed in the PeerfactSim.KOM P2P simulator.
Chapter 3

Problem Approach

This Chapter presents the results of the experiments that were conducted in order to prove the thesis of the dissertation. Chapter 3.1 presents the dataset used in the experiments. Chapter 3.2 presents a comparison of the performance of 12 QbE methods. Chapter 3.3 presents the results of the implementation of the QbE service in the Gnutella v. 0.4 overlay. Chapter 3.4 describes the results of the implementation of QbE in the CAN overlay. Both overlays are compared in Chapter 3.5. The main conclusions are summarised in Chapter 3.6.

3.1 Image Database

In order to perform experiments on QbE, a collection of images had to be created. This collection has to fulfil several requirements. First, it has to consist of images that are likely to be shared by users in a QbE P2P overlay. This draws the focus to so-called natural images, which are basically photographs taken on a daily basis by an average user. Second, the collection has to be large enough to allow the simulation of a QbE P2P application; on the other hand, its size is limited by computation and storage capabilities. Third, in order to perform the experiment described in Chapter 3.2, the image database has to be accompanied by user-generated tags. These tags are phrases provided by the users which describe the content of the image.

In order to fulfil these requirements it was decided to use API access to the popular photograph hosting service Flickr. From this service 100,000 random images with accompanying tags were downloaded.

Additionally, a simple experiment was conducted. As described in the following Chapters, the volume of traffic between peers in the P2P overlay will be assessed. As a part of the QbE query the user receives a set of thumbnails of the images most similar to the query. These thumbnails contribute significantly to the volume of traffic, so in order to assess this volume it is necessary to decide on the average size of a thumbnail. It was assumed that the user will be presented with thumbnails of size 150x195 pixels (which is typical for image search engines) compressed with the JPEG compression algorithm with the quality set at 80%. The EXIF data of the thumbnail is removed in order to reduce its volume. All the images in the collection were analysed and it was observed that the average size of a thumbnail is 4686±1713 [B]. This value will be used when assessing the traffic volume which includes the transfer of thumbnails.

3.2 Measurement of QbE Accuracy in Local Database

The first goal of the presented research is to assess the accuracy of the QbE service in a local database. This assessment allows the best method to be chosen for application in the P2P overlays. Additionally, the performance of the centralized solution is set as a reference for the assessment of the P2P solutions. The results of the experiment presented in this Chapter were published at the UCMEDIA 2009 conference [20].

Measurement Methodology

The standard methods of measuring search service accuracy include metrics such as precision, recall, sensitivity, specificity and similar. These methods are defined in Chapter 2.4. However, they cannot be used for the assessment of the accuracy of the QbE service, as they require a ground truth, being the expected and correct result of the retrieval. In the case of QbE search it is impossible to have such a ground truth, because it is impossible to define the similarity between two images in an unambiguous way. Moreover, the number of images that may be reached by the search process varies due to the dynamic nature of P2P overlays and their routing algorithms.

QbE, by definition (Chapter 1.2), is a service which delivers to users results similar to the query. It has to be noted that similarity is not strictly defined. In the case of image search, similar depends on many factors. First, it depends on the scenario in which images are searched for. Similarity will be perceived by the user in a different way in the case of QbE within medical images than within natural photographs. Even within the same search context the individual definitions of similarity depend on the background of the user (this observation is described in detail in Chapter 3.2.2).

After considering the above-mentioned observations it was decided that the best way of assessing the QbE service is to perform a user-based evaluation of the QbE performance. This approach has many advantages over objective measurement. First, it allows for a more accurate estimation of the overall quality of the service and gives the best view of the users' opinion. Apart from numerical results, valuable opinions and remarks can also be gathered from the users.

This approach also has some significant drawbacks. First, user tests are very difficult and cumbersome to conduct due to the human factor involved. It is difficult in laboratory conditions to assemble a large and representative pool of users to conduct tests on. The test has to be prepared in such a way that it does not consume a lot of the users' attention, as users are not keen to spend their time on such experiments without proper gratification.

After considering these limitations, test software was prepared. The tests are designed to be run over the Internet via a standard web browser. This approach allows more users to be reached, including users from remote locations.

The task the user is supposed to perform depends on the test scenario, but the overall test flow is similar in all cases. Initially the user is able to choose the language of the test (English or Polish). Afterwards he is presented with instructions describing the purpose of the test and the user's task. The user is asked to provide some basic information, such as a name (or nickname), age, gender and occupation. Subsequently the user is asked to take a colour blindness test by viewing the standard Ishihara colour blindness test plates. Then the user is presented with the task, which was divided into numerous individual tests. The user is not required to complete the whole test (which was designed to take approximately 20 minutes). The user is allowed to stop the test at any time and return to it simply by providing his previously chosen nickname, and does not have to solve all of the prepared tasks in order to have his answers analysed. At random intervals the user is presented with a humorous image in order to provide encouragement and avoid fatigue.

A screenshot of the user test interface used in the local database QbE performance test is presented in Figure 3.1. The query example is presented in the centre. The user's task is to click the most similar of the surrounding images or the text stating "None of the images is similar to the one in the middle". The surrounding images were selected using different QbE algorithms (a detailed description is provided later in this Chapter). A progress bar is shown below the images.

The test results and the user data are stored in a database and analysed. The detailed methodology of the result analysis depends on the individual test scenario. The advantage of the web-based test interface is that if the number of results acquired from users is lower than required, it is very easy to invite additional users and acquire more results.
Figure 3.1: The web-based interface for the psycho-physical experiments

QbE Methods

Several content-based image QbE techniques have been presented in recent years. While developing real QbE applications a question arises: which image retrieval method should be applied? The available benchmarks are commonly incomplete in terms of the number of image similarity measures. Furthermore, re-executing a benchmark usually requires possession of a well-annotated database of images (a ground truth). In this Chapter a ground-truth-less comparison of several content-based image retrieval measures is presented. The results of the comparison are applicable to several usage scenarios, including the scenario of QbE in P2P overlays. The results and conclusions of the presented experiment allowed a candidate QbE method to be selected for implementation in the P2P overlay.

The approach to the problem was to perform a set of QoE experiments in order to allow the subjects to vote for the best QbE method. Best is understood here as the most satisfying and closest to the users' expectations. A set of QbE retrieval methods was designed and implemented, and the query results were presented to the subjects.

The experiment considered several image QbE methods. From the MPEG-7 standard (described in detail in Chapter 2.2.2) the following visual descriptors were chosen for the experiment: Edge Histogram (EH), Colour Layout (CL), Dominant Colour (DC), Colour Structure (CS) and Scalable Colour (SC). These descriptors were chosen as they include QbE tools of different levels of complexity and focus on different low-level features of the image.

Apart from the MPEG-7 descriptor set, a collection of other QbE methods was included. VS [29] is a fast image retrieval software library for image-to-image matching. Similarity is measured based on the spatial distribution of colour and texture features, and the method is especially optimised for fast matching within large data sets. VS applies a hierarchical grid-based approach with overlapping grid cells on different scales. For every grid cell up to 3 colours in a quantised 10-bit representation in the CIELab (fr. Commission Internationale de l'Eclairage Lab) colour space and a texture energy feature are stored in the descriptor.

The question "what does a similar image mean?" is difficult and very subjective. Nevertheless, instead of defining similarity one can ask what a user can see in the image. On the basis of such descriptions obtained for two different images the similarity can be approximated, since if the descriptions are similar the images should be similar too. Therefore it was decided to also use a QbE method based on user-generated words which describe the contents of the image; this is usually a set of nouns. This QbE method will be referred to as the TAG metric.

The Jaccard similarity [6] was used as the metric for computing the distance between two sets of tags in the TAG QbE method. The Jaccard metric is denoted J_s(A, B) and defined as:

    J_s(A, B) = \frac{|A \cap B|}{|A \cup B|}    (3.1)

where A and B are the sets of tags of images a and b respectively, and |A| is the cardinality of the set A.

The Jaccard similarity has an interesting property which will be explained by an example. Let us assume that three images are given. The first one, a, has 10 tags, one of which is the tag "tree". The second one, b, has only 3 tags, also including "tree". The question is which of the images a or b is closer to an image c with the single tag "tree". The Jaccard similarity will show image b as closer, since J_s(C, B) = 1/3 and J_s(C, A) = 1/10. The obtained result is correct, since in A "tree" is one of 10 objects and in B one of three.

The TAG QbE method is not a perfect one, mainly for two reasons. The first reason is the requirement of having the images tagged. Since images downloaded from the Flickr service were used, the images were accompanied by user-provided tags. The entered tags are far from perfect, but otherwise it is almost impossible to obtain a database that is at the same time large and correctly described.

The TAG metric has another drawback: synonymous tags change the obtained results, as different people can tag the same images with different synonymous nouns. Nevertheless, the main goal was to examine whether TAG is more accurate than the other considered metrics.

Finally, a generic Random metric was used in order to see how the real metrics actually differ from totally random selections. This method provides a random image from the image database as the query result.

Description of the Experiment

QbE interfaces show a predefined number of results, i.e. the k images that are most similar to the query image, where k is determined by the user interface, since a subject has to be able to see the results. The order of the results is determined by the comparing metric. Two assumptions have to be adopted. The first one is that if a subject cannot find a (subjectively) similar image among the first k of them, then the metric's results are not correct. The second one is that k = 10, a choice based on our experience with image search systems (Chapter 2.4.2). Therefore, the set of the 10 most similar images is considered as a metric's result. Importantly, it was not considered whether a picture was first or tenth; the only important property was to be among the first ten.

Twelve (12) metrics were analysed:

1. 5 metrics based on the MPEG-7 descriptors,
2. the VS metric,
3. the TAG metric,
4. the sum of ranks (a rank is a metric giving 1 to the most similar image, 2 to the second one, etc.) obtained for EH and each of the other MPEG-7-based metrics (4 different combinations),
5. the sum of the logarithms of ranks obtained for each MPEG-7-based metric.

The 13th metric was a random metric.

Seven (7) images were presented at a time, since such an interface was the best in terms of the visual layout at all screen resolutions. Therefore, a subject can choose one of seven (7) different images. Each presented image is the result of two draws. First, a metric is drawn (for example the EH-based one); then, from the ten (10) images marked as the most similar, one is drawn (in this case a random image from the 10 most similar images obtained for the EH-based metric). Since thirteen (13) different metrics were considered and each time only seven (7) images were shown, not all of them were visible at once. Nevertheless, more than 2,500 queries were answered, therefore all possible combinations were properly represented.
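For illustration only, the two-draw selection described above could be realised as in the following minimal sketch. The function and variable names are hypothetical and do not correspond to the actual test software; it is assumed that the ten most similar images for every metric have been computed beforehand.

```python
import random

def draw_displayed_images(top10_per_metric, slots=7):
    """Two-draw selection: for each display slot first a metric is drawn,
    then one image is drawn from the ten most similar images of that metric."""
    displayed = []
    for _ in range(slots):
        metric = random.choice(list(top10_per_metric))         # first draw: the metric
        image = random.choice(top10_per_metric[metric][:10])   # second draw: one of its top 10
        displayed.append((metric, image))
    return displayed

# Hypothetical top-10 lists for three of the considered metrics and one query example:
top10 = {
    "EH": [f"img_{i:05d}" for i in range(10)],
    "CS": [f"img_{i:05d}" for i in range(5, 15)],
    "TAG": [f"img_{i:05d}" for i in range(20, 30)],
}
print(draw_displayed_images(top10))
```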
Note that it is possible that there is no similar image in the set of 7 presented images. Therefore, an image that a subject can click if he/she cannot find a similar image among those presented was added. It was added as an image in order to make this answer identical in form to a "similar image" answer.

Experiment Execution and Results

The experiment was performed in two stages. The first stage was the identification of the user, the presentation of the instructions and a colour blindness test. The second stage was the experiment itself. Seven (7) randomly chosen images were presented to the subject. The subject's task was to choose from the small images the one most similar to the large one in the middle. Subjects could also choose the "no similarity" answer (see Figure 3.1). The task was repeated 300 times, but the subject was free to end the experiment at any time. The middle image was the query and the surrounding images were query results obtained with different QbE techniques. In effect, the subjects were voting for the QbE method that was most satisfying.

As the experiment could be terminated by a subject at any time, a different number of answers was collected from each subject. Therefore, for each subject a distribution of answers (i.e. the probability of choosing each metric) was computed and analysed. Additionally, all answers coming from subjects who answered fewer than 50 queries were removed from further processing. Results from 31 subjects were finally collected and analysed.

The results obtained are shown in Figure 3.2. The probability of the "no similarity" answer was the highest and reached 31%. Note that if many answers are "no similarity", the database was probably too small to contain a similar picture for some of the examples. Since the goal of the experiment is to compare the quality of different descriptors and metrics, and not the quality of the database, this value is not shown in the plot. As the main goal of the research is to prove the applicability of a QbE method in P2P overlays by comparison with the centralised solution, the improvement of the QbE method itself is out of the scope of the presented dissertation.

All QbE methods were analysed with the use of the Student's t-distribution test [1] with α = 0.05. The results show that the metrics based on the EH, DC and CL descriptors, as well as the metric based on VS, are not statistically different from the randomly chosen image. On the other hand, only one metric (based on all MPEG-7 descriptors) is not statistically different from TAG. However, CS is only slightly worse. Moreover, the CS value is computed on the basis of just one descriptor and is thus computationally cheaper.
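As an illustration of this kind of statistical comparison, the sketch below contrasts the per-subject selection probabilities of two metrics with a two-sample Student's t-test at α = 0.05. The numbers are hypothetical, and the use of a two-sample test on per-subject probabilities is an assumption made here for illustration only, not a description of the exact analysis performed in the experiment.

```python
from scipy import stats

def probabilities_differ(probs_a, probs_b, alpha=0.05):
    """Two-sample Student's t-test on per-subject selection probabilities:
    returns True if the two metrics differ significantly at the given level."""
    t_statistic, p_value = stats.ttest_ind(probs_a, probs_b)
    return p_value < alpha

# Hypothetical per-subject probabilities of choosing the CS-based metric
# and the random metric (one value per subject):
cs_probs = [0.11, 0.09, 0.13, 0.10, 0.12, 0.08, 0.11, 0.10]
random_probs = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06, 0.05, 0.06]
print(probabilities_differ(cs_probs, random_probs))
```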
Figure 3.2: The obtained results with the confidence intervals; solid lines: descriptor combinations; dashed lines: the random metric and the TAG (probability of selection for each of the used descriptors)

Experiment Conclusions

The best performing method was the one based on the user-created tags. Unfortunately, it cannot be assumed that the images available in the P2P QbE service will be accompanied by tags; therefore this metric cannot be considered for further research.

The second best performing metrics in the test were the combination of the five MPEG-7 descriptors and the combination of Edge Histogram and Scalable Colour. This was expected, as these metrics take into account both the colour and the texture of the image. However, as the CAN overlay is supposed to be addressed with the use of the descriptor values, combinations of several descriptors also cannot be taken into account.

Next in terms of performance is the Colour Structure descriptor. It performs as well as the EH+SC combination, but is cheaper in terms of computational load. The Colour Structure descriptor was therefore chosen for implementation in the structured and unstructured P2P overlays.

However, it has to be taken into consideration that further improvement of the performance of the QbE system is possible by the use of multiple descriptors or by further improvement of the existing QbE methods.

It was observed that the definition of similarity strongly depends on the background of the users. The feedback received from the test subjects allows the conclusion that people with a technical background assess similarity based on the content of the image. On the other hand, people with a non-technical background seem to assess similarity based on the general composition and colour layout of the image.

3.3 Application of QbE in an Unstructured P2P Overlay

The goal of the presented experiment was to implement and assess the QbE service in an unstructured P2P overlay. Gnutella v. 0.4 was chosen as an exemplary implementation of this overlay paradigm. This Chapter presents the design, execution, results and conclusions regarding the implementation and simulation of the QbE service in the Gnutella v. 0.4 overlay.

Implementation of the QbE Service

The same image collection as for the local QbE tests was used for the implementation in the Gnutella overlay. However, as it was planned to simulate thousands of nodes, it was impossible to implement the QbE service without some simplification. It was observed that the most resource-consuming task during the simulation is the search for similar images within the P2P network nodes. Moreover, the similarity between images does not change and does not depend on the simulation scenario. In order to speed up the process it was decided to calculate the information required for the similarity search in advance. For each of the 100,000 images in the collection the 64 (a value chosen arbitrarily) most similar images were calculated according to the Colour Structure descriptor algorithm and stored in a database. As each of the overlay nodes has access to this database, instead of calculating the images most similar to the query it just has to query the database. It is worth mentioning that this approach has no influence on the quality of the results or on the research conclusions.

File Distribution and Popularity

There are two effects which are typical for unstructured file sharing P2P overlays and do not exist in the database solution. These are file distribution and file popularity. These effects have been modelled according to real measurements of existing file sharing P2P overlays performed by Chu, Gish et al. and presented in [9] and [15].

The distribution of files among the nodes in a P2P overlay is not uniform. Some users are more eager to share files than others, which results in the existence of nodes with huge repositories of files, while other nodes share only a few. Of course, there are also users who do not share any content while consuming the P2P overlay services. Such users are referred to as leechers, and such nodes are not implemented within the simulated network, as they have no impact on the performance of the P2P overlay (they generate queries only). This effect was simulated by following the measurements presented in [9]. Figure 3.3 presents an example of the distribution of files among the nodes.

Figure 3.3: The distribution of files among nodes (number of files per node ID)

File popularity in a P2P overlay causes some files to be stored multiple times within the network, while other files are stored in a single instance only. Chu et al. [9] state that the most popular 10% of files account for about 50% of all stored data. This effect is caused by users who tend to like some content much more than other content. This effect was also included in the simulation scenario and modelled according to the results published in [9] and [15]. Figure 3.4 depicts an example of the number of replications of files in the overlay. The number of replicas is capped at 2,500, as this is the number of nodes that were simulated in the experiment; obviously there cannot be more replicas than nodes in the overlay. It is worth observing that the modelled effect is on the one hand strongly long-tailed, resulting in numerous files existing in only one instance, while on the other hand the 17 most popular files are replicated on every node in the overlay.

Figure 3.4: File replication due to popularity (number of file replications per file ID)

Concluding, in the simulation of the QbE service in the Gnutella P2P overlay two significant effects were modelled based on real network measurements. These are a non-uniform distribution of files among the nodes and a long-tail replication of files in the overlay due to their popularity.

Simulation Setup

As mentioned in Chapter 2.5, the tool chosen for the simulation is the PeerfactSim.KOM simulator. In the application layer a modified version of Gnutella version 0.4 was used. The modifications covered the introduction of the QbE service, the simulation of
P2P-specific properties of file distribution and popularity, and measurement applications for data collection. In the network layer, standard implementations (e.g. SimpleNetFactory, SimpleStaticLatency) of the protocol stack and network behaviour were used. These implementations are part of the simulator package and were created and tested by the simulator designers. The network parameters were set up so that they do not influence the performance of the application by, e.g., limiting the available bandwidth. This allows the behaviour of the application itself to be observed, which is required in order to obtain complete and accurate information on the number of messages passed and the volume of information.

2,500 nodes were simulated. This number was chosen as a compromise between a reasonable simulation time (several hours of real time on a standard PC) and as high a number of nodes as possible. The total duration of the simulation was set to twenty (20) minutes of simulated time. During the first phase a simulation warm-up of a total duration of ten (10) minutes was performed. During this time the nodes were joining the network and files were distributed among the nodes. These ten (10) minutes were also used for the network to reach a stable state. This division was chosen after observation of the behaviour of the simulated network: a network of that size required less than ten (10) minutes for all nodes to join the network and to populate it with files. After the warm-up time all the traffic related to the creation of the network disappeared and only normal maintenance-related traffic remained.

The second phase of ten (10) minutes of simulated time was the operation of the network, during which the queries were sent and the replies were gathered. During this phase approximately 135±5 queries were issued to the network by random nodes. This value was set according to the real network measurements presented in [9]. No churn was introduced to the network, as it would unnecessarily complicate the analysis of the performance of the QbE application.

For each TTL value the simulation was repeated five (5) times with a different random number generator seed. This allowed the confidence intervals to be assessed. The number of simulation iterations was chosen as a compromise between the duration of a single simulation run (from 2 to 12 hours of real time) and the number of runs required for a credible calculation of confidence intervals.

Analysis of the Results

As mentioned earlier, the search accuracy in a P2P overlay depends on how much data is accessible at the moment of the search. Accessible means that the data is available in the network and is reachable according to the utilized routing algorithm.

The routing algorithm utilized in the Gnutella v. 0.4 P2P overlay is straightforward and is based on flooding. Each node analyses a query and passes it to its neighbours. The depth of the flooding (also referred to as the "search horizon") is defined by the Time To Live (TTL) parameter that is embedded in the query message. After each hop the value of the TTL parameter is decreased by one (1). When the parameter reaches zero (0) the query is no longer forwarded. Moreover, nodes keep a memory of received queries; if a query is received again it is discarded, which prevents queries from looping. The Gnutella protocol specification suggests TTL=7 for most cases and forbids this value from exceeding 15 [10]. As can be presumed, the higher the TTL value, the more accurate the search results are, but at the cost of the generated traffic.

A set of simulation experiments was performed in which the TTL value was set in the range from 2 to 6. The accuracy of the search, calculated according to the metric (2.8), and the total amount of data sent by the nodes were observed. For each TTL value the experiment was repeated 5 times. The results are presented in Figure 3.5. Confidence intervals are calculated both for the search accuracy and for the total amount of data sent. In some cases the narrow width of the confidence intervals does not allow them to be clearly depicted in the plot. Please note that for each run of the simulation and for each of the TTL values approximately 135 queries were issued to the network.

The method of data analysis is presented in Figure 3.6 (such a figure can be created for each of the five analysed TTL values). For each run j and query i a value of the accuracy metric d_{s_{i,j}} is collected. Then, for each run, an average d_{s_j} over all the queries within the run is calculated. No further statistical analysis of the results of single queries is performed, as it is questionable whether the single query events are independent. Afterwards, all the d_{s_j} values are averaged towards d_{s_f} and a confidence interval is computed. All confidence intervals in this work are calculated for α = 0.05 using the Student's t-distribution test, following the methodology presented in [3] and [1]. It is assumed that the values obtained from separate simulation runs are distributed normally. The formula for the calculation of the confidence interval on c is given in Equation 3.2 [51].

    d_{s_f} - t_{\alpha/2,\,n-1} \cdot s/\sqrt{n} \leq c \leq d_{s_f} + t_{\alpha/2,\,n-1} \cdot s/\sqrt{n}    (3.2)

In this formula n stands for the number of samples (five), n - 1 denotes the number of degrees of freedom of the Student's t-distribution (four in our case), and s denotes the standard deviation calculated from the set of five d_{s_j} values. The value of t_{\alpha/2,\,n-1} = 2.776 is given by the Student's t-distribution table.

The calculation of the total volume of sent data is simpler. For each node the amount of data sent is calculated and the values are summed together. The calculation of confidence intervals for the total volume of sent data follows the same methodology as presented in Formula 3.2.
Figure 3.5: Search accuracy and total amount of data sent for different TTL values (total amount of data sent by the nodes [MB] versus the search accuracy d_{s_f}, for TTL=2 to TTL=6)

Figure 3.6: Method of data analysis for accuracy

A saturation effect can be observed starting from TTL=4: the accuracy is close to the maximum and is not further improved, while the amount of data sent by the nodes during the search process increases significantly. The optimal value is therefore TTL=4; that is, of course, for the given size of the overlay. Additional results are presented in Table 3.1.
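Before turning to the detailed figures, a minimal computational sketch of the analysis described above is given. The per-query accuracy follows Formula 2.8 and the confidence interval follows Formula 3.2; the result lists and the per-run averages used in the example are hypothetical values, not the measured ones, and all names are illustrative.

```python
from math import sqrt

T_CRITICAL = 2.776  # Student's t for alpha = 0.05 and n - 1 = 4 degrees of freedom

def d_s(distributed_results, centralized_top10):
    """Formula 2.8: overlap between the first ten distributed results
    and the ten best results of the centralized solution."""
    return len(set(distributed_results[:10]) & set(centralized_top10))

def mean_and_confidence(run_averages):
    """Mean of the per-run averages d_{s_j} and the half-width of its 95%
    confidence interval according to Formula 3.2 (valid for five runs)."""
    n = len(run_averages)
    mean = sum(run_averages) / n
    s = sqrt(sum((x - mean) ** 2 for x in run_averages) / (n - 1))
    return mean, T_CRITICAL * s / sqrt(n)

# One hypothetical query: ten results from the overlay vs. the centralized top ten.
distributed = ["img_3", "img_7", "img_1", "img_9", "img_4",
               "img_2", "img_8", "img_5", "img_6", "img_0"]
centralized = ["img_0", "img_1", "img_2", "img_3", "img_4",
               "img_5", "img_6", "img_7", "img_8", "img_42"]
print(d_s(distributed, centralized))   # 9 of the 10 centralized results were found

# Hypothetical per-run average accuracies for one TTL value (five runs):
run_averages = [9.6, 9.8, 9.7, 9.5, 9.9]
mean, half_width = mean_and_confidence(run_averages)
print(f"d_sf = {mean:.2f} +/- {half_width:.2f}")
```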
61 3.3 Application of QbE in an Unstructured P2P Overlay 53 Table 3.1: Gnutella v. 0.4 simulation results for different values of TTLs Parameter TTL=2 TTL=3 TTL=4 TTL=5 TTL=6 Average accuracy d sf 1.98±0.16 6,99± ± ± ±0.05 Total amount of sent data 28.32± ± ± ± ±256 [MB] Average required upload bandwidth [KBps] 0.02±0 0.2± ± ± ±0.17 The achieved average accuracy (9.7 out of 10) is almost perfect and comparable with the centralized solution. The total amount of data sent in the network is significant (1460 [MB]) but when divided between all the nodes and the by the simulation time (the average bandwidth required for the service 0.98 [KBps]) becomes minimal and perfectly achievable in almost any underlying network architecture. The only parameter that requires more attention is the momentary upload bandwidth requirement. The momentary upload bandwidth requirement is the highest amount of traffic a P2P node has to serve in a given second of the simulation. High value measured (for TTL= ±18 [KBps]) suggests that the network traffic in this overlay and application is bursty. That means that there exist periods in time, in which the nodes have to serve huge amounts of traffic. Such property requires creation of buffers in the application and increases delays. The analysis of bandwidth fluctuations in time for a single node confirms this assumption. Figure 3.7 presents the amount of traffic served by a node in time. The node was chosen according to the worst case principle, that is the node which has to serve the highest momentary traffic Conclusions This section presented results of implementation of the QbE service in the Gnutella v. 0.4 overlay. Although some simplifications had to be introduced in order to allow for a large scale simulation the QbE was successfully introduced into this P2P overlay. It was observed that the accuracy of search depends on the value of TTL parameter at a cost of amount of traffic in the network. The value of TTL should be chosen carefully according to the size of the overlay. On one hand too low value of TTL decreases the accuracy of the results and on the other hand too high value significantly increases the amount of data that has to be sent in the network. It was also observed that the network traffic possesses the property of burstiness. That means that the although the average traffic in the network is very low the momentary traffic that has to be served by a single node can be very high. In
3.4 Application of QbE in a Structured P2P Overlay

The goal of the third experiment was to implement and assess the QbE service in a structured P2P overlay. This Chapter presents the details of the implementation of the QbE service in the Content Addressable Network (CAN) in a simulated environment.

Routing in CAN Overlay

The general mechanics of structured P2P overlays were covered earlier in this dissertation. However, as the implementation of QbE in the CAN overlay utilizes CAN's routing algorithm, this Chapter describes the principles in detail. CAN implements the concept of the Distributed Hash Table (DHT).
It is a system in which a hash table, i.e., a set of (key, value) pairs, is distributed among the nodes of a network. A node of the network, knowing the key, is able to retrieve the value at any given moment. In the case of CAN the key is the hash value of an item stored in the network, obtained with the use of a hashing algorithm. The value is the underlying network address of the item itself. The addressing space of the CAN network is multidimensional, and its dimensionality is equal to the length of the vector obtained from the hashing algorithm.

When a new node enters the network it is assigned a zone of responsibility. This zone is a fragment (an n-dimensional cuboid, where n is the number of CAN addressing space dimensions) of the addressing space of the network. Usually, an algorithm searching for the largest existing zone is implemented; the largest existing zone is then split between the node that was responsible for it and the new one. Since the new node is assigned a zone of responsibility, it is now obliged to store all (key, value) pairs whose keys belong to that zone.

When a new item is added to the CAN network, its key is calculated using the hashing algorithm. The value is the underlying network address of the node which stores the item (usually its IP address). When the key is known, the pair (key, value) can be routed to the node which is responsible for it. As none of the nodes has full knowledge of the current network layout, a routing algorithm has to be used. A node calculates a value (called a distance, although it does not have to fulfil all of the requirements of a distance in the strict mathematical meaning) between the key and the responsibility zones of its neighbours. The (key, value) pair is propagated to the neighbour whose responsibility zone has the smallest distance to the pair. In this way the pair is propagated in the general direction of the target node and reaches it after a number of iterations. A similar action is taken when a query is issued; in this case the (key, value) pair is the hash of the query and the underlying network address of the query issuer, respectively.

As can be observed from this description, a DHT-based P2P overlay does not have the properties of the unstructured overlays. Significantly, it is not affected by the replication of items within the network. On one hand this is a benefit: if an item is stored within the network multiple times, it is still represented by one (key, value) pair in a single node. On the other hand, popularity also means that certain files are searched for more often than others. This causes some nodes (those responsible for the more popular content) to carry a higher load than nodes that store less popular content.

In CAN, routing is performed in a much more efficient way. The amount of messages and data exchanged in the network when storing and searching for files is much lower than in an unstructured overlay, as there is no flooding of the network. Each store and search message is routed to exactly one receiver at each hop. This comes at the cost of the complexity of the algorithm (the problem of routing in n-dimensional space).
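A minimal sketch of this greedy, distance-based forwarding step is shown below. The class and method names are hypothetical and the distance function is left abstract; the actual simulator code differs.

```java
import java.util.List;

/** Illustrative sketch of one greedy routing step in a CAN-style overlay:
 *  forward a (key, value) pair towards the neighbour whose responsibility
 *  zone lies closest to the key. All names are hypothetical. */
class CanNode {
    Zone responsibilityZone;
    List<CanNode> neighbours;

    void route(double[] key, Object value) {
        if (responsibilityZone.contains(key)) {
            store(key, value);                  // this node is responsible for the key
            return;
        }
        CanNode next = null;
        double best = Double.MAX_VALUE;
        for (CanNode n : neighbours) {          // pick the neighbour closest to the key
            double d = n.responsibilityZone.distanceTo(key);
            if (d < best) {
                best = d;
                next = n;
            }
        }
        if (next == null) {
            return;                             // no neighbours known (sketch only)
        }
        next.route(key, value);                 // one receiver per hop, no flooding
    }

    void store(double[] key, Object value) { /* keep the (key, value) pair locally */ }
}

interface Zone {
    boolean contains(double[] key);
    double distanceTo(double[] key);            // "distance" in the loose, CAN sense
}
```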
Implementation of the QbE Service in the CAN Overlay

As can be deduced from the description of the principles of CAN operation, it is very well suited for the QbE service. It can easily be adapted in such a way that the dimensionality of the network addressing space matches the dimensionality of the QbE descriptor vectors. The hashing algorithm implemented for the QbE-enabled CAN overlay was the MPEG-7 Colour Structure (CS) calculation algorithm. The implementation of the CS algorithm, used for both the unstructured and structured implementations of the QbE service, outputs a 256-value vector. This vector is used as a key in the CAN overlay, resulting in a 256-dimensional CAN addressing space. On one hand the CAN addressing space can have any number of dimensions; however, the higher the dimensionality, the more complex and computationally expensive the navigation within the addressing space becomes. On the other hand, the MPEG-7 descriptors may vary in length depending on the parameters of the descriptor calculation algorithm, and the longer the MPEG-7 vector, the more accurate the representation of the image. This causes a trade-off between the accuracy of the MPEG-7 image representation and the complexity of routing in the network. While this is a promising and interesting research direction, it was decided to leave it out of the scope of the presented research.

When a node stores an image in the overlay, the (key, value) pair are, respectively, the image's MPEG-7 CS descriptor value accompanied by the thumbnail, and the underlying network address of the storing node. This pair is then forwarded to the node that is responsible for the given part of the addressing space using the CAN routing algorithm. The distance used in the calculation of the path is the same distance calculation algorithm that is used for QbE with the CS descriptor. When a node performs a QbE search, an MPEG-7 CS descriptor value is calculated and used as the key of the query. For the implementation in the simulation environment, real CS descriptor values for the previously described database of 100,000 natural photographs were used. The values were precalculated in order to speed up the simulation.

A side effect of using the CS descriptor as an addressing space is that a node responsible for a part of the addressing space holds (key, value) pairs for images whose descriptors are close to each other (in the CS descriptor metric) and which are thus visually similar. This is depicted in Figure 3.8, which presents a set of images whose (key, value) pairs are held by a single node. Visual and contextual similarity is noticeable.
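A minimal sketch of the store and query operations described above is given below. The descriptor extraction and thumbnail generation are left as placeholders, the routing interface is assumed to behave like the greedy step sketched earlier, and all names are hypothetical.

```java
/** Illustrative sketch of publishing an image and issuing a QbE query in the
 *  CS-descriptor-addressed CAN overlay. All names are hypothetical. */
interface CanRouter {
    // Routes a (key, value) pair to the node responsible for the key,
    // e.g. with the greedy step sketched in the previous listing.
    void route(double[] key, Object value);
}

class QbeCanClient {
    private final CanRouter router;
    private final String localAddress;   // underlying network address of this peer

    QbeCanClient(CanRouter router, String localAddress) {
        this.router = router;
        this.localAddress = localAddress;
    }

    /** Store: the 256-element CS descriptor is the key; the thumbnail plus the
     *  storing node's address form the value routed to the responsible node. */
    void publish(byte[] image) {
        double[] key = computeColourStructureDescriptor(image);
        router.route(key, new StoredImage(makeThumbnail(image), localAddress));
    }

    /** Query by Example: the example image's CS descriptor becomes the query key. */
    void queryByExample(byte[] exampleImage) {
        double[] key = computeColourStructureDescriptor(exampleImage);
        router.route(key, new QueryOrigin(localAddress));
    }

    // Placeholders for the real MPEG-7 CS extraction and thumbnailing code.
    private double[] computeColourStructureDescriptor(byte[] image) { return new double[256]; }
    private byte[] makeThumbnail(byte[] image) { return new byte[0]; }
}

class StoredImage {
    final byte[] thumbnail;
    final String storingNodeAddress;
    StoredImage(byte[] thumbnail, String storingNodeAddress) {
        this.thumbnail = thumbnail;
        this.storingNodeAddress = storingNodeAddress;
    }
}

class QueryOrigin {
    final String issuerAddress;
    QueryOrigin(String issuerAddress) { this.issuerAddress = issuerAddress; }
}
```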
Figure 3.8: Samples of images stored by a single CAN node.

During the implementation it was observed that the descriptor values for the test set of 100,000 photographs do not fill the address space with equal density at every point of the space. As a result, when responsibility zones were divided according to their size, the distribution of images among nodes was far from equal. In order to distribute the images more evenly, when a new node enters the network, instead of finding the largest zone and splitting it in half, the zone is divided unevenly so as to achieve a similar number of images in each of the new zones.

Simulation Setup

Similarly to the previous Chapter, the QbE service in the CAN overlay was simulated in the PeerfactSIM.KOM simulator. A version of the CAN overlay provided by Robert Makyla as a part of his master thesis [49] was adapted in order to use the MPEG-7 Colour Structure descriptor as a key and its distance calculation algorithm as the CAN distance. The same network layer stack and network parameters as in the case of the simulation of QbE in Gnutella v. 0.4 were used.

During the simulation implementation and testing a serious problem was encountered. Due to the complexity of the task of calculating zones, neighbours and routes in the 256-dimensional space, the simulation implementation of the CAN overlay was not as scalable in the simulator as that of the Gnutella overlay. As a result, it was impossible to simulate a network larger than 500 nodes with the available equipment. Using one of the most powerful available PCs, the simulation time for this number of nodes was several days. As the simulation environment does not support cluster computation, transferring the simulations to a powerful computation cluster was considered to be too complicated for the scope of the presented work.
It is also worth mentioning that, in a qualitative sense, the observed results were consistent for networks consisting of 50 to 500 nodes. This problem would not exist for an implementation in a real network, as all the calculations in a P2P overlay are distributed. In order to allow for a comparison of the performance of QbE in the Gnutella and CAN overlays, the experiment for the Gnutella overlay was repeated for the same size of the overlay as in CAN. Both networks are compared in Chapter 3.5.

Twenty (20) minutes of network operation were simulated. During the first ten (10) simulated minutes the nodes joined the network and files were stored within the network. This time was also used for a warm-up of the simulator and for the network to reach a stable state. During the next ten (10) simulated minutes 33 queries were issued. This value was set according to real network measurements based on [9]. No churn was introduced to the network, as it would unnecessarily complicate the analysis of the performance of the QbE application. Due to the very long execution time of a single simulation, the confidence intervals were not assessed. A simulation for 100 nodes with different random generator seeds shows very narrow confidence intervals for all measured parameters. It can be assumed that the simulation is characterised by the same level of stability for a larger number of nodes.

Results

The most prominent property of the DHT is that the accuracy of search is maximal for each query. Note that when a query is issued, it is routed to the node which possesses the items most similar to the query. No random factor is involved. In case a node is responsible for fewer images than required as a result of the query (10 in the case of the implemented QbE mechanism), it may forward the query to its closest neighbours. In the MPEG-7 descriptor addressing space, adjacent nodes are responsible for visually similar images. Even in the case of churn (disconnection of a node during network operation) CAN utilises mechanisms that ensure that the whole addressing space is covered by the responsibility zones of the nodes: the leaving node's zone is divided and served by its neighbours. Due to this property of CAN it can be assumed that always $d_s = 10$.

The sophisticated routing mechanism causes the load of the network to be very low, both in total and calculated for a single node over time. The effect of traffic burstiness, which was depicted for the Gnutella overlay in Figure 3.7, still exists, but the momentary upload bandwidth requirement is much lower. The peak amount of sent data is never higher than that required to send the query reply with thumbnails. As mentioned in Chapter 3.1, the average size of a thumbnail is 4686±1713 [B]. For CAN the bandwidth required for a reply corresponds to that required for sending ten (10) thumbnails, that is 46860±17130 [B], with a small additional application-layer signalling overhead.
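A minimal sketch of how the responsible node could assemble such a reply is given below: it returns its ten stored images closest to the query descriptor and, when it holds fewer than ten, adds candidates from its neighbours, whose zones cover visually similar images. The class names and the descriptor distance are hypothetical placeholders, not the simulator's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch of answering a QbE query at the responsible CAN node. */
class QueryAnswering {
    static final int RESULTS_REQUIRED = 10;   // number of results returned per query

    /** Returns up to ten stored entries closest to the query descriptor; if this
     *  node stores fewer than ten, candidates from neighbouring zones are added. */
    static List<StoredEntry> answer(double[] queryDescriptor,
                                    List<StoredEntry> localEntries,
                                    List<List<StoredEntry>> neighbourEntries) {
        List<StoredEntry> candidates = new ArrayList<>(localEntries);
        if (candidates.size() < RESULTS_REQUIRED) {
            for (List<StoredEntry> fromNeighbour : neighbourEntries) {
                candidates.addAll(fromNeighbour);   // adjacent zones hold similar images
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (StoredEntry e) -> distance(queryDescriptor, e.descriptor)));
        return new ArrayList<>(candidates.subList(0, Math.min(RESULTS_REQUIRED, candidates.size())));
    }

    /** Placeholder for the MPEG-7 CS descriptor distance used throughout this work. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }
}

class StoredEntry {
    double[] descriptor;
    byte[] thumbnail;
    String storingNodeAddress;
}
```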
Table 3.2: CAN simulation results

  Parameter                                   Value
  Average accuracy $\bar{d}_{s_f}$            10
  Total amount of sent data                   3.1 [MB]
  Average required upload bandwidth [KBps]    …

Table 3.2 summarizes the performance of the simulated P2P CAN overlay with the QbE service. While maintaining the same accuracy as a centralized QbE solution, the generated traffic is kept low.

Conclusions

This section presented the results of the simulation of the QbE service in the CAN overlay, which represents the structured P2P overlay architecture. Although simulation scalability problems were encountered, the QbE service was successfully introduced. The CAN addressing and routing principles are very well suited for the implementation of hash-based search algorithms such as QbE based on the MPEG-7 method. It was observed that the search accuracy of the QbE service in the CAN network equals the accuracy of the centralized solution while maintaining a low network overhead. This comes at the cost of a complex routing algorithm and high computational costs. The traffic generated by the introduction of QbE to the network is low. While the network traffic has the property of burstiness, its values are low and such traffic can be served by most current network interfaces. It was also noted that there is a trade-off between the accuracy of the MPEG-7 representation and the complexity of routing within the network.

3.5 Comparison of QbE in Structured and Unstructured P2P Overlays

The results presented in Chapters 3.3 and 3.4 cannot be directly compared. The experiment for Gnutella v. 0.4 was performed with the use of 2,500 nodes, while for the CAN overlay only 500 nodes were used due to the simulator scalability limitations. In order to compare the results for both overlays, simulations with an equal number of nodes (100) were performed for both overlays. The values of all parameters for the simulations of both overlays were chosen to follow the measurements of a real P2P system, as described in [9] and [15].
The results are presented in Table 3.3.

Table 3.3: Comparison of QbE performance in structured and unstructured P2P overlays for the same number of nodes

  Parameter                                          Value for Gnutella   Value for CAN
  Average accuracy $\bar{d}_{s_f}$                   …                    …
  Total amount of sent data                          25.88 [MB]           … [MB]
  Average required upload bandwidth                  0.431 [KBps]         … [KBps]
  Maximum momentary upload bandwidth requirement     … [KBps]             … [KBps]

The accuracy achieved in the Gnutella v. 0.4 overlay is always lower than the maximum (10), while, thanks to the Distributed Hash Table based routing algorithm of CAN, its accuracy is always maximal and equal to that achievable in the centralized solution. The Gnutella routing mechanism is based on flooding the network with queries, and each node receiving a query sends its best matching thumbnails to the querying node. In a CAN overlay only one node receives the query and thus only one replies. This causes huge differences in the amount of traffic required by each of the services: while QbE in Gnutella requires 0.431 [KBps] of bandwidth on average, the CAN implementation requires only … [KBps]. Although the traffic of both overlays reveals the property of burstiness, the amount of traffic that has to be served at a given moment of time is significantly lower for the CAN overlay than for Gnutella.

3.6 Research Conclusions

This Chapter presented the results of three experiments in which the Query by Example service was implemented in different network architectures. For research purposes a database of 100,000 natural photographs was downloaded from the Flickr image hosting service. In the first experiment 12 different QbE methods were implemented in a centralized QbE service. An experiment involving end users was conducted, and 2,500 individual answers were gathered. Analysis of the results allowed the conclusion that the most suitable MPEG-7 descriptor, still yielding satisfying results, is Colour Structure (CS). This MPEG-7 descriptor was used in the following experiments.

In the second experiment the CS MPEG-7 descriptor was implemented in an unstructured P2P overlay, namely Gnutella v. 0.4. The implementation was benchmarked in terms of accuracy and bandwidth required by the service.
It was observed that while a successful implementation of the QbE service in this overlay architecture is possible, the accuracy will always be lower than in the centralized solution, at the cost of significant bandwidth requirements. It was also observed that the key parameter that binds the accuracy and the bandwidth is the TTL, which has to be set according to the network size.

In the third experiment the CS MPEG-7 descriptor was implemented in a structured P2P overlay represented by the CAN network. The same benchmarks were used as in the case of Gnutella v. 0.4. It was observed that the concept of the Distributed Hash Table, which is implemented in the CAN overlay, is well suited for using MPEG-7 descriptors. Simulations have shown that the accuracy of such a solution is identical with the centralized one while maintaining minimal computation overhead.

As, due to the simulator limitations, it was impossible to simulate the CAN overlay for 2,500 nodes, another experiment was conducted. The number of nodes for the Gnutella overlay was limited to match the network size for the CAN overlay and the two solutions were compared. While the Gnutella overlay achieves high accuracy, it is outclassed by the CAN overlay, which is as accurate as the centralized solution thanks to its routing algorithm. Also, the bandwidth required for the service is significantly lower for the CAN overlay.
Chapter 4

Summary

The main result of the presented work is two implementations of the Query by Example service in two different Peer-to-Peer network architectures. The QbE service was simulated in an unstructured P2P overlay based on the Gnutella v. 0.4 architecture and in a structured P2P overlay based on the Content Addressable Network (CAN) architecture. The performance of both solutions was compared in terms of search accuracy, total volume of sent data and upload bandwidth required for the service.

The thesis of the dissertation has been stated as follows: It is possible to use a Query by Example mechanism based on low-level description of the media for search of images in the structured and unstructured P2P overlays at accuracy comparable to the centralized solutions at a cost of higher bandwidth utilization or complex routing algorithms.

Three major research problems were identified.

1. It should be investigated which metadata frameworks and QbE techniques are available and which are suitable to be used in P2P overlays. Through an analysis of the available publications and after conducting an experiment in which 12 QbE methods were compared, the Colour Structure descriptor of the MPEG-7 standard was chosen as the QbE technique to be used in the P2P overlays.

2. It should be investigated which P2P overlays can be foreseen to be the best candidates for the introduction of such a service. The QbE service was successfully introduced in both unstructured and structured overlays.

3. It should be decided how to compare the performance of the QbE search method in both centralised and P2P environments. The implementation of the QbE service in P2P overlays was assessed in terms of search accuracy, total volume of sent data and upload bandwidth required for the service.
In the scope of the presented dissertation it has been proven that it is possible to introduce the QbE service based on low-level description of the media (MPEG-7 descriptors) into P2P overlays. The proposed solutions, especially for the CAN architecture, were as accurate as the centralized solution at the cost of a minimal bandwidth requirement. All the benefits which make P2P attractive over centralized solutions were maintained. In particular, the distributed and social character of the service makes it attractive to end users, allowing the introduction of such a service without substantial hardware investments. The QbE search method allows for a much more convenient, natural and advanced search for images when compared to text-based search.

The original results presented in the dissertation and published are:

- Twelve (12) QbE methods were compared in terms of user satisfaction in a complex subjective evaluation. It was proven that the most suitable QbE method to be used in the subsequent experiments is the Colour Structure descriptor, which is a part of the MPEG-7 standard. The results of the experiment were published in [20].

- The QbE service was implemented and benchmarked in a simulation environment in the Gnutella v. 0.4 overlay. It was concluded that the QbE service for images may be implemented in an unstructured P2P overlay at the cost of a significant requirement on the bandwidth consumed by the service. It was also observed that the performance of the network strongly depends on the value of the Time To Live parameter. The results of the experiment were published in [19].

- The QbE service was implemented and benchmarked in a simulation environment in the CAN overlay. It was shown that the CAN overlay is very well suited for such a service. The benefits of the Distributed Hash Table algorithm can be utilised in order to achieve a search accuracy equal to the search accuracy of the centralised solution. This benefit comes at the cost of a complex routing algorithm.

- Both simulation implementations were compared. The comparison allowed the conclusion that while it is possible to implement the QbE service in both P2P environments, the implementation in the CAN overlay is more effective in terms of the achieved search accuracy and the consumption of resources.

It has to be stressed that the presented simulation implementation of the QbE service has practical applicability.
The simulation implementation was written in the Java programming language. Thanks to the modular architecture of the simulator, the developed implementation code can be used directly in the creation of a QbE P2P service. Such an application, besides being an interesting commercial endeavour, would allow for a more accurate assessment of the service, especially in terms of search time measurements. It is considered by the author to be one of the possible future research directions. It is also planned to focus further work on the QbE implementation in the CAN architecture, as the most promising of the tested overlays, and to test the influence of the size of the MPEG-7 descriptor (and the resulting dimensionality of the CAN overlay) on the search accuracy and the network performance.
Bibliography

[1] NIST/SEMATECH e-Handbook of Statistical Methods.
[2] Cisco Visual Networking Index: Forecast and Methodology.
[3] Aczel, A. and Sounderpandian, J. Complete Business Statistics. The McGraw-Hill/Irwin Series. McGraw-Hill College.
[4] Agarwal, S. and Roth, D. Learning a sparse representation for object detection. In Proceedings of the European Conference on Computer Vision, vol. 4, Springer-Verlag, Copenhagen, Denmark.
[5] Agosti, M., Buccio, E. D., Nunzio, G. M. D., Ferro, N., Melucci, M., Miotto, R., and Orio, N. Distributed Information Retrieval and Automatic Identification of Music Works in SAPIR. In M. Ceci, D. Malerba, and L. Tanca, editors, SEBD.
[6] Arasu, A., Ganti, V., and Kaushik, R. Efficient exact set-similarity joins. In Proceedings of VLDB.
[7] Baumgart, I., Heep, B., and Krause, S. OverSim: A scalable and flexible overlay framework for simulation and real network applications. In 9th International Conference on Peer-to-Peer Computing (IEEE P2P 09).
[8] Bulterman, D. C. A. Is It Time for a Moratorium on Metadata? IEEE MultiMedia, 11(4):10-17.
[9] Chu, J., Labonte, K., and Levine, B. N. Availability and Popularity Measurements of Peer-to-Peer File Systems. Technical Report 04-36, Department of Computer Science, University of Massachusetts.
[10] Clip2. The Gnutella Protocol Specification v0.4.
[11] FIPS. Secure Hash Standard. Federal Information Processing Standards Publication.
[12] Garcia, P., Pairot, C., Mondejar, R., Pujol, J., Tejedor, H., and Rallo, R. PlanetSim: A New Overlay Network Simulation Framework.
[13] Garner, R. First Page Or Bust: 95% of Non-Branded Natural Clicks Come From Page One. Search Insider.
[14] Geusebroek, J. M., Burghouts, G. J., and Smeulders, A. W. M. The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1). mark/pub/2005/geusebroekijcv05a.pdf.
[15] Gish, A. S., Shavitt, Y., and Tankel, T. Geographical statistics and characteristics of p2p query strings. In The 6th International Workshop on Peer-to-Peer Systems, IPTPS.
[16] Grega, M. Trust Management in Ad-Hoc Networks. Master's thesis, AGH University of Science and Technology, Krakow, Poland.
[17] Grega, M. Implementation and Application of MPEG-7 Descriptors in Peer-to-Peer Networks for Search Quality Improvement - Introduction to Research. In CONTENT PhD Student Workshop. Madrid, Spain.
[18] Grega, M. Advanced Multimedia Search in P2P Overlays. In INFOCOM Workshops, IEEE.
[19] Grega, M. Wyszukiwanie przez podanie przykładu w sieci nakładkowej protokołu Gnutella [Query by Example search in the Gnutella protocol overlay network]. In Krajowa Konferencja Radiokomunikacji, Radiofonii i Telewizji.
[20] Grega, M., Fraczek, R., Liebau, N., Luedtke, A., Janowski, L., and Papir, Z. Ground-Truth-Less Comparison of Selected Content-Based Image Retrieval Measures. User Centric Media 2009 conference.
[21] Grega, M., Janowski, L., Leszczuk, M., Romaniak, P., and Papir, Z. Quality of experience evaluation for multimedia services. In Krajowa Konferencja Radiokomunikacji, Radiofonii i Telewizji.
[22] Grega, M., Kluska, B., Leszczuk, M., and Papir, Z. Content-based Search for Peer-to-Peer Overlays. In CONTENT PhD Student Workshop, MEDHOCNET. Corfu, Greece, 2007.
[23] Grega, M., Leszczuk, M., Romaniak, P., et al. Future and Challenges in European Research - User Centric Media. User Centric Media Cluster Expert Publication for the European Commission.
[24] Grega, M., Leszczuk, M., Romaniak, P., et al. IBIS: Indexing Based Extension of Peer-to-Peer Networks for Search Quality Improvement. FP7 Small or medium-scale focused research project (STREP) proposal.
[25] Grega, M., Leszczuk, M., Yelmo, I., Cuevas-Rumin, R., Fiorese, A., and Tang, S. Benchmarking of Media Search based on Peer-to-Peer Overlay Networks. CHORUS P2P 1pp4mm workshop, INFOSCALE.
[26] Grega, M., Szott, S., and Pacyna, P. Collaborative networking with trust and misbehavior - a file sharing case. In OPCOM. Berlin, Germany.
[27] Grubinger, M., Clough, P., Muller, H., and Deselears, T. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems. In Proceedings of the International Workshop OntoImage 2006, Language Resources for Content-Based Image Retrieval.
[28] Grubinger, M., Leung, C., and Clough, P. The IAPR Benchmark for Assessing Retrieval Performance in Cross Language Evaluation Tasks. In Proceedings of the MUSCLE ImageCLEF Workshop on Image and Video Retrieval Evaluation, Vienna, Austria.
[29] Hermes, T., Miene, A., and Herzog, O. Graphical Search for Images by PictureFinder. Multimedia Tools and Applications, Special Issue on Multimedia Retrieval Algorithmics.
[30] Hillmann, D. Using Dublin Core. DCMI Recommended Resource.
[31] International Telecommunication Union. Recommendation ITU-T P.800, Methods for subjective determination of transmission quality, Geneva, Switzerland.
[32] International Telecommunication Union. Recommendation G.107, The E-model, a computational model for use in transmission planning, Geneva, Switzerland.
[33] International Telecommunication Union. ITU Telecommunication Standardization Sector Temporary Document XX-E WP 2/12, Information about a new method for deriving the transmission rating factor r from MOS in closed form, Geneva, Switzerland.
[34] ISO/IEC. Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s. ISO/IEC 11172.
[35] ISO/IEC. Information technology - Generic coding of moving pictures and associated audio information. ISO/IEC 13818.
[36] ISO/IEC. Information technology - Coding of audio-visual objects. ISO/IEC 14496.
[37] ISO/IEC. Information technology - Multimedia content description interface. ISO/IEC 15938.
[38] ISO/IEC. Information technology - Multimedia framework. ISO/IEC 21000.
[39] Jelasity, M., Montresor, A., Jesi, G. P., and Voulgaris, S. The PeerSim Simulator.
[40] Joseph, S. and Hoshiai, T. Decentralized Meta-Data Strategies: Effective Peer-to-Peer Search. IEICE Transactions on Communications, E86-B(6).
[41] Kovacevic, A., Kaune, S., Liebau, N., Steinmetz, R., and Mukherjee, P. Benchmarking Platform for Peer-to-Peer Systems (Benchmarking Plattform für Peer-to-Peer Systeme). it - Information Technology, 49(5).
[42] Leibe, B., Leonardis, A., and Schiele, B. Combined object categorization and segmentation with an implicit shape model. In Proceedings of the Workshop on Statistical Learning in Computer Vision. Prague, Czech Republic.
[43] Leung, C. H. C. and Ip, H. H.-S. Benchmarking for Content-Based Visual Information Search. In Proceedings of the 4th International Conference on Advances in Visual Information Systems, Springer-Verlag.
[44] Levenshtein, V. I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10.
[45] Li, J., Stribling, J., Morris, R., Kaashoek, M. F., and Gil, T. M. A performance vs. cost framework for evaluating DHT design tradeoffs under churn. In Proc. of the 24th Infocom.
[46] Li, J. and Zhang, G. The state of the art in content-based image retrieval in P2P networks. In Proceedings of the Second International Conference on Internet Multimedia Computing and Service, ICIMCS '10, ACM, New York, NY, USA.
[47] Limewire Foundation. The Gnutella 0.6 protocol specification.
[48] Mademlis, A., Daras, P., Tzovaras, D., and Strintzis, M. G. 3D volume watermarking using 3D Krawtchouk moments. In VISAPP (1).
[49] Makyla, R. Indexing and Content-Based Addressing of Objects in P2P Networks. Master of Science Thesis, supervisor: Mikołaj Leszczuk.
[50] Manjunath, B. S., Salembier, P., and Sikora, T. Introduction to MPEG-7: Multimedia Content Description Interface. John Wiley and Sons Ltd.
[51] Montgomery, D. and Runger, G. Applied Statistics and Probability for Engineers. John Wiley & Sons.
[52] Müller, W., Boykin, P. O., Sarshar, N., and Roychowdhury, V. P. Comparison of Image Similarity Queries in P2P Systems. In Peer-to-Peer Computing.
[53] Nack, F. and Lindsay, A. T. Everything you wanted to know about MPEG-7, part 1. IEEE Multimedia, 6(3):65-77.
[54] Nack, F. and Lindsay, A. T. Everything you wanted to know about MPEG-7, part 2. IEEE Multimedia, 6(4):64-73.
[55] Napster. Napster Messages.
[56] National Information Standards Organization. ANSI/NISO Z: The Dublin Core Metadata Element Set.
[57] Novak, D. and Zezula, P. M-Chord: a scalable distributed similarity search structure. In Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06. ACM, New York, NY, USA.
[58] Over, P., Leung, C., Ip, H., and Grubinger, M. Multimedia Retrieval Benchmarks. IEEE MultiMedia, 11(2):80-84, 2004.
[59] Parker, A. Addressing the cost and performance challenges of digital media content delivery. In P2P Media Summit. Santa Monica, CA.
[60] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Schenker, S. A scalable content-addressable network. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM, New York, NY, USA.
[61] Sandvine Incorporated. Analysis of Traffic Demographics in North-American Broadband Networks. Whitepaper.
[62] Shudo, K., Tanaka, Y., and Sekiguchi, S. Overlay Weaver: An overlay construction toolkit. Computer Communications, 31(2).
[63] Smeaton, A. F., Over, P., and Kraaij, W. Evaluation campaigns and TRECVid. In MIR '06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, ACM Press, New York, NY, USA.
[64] Smith, J. R. Image Retrieval Evaluation. In CBAIVL '98: Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, 112. IEEE Computer Society, Washington, DC, USA.
[65] Steinmetz, R. and Wehrle, K. Peer-to-Peer Systems and Applications. Springer-Verlag Berlin Heidelberg.
[66] Stoica, I., Morris, R., Karger, D., Kaashoek, F., and Balakrishnan, H. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. In Proceedings of the 2001 ACM Sigcomm Conference, ACM Press.
[67] Stutzbach, D. and Rejaie, R. Understanding churn in peer-to-peer networks. In IMC '06: Proceedings of the 6th ACM SIGCOMM on Internet Measurement, ACM Press, New York, NY, USA.
[68] Task Force on Metadata. Summary Report. Tech. rep., Committee on Cataloging: Description & Access.
[69] Webster. Merriam-Webster Online Dictionary. Merriam-Webster, Incorporated.
[70] Weibel, S., Kunze, J., Lagoze, C., and Wolf, M. Dublin Core Metadata for Resource Discovery. RFC 2413, 1998.
[71] Yang, Y., Dunlap, R., Rexroad, M., and Cooper, B. F. Performance of Full Text Search in Structured and Unstructured Peer-to-Peer Systems. In IEEE INFOCOM. IEEE Press, 2006.