Chapter 2
LITERATURE REVIEW

In recent years, the evolution of a new wave of innovative network architectures labeled Peer-to-Peer (P2P) has been witnessed [49]. P2P networks are networks in which all peers cooperate with each other to perform a critical function in a decentralized manner. All peers are both users and providers of resources and can access each other directly without intermediary agents. Compared with a centralized system, a P2P system provides an easy way to aggregate large amounts of resources residing on the edge of the Internet or in ad-hoc networks with a low cost of system maintenance. P2P systems therefore attract increasing attention from researchers. Such architectures and systems are characterized by direct access between peer computers, rather than through a centralized server. File sharing is the dominant P2P application on the Internet, allowing users to easily contribute, search, and obtain content.

The rest of the chapter is organized as follows. Peer-to-Peer (P2P) networks are explored in Section 2.1. Types of P2P networks are presented in Section 2.2. File sharing systems are given in Section 2.3. Section 2.4 discusses overlay networks. Section 2.5 presents overlay P2P networks. Section 2.6 discusses the limitations of P2P systems. Section 2.7 presents a review of a few algorithms. Various middleware approaches are given in Section 2.8. Some middleware systems are described in Section 2.9. Section 2.10 explores mobile agents (MAs) and their application domain. An analysis of the reviewed work is presented in Section 2.11. Finally, the chapter is summarized in Section 2.12.

2.1 Peer-to-Peer (P2P) Networks

A P2P system is defined as any distributed network architecture composed of participants that make a portion of their resources, such as processing power, disk storage, or network bandwidth, directly available to other network participants, without the need for central coordination instances such as servers or stable hosts (see Figure 2.1).

In other words, P2P is a specific form of relational dynamics, based on the assumed equipotency of its participants, organized through the free cooperation of equals to perform a common task and create a common good, with forms of decision-making and autonomy that are widely distributed throughout the network [15].

Figure 2.1. The Basic Architecture of a P2P Network

More simply, a P2P network links the resources of all the nodes on a network and allows those resources to be shared in a manner that eliminates the need for a central host. In P2P systems, nodes or peers with equal roles and responsibilities, though often with varying capabilities, exchange information or share resources directly with each other. P2P systems can function without any central administration or coordination instance. A P2P network thus differs from conventional client/server or multi-tiered server networks: peers are both suppliers and consumers of resources, in contrast to the traditional client/server model where only servers supply and clients consume (see Figure 2.2).

Figure 2.2. The Basic Client/Server Architecture

2.2 Types of P2P Networks

P2P is a paradigm for sharing computing resources and services such as data files, cache storage, disk space, and processing cycles. In comparison with the conventional client/server model, P2P systems are characterized by symmetric roles among the peers, where every node in the network acts alike and the processing and communication are widely distributed among the peers. Unlike conventional centralized systems, P2P systems offer scalability [4] and fault tolerance [4, 8, 9], and they are a feasible approach to implementing global-scale systems such as the Grid [11, 14].

An important strength of P2P networks is that all clients provide resources, including bandwidth, storage space, and computing power. Thus, as nodes arrive and demand on the system increases, the total capacity of the system also increases. This is not true of a client/server architecture with a fixed set of servers, in which adding more clients could mean slower data transfer for all users. The distributed nature of P2P networks also increases robustness in case of failures by replicating data over multiple peers, and, in pure P2P systems, by enabling peers to find the data without relying on a centralized index server [28]. In the latter case, there is no single point of failure in the system.

A growing application of P2P technology is harnessing the dormant processing power of desktop PCs. Companies can use the processing capabilities of many smaller, less powerful computers to replace large and expensive supercomputers [8, 13].

Such systems can complete large computing tasks using the processing power of existing in-house computers or by accessing computers through the Internet.

2.2.1 Structured P2P Networks

Structured P2P networks employ a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has the desired file, even if the file is extremely rare (see Figure 2.3). Such a guarantee necessitates a more structured pattern of overlay links. By far the most common type of structured P2P network is the distributed hash table (DHT) [40], in which a variant of consistent hashing is used to assign ownership of each file to a particular peer, in a way analogous to a traditional hash table's assignment of each key to a particular array slot.

Figure 2.3. Distributed Hash Table (DHT)

DHTs are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
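To make the key-to-peer mapping concrete, the following minimal sketch (in Python, with hypothetical peer names) uses consistent hashing on an identifier ring, so that adding or removing a peer disturbs only the keys adjacent to it. This illustrates the principle only; it is not the protocol of any particular DHT.

    import hashlib
    from bisect import bisect_right

    def ring_hash(s: str) -> int:
        # Map a string to a point on a 160-bit identifier ring.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        def __init__(self, nodes=()):
            self.ring = sorted((ring_hash(n), n) for n in nodes)

        def add(self, node):
            self.ring.append((ring_hash(node), node))
            self.ring.sort()

        def lookup(self, key: str) -> str:
            # A key is owned by the first peer clockwise from its hash,
            # analogous to a hash table assigning a key to an array slot.
            idx = bisect_right(self.ring, (ring_hash(key), "")) % len(self.ring)
            return self.ring[idx][1]

    ring = ConsistentHashRing(["peerA", "peerB", "peerC"])
    print(ring.lookup("some-file.mp3"))  # peer responsible for this key
    ring.add("peerD")                    # only keys near peerD change owner

A real DHT such as Chord distributes this ring among the peers themselves and resolves a lookup in O(log N) routing hops; the sketch keeps the whole ring in one process only to show the mapping.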

DHTs form an infrastructure that can be used to build P2P networks. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Kad network, the Storm botnet, YaCy, and the Coral Content Distribution Network. DHT-based networks have also been widely utilized for efficient resource discovery in grid computing systems, as DHTs aid in resource management and the scheduling of applications. Resource discovery involves searching for the appropriate resource types that match the user's application requirements. Recent advances in decentralized resource discovery have been based on extending existing DHTs with the capability of multi-dimensional data organization and query routing.

2.2.2 Unstructured P2P Networks

An unstructured P2P network is formed when the overlay links are established arbitrarily. Such networks can be constructed easily, as a new peer that wants to join the network can copy the existing links of another node and then form its own links over time. In an unstructured P2P network, if a peer wants to find a desired piece of data in the network, the query has to be flooded through the network to reach as many peers as possible that share the data (see Figure 2.4). The main disadvantage of such networks is that queries may not always be resolved. Popular content is likely to be available at several peers, and any peer searching for it is likely to find it. But if a peer is looking for rare data shared by only a few other peers, the search is highly unlikely to be successful. Since there is no correlation between a peer and the content managed by it, there is no guarantee that flooding will find a peer that has the desired data. Flooding also causes a high amount of signaling traffic in the network, and hence such networks typically have very poor search efficiency. Many of the popular P2P networks are unstructured.

In pure P2P networks, peers act as equals, merging the roles of client and server. In such networks there is no central server managing the network, nor is there a central router. Some examples of pure P2P application-layer networks designed for file sharing are Gnutella and Freenet [41].
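The flooding search described above can be sketched as a TTL-bounded breadth-first traversal of the overlay graph. The sketch below is a simplified single-process simulation; the overlay and file tables are hypothetical stand-ins for state that real peers would hold locally.

    from collections import deque

    def flood_query(overlay, shared_files, start, wanted, ttl=4):
        """TTL-limited flooding: forward the query to every neighbour
        until the TTL runs out; collect all peers holding the file."""
        seen = {start}
        queue = deque([(start, ttl)])
        hits = []
        while queue:
            peer, t = queue.popleft()
            if wanted in shared_files.get(peer, set()):
                hits.append(peer)       # a query hit; the reply travels back
            if t == 0:
                continue                # TTL exhausted, stop forwarding
            for nbr in overlay.get(peer, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, t - 1))
        return hits

    overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
    shared_files = {"D": {"song.mp3"}}
    print(flood_query(overlay, shared_files, "A", "song.mp3"))  # ['D']

The TTL is exactly why rare items may not be found: if no holder lies within the TTL horizon, the query fails even though the data exists somewhere in the network.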

There also exist hybrid P2P systems, which divide their clients into two groups: client nodes and overlay nodes. Typically, each client is able to act according to the momentary needs of the network and can become part of the respective overlay network used to coordinate the P2P structure. This division between ordinary and more capable nodes is done in order to address the scaling problems of early pure P2P networks. An example of such a network is Gnutella (version 2.2).

Figure 2.4. A hybrid P2P-based system which uses the Centralized Directory Model for information retrieval

Another type of hybrid P2P network uses central server(s) or bootstrapping mechanisms on the one hand, and P2P data transfers on the other. These networks are in general called centralized networks because of their inability to work without their central server(s). An example of such a network is the eDonkey network (ed2k) [42].

2.3 File Sharing Systems

File sharing is the dominant P2P application on the Internet, allowing users to easily contribute, search, and obtain content; it was popularized by file sharing systems like Napster [26]. P2P file sharing networks have inspired new structures and philosophies in other areas of human interaction.

In such social contexts, P2P as a meme refers to the egalitarian social networking that is currently emerging throughout society, in general enabled by Internet technologies.

P2P file sharing architectures can be classified according to the extent to which they rely on one or more servers to facilitate the interaction between peers. P2P systems are categorized as centralized, decentralized structured, and decentralized unstructured, as shown in Figure 2.5.

Figure 2.5. Classification of P2P systems: centralized (e.g., Napster); decentralized structured (e.g., Chord, CAN); decentralized unstructured (e.g., Gnutella, Freenet)

Centralized: In these systems there is central control over the peers. A server holds the information regarding the peers, data files, and other resources. Any peer that wants to communicate with or use the resources of another peer has to send a request to the server. The server then searches for the location of the peer node or resource in its database/index. After getting this information, the peer communicates directly with the desired peer. This design is very similar to the client/server model; Napster, which was very popular for sharing music files, is the classic example. Security measures can be implemented at the central server: when a request is sent, the authorization and authentication of the peer can be checked. The central server also makes it easy to locate and search for an object or peer node, and these systems are easy to implement since the structure is similar to the client/server model, i.e., complexity is low. However, such systems are not scalable, owing to limitations of computational capability, bandwidth, and so on. They have poor fault tolerance, since object replication and load balancing are unavailable, and they are not reliable, being subject to single-point failure, malicious attack, and network congestion near the server.

These systems are also the least secure, and the overhead on system performance is high. Distributed databases may be used in such systems. In centralized P2P systems, resource discovery is done using the central server, which keeps all the information regarding resources, e.g., Napster [26]. Multiple servers [43] have also been proposed to enhance performance in centralized systems.

Decentralized Structured: Decentralized structured P2P networks (e.g., Chord [43], CAN [45, 46], Tapestry [44, 47], Pastry [44], and TRIAD [114]) use a logical structure to organize the peer nodes of the network, and a distributed-hash-table-like mechanism to look up files. The logical structure makes these networks efficient at locating objects quickly, since the search space is reduced exponentially, and it keeps message traffic low; it also makes it easy to locate and search for an object or peer node. These systems are scalable, owing to dynamic routing protocols, and they have good performance, which is little affected as the system scales. They are reliable in nature, supporting failed-node detection and replication of objects. However, because they impose tight control over the overlay topology, they are not robust to peer dynamics: performance is greatly affected if the churn rate is high, which makes them unsuitable for ad-hoc peer nodes. Searching is also comparatively more complex than in centralized systems. A location-dependent application is proposed in [48].

Decentralized Unstructured: These systems are the closest to the definition of pure P2P systems [49, 54]. There is no central control; every peer may act as a server (which provides a service) as well as a client (which consumes a service). A peer that wants to communicate with another peer has to broadcast (flood) its request to all connected peers in order to search for the peer node or data object, since there is no central index.

Only the peers holding the data respond, sending the data object back along the reverse path to the requesting peer node. The flooding or broadcasting of requests creates unnecessary traffic on the network, which is the main drawback of these systems, and much work is devoted to reducing this traffic. Various techniques have been proposed, e.g., forwarding-based, cache-based, and overlay-optimization approaches [28]. These systems do not impose tight control over the overlay topology, so they support peer dynamics, and performance is not much affected by a high churn rate. They are distributed in nature, so there is no single point of failure. Scalability, however, is poor because of the traffic overhead of discovering objects and peer nodes: as the system grows beyond a limit, its performance keeps decreasing, and searching for a resource in an unstructured system is very costly, since flooding is used for search.

To enhance search, random walks are proposed in [50] and location-aware topology matching in [51]. For fault tolerance, self-maintenance and self-repair techniques are used [52]. For information security, these systems use a PKI [53] for information sharing. Alliatrust, a reputation management scheme [55], deals with threats such as free riders and polluted content. To cope with query loss and system overloading, a congestion-aware search protocol may be used [52], which includes congestion-aware forwarding (CAF), random early stop (RES), and emergency signaling (ES). Location-dependent queries using Voronoi diagrams are employed in [48].
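As a contrast to flooding, here is a minimal sketch of the random-walk search mentioned above in connection with [50]: instead of forwarding a query to every neighbour, each step forwards it to one randomly chosen neighbour, trading lower traffic for longer search time. All names are illustrative.

    import random

    def random_walk_query(overlay, shared_files, start, wanted, max_steps=32):
        """Forward the query along a single random path of bounded length."""
        peer, path = start, [start]
        for _ in range(max_steps):
            if wanted in shared_files.get(peer, set()):
                return peer, path            # query hit
            nbrs = overlay.get(peer)
            if not nbrs:
                break                        # dead end
            peer = random.choice(nbrs)
            path.append(peer)
        if wanted in shared_files.get(peer, set()):
            return peer, path
        return None, path                    # not found within the budget

In practice several walkers are launched in parallel, which lowers latency while keeping traffic proportional to the number of walkers rather than to the size of the whole network.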

2.4 Overlay Networks

An overlay network is a computer network that is built on top of another network (see Figure 2.6). Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network. For example, distributed systems such as cloud computing platforms, P2P networks, and client/server applications are overlay networks because their nodes run on top of the Internet. The Internet itself was built as an overlay upon the telephone network. Nowadays, further overlay networks can be constructed on top of the Internet to permit routing of messages to destinations not specified by an IP address. For example, distributed hash tables can be used to route messages to a node having a specific logical address whose IP address is not known in advance.

Figure 2.6. A Typical Overlay Network

Overlay networks have also been proposed as a way to improve Internet routing, for example through quality of service (QoS) guarantees for higher-quality streaming media. Previous proposals such as IntServ, DiffServ, and IP Multicast have not seen wide acceptance, largely because they require modification of all routers in the network. An overlay network, on the other hand, may be incrementally deployed on end-hosts running the overlay protocol software, without cooperation from ISPs. The overlay has no control over how packets are routed in the underlying network between two overlay nodes, but it controls the sequence of overlay nodes a message traverses before reaching its destination.
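That last point can be illustrated with a toy sketch in which every message carries its overlay source route explicitly. The node names and the in-memory "network" dictionary are hypothetical stand-ins for real hosts reachable over IP.

    class OverlayNode:
        def __init__(self, name, network):
            self.name = name
            self.network = network   # name -> OverlayNode; stands in for IP

        def send(self, message, route):
            """Relay a message along an explicit list of overlay hops.
            Each overlay hop may cross many physical links; the overlay
            fixes only the node sequence, the underlay picks the paths."""
            if not route:
                print(f"{self.name}: delivered {message!r}")
                return
            next_hop = self.network[route[0]]
            next_hop.send(message, route[1:])

    net = {}
    for name in ("A", "B", "C", "D"):
        net[name] = OverlayNode(name, net)

    net["A"].send("hello", ["C", "B", "D"])  # overlay path A -> C -> B -> D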

2.5 Overlay P2P Networks

A P2P overlay network logically connects peers on top of IP. Two main classes of such overlays dominate: structured and unstructured. The differences relate to the choice of neighbors in the overlay and the presence of an underlying naming structure. Overlay networks represent the main approach retained in this work for building large-scale distributed systems. An overlay network forms a logical structure connecting participating entities on top of the physical network, be it IP or a wireless network. Such an overlay might form a structured overlay network following a specific topology, or an unstructured network where participating entities are connected in a random or pseudo-random fashion. In between lie weakly structured P2P overlays, where nodes are linked according to a proximity measure, providing more flexibility than structured overlays and better performance than fully unstructured ones. Proximity-aware overlays connect participating entities to close neighbors according to a given proximity metric reflecting some degree of affinity (computation, interest, etc.) between peers. This research uses this approach to provide algorithmic foundations for large-scale dynamic systems.
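A minimal sketch of proximity-aware neighbor selection: each peer keeps the k candidate peers that are closest under some measured metric as its overlay neighbors. The RTT table below is a hypothetical example of such a metric; any affinity measure (latency, shared interest, etc.) could be substituted.

    def choose_neighbors(peer, candidates, metric, k=3):
        """Pick the k candidates closest to `peer` under the proximity
        metric; ties are broken arbitrarily."""
        ranked = sorted(candidates, key=lambda c: metric[(peer, c)])
        return ranked[:k]

    # Hypothetical round-trip times in milliseconds.
    rtt = {("A", "B"): 12.0, ("A", "C"): 85.0,
           ("A", "D"): 20.0, ("A", "E"): 240.0}
    print(choose_neighbors("A", ["B", "C", "D", "E"], rtt, k=2))  # ['B', 'D']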

With its main design principle of being completely decentralized and self-organized, the P2P concept paves the way for new types of applications, such as file-swapping applications and collaboration tools over the Internet, which have recently attracted tremendous user interest. Using software like KaZaA [56], Gnutella [53, 54], or the now-obsolete Napster [53, 57], users access files on other peer nodes and download these files to their own computers. These file-swapping communities are commonly used for sharing media files, often MP3 music files. The Internet climate began to shift back to P2P with the development, popularity, and attention given to Napster, and networks based on KaZaA, Gnutella, Audiogalaxy, and iMesh allowed users to continue sharing music files at a rate similar to Napster at its peak. Another application domain of P2P is the sharing and aggregation of large-scale, geographically distributed processing and storage capacities of idle computers around the globe to form a virtual supercomputer, as the SETI@Home project [47, 58] did.

P2P networking is an important enabling technology for the realization of self-managed and autonomous systems, where each node manages its own activities by itself, thus ensuring a consistent state of the system. The technology also allows for peripheral sharing, in which one peer can access scanners, printers, microphones, and other devices that are connected to another peer.

2.6 The Limitations of P2P Systems

The Internet started out as a fully symmetric, P2P network of cooperating users. As it has grown to accommodate the millions of people flocking online, technologies have been put in place that have split the network into a system with relatively few servers and many clients. These phenomena pose challenges and obstacles to P2P applications: both the network and the applications have to be designed together to work in concert. Application authors must design robust applications that can function in the complex Internet environment, and network designers must build in capabilities to handle new P2P applications. Fortunately, many of these issues are familiar from the experience of the early Internet; the researcher must learn these lessons and apply them in new system designs.

P2P systems are usually large-scale dynamic systems whose nodes are distributed over a wide geographic area. However, owing to the fact that their nodes can join and leave continually, P2P systems are dynamic systems with a high rate of churn and an unpredictable topology. A direct consequence is that resources and nodes are available only temporarily. A network element can disappear from the network at a given time and reappear at another locality of the network with an unpredictable pattern. Under these circumstances, one of the most challenging problems of P2P is to manage the dynamic and distributed network so that requesters can always successfully locate resources when needed. In order to enable resource awareness in such a large-scale dynamic distributed environment, a specific resource management strategy is required which takes the P2P characteristics into account. Within the scope of this research, a suitable solution for resource management in P2P systems must fulfill the following requirements:

Fault tolerance [8]: Fault tolerance means that a system can provide its services even in the presence of faults, caused either by internal system errors or by some influence of its environment. Scalability and reliability are defined in traditional distributed-system terms, such as bandwidth usage, how many systems can be reached from one node, how many systems and users can be supported, and how much storage can be used. Reliability is related to system and network failures, disconnection, availability of resources, and so on. Given the lack of a strong central authority for autonomous peers, improving system scalability and reliability is an important goal. As a result, algorithmic innovation in the area of resource discovery and search has been a clear area of research, resulting in new algorithms for existing systems and in the development of new P2P platforms. P2P systems are used in situations where a system has to function properly without any kind of centralized monitoring or management facility. Because of the dynamic behavior of P2P nodes, an appropriate resource management strategy for P2P systems must support fault tolerance in its operations. Automatic self-recovery from failures, without seriously affecting overall performance, therefore becomes extremely important for P2P systems. Sometimes, however, it is not possible to recover from a failure. It is then necessary that the system be capable of adequately providing its services in the presence of such partial failure. In case of a failure, a P2P system must be capable of providing continuous service while the necessary repairs are being made. In other words, an operation such as routing between any two nodes n1 and n2 must complete successfully even when some nodes on the way from n1 to n2 fail unpredictably.

Low cost of network maintenance [5, 22, 24]: The management of a node's insertion into or deletion from the network, as well as the dissemination and replication of resources, generates control messages in the network. Control messages are mainly used to keep the topology-changing network up to date and in a consistent state. However, since the number of control messages can become very large, even larger than the number of data packets, it is required to keep the proportion of control messages to data packets as low as possible.

The cost of resource management should not be higher than the cost of the network resource utilization itself.

Load balancing: The load distribution is measured by investigating how well the network management duties are distributed among the peers in the network. Parameters for assessing this are, for example, the routing table and the location table at each node of the system. A suitable resource management strategy for P2P should ensure a well-balanced distribution of the management duties among the nodes of the system [3, 9, 18].

High availability: The availability of a P2P management solution defines the probability that a resource is successfully located in the system. A resource management strategy is said to be highly available when it enables any existing resource of the system to be found, when requested, with a probability of almost 100%. This depends on the fault-tolerant routing and the resource distribution strategies [2, 17].

Cost sharing/reduction: Centralized systems that serve many clients typically bear the majority of the cost of the system. When that cost becomes too large, a P2P architecture can help spread the cost over all the peers [1]. For example, in the file-sharing space, the developed system will enable the cost sharing of file storage and will be able to maintain the index required for sharing. Much of the cost sharing is realized by the utilization and aggregation of otherwise unused resources, which results both in net marginal cost reductions and in a lower cost for the most costly system component. Because peers tend to be autonomous, it is important for costs to be shared reasonably equitably.

Resource aggregation and interoperability [7]: A decentralized approach lends itself naturally to aggregation of resources. Each node in the P2P system brings with it certain resources, such as compute power or storage space.

Applications that benefit from huge amounts of these resources, such as compute-intensive simulations or distributed file systems, naturally lean toward a P2P structure to aggregate these resources to solve the larger problem. Interoperability is also an important requirement for the aggregation of diverse resources.

Increased autonomy: In many cases, users of a distributed system are unwilling to rely on any centralized service provider. Instead, they prefer that all data and all work on their behalf be performed locally. P2P systems support this level of autonomy simply because they require that the local node do work on behalf of its user.

Privacy: Related to autonomy is the notion of anonymity and privacy. A user may not want anyone, or any service provider, to know about his or her involvement in the system. With a central server, it is difficult to ensure anonymity, because the server will typically be able to identify the client, at least by Internet address. By employing a P2P structure in which activities are performed locally, users can avoid having to provide any information about themselves to anyone else. Anonymity can be built into a P2P application by using a forwarding scheme for messages, which ensures that the original requestor of a service cannot be tracked.

Dynamism [6]: P2P systems assume that the computing environment is highly dynamic; that is, resources such as compute nodes will be entering and leaving the system continuously. When an application is intended to support a highly dynamic environment, the P2P approach is a natural fit. In communication applications such as instant messaging, so-called buddy lists are used to inform users when persons with whom they wish to communicate become available. Without this support, users would be required to poll for chat partners by sending periodic messages to them.

Dynamic service relationships [23]: Dynamic service relationships become an important issue in P2P systems because those systems are nondeterministic and dynamic, and self-organize based on the immediately available resources.

A P2P system is typically loosely coupled; moreover, it is capable of adapting to changes in the system structure and its environment: the number of peers, their roles, and the infrastructure. In order to build a loosely coupled system that is capable of dynamic reconfiguration, several mechanisms should be in place:

Discovery: There must be a distributed search mechanism that allows services and service providers to be found based on certain criteria. The challenge is to find the right number of lookup services that should be available in the system. Another challenge is how to decide which peer will run a lookup service in a fully distributed environment; again, a decision-making or voting system is needed. Running a lookup service requires additional resources, such as power and memory, from the peer, and therefore cannot always be requested from the peer free of charge. Thus, the shortest path of the resource lookup operation is a benchmark for the effectiveness of resource management. Any requested resource should be found within an optimal lookup path length that is as close as possible to the Moore bound D = log_{Δ-1}(N_max(Δ - 2) + 2) - log_{Δ-1}(Δ) [33, 34]. Here, D is the diameter of a Moore graph, which is defined as the lowest possible end-to-end distance between any two nodes in a connected graph with at most N_max nodes and maximum degree Δ.
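Assuming the standard Moore bound for a graph of maximum degree Δ, a short computation shows how slowly this optimal diameter grows with the network size:

    import math

    def moore_diameter(n_max: int, degree: int) -> float:
        """Lower bound on the diameter D of a graph with n_max nodes and
        maximum degree `degree` (> 2), from the Moore bound:
            D = log_{d-1}(n_max*(d-2) + 2) - log_{d-1}(d)
        """
        d = degree
        return math.log(n_max * (d - 2) + 2, d - 1) - math.log(d, d - 1)

    for n in (10**3, 10**6, 10**9):
        print(n, round(moore_diameter(n, degree=8), 2))
    # Prints diameters of roughly 3.4, 6.9, and 10.5: the optimal lookup
    # path length grows only logarithmically in the number of nodes.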

Naming/Addressing [6]: In order to identify a resource (a peer or a service), a unique identification mechanism or naming concept needs to be introduced into a P2P system. How should a peer be addressed in the global network? Addresses that are normally used to access a node in the network (such as the IP address in a TCP/IP network) do not help much, since the P2P system is heterogeneous; therefore different addressing protocols can, in theory, be used within one P2P network.

Self-description of the data: Systems with a fixed infrastructure, as well as fixed groupings of subsystems, are based on pre-defined relationships. P2P systems, being loosely coupled, must provide mechanisms for data self-description in order to build relationships between services. The challenge of self-description lies in defining the level of detail of the meta-language. There is also the challenge of interpreting a resource description written in a language not necessarily known by the requesting resource, and the problem of how to introduce new data that was not known before: the system should be able to recognize that this new data is a variant of something it already knows.

Negotiation/Trading: The consumer of a P2P resource is an end-user or another P2P resource; a mechanism is needed that allows the optimal available resource to be found and consumed based on criteria given by the consumer. The task can be solved by introducing suitable protocols, but negotiation in a distributed environment again requires a complex decision-making mechanism on both the consumer and provider sides.

Mobility: Given the dynamic aspect of P2P networking, the devices in P2P systems are often mobile. This includes physical mobility as well as logical changes to the overall application structure. At the application level there is a need to support changing network technologies (LAN, WLAN, Bluetooth, etc.) and addressing in a mobile environment. The main issues that need to be discussed are:

Mobility of peers and services [16]: Mobility means that participants of the network change their location while the system is operational. Each time a network node moves (or is moved), the entire topology of the network changes. Two nodes that were just close together and could easily interact with each other might be far apart in the next moment, requiring dedicated routers in between in order to establish any kind of communication, and experiencing increased latency. It might also happen that a node moves out of the scope of the network and loses the connection completely, while other nodes come into reach and want to participate. Most P2P systems need to be able to work under these circumstances. Mobility also usually implies that devices have limited electrical power and limited network bandwidth, requiring efficient use of these two resources. Mobility of services brings additional challenges to the design of a P2P system when QoS (Quality of Service) and availability/reliability issues are taken into account.

The requestor of a service expects to get a certain level of service quality and also expects the service to be available when needed. This is a challenge in mobile distributed environments, where services are not only distributed but also changing their locations.

Geo-location [21]: In its extreme form, a P2P system can have global scope, where nodes from all over the world communicate with each other. It is a major challenge to determine the (current) physical location of a peer in P2P networks. Another issue of geo-location concerns the synchronization of services and communication. This is much more difficult than in a centralized system, as there is no notion of a globally shared clock or state.

Enabling ad-hoc communication and coordination [10, 15]: Related to dynamism is the notion of supporting ad-hoc environments. By ad hoc, we mean environments where members come and go based, perhaps, on their current physical location or their current interests. Again, P2P fits these applications because it naturally takes into account changes in the group of participants. Coordination requirements come in many different flavors in P2P systems. When a request for a service needs to be fulfilled, there needs to be a way to determine which of possibly many service providers will serve the request. Such a situation occurs primarily when several providers can service a request, for load-balancing purposes.

Security [19, 36, 37, 38]: P2P systems are subject to numerous challenges with respect to security. One is making sure a user of the system is really who he claims to be. In P2P systems, service and resource consumers might require proof of information about the provider; otherwise authentication cannot be considered successful. Therefore, distributed trust-establishment mechanisms are needed. Another is deciding who is allowed to access what. In centralized systems, user rights are pre-defined, and the decision to allow access for a certain user is therefore taken based on these predefined rights [11].

In P2P systems, the requestor is not known a priori, which leads to a complex decision-making process. Further challenges include making sure data cannot be read by non-authorized parties, making sure it was not changed on the wire without this being recognized, proving from whom the data came (for example with cryptographic signatures), and making sure that actions that have been executed cannot later be claimed never to have happened (non-repudiation). The system must be especially hardened against insider attacks, because people can very easily become insiders.

State and data management: P2P systems are characterized by the fact that a single failing peer must not bring down the system as a whole. Of course, specific services (those that lived on the dying peer) might not be available anymore, but the system still fulfills a useful purpose. In many systems this requires facilities for some kind of distributed data management [39]. As a consequence, the following challenges have to be considered: replication, caching [35], consistency and synchronization, and finding the nearest copy.

Lifecycle management and garbage collection: In traditional systems, distributed garbage collection is difficult because part of the system might fail unexpectedly. This might of course also happen in P2P systems, but there we have the additional problem that peers go offline as part of their regular operation. As a consequence, an unavailable peer is not the consequence of a (relatively rare) failure, but part of ordinary operation; therefore more efficient algorithms are required. Also, mechanisms have to be put in place to handle the case where a peer provides to another client a resource that it has itself leased.

2.7 Review of a Few Algorithms

In [59] the authors present systems for improving the end-to-end availability of services using overlay networks. The end-to-end availability of Internet services is between two and three orders of magnitude worse than that of other important engineered systems. A core aspect of many of the failures that interrupt end-to-end communication is that they fall outside the expected domain of well-behaved network failures: many traditional techniques cope with link and router failures, so the remaining failures are those caused by software and hardware bugs, misconfiguration, malice, or the inability of current routing systems to cope with persistent congestion.

The effects of these failures are exacerbated because Internet services depend upon the proper functioning of many components (wide-area routing, access links, the domain name system, and the servers themselves), and a failure in any of them can prove disastrous to the proper functioning of the service. The authors describe three complementary systems to increase Internet availability in the face of such failures. Each system builds upon the idea of an overlay network, a network created dynamically between a group of cooperating Internet hosts. The first two systems, Resilient Overlay Networks (RON) and Multi-homed Overlay Networks (MONET), determine whether the Internet path between two hosts is working on an end-to-end basis. Both systems exploit the considerable redundancy available in the underlying Internet to find failure-disjoint paths between nodes and forward traffic along a working path. RON is able to avoid 50% of the Internet outages that interrupt communication between small groups of communicating nodes. MONET is more aggressive, combining an overlay network of Web proxies with explicitly engineered redundant links to the Internet to also mask client access-link failures. Eighteen months of measurements from a six-site deployment of MONET show that it increases a client's ability to access working Web sites by nearly an order of magnitude. Where RON and MONET combat accidental failures, the Mayday system guards against denial-of-service (DoS) attacks by surrounding a vulnerable Internet server with a ring of filtering routers. Mayday then uses a set of overlay nodes to act as mediators between the service and its clients, permitting only properly authenticated traffic to reach the server.

In [68] the authors present game-theoretic analyses of the impact of selfish routing, in which nodes are permitted to choose the paths their packets take through the network. The results vary depending on the network model used: if the network cost is linear in the amount of traffic, then the selfish solution has a total cost of at most 4/3 times that of the optimal routing.

In [60] the authors study the question left open in [68] through a simulation of an ISP-like internal topology using OSPF (Open Shortest Path First)-like routing. In this somewhat more realistic model, the authors find that selfish routing often outperforms the solution found by the conventional routing protocol, though it does lead to increased utilization of certain popular links.

In [61] the author develops a novel middleware approach, termed opportunistic overlays, and its dynamically reconfigurable support framework for building efficient mobile applications. Specifically, the author addresses the inefficiency of content delivery introduced by node mobility and by dynamically changing system loads, in the context of publish/subscribe systems. In response to changes in physical network topology, in nodes' physical locations, and in network node behaviors, opportunistic overlays dynamically adapt event dissemination structures (i.e., broker overlays) with the goal of optimizing end-to-end delays in event delivery.

In [62] the author observes that the increasing availability of high-bandwidth Internet connections and low-cost commodity computers in people's homes has stimulated the use of resource-sharing P2P networks. These systems employ scalable mechanisms that allow anyone to offer content and services to other users. However, the open accessibility of these systems makes them vulnerable to malicious users wishing to poison the system with corrupted data or with harmful services and worms. Because of this danger, users must be wary of the quality or validity of the resources they access. To mitigate the adverse behavior of unreliable or malicious peers in a network, researchers have suggested using reputation systems.

In [63] the author considers overlay networks, which have gained popularity as a viable alternative for overcoming the functionality limitations of the Internet (e.g., the lack of QoS and of multicast routing). They offer enhanced functionality to end-users by forming an independent and customizable virtual network over the native network. Furthermore, they are being widely promoted as a potential architecture for the future Internet in the form of network virtualization, where multiple heterogeneous virtual networks may co-exist on top of a shared native network. The prominent characteristic in either context is that routing at the overlay layer operates independently of routing at the underlying native layer. There are several potential problems with this approach, because overlay networks are selfish entities that are chiefly concerned with achieving the routing objectives of their own users.

This leads to complex cross-layer interactions between the native and overlay layers, and often tends to degrade the achieved performance of both layers. As overlay applications proliferate and the amount of selfish overlay traffic surges, there is a clear need for understanding these complex interactions and for strategies to manage them appropriately. The author addresses these issues in the context of service overlay networks, which are virtual networks formed of persistent nodes that collaborate to offer improved services to actual end-systems. Typically, service overlays alter the route between the overlay nodes in a dynamic manner in order to satisfy a selfish objective. The work improves the stability and performance of overlay routing in this multi-layer environment, investigating the common problems of functionality overlap, lack of cross-layer awareness, mismatch or misalignment of routing objectives, and contention for native resources between the two layers. These problems often lead to a deterioration in performance for the end-users. The work presents an analysis of the cross-layer interaction during fault recovery, inter-domain policy enforcement, and traffic engineering in the multi-layer context. Based on this characterization of the interaction, the author proposes effective strategies that improve overall routing performance with minimal side effects on other traffic. These strategies typically 1) increase the layer-awareness (awareness of information about the other layer) at each layer, 2) introduce better control over routing dynamics, and 3) offer improved overlay node placement options. The results demonstrate how applying these strategies leads to better management of the cross-layer interaction, which in turn leads to improved routing performance for end-users.

In [64] the author considers a foundational issue underlying many overlay network applications, ranging from routing to P2P file sharing: connectivity management, i.e., folding new arrivals into an existing overlay and re-wiring to cope with changing network conditions. The paper unifies two perspectives: devising practical heuristics for specific applications designed to work well in real deployments, and providing abstractions of the underlying problem that are analytically tractable, especially via game-theoretic analysis.

Insights gleaned from these novel, realistic theoretical models are used in the design of Egoist, a prototype overlay routing system that the author implemented, deployed, and evaluated on PlanetLab. Using measurements on PlanetLab and trace-based simulations, the author demonstrates that Egoist's neighbor-selection primitives significantly outperform existing heuristics on a variety of performance metrics, including delay, available bandwidth, and node utilization. Moreover, Egoist is competitive with an optimal but unscalable full-mesh approach, remains highly effective under significant churn, is robust to cheating, and incurs minimal overhead.

In [65] the authors propose a novel query routing mechanism for improving query performance in unstructured P2P networks. The data structure developed in [65], called the traceable gain matrix (TGM), records every query's gain at each peer along the query-hit path and allows query routing decisions to be optimized effectively. Experimental results show that the query routing mechanism achieves a relatively high query hit rate with low bandwidth consumption across different types of network topologies, under both static and dynamic network conditions.

In [66] the authors propose a simple but powerful index scheme to enhance search in unstructured P2P networks. The scheme uses Bloom filters to index the files shared at each node and then lets nodes gossip with one another to exchange their filters.
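A minimal sketch of such an index, under the assumption that one plain Bloom filter per node suffices: each node summarizes its shared files in a filter and gossips the compact filter to its neighbours, which can then test membership cheaply (with some false positives but no false negatives).

    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=4):
            self.m, self.k, self.bits = m, k, 0   # m-bit array kept as an int

        def _positions(self, item):
            # Derive k bit positions from k salted hashes of the item.
            for i in range(self.k):
                h = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
                yield int(h, 16) % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits |= 1 << p

        def __contains__(self, item):
            return all(self.bits >> p & 1 for p in self._positions(item))

    # A node indexes its files, then gossips the integer `bits` to its
    # neighbours instead of shipping the full file list.
    f = BloomFilter()
    for name in ("song.mp3", "paper.pdf"):
        f.add(name)
    print("song.mp3" in f)   # True
    print("movie.avi" in f)  # False (with high probability)

A neighbour that gets a hit in a gossiped filter forwards the query toward that node; a false positive costs only one wasted probe, which is why the compact filter is a good trade-off against flooding.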

In [67] the authors extend the deterministic algorithm of [69] to the environment of asynchronous networks, where no clock pulses are assumed and the message delivery time may vary and is not known. They manage to maintain complexities similar to those of the synchronous algorithm of [68], translated to the asynchronous model; these are lower than the complexities that would be needed to synchronize the system. The main technical difficulty in a directed, weakly connected system is to ensure that nodes take steps that are consistent with each other, even if their knowledge about each other is not symmetric. Here, this task is further complicated by the fact that there is no timeout mechanism (which does exist in synchronous systems) to assist in ensuring consistency. In particular, as opposed to the case in synchronous systems, an asynchronous algorithm cannot first transform every directed edge to be bidirectional and then apply an algorithm for bidirectional graphs.

2.8 Various Middleware Approaches

Different middleware approaches were selected and classified taking into account the programming models used. Programming wireless/wired networks involves two major classes, shown in Figure 2.7. The first is programming support, which provides systems, services, and run-time mechanisms such as reliable code distribution, safe code execution, and application-specific services. The second is programming abstraction, which relates to the way a wireless/wired network is viewed and presents concepts and abstractions of network nodes and the data residing on them. The programming support class consists of five approaches: virtual machine based, modular programming based, database based, application driven, and message-oriented middleware, as shown in Figure 2.7.

Figure 2.7. Programming models for wireless/wired networks: programming abstraction (global behavior, local behavior) and programming support (virtual machine, database, modules, application driven, message-oriented middleware)

2.8.1 Virtual Machine

This approach consists of virtual machines (VMs), interpreters, and mobile agents. Its main characteristic is flexibility: developers can write applications as small, separate modules, which the system injects and distributes through the network using tailored algorithms and which are then interpreted by the VM. Those tailored algorithms minimize the overall energy expenditure as well as resource use.

However, the technology is complex and the interpreted instructions introduce overhead.

2.8.2 Modular Programming (Mobile Agents) [8, 70, 71]

The use of mobile code facilitates injection and distribution through the network and leads to application modularity. Less energy is needed when broadcasting small modules instead of the complete application.

2.8.3 Database

This approach views the entire network as a virtual database system, offering an easy-to-use interface that permits the user to extract data of interest and issue queries about the network. Nevertheless, this approach does not support real-time applications, as it provides only approximate results, and the detection of spatio-temporal relationships between events is not possible.

2.8.4 Application-Driven

This approach establishes a new, innovative direction in middleware research, complementing the network protocol stack with an architecture that enables programmers to adjust the network to the exact application requirements. It provides a QoS advantage, since the applications determine how network operations are managed.

2.8.5 Message-Oriented Middleware (MOM)

This approach is essentially a communication model for a distributed network environment. The system facilitates message exchange between nodes and the sink nodes by means of a publish-subscribe mechanism. This model supports asynchronous communication, making loose coupling between sender and receiver possible.
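A toy publish-subscribe broker illustrating the MOM model just described (the class and method names are illustrative, not from any particular middleware); per-subscriber queueing is what decouples senders from receivers in time.

    from collections import defaultdict, deque

    class Broker:
        """Toy publish/subscribe broker: publishers and subscribers never
        address each other directly, and delivery is asynchronous."""
        def __init__(self):
            self.queues = defaultdict(dict)   # topic -> {subscriber: deque}

        def subscribe(self, topic, subscriber):
            self.queues[topic][subscriber] = deque()

        def publish(self, topic, message):
            for q in self.queues[topic].values():
                q.append(message)     # buffered until the subscriber polls

        def poll(self, topic, subscriber):
            q = self.queues[topic][subscriber]
            return q.popleft() if q else None

    broker = Broker()
    broker.subscribe("temperature", "sink-1")
    broker.publish("temperature", {"node": 7, "celsius": 21.5})
    print(broker.poll("temperature", "sink-1"))  # delivered asynchronously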

2.9 Some Middleware Systems

Napster: Napster [26, 55, 57] is a simply structured centralized system. We present it here as the simplest model (one that was socially very successful) against which to contrast the other systems. It uses a centralized server to create its own flat namespace of host addresses. At startup, the client contacts the central server and reports a list of the files it maintains. When the server receives a query from a user, it searches for matches in its index and returns a list of users that hold the matching file. The user then connects directly to the peer that holds the requested file and downloads it, as shown in Figure 2.8. There are problems with using a centralized server, including the fact that it is a single point of failure. Napster does not replicate data; it uses keepalives to make sure that its directories are current, and maintaining this unified view is computationally expensive. It does not provide scalability. The focus on Napster as a music sharing system, in which users must be active in order to participate, made it exceedingly popular. Napster does not use general resource sharing, but it does use distributed file management. Regarding routing, it is simply a centralized directory system using Napster servers. The main advantage of Napster and similar systems is that they are simple and locate files quickly and efficiently. The main disadvantage is that such centralized systems are vulnerable to malicious attack and technical failure. Furthermore, these systems are inherently not largely scalable, as there are bound to be limitations to the size of the server database and its capacity to respond to queries. The system is not reliable, as it is prone to single-point failure and easily attacked by DoS. Napster provides communication-level fault tolerance, since any packet dropped due to congestion can be retransmitted. It provides communication-level security, but does not support system-level or application-level security. Napster's performance is good under moderate load, but falls sharply when the server is overloaded: response time increases when the number of nodes and requests exceeds the capability of the server.
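A minimal sketch of the centralized directory pattern just described: the server only maps file names to the peers holding them, and the download itself happens peer-to-peer. The class and peer names are illustrative, not Napster's actual protocol.

    class DirectoryServer:
        """Napster-style central index: peers report their file lists at
        startup, queries return holders, transfers then go peer-to-peer."""
        def __init__(self):
            self.index = {}                       # filename -> set of peers

        def register(self, peer, files):
            for f in files:
                self.index.setdefault(f, set()).add(peer)

        def unregister(self, peer):
            # A peer leaving (or failing a keepalive) purges its entries,
            # which is what makes the unified view expensive to maintain.
            for holders in self.index.values():
                holders.discard(peer)

        def query(self, filename):
            return sorted(self.index.get(filename, ()))

    server = DirectoryServer()
    server.register("peer-1", ["a.mp3", "b.mp3"])
    server.register("peer-2", ["b.mp3"])
    print(server.query("b.mp3"))   # ['peer-1', 'peer-2']; download is direct

The single DirectoryServer instance is also the single point of failure and the scalability bottleneck discussed above.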