P2P Characteristics and Applications
Lecture for the Project Group "A Distributed Framework for Social Networks"
Dr.-Ing. Kalman Graffi
Email: graffi@mail.upb.de
Fachgruppe Theorie verteilter Systeme
Fakultät für Elektrotechnik, Informatik und Mathematik
Universität Paderborn
Fürstenallee 11, D-33102 Paderborn, Germany
Tel. +49 5251 606730, Fax +49 5251 606697
http://www.cs.uni-paderborn.de/fachgebiete/fg-ti.html
UPB SS2011, PG-Framework, Lecture 02
This slide set is based on the lecture "Communication Networks 2" of Prof. Dr.-Ing. Ralf Steinmetz at TU Darmstadt.
20 April 2011
Some relevant books
- Kalman Graffi: Monitoring and Management of Peer-to-Peer Systems, http://tuprints.ulb.tu-darmstadt.de/2248/
- Xuemin Shen, Heather Yu, John Buford, Mursalin Akon: Handbook of P2P Networking
- Ralf Steinmetz, Klaus Wehrle (Editors): Peer-to-Peer Systems and Applications, www.springerlink.com/content/g6h805426g7t
Dr.-Ing. Kalman Graffi, FG Theorie verteilter Systeme
Overview
1 Challenges in Data Management & Retrieval
  1.1 Approach I: Central Index
  1.2 Approach II: Search
  1.3 Search Mechanisms: Flooding
  1.4 Search Mechanisms: Expanding Ring
  1.5 Search Mechanisms: Rendezvous Point
  1.6 Approach III: Routing (Distributed Indexing)
2 Peer-to-Peer Networking
  2.1 Definition of P2P
  2.2 Nine Characteristics of P2P Systems
  2.3 Major Query Types
3 Evolution of Internet Computing Paradigms
4 Success of P2P Networking
1 Challenges in Data Management & Retrieval
[Figure: a distributed system of hosts; a provider asks "I have item D. Where to place D?", a requester asks "I want item D. Where can I find D?"]
Essential challenges
- Location of data items in distributed systems
  - Where shall the provider store the item?
  - How does a requester find the actual location of an item?
- Scalability: keep the complexity of communication and storage scalable
- Robustness and resilience in case of faults and frequent changes
The content of this slide has been adapted from Peer-to-Peer Systems and Applications, ed. by Steinmetz, Wehrle.
Strategies for Data Retrieval
[Figure: the same distributed system; where should the provider place item D, and where can the requester find it?]
Strategies to store and retrieve data items in distributed systems
- Central server (central indexing)
- Flooding search (local indexing)
- Routing (distributed indexing)
Index: information about where data items are located
1.1 Approach I: Central Index
[Figure: server S, provider node A and requester node B exchanging messages 1-4 as listed below]
Simple strategy: a central server stores information about item locations
1 Node A (provider) tells server S that it stores item D
2 Node B (requester) asks server S for the location of D
3 Server S tells B that node A stores item D
4 Node B requests item D from node A, which transmits D directly to B
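The four steps above can be sketched as a tiny in-memory index. This is a minimal illustration, not a networked implementation. The class name `CentralIndexServer` and its methods `register` and `locate` are hypothetical, and message transport is left out entirely.

```python
from collections import defaultdict


class CentralIndexServer:
    """Hypothetical central index: item name -> set of nodes that store it."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, node, item):
        # Step 1: the provider tells the server that it stores `item`
        self.index[item].add(node)

    def locate(self, item):
        # Steps 2-3: the requester asks; the server answers with all known providers
        return sorted(self.index.get(item, set()))


server = CentralIndexServer()
server.register("A", "D")
print(server.locate("D"))  # ['A']
# Step 4 happens outside the index: B now contacts A directly to fetch D.
```

Each lookup is a single request to S, which is where the O(1) search complexity comes from. At the same time, every registration and every query lands on the same object, which mirrors the server load and the single point of failure of this approach.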
Approach I: Central Server
Advantages
- Search complexity of O(1): just ask the server
- Complex and fuzzy queries are possible
- Simple and fast
Overall: the best principle for small and simple applications
Challenges
- No intrinsic scalability: O(N) network and system load at the server
- Single point of failure or attack
- Implementation and maintenance costs increase non-linearly, in particular for achieving high availability and scalability
- A central server is not suitable for systems with massive numbers of users
1.2 Approach II: Search
[Figure: B's query for D is forwarded hop by hop (steps 1-4) until it reaches node A ("I store D"); A transmits D to B (step 5)]
Fully distributed approach
- No information about the location of data at intermediate systems
- A broad search is necessary
1 Node B (requester) asks its neighboring nodes for item D
2-4 Nodes forward the request to further nodes (breadth-first search / flooding)
5 Node A (provider of item D) sends D to the requesting node B
The content of this slide has been adapted from Peer-to-Peer Systems and Applications, ed. by Steinmetz, Wehrle.
Approach II: Search
Fully distributed approach
- Central systems are vulnerable and do not scale
Unstructured P2P overlays
- No global information is available about the location of an item
- Content is only stored at the node providing it
Retrieval of data
- No routing information for content
- Sufficiently many nodes have to be asked
Search mechanisms in P2P overlays
- Broadcast (flooding)
- Expanding ring
- Random walk
- Rendezvous idea
1.3 Search Mechanisms: Flooding
[Figure: a query is passed from the source to all its neighbors and onward]
- Breadth-first search (BFS)
- Use a system-wide maximum TTL to control the communication overhead
- Send a message to all neighbors except the one that delivered the incoming message
- Store the identifiers of routed messages, or use non-oblivious ("not to be forgotten") messages, to avoid retransmission cycles
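Here is a minimal sketch of TTL-limited flooding. It assumes the overlay is given as an adjacency dict. The function `flood` and the example topology are hypothetical. The `seen` set stands in for the stored message identifiers. Every forwarded copy is counted as a message, including duplicates that the receiver drops.

```python
from collections import deque


def flood(graph, source, has_item, ttl):
    """TTL-limited flooding (BFS). Returns (hit_nodes, messages_sent).

    graph:    dict node -> list of neighbours (undirected overlay, assumed)
    has_item: predicate telling whether a node stores the requested item
    ttl:      system-wide maximum number of hops
    """
    seen = {source}          # stands in for the stored message identifiers
    hits, messages = [], 0
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, ttl_left = queue.popleft()
        if has_item(node):
            hits.append(node)
        if ttl_left == 0:
            continue             # TTL exhausted: do not forward further
        for neighbour in graph[node]:
            if neighbour == sender:
                continue         # never send back to the node it came from
            messages += 1        # the message is sent even if it is a duplicate
            if neighbour in seen:
                continue         # duplicate message ID: the receiver drops it
            seen.add(neighbour)
            queue.append((neighbour, node, ttl_left - 1))
    return hits, messages


overlay = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
print(flood(overlay, "A", lambda n: n == "D", ttl=3))  # (['D'], 5)
```

Two of the five messages in this example are duplicates (B and C each receive the query twice). In dense overlays this is why flooding overhead grows much faster than the number of peers actually reached.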
Example: Flooding
[Figure: peers labeled with increasing hop count from the source peer to the destination peer]
- Overhead: large, here 43 messages sent
- Length of the path: 5 hops
1.4 Search Mechanisms: Expanding Ring
Mechanism: successive floods with increasing TTL
- Start with a small TTL
- If not successful, increase the TTL, and so on
Properties
- Improved performance if object popularity follows a Zipf-law distribution and objects are located accordingly
- Message overhead is still high, since every new flood re-covers the nodes already reached by the previous ones
[Figure: Zipf-law example]
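The expanding ring can be sketched on top of a TTL-limited flood. The helper names `flood_once` and `expanding_ring`, the `max_ttl` cap of 7 and the line-shaped example overlay are assumptions made for illustration. The accumulated message count shows why the overhead stays high.

```python
from collections import deque


def flood_once(graph, source, target, ttl):
    """One TTL-limited flood; returns (target_reached, messages_sent)."""
    seen, messages = {source}, 0
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, ttl_left = queue.popleft()
        if ttl_left == 0:
            continue
        for neighbour in graph[node]:
            if neighbour != sender:
                messages += 1
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, node, ttl_left - 1))
    return target in seen, messages


def expanding_ring(graph, source, target, max_ttl=7):
    """Successive floods with TTL = 1, 2, ... until the target is reached."""
    total = 0
    for ttl in range(1, max_ttl + 1):
        found, messages = flood_once(graph, source, target, ttl)
        total += messages          # every ring re-floods the inner rings
        if found:
            return ttl, total
    return None, total


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(expanding_ring(line, "A", "D"))  # (3, 6)
```

In this four-node line, the rings with TTL 1 and 2 cost 1 + 2 messages before the TTL-3 flood (3 messages) reaches D, for 6 messages in total. A popular object replicated close to the source would instead be found in the first, cheapest ring.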
Search Mechanisms: Random Walk
Random walks
- Forward the query to a randomly selected neighbor
- Message overhead is reduced significantly
- Increased latency
Multiple random walks (k query messages)
- Reduce latency
- Generate more load
Termination mechanism
- TTL-based, or
- Periodically checking with the requester before the next forwarding step
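This is a sketch of multiple random walks with TTL-based termination. It assumes k walkers that each advance one hop per round. `random_walk_search` and its parameters (`k`, `ttl`, `seed`) are hypothetical. Periodically checking with the requester is not modeled.

```python
import random


def random_walk_search(graph, source, target, k=2, ttl=20, seed=None):
    """k random walkers advance in lock step; returns (hops, messages) or (None, messages)."""
    rng = random.Random(seed)
    walkers = [(source, None)] * k       # (current node, previous node) per walker
    messages = 0
    for hop in range(1, ttl + 1):        # TTL-based termination
        moved = []
        for node, prev in walkers:
            # avoid stepping straight back unless it is the only neighbour
            options = [n for n in graph[node] if n != prev] or graph[node]
            nxt = rng.choice(options)
            messages += 1
            if nxt == target:
                return hop, messages
            moved.append((nxt, node))
        walkers = moved
    return None, messages


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(random_walk_search(line, "A", "D", k=2, seed=1))  # (3, 5)
```

Each walker adds only one message per hop, so the load grows with k rather than with the node degree as in flooding. A larger k shortens the expected time to a hit at the price of more messages.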
Example: Random Walk
[Figure: peers labeled with increasing hop count from the source peer to the destination peer]
Random walk with n=2 (each incoming message is sent out twice)
- Overhead: smaller, here e.g. 30 messages sent until the destination is reached
- Length of the path found: e.g. 7 hops
1.5 Search Mechanisms: Rendezvous Point
- The storing node (green, right side of the figure) propagates its content to all nodes within a predefined range
- The requesting node (violet, left side) propagates its query to all nodes within a predefined range
- A query hit can be found at the rendezvous point (black), where both ranges overlap
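The rendezvous idea can be sketched as two bounded neighborhood expansions whose intersection contains the rendezvous points. The helper names and the use of the same hop radius for both sides are assumptions; the slide only speaks of a "predefined range".

```python
def within_hops(graph, start, radius):
    """All nodes reachable from start in at most `radius` hops (breadth-first)."""
    reached = {start}
    frontier = [start]
    for _ in range(radius):
        next_frontier = []
        for node in frontier:
            for neighbour in graph[node]:
                if neighbour not in reached:
                    reached.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return reached


def rendezvous_points(graph, provider, requester, radius):
    """Nodes that hold both a copy of the content and the query."""
    content_area = within_hops(graph, provider, radius)   # content pushed here
    query_area = within_hops(graph, requester, radius)    # query pushed here
    return sorted(content_area & query_area)


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(rendezvous_points(line, provider="E", requester="A", radius=2))  # ['C']
```

With radius 1 the two areas do not meet and the query fails. The predefined range therefore has to be large enough for the content area and the query area to overlap.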
1.6 Approach III: Routing (Distributed Indexing)
[Figure: ring of nodes, each responsible for an ID interval: [00,10[ (node B), [10,15[, [15,22[, [22,27[, [27,33[, [33,40[, [40,47[ (node A), [47,53[, [53,00[; B searches D, computes ID = Hash(D) = 45, and the request is routed to A]
Fully decentralized approach
- Construction of an overlay
- Each node is responsible for a certain range of the ID space
- IDs are calculated using hash functions (e.g. SHA-1, MD5)
Routing is done as follows:
1 Node B (requester) calculates the ID of D: ID = Hash(D) = 45
2, 3 Node B sends the request to the neighbor whose responsibility interval is closest to the ID, which forwards it in the same way
4 Node A is responsible for ID 45 ("I am responsible for ID 45") and sends the requested item D to node B
The content of this slide has been adapted from Peer-to-Peer Systems and Applications, ed. by Steinmetz, Wehrle.
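Here is a sketch of the routing steps, using the nine responsibility intervals from the figure. Several details are assumptions:
- the ID space size `ID_SPACE = 64`
- SHA-1 reduced modulo that size
- the node names other than A and B
- the ring neighbor links, plus one extra shortcut link from B

"Closest" is interpreted as the smallest circular distance between the target ID and a neighbor's interval.

```python
import hashlib

ID_SPACE = 64  # assumption: IDs 0..63 (the figure only shows interval bounds up to 53)


def object_id(name):
    """Hash an item name into the ID space (SHA-1, reduced modulo ID_SPACE)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % ID_SPACE


def in_interval(ident, start, end):
    """True if ident lies in [start, end[ on the ring; handles wrap-around like [53,00[."""
    if start < end:
        return start <= ident < end
    return ident >= start or ident < end


def circular_gap(a, b):
    """Shortest distance between two IDs on the ring."""
    return min((a - b) % ID_SPACE, (b - a) % ID_SPACE)


def interval_distance(ident, start, end):
    """0 inside [start, end[, otherwise the gap to the nearer interval boundary."""
    if in_interval(ident, start, end):
        return 0
    return min(circular_gap(ident, start), circular_gap(ident, (end - 1) % ID_SPACE))


def route(intervals, neighbours, start_node, target_id, max_hops=32):
    """Greedy routing: forward to the neighbour whose interval is closest to target_id."""
    path = [start_node]
    while not in_interval(target_id, *intervals[path[-1]]):
        if len(path) > max_hops:
            raise RuntimeError("routing did not converge")
        current = path[-1]
        path.append(min(neighbours[current],
                        key=lambda n: interval_distance(target_id, *intervals[n])))
    return path


# Hypothetical overlay: intervals from the figure; only B and A are named there.
order = ["B", "P", "Q", "R", "S", "T", "A", "U", "V"]
starts = [0, 10, 15, 22, 27, 33, 40, 47, 53]
intervals = {n: (starts[i], starts[(i + 1) % len(starts)]) for i, n in enumerate(order)}
neighbours = {n: [order[i - 1], order[(i + 1) % len(order)]] for i, n in enumerate(order)}
neighbours["B"].append("T")  # hypothetical shortcut link from B to [33,40[

print(route(intervals, neighbours, "B", 45))  # ['B', 'T', 'A']
```

The SHA-1 value of "D" will generally not be 45, because the slide's value is only an example. The call above therefore routes the ID 45 directly. Without the shortcut, the request would travel around the ring via [53,00[ and [47,53[. This illustrates why structured overlays add long-range links.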
Approach III: Routing (Distributed Indexing)
Advantages
- No single point of failure
- Scalable
- Efficient way of retrieving content
Challenges
- Maintenance of the ID space is necessary
- Reliable routing
- Load balancing
- Replication of data
Requirements for Overlay Networks
- Fault tolerance: resilience of the connectivity when failures are encountered, e.g. caused by faulty behavior
- Heterogeneity: supporting variations in physical capabilities and peer behavior
- Fairness: workload equal on all peers, or workload proportional to peer capability or to peer demands
- Security: ability of a system to manage, protect and distribute sensitive information
- Privacy: degree to which a system or component supports anonymous transactions
2 Peer-to-Peer Networking
A huge number of nodes participate in the network. They
- have resources to share
- have demands for resources that may not be satisfied easily or by single nodes
Main questions
- Which of the nodes provides the resources?
- Which instance of a resource shall (exactly) be provided?
Approach: Peer-to-Peer (P2P)
- P2P offers mechanisms to find / look up what is wanted
- P2P builds overlay network(s)
Mode of operation
- After locating the node providing the desired service: interact directly from peer to peer
2.1 Definition of P2P
P2P (Peer-to-Peer) is both a distributed system and a communication paradigm.
Notion of distributed systems (very general)
- A distributed system consists of multiple autonomous computing devices that communicate through a computer network
- The computing devices interact with each other in order to achieve common goals
Focus of the lecture: systems with loosely coupled (no fixed relationship) autonomous devices
- Devices have their own semi-independent agenda
- They comply with some general rules, but local policies define their behavior
- (At least) limited coordination and cooperation is needed
- Strategies are needed to find the peer providing the desired content
2.2 Nine Characteristics of P2P Systems
[Figure: layered view: service delivery on top of the overlay network (overlay connections), which runs on top of the IP network (underlay)]
Detailed Characteristics: Resources (location, sharing)
1. Relevant resources are located at nodes (peers) at the edges of a network
2. Peers share their resources (voluntarily, not forced by contracts)
3. Resource locations are widely distributed and most often largely replicated
Detailed Characteristics: Networking
4. Variable connectivity is the norm
   - support of dial-up users with variable IP addresses
   - operating outside the domain name system (DNS)
   - often operating behind firewalls or NAT gateways
   - peers are often active only for a limited time period
Detailed Characteristics: Interaction of Peers
5. Combined client and server functionality: SERVer + cliENT = SERVENT
   - minimal demands on the underlying infrastructure
   - services provided by end systems
6. Direct interaction (provision of services, e.g. file transfer) between peers (= "peer to peer")
Detailed Characteristics: Management
7. Peers have significant autonomy and mostly similar rights
8. No central control or centralized usage/provisioning of a service
9. Self-organizing system
Peer-to-Peer: 9 Properties
1. Relevant resources are located at nodes ("peers") at the edges of a network
2. Peers share their resources
3. Resource locations are widely distributed and most often largely replicated
4. Variable connectivity is the norm
5. Combined client and server functionality
6. Direct interaction (provision of services, e.g. file transfer) between peers (= "peer to peer")
7. Peers have significant autonomy and mostly similar rights
8. No central control or centralized usage/provisioning of a service
9. Self-organizing system
P2P Systems: Unstructured vs. Structured
Unstructured P2P
- Centralized P2P
  1. All features of Peer-to-Peer included
  2. Central entity is necessary to provide the service
  3. Central entity is some kind of index/group database
  Examples: Napster
- Pure P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  Examples: Gnutella 0.4, Freenet
- Hybrid P2P
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. Dynamic central entities
  Examples: Gnutella 0.6, FastTrack, eDonkey
Structured P2P
- DHT-based
  1. All features of Peer-to-Peer included
  2. Any terminal entity can be removed without loss of functionality
  3. No central entities
  4. Connections in the overlay are fixed
  Examples: Chord, CAN, Kademlia
- Hybrid P2P
  1. All features of Peer-to-Peer included
  2. Peers are organized in a hierarchical manner
  3. Any terminal entity can be removed without loss of functionality
  Examples: RecNet, Globase.KOM
from R. Schollmeier and J. Eberspächer, TU München
Peer-to-Peer Architectures
Content indexing \ content distribution | centralized   | decentralized
centralized                            | Client-Server | Hybrid Peer-to-Peer with centralized indexing
decentralized                          | -             | Pure Peer-to-Peer with decentralized indexing
Structured and Unstructured P2P Networks
Unstructured P2P networks
- Objects have no special identifier
- The location of a desired object is not known a priori
- Each peer is only responsible for the objects it submitted
Structured P2P networks
- Peers and objects have identifiers
- Objects are stored on peers according to their ID: responsiblefor(objid) = PeerID
- Distributed indexing points to the object location
Search: find all (or some) objects in the P2P network that match given criteria
Lookup / addressing: retrieve the object that is identified by a given identifier
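One common way to realize `responsiblefor(objid) = PeerID` is the Chord-style successor rule: an object belongs to the first peer whose ID is equal to or follows the object ID on the ring. This sketch shows only that one convention; the routing example in section 1.6 instead assigns each node the interval starting at its own ID. The 16-bit ID space and the function names `ident` and `responsible_for` are assumptions.

```python
import bisect
import hashlib

M = 16  # assumption: 16-bit identifier space, IDs 0 .. 2**16 - 1


def ident(name):
    """Hash a peer address or object name onto the ID ring (SHA-1, reduced mod 2**M)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)


def responsible_for(obj_id, peer_ids):
    """Successor rule: the first peer ID >= obj_id, wrapping around the ring."""
    ring = sorted(peer_ids)
    i = bisect.bisect_left(ring, obj_id)
    return ring[i % len(ring)]


print(responsible_for(45, [10, 40, 50]))  # 50
print(responsible_for(55, [10, 40, 50]))  # 10 (wrap-around)

peers = [ident(p) for p in ["12.5.7.31", "berkeley.edu", "planet-lab.org"]]
print(responsible_for(ident("my data"), peers))
```

Because peers and objects are hashed into the same space, any peer can compute who is responsible for an object without asking anyone. Finding a route to that peer is the separate routing problem.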
2.3 Major Query Types
Lookup
- Key-value lookup as known from hash tables
- Given: key; return: single value
Full-text search
- Given: sequence of words (search term)
- Return: all entries / articles matching the search terms
Range query
- Given: range [X, Y]
- Return: all stored entries within the range
- Example: location-based search
Matching query
- Given: logical condition, e.g. (A && B) || C
- Return: all entries fulfilling the logical condition
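To make the four query types concrete, the sketch below evaluates each of them against a single local key-value store. The store contents and function names are hypothetical. The hard part in a P2P network, distributing the store and routing these queries, is deliberately left out.

```python
# Hypothetical local store: numeric key -> text entry
store = {
    17: "peer to peer overlay",
    23: "distributed hash table lookup",
    42: "unstructured overlay flooding search",
}


def lookup(key):
    """Key-value lookup: exactly one value (or None)."""
    return store.get(key)


def full_text(*terms):
    """All entries whose words contain every search term."""
    return sorted(k for k, text in store.items() if all(t in text.split() for t in terms))


def range_query(x, y):
    """All entries whose key lies in [x, y]."""
    return sorted(k for k in store if x <= k <= y)


def matching(condition):
    """All entries whose word set satisfies a logical condition."""
    return sorted(k for k, text in store.items() if condition(set(text.split())))


# (A && B) || C with A = "overlay", B = "flooding", C = "hash"
print(matching(lambda w: ("overlay" in w and "flooding" in w) or "hash" in w))  # [23, 42]
```

Only `lookup` maps directly onto a DHT. Range and matching queries need either an order-preserving placement of keys or a search over many peers.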
Lookup vs. Searching
Distributed Hash Tables (DHT): look up objects by addressing them with their unique name (cf. URLs in the web)
- Pros: each object is uniquely identifiable; object location can be made efficient
- Cons: need to know the unique name; need to maintain the structure required by the addresses
Traditional P2P file-sharing networks: find objects by searching with keywords that match the object's description (cf. Google)
- Pros: no need to know unique names
- Cons: hard to make efficient
Metrics, Searching and Addressing
Probability of success
- Structured protocols guarantee results if the target exists (assuming the absence of malicious peers)
- Unstructured protocols require an exhaustive search to guarantee results
Protocol metrics
- Average number of messages per node
- Number of visited nodes
- Peak number of messages
- Congestion
Quality of results
- Completeness (are all results returned?)
- Correctness (are all returned entries valid?)
- Operation latency (time needed to resolve the query)
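Once the set of matching entries is known, the result-quality metrics can be computed. This sketch assumes completeness = returned matches / all matches and correctness = returned matches / all returned entries (i.e. recall and precision). The function names are hypothetical. Load is summarized as average and peak messages per node.

```python
def result_quality(returned, matching):
    """Completeness and correctness of a query result (assumed definitions).

    returned: entries the protocol delivered
    matching: all entries in the network that truly match the query
    """
    returned, matching = set(returned), set(matching)
    hits = returned & matching
    completeness = len(hits) / len(matching) if matching else 1.0
    correctness = len(hits) / len(returned) if returned else 1.0
    return completeness, correctness


def message_load(messages_per_node):
    """Average and peak number of messages handled per node."""
    counts = list(messages_per_node.values())
    return sum(counts) / len(counts), max(counts)


print(result_quality({"a", "b", "x"}, {"a", "b", "c", "d"}))  # (0.5, 0.6666666666666666)
print(message_load({"A": 2, "B": 4, "C": 0}))                 # (2.0, 4)
```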
3 Evolution of Internet Computing Paradigms
1st generation (since the beginning of the Internet): "World Wide Access"
- permanent IP addresses, always connected
- static domain name system (DNS) mapping
- limited specialized applications; protocols: Telnet, FTP, Gopher, ...
2nd generation (since the 90s): "World Wide Web"
- WWW & graphical browsers
- dynamic IP addresses / NAT / firewalls
- heterogeneous applications, asymmetric server-based services
- protocol: HTTP, ...
Evolution of Internet Computing Paradigms (2)
3rd generation (since 2000)
- more collaboration and personalized applications
- powerful edge devices (peers), instant networking
- protocols/applications: Napster, Gnutella, eMule/eDonkey/MLDonkey, FastTrack (KaZaA), Freenet, ..., Chord, ...
4th generation (since 2010): cloud computing
- centralized provision of all resources
- users access only as clients
5th generation (?)
- P2P cloud services?
- All resources provided by user devices?
Client / Server Model vs. P2P Technology
Situation: 1 server, n clients
Issue: e.g. on which server is the wanted information?
Solution: look it up on another server (or Google, which does this for you)
Advantages: reliable, well-known behavior
Drawbacks: the server needs to provide (almost) all resources
The client/server model is not P2P: communication takes place only between clients and the server, not among clients
Client / Server Model vs. P2P Technology
Client-Server (example: WWW)
1. The server is the central entity and the only provider of service and content; the network is managed by the server
2. The server is the higher-performance system
3. Clients are the lower-performance systems
Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. A peer is both provider and requestor (servent concept)
from R. Schollmeier and J. Eberspächer, TU München
GRID Computing vs. P2P Technology
Similar idea, similar concept as in P2P
- High-performance data processing centers are needed for scientific applications, but they are expensive to provide and often do not offer enough performance
- Solution: the GRID interconnects the existing data processing centers into a Virtual Organization and operates them as one distributed processing center (the GRID)
- www.gridforum.org, www.rechenkraft.net, www.ggf.org

                        | P2P                                            | GRID
History                 | Sharing MP3 files & illegal content            | Saving costs for data processing centers
Participation           | Voluntary                                      | By contract
Typical transfer volume | Small (MP3) to medium (video)                  | Huge (often terabytes)
Typical service         | File sharing                                   | Processing sharing
Typical problems        | Huge number of users causes scalability issues | Transferring huge amounts of data
Cloud Computing vs. P2P Technology
Cloud and P2P
- Access to a distributed pool of resources
- Resources: storage, bandwidth, computational power
Cloud computing
- Resource providers: companies
- Controlled environment: no malicious providers, no (or minimal) churn, homogeneous devices
- Selective centralized structures are OK: accounting, monitoring, single access point, centralized updates
P2P systems
- Resource providers: user devices
- Uncontrolled environment: churn, malicious providers, heterogeneous devices
- Uncertainty / unpredictability
- Distributed access points
Essential Aspects of Cloud Computing
- On-demand self-service: resources (e.g., server time, network storage) are automatically provided to a customer when required
- Rapid elasticity: the underlying infrastructure is able to adapt to changing requirements (e.g., number of concurrent users), allowing dynamic up-/down-scaling
- Measured service: metering of resource and service consumption to provide elastic pricing and billing models, e.g., pay-per-use
- Resource pooling: resources are provided/assigned dynamically in a multi-tenant way
- Broad network access: capabilities are available worldwide over standard network mechanisms
[Figure: "What's in a Cloud?"]
P2P vs. Cloud Computing
Cloud Service Models
- Software as a Service (SaaS): provides applications and services that represent business functions and themselves make use of cloud platforms and infrastructures, e.g., Google Docs, Salesforce CRM
- Platform as a Service (PaaS): provides a platform for application and service development and hosting, e.g., Google App Engine, Windows Azure (Platform)
- Infrastructure as a Service (IaaS): provides storage, computing and network capabilities, e.g., Amazon S3, Amazon EC2, SQL Azure
Others vs. P2P Technology
Ad hoc networks
- No communication infrastructure available
- Nodes provide bandwidth for the common goal of enabling communication
- Main issue: routing
- More Hop2Hop than Peer2Peer
Web 2.0
- Users interact with a website
- User-to-user interaction via the website
Friend-of-a-friend
- Federation of personal web servers
- Linking to trusted friends
Distributed systems
- Transparent distribution: the distributed system acts like a local machine
- P2P is not transparent
Challenges for Quality of P2P Systems
- Distributed character: distributed solutions and overlays are needed
- Undefined scale: from tens to several millions of peers
- Peer fluctuation (churn): peers join / leave the system autonomously
- Peer heterogeneity: varying capacities and connectivity
- Static configurations are insufficient: they require the requirements to be known when the system is deployed, and updates are difficult
4 Success of P2P Networking
One of the newest buzzwords in networking is Peer-to-Peer (P2P). Is it only a hype?
- Initially, 40 million Napster users within 2 years
- Integrated into commercial systems, e.g., the Microsoft P2P SDK and the Advanced Networking Pack for Windows XP
- Open source, e.g., JXTA (Sun) with protocols & services
- Strong presence at international networking conferences
P2P Traffic
- P2P has been the major source of traffic since at least 2003: more than ~50% of overall Internet traffic is P2P traffic
- P2P generates most traffic in all regions
[Figure: P2P traffic in the Internet; P2P file-sharing traffic on backbones]
Source: http://www.ipoque.com/resources/internet-studies
Ipoque Internet Study 2008/2009
http://www.ipoque.com/resources/internet-studies/internet-study-2008_2009
Dominant P2P Applications
2003: Sandvine study
- In Europe (France, Germany, ...): eDonkey/eMule predominant
- In the USA: KaZaA/FastTrack predominant
2005
- BitTorrent is the most successful P2P file-sharing application
- Skype dominates in IP telephony
- KaZaA is becoming more and more irrelevant
- eDonkey is largely replaced by eMule, which uses an extended but compatible protocol
2009
- Wuala: P2P-based storage service
- KaZaA and eMule are almost dead
- PPLive: P2P-based video streaming platform, mostly used in Asia
- BBC iPlayer: http://www.bbc.co.uk/iplayer/
- Vuze (formerly Azureus): P2P-based video-on-demand platform
2012 and beyond: distributed social communities
- Cloud-like, e.g. p2pcloud
- Social, e.g. LifeSocial, PeerSon
Enabling Effects
File sharing: highly attractive and cheap content
- Users share their content with other users
- Attractive content
- Copyrights are usually not respected (problem!)
- Cheap content
Publishing: exploding amount of data
- 2 x 10^18 bytes are produced per year
- 3 x 10^12 bytes are published per year
- Only 1.3 x 10^8 websites are indexed by search engines like Google
- See Gong: JXTA: A Network Programming Environment, IEEE Internet Computing, 2001
Unused resources at the edges
- More processing power, memory, bandwidth and storage available: a 1 TB hard disk for letters? 100 Mbit/s for sending emails?
- New compression mechanisms (MP3, MPEG) are no problem for CPUs
- Example: a small-medium enterprise (SME) with 100 desktop computers
  - spare storage space: 100 x 1 TB = 100 TB
  - spare processing power: 100 x 2 x 2 GHz x 5 ops/cycle = 2 trillion ops/sec
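The SME estimate above can be reproduced as a short calculation. The slide does not say what the factor 2 stands for. Here it is read as 2 CPU cores per desktop, which is an assumption.

```python
desktops = 100
disk_tb = 1              # spare disk per desktop, in TB
cores = 2                # assumption: the factor "2" is the number of CPU cores
clock_hz = 2e9           # 2 GHz
ops_per_cycle = 5

spare_storage_tb = desktops * disk_tb
spare_ops_per_sec = desktops * cores * clock_hz * ops_per_cycle

print(spare_storage_tb, spare_ops_per_sec)  # 100 2000000000000.0
```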
P2P in Business World
New services at the edge of the network
- P2P overlay networks make it relatively easy to deploy new services
Group collaboration: superior for business processes that
- grow organically, non-uniformly and highly dynamically
- are largely manual, ad hoc, iterative and document-intensive
- are often distributed, not centralized
- are not understood from beginning to end by any single person/organization
Cost effectiveness
- reduces centralized management resources
- optimizes computing, storage and communication resources
- rapid deployment
P2P applications/protocols tailored to users' needs
- Napster's success depended to a great extent on its ease of use