Using Peer to Peer Dynamic Querying in Grid Information Services



Similar documents
An Introduction to Peer-to-Peer Networks

Varalakshmi.T #1, Arul Murugan.R #2 # Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam

Distributed Framework for Data Mining As a Service on Private Cloud

A Review on Efficient File Sharing in Clustered P2P System

RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT

Anonymous Communication in Peer-to-Peer Networks for Providing more Privacy and Security

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

p2p: systems and applications Internet Avanzado, QoS, Multimedia Carmen Guerrero

Information Searching Methods In P2P file-sharing systems

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

5. Peer-to-peer (P2P) networks

File sharing using IP-Multicast

Chord - A Distributed Hash Table

Raddad Al King, Abdelkader Hameurlain, Franck Morvan

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Peer-to-Peer Computing

PEER TO PEER FILE SHARING USING NETWORK CODING

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

ENABLING SEMANTIC SEARCH IN STRUCTURED P2P NETWORKS VIA DISTRIBUTED DATABASES AND WEB SERVICES

A PROXIMITY-AWARE INTEREST-CLUSTERED P2P FILE SHARING SYSTEM

LOAD BALANCING WITH PARTIAL KNOWLEDGE OF SYSTEM

SuperViz: An Interactive Visualization of Super-Peer P2P Network

Security in Structured P2P Systems

Adapting Distributed Hash Tables for Mobile Ad Hoc Networks

Improving Query Processing Performance in Large Distributed Database Management Systems

A P2P SERVICE DISCOVERY STRATEGY BASED ON CONTENT

Using for Distributed Ensemble Learning

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

Ant-based Load Balancing Algorithm in Structured P2P Systems

XML Data Integration in OGSA Grids

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution

Chord. A scalable peer-to-peer look-up protocol for internet applications

Evolution of Peer-to-Peer Systems

Architectures and protocols in Peer-to-Peer networks

PSON: A Scalable Peer-to-Peer File Sharing System Supporting Complex Queries

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Load Balancing in Structured Overlay Networks. Tallat M. Shafaat

PEER-TO-PEER (P2P) systems have emerged as an appealing

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems

Distributed Hash Tables in P2P Systems - A literary survey

Decentralized supplementary services for Voice-over-IP telephony

Trust based Peer-to-Peer System for Secure Data Transmission ABSTRACT:

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

An Efficient Hybrid P2P MMOG Cloud Architecture for Dynamic Load Management. Ginhung Wang, Kuochen Wang

A Comparative Study of the DNS Design with DHT-Based Alternatives

Analysis on Leveraging social networks for p2p content-based file sharing in disconnected manets

8 Conclusion and Future Work

RVS-Seminar Overlay Multicast Quality of Service and Content Addressable Network (CAN)

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*

Enhance UDDI and Design Peer-to-Peer Network for UDDI to Realize Decentralized Web Service Discovery

Naming vs. Locating Entities

A distributed system is defined as

Using Content-Addressable Networks for Load Balancing in Desktop Grids (Extended Version)

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD

International Journal of Advance Research in Computer Science and Management Studies

Software Concepts. Uniprocessor Operating Systems. System software structures. CIS 505: Software Systems Architectures of Distributed Systems

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

A Reputation Management System in Structured Peer-to-Peer Networks

KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery

Mapping the Gnutella Network: Macroscopic Properties of Large-Scale Peer-to-Peer Systems

Load Balancing in Structured Peer to Peer Systems

Load Balancing in Structured Peer to Peer Systems

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

The p2pweb model: a glue for the Web

D1.1 Service Discovery system: Load balancing mechanisms

Performance Analysis of Cloud-Based Applications

P2PCloud-W: A Novel P2PCloud Workflow Management Architecture Based on Petri Net

CSIS CSIS 3230 Spring Networking, its all about the apps! Apps on the Edge. Application Architectures. Pure P2P Architecture

International Journal of Advance Foundation and Research in Computer (IJAFRC) Volume 2, Special Issue (NCRTIT 2015), January 2015.

Patterns of Information Management

Acknowledgements. Peer to Peer File Storage Systems. Target Uses. P2P File Systems CS 699. Serving data with inexpensive hosts:

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Multicast vs. P2P for content distribution

Sanjeev Kumar. contribute

IEEE/ACM TRANSACTIONS ON NETWORKING 1

XOP: Sharing XML Data Objects through Peer-to-Peer Networks

Transcription:

Using Peer to Peer Dynamic Querying in Grid Information Services Domenico Talia and Paolo Trunfio DEIS University of Calabria HPC 2008 July 2, 2008 Cetraro, Italy

Using P2P for Large scale Grid Information Services Information services are key components in Grid systems to enable effective discovery and usage of distributed resources and services. Today Grid information services are based on hierarchical solutions, which can suffer limited scalability and reliability as the size of the system increases (e.g. large scale Grids, massive clouds and pervasive service oriented systems). P2P architectures have been proposed as an alternative to hierarchical solutions to implement more scalable and reliable Grid information services: Unstructured P2P systems Structured P2P systems July 2, 2008 HPC 2008, Cetraro, Italy 2

Objective of this work (1/2) Distributed Hash Tables (DHTs) allow scalable resource discovery in structured P2P networks but do not support some types of queries (e.g., regular expressions) and cannot index dynamic information. On the other hand, supporting arbitrary queries in Grids is very important: t In many scenarios resources must be dynamically located and selected on the basis of complex criteria or semantic features. We designed a way to support arbitrary queries in structured P2P networks by implementing unstructured search techniques on top of DHT based overlays. July 2, 2008 HPC 2008, Cetraro, Italy 3

Objective of this work (2/2) We followed this approach by designing DQ DHT: an algorithm that combines the dynamic querying (DQ) strategy used in unstructured P2P systems, with an efficient algorithm for broadcast over a DHT. The aim of DQ DHT is two fold: allowing arbitrary queries in structured networks providing dynamic adaptation of the search according to the popularity of the resource to be located. We evaluated DQ DHT through simulations and experimented a prototype on a Grid testbed (Grid 5000). July 2, 2008 HPC 2008, Cetraro, Italy 4

DQ in unstructured networks Goal: controlling the query propagation on the basis of the desired number of results and the popularity of the resource to be located. Probe phase: The search initiator sends the query towards a few neighbors eg with a small TTL to estimate the epopularity of the resource. If the desired number of results is not reached, an iterative process takes place to contact a new set of nodes. The process stops when the desired number of results is received, or all neighbors have been already queried. July 2, 2008 HPC 2008, Cetraro, Italy 5 European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies 5

DQ in structured networks Similarly to DQ, DQ DHT dynamically adapts the search extent t on the basis of the desired d number of results and the popularity of the resource to be located. Differently from DQ, DQ DHT exploits the structural constraints of the DHT to avoid message duplications. DQ DHT is based on an algorithm for broadcast over a Chord DHT (El Ansary et al.) that allows to distribute a data item to all nodes in the network with N 1 messages in O(log 2 N) steps. July 2, 2008 HPC 2008, Cetraro, Italy 6 European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies 6

Broadcast over a Chord DHT Chord ring with m = 4 and N = 16 14 15 1 5 15 0 2 6 13 D 0 14 0 1 3 7 15 D 2 1 2 4 8 0 D 2 1 2 3 5 9 2 3 4 6 10 D 4 3 Broadcast data item D 4 5 7 11 3 Spanning tree view: 2 4 6 10 13 14 0 4 12 D14 D14 D 2 D 2 D10 D 6 D 6 4 5 6 8 12 5 7 8 11 12 14 12 13 15 3 11 D12 11 12 14 2 10 (El-AnsaryA et al.) 10 11 13 1 9 m steps N 1 messages D10 8 9 10 12 0 D10 7 8 9 11 15 D 8 6 5 7 8 10 14 6 7 9 13 9 13 15 0 binomial tree July 2, 2008 HPC 2008, Cetraro, Italy 7 1

Properties of the broadcast spanning tree initiator fingers: 1 2 3 4 Binomial tree B i : B 0 B 1 # nodes in B i = 2 i Depth of B i = i # nodes at level l in B i = i i l l B 2 B 3 Given the binomial trees properties, DQ DHT can estimate with good approximation the number of nodes at the different levels of the subtrees associated to each finger (neighbor) of the querying node. July 2, 2008 HPC 2008, Cetraro, Italy 8

DQ DHT algorithm Node initiating a search: 1. Sends the query to a given subset of its fingers waits for a given amount of time 2. While (# results < # desired_results) results) and (more fingers to contact): 1. calculates the item popularity 2. chooses a new subset of fingers to contact 3. sends the query to those fingers 4. waits for a given amount of time. Node receiving ing a query: 1. Processes the query locally against data it stores in case of match it replies to the query initiator 2. Forwards the query to the portion of the spanning tree it is responsible for, using the standard broadcast algorithm. July 2, 2008 HPC 2008, Cetraro, Italy 9

Performance analysis: simulation settings Network parameters: Identifier space: 32 bit No. of nodes (N) = 50000 Resource replication rate (r): 0.25% to 32% Algorithm parameters: Set of fingers to visit during the probe query (V) No. of levels from which to wait a response during the probe phase, before to evaluate the resource popularity (L): 2 to 8 Query parameter: Desired number of results (R d ): 100 Performance metrics: Number of messages Search time to obtain R d results July 2, 2008 HPC 2008, Cetraro, Italy 10

DQ DHT: choosing the value of V As expected, number of messages and search time decrease as r increases, for any value of V. With low values of r (rare resources) it is convenient to start the search by contacting a finger rooting a large spanning tree (e.g., V={F 14 }), which leads to lower search times. When the value of r is unknown, it is convenient to start the search with an intermediate finger (e.g., V={F 11 }), which produces a good balance between search time and number of messages. July 2, 2008 HPC 2008, Cetraro, Italy 11

DQ DHT: choosing the value of L Lower values of L generate lower search times, since the waiting time after the probe phase is proportional to L. On the other hand, if L is too low, the resource popularity p cannot be estimated in accurate way and a large number of messages will be sent. In general, intermediate values of L (e.g., L = 4 for V={F 11 }) produce the best compromise between number of messages and search time. July 2, 2008 HPC 2008, Cetraro, Italy 12

DQ DHT: comparison with DQ (1/2) Performance metrics: Number of messages Search time Success rate Duplication rate DQ DHT parameters: V and L: 1. V = {F 14 } and L = 5 2. V = {F 11 } and L = 4 DQ parameters: n (# neighbors contacted during the probe phase) and t (time to live used to propagate the query during the probe phase): 1. n = 3 and t = 2 2. n = 3 and t = 3 July 2, 2008 HPC 2008, Cetraro, Italy 13

DQ DHT: comparison with DQ (2/2) July 2, 2008 HPC 2008, Cetraro, Italy 14

Using DQ DHT in a real Grid scenario (1/2) Evaluating the use of DQ DHT in a Grid scenario *: Superpeer architecture: DQ DHT DHT used for distributing queries among Superpeers Client Server communication among Peers and Superpeers Implemented in Java using Open Chord * H. Papadakis, P. Trunfio, D. Talia, P. Fragopoulou. Design and Implementation of a Hybrid P2Pbased Grid Resource Discovery System. In: Making Grids Work, Springer, 2008. July 2, 2008 HPC 2008, Cetraro, Italy 15

Using DQ DHT in a real Grid scenario (2/2) Preliminary performance results on Grid 5000: over 400 CPUs across 4 sites 5000 Superpeers 50 Peers per Superpeer R d =100 desired # results R d R d=10 R d =100 R d =1000 r=1.0 r=0.1 r=0.01 r=0.001 replication rate 450 ms # messages 10 ms Replication rate (%) July 2, 2008 HPC 2008, Cetraro, Italy 16

Concluding remarks In a pervasive scenario such as that of the Internet of services and the Internet of things, scalable information services are vital. Centralized or hierarchical approaches can be a bottleneck in such scenarios. The same occurs in large scale dynamic distributed systems like Grids, Clouds and P2P systems: efficient service and/or resource discovery requires a scalable information service. Structured P2P models can provide effective solutions, but they must be flexible in handling dynamic service discovery. July 2, 2008 HPC 2008, Cetraro, Italy 17

Concluding remarks Combining structured P2P networks with unstructured P2P search techniques is an effective way to support arbitrary queries in distributed systems like Grids. We experimented the use of DQ DHT as basic querying algorithm for the prototype of a Grid information service. Simulation and experimental results (on the Grid 5000 testbed) show the effectiveness of the approach. As future work we will study how to exploit the DQ DHT approach to support semantic service discovery in emerging distributed environments like massive Cloud systems. July 2, 2008 HPC 2008, Cetraro, Italy 18

Questions? Thank you July 2, 2008 HPC 2008, Cetraro, Italy 19