A Reputation Management System in Structured Peer-to-Peer Networks

So Young Lee, O-Hoon Kwon, Jong Kim and Sung Je Hong
Dept. of Computer Science & Engineering, Pohang University of Science and Technology
Email: {soyoung, dolphin, jkim, sjhong}@postech.ac.kr

Abstract

Since there is no method to verify the trustworthiness of shared files in P2P systems, malicious peers can spread untrustworthy files to the system. In order to prevent untrustworthy files from spreading, we propose an effective reputation management system using peer reputation and file reputation together in DHT-based structured P2P networks. Simulation results show that the proposed reputation system works better in preventing untrustworthy files from spreading than existing systems, even when malicious peers are allowed to change their identities.

1. Introduction

In P2P systems, since no peer has the power or responsibility to monitor and restrain the others' behavior, there is no method to verify the trustworthiness of shared files. Therefore, malicious peers can spread untrustworthy files, such as fake files that misrepresent their contents and corrupted files that infect systems or leave back doors in systems with viruses like VBS.Gnutella [3] in Gnutella and W32.Supova.Worm [4] in Kazaa. In order to prevent these untrustworthy files from spreading, peers need to judge the reliability of other peers and shared files based on their experience and share the judgment with others using a reputation system.

A reputation system receives, aggregates, and provides feedback about participants' past behavior. The feedback helps participants decide whom to trust, encourages trustworthy behavior and deters dishonest people from participation [9]. A P2P reputation system is similar to a general reputation system, but there are some design considerations specific to P2P networks.

Change of Identity

In the real world, changing an identifier is not easy since it is strongly connected with the owner.
In P2P systems, however, the identifier of a peer has no relation to its owner and can be changed easily. For this reason, a participant who has a low reputation can change its identifier and rejoin the system as a newcomer. If the reputation information is recorded only against an easily changeable identifier, it is hard to prevent malicious peers from pretending innocence. In this paper we use the reputation information of files, which is more difficult to change than that of peers. By using the file reputation information, we can preclude a malicious peer from spreading an identical untrustworthy file again just by changing its identifier and rejoining the system.

Storage of the reputation information

An important consideration for a reputation management system is where to store the reputation information. There are two choices: local storage or global storage. A reputation system using local storage works as follows. Peer A stores its experience with peer B only in its local storage, and when others ask "What do you know about peer B?", it answers them based on its stored information. It is a very simple and traditional way for people to learn the reputation of others. But if you want to know someone more objectively, you must ask as many people as possible. The same applies to local storage: if a peer wants a more objective reputation of another peer, it must ask many peers, which generates a lot of messages in P2P systems. Also, if a peer is not on-line when the reputation information is needed, the information held by that peer is not reflected. So in this paper we propose to use global storage, where others can easily access reputation information and where it is still available when an evaluator is not on-line.

Integrity of the reputation information

The integrity of reputation information is the most important characteristic, since it is directly connected with the reliability of the system.
A reputation system based on P2P must guarantee integrity from two sides. First, the evaluation itself must be reliable. The reputation system must prevent malicious peers from polluting it by giving a positive evaluation to an untrustworthy file or a negative evaluation to a trustworthy file. For this, the system needs to confirm whether the evaluator is trustworthy or not. Second, when an evaluated value is stored and retrieved, it must not be altered: malicious peers must be kept from modifying the reputation information to raise their own reputation or just to subvert the system.
To prevent this kind of malicious behavior, some replication scheme is needed.

In this paper, we propose a reputation management system in DHT-based structured P2P networks. The proposed reputation system has two characteristics. First, it uses file reputation information as well as peer reputation information. Second, the system uses global storage for reputation information. Although there are many application areas for P2P, such as instant messaging and distributed computing, we consider only file sharing applications, which are the most popular.

This paper is organized as follows. We describe the proposed system model in Section 2 and the basic operations of the proposed system in Section 3. We then show the simulation results in Section 4. Related work and the differences from our work are discussed in Section 5. Finally, we summarize this paper and give concluding remarks in Section 6.

2. System Model

In this section, we describe the overall system model and some assumptions of our proposed reputation management system.

2.1. Storing Reputation Information

As mentioned earlier, we use global storage to store the reputation information. The global storage is not a physical store but a virtual one, partitioned into small parts that are stored at various participants. Every participant equally manages some part of the whole reputation information. The peer that takes care of a given piece of reputation information is determined by a hash function in O(1) time and found using a Distributed Hash Table (DHT) [11]. Others who need the information can also learn where it is stored by using the same hash function. We explain the proposed system based on Chord [11], which is a very widely cited DHT-based structured P2P network. Although we use Chord as the base architecture, the proposed reputation system can be applied to other DHT-based structured P2P networks.

2.2. Roles of Peer

Every peer equally participates in the following five roles in the proposed system: file provider, file consumer, file index manager, file reputation manager, and peer reputation manager. As in other P2P file sharing systems, all peers are both providers and consumers of files. Because we assume the use of DHT-based structured P2P networks, every peer is responsible for some part of the file index; we call this role the file index manager. These three are the basic roles of every participant in DHTs. For a reputation system, we add two roles, file reputation manager and peer reputation manager. The file reputation manager takes care of the reputation information of files, which are identified by their name and contents. In the proposed system, the file index manager and the file reputation manager are the same. Because just a reputation column is added to the existing file index table, our system can be applied to other DHTs without great difficulty. The peer reputation manager stores and manages other peers' reputation information.

2.3. Identifiers

Every peer who takes part in the system has a unique m-bit identifier ID_peer, which is the hash value of the peer's IP address or the digest of a public key. In the proposed system, ID_peer is used to determine the file indexes and file reputation information that the peer must take care of. Also, the hash of ID_peer, Hash(ID_peer) (a double hash, since ID_peer is itself a hash value), points to the peer that knows the peer reputation information of ID_peer; this peer is known as the reputation manager of ID_peer.

Each shared file has two identifiers: a key identifier ID_key determined by its name and a content identifier ID_content determined by its contents. In DHT-based P2P networks, ID_key, which is generated by hashing the file name, is in the same name space as ID_peer. We use an additional identifier ID_content, which is generated by hashing the file's content and is unique to that content. If two files have the same ID_content, they are identical.

2.4. Repositories

Every peer has two separate repositories for its manager roles. One is the file repository, which stores reputations of files, and the other is the peer repository, which stores reputations of the peers that it manages. To manage file reputation information, we add a file reputation field and an ID_content field to the basic file index table of the DHT. The file repository is organized as a table with attributes (ID_key, ID_content, file reputation (positive, negative), file owners, description). Files that have the same key identifier but a different content identifier are treated as different files. We use two keys, the content identifier and the key identifier, to access the file repository. File reputation is the accumulated value in the row identified by ID_key and ID_content. It consists of two values: positive reputation, which is the number of evaluations stating the file is trustworthy, and negative reputation, which is the number of evaluations stating the file is not trustworthy. If two files have the same ID_key and ID_content, the files share the same file reputation. The table also maintains the list of owners (the ID_peer values of the peers that have the file) and optional file descriptions such as file name and file size.
Table 1. File Repository

ID_key   ID_content   File Rep (+,-)   Owners     Description
K8       F1           (30,2)           N5, N40    Music3
K10      F4           (45,0)           N20, N3    Music1
K10      F10          (10,50)          N19, N41   Music1

Table 1 shows the file repository of the peer whose ID_peer is N10. Unlike in the original DHT-based system, files F4 and F10, which have the same ID_key K10, occupy different rows. The file whose ID_content is F4 has a positive reputation of 45 and a negative reputation of 0, which means the file is trustworthy. But file F10, which has the same ID_key but a different ID_content, has a positive reputation of 10 and a negative reputation of 50, and so it is not trustworthy. Because the file index manager and reputation manager are the same, we can learn the reputation information of a file just by searching for the wanted file.

Table 2. Peer Repository

H(ID_peer)   Positive   Negative
N13          5          50

The peer repository is also organized as a table to store peer reputation data. Peer reputation is the summation of the reputation information of the files that the peer has provided. Like file reputation, it consists of positive reputation and negative reputation. The positive value is the number of trustworthy files the peer has provided and the negative value is the number of untrustworthy files it has provided. Table 2 shows the peer repository entry for peer N13. We can see that peer N13 provided more untrustworthy files than trustworthy ones, so this peer is not trustworthy. To prevent repositories from growing large, each manager deletes the rows that are not referenced or updated within a pre-determined interval (e.g., 1 month).

3. Basic Operations

The proposed system consists of the following basic operations: Join and Publish, Query and Response, Download and Evaluation, and Update Repositories.

3.1. Join and Publish

When a peer joins the system, a peer identifier ID_peer is assigned. The shared files also get two identifiers, ID_key and ID_content.
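As a minimal sketch of how the identifiers of Section 2.3 could be derived, the snippet below hashes a peer address, a file name and a file's contents into the same m-bit space. The choice of SHA-1 (the base hash function used by Chord), the value m = 160, and all function names are assumptions made for illustration; the paper does not fix any of them.

```python
import hashlib

M_BITS = 160  # assumed identifier length m; the paper leaves m unspecified


def hash_id(data: bytes, m_bits: int = M_BITS) -> int:
    """Map arbitrary bytes to an m-bit identifier (SHA-1 is an assumption)."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") % (1 << m_bits)


def peer_id(ip_address: str) -> int:
    """ID_peer: hash of the peer's IP address."""
    return hash_id(ip_address.encode("utf-8"))


def key_id(file_name: str) -> int:
    """ID_key: hash of the file name, in the same name space as ID_peer."""
    return hash_id(file_name.encode("utf-8"))


def content_id(content: bytes) -> int:
    """ID_content: hash of the file contents."""
    return hash_id(content)


def peer_reputation_key(id_peer: int) -> int:
    """Hash(ID_peer): the key that locates the peer reputation manager."""
    n_bytes = (M_BITS + 7) // 8
    return hash_id(id_peer.to_bytes(n_bytes, "big"))
```

Under these assumptions, two files named Music1 with different contents get the same ID_key but different ID_content values, which is exactly the case the file repository keeps in separate rows.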
A peer publishes its files by sending publish messages to the file reputation manager: publish (ID_key, ID_content, ID_peer, description). The file reputation manager that receives the publish message updates its file repository. If its repository does not contain information about the published file, it adds a new row to its repository and assigns the initial reputation values, positive value 0 and negative value 0. If the manager already has information about the file, it just adds the ID_peer value to the file owners.

As shown in Figure 1, peers N10 and N20 both publish a file whose name is Music1 but whose contents are different. Because the two files have the same name, they are assigned the same key identifier and published to the same file reputation manager N3. File reputation manager N3 updates its repository using the received messages. The file of N10, with key identifier K3 and content identifier F10, is a newly appeared file because no entry matches its two identifiers. So N3 adds a new row with reputation value (0,0) and file owner N10. The file of N20, with key identifier K3 and content identifier F6, is an already known file: its reputation is positive 30, negative 2, and another peer, N7, also shares the identical file. N3 just adds N20 to the file owner column.

3.2. Query and Response

A peer that wants to find a file sends a query message to the file reputation manager: query (ID_key). The file reputation manager that receives the query message retrieves from its repository the candidates that have the same ID_key as the query. The candidates are classified into the following three levels by their file reputation: trustworthy, unknown, untrustworthy. The reputation level of each file is decided by the following two conditions.

\[ \mathit{Positive} + \mathit{Negative} > T \tag{1} \]

\[ \frac{\mathit{Positive}}{\mathit{Positive} + \mathit{Negative}} > P \tag{2} \]

T and P are system-wide parameters. T is the threshold on the minimum number of evaluations and P is the threshold on the trust ratio.
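The publish handling and the two conditions can be sketched as follows. This is an illustrative sketch only: the class and function names are hypothetical, T = 10 is an assumed value (the paper does not give one), and P = 0.8 is the value assumed in the Figure 1 example. A file that fails Condition (1) is labeled unknown, one that passes both conditions is labeled trustworthy, and one that passes only Condition (1) is labeled untrustworthy.

```python
from dataclasses import dataclass, field

T = 10   # assumed threshold on the number of evaluations (not given in the paper)
P = 0.8  # trust-ratio threshold; value assumed in the Figure 1 example


@dataclass
class FileEntry:
    positive: float = 0.0
    negative: float = 0.0
    owners: set = field(default_factory=set)
    description: str = ""


def classify(positive: float, negative: float, t: float = T, p: float = P) -> str:
    """Return the reputation level given by Conditions (1) and (2)."""
    total = positive + negative
    if not total > t:            # Condition (1) fails: too few evaluations
        return "unknown"
    if positive / total > p:     # Condition (2) holds
        return "trustworthy"
    return "untrustworthy"


class FileRepository:
    """File repository of one manager, keyed by (ID_key, ID_content)."""

    def __init__(self) -> None:
        self.rows = {}

    def publish(self, id_key, id_content, id_peer, description=""):
        # A new (ID_key, ID_content) pair gets a fresh row with reputation (0, 0);
        # a known pair only gains one more owner.
        entry = self.rows.setdefault(
            (id_key, id_content), FileEntry(description=description)
        )
        entry.owners.add(id_peer)
        return entry
```

With these assumed parameters, the Figure 1 file F6 with reputation (30,2) is classified as trustworthy, a newly published file with reputation (0,0) as unknown, and a file with reputation (10,50) as untrustworthy.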
Files that do not satisfy Condition (1) are classified as unknown. They are considered new: we do not have enough reputation information on the file, so we cannot decide whether it is trustworthy or not. Files that satisfy Conditions (1) and (2) are classified as trustworthy. These files have been evaluated enough and are considered trustworthy by the evaluations. Files that satisfy Condition (1) but do not satisfy Condition (2) are classified as untrustworthy. File reputation managers include only trustworthy and unknown files in the response message; untrustworthy files are left out.

In Figure 1, if N3 receives a query to search for K3, it searches its repository and finds that three different files with ID_content of F6, F7 and F10 exist. Among them, it includes file F10, which does not have enough reputation, and file F6, which satisfies Conditions (1) and (2) (e.g., we assume P = 0.8). The file reputation manager sends the response message to the requester: Response (list of ID_key, ID_content, file owners, level, description). By excluding untrustworthy files from the response, their existence in the system is hidden.

[Figure 1. Join and Publish]

3.3. Download and Evaluation

The peer that receives the Response selects one of the listed files. If it selects a file whose level is trustworthy, it randomly selects one of the file's providers to avoid bottlenecks on highly reputable peers. If instead it wants to select a file whose level is unknown, it can verify the file by its owner's reputation. To keep malicious peers from slightly modifying untrustworthy files and republishing them as new ones, the peer queries the reputation of the owner of the unknown file. If the file is provided by several owners, the peer can take the average, the maximum or the minimum of the file owners' reputations. If the reputation of the file owner is low, the file is excluded from the selection.

After downloading and using the selected file, the peer evaluates its trustworthiness as positive or negative and sends the evaluation to the file reputation manager. If the file is trustworthy, the evaluation is +1; if not, it is -1. The evaluated value is also sent to the peer reputation manager. If the file gets +1, the peer that provided the file also gets +1. For performance, the selected file can be downloaded block by block from multiple owners.

3.4. Update Repositories

The file reputation manager and peer reputation manager that receive the evaluation data store it in their file and peer repositories. Malicious evaluators, however, can forge the reputation value by giving a positive evaluation to an untrustworthy file or a negative evaluation to a trustworthy file.
To prevent this, before updating the value, the managers confirm whether the evaluator is trustworthy or not. The managers hash the evaluator's ID_peer and query the peer reputation manager of the evaluator by sending a QueryPeer message: QueryPeer (Hash(ID_peer)). The peer reputation manager that receives the QueryPeer message searches its peer repository, calculates the level of peer reputation in the same way as for file reputation, and sends ResponsePeer (Hash(ID_peer), level) to the requester. Based on this ResponsePeer, the opinion of the evaluator is treated differently. For example, if the level of peer reputation is trustworthy, the manager updates the reputation value of the corresponding file and peer. If the level is untrustworthy, the value is not reflected in the repository. If the level is unknown, only a partial value, e.g., half of the evaluation value, is reflected. By reflecting the opinion of an unknown peer less than that of a trustworthy peer, we can reduce the probability of polluting the repositories.

4. Simulation Results

We have run simulations to show the benefits of using file and peer reputations together. In the first experiment, we compared the case of using peer reputation only with the case of using file and peer reputations together. Our second experiment compared those two cases when malicious peers change their identifiers periodically.

We start the simulation with n peers, and the percentage of malicious peers is 5% of total participants. Initially, every peer has s different kinds of files. Peers share a downloaded file again when the file and the peer are both untrustworthy or both trustworthy. We have assumed that (i) malicious peers only have untrustworthy files and give a negative reputation to trustworthy files and a positive reputation to untrustworthy files to subvert the system, and (ii) trustworthy peers only have trustworthy files and act correctly. The evaluation is reflected based on the reputation of the evaluator. In the simulation, if the level of peer reputation is trustworthy, the manager updates the reputation value of the corresponding file and peer. If the level is untrustworthy, the value is not reflected in the repository. Finally, if the level is unknown, only one-half of the evaluation value is reflected.

In the case of using file and peer reputations together, peers periodically send query messages to the file reputation managers. The file is determined to be trustworthy, unknown, or nonexistent. Since we exclude untrustworthy files from the response, they appear as nonexistent. If the file is trustworthy, the peer randomly selects one of the providers. Otherwise, the file is classified as unknown and the peer decides whether to download it by using the reputation of its providers. Untrustworthy providers are excluded from the selection and one of the trustworthy or unknown providers is selected.

In the case of using peer reputation only, the assumptions and steps are similar to the above, except for the use of file reputation. Peers do not use file reputation information and refer only to peer reputation.
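The evaluator-dependent update rule used in the simulation can be sketched as below. The weights follow the text (full value for a trustworthy evaluator, one-half for an unknown one, nothing for an untrustworthy one); the function and constant names are hypothetical, and storing fractional counts is an assumption about how the half value is recorded.

```python
# Share of an evaluation that is reflected, by the evaluator's reputation level.
EVALUATOR_WEIGHT = {"trustworthy": 1.0, "unknown": 0.5, "untrustworthy": 0.0}


def apply_evaluation(positive, negative, vote, evaluator_level):
    """Return the updated (positive, negative) pair after one evaluation.

    vote is +1 (file judged trustworthy) or -1 (file judged untrustworthy).
    """
    if vote not in (1, -1):
        raise ValueError("vote must be +1 or -1")
    weight = EVALUATOR_WEIGHT[evaluator_level]
    if vote > 0:
        return positive + weight, negative
    return positive, negative + weight
```

The same function can be applied to a row of the peer repository, since the evaluated value is sent to both the file reputation manager and the peer reputation manager of the provider.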
We use the rate of untrustworthy downloads among all downloads as the metric. Figure 2 shows the result of our first simulation. It compares the rate when using peer reputation only with the rate when using peer and file reputations together. The x-axis is the total number of downloads performed by peers, of both trustworthy and untrustworthy files, and the y-axis is the rate of untrustworthy downloads among the total downloads. During the first 10,000 downloads, files and peers are not yet well known, so the rates of downloading untrustworthy files are the same. After 10,000 downloads, the case that uses peer and file reputations together slightly reduces the rate.

[Figure 2. Peer reputation only vs. File reputation together. x-axis: number of downloads (1,000); y-axis: rate of downloading untrustworthy files (%); curves: with file reputation, only peer reputation.]

The benefit of file reputation is shown more clearly in the second simulation. In the second simulation, every condition is the same as in the first, except that malicious peers change their identities. After every 10,000 downloads, 50% of the malicious peers change their identity and pretend to be innocent. Figure 3 shows the result of the second simulation. Changing identity does not harm the system that uses peer and file reputations together, but the system that uses only peer reputation continuously suffers from untrustworthy files.

[Figure 3. When 50% of malicious peers change their identity at every 10,000 downloads. x-axis: number of downloads (1,000); y-axis: rate of downloading untrustworthy files (%); curves: with file reputation, only peer reputation.]

5. Related Works

There have been several studies on managing reputation in P2P networks. These studies can be classified as unstructured or structured according to the base architecture of the P2P network.
Because many famous P2P file sharing applications [2, 1] are implemented on unstructured P2P networks for practical reasons, most previous works [6, 7, 10] on reputation management systems are based on unstructured P2P networks. Among them, Xrep [6] is similar to ours in that it combines the reputations of peers and resources to recognize untrustworthy resources regardless of their provider. But it has several weak points. First, it creates too many messages and is not scalable, since it gathers opinions about resources and peers using a distributed polling algorithm. Second, it does not use the reputation information effectively, since it uses it only to verify a selection and not to make the selection. Third, it lacks a reliable method to verify the trustworthiness of voters.

Recently, several reputation systems for structured P2P networks have been proposed. EigenTrust [7] and PeerTrust [12] are reputation management systems in structured P2P networks such as CAN [8] and P-Grid [5], respectively. In EigenTrust, each peer has multiple score (reputation) managers that are determined effectively by using CAN characteristics. But since the trust values are stored in local storage, obtaining a unique global value from any starting position in the system requires contacting lots of neighbors, which creates high overhead. Also, the normalizing technique used in EigenTrust makes it impossible to distinguish between malicious peers and newly joined peers. PeerTrust also stores the trust data in a distributed manner using a DHT and uses a trust manager that is responsible for feedback submission and trust evaluation. But whereas PeerTrust just suggests an independent reputation system using a DHT, we present a more adaptable system that adds some columns to the existing DHT structure. Since both of them consider only peer reputation, they cannot stop malicious peers from escaping a bad reputation by changing their identities.

Our work differs from others in two aspects. First, we present detailed data structures and algorithms to manage reputation information in DHT-based structured P2P networks. Our system can be easily implemented using the index tables and routing protocols of DHTs.
Second, we use file reputation and peer reputation together, unlike EigenTrust [7] and PeerTrust [12]. Therefore, we can prevent untrustworthy files from spreading even when malicious peers are allowed to change their identities.

6. Conclusion and Future Work

We have presented an effective reputation management system using file reputation and peer reputation together in DHT-based structured P2P networks. The proposed system uses file reputation as well as peer reputation so that it can prevent untrustworthy files from spreading even when malicious peers are allowed to change their identities. The system uses DHTs in order to store and retrieve reputation information in a scalable and distributed manner. We showed that the proposed system works better in preventing untrustworthy files from spreading than existing systems, even when malicious peers are allowed to change their identities. But to prevent the loss of the reputation information, we need a firm replication scheme. To encourage the evaluators and resource providers, applying an incentive mechanism is very useful. Finally, to prove that our scheme is more efficient than others, we will perform more evaluations.

References

[1] Gnutella homepage. http://www.gnutella.com.
[2] Kazaa homepage. http://www.kazaa.com.
[3] Vbs.gnutella worm. http://securityresponse.symantec.com/avcenter/venc/data/vbs.gnutella.html.
[4] W32.supova worm. http://securityresponse.symantec.com/avcenter/venc/data/w32.supova.worm.html.
[5] K. Aberer. P-grid: A self-organizing access structure for p2p information systems. Proceedings of ACM Conference on Information and Knowledge Management (CIKM), 2001.
[6] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante. Reputation-based approach for choosing reliable resources in peer-to-peer networks. Proceedings of the 9th ACM Conference on Computer and Communications Security, 2002.
[7] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The eigentrust algorithm for reputation management in p2p networks. Proceedings of the 12th International World Wide Web Conference, May 2003.
[8] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content addressable network. Proceedings of the ACM 2001 SIGCOMM Conference, August 2001.
[9] P. Resnick, R. Zeckhauser, E. Friedman, and K. Kuwabara. Reputation systems. Communications of the ACM, 43(12):45-48, December 2000.
[10] A. Selcuk, E. Uzun, and M. Pariente. A reputation-based trust management system for p2p networks. Proceedings of the International Workshop on Global and Peer-to-Peer Computing, IEEE/ACM CCGRID, 2004.
[11] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable Peer-To-Peer lookup service for internet applications. Proceedings of the 2001 ACM SIGCOMM Conference, 2001.
[12] L. Xiong and L. Liu. Peertrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Transactions on Knowledge and Data Engineering, 16(7):843-857, July 2004.