Peer-to-Peer Data Management
Wolf-Tilo Balke, Sascha Tönnies
Institut für Informationssysteme, Technische Universität Braunschweig
4.1 Overview
1. Introduction
2. Content Searching in Peer-to-Peer Applications
   1. Problems in Peer-to-Peer Information Retrieval
   2. Related Work in Distributed Information Retrieval
3. Index Structures for Query Routing
   1. Distributed Hash Tables for Information Retrieval
   2. Routing Indexes for Information Retrieval
   3. Locality-Based Routing Indexes
4. Supporting Effective Information Retrieval
   1. Providing Collection-Wide Information
   2. Estimating the Document Overlap
   3. Prestructuring Collections with Taxonomies
5. Summary and Conclusion
4.1 What is IR?
- Information retrieval (IR) is the science of searching for documents, for information within documents, and for metadata about documents
- A user enters a query, i.e. an information need, into the system
- Several objects may match the query with different degrees of relevance
4.1 Representing Text
- How do we represent the complexities of language?
  - Computers don't understand documents or queries
- Simple, yet effective approach: bag of words
  - Treat all the words in a document as index terms for that document
  - Assign a weight to each term based on its importance
  - Disregard order, structure, meaning, etc. of the words
4.1 Representing Text
Example article:
McDonald's slims down spuds
Fast-food chain to reduce certain types of fat in its french fries with new cooking oil.
NEW YORK (CNN/Money) - McDonald's Corp. is cutting the amount of "bad" fat in its french fries nearly in half, the fast-food chain said Tuesday as it moves to make all its fried menu items healthier. But does that mean the popular shoestring fries won't taste the same? The company says no. "It's a win-win for our customers because they are getting the same great french-fry taste along with an even healthier nutrition profile," said Mike Roberts, president of McDonald's USA. But others are not so sure. McDonald's will not specifically discuss the kind of oil it plans to use, but at least one nutrition expert says playing with the formula could mean a different taste. Shares of Oak Brook, Ill.-based McDonald's (MCD: down $.54 to $23.22, Research, Estimates) were lower Tuesday afternoon. It was unclear Tuesday whether competitors Burger King and Wendy's International (WEN: down $.8 to $34.9, Research, Estimates) would follow suit. Neither company could immediately be reached for comment.
Bag of words (most frequent terms, in descending order of frequency): said, McDonalds, fat, fries, new, company, french, nutrition, food, oil, percent, reduce, taste, Tuesday
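The bag-of-words idea can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the tokenizer (lowercase alphabetic runs) is an assumption, and the stopword list is the small one used in the slides' vector example.

```python
import re
from collections import Counter

# Small stopword list from the slides' vector example; real systems use longer lists.
STOPWORDS = {"for", "is", "of", "the", "to"}

def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

bag = bag_of_words("Now is the time for all good men to come to the aid of their party.")
print(bag.most_common(3))
```

Word order and sentence structure are discarded on purpose; only the term counts survive.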
4.1 Retrieval
- Retrieving relevant information is hard!
  - Evolving, ambiguous user needs, context, etc.
  - Complexities of language
- To operationalize information retrieval, we must vastly simplify the picture
- Information retrieval is all (and only) about matching words in documents with words in queries
  - Obviously, not true
  - But it works pretty well!
4.1 Representing Documents as Vectors
- Document 1: The quick brown fox jumped over the lazy dog's back.
- Document 2: Now is the time for all good men to come to the aid of their party.
- Terms: aid, all, back, brown, come, dog, fox, good, jump, lazy, men, now, over, party, quick, their, time
- Stopword list: for, is, of, the, to
[Figure: table marking which of the terms occur in Document 1 and in Document 2]
4.1 Representing Text
- How to compare documents and queries?
[Figure: text processing pipeline from the document (text + structure) through structure recognition, handling of accents, spacing, etc., stopwords, noun groups, stemming, and automatic or manual indexing; outputs are the structure, the full text, and the index terms]
4.1 Boolean Retrieval
- Weights assigned to terms are either 0 or 1
  - 0 represents absence: the term isn't in the document
  - 1 represents presence: the term is in the document
- Build queries by combining terms with Boolean operators
  - AND, OR, NOT
- The system returns all documents that satisfy the query
4.1 Boolean View of a Document Set (= Collection)
[Figure: term-document matrix with rows for the terms aid, all, back, brown, come, dog, fox, good, jump, lazy, men, now, over, party, quick, their, time and columns Doc 1 to Doc 8]
- Each column represents the view of a particular document: what terms are contained in this document?
- Each row represents the view of a particular term: what documents contain this term?
- To execute a query, pick out the rows corresponding to the query terms and then apply the truth table of the corresponding Boolean operator
4.1 Sample Queries
[Figure: rows of the term-document matrix for the terms dog and fox, and for good, party and over, across Doc 1 to Doc 8]
- dog AND fox: Doc 3, Doc 5
- dog OR fox: Doc 3, Doc 5, Doc 7
- dog NOT fox: empty
- fox NOT dog: Doc 7
- good AND party: Doc 6, Doc 8
- good AND party NOT over: Doc 6
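The sample queries can be reproduced with plain set operations. This is a minimal sketch: the posting sets below are chosen to be consistent with the query results listed on the slide, and the variable names are illustrative.

```python
# Documents (by number) containing each term; consistent with the sample query results.
postings = {
    "dog": {3, 5},
    "fox": {3, 5, 7},
    "good": {2, 4, 6, 8},
    "party": {6, 8},
    "over": {1, 3, 5, 7, 8},
}

dog_and_fox = postings["dog"] & postings["fox"]        # AND is set intersection
dog_or_fox = postings["dog"] | postings["fox"]         # OR is set union
dog_not_fox = postings["dog"] - postings["fox"]        # NOT is set difference
fox_not_dog = postings["fox"] - postings["dog"]
good_and_party = postings["good"] & postings["party"]
good_and_party_not_over = good_and_party - postings["over"]

print(sorted(dog_and_fox), sorted(good_and_party_not_over))
```

Every document in a result set counts as an equally good match, which is exactly the limitation that ranked retrieval addresses.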
4.1 The Perfect Query Paradox
- Every information need has a perfect set of documents
  - If not, there would be no sense doing retrieval
- Every document set has a perfect query
  - AND every word in a document to get a query for it
  - Repeat for each document in the set
  - OR every document query to get the set query
- But can users realistically be expected to formulate this perfect query?
  - Boolean query formulation is hard!
4.1 Why Boolean Retrieval Fails
- Natural language is way more complex
- AND discovers nonexistent relationships
  - Terms in different sentences, paragraphs, ...
- Guessing terminology for OR is hard
  - good, nice, excellent, outstanding, awesome, ...
- Guessing terms to exclude is even harder!
  - Democratic party, party to a lawsuit, ...
4.1 Strengths and Weaknesses
- Strengths
  - Precise, if you have a clear idea of what you're looking for
  - Efficient for the computer
- Weaknesses
  - Users must learn Boolean logic
  - Boolean logic is insufficient to capture the richness of language
  - No control over the size of the result set: either too many documents or none
  - All documents in the result set are considered equally good
  - What about partial matches? Documents that don't quite match the query may also be useful
4.1 Ranked Retrieval
- Order documents by how likely they are to be relevant to the information need
  - Present hits one screen at a time
  - At any point, users can continue browsing through the ranked list or reformulate the query
- Attempts to retrieve relevant documents directly, not merely provide tools for doing so
4.1 Why Ranked Retrieval?
- Arranging documents by relevance is
  - Closer to how humans think: some documents are better than others
  - Closer to user behavior: users can decide when to stop reading
- Best (partial) match: documents need not have all query terms
  - Although documents with more query terms should be better
4.1 Similarity-Based Retrieval
- Let's replace relevance with similarity
  - Rank documents by their similarity with the query
- Treat the query as if it were a document
  - Create a query bag of words
- Find its similarity to each document
- Rank order the documents by similarity
- Surprisingly, this works pretty well!
4.1 Vector Space Model
[Figure: document vectors d1 to d5 in a space spanned by the term axes t1, t2 and t3, with the angles θ and φ between vectors]
- Postulate: documents that are close together in vector space talk about the same things
- Therefore, retrieve documents based on how close the document is to the query (i.e., similarity ~ "closeness")
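One common way to measure "closeness" in the vector space is the cosine of the angle between the query vector and a document vector. The slides do not fix a particular measure, so the following is a sketch under that assumption; the toy weights and document names are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return document names ordered by decreasing similarity to the query."""
    return sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)

# Hypothetical term weights, for illustration only.
docs = {
    "d1": {"dog": 2.0, "fox": 1.0},
    "d2": {"party": 1.0, "good": 1.0},
    "d3": {"dog": 1.0},
}
query = {"dog": 1.0}   # the query is treated like a very short document
ranking = rank(query, docs)
print(ranking)
```

Because the cosine ignores vector length, the short document d3 that is only about "dog" ranks above the longer d1.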
4.1 How to Weight Terms?
- Idea: Hans Peter Luhn, 1958, IBM
- Here's the intuition:
  - Terms that appear often in a document should get high weights
    - The more often a document contains the term "dog", the more likely that the document is about dogs.
  - Terms that appear in many documents should get low weights
    - Words like "the", "a", "of" appear in (nearly) all documents.
- How do we capture this mathematically?
  - Term frequency
  - Inverse document frequency
4.1 TFxIDF
- TFxIDF [Gerard Salton, 96]
- Term Frequency (TF)
  - How often a term appears in a document
  - Can be calculated locally
- Document Frequency (DF)
  - Number of documents that contain a specific term
  - Needs global (system-wide) knowledge
- Inverse Document Frequency (IDF)
  - Discriminator for the importance of a term with regard to its occurrences across all documents
  - Needs global (system-wide) knowledge
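A small sketch shows how the three quantities fit together. The weighting tf · log(N / df) used here is one common variant and an assumption on our part; the slides do not commit to a specific formula. The toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf * log(N / df) weights for every term of every document.

    docs maps a document name to its list of tokens.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))            # each document counts once per term
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)              # local knowledge only
        weights[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return weights

docs = {
    "d1": ["dog", "dog", "the"],
    "d2": ["fox", "the"],
    "d3": ["the", "party"],
}
w = tf_idf(docs)
print(w["d1"])
```

The term "the" occurs in every document and therefore gets weight 0, while the repeated term "dog" gets the highest weight. Note that `df` and `n` require knowledge of the whole collection, which is the crux in a P2P setting.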
4.1 Working on Indices
[Figure: term-document matrix with rows for the terms quick, brown, fox, over, lazy, dog, back, now, time, all, good, men, come, jump, aid, their, party and columns Doc 1 to Doc 8]
- The term-document matrix again holds the bag-of-words information about the collection
4.1 Small yet Fast?
- Can we make this data structure smaller, keeping in mind the need for fast retrieval?
- Observations:
  - The nature of the search problem requires us to quickly find which documents contain a term
  - The term-document matrix is very sparse
  - Some terms are more useful than others
4.1 Posting Lists (Inverted Document Index)
Term     Postings
aid      4, 8
all      2, 4, 6
back     1, 3, 7
brown    1, 3, 5, 7
come     2, 4, 6, 8
dog      3, 5
fox      3, 5, 7
good     2, 4, 6, 8
jump     3
lazy     1, 3, 5, 7
men      2, 4, 8
now      2, 6, 8
over     1, 3, 5, 7, 8
party    6, 8
quick    1, 3
their    1, 5, 7
time     2, 4, 6
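Building such an inverted index and intersecting two posting lists can be sketched as follows. The tiny document set is hypothetical and only meant to show the mechanics; it is not the eight-document collection of the slides.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document numbers that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)

def intersect(p1, p2):
    """Merge-style intersection of two sorted posting lists (an AND query)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical toy collection: document number -> tokens.
docs = {1: ["quick", "brown"], 3: ["dog", "fox", "quick"], 5: ["dog", "fox"], 7: ["fox"]}
index = build_inverted_index(docs)
print(index["fox"], intersect(index["dog"], index["fox"]))
```

Only the non-zero cells of the sparse term-document matrix are stored, and keeping the lists sorted allows the intersection to run in a single pass.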
4.1 What Goes in the Postings?
- Boolean retrieval
  - Just the document number
- Ranked retrieval
  - Document number and term weight (tf.idf, ...)
- Proximity operators
  - Word offsets for each occurrence of the term
4.2 Information Retrieval in P2P Systems
- Information retrieval deals with complex documents
  - Metadata can only capture some aspects of a document, but cannot anticipate all semantic searches
  - E.g. a sports-related newspaper article, but no names, locations, etc.
  - Support for full-text searches is needed
- Find the best-matching document from the best-connected peer
  - Unlike in file sharing, the emphasis is on document quality
  - If there are multiple sources offering documents of similar quality, choose the best peer according to connection, etc.
4.2 Challenges in P2P IR
- Efficient query evaluation scheme
  - A central inverted index of documents is expensive to maintain
  - How to disseminate a peer's query? Simple flooding of all queries is not scalable if the best documents have to be found (not just some match)
- Dealing with network churn
  - A peer can always alter the set of documents offered, or significantly change individual documents
  - Peers may join and leave the network, i.e. whole document collections may disappear or be added
- Integration of collection-wide information
  - Peers are not able to calculate IR-style scorings from local knowledge alone, but need some knowledge about the (virtual) merged collection
  - Constant dissemination of collection-wide information needs a lot of bandwidth
4.2 Example: Problem of Collection-Wide Information
- Example: different news collections, query on keyword "basketball"
- General news collection
  - Many articles, only few about basketball, therefore IDF high
  - Keyword discriminates well between articles
- NBA news collection
  - Few articles, almost all about basketball, therefore IDF low
  - Keyword hardly discriminates between articles
- Merged collection: IDF medium
- But how do independent collections (peers) exchange their information?
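The effect can be made concrete with a few made-up numbers. The collection sizes and document frequencies below are purely hypothetical, and IDF is taken as log(N / df), which is an assumed variant.

```python
import math

def idf(n_docs, df):
    """Inverse document frequency as log(N / df); one common variant."""
    return math.log(n_docs / df)

# Hypothetical (collection size, number of documents containing "basketball").
general = (10000, 50)    # large general news collection, few basketball articles
nba = (200, 190)         # small NBA collection, almost all about basketball
merged = (general[0] + nba[0], general[1] + nba[1])

idf_general = idf(*general)
idf_nba = idf(*nba)
idf_merged = idf(*merged)
print(round(idf_general, 2), round(idf_nba, 2), round(idf_merged, 2))
```

Each peer scoring with its own local value would weight the same keyword very differently, so scores from different peers are not comparable unless the merged statistics are made available.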
4.2 Example: Problem of Collection-Wide Information
[Figure: a querying peer sends the query "A and B" to Peer 1 and Peer 2. With local scoring, each peer computes TF and IDF values from its own collection only and reports its own top object; with global scoring, TF and IDF are computed over the merged collection and all objects are scored identically.]
4.2 Distributed IR
- Distributed information retrieval techniques grew increasingly important for searching Web sources
- Abstracts of information sources
  - To support distributed retrieval, sources have to register abstracts or keyword sets
  - Abstracts can either be kept in a central repository or distributed by gossiping algorithms, e.g. PlanetP [Cuenca-Acuna et al., 03]
- Collection selection
  - Without a central index, a sophisticated way of choosing the most promising collections for querying is needed
4.2 Distributed IR
- Such abstracts can be compactly represented by Bloom filters, i.e. bit vectors that allow membership queries
  - Each term is hashed with n different functions, and the position in the bit vector for each hash value is set to 1
  - Allows for false positives, but no false negatives
  - In counting Bloom filters, objects can also be removed
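A Bloom filter as described above can be sketched as follows. The vector length, the number of hash functions, and the way the n hash functions are derived (salting SHA-256 with the function number) are illustrative assumptions, not parameters given in the lecture.

```python
import hashlib

class BloomFilter:
    """Bit vector with n hash functions; supports add and membership tests."""

    def __init__(self, m=256, n_hashes=3):
        self.m = m                # length of the bit vector (assumed value)
        self.n_hashes = n_hashes  # number of hash functions (assumed value)
        self.bits = 0             # Python int used as the bit vector

    def _positions(self, term):
        # Derive n different hash functions by salting with the function number.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos   # set the bit for each hash value to 1

    def __contains__(self, term):
        # True for every added term; may also be True for others (false positive).
        return all((self.bits >> pos) & 1 for pos in self._positions(term))

bf = BloomFilter()
for term in ["dog", "fox", "party"]:
    bf.add(term)
print("dog" in bf, "fox" in bf)
```

A term that was added is always found, so there are no false negatives. Replacing each bit by a small counter gives the counting variant, in which removal decrements the counters.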
4.2 Distributed IR
- Benefit estimators for collection selection use aggregated statistics about individual collections, e.g. the CORI measure [Callan et al., 95]
- CORI calculates a collection score s_i for collection i regarding query q, where n is the number of collections, cdf the collection document frequency, cdf_max the maximum cdf, and cf_t the collection frequency of term t
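The formula itself is missing from the transcript. One formulation from the distributed IR literature that uses exactly the symbols named above is given here as a hedged reconstruction; the tuning parameters α and β are not mentioned on the slide and are assumptions.

```latex
s_i = \sum_{t \in q} \Big( \alpha + (1 - \alpha) \cdot T_{i,t} \cdot I_{i,t} \Big)
\qquad \text{with} \qquad
T_{i,t} = \beta + (1 - \beta) \cdot \frac{\log\left(cdf_{i,t} + 0.5\right)}{\log\left(cdf_i^{max} + 1.0\right)}
\qquad \text{and} \qquad
I_{i,t} = \frac{\log\left(\frac{n + 0.5}{cf_t}\right)}{\log\left(n + 1.0\right)}
```

Here T_{i,t} plays the role of a term frequency component for collection i, and I_{i,t} plays the role of an inverse document frequency computed over collections instead of documents.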
4.3 Index Structures for Query Routing
- Traditional index structures cannot be readily employed in P2P systems
  - High degree of distribution
  - High degree of volatility (churn)
  - High degree of index maintenance
- Distributed paradigms are needed to route queries to appropriate peers
  - Simple flooding does not scale
  - Distributed hash table lookup
  - Using indexed routing information
  - Using shortcut overlays
4.3 Distributed Hash Tables for IR
- Distributed hash tables
  - Route queries to appropriate peers with a number of hops logarithmic in the network size
  - No peer needs to maintain more than a logarithmic amount of routing information
- But
  - Exact-match queries only
  - All new content has to be published if peers join or change
  - Old content has to be unpublished if peers leave
  - Documents added or removed contain many different terms to be published or unpublished; thus, usually many index peers have to be addressed
  - A conjunction of query terms needs to access many peers, but there is still no guarantee that a single document matching the conjunction exists
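The publishing cost and the conjunction problem can be illustrated with a toy term index. This sketch replaces real DHT routing by plain modulo hashing, which is a deliberate simplification; the function names and the peer count are hypothetical.

```python
import hashlib

def responsible_peer(term, n_peers):
    """Map a term to its index peer (modulo hashing as a stand-in for a real DHT)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_peers

def publish(doc_id, terms, n_peers, index):
    """Publish every term of a document; returns the set of index peers contacted."""
    contacted = set()
    for term in set(terms):
        peer = responsible_peer(term, n_peers)
        index.setdefault(peer, {}).setdefault(term, set()).add(doc_id)
        contacted.add(peer)
    return contacted

def lookup_conjunction(terms, n_peers, index):
    """Fetch one posting set per term (one peer each) and intersect them."""
    result = None
    for term in terms:
        peer = responsible_peer(term, n_peers)
        docs = index.get(peer, {}).get(term, set())
        result = set(docs) if result is None else result & docs
    return result or set()

index = {}
publish("d1", ["dog", "fox", "lazy"], 16, index)
publish("d2", ["dog", "party"], 16, index)
hits = lookup_conjunction(["dog", "fox"], 16, index)
miss = lookup_conjunction(["fox", "party"], 16, index)
print(hits, miss)
```

Both lookups contact one index peer per query term, yet the second one returns nothing: each term exists somewhere, but no single document contains the conjunction.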
4.3 Distributed Hash Tables for IR
- Improvement: hybrid P2P infrastructures [Loo et al., 04]
  - The efficiency of DHTs is worst if highly replicated items are requested
    - Experiments show worse behavior than flooding, degrading with churn
  - Querying and content allocation follow a Zipf distribution
    - Only few highly replicated and often queried items
    - "People are looking for hay, not for needles" (S. Shenker)
  - Hybrid P2P infrastructures use DHTs only for the less replicated and rarely queried items; all other queries are flooded
  - Still, DHTs have to be maintained for the majority of query terms
[Figure: query frequency distribution, occurrence frequency (in %) per query]
4.3 Routing Indexes for IR
- Routing indexes are local collections of (key, peer) pairs
  - Key is either a keyword or a query
  - Peer is the address of a peer that either offers relevant results, or routes the query to other peers with relevant results
  - In contrast to flooding, only interesting directions are queried
  - A distinction is often made between links in the default network (directions of content providers) and an overlay structure of direct links to content providers ("shortcuts")
- First introduced by [Crespo & Garcia-Molina, 02] to choose the best neighbors in the default network for query forwarding
- Index maintenance is of local nature, and index coverage is usually high due to the Zipf distribution of requests
- Correctness of the index is influenced by network volatility/churn
4.3 Routing Indexes for IR
- Routing index policies in the face of network churn
- With restricted index sizes, new entries are collected and always stored; if the maximum size is reached, some stale information is replaced
  - A simple strategy always replaces the currently oldest index entries
  - A least recently used (LRU) strategy assigns higher usefulness to entries that have been successfully used recently
  - The optimal index size is a problematic parameter
- Indexes with unrestricted size have to combat network churn differently
  - A "time to live" assigns an expiry time to each new index entry
  - "Forgetting factors" can periodically weigh down the reliability of link information
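The two policies can be combined in a small data structure. The sketch below is an illustration under assumptions: the class name, the explicit `now` parameter, and the concrete size and lifetime values are all hypothetical.

```python
from collections import OrderedDict

class RoutingIndex:
    """(key, peer) index with LRU replacement and a time to live per entry."""

    def __init__(self, max_size, ttl):
        self.max_size = max_size
        self.ttl = ttl
        self.entries = OrderedDict()   # key -> (peer, expiry time), oldest use first

    def put(self, key, peer, now):
        if key in self.entries:
            self.entries.pop(key)
        elif len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)    # evict the least recently used entry
        self.entries[key] = (peer, now + self.ttl)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        peer, expiry = entry
        if now >= expiry:                       # entry has outlived its time to live
            del self.entries[key]
            return None
        self.entries.move_to_end(key)           # successful use makes it "recent"
        return peer

idx = RoutingIndex(max_size=2, ttl=10.0)
idx.put("dog", "P1", now=0.0)
idx.put("fox", "P2", now=1.0)
first = idx.get("dog", now=2.0)     # "dog" becomes the most recently used entry
idx.put("party", "P3", now=3.0)     # index is full, so "fox" is evicted
print(first, idx.get("fox", now=4.0))
```

A forgetting factor could be added in the same style by storing a reliability weight per entry and multiplying it by a constant below 1 at regular intervals.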
4.3 An Algorithm for Correct Query Routing
- Goal: progressive distributed top-k ranking of documents
- Putting techniques together to design an efficient top-k algorithm
  - Minimal number of object transfers
  - Optimal number of object accesses
- Features of the P2P-based approach
  - Optimized query routing
  - No global index
  - Query-driven term indexing
4.3 Bird's View
1. Distribute the query through the network (Routing)
2. Every peer scores documents locally (Ranking)
3. Hierarchical construction of the final result (Merging)
4. Optimized query routing (Index)
4.3 Building Blocks
- Structured network
- Local ranking
- Result merging
- Query-driven index
4.3 Network Structure
- Observation: peers strongly differ in availability, bandwidth, computing power, ...
- Hierarchical network structure with super-peers
  - Query routing
  - Result merging
  - Indexes
4.3 Network Topology
- Super-peers as hypercube (HyperCuP protocol)
  - Resilient against leaving peers
  - Broadcast with (n-1) messages and log_2(n) hops
  - Minimal spanning tree
[Figure: hypercube of the super-peers SP 1 to SP 8 and the corresponding broadcast spanning tree]
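The message and hop counts can be checked with a small simulation of a dimension-ordered hypercube broadcast. This is the textbook scheme, assumed here to correspond to the HyperCuP broadcast in spirit; node numbering and function names are illustrative.

```python
def hypercube_broadcast(dim, source=0):
    """Simulate a broadcast in a hypercube with 2**dim nodes.

    A node that received the message along dimension i forwards it only
    along dimensions greater than i. Returns (messages, max hops, nodes reached).
    """
    messages = 0
    max_hops = 0
    reached = {source}
    frontier = [(source, -1, 0)]   # (node, dimension the message arrived on, hops)
    while frontier:
        node, arrived_dim, hops = frontier.pop()
        max_hops = max(max_hops, hops)
        for d in range(arrived_dim + 1, dim):
            neighbor = node ^ (1 << d)     # flip bit d to get the neighbor
            messages += 1
            reached.add(neighbor)
            frontier.append((neighbor, d, hops + 1))
    return messages, max_hops, len(reached)

print(hypercube_broadcast(3))   # 8 super-peers, as in the slide's figure
```

Every node receives the message exactly once, so the forwarding edges form a spanning tree of the hypercube.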
4.3 Local Ranking
- The super-peer asks for local rankings of the peers' collections
- Top-k results (plus metric-dependent information) are returned to the SP
- Arbitrary similarity measures can be used
  - TFxIDF
  - Similarities in taxonomies
4.3 Result Merging
- Results will be merged at the super-peers
  - Unique scoring function
  - Maximum of k messages per SP-SP edge
[Figure: query Q and partial results travelling between the attached peers and the super-peers SP A to SP D]
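Merging at a super-peer can be sketched as follows, assuming all peers use the same scoring function so that scores are directly comparable. The tuple layout, peer names, document names, and scores are illustrative.

```python
import heapq

def merge_top_k(result_lists, k):
    """Merge (score, peer, doc) lists from peers or super-peers and keep the k best.

    Duplicate documents are kept only once, with their best score.
    """
    best = {}
    for results in result_lists:
        for score, peer, doc in results:
            if doc not in best or score > best[doc][0]:
                best[doc] = (score, peer, doc)
    return heapq.nlargest(k, best.values())

# Hypothetical local rankings reported to a super-peer.
from_p2 = [(0.7, "P2", "d21"), (0.4, "P2", "d22")]
from_p3 = [(0.6, "P3", "d31")]
from_p4 = [(0.5, "P4", "d41")]
top2 = merge_top_k([from_p2, from_p3, from_p4], k=2)
print(top2)
```

Since a super-peer forwards at most its k best merged results, no SP-SP edge ever has to carry more than k result messages.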
4.3 Indexing
- Super-peers keep indexes
  - IDFs (collection-wide information): IDF values for query terms
  - Top peers (routing): list of peers that have already contributed to a previous top-k result
  - Others possible, e.g. for taxonomies
- Index entries are query-driven
4.3 Routing Indexes Example: Top-k Query Routing
- Example for routing indexes in P2P networks with a super-peer backbone holding the routing indexes
- Progressive P2P top-k algorithm [Balke et al., 04]
  - If query q is indexed, distribute the query and collect results
  - Otherwise flood the query and
    - Compute ranks at local peers
    - Merge results at super-peers
    - Use statistics for a new entry in the routing index (routing information, collection-wide information, etc.)
- Data structures at super-peers
  - RequestResults: peers which are queried for results (index information)
  - BestPeer: peers which delivered the recent best result
  - TopRes: current top results
  - Delivered: delivered results
4.3 Routing Indexes Example: Top-k Query Routing
[Figure sequence over five slides: a query q asking for the top 2 documents is processed in a super-peer backbone (SP 1 to SP 8) with attached peers that each hold three scored documents. Starting from an empty routing index at SP 4, the slides step through the contents of RequestResults, BestPeers, TopRes and Delivered at the super-peers while local results are merged. At the end the two best documents (scores 0.8 and 0.7) have been delivered and a routing index entry for q exists.]
4.3 Query Routing
- At the first appearance of a query, peers only send out their input for the IDF computation
  - Super-peers aggregate IDFs and build the index
- Whenever a query is repeated
  - SPs send recent IDF values together with the query terms
  - Peers use the IDFs for local score computation
- Disadvantage: at the first occurrence of a query, it has to be sent twice
  - The Zipf distribution minimizes the number of queries concerned
- Advantages:
  - No effort for maintaining a global IDF index
  - Values for frequently occurring queries are kept up to date
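The first round, in which peers only report their input for the IDF computation, can be sketched like this. The report format (collection size plus document frequencies), the IDF variant log(N / df), and the toy collections are assumptions for illustration.

```python
import math
from collections import Counter

def local_statistics(collection, query_terms):
    """At a peer: report the collection size and the document frequency per query term."""
    df = Counter()
    for doc in collection:
        for term in set(doc) & set(query_terms):
            df[term] += 1
    return len(collection), df

def aggregate_idf(reports):
    """At a super-peer: sum the reports and derive IDF values for the index."""
    total_docs = sum(n for n, _ in reports)
    total_df = Counter()
    for _, df in reports:
        total_df.update(df)            # adds the counts term by term
    return {t: math.log(total_docs / c) for t, c in total_df.items()}

# Hypothetical peer collections, each document given as a list of terms.
peer1 = [["basketball", "nba"], ["basketball"], ["nba", "draft"]]
peer2 = [["election"], ["basketball", "election"], ["weather"], ["sports"], ["economy"]]
query = ["basketball"]
idf_index = aggregate_idf([local_statistics(c, query) for c in (peer1, peer2)])
print(idf_index)
```

In the second round the super-peer would send `idf_index` along with the query, so that every peer scores its documents with the same collection-wide values.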
4.3 Query Routing and Network Churn
- Query index strategy
  - Send queries only to peers that have recently contributed to answering a query
- Problem: the network's and each peer's volatility
  - Solution 1: send queries also to a randomly selected set of peers
  - Solution 2: "best before" timestamp
[Figure: super-peer backbone SP 1 to SP 8 with several elements crossed out]
4.3 Locality-Based Routing Indexes
- Refinement of routing indexes by social metaphors
  - Retrieval process similar to real life
    - Every person has only limited knowledge of the environment
    - Who knows about a certain topic?
    - Who might know other people who know about the topic?
  - Try to build (short) chains of acquaintances that will bring you close to the requested information
- Aims at building social networks as overlays
  - Peers semantically connected by certain topics form "small world" networks, e.g. [Milgram, 67; Kleinberg, 00]
- Paradigm of interest-based locality
  - If a peer has relevant content for a user's query, it very often also has other content that this user might be interested in
4.3 Locality-Based Routing Indexes
- For information retrieval in P2P networks, this enables new routing in interest-based overlay structures
  - Route queries to peers with documents matching semantically close queries
- Traces on practical data collections show that
  - Peers get well connected
  - The overlay graph shows highly clustered characteristics with a small minimum distance between any two nodes
- Overhearing communications routed through a peer can be used to enhance its local index
- Randomly sending queries also to peers from the default network helps to extend knowledge and can remedy the effect of network churn
4.4 Supporting Effective P2P IR
- P2P information retrieval has to deal with the trade-off between
  - Efficient local maintenance of statistics / index information, where information can be stale (incorrect)
  - Expensive global maintenance of statistics / index information, where information is always accurate
- What is needed is just the right level of dissemination of statistics to guarantee sufficiently effective retrieval
- Some techniques help to support efficient retrieval
  - Providing adequate collection-wide information
  - Estimating document overlap between peers
  - Pre-structuring collections by categories / taxonomies
4.4 Providing Collection-Wide Information
- Collection-wide information (CWI), e.g. IDFs, is important for retrieval quality, but cannot be calculated locally
- Some systems, e.g. PlanetP, do not use CWI directly, but circumvent the problem by using an inverted peer frequency (IPF)
  - The IPF of a term t is computed from N and N_t, where N is the number of all peers and N_t is the number of peers offering documents on term t
  - If summarizations of peers (abstracts) are eagerly disseminated, each peer can locally determine the values for N and N_t
  - The relevance of a peer for a multi-keyword query is simply the sum of the IPFs for the individual terms
- Practical tests show an average overlap of about 7% between result sets retrieved with IDFs and those retrieved with IPFs
- Using IPFs, the scalability is, however, still limited
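Peer ranking by IPF can be sketched as follows. The concrete formula log(1 + N / N_t) is the form commonly attributed to PlanetP; since the formula is missing from the transcript, treat it as an assumption. Peer names and term sets are hypothetical.

```python
import math

def ipf(n_peers, n_peers_with_term):
    """Inverted peer frequency; log(1 + N / N_t) is an assumed formulation."""
    return math.log(1 + n_peers / n_peers_with_term)

def rank_peers(query, peer_terms):
    """Score each peer by the sum of IPFs of the query terms it offers."""
    n = len(peer_terms)
    n_t = {t: sum(1 for terms in peer_terms.values() if t in terms) for t in query}
    scores = {}
    for peer, terms in peer_terms.items():
        scores[peer] = sum(ipf(n, n_t[t]) for t in query if t in terms)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical peer abstracts: the set of terms each peer offers documents on.
peer_terms = {"P1": {"dog", "fox"}, "P2": {"dog"}, "P3": {"party"}}
ranking = rank_peers(["dog", "fox"], peer_terms)
print(ranking)
```

Only N and N_t are needed, and both can be derived from the disseminated abstracts, so no document-level statistics have to be exchanged.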
4.4 Providing Collection-Wide Information
- Tests in Web information retrieval, e.g. [Viles & French, 95], show that CWI stays relatively stable over the whole collection of Web sites even with churn
  - Only joining or leaving corpora on completely new topics result in significant change
- Indexing CWI in a similar way as the routing information for queries is possible [Balke et al., 05]
  - In structured networks, CWI can be aggregated along the backbone and indexed
  - CWI can be distributed together with the query
- New queries have to be flooded/routed twice
  - The first flooding collects and aggregates CWI
  - The second one provides the correct CWI for local scorings
- Non-expired indexed CWI can always be used when available
4.4 Estimating the Document Overlap
- Assessing the novelty of collections also supports retrieval quality
  - Pre-computed statistics about the expected result quality in each collection are often used to minimize the number of queried collections
  - Choosing collections with high overlap for querying will usually not improve result sets sufficiently to justify the access costs
  - Especially progressive searches, like top-k searches, profit from focusing on collections with small overlaps, since result merging procedures will ignore identical/similar results
- The novelty of a collection can only be calculated with respect to some reference collection(s), e.g. those collection(s) already in a peer's local routing index
4.4 Estimating the Document Overlap
- The novelty of a peer p's collection C_p is defined with respect to a reference collection C_ref [Bender et al., 05]
- Since the information about exactly which documents a peer offers is usually not disseminated, the values have to be approximated from statistics
  - E.g. if abstracts in the form of Bloom filters are given, a combined Bloom filter b_p can be calculated by bitwise logical AND between p's Bloom filters for all keywords in a query
  - Novelty can then be estimated by comparing b_p to a reference filter formed as the union of the Bloom filters b_i of the set of collections S that have already been retrieved
  - The degree of novelty is given by counting the positions where p's Bloom filter has differing set bits
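The estimation steps can be sketched with Python integers as bit vectors. The interpretation of "differing set bits" as bits set in b_p but not in the reference filter is our reading, and the 8-bit filters are made-up examples.

```python
def combined_filter(keyword_filters):
    """Bitwise AND over a peer's per-keyword Bloom filters (given as ints)."""
    combined = keyword_filters[0]
    for f in keyword_filters[1:]:
        combined &= f
    return combined

def reference_filter(retrieved_filters):
    """Union (bitwise OR) of the filters of all collections already retrieved."""
    ref = 0
    for f in retrieved_filters:
        ref |= f
    return ref

def novelty(b_p, b_ref):
    """Count the positions that are set in b_p but not in the reference filter."""
    return bin(b_p & ~b_ref).count("1")

# Hypothetical 8-bit filters for a two-keyword query.
b_p = combined_filter([0b10111010, 0b11110011])      # gives 0b10110010
b_ref = reference_filter([0b00100000, 0b00000010])   # gives 0b00100010
n = novelty(b_p, b_ref)
print(bin(b_p), bin(b_ref), n)
```

A peer with a high count is likely to contribute documents that the already queried collections do not contain; like any Bloom-filter statistic, the count is only an estimate.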
4.4 Prestructuring Collections with Taxonomies
- Retrieval in P2P systems generally considers two basic paradigms
  - Fulltext-based queries
  - Metadata-based queries
- Integrating these paradigms can support retrieval effectiveness
  - Structuring document collections
  - Disambiguation of query terms
- Peers often host collections of similar documents, e.g. a similar kind of information (newspaper articles, etc.) on similar topics
- Scalability and successful use of statistics are strongly improved if a common system of categories to classify the documents can be used
- Since categories are more or less similar to each other, a taxonomy on categories allows for easily finding semantically similar documents
4.4 Prestructuring Collections with Taxonomies
- Topical similarity within a taxonomy is defined by [Li et al., 03] in terms of
  - l: length of the shortest path between categories c_1 and c_2
  - h: level of the common subsumer
  - Common parameter values: α = 0.2, β = 0.6 (experimentally determined)
- E.g. newspaper articles: sim(Politics, Sports) = 0.35 and sim(Politics, Foreign) = 0.68
[Figure: taxonomy tree over the categories News, Business, Politics, Foreign, Domestic, Sports and Tennis, with the path length l and the subsumer level h marked]
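The similarity formula itself is missing from the transcript. The measure by Li et al. is usually stated as sim = e^(-α·l) · (e^(β·h) - e^(-β·h)) / (e^(β·h) + e^(-β·h)), i.e. an exponential decay in the path length times tanh(β·h); the sketch below assumes this form. The (l, h) pairs in the usage lines are our guesses at the slide's examples.

```python
import math

def topic_similarity(l, h, alpha=0.2, beta=0.6):
    """Assumed form of the Li et al. measure: exp(-alpha * l) * tanh(beta * h).

    l is the shortest path length between the two categories,
    h is the level of their common subsumer in the taxonomy.
    """
    return math.exp(-alpha * l) * math.tanh(beta * h)

close = topic_similarity(l=1, h=2)   # e.g. a category and its direct subcategory
far = topic_similarity(l=2, h=1)     # e.g. two siblings directly below the root
print(round(close, 2), round(far, 2))
```

With l = 1 and h = 2 the function gives about 0.68, which matches the higher of the two values on the slide; with l = 2 and h = 1 it gives about 0.36, close to the slide's 0.35.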
4.4 Combination of Topics and Keywords
- Topics dominate keywords
- Cooperative filter: relax on topics until k results have been found
- Example: [<Politics>, "London Olympics"]
[Figure: topic similarity and text matching against a collection organized by the categories Politics, Foreign, Domestic, Sports, Business and Tennis]
4.4 Combination of Topics and Keywords
[Figure: the top-k query routing example repeated with topic-annotated documents. The peers P 2, P 3 and P 4 hold collections labeled Politics, News and Sports, and the entries in TopRes and Delivered carry a topic label in addition to the score.]
4.5 Summary and Conclusion
- In today's P2P systems, only exact-match keyword retrieval is prevalent (usually on metadata)
- Information retrieval in P2P scenarios is needed
  - Individual, loosely coupled document collections need fulltext retrieval and ranking techniques
  - Applications range from shared working environments, e.g. in project groups, to distributed digital libraries
- Almost all IR systems use at least some global statistics; in P2P infrastructures the dissemination of the necessary statistics becomes a performance bottleneck
  - The trade-off between cached, but sometimes stale statistics and new, but expensively updated statistics needs to be managed
  - How much staleness does a sufficient retrieval effectiveness allow?
4.5 Summary and Conclusion
- Choosing the right collections for querying improves retrieval efficiency
  - Those containing the most promising documents with as little overlap as possible
  - Small worlds offer quick connections to semantically close collections
- Query routing indexes can handle some network churn while providing results of sufficient quality
  - Local indexes can be efficiently maintained
  - Can exploit Zipf-distributed content allocation and querying behavior
  - Need to contact only small numbers of peers
- Supporting techniques like efficient CWI estimation/dissemination or taxonomies of document categories further improve retrieval
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationMedical Information-Retrieval Systems. Dong Peng Medical Informatics Group
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
More informationInformation Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
More informationPeer-to-Peer Data Management
Peer-to-Peer Data Management Wolf-Tilo Balke Sascha Tönnies Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Overview Why Peer-to-Peer Databases? Federation
More informationUsing Peer to Peer Dynamic Querying in Grid Information Services
Using Peer to Peer Dynamic Querying in Grid Information Services Domenico Talia and Paolo Trunfio DEIS University of Calabria HPC 2008 July 2, 2008 Cetraro, Italy Using P2P for Large scale Grid Information
More informationA Collaborative and Semantic Data Management Framework for Ubiquitous Computing Environment
A Collaborative and Semantic Data Management Framework for Ubiquitous Computing Environment Weisong Chen, Cho-Li Wang, and Francis C.M. Lau Department of Computer Science, The University of Hong Kong {wschen,
More informationThe Case for a Hybrid P2P Search Infrastructure
The Case for a Hybrid P2P Search Infrastructure Boon Thau Loo Ryan Huebsch Ion Stoica Joseph M. Hellerstein University of California at Berkeley Intel Research Berkeley boonloo, huebsch, istoica, jmh @cs.berkeley.edu
More informationSimulating a File-Sharing P2P Network
Simulating a File-Sharing P2P Network Mario T. Schlosser, Tyson E. Condie, and Sepandar D. Kamvar Department of Computer Science Stanford University, Stanford, CA 94305, USA Abstract. Assessing the performance
More informationKEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
More informationInformation Searching Methods In P2P file-sharing systems
Information Searching Methods In P2P file-sharing systems Nuno Alberto Ferreira Lopes PhD student (nuno.lopes () di.uminho.pt) Grupo de Sistemas Distribuídos Departamento de Informática Universidade do
More informationEnhancing P2P File-Sharing with an Internet-Scale Query Processor
Enhancing P2P File-Sharing with an Internet-Scale Query Processor Boon Thau Loo Joseph M. Hellerstein Ryan Huebsch Scott Shenker Ion Stoica UC Berkeley, Intel Research Berkeley and International Computer
More informationD1.1 Service Discovery system: Load balancing mechanisms
D1.1 Service Discovery system: Load balancing mechanisms VERSION 1.0 DATE 2011 EDITORIAL MANAGER Eddy Caron AUTHORS STAFF Eddy Caron, Cédric Tedeschi Copyright ANR SPADES. 08-ANR-SEGI-025. Contents Introduction
More informationContent Delivery Network (CDN) and P2P Model
A multi-agent algorithm to improve content management in CDN networks Agostino Forestiero, forestiero@icar.cnr.it Carlo Mastroianni, mastroianni@icar.cnr.it ICAR-CNR Institute for High Performance Computing
More informationIntroduction to Information Retrieval http://informationretrieval.org
Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:
More informationRESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT
RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT Bilkent University 1 OUTLINE P2P computing systems Representative P2P systems P2P data management Incentive mechanisms Concluding remarks Bilkent University
More informationSwanLink: Mobile P2P Environment for Graphical Content Management System
SwanLink: Mobile P2P Environment for Graphical Content Management System Popovic, Jovan; Bosnjakovic, Andrija; Minic, Predrag; Korolija, Nenad; and Milutinovic, Veljko Abstract This document describes
More informationApproximate Object Location and Spam Filtering on Peer-to-Peer Systems
Approximate Object Location and Spam Filtering on Peer-to-Peer Systems Feng Zhou, Li Zhuang, Ben Y. Zhao, Ling Huang, Anthony D. Joseph and John D. Kubiatowicz University of California, Berkeley The Problem
More informationHomework 2. Page 154: Exercise 8.10. Page 145: Exercise 8.3 Page 150: Exercise 8.9
Homework 2 Page 110: Exercise 6.10; Exercise 6.12 Page 116: Exercise 6.15; Exercise 6.17 Page 121: Exercise 6.19 Page 122: Exercise 6.20; Exercise 6.23; Exercise 6.24 Page 131: Exercise 7.3; Exercise 7.5;
More informationCS5412: TIER 2 OVERLAYS
1 CS5412: TIER 2 OVERLAYS Lecture VI Ken Birman Recap 2 A week ago we discussed RON and Chord: typical examples of P2P network tools popular in the cloud Then we shifted attention and peeked into the data
More informationSystem Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks
System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks OnurSoft Onur Tolga Şehitoğlu November 10, 2012 v1.0 Contents 1 Introduction 3 1.1 Purpose..............................
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationPerformance Tuning for the Teradata Database
Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document
More informationAdapting Distributed Hash Tables for Mobile Ad Hoc Networks
University of Tübingen Chair for Computer Networks and Internet Adapting Distributed Hash Tables for Mobile Ad Hoc Networks Tobias Heer, Stefan Götz, Simon Rieche, Klaus Wehrle Protocol Engineering and
More informationStatic IP Routing and Aggregation Exercises
Politecnico di Torino Static IP Routing and Aggregation xercises Fulvio Risso August 0, 0 Contents I. Methodology 4. Static routing and routes aggregation 5.. Main concepts........................................
More informationBloom Filter based Inter-domain Name Resolution: A Feasibility Study
Bloom Filter based Inter-domain Name Resolution: A Feasibility Study Konstantinos V. Katsaros, Wei Koong Chai and George Pavlou University College London, UK Outline Inter-domain name resolution in ICN
More informationInternational journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.
RESEARCH ARTICLE ISSN: 2321-7758 GLOBAL LOAD DISTRIBUTION USING SKIP GRAPH, BATON AND CHORD J.K.JEEVITHA, B.KARTHIKA* Information Technology,PSNA College of Engineering & Technology, Dindigul, India Article
More informationThe Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390
The Role and uses of Peer-to-Peer in file-sharing Computer Communication & Distributed Systems EDA 390 Jenny Bengtsson Prarthanaa Khokar jenben@dtek.chalmers.se prarthan@dtek.chalmers.se Gothenburg, May
More informationWireless Sensor Networks Chapter 3: Network architecture
Wireless Sensor Networks Chapter 3: Network architecture António Grilo Courtesy: Holger Karl, UPB Goals of this chapter Having looked at the individual nodes in the previous chapter, we look at general
More informationIncorporating Window-Based Passage-Level Evidence in Document Retrieval
Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological
More informationLOAD BALANCING WITH PARTIAL KNOWLEDGE OF SYSTEM
LOAD BALANCING WITH PARTIAL KNOWLEDGE OF SYSTEM IN PEER TO PEER NETWORKS R. Vijayalakshmi and S. Muthu Kumarasamy Dept. of Computer Science & Engineering, S.A. Engineering College Anna University, Chennai,
More informationSix Degrees of Separation in Online Society
Six Degrees of Separation in Online Society Lei Zhang * Tsinghua-Southampton Joint Lab on Web Science Graduate School in Shenzhen, Tsinghua University Shenzhen, Guangdong Province, P.R.China zhanglei@sz.tsinghua.edu.cn
More informationBig Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
More informationPredicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
More informationMIDAS: Multi-Attribute Indexing for Distributed Architecture Systems
MIDAS: Multi-Attribute Indexing for Distributed Architecture Systems George Tsatsanifos (NTUA) Dimitris Sacharidis (R.C. Athena ) Timos Sellis (NTUA, R.C. Athena ) 12 th International Symposium on Spatial
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
More informationHow To Create A P2P Network
Peer-to-peer systems INF 5040 autumn 2007 lecturer: Roman Vitenberg INF5040, Frank Eliassen & Roman Vitenberg 1 Motivation for peer-to-peer Inherent restrictions of the standard client/server model Centralised
More informationMultimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 0 Organizational Issues Lecture 21.10.2014 03.02.2015
More informationScalable Prefix Matching for Internet Packet Forwarding
Scalable Prefix Matching for Internet Packet Forwarding Marcel Waldvogel Computer Engineering and Networks Laboratory Institut für Technische Informatik und Kommunikationsnetze Background Internet growth
More informationA Review on Efficient File Sharing in Clustered P2P System
A Review on Efficient File Sharing in Clustered P2P System Anju S Kumar 1, Ratheesh S 2, Manoj M 3 1 PG scholar, Dept. of Computer Science, College of Engineering Perumon, Kerala, India 2 Assisstant Professor,
More informationEfficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems
Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems Kunwadee Sripanidkulchai Bruce Maggs Hui Zhang Carnegie Mellon University, Pittsburgh, PA 15213 {kunwadee,bmm,hzhang}@cs.cmu.edu
More informationComponents: Interconnect Page 1 of 18
Components: Interconnect Page 1 of 18 PE to PE interconnect: The most expensive supercomputer component Possible implementations: FULL INTERCONNECTION: The ideal Usually not attainable Each PE has a direct
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationDistributed Computing over Communication Networks: Topology. (with an excursion to P2P)
Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...
More informationA Reputation Management System in Structured Peer-to-Peer Networks
A Reputation Management System in Structured Peer-to-Peer Networks So Young Lee, O-Hoon Kwon, Jong Kim and Sung Je Hong Dept. of Computer Science & Engineering, Pohang University of Science and Technology
More informationIntroduction to Information Retrieval http://informationretrieval.org
Introduction to Information Retrieval http://informationretrieval.org IIR 7: Scores in a Complete Search System Hinrich Schütze Center for Information and Language Processing, University of Munich 2014-05-07
More informationRaddad Al King, Abdelkader Hameurlain, Franck Morvan
Raddad Al King, Abdelkader Hameurlain, Franck Morvan Institut de Recherche en Informatique de Toulouse (IRIT), Université Paul Sabatier 118, route de Narbonne, F-31062 Toulouse Cedex 9, France E-mail:
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationP2P VoIP for Today s Premium Voice Service 1
1 P2P VoIP for Today s Premium Voice Service 1 Ayaskant Rath, Stevan Leiden, Yong Liu, Shivendra S. Panwar, Keith W. Ross ARath01@students.poly.edu, {YongLiu, Panwar, Ross}@poly.edu, Steve.Leiden@verizon.com
More informationTowards a Next- Generation Inter-domain Routing Protocol. L. Subramanian, M. Caesar, C.T. Ee, M. Handley, Z. Mao, S. Shenker, and I.
Towards a Next- Generation Inter-domain Routing Protocol L. Subramanian, M. Caesar, C.T. Ee, M. Handley, Z. Mao, S. Shenker, and I. Stoica Routing 1999 Internet Map Coloured by ISP Source: Bill Cheswick,
More informationUsing LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.
White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,
More informationPeer-to-Peer Networks. Chapter 6: P2P Content Distribution
Peer-to-Peer Networks Chapter 6: P2P Content Distribution Chapter Outline Content distribution overview Why P2P content distribution? Network coding Peer-to-peer multicast Kangasharju: Peer-to-Peer Networks
More informationReputation Management Algorithms & Testing. Andrew G. West November 3, 2008
Reputation Management Algorithms & Testing Andrew G. West November 3, 2008 EigenTrust EigenTrust (Hector Garcia-molina, et. al) A normalized vector-matrix multiply based method to aggregate trust such
More informationlow-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
More informationVirtual Landmarks for the Internet
Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser
More informationBig Data and Scripting map/reduce in Hadoop
Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb
More informationInteractive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
More informationLoad Balancing in Structured Overlay Networks. Tallat M. Shafaat tallat(@)kth.se
Load Balancing in Structured Overlay Networks Tallat M. Shafaat tallat(@)kth.se Overview Background The problem : load imbalance Causes of load imbalance Solutions But first, some slides from previous
More informationW. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
More informationquery enabled P2P networks 2009. 08. 27 Park, Byunggyu
Load balancing mechanism in range query enabled P2P networks 2009. 08. 27 Park, Byunggyu Background Contents DHT(Distributed Hash Table) Motivation Proposed scheme Compression based Hashing Load balancing
More informationRecognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
More informationEffective Keyword-based Selection of Relational Databases
Effective Keyword-based Selection of Relational Databases Bei Yu National University of Singapore Guoliang Li Tsinghua University Anthony K. H. Tung National University of Singapore Karen Sollins MIT ABSTRACT
More informationDevelopment of an Enhanced Web-based Automatic Customer Service System
Development of an Enhanced Web-based Automatic Customer Service System Ji-Wei Wu, Chih-Chang Chang Wei and Judy C.R. Tseng Department of Computer Science and Information Engineering Chung Hua University
More informationAnalysis on Leveraging social networks for p2p content-based file sharing in disconnected manets
Analysis on Leveraging social networks for p2p content-based file sharing in disconnected manets # K.Deepika 1, M.Tech Computer Science Engineering, Mail: medeepusony@gmail.com # K.Meena 2, Assistant Professor
More informationAmerican Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access
More informationEffective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted
More informationChristian Bettstetter. Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks
Christian Bettstetter Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks Contents 1 Introduction 1 2 Ad Hoc Networking: Principles, Applications, and Research Issues 5 2.1 Fundamental
More informationCassandra A Decentralized, Structured Storage System
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922
More informationDistributed Hash Tables in P2P Systems - A literary survey
Distributed Hash Tables in P2P Systems - A literary survey Timo Tanner Helsinki University of Technology tstanner@cc.hut.fi Abstract Distributed Hash Tables (DHT) are algorithms used in modern peer-to-peer
More informationPerformance of networks containing both MaxNet and SumNet links
Performance of networks containing both MaxNet and SumNet links Lachlan L. H. Andrew and Bartek P. Wydrowski Abstract Both MaxNet and SumNet are distributed congestion control architectures suitable for
More informationHigh Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es
High Throughput Computing on P2P Networks Carlos Pérez Miguel carlos.perezm@ehu.es Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured
More informationTaxonomies in Practice Welcome to the second decade of online taxonomy construction
Building a Taxonomy for Auto-classification by Wendi Pohs EDITOR S SUMMARY Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods
More informationIntroduction to LAN/WAN. Network Layer
Introduction to LAN/WAN Network Layer Topics Introduction (5-5.1) Routing (5.2) (The core) Internetworking (5.5) Congestion Control (5.3) Network Layer Design Isues Store-and-Forward Packet Switching Services
More informationStatistical Validation and Data Analytics in ediscovery. Jesse Kornblum
Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?
More informationDecentralized Peer-to-Peer Network Architecture: Gnutella and Freenet
Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet AUTHOR: Jem E. Berkes umberkes@cc.umanitoba.ca University of Manitoba Winnipeg, Manitoba Canada April 9, 2003 Introduction Although
More informationSearch Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
More informationEng. Mohammed Abdualal
Islamic University of Gaza Faculty of Engineering Computer Engineering Department Information Storage and Retrieval (ECOM 5124) IR HW 5+6 Scoring, term weighting and the vector space model Exercise 6.2
More informationAn Efficient Strategy for Data Recovery in Wi-Fi Systems
International Journal of Research & Development in Science and Technology Volume 1, Issue 2, December 2014, PP 1-6 ISSN 2350-4751 (Print) & ISSN 2350-4751(Online) An Efficient Strategy for Data Recovery
More informationDistributed Caching Algorithms for Content Distribution Networks
Distributed Caching Algorithms for Content Distribution Networks Sem Borst, Varun Gupta, Anwar Walid Alcatel-Lucent Bell Labs, CMU BCAM Seminar Bilbao, September 30, 2010 Introduction Scope: personalized/on-demand
More informationPrinciples of Distributed Database Systems
M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition
More informationLecture 2.1 : The Distributed Bellman-Ford Algorithm. Lecture 2.2 : The Destination Sequenced Distance Vector (DSDV) protocol
Lecture 2 : The DSDV Protocol Lecture 2.1 : The Distributed Bellman-Ford Algorithm Lecture 2.2 : The Destination Sequenced Distance Vector (DSDV) protocol The Routing Problem S S D D The routing problem
More informationIBM Social Media Analytics
IBM Social Media Analytics Analyze social media data to better understand your customers and markets Highlights Understand consumer sentiment and optimize marketing campaigns. Improve the customer experience
More informationPhysical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design
Physical Database Design Process Physical Database Design Process The last stage of the database design process. A process of mapping the logical database structure developed in previous stages into internal
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More informationIntroduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A
Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases
More informationA Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*
A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems* Junho Jang, Saeyoung Han, Sungyong Park, and Jihoon Yang Department of Computer Science and Interdisciplinary Program
More informationRecommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
More informationSuper-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems
Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems Yao Wang, Jie Zhang, and Julita Vassileva Department of Computer Science, University of Saskatchewan,
More information