MAGE: Matching Approximate Patterns in Richly-Attributed Graphs
|
|
|
- Laureen Carpenter
- 10 years ago
- Views:
Transcription
1 MAGE: Matching Approximate Patterns in Richly-Attributed Graphs Robert Pienta, Acar Tamersoy, Hanghang Tong, and Duen Horng Chau College of Computing, Georgia Institute of Technology, Atlanta, GA Department of Computer Science, Arizona Sate University, Phoenix, AZ pientars, tamersoy, Abstract Given a large graph with millions of nodes and edges, say a social network where both its nodes and edges have multiple attributes e.g., job titles, tie strengths), how to quickly find subgraphs of interest e.g., a ring of businessmen with strong ties)? We present MAGE, a scalable, multicore subgraph matching approach that supports expressive queries over large, richly-attributed graphs. Our major contributions include: 1) MAGE supports graphs with both node and edge attributes most existing approaches handle either one, but not both); 2) it supports expressive queries, allowing multiple attributes on an edge, wildcards as attribute values i.e., match any permissible values), and attributes with continuous values; and 3) it is scalable, supporting graphs with several hundred million edges. We demonstrate MAGE s effectiveness and scalability via extensive experiments on large real and synthetic graphs, such as a Google+ social network with 460 million edges. I. INTRODUCTION Graphs are a convenient and ubiquitous means to represent many naturally occurring patterns. Rich with information, many graphs contain subgraph patterns that capture interesting dynamics among entities, but because of the size and complexity of these graphs, spotting such interesting patterns can be a difficult task. An analyst may want to analyze an intelligence network of various entities e.g., people or events) connected with edges denoting gathered intelligence as in Figure 1). The analyst will have some ideas about the structure of criminal behavior which drive the use of subgraph matching techniques to find potentially dangerous individuals. A Motivating Example. In Figure 1, the node attributes are the entity types, which can be a Person orange circle), an Event green square), or a Location purple triangle), and the edge attributes are the confidence of intelligence for a pair of entities, which can be Confirmed dotted line), Suspected wavy line), and Unlikely line with pluses). The graph labeled Query Q shows a query that our analyst may issue, which looks for two indirectly related people, who appeared at the same location thus, confirmed to be linked to a location), and were suspected of having attended an event together. One challenge here is that the analyst might not know what values to assign to some of the nodes and edges in the query; so instead of guessing or assigning an arbitrary value, the analyst may want to assign a node or an edge with a wildcard attribute value, a feature that would allow the analyst to explore different hypotheses. However, it is not supported by many existing subgraph matching techniques. The existing subgraph matching approaches that return only exact matches if any) will not help the analyst solve complex tasks where attribute uncertainty is present. For the query in Figure 1, the system should be able to return a match filling in the wildcard, e.g., the person entity corresponding to node 3 in the result. Even with the wildcard, the exact specified structure may not exist in in the graph; under this scenario the system would return a best-effort match or approximation of the query. By generating both exact and near matches, we can provide the user with the top-k most closely matched subgraphs even if the initial query was not exactly present in the input graph. Limitations of Existing Techniques. A representative set of early work on inexact pattern matching on graphs is G-Ray by Tong et al. [1], TALE by Tian and Patel [2], and SIGMA by Mongioví et al. [3]. More recently, Khan et al. proposed the NeMa approach [4], Fan et al. proposed a set of incremental pattern matching algorithms [5], [6] and the TopKDiv approach [7], and Cheng et al. proposed the kgpm framework [8]. The main limitations of these techniques are that they either i) do not support edge attributes, ii) need computationally-heavy indexes precomputed for the input graph, or iii) return results that can significantly deviate from the query pattern, which may be difficult for users to comprehend. Our Contributions. We present MAGE, short for Multi- Attribute Graph Engine, a pattern matching system for graphs with node and edge attributes that overcomes many of the challenges and limitations outlined above. It produces topk closest subgraph matches for a variety of attributed input graphs. Our experimental evaluation shows that MAGE is a scalable tool that works for graphs from different domains, from intelligence applications to understanding the patterns of movie success. MAGE s major contributions include: 1. Support for node and edge attributes, expanding the effectiveness on real world data and increasing the types of questions that can be answered through graph querying. 2. Support for flexible queries with rich attributes, supporting queries with wildcards and multiple categorical attributes. 3. A fast & scalable multicore algorithm, handling real and synthetic graphs with hundreds of millions of edges, using random walk with restart RWR) steady-state probabilities as proximity scores between nodes to determine how well a subgraph matches the query. II. PROBLEM DEFINITION AND NOTATION Here, we formalize the subgraph matching problem that MAGE aims to solve. In its general form, we are given two graphs G and Q and we wish to know if G contains a subgraph that is equivalent to Q. Table I describes the symbols used in the paper. This problem is often referred to as the subgraph isomorphism problem. Unfortunately, this problem is NP- Complete [9], making all general solutions computationally
2 Fig. 1: An illustrative example of how MAGE finds patterns in an intelligence graph left) with node and edge attributes. Node attributes: Person, Event, or Location. Edge attributes amount of gathered intelligence for a pair of entities): Confirmed, Suspected, or Unlikely. A query middle) looks for two indirectly related individuals, who were both confirmed at an event location and are believed to have attended the event two lines connecting the corresponding nodes). The node with a star indicates a wildcard, which can take any attribute value. All figures best viewed in color.) The rightmost figure shows an approximate match. Symbol Description G A Q G0 A0 Q0 M hs, ti n m t i,j & u,v λ n n adjacency matrix for G n t node-attribute matrix for graph G Query subgraph to be extracted from G m + n) m + n) linegraph-modified bipartite graph Node-edge-attribute matrix for graph G 0 Query subgraph after edge augmentation A bijective mapping between edges of G and edge-nodes in G 0 An edge leading from node s to node t Number of nodes in G Number of edges in G Number of distinct categorical attributes Indices of nodes in Q and in G respectively. Ratio of approximated nodes and edges with correct attributes to the total number in a result TABLE I: Symbols and terminology used in this work infeasible for even modest sized graphs. The problem we tackle is subtly different than the subgraph isomorphism problem and can be formally stated as follows: Given: i) A graph G whose nodes and edges have categorical attributes, ii) a query graph Q showing the desirable configuration of nodes connected with edges, each assigned one or more attribute values or a wildcard), and iii) the number of desired matching subgraphs k. Find: k matching subgraphs Qi i = 1,..., k) that match query graph Q as closely as possible, according to a goodness metric which we cover in the Methodology section). A. Preliminary: Querying on Node Attributes Several approaches have been proposed to subgraph isomorphism problem. The work by Tong et al. proposes the G-Ray algorithm [1], which is a best-effort inexact subgraph matching approach that relies on RWR or personalized page rank values as the selection criterion when constructing query results. This approach carefully uses nodes as restarts when calculating the RWR values. We leverage this approach, but considerably modify it to allow multiple attributes as restarts in the RWR approximations see Section IV-D). Approximate RWR is still a computationally expensive step that must be performed often. In Section IV-A, we show our approach to reducing query latency by decreasing RWR calculation times. While G-Ray is an integral facet of MAGE, the limitations of the original algorithm are far too constrictive. The G-Ray algorithm is inadequate in supporting expressive querying as it does not support attributed edges, unknown query attributes, multiple attributes, and it incurs sizable query latencies on large graphs. We address these issues with MAGE. Using our linegraph augmentation approach, we support node and edge attributes with limited information by offering, wildcards and multiple attributes. To address the speed and scalability we utilize an approximate parallel RWR technique to quickly calculate the data needed to find good candidate matches. III. MAGE OVERVIEW In order to cover both edge and node attributes in G we use an edge-augmentation method based on intuition from the linegraph transformation [10]. Under the canonical linegraph transformation, each vertex in the line graph L of G is an edge from G. Two vertices in L are connected if and only if their corresponding edges from G) share a common endpoint in G. Figure 2 demonstrates an example transformation with the key linegraph transformation occurring in c). a) b) c) d) Fig. 2: Linegraph transformation and edge augmentation used to support edge and node attributes in MAGE. a) Input graph. b) Intermediate step of linegraph transformation of G, where a node for each original edge is created. c) LG) or L is the linegraph of G in which all edges from G that shared a node in G are now connected as nodes of L. d) Edge augmentation wherein we embed the edge-nodes of LG) directly into G to create G 0. By making use of the line-graph transformation on the starting graph G, we can produce an attributed line graph L where each of the edges in G is represented by a node in L. Rather than working with both L and G, we create G 0 which combines aspects of both G and L. We insert a new edgenode on each edge of G such that it is connected to the same vertices in G 0 that it connected to as an edge of G. This process is illustrated with a toy graph in Figure 2d, where the edgenodes are the square nodes splitting each edge of G. Under this formulation, no two nodes in G will be directly connected in G 0. The same holds for all new edge-nodes in G 0. The newly created G 0 is bipartite between the set of original nodes and the set of new edge-nodes; a fact we will use later in approach. A. MAGE Subroutines We divide the process of discovering matching subgraphs into several key steps: i) finding the initial nodes to start our localized search, then ii) finding nearby nodes that fulfill or
3 1: procedure APPROXQUERYG A, Q ) 2: Initialization 3: i Detect-Candidate 4: i corresponds to a node u of Q 5: for all Nodes u in the query Q do 6: for all edges u, v from u to neighbor v do 7: find node j by Local-Search 8: approximate u, v with i, j via Linker 9: edge u, v is fulfilled 10: end for 11: end for 12: end procedure Fig. 3: An overview of the approach used in finding an approximate subgraphs. approximate the desired attributes and iii) approximating the structure of the edges interconnecting nodes from steps i and ii. Detect-Candidate. The first node of each query approximation is located in the graph using Detect-Candidate in line 3 of Algorithm 3. This routine leverages attribute filtering, but primarily subgraph centerpieces suggested in [11]. Detect- Candidate selects the most central node of the query first to help narrow the scope of the candidate search. Local-Search. Given a partially fulfilled set of query nodes, Local-Search line 7 of Algorithm 3) will locate a node in G corresponding to an unfulfilled node from Q. We generate heuristic scores using RWR values) to pick the best candidate nodes. This approach will select nodes closer to the approximated query, keeping the selected nodes in the local area of the current approximated nodes. Linker. The Linker, line 8 in Algorithm 3, approximates the query s edges in the dictionary graph. If the two selected nodes of the real graph are connected the edge is returned; otherwise, the shortest path is used as an approximation of the edge. This shortest path may introduce additional nodes in order to complete the pattern. IV. METHODOLOGY A. Random Walk with Restart RWR) MAGE uses proximity scores between nodes in data graph G and query graph Q to construct a subgraph Q i approximating Q. Specifically, the proximity between nodes i and j in a graph is the RWR value when node i is used as the restart point. The number of RWR values needed to find a matching subgraph is significant, because we use them to rank possible candidate nodes. Calculating RWR values is the most expensive step MAGE. Let ˆF be our RWR values see Equation 1). They are used to rank the goodness of nearby candidate nodes; unselected nodes that are near to the partially constructed query receiver higher RWR values and are therefore more likely to be chosen. We have implemented a parallelized, sparse, power method that allows the fast calculation of RWR vectors. The canonical power-iteration method is a common approach to determine the RWR values see Equation 1). Decompose G row-wise into p submatrices each with m/p rows call them G 1...G p ). Each G i is zero everywhere except its m/p rows such that they sum to G, see Equation 2). ˆF k+1 = αg ˆF k + Ŷ 1) ˆF k+1 = α G 1 + G G p ) ˆF k + Ŷ 2) Require: Fully-attributed graph G, attribute graph A, query graph Q, and desired number of results k Ensure: k node-attribute matched subgraphs from G 1: procedure MAGEG, A, Q, k) 2: G = Linegraph-AugmentorG) 3: Q = Linegraph-AugmentorQ) 4: A w = Wildcard-Attribute-InserterA) 5: for i=1:k do 6: Q i = ApproxQueryG, A w, Q ) 7: Q i = Linegraph-ReverterQ i ) 8: end for 9: return Q i where i = 1 : k 10: end procedure Fig. 4: Detailed MAGE Algorithm ˆF k+1 = αg 1 ˆF + αg 2 ˆF k αg p ˆF k + Ŷ 3) For each node considered as a restart location the vector Ŷ is set to the random restart probability 1 α. Now using Equation 3) each submatrix-vector multiplication can be calculated separately in parallel and the results aggregated. Each iteration F k+1 is normalized to unit length and used in the next iteration. In hopes to maximize memory bandwidth during the calculation, we synchronize the parallel computation at the end of each iteration rather than during it. We recommend choosing p as the number of available cores. B. Primary Subsystems Detect-Candidate and Local-Search. Both detecting the initial candidates and searching for nearby candidates follow the same general approach in our algorithm. The equation selects new nodes heuristically based on their RWR scores, higher scores means a higher chance of being selected. Given two nodes from G the Linker computes a path connecting them. The implementation is an efficient modification of Prim s algorithm. We consider the best path as the path connecting the two nodes with largest RWR value when a direct link one unused in the approximate solution is not available. This score is the sum of RWR values over the whole length of the path. C. Supporting Edge Attributes In order to support edge attributes, we have chosen to embed an edge-node in each edge of G. The Linegraph- Augmentation algorithm iterates over all original edges of G and appends new edge-nodes to G. The algorithm operates in Om) where m is the number of edges from G. This is precomputed a single time before querying begins and is then used as the main input graph for querying. This transformation creates G, a m + n) m + n) adjacency matrix. Expanding both dimensions of our adjacency matrix by a factor of m may seem expensive in memory usage; however, G is guaranteed to be bipartite between the original nodes and the new edge-nodes. Because only original nodes can be connected to edge-nodes, we have only the m n and n m regions of our augmented matrix that can contain values. We can derive the maximum matrix density, ρ max, as follows: ρ max = 2mn/m 2 + 2mn + n 2 ) if the graph is undirected only mn edges need to be stored. Because we use sparse data structures, the memory for this augmentation grows at a linear rate with the number of edges. The matched subgraph results produced by MAGE are embedded with edge-nodes and must be converted to the
4 original graph format. The linegraph-reverter converts the edge-nodes back to edges from G as the mapping is one-to-one. D. Supporting Multiple Attributes and Wildcards The categorical attribute matrix A is an n t sparse matrix where there are t distinct attribute categories for each of n nodes. Each row of A represents a node while each column represents a single categorical variable. In general A is sparse so it utilizes a minor amount of memory. We support multiple attributes on each node and edge, and allow them to be selected via logical OR. The attributes are leveraged during the attribute-centric RWR carried out in line 5 of Procedure 4. By serving as restart sources during the RWR calculations, the correctly attributed nodes are given larger proximity scores and therefore are more likely to be selected as a result. Wilcards allow MAGE to effectively ignore the attribute and use the most structurally coherent node or edge during approximation. To support the wildcard attribute we have created a universal attribute applied to all nodes and edges. The function labeled Wildcard-Attribute-Inserter on line 4 of Procedure 4 works by inserting a new and distinct attribute node in the attribute matrix A that points to all nodes. When there are multiple choices to fulfill a wildcard the one with largest RWR value is selected, otherwise ties are broken by random selection. V. EVALUATION We evaluated multiple important aspects of MAGE, such as speed, memory usage, and effectiveness on both large real and synthetic graph datasets, e.g., 460 million edge Google+ graph [12], [13]. Experimenting on real graphs allows us to better understand how MAGE works in practice, while experimenting on synthetic ones let us carefully control experimental parameters and observe MAGE s responses. Besides investigating how MAGE s speed scale with the graph size, we also study how supporting wildcards and multiple attributes in the graph queries would affect speed. All tests were run on Linux using an Intel Core i5-4670k 3.8 GHz) and 32GB of RAM. A. Graph Datasets Real Graphs. We have examined the scalability of our system using the attributed social network from Google+ [12], [13]. The dataset consists of four snapshots ranging in size from 4.6M nodes, 47M edges to 28M nodes, 460M edges) each taken at different months of In order to test the practicality of MAGE, we also queried an undirected graph constructed out more than 20,000 of the Rotten Tomatoes RT) movies in Figure 6. We used RT movie similarity to form the edges; however, MAGE operates on categorical attributes, so continuous fields must be encoded categorically by any approach to discretize continuous fields; we used quartiles. Synthetic graphs. We used stochastically generated Erdős- Rényi ER) random graphs parameterized by the number of nodes n and edges m) and Watts-Strogatz WS) graphs parameterized by the number of nodes n, the node degree k, and the rewiring probability p) [14], [15]. All WS graphs used p =.01 rewiring chance. Their attributes were chosen uniformly at random. B. Scalability and Speed RWR requires two parameters; the fly-out or restartprobability and the number of iterations, which we set to 0.15 and 10 respectively. MAGE performs queries in linear time with the number of edges in each graph. For both synthetic graph types, the increase in query time is linear in the number of edges, suggesting good scalability. Three general query structures are explored for various sizes of graph; a 5-node 4-edge linear query, a 5-node 4-edge star, and a 5-node clique. 1) RWR Results.: The timely calculation of RWR values are essential to the MAGE algorithm. We test our approach on synthetic graphs and various G+ snapshots. The synthetic graphs were generated with increasing numbers of edges. Each measurement is an average of the runtime for multiple networks at each m. The results for our parallelized RWR implementation are presented in Figure 5b. The ER graphs exhibit worse performance than WS graphs due to the constant random memory accesses during the parallelized matrix multiplication. Our experimental results, both on real and synthetic graphs suggest excellent scalability of the core RWR algorithm. 2) Wildcard and Multi-attribute Cost.: We tested the overhead introduced by increasing the number of wildcards in a fixed set of queries. Figure 5c shows the query time, when using wildcards, relative to a query with only normal attributes. The wildcard overhead increases linearly with each new wildcard, suggesting that large queries can safely incorporate wildcards.the queries used were a 6-node linear query, a 6-node star and a 5-clique. Each data point is an average over multiple runs where the wildcards were assigned at random over edges and nodes. Multi-attributes are represented as multiple 1 s in A and can be easily added for little overhead. Data-rich graphs can be queried in MAGE with minor increases in query latency and memory footprint. C. Memory Usage We performed an experiment to measure the memory usage of edge-augmentation, a memory-intensive step. In the directed case the full augmented matrix must be stored; however, in the undirected case only half of the matrix is necessary. In Figure 5d, we compare the memory footprint of each graph before and after edge augmentation as previously explained in Secton IV-C) for graphs of increasing size. D. Effectiveness Measuring the effectiveness of approximate graph queries is a nontrivial task with two major challenges. The first problem stems from measuring similarity between the result and query, considering both structure and attributes. The second challenge is that the usefulness of a result is highly subjective. Human evaluation will be important in proving the overall effectiveness of our approach and we plan to perform it in our future work. Method. Taking inspiration from related pattern matching work e.g., [16]), we chose to modify graphs by injecting new patterns into them and then seeking those patterns with MAGE. We tested our approach against synthetic graphs and the RT movie graph. The synthetic graphs were generated as before see Section V-A); however, the patterns were generated with 6 edge and 12 node attributes, which were assigned randomly during graph generation. In the RT graph, attributes were randomly sampled from the empirical distributions. In Table II, we present 5 general query types: i) a short line of 6 nodes, ii) a long line of 15 nodes, iii) a star with 15 spokes, iv) a barbell with two 3-cliques connected by a path of length 3, and v) a 7-clique [1].
5 Star Wildcards Wildcard Free Query Avg Avg Query Time s) MAGE Query Time vs. Graph Size 0 ER Linear ER Star ER Clique G+ Linear G+ Star G+ Clique 50M 100M 150M # Edges Millions) WS Linear WS Star WS Clique 200M 10 Iterations RWR Runtime s) Parallel RWR Time vs. Graph Size Erdos Renyi Google+ Snapshots Watts Strogatz 0 100M 200M 300M 400M # Edges Millions) Relative Query Time vs #Wildcard in Query 2.2 Relative Query Time s) Clique Query Star Query Linear Query No wildcards baseline) #Wildcard M 200M 300M 400M # Edges Millions) a) b) c) d) Fig. 5: a) MAGE query time on real and synthetic graphs of varying sizes for a linear, star, and clique query. The average query time, over 10 runs, increases linearly with the augmented) graph s size for each query type. b) The runtimes for 10 iterations of the random walk with restart module over Erdős-Rényi, Watts-Strogatz, and Google+ graphs of varying sizes, which increase linearly with the number of edges, even into several hundred millions nodes and edges. c) The relative response time for queries containing wildcards tested on an Erdős-Rényi graph with 10M nodes and edges, compared against the baseline query without wildcard horizontal black dotted line). Query time is linear in the number of wildcards in the query. d) MAGE memory usage for Erdős-Rényi and Google+ graphs, before and after edge augmentation, which increases linearly with the number of edges. Memory Footprint GB) Google+ Erdős-Rényi Watts-Strogatz Memory Footprint Full Augmented Graphs for directed input graphs) Input Graphs Graph Query % Extra % Extra λ Type Shape Nodes Edges Score Erdős-Rényi Line short) 11 ± ± ER) Line long) 20 ± ± Barbell 18 ± ± Star 39 ± ± Clique 64 ± ± Rotten Tomatoes Line short) 66 ± ± RT) Line long) 84 ± ± Barbell 96 ± ± Star 104 ± ± Clique 110 ± ± TABLE II: The similarity between the query and first 20 results. The λ similarity is a modification of the Jaccard index. Values close to 1 mean the results matched exactly, while lower scores mean a worse approximation. MAGE produces results with high similarity to the query. Table II displays the average %-difference in size between the queries and their matches over the top 20 results. We measure the quality of results using λ, a modified Jaccard Index over categorical variables which was used in [17]. We compose the λ-similarity by taking the size of the setintersection of nodes and edges from the query with the result and normalize it by the size set-union of the same query-result pair. When a result closely matches the query, λ 1, if the result has either extra nodes or nodes with different attributes then λ < 1. MAGE s goal is to detect approximate matches to a user s desired query; Table II shows that the algorithm is capable of extracting results with good similarity. The WS graphs work well with linear queries, because their construction guarantees consistent paths. The RT queries generally perform worse than the synthetic graphs, because the RT contains several very low occurrence attributes that often require an approximation include, decreasing the λ-similarity. Detecting exact and approximate cliques is a very challenging problem. When discovering initial candidate nodes for the results, MAGE may pick a structurally coherent likely to be a clique) match, but with only some exactly matched attribute nodes. This means that some of the edges or nodes of the result will not be directly connected and will introduce some new nodes and edges. Even with the complexity of approximating 7-clique MAGE is still able to return good approximations, albeit with many extra nodes and edges, demonstrating MAGE s main strength is in finding approximate matches to help the user explore different hypotheses. For the purposes of demonstrating the semantic result quality, we present visualized query results from the RT movie data in Figure 6. To demonstrate MAGEs effectiveness, our evaluation used multiple graphs, including a real RT movie similarity graph, where MAGE finds movies matching interesting criteria e.g., finding a cult movie that has horror and comedy ingredients). MAGE was able to find the movie Carrie that fits this query see Fig 12, bottom row), which is mostly horror, with a few funny scenes; it indeed generated a cult following. Similarly, we also used MAGE to find movies that occupy the intersect of the genres of drama, sci-fi and action as seen in Figure 6, top row) and it returned two popular films Underworld and Beowolf that fit this description very well. This experiment with the RT graph suggests that MAGE has the potential to be used in consumer-facing, main-stream applications such as movie recommendation. VI. RELATED WORK Our work draws from several research areas including, subgraph matching, graph similarty, and graph databases. Methods in indexing have been proposed to improve the capabilities and responsiveness of graph query tools [18]. Some graph proximity measures include random-walk based approaches [19], [20], conductance-based approaches [21], and meta-path based approaches [22], [23], among others. Matching attributed graphs has been studied in [24] again as an approach for the intelligence industry. With this idea an analyst to specify a particular pattern in an attributed relational graph and scour a much larger dataset for occurrences of such a structure. Approximate matching is a common relaxation to the subgraph isomorphism problem and has been researched heavily [2], [3]. There are a host of purely structural graph matching systems exploring polynomial time solutions to variants of the subgraph isomorphism problem [25] [27].. These approaches generally do not utilize any semantic content from the graphs themselves, making it challenging for these approaches to extend fully to our problem formulation. There are few algorithms that combine the aforementioned
6 R EFERENCES Fig. 6: Rotten Tomatoes queries and results. Nodes are movies; an edge connects two similar movies. Edge classes see legend) were derived from user-contributed similarity scores. Exact matches are in green, partial matches in purple. techniques to tackle inexact matching in large, attributed graphs with highly variable content. Recent works include [7] and [4], however the former requires the user to specify a focus node in the query and the latter returns results that do not adhere to the query structure. The closest work is that by Tong et al. [1], which proposes graph X-ray or G-Ray, a method that finds approximate subgraphs. G-Ray uses RWR to estimate the quality of a match between a subgraph and a query graph [19] which is used to rank the results extracted from the graph. G-Ray also uses the CenterPiece Subgraphs idea [11]. The main drawback of G-Ray is that it only supports graphs with node attributes. This is a significant limitation, considering the additional semantics contributed to the graphs by the edge attributes. We have leveraged some of the ideas and techniques proposed in this work to create MAGE. VII. C ONCLUSION To the best of our knowledge, MAGE is the first approach that supports exact and approximate subgraph matching on graphs with both node and edge attributes, for which wildcards and multiple attribute values are permissible. Our experiments on large real and synthetic graphs e.g., 460M edge Google+ graph) demonstrate that MAGE is both effective and scalable. Using multiple node or edge attributes in a query incurs negligible costs, and MAGE scales linearly with the number of wildcards. To demonstrate MAGE s generalizability, our effectiveness evaluation used multiple graphs, including a real Rotten Tomatoes movie similarity graph, where MAGE finds movies matching interesting criteria e.g., finding a cult movie that has horror and comedy ingredients). We believe MAGE can be a helpful tool for exploring large, real-world graphs. ACKNOWLEDGEMENTS This work is supported by NSF IIS , ARL W911NF , DARPA W911NF-11-C-0200 and W911NF-12-C-0028, Region II UTRC , and Symantec Research Labs Graduate Fellowship [1] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad, Fast besteffort pattern matching in large attributed graphs, in KDD, [2] Y. Tian and J. Patel, Tale: A tool for approximate large graph matching, in ICDE, [3] M. Mongiovı, R. D. Natale, R. Giugno, A. Pulvirenti, A. Ferro, and R. Sharan, Sigma: A set-cover-based approach for inexact graph matching, JBCB, vol. 8, no. 2, [4] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan, Nema: Fast graph search with label similarity, PVLDB, vol. 6, no. 3, [5] W. Fan, J. Li, J. Luo, Z. Tan, X. Wang, and Y. Wu, Incremental graph pattern matching, in SIGMOD, [6] W. Fan, X. Wang, and Y. Wu, Incremental graph pattern matching, TODS, vol. 38, no. 3, [7] W. Fan, X. Wang, and Y. Wu, Diversified top-k graph pattern matching, PVLDB, vol. 6, no. 13, [8] J. Cheng, X. Zeng, and J. X. Yu, Top-k graph pattern matching over large graphs, in ICDE, [9] S. A. Cook, The complexity of theorem-proving procedures, in STOC, [10] H. Whitney, Congruent graphs and the connectivity of graphs, AJAM, vol. 54, no. 1, [11] H. Tong and C. Faloutsos, Center-piece subgraphs: problem definition and fast solutions, in KDD, [12] N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov, V. Sekar, and D. Song, Evolution of social-attribute networks: Measurements, modeling, and implications using google+, CoRR, [13] N. Z. Gong, A. Talwalkar, L. W. Mackey, L. Huang, E. C. R. Shin, E. Stefanov, E. Shi, and D. Song, Predicting links and inferring attributes using a social-attribute network san), CoRR, [14] P. Erdo s and A. Re nyi, On random graphs, I, Publicationes Mathematicae-Debrecen, vol. 6, [15] D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks, Nature, vol. 393, [16] S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos, Netprobe: A fast and scalable system for fraud detection in online auction networks, in WWW, [17] S. Guha, R. Rastogi, and K. Shim, Rock: a robust clustering algorithm for categorical attributes, in ICDE, [18] P. Zhao and J. Han, On graph query optimization in large networks, PVLDB, vol. 3, no. 1, [19] H. Tong, C. Faloutsos, and J.-Y. Pan, Fast random walk with restart and its applications, in ICDM, [20] Y. Fujiwara, M. Nakatsuji, M. Onizuka, and M. Kitsuregawa, Fast and exact top-k search for random walk with restart, PVLDB, vol. 5, no. 5, [21] Y. Koren, S. C. North, and C. Volinsky, Measuring and extracting proximity in networks, in KDD, [22] X. Yu, Y. Sun, B. Norick, T. Mao, and J. Han, User guided entity similarity search using meta-path selection in heterogeneous information networks, in CIKM, [23] H. Tong, C. Faloutsos, and Y. Koren, Fast direction-aware proximity for graph mining, in KDD, [24] T. Coffman, S. Greenblatt, and S. Marcus, Graph-based technologies for intelligence analysis, CACM, vol. 47, no. 3, [25] B. Messmer and H. Bunke, Subgraph isomorphism detection in polynomial time on preprocessed model graphs, in Recent Developments in Computer Vision, 1996, vol [26] J. R. Ullmann, An algorithm for subgraph isomorphism, JACM, vol. 23, no. 1, [27] T. Washio and H. Motoda, State of the art of graph-based data mining, SIGKDD Explorations, vol. 5, no. 1, 2003.
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
KEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
Complex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich [email protected] 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
An Introduction to APGL
An Introduction to APGL Charanpal Dhanjal February 2012 Abstract Another Python Graph Library (APGL) is a graph library written using pure Python, NumPy and SciPy. Users new to the library can gain an
Introduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003
Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
How To Find Local Affinity Patterns In Big Data
Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
MapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University [email protected] [email protected] ABSTRACT Depending
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, [email protected] Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
Fast Contextual Preference Scoring of Database Tuples
Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation
Approximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
Mining Social-Network Graphs
342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is
System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
A discussion of Statistical Mechanics of Complex Networks P. Part I
A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Travis Goodwin & Sanda Harabagiu
Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research
Network (Tree) Topology Inference Based on Prüfer Sequence
Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 [email protected],
Adaptive information source selection during hypothesis testing
Adaptive information source selection during hypothesis testing Andrew T. Hendrickson ([email protected]) Amy F. Perfors ([email protected]) Daniel J. Navarro ([email protected])
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Compact Representations and Approximations for Compuation in Games
Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions
Link Prediction in Social Networks
CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD
Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated
American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
Implementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya [email protected]
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Cloud Computing is NP-Complete
Working Paper, February 2, 20 Joe Weinman Permalink: http://www.joeweinman.com/resources/joe_weinman_cloud_computing_is_np-complete.pdf Abstract Cloud computing is a rapidly emerging paradigm for computing,
A Review And Evaluations Of Shortest Path Algorithms
A Review And Evaluations Of Shortest Path Algorithms Kairanbay Magzhan, Hajar Mat Jani Abstract: Nowadays, in computer networks, the routing is based on the shortest path problem. This will help in minimizing
Big Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
Insider Threat Detection Using Graph-Based Approaches
Cybersecurity Applications & Technology Conference For Homeland Security Insider Threat Detection Using Graph-Based Approaches William Eberle Tennessee Technological University [email protected] Lawrence
On the independence number of graphs with maximum degree 3
On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs
Bisecting K-Means for Clustering Web Log data
Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
We shall turn our attention to solving linear systems of equations. Ax = b
59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system
Creating Synthetic Temporal Document Collections for Web Archive Benchmarking
Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Kjetil Nørvåg and Albert Overskeid Nybø Norwegian University of Science and Technology 7491 Trondheim, Norway Abstract. In
5. Binary objects labeling
Image Processing - Laboratory 5: Binary objects labeling 1 5. Binary objects labeling 5.1. Introduction In this laboratory an object labeling algorithm which allows you to label distinct objects from a
Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann
Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies
3.2 Roulette and Markov Chains
238 CHAPTER 3. DISCRETE DYNAMICAL SYSTEMS WITH MANY VARIABLES 3.2 Roulette and Markov Chains In this section we will be discussing an application of systems of recursion equations called Markov Chains.
Cloud Service Placement via Subgraph Matching
Cloud Service Placement via Subgraph Matching Bo Zong, Ramya Raghavendra *, Mudhakar Srivatsa *, Xifeng Yan, Ambuj K. Singh, Kang-Won Lee * Dept. of Computer Science, University of California at Santa
Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?
What is a Network? Network/Graph Theory Network = graph Informally a graph is a set of nodes joined by a set of lines or arrows. 1 1 2 3 2 3 4 5 6 4 5 6 Graph-based representations Representing a problem
The Trip Scheduling Problem
The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems
Visual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
GeoGrid Project and Experiences with Hadoop
GeoGrid Project and Experiences with Hadoop Gong Zhang and Ling Liu Distributed Data Intensive Systems Lab (DiSL) Center for Experimental Computer Systems Research (CERCS) Georgia Institute of Technology
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing
Compression algorithm for Bayesian network modeling of binary systems
Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
An Ontology-enhanced Cloud Service Discovery System
An Ontology-enhanced Cloud Service Discovery System Taekgyeong Han and Kwang Mong Sim* Abstract This paper presents a Cloud service discovery system (CSDS) that aims to support the Cloud users in finding
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model
Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model Trung Le 1, Ba Quy Tran 2, Hanh Dang Thi My 3, Thanh Hung Ngo 4 1 GSR, Information System Lab., University of
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Performance Workload Design
Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles
SGL: Stata graph library for network analysis
SGL: Stata graph library for network analysis Hirotaka Miura Federal Reserve Bank of San Francisco Stata Conference Chicago 2011 The views presented here are my own and do not necessarily represent the
Data Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
Server Load Prediction
Server Load Prediction Suthee Chaidaroon ([email protected]) Joon Yeong Kim ([email protected]) Jonghan Seo ([email protected]) Abstract Estimating server load average is one of the methods that
