Fast Parallel Conversion of Edge List to Adjacency List for Large-Scale Graphs
Shaikh Arifuzzaman and Maleq Khan
Network Dynamics & Simulation Science Lab, Virginia Bioinformatics Institute
Virginia Tech, Blacksburg, VA 24061, USA

Abstract

In the era of big data, we are deluged with massive graph data emerging from numerous social and scientific applications. In most cases, graph data are generated as lists of edges (edge list), where an edge denotes a link between a pair of entities. However, most graph algorithms work efficiently when the information about the adjacent nodes (adjacency list) of each node is readily available. Although the conversion from edge list to adjacency list can be done trivially on the fly for small graphs, such conversion becomes challenging for the emerging large-scale graphs consisting of billions of nodes and edges. These graphs do not fit in the main memory of a single computing machine and thus require distributed-memory parallel or external-memory algorithms. In this paper, we present efficient MPI-based distributed-memory parallel algorithms for converting edge lists to adjacency lists. To the best of our knowledge, this is the first work on this problem. To address the critical load-balancing issue, we present a parallel load balancing scheme which improves both time and space efficiency significantly. Our fast parallel algorithm works on massive graphs, achieves very good speedups, and scales to a large number of processors. The algorithm can convert an edge list of a graph with 2 billion edges to the adjacency list in less than 2 minutes using 4 processors. Denoting the number of nodes, edges, and processors by n, m, and P, respectively, the time complexity of our algorithm is O(m/P + n/P + P), which provides a speedup factor of at least Ω(min{P, d_avg}), where d_avg is the average degree of the nodes. The algorithm has a space complexity of O(m/P), which is optimal.

1. INTRODUCTION

A graph (network) is a powerful abstraction for representing underlying relations in large unstructured datasets. Examples include the web graph [11], various social networks such as Facebook and Twitter [17], collaboration networks [19], infrastructure networks (e.g., transportation and telephone networks), and biological networks [16]. We denote a graph by G(V,E), where V and E are the sets of vertices (nodes) and edges, respectively, with m = |E| edges and n = |V| vertices. In many cases, a graph is specified by simply listing the edges (u,v), (v,w), ... ∈ E in an arbitrary order, which is called an edge list. A graph can also be specified by a collection of adjacency lists of the nodes, where the adjacency list of a node v is the list of nodes that are adjacent to v.

Many important graph algorithms, such as computing shortest paths, breadth-first search, and depth-first search, are executed by exploring the neighbors (adjacent nodes) of the nodes in the graph. As a result, these algorithms work efficiently when the input graph is given as adjacency lists. Although both the edge list and the adjacency list have a space requirement of O(m), scanning all neighbors of a node v in an edge list can take as much as O(m) time, compared to O(d_v) time in an adjacency list, where d_v is the degree of node v.

The adjacency matrix is another data structure used for graphs. Much of the earlier work [3, 15] uses an adjacency matrix A[.,.] of order n × n for a graph with n nodes. Element A[i,j] denotes whether node j is adjacent to node i.
All adjacent nodes of i can be determined by scanning the i-th row, which takes O(n) time compared to O(d_i) time with an adjacency list. Further, the adjacency matrix has a prohibitive space requirement of O(n^2), compared to O(m) for an adjacency list. In a real-world network, m can be much smaller than n^2, as the average degree of a node can be significantly smaller than n. Thus the adjacency matrix is not suitable for the analysis of emerging large-scale networks in the age of big data.

In most cases, graphs are generated as lists of edges, since it is easier to capture pairwise interactions among the entities of a system in arbitrary order than to capture all interactions of a single entity at the same time. Examples include capturing person-person connections in social networks and protein-protein links in protein interaction networks. This is true even for generating massive random graphs [1, 14], which are useful for modeling very large systems. As discussed by Leskovec et al. [18], some patterns exist only in massive datasets and are fundamentally different from those in smaller datasets. While generating such massive random graphs, algorithms usually output edges one by one, and the edges incident on a node v are not necessarily generated consecutively. Thus a conversion from edge list to adjacency list is necessary for analyzing these graphs efficiently.

Why do we need parallel algorithms? With the unprecedented advancement of computing and data technology, we are deluged with massive data from diverse areas such as business and finance [5], computational biology [12], and social science [7]. The web has over 1 trillion webpages. Most of the social networks, such as Twitter, Facebook, and MSN, have millions to billions of users [13]. These networks hardly
fit in the memory of a single machine and thus require external-memory or distributed-memory parallel algorithms. External-memory algorithms can be very I/O intensive, leading to large runtimes. Efficient distributed-memory parallel algorithms can solve both problems (runtime and space) by distributing the computing tasks and the data over multiple processors.

In a sequential setting, with graphs small enough to be stored in the main memory, the problem of converting an edge list to an adjacency list is trivial, as described in the next section. However, the same problem in a distributed-memory setting with massive graphs poses many non-trivial challenges. The neighbors of a particular node v might reside in multiple processors and need to be combined efficiently. Further, the computation load must be well balanced among the processors to achieve good performance of the parallel algorithm. Like many others, this problem demonstrates how a simple, trivial problem can turn into a challenging one when we are dealing with big data.

Contributions. In this paper, we study the problem of converting an edge list to an adjacency list for large-scale graphs. We present MPI-based distributed-memory parallel algorithms which work for both directed and undirected graphs. We devise a parallel load balancing scheme which balances the computation load very well and improves the efficiency of the algorithms significantly, both in terms of runtime and space requirement. Furthermore, we present two efficient merging schemes for combining the neighbors of a node from different processors, message-based and external-memory merging, which offer a convenient trade-off between space and runtime. Our algorithms work on massive graphs, demonstrate very good speedups on both real and artificial graphs, and scale to a large number of processors. The edge list of a graph with 2B edges can be converted to the adjacency list in two minutes using 4 processors. We also provide rigorous theoretical analysis of the time and space complexity of our algorithms. The time and space complexity of our algorithms are O(m/P + n/P + P) and O(m/P), respectively, where n, m, and P are the number of nodes, edges, and processors, respectively. The speedup factor is at least Ω(min{P, d_avg}), where d_avg is the average degree of the nodes.

Organization. The rest of the paper is organized as follows. The preliminary concepts and background of the work are briefly described in Section 2. Our parallel algorithm, along with the load balancing scheme, is presented in Section 3. Section 5 discusses our experimental results, and we conclude in Section 6.

2. PRELIMINARIES AND BACKGROUND

In this section, we describe the basic definitions used in this paper and then present a sequential algorithm for converting an edge list to an adjacency list.

Basic definitions. We assume the n vertices of the graph G(V,E) are labeled 0, 1, 2, ..., n-1. We use the words node and vertex interchangeably. If (u,v) ∈ E, we say u and v are neighbors of each other. The set of all adjacent nodes (neighbors) of v ∈ V is denoted by N_v, i.e., N_v = {u ∈ V : (u,v) ∈ E}. The degree of v is d_v = |N_v|.

In the edge-list representation, the edges (u,v) ∈ E are listed one after another without any particular order; edges incident to a particular node v are not necessarily listed together. On the other hand, in the adjacency-list representation, for every v, the adjacent nodes of v, N_v, are listed together.
An example of these representations is shown in Figure 1.

a) Example graph: 5 nodes (0, 1, 2, 3, 4) and 6 edges.
b) Edge list: (0, 1) (1, 2) (1, 3) (1, 4) (2, 3) (3, 4)
c) Adjacency list: N_0 = {1}, N_1 = {0, 2, 3, 4}, N_2 = {1, 3}, N_3 = {1, 2, 4}, N_4 = {1, 3}

Figure 1: The edge list and adjacency list representations of an example graph with 5 nodes and 6 edges.

A Sequential Algorithm. The sequential algorithm for converting an edge list to an adjacency list works as follows: create an empty list N_v for each node v, and then, for each edge (u,v) ∈ E, include u in N_v and v in N_u. The pseudocode of the sequential algorithm is given in Figure 2. For a directed graph, line 5 of the algorithm should be omitted, since a directed edge (u,v) does not imply that there is also an edge (v,u). In our subsequent discussion, we assume that the graph is undirected; however, the algorithm also works for directed graphs with the mentioned modification. This sequential algorithm is optimal, since it takes O(m) time to process the m edges and thus cannot be improved further. The algorithm has a space complexity of O(m).

1: for each v ∈ V do
2:   N_v ← ∅
3: for each (u,v) ∈ E do
4:   N_u ← N_u ∪ {v}
5:   N_v ← N_v ∪ {u}

Figure 2: Sequential algorithm for converting an edge list to an adjacency list.
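For reference, the sequential conversion of Figure 2 amounts to a few lines of code. The following is a minimal sketch in Python, assuming the whole edge list fits in memory, which is exactly the assumption that breaks down for the massive graphs considered in this paper.

from collections import defaultdict

def edge_list_to_adjacency(edges, directed=False):
    """Sequential conversion of Figure 2: O(m) time and O(m) space."""
    adj = defaultdict(list)              # N_v for every node v
    for u, v in edges:
        adj[u].append(v)                 # line 4 of Figure 2
        if not directed:
            adj[v].append(u)             # line 5, omitted for directed graphs
    return adj

# The example graph of Figure 1: 5 nodes and 6 edges.
edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
print(dict(edge_list_to_adjacency(edges)))
# -> {0: [1], 1: [0, 2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}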
For small graphs that can be stored wholly in the main memory, the conversion in a sequential setting is trivial. However, emerging massive graphs pose many non-trivial challenges in terms of memory and execution efficiency. Such graphs might not fit in the local memory of a single computing node, and even when some of them do fit in the main memory, the runtime might be prohibitively large. Efficient parallel algorithms can solve this problem by distributing computation and data among computing nodes. We present our parallel algorithm in the next section.

3. THE PARALLEL ALGORITHM

In this section, we first present the computational model and an overview of our parallel algorithm. A detailed description follows thereafter.

Computation Model. We develop parallel algorithms for message passing interface (MPI) based distributed-memory parallel systems, where each processor has its own local memory. The processors do not have any shared memory, one processor cannot directly access the local memory of another processor, and the processors communicate by exchanging messages using MPI constructs.

Overview of the Algorithm. Let P be the number of processors used in the computation and E be the list of edges given as input. Our algorithm has two phases of computation. In Phase 1, the edges E are partitioned into P initial partitions E_i, and each processor is assigned one such partition. Each processor then constructs neighbor lists from the edges of its own partition. However, edges incident to a particular node might reside in multiple processors, which creates multiple partial adjacency lists for the same node. In Phase 2 of our algorithm, such adjacency lists are merged together. Performing Phase 2 in a cost-effective way is very challenging. Further, the computing loads among processors in both phases need to be balanced to achieve significant runtime efficiency. The load balancing scheme should also make sure that the space requirements among processors are balanced, so that large graphs can be processed. We describe the phases of our parallel algorithm in detail as follows.

(Phase 1) Local Processing. The algorithm partitions the set of edges E into P partitions E_i such that E_i ⊆ E and the union of all E_k, 0 ≤ k ≤ P-1, equals E. Each partition E_i has almost m/P edges, to be exact ⌈m/P⌉ edges, except for the last partition, which has slightly fewer (m - (P-1)⌈m/P⌉). Processor i is assigned partition E_i. Processor i then constructs adjacency lists N_v^i for all nodes v such that (.,v) ∈ E_i or (v,.) ∈ E_i. Note that N_v^i is only a partial adjacency list, since other partitions E_j might also have edges incident on v. We call N_v^i the local adjacency list of v in partition i. For future reference, we define S_i as the set of all such N_v^i. The pseudocode for the Phase 1 computation is presented in Figure 3. This phase of computation has both a runtime and a space complexity of O(m/P), as shown in Lemma 1.

1: Each processor i, in parallel, executes the following.
2: for each (u,v) ∈ E_i do
3:   N_v^i ← N_v^i ∪ {u}
4:   N_u^i ← N_u^i ∪ {v}

Figure 3: Algorithm for performing the Phase 1 computation.

Lemma 1. Phase 1 of our parallel algorithm has both a runtime and a space complexity of O(m/P).

Proof: Each initial partition i has |E_i| = O(m/P) edges. Executing lines 3-4 in Figure 3 for O(m/P) edges requires O(m/P) time. The total space required for storing the local adjacency lists N_v^i in partition i is 2|E_i| = O(m/P). Thus the computing loads and the space requirements in Phase 1 are well balanced.

The second phase of our algorithm constructs the final adjacency list N_v from the local adjacency lists N_v^i of all processors i. Note that balancing the load for Phase 1 does not make the load well balanced for Phase 2, which requires a more involved load balancing scheme, as described in the following sections.

(Phase 2) Merging Local Adjacency Lists. Once all processors complete constructing the local adjacency lists N_v^i, the final adjacency lists N_v are created by merging the N_v^i from all processors i as follows:

    N_v = ∪_{i=0}^{P-1} N_v^i    (1)

The scheme used for merging the local adjacency lists has a significant impact on the performance of the algorithm.
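Before turning to the merging schemes, the following minimal sketch illustrates the Phase 1 computation. The P partitions are simulated here with plain Python lists in a single process (in the actual algorithm each E_i resides on a separate MPI processor), and the function name is illustrative only.

from collections import defaultdict

def phase1_local_lists(edges, P):
    """Phase 1 (Figure 3): split E into P chunks of about m/P edges and
    build the local adjacency lists N_v^i independently in each partition."""
    m = len(edges)
    chunk = (m + P - 1) // P                                  # ceil(m / P)
    partitions = [edges[i * chunk:(i + 1) * chunk] for i in range(P)]
    local = []                                                # local[i] = S_i, the set of all N_v^i
    for E_i in partitions:
        N_i = defaultdict(list)
        for u, v in E_i:
            N_i[u].append(v)
            N_i[v].append(u)
        local.append(N_i)
    return local

edges = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]      # the graph of Figure 1
S = phase1_local_lists(edges, P=2)
# Node 1 now has partial lists in both S[0] and S[1]; reconciling such
# partial lists is exactly what Phase 2 has to do.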
One might think of using a dedicated merger processor: for each node v ∈ V, the merger collects N_v^i from all other processors and merges them into N_v. This requires O(d_v) time for node v, so the runtime complexity for merging the adjacency lists of all v ∈ V is O(Σ_{v∈V} d_v) = O(m), which is at most as good as the sequential algorithm. Next we present our efficient parallel merging scheme, which employs parallel mergers.

An Efficient Parallel Merging Scheme. To parallelize Phase 2 efficiently, our algorithm distributes the corresponding computation disjointly among the processors. Each processor i is responsible for merging the adjacency lists N_v for the nodes v in V_i ⊆ V, such that for any i and j, V_i ∩ V_j = ∅ and ∪_i V_i = V. Note that this partitioning of nodes is different from the initial partitioning of edges. How the nodes in V are distributed among the processors crucially affects the load balancing and performance of the algorithm. Further, this partitioning and load balancing scheme should itself be parallel to ensure the efficiency of the algorithm. Later, in Section 3.5, we discuss a parallel algorithm to partition the set of nodes V which
makes both the space requirement and the runtime well-balanced. Once the partitions V_i are given, the scheme for parallel merging works as follows.

Step 1: Let S_i be the set of all local adjacency lists in partition i. Processor i divides S_i into disjoint subsets S_i^j, 0 ≤ j ≤ P-1, as defined below:

    S_i^j = {N_v^i : v ∈ V_j}.    (2)

Step 2: Processor i sends S_i^j to all other processors j. This step introduces non-trivial efficiency issues, which we shall discuss shortly.

Step 3: Once processor i receives S_j^i from all processors j, it constructs N_v for all v ∈ V_i by the following equation:

    N_v = ∪_{k : N_v^k ∈ S_k^i} N_v^k.    (3)

We present two methods for performing Step 2 of the above scheme.

(1) Message-Based Merging: Each processor i sends S_i^j directly to processor j via messages. Specifically, processor i sends N_v^i (with a message <v, N_v^i>) to the processor j where v ∈ V_j. A processor might send multiple lists to another processor; in such cases, the messages to a particular processor are bundled together to reduce communication overhead. Once a processor i receives the messages <v, N_v^j> from the other processors, it computes, for each v ∈ V_i, N_v = ∪_{j=0}^{P-1} N_v^j. The pseudocode of this algorithm is given in Figure 4.

1: for each v s.t. (.,v) ∈ E_i or (v,.) ∈ E_i do
2:   Send <v, N_v^i> to processor j where v ∈ V_j
3: for each v ∈ V_i do
4:   N_v ← ∅   // empty set
5:   for each <v, N_v^j> received from any processor j do
6:     N_v ← N_v ∪ N_v^j

Figure 4: Parallel algorithm for merging the local adjacency lists to construct the final adjacency lists N_v. A message, denoted by <v, N_v^i>, carries the local adjacency list of v in processor i.

(2) External-Memory Merging: Each processor i writes S_i^j to intermediate disk files F_i^j, one for each processor j. Processor i then reads the files F_j^i from all j for the partial adjacency lists N_v^j of each v ∈ V_i and merges them into the final adjacency lists using Step 3 of the above scheme. However, processor i does not read a whole file into its main memory: it stores only the local adjacency lists N_v^j of one node v at a time, merges them into N_v, releases the memory, and then proceeds to merge the next node v+1. This works correctly because, while writing S_i^j to F_i^j, the local adjacency lists N_v^i are listed in sorted order of v. External-memory merging thus has a space requirement of O(max_v d_v). However, the I/O operations lead to a higher runtime with this method than with message-based merging, although the asymptotic runtime complexity remains the same. We demonstrate this space-runtime trade-off between the two methods experimentally in Section 5.

The runtime and space complexity of the parallel merging depend on the partitioning of V. In the next section, we discuss the partitioning and load balancing scheme, followed by the complexity analyses.

Partitioning and Load Balancing. The performance of the algorithm depends on how the loads are distributed. In Phase 1, distributing the edges of the input graph evenly among the processors provides an even load balance, both in terms of runtime and space, leading to both a space and a runtime complexity of O(m/P). However, Phase 2 is computationally different from Phase 1 and requires a different partitioning and load balancing scheme. In Phase 2 of our algorithm, the set of nodes V is divided into subsets V_i, where processor i merges the adjacency lists N_v for all v ∈ V_i. The time for merging N_v of a node v (referred to as the merging cost henceforth) is proportional to the degree d_v = |N_v| of node v. The total merging cost incurred on a processor i is Θ(Σ_{v∈V_i} d_v).
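Continuing the sketch of Phase 1 above, the message-based merging of Figure 4 can be illustrated in the same simulated setting: each partial list N_v^i is routed to the processor that owns v and unioned there. The names are again illustrative, and the node partition (the owner_of mapping below) is taken as given; computing a good one is precisely the job of the load balancing scheme described in this section.

from collections import defaultdict

def phase2_merge(local, owner_of):
    """Message-based merging (Figure 4): every partial list N_v^i is
    'sent' to the owner of v, which merges it into the final N_v. With
    the processors simulated in one process, the send is a dict update."""
    P = len(local)
    final = [defaultdict(list) for _ in range(P)]     # final[j][v] = N_v for v in V_j
    for i in range(P):                                # each processor i ...
        for v, N_v_i in local[i].items():             # ... sends <v, N_v^i> ...
            final[owner_of(v)][v].extend(N_v_i)       # ... to its owner j, which merges it
    return final

# Reusing S = phase1_local_lists(edges, P=2) from the Phase 1 sketch,
# with V_0 = {0, 1} and V_1 = {2, 3, 4}:
adj = phase2_merge(S, owner_of=lambda v: 0 if v <= 1 else 1)
# adj[0] holds N_0 and N_1; adj[1] holds N_2, N_3, and N_4, matching Figure 1.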
Distributing an equal number of nodes among the processors may not make the computing load well-balanced in many cases: some nodes may have very large degrees and some very small ones. As shown in Figure 5, the distribution of the merging cost (Σ_{v∈V_i} d_v) across processors is very uneven when an equal number of nodes is assigned to each processor. Thus the set V should be partitioned in such a way that the cost of merging is almost equal in all processors.

[Figure 5: Load distribution among processors (load, log scale, vs. rank of processors) for the Miami, LiveJournal, and Twitter networks before applying the load balancing scheme. A summary of our datasets is provided in Table 2.]

Let f(v) be the cost associated with constructing N_v by receiving and merging the local adjacency lists for a node v ∈ V. We need to compute disjoint partitions of the node set V such that for each partition V_i,

    Σ_{v∈V_i} f(v) ≈ (1/P) Σ_{v∈V} f(v).
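To make this condition concrete, the small sketch below chooses the boundary nodes from a prefix sum over the costs: partition i starts at the first node at which the cumulative cost reaches i/P of the total. The text below shows that f(v) = d_v and computes both the degrees and the prefix sums in parallel [2, 4, 6]; this sequential sketch, with an illustrative function name, only demonstrates the partitioning criterion.

from bisect import bisect_left
from itertools import accumulate

def boundary_nodes(cost, P):
    """Choose start nodes n_0 <= n_1 <= ... <= n_{P-1} so that each block
    V_i = {n_i, ..., n_{i+1} - 1} has total cost close to sum(cost)/P."""
    F = list(accumulate(cost))                         # F[t] = sum of f(v) for v <= t
    total = F[-1]
    return [bisect_left(F, i * total / P) for i in range(P)]

degrees = [1, 4, 2, 3, 2]                              # f(v) = d_v for the graph of Figure 1
print(boundary_nodes(degrees, P=2))                    # -> [0, 2], i.e. V_0 = {0, 1}, V_1 = {2, 3, 4}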
Now, note that, for each node v, the total size of the local adjacency lists received (in line 5 of Figure 4) equals the number of adjacent nodes of v, i.e., |N_v| = d_v. Merging the local adjacency lists N_v^j via set union operations (line 6) also requires d_v time. Thus, f(v) = d_v.

Since the adjacent nodes of a node v can reside in multiple processors, computing f(v) = |N_v| = d_v requires communication among multiple processors. Computing f(v) for all v sequentially requires O(m + n) time, which diminishes the advantage gained by the parallel algorithm. Thus, we compute f(v) = d_v for all v in parallel in O((n+m)/P + c) time, where c is the communication cost; we discuss the complexity shortly. The algorithm works as follows. To determine d_v for all v ∈ V in parallel, processor i computes d_v for the n/P nodes v from in/P to (i+1)n/P - 1; such nodes v satisfy ⌊v/(n/P)⌋ = i. For each local adjacency list N_v^i that processor i constructed in Phase 1, it sends d_v^i = |N_v^i| to processor p = ⌊v/(n/P)⌋ with a message <v, d_v^i>. Once processor i receives the messages <v, d_v^j> from the other processors, it computes f(v) = d_v = Σ_{j=0}^{P-1} d_v^j for all nodes v such that ⌊v/(n/P)⌋ = i. The pseudocode of the parallel algorithm for computing f(v) = d_v is given in Figure 6.

1: for each v s.t. (.,v) ∈ E_i or (v,.) ∈ E_i do
2:   d_v^i ← |N_v^i|
3:   j ← ⌊v/(n/P)⌋
4:   Send <v, d_v^i> to processor j
5: for each v s.t. ⌊v/(n/P)⌋ = i do
6:   d_v ← 0
7:   for each <v, d_v^j> received from any processor j do
8:     d_v ← d_v + d_v^j

Figure 6: Parallel algorithm executed by each processor i for computing f(v) = d_v.

Once f(v) is computed for all v ∈ V, we compute the cumulative sums F(t) = Σ_{v=0}^{t} f(v) in parallel by using a parallel prefix sum algorithm [4]. Each processor i stores F(t) for the nodes t from in/P to (i+1)n/P - 1. This computation takes O(n/P + P) time. Then we need to compute the V_i such that the computation loads are well-balanced among the processors. The partitions V_i are disjoint subsets of consecutive nodes, i.e., V_i = {n_i, n_i + 1, ..., n_{i+1} - 1} for some node n_i. We call n_i the start node or boundary node of partition i. Now, V_i is computed in such a way that the sum Σ_{v∈V_i} f(v) becomes almost equal to (1/P) Σ_{v∈V} f(v) for all partitions i. At the end of this execution, each processor i knows n_i and n_{i+1}. The algorithm presented in [6] computes the V_i for the problem of triangle counting; it can also be applied to our problem to compute the V_i using the cost function f(v) = d_v. In summary, computing the load balancing for Phase 2 has the following main steps.

Step 1: Compute the cost f(v) = d_v for all v in parallel by the algorithm shown in Figure 6.
Step 2: Compute the cumulative sums F(v) by a parallel prefix sum algorithm [4].
Step 3: Compute the boundary nodes n_i for every subset V_i = {n_i, ..., n_{i+1} - 1} using the algorithms of [2, 6].

Lemma 2. The algorithm for balancing the loads for Phase 2 has a runtime complexity of O((n+m)/P + P + max_i M_i) and a space requirement of O(n/P), where M_i is the number of messages received by processor i in Step 1.

Proof: For Step 1 of the above load balancing scheme, executing lines 1-4 (Figure 6) requires O(|E_i|) = O(m/P) time. The cost of executing lines 5-6 is O(n/P), since there are n/P nodes v such that ⌊v/(n/P)⌋ = i. Each processor i sends a total of O(m/P) messages, since |E_i| = O(m/P). If the number of messages received by processor i is M_i, then lines 7-8 of the algorithm have a complexity of O(M_i) (we compute bounds for M_i in Lemma 3). Step 2 has a computational cost of O(n/P + P) [4]. Step 3 of the load balancing scheme requires O(n/P + P) time [2, 6].
Thus the runtime complexity of the load balancing scheme is O((n+m)/P + P + max_i M_i). Storing f(v) for n/P nodes has a space requirement of O(n/P).

Lemma 3. The number of messages M_i received by processor i in Step 1 of the load balancing scheme is bounded by O(min{n, Σ_{v=in/P}^{(i+1)n/P-1} d_v}).

Proof: Referring to Figure 6, each processor i computes d_v for the n/P nodes v from in/P to (i+1)n/P - 1. For each v, processor i may receive messages from at most (P-1) other processors. Thus the number of received messages is at most (n/P)(P-1) = O(n). Notice also that, when all neighbors u ∈ N_v of v reside in different partitions E_j, processor i might receive as many as |N_v| = d_v messages for node v. This gives another upper bound, M_i = O(Σ_{v=in/P}^{(i+1)n/P-1} d_v). Thus we have M_i = O(min{n, Σ_{v=in/P}^{(i+1)n/P-1} d_v}).

In most practical cases, each processor receives a much smaller number of messages than specified by the theoretical upper bound. In fact, for each node v, processor i receives messages from fewer than P-1 processors. Suppose that, for node v, processor i receives messages from P·l_v processors, where l_v is a real number with 0 ≤ l_v ≤ 1. Then the total number of messages received is M_i = O(Σ_{v=in/P}^{(i+1)n/P-1} P·l_v). To get a crude estimate of M_i, let l_v = l for all v; the term l can be thought of as the average over all l_v. Then M_i = O((n/P)·P·l) = O(n·l). As shown in Table 1, the actual number of messages received, M_i, is up to 7 times smaller than the theoretical bound.
Table 1: Number of messages received in practice compared to the theoretical bounds. These results report max_i M_i with P = 5. A summary of our datasets is provided in Table 2.

Network     | n    | Σ_{v=in/P}^{(i+1)n/P-1} d_v | M_i  | l (avg.)
Miami       | 2.1M | 2.17M                        | 6K   | .27
LiveJournal | 4.8M | 2.4M                         | 56K  | .14
PA(5M, 2)   | 5M   | 2.48M                        | 1.4M | .28

Lemma 4. Using the load balancing scheme discussed in this section, Phase 2 of our parallel algorithm has a runtime complexity of O(m/P). Further, the space required to construct all final adjacency lists N_v in a partition is O(m/P).

Proof: Lines 1-2 of the algorithm shown in Figure 4 require O(|E_i|) = O(m/P) time for sending at most |E_i| edges to other processors. With load balancing, each processor receives at most O(Σ_{v∈V} d_v / P) = O(m/P) edges (lines 3-6). Thus the cost of merging the local lists N_v^j into the final lists N_v has a runtime of O(m/P). Since the total size of the local and final adjacency lists in a partition is O(m/P), the space requirement is O(m/P).

The runtime and space complexity of our complete parallel algorithm are formally presented in Theorem 1.

Theorem 1. The runtime and space complexity of our parallel algorithm are O(m/P + n/P + P) and O(m/P), respectively.

Proof: The proof follows directly from Lemmas 1, 2, 3, and 4. The total space required by all processors to process m edges is O(m); thus the space complexity of O(m/P) is optimal.

Performance gain with load balancing: The merging cost incurred on each processor i is Θ(Σ_{v∈V_i} d_v) (Figure 4). Without load balancing, this cost can be as much as Θ(m) (it is easy to construct such skewed graphs), leading to a runtime complexity of Θ(m) for the algorithm. With the load balancing scheme, our algorithm achieves a runtime of O(m/P + P + n/P) = O(m/P + m/(P·d_avg)) for the usual case n > P. Thus, by simple algebraic manipulation, it is easy to see that the algorithm with the load balancing scheme achieves an Ω(min{P, d_avg})-factor gain in runtime efficiency over the algorithm without load balancing. In other words, the algorithm gains an Ω(P)-fold improvement in speedup when d_avg ≥ P, and an Ω(d_avg)-fold improvement otherwise. We demonstrate this gain in speedup with experimental results in Section 5.

4. DATA AND EXPERIMENTAL SETUP

We present the datasets and the specification of the computing resources used in our experiments below.

Datasets. We used both real-world and artificially generated networks for evaluating the performance of our algorithm. A summary of all the networks is provided in Table 2.

Table 2: Datasets used in our experiments. The notations K, M, and B denote thousands, millions, and billions, respectively.

Network      | Nodes | Edges | Source
email-Enron  | 37K   | 0.36M | SNAP [20]
web-BerkStan | 0.69M | 13M   | SNAP [20]
Miami        | 2.1M  | 1M    | [9]
LiveJournal  | 4.8M  | 86M   | SNAP [20]
Twitter      | 42M   | 2.4B  | [17]
Gnp(n,d)     | n     | nd/2  | Erdős–Rényi
PA(n,d)      | n     | nd/2  | Pref. Attachment

Twitter [17], web-BerkStan, email-Enron, and LiveJournal [20] are real-world networks. Miami is a synthetic, but realistic, social contact network [6, 9] for the city of Miami. The network Gnp(n,d) is generated using the Erdős–Rényi random graph model [10] with n nodes and average degree d. The network PA(n,d) is generated using the preferential attachment (PA) model [8] with n nodes and average degree d. Both the real-world and the PA(n,d) networks have skewed degree distributions, which make load balancing a challenging task and give us a chance to measure the performance of our algorithms in some of the worst-case scenarios.

Experimental Setup.
We perform our experiments on a high-performance computing cluster with 64 computing nodes, 16 processors (Sandy Bridge E5-2670, 2.6 GHz) per node, 4GB of memory per processor, and the SLES operating system.

5. PERFORMANCE ANALYSIS

In this section, we present the experimental results evaluating the performance of our algorithm.

Load distribution. As discussed in Section 3.5, the load distribution among processors is very uneven without our load balancing scheme. We show a comparison of the load distribution on various networks with and without the load balancing scheme in Figure 7. Our load balancing scheme provides an almost equal load among the processors, even for graphs with very skewed degree distributions such as LiveJournal and Twitter.

Strong Scaling. Figure 8 shows the strong scaling (speedup) of our algorithm on the LiveJournal, Miami, and Twitter networks with and without the load balancing scheme. Our algorithm demonstrates very good speedups; e.g., it achieves a speedup factor of 3 with 4 processors for the Twitter network. The speedup factors increase almost linearly for all networks, and the algorithm scales to a large number of processors. Figure 8 also shows the speedup factors the algorithm achieves without the load balancing scheme. Speedup factors with the load balancing scheme
are significantly higher than those without the load balancing scheme. For the Miami network, the differences in speedup factors are not very large, since Miami has a relatively even degree distribution and the loads are already fairly balanced without the load balancing scheme. However, for real-world skewed networks, our load balancing scheme always improves the speedup quite significantly; for example, with 4 processors, the algorithm achieves a speedup factor of 297 with the load balancing scheme, compared to 6 without it, for the LiveJournal network.

[Figure 7: Load distribution among processors (load vs. rank of processors) for the (a) Miami, (b) LiveJournal, and (c) Twitter networks under the different schemes.]

[Figure 8: Strong scaling of our algorithm (speedup factor vs. number of processors) on the (a) Miami, (b) LiveJournal, and (c) Twitter networks with and without the load balancing scheme. The computation of the speedup factors includes the cost of load balancing.]

Comparison between Message-based and External-memory Merging. We compare the runtime and memory usage of our algorithm with both message-based and external-memory merging. Message-based merging is very fast and uses message buffers in main memory for communication. On the other hand, external-memory merging saves main memory by using disk space, but requires a large runtime for the I/O operations. Thus these two methods provide desirable alternatives for trading off space and runtime. However, as shown in Table 3, message-based merging is significantly faster (up to 2x) than external-memory merging, albeit taking a little more space. Thus message-based merging is the preferable method in our fast parallel algorithm.

Table 3: Comparison of external-memory (EXT) and message-based (MSG) merging (using 5 processors), in terms of memory (MB) and runtime (s), for the email-Enron, web-BerkStan, Miami, LiveJournal, Twitter, Gnp(5K, 2), PA(5M, 2), and PA(1B, 2) networks.

Weak Scaling. The weak scaling of a parallel algorithm shows its ability to maintain a constant computation time when the problem size grows proportionally with the increasing number of processors. As shown in Figure 9, the total
computation time of our algorithm (including the load balancing time) grows slowly with the addition of processors. This is expected, since the communication overhead increases with additional processors. However, the growth of the runtime of our algorithm is rather slow and the runtime remains almost constant; that is, the weak scaling of the algorithm is very good.

[Figure 9: Weak scaling of our parallel algorithm (time required in seconds vs. number of processors). For this experiment we use the networks PA(x/10 · 1M, 2) for x processors.]

6. CONCLUSION

We presented a parallel algorithm for converting the edge list of a graph to its adjacency list. The algorithm scales well to a large number of processors and works on massive graphs. We devised a load balancing scheme that improves both the space efficiency and the runtime of the algorithm, even for networks with very skewed degree distributions. To the best of our knowledge, it is the first parallel algorithm to convert an edge list to an adjacency list for large-scale graphs. It also allows other graph algorithms to work on massive graphs, which emerge naturally as edge lists. Furthermore, this work demonstrates how a seemingly trivial problem becomes challenging when we are dealing with big data.

ACKNOWLEDGEMENT

This work has been partially supported by DTRA Grant HDTRA, DTRA CNIMS Contract HDTRA1-11-D-16-1, NSF NetSE Grant CNS, and NSF SDCI Grant OCI.

REFERENCES

[1] M. Alam et al. Distributed-memory parallel algorithms for generating massive scale-free networks using preferential attachment model. In Proceedings of SC'13, 2013.
[2] M. Alam et al. Efficient algorithms for generating massive random networks. Technical Report 13-64, NDSSL at Virginia Tech, May 2013.
[3] N. Alon et al. Finding and counting given length cycles. Algorithmica, 17:209-223, 1997.
[4] S. Aluru. Teaching parallel computing through parallel prefix. In SC.
[5] C. Apte et al. Business applications of data mining. Commun. ACM, 45(8):49-53, Aug. 2002.
[6] S. Arifuzzaman et al. PATRIC: A parallel algorithm for counting triangles in massive networks. In Proceedings of CIKM'13, 2013.
[7] P. Attewell and D. Monaghan. Data Mining for the Social Sciences: An Introduction. University of California Press, 2015.
[8] A. Barabasi et al. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[9] C. Barrett et al. Generation and analysis of large synthetic social contact networks. In WSC, 2009.
[10] B. Bollobas. Random Graphs. Cambridge Univ. Press, 2001.
[11] A. Broder et al. Graph structure in the web. Computer Networks, pages 309-320, 2000.
[12] J. Chen et al. Biological Data Mining. Chapman & Hall/CRC, 1st edition, 2009.
[13] S. Chu et al. Triangle listing in massive networks and its applications. In KDD, 2011.
[14] F. Chung et al. Complex Graphs and Networks. American Mathematical Society, Aug. 2006.
[15] D. Coppersmith et al. Matrix multiplication via arithmetic progressions. In STOC, pages 1-6, 1987.
[16] M. Girvan and M. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, 99(12):7821-7826, June 2002.
[17] H. Kwak et al. What is Twitter, a social network or a news media? In WWW, pages 591-600, 2010.
[18] J. Leskovec. Dynamics of Large Networks. Ph.D. thesis, Pittsburgh, PA, USA, 2008.
[19] M. Newman. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA, 101(1):5200-5205, April 2004.
[20] SNAP. Stanford Network Analysis Project. http://snap.stanford.edu/.
