Mining Correlated Subgraphs in Graph Databases

Tomonobu Ozaki (1) and Takenao Ohkawa (2)

(1) Organization of Advanced Science and Technology, Kobe University
(2) Graduate School of Engineering, Kobe University
1-1 Rokkodai-cho, Nada, Kobe, Japan
{tozaki@cs., ohkawa@}kobe-u.ac.jp

Abstract. In this paper, we bring the concept of hyperclique patterns in transaction databases into graph mining and consider the discovery of sets of highly-correlated subgraphs in graph-structured databases. To discover frequent hyperclique patterns in graph databases efficiently, a novel algorithm named HSG is proposed. By considering the generality ordering of subgraphs, HSG employs a depth-first/breadth-first search strategy with powerful pruning techniques based on the upper bound of the h-confidence measure. The effectiveness of HSG is assessed through experiments with real world datasets.

1 Introduction

Recently, correlation mining, which extracts the underlying dependency among objects, has attracted much attention, and extensive studies have been reported [25,23,7,15,12]. Among these studies, we focus on hyperclique pattern discovery [26,27] in this paper. While most research aims at finding mutually dependent pairs of objects efficiently, a hyperclique pattern is a set of highly-correlated items that has a high value of an objective measure named h-confidence [26,27]. The h-confidence of an itemset $P = \{i_1, \ldots, i_m\}$ is designed to capture strong affinity relationships and is defined as follows:

$$\mathrm{hconf}(P) = \min_{l=1,\ldots,m} \{\mathrm{conf}(i_l \rightarrow P \setminus \{i_l\})\} = \frac{\mathrm{sup}(P)}{\max_{l=1,\ldots,m} \{\mathrm{sup}(\{i_l\})\}}$$

where sup and conf are the conventional definitions of support and confidence in association rules [1], respectively. A hyperclique pattern $P$ states that the occurrence of an item $i_l \in P$ in a transaction implies the occurrence of all other items $P \setminus \{i_l\}$ in the same transaction with probability at least $\mathrm{hconf}(P)$.
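As an illustration of the itemset definition above, the following minimal Python sketch computes support and h-confidence over a toy transaction database. The function names and the five transactions are assumptions made for this example only; they do not come from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def h_confidence(itemset, transactions):
    """hconf(P) = sup(P) / max_l sup({i_l})."""
    largest = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / largest if largest > 0 else 0.0


# Toy data (hypothetical): five transactions over the items a, b, c.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}, {"a", "c"}]
hconf_ab = h_confidence({"a", "b"}, transactions)  # sup(ab) = 3/5, sup(a) = 4/5
hconf_ac = h_confidence({"a", "c"}, transactions)  # sup(ac) = 2/5, sup(a) = 4/5
```

In this toy database the pair {a, b} scores higher than {a, c} because b almost never occurs without a, which is the affinity that h-confidence is meant to reward.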
In addition, the cosine similarity between any pair of items in $P$ is greater than or equal to $\mathrm{hconf}(P)$ [27]. Owing to these features, hyperclique pattern discovery has been applied successfully to several real world problems [9,18,24].

While hyperclique pattern discovery aims at finding valuable patterns in transaction databases, structured data is becoming increasingly abundant in many application domains. Although we can easily expect to obtain a more powerful tool for structured data by introducing correlation mining, most current research on correlation mining is designed for transaction databases, and little attention has been paid to mining correlations from structured data. Motivated by this background, in this paper we tackle the problem of hyperclique pattern discovery in the context of graph mining [21,22] and discuss the effectiveness of correlation mining in structured domains.

The basic idea of hyperclique patterns in graph databases is simple: instead of items, we employ subgraphs (i.e. patterns) as the building blocks of hyperclique patterns. While this replacement might seem trivial, it brings both new expectations and new difficulties. On one hand, the proposed framework extracts sets of mutually dependent or affinitive patterns in graph databases. Because each pattern gives another view of the other patterns in the same set, we can expect to obtain new findings and precise insights. On the other hand, hyperclique pattern discovery in graph databases is much harder than the traditional task, because there are exponentially many subgraphs in a graph database and any combination of those subgraphs is a potential candidate. In order to alleviate this combinatorial explosion and to discover hyperclique patterns efficiently, we propose a novel algorithm named HSG. HSG reduces the search space effectively by taking into account the generality ordering of hyperclique patterns.

The main contributions of this paper are summarized as follows. First, we formulate the new problem of hyperclique pattern discovery in graph databases. Second, we propose a novel algorithm named HSG for solving this problem efficiently. Third, through experiments with real world datasets, we assess the effectiveness of our proposal.

This paper is organized as follows. In Section 2, after introducing basic notations, we formulate the problem of hyperclique pattern discovery in graph databases. In Section 3, the proposed algorithm HSG is explained in detail. After discussing related work in Section 4, we show the results of the experiments in Section 5. Finally, we conclude the paper and describe future work in Section 6.

T. Washio et al. (Eds.): PAKDD 2008, LNAI 5012, © Springer-Verlag Berlin Heidelberg 2008

2 Preliminaries

Let $L$ be a finite set of labels. A labeled graph $g = (V_g, E_g, l_g)$ on $L$ consists of a vertex set $V_g$, an edge set $E_g$ and a labeling function $l_g : V_g \cup E_g \rightarrow L$ that maps each vertex or edge to a label in $L$. Hereafter, we refer to a labeled graph simply as a graph. Each graph can be represented by a so-called code word [3,28], a unique string that consists of a series of edges associated with connection information. In particular, we employ the canonical code word [3,28], which is the minimal code word among isomorphic graphs, to represent each graph. The lexicographic order on code words gives a total order on graphs. Given two graphs $g$ and $g'$, $g <_{lex} g'$ denotes that the code word of $g$ is lexicographically earlier than that of $g'$. If the code word of $g$ is a prefix of that of $g'$, we denote it as $g <_{pfx} g'$. Examples of graphs and their code words are shown in Fig. 1.
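The two orders on code words can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: a code word is modelled as a tuple of edge tuples `(i, j, label_i, edge_label, label_j)`, and plain tuple comparison stands in for the canonical ordering of [3,28], whose actual edge order is more specific. The labels and the three code words are hypothetical and are not the graphs of Fig. 1.

```python
def lex_less(cw1, cw2):
    """g <_lex g': tuple comparison is lexicographic, and a proper prefix sorts first."""
    return cw1 < cw2


def is_proper_prefix(cw1, cw2):
    """g <_pfx g': the code word of g is a proper prefix of that of g'."""
    return len(cw1) < len(cw2) and cw2[:len(cw1)] == cw1


# Hypothetical code words; each edge is (i, j, label_i, edge_label, label_j).
cw_small = ((0, 1, "A", "-", "B"),)
cw_grown = cw_small + ((1, 2, "B", "-", "C"),)   # cw_small extended by one edge
cw_other = ((0, 1, "B", "-", "C"),)
```

With these toy values, `cw_small` is a proper prefix of `cw_grown` and precedes both other code words lexicographically, while `cw_small` and `cw_other` are related only by the lexicographic order.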

Fig. 1. Examples of Labeled Graphs and their Code Words (six graphs $g_0, \ldots, g_5$ with their code words; all edge labels are assumed to be identical). For example, the relations below hold:

$g_0 <_{lex} g_4$, $g_1 <_{lex} g_2$, $g_3 <_{lex} g_5$, $g_0 <_{pfx} g_1$, $g_2 <_{pfx} g_3$, $g_4 <_{pfx} g_5$

A graph $g = (V_g, E_g, l_g)$ is called a subgraph of another graph $g' = (V_{g'}, E_{g'}, l_{g'})$, denoted as $g \sqsubseteq g'$, if there exists an injective function $f : V_g \rightarrow V_{g'}$ such that $\forall u \in V_g,\ l_g(u) = l_{g'}(f(u))$ and $\forall (u, v) \in E_g,\ (f(u), f(v)) \in E_{g'} \wedge l_g(u, v) = l_{g'}(f(u), f(v))$. If $g \sqsubseteq g'$, then we say that $g$ is more general than $g'$. Note that if $g <_{pfx} g'$ holds, then $g \sqsubseteq g'$ also holds [3,28].

Based on the subgraph relationship, we consider the joint occurrence of a set of subgraphs in a graph. The most intuitive definition is as follows: given a set of subgraphs $G$ and a graph $g'$, if $\forall g_i \in G,\ g_i \sqsubseteq g'$ holds, then $G$ is considered to occur in $g'$. However, this simple definition might not be suitable for hyperclique patterns of subgraphs, because a large number of uninteresting combinations of subgraphs having large overlaps in a graph would be obtained. Therefore, we introduce another definition that takes edge-disjointness into account to suppress this redundancy.

Given a set of $m$ subgraphs $G = \{g_1, \ldots, g_m\}$ and a graph $g'$, $G$ is called a set of $k$-edge disjoint subgraphs of $g'$, denoted as $G \sqsubseteq_k g'$, if there exists a set of injective functions $\{f_i : V_{g_i} \rightarrow V_{g'} \mid i = 1, \ldots, m\}$ that satisfies:

(1) $\forall g_i \in G,\ g_i \sqsubseteq g'$ (witnessed by $f_i$)
(2) $\sum_{i=1}^{m} |E_{g_i}| - \left| \bigcup_{i=1,\ldots,m} \{(f_i(u), f_i(v)) \mid (u, v) \in E_{g_i}\} \right| \leq k$

The second condition constrains the number of overlapping edges, so that redundant combinations can be controlled. For example in Fig. 1, while both $g_1 \sqsubseteq g_3$ and $g_2 \sqsubseteq g_3$ hold, if $k$ is set to 0, then $\{g_1, g_2\} \sqsubseteq_0 g_3$ does not hold, because the two subgraphs share an edge of $g_3$ incident to the vertex labeled A. We introduce the definitions of support and h-confidence for a set of subgraphs.
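The edge-overlap condition (2) can be made concrete with a short Python sketch. It assumes that one embedding per subgraph has already been chosen (the definition only requires that some choice exists), that graphs are undirected, and that label checks for condition (1) are done elsewhere. The function names and the path-shaped toy data are hypothetical.

```python
def edge_overlap(pattern_edges, embeddings):
    """sum_i |E_gi| minus the number of distinct host edges covered by the images."""
    total = sum(len(edges) for edges in pattern_edges)
    covered = set()
    for edges, f in zip(pattern_edges, embeddings):
        # Undirected host edges are normalised as frozensets of their endpoints.
        covered |= {frozenset((f[u], f[v])) for u, v in edges}
    return total - len(covered)


def is_k_edge_disjoint(pattern_edges, embeddings, k):
    """Condition (2) for one fixed choice of embeddings."""
    return edge_overlap(pattern_edges, embeddings) <= k


# Toy host graph: the path 0-1-2-3. Two two-edge path patterns are embedded
# so that they share exactly the host edge {1, 2}.
p1 = [(0, 1), (1, 2)]
p2 = [(0, 1), (1, 2)]
f1 = {0: 0, 1: 1, 2: 2}
f2 = {0: 1, 1: 2, 2: 3}
overlap = edge_overlap([p1, p2], [f1, f2])
```

A full test of $G \sqsubseteq_k g'$ would additionally search over all embeddings of every subgraph, which this sketch deliberately leaves out.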
Let $D = \{d_1, \ldots, d_N\}$ be a database of $N$ graphs. The support and h-confidence of a set of subgraphs $G = \{g_1, \ldots, g_m\}$ in $D$ are defined as follows:

$$\mathrm{sup}_D(G) = \frac{1}{N} \sum_{d \in D} \sigma(G, d), \qquad \sigma(G, d) = \begin{cases} 1 & (G \sqsubseteq_k d) \\ 0 & (\text{otherwise}) \end{cases}$$

$$\mathrm{hconf}_D(G) = \frac{\mathrm{sup}_D(G)}{\max_{i=1,\ldots,m} \{\mathrm{sup}_D(\{g_i\})\}}$$

Here the two-argument $\sigma(G, d)$ is the occurrence indicator and is distinct from the minimum support threshold $\sigma$ used below.

Based on the above preparation, we formulate the problem of mining frequent hyperclique patterns in graph databases (HSG mining for short). Given a database $D$ of labeled graphs, a positive number called minimum support $\sigma$ ($0 < \sigma \leq 1$) and a number called minimum h-confidence $h_c$ ($0 \leq h_c \leq 1$), the problem of HSG mining is to find all frequent hyperclique patterns of subgraphs $G$ in $D$ such that $\mathrm{sup}_D(G) \geq \sigma$, $\mathrm{hconf}_D(G) \geq h_c$ and the cardinality of $G$ is more than one. Note that, because we are interested in sets of mutually dependent subgraphs, hyperclique patterns of cardinality one are excluded.

3 Mining Hyperclique Patterns of Subgraphs

In this section, we propose an algorithm named HSG for mining frequent hyperclique patterns in graph databases. Before describing the concrete algorithm, we show some properties of hyperclique patterns and a data structure called a conditional prefix tree of hyperclique patterns, both of which are used for effective pruning based on the generality ordering of hyperclique patterns.

3.1 Properties of Hyperclique Patterns

Given two sets $G_1$ and $G_2$ of subgraphs, if there exists an injective function $\varphi : G_1 \rightarrow G_2$ which satisfies $\forall g \in G_1,\ g \sqsubseteq \varphi(g) \in G_2$, then we say that $G_1$ is more general than $G_2$ and denote it as $G_1 \sqsubseteq G_2$. As shown formally below, given a set of subgraphs $G_1$, there are two kinds of specialization that yield a more specific set of subgraphs $G_2$ from $G_1$. Note that, while only the first kind is considered in itemset mining, the second one also plays a key role in HSG mining.

(1) Specialization by addition: $G_2$ is obtained by adding a new subgraph $g'$ to $G_1$, i.e. $G_2 = G_1 \cup \{g'\}$.
(2) Specialization by replacement: $G_2$ is obtained by replacing a subgraph $g \in G_1$ with a more specific subgraph $g'$ ($g \sqsubseteq g'$), i.e. $G_2 = (G_1 \setminus \{g\}) \cup \{g'\}$.

The following two lemmas hold for hyperclique patterns of subgraphs under the generality ordering introduced above.

Lemma 1 (Anti-monotone property of support value). Given two sets $G_1$ and $G_2$ of subgraphs, if $G_1 \sqsubseteq G_2$, then $\mathrm{sup}_D(G_1) \geq \mathrm{sup}_D(G_2)$ holds.

Proof. Obvious from the definition of support value.
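The generality ordering between two sets of subgraphs asks for an injective map, which the following brute-force Python sketch makes explicit. It is only an illustration for very small sets: the subgraph test is passed in as a predicate, and the usage below substitutes a string-prefix test for it, relying on the fact that $g <_{pfx} g'$ implies $g \sqsubseteq g'$ (a sufficient condition only). All names and toy values are assumptions.

```python
from itertools import permutations


def more_general_set(G1, G2, is_subgraph):
    """True if some injective phi: G1 -> G2 has is_subgraph(g, phi(g)) for every g."""
    G1, G2 = list(G1), list(G2)
    if len(G1) > len(G2):
        return False  # no injective map can exist
    # Each permutation of len(G1) distinct elements of G2 is one injective map.
    return any(
        all(is_subgraph(g, h) for g, h in zip(G1, image))
        for image in permutations(G2, len(G1))
    )


def prefix_or_equal(g, h):
    """Stand-in for the subgraph test on string code words (hypothetical)."""
    return h.startswith(g)


# {"a", "b"} is specialised by replacement ("a" -> "ab") and by addition ("c").
general = ["a", "b"]
specific = ["ab", "b", "c"]
```

A practical implementation would replace the enumeration of permutations with bipartite matching and the prefix test with a real subgraph isomorphism check.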
By this lemma, if a set of subgraphs $G_1$ does not satisfy the minimum support, then all sets of subgraphs $G_2$ such that $G_1 \sqsubseteq G_2$ can be safely eliminated from the candidates for frequent hyperclique patterns.

Lemma 2 (Upper bound of h-confidence). Given two sets of subgraphs $G_1 = G_A \cup G_B$ such that $G_A \neq \emptyset$ and $G_A \cap G_B = \emptyset$, and $G_2 = G_A \cup G_C$ such that $G_A \cap G_C = \emptyset$, if $G_B \sqsubseteq G_C$, then the following inequality holds:

$$\mathrm{up}(G_1, G_A) = \frac{\mathrm{sup}_D(G_1)}{\max_{g \in G_A} \{\mathrm{sup}_D(\{g\})\}} \geq \mathrm{hconf}_D(G_2)$$

Proof. Since $G_A \subseteq G_2$, $\max_{g \in G_A} \{\mathrm{sup}_D(\{g\})\} \leq \max_{g' \in G_2} \{\mathrm{sup}_D(\{g'\})\}$ holds. By Lemma 1, $\mathrm{sup}_D(G_1) \geq \mathrm{sup}_D(G_2)$ also holds. Therefore,

$$\frac{\mathrm{sup}_D(G_1)}{\max_{g \in G_A} \{\mathrm{sup}_D(\{g\})\}} \geq \frac{\mathrm{sup}_D(G_2)}{\max_{g' \in G_2} \{\mathrm{sup}_D(\{g'\})\}} = \mathrm{hconf}_D(G_2)$$

holds.

This lemma gives an upper bound of h-confidence. If $\mathrm{up}(G_1, G_A)$ does not satisfy the minimum h-confidence $h_c$, then no set of subgraphs $G_2 = G_A \cup G_C$ such that $G_B \sqsubseteq G_C$ can satisfy $h_c$. Furthermore, this lemma also shows the anti-monotone property of h-confidence with respect to specialization by addition. By definition, $\mathrm{hconf}_D(G_1) = \mathrm{up}(G_1, G_1)$ holds. Thus, if $\mathrm{hconf}_D(G_1) < h_c$, then no set of subgraphs obtained by adding subgraphs to $G_1$ can satisfy $h_c$.

3.2 A Conditional Prefix Tree of Hyperclique Patterns

Here, we consider the enumeration of hyperclique patterns in graph databases. According to reverse search [2], repeated enumeration of the same pattern can be avoided by generating each pattern from its unique parent. In the case of hyperclique patterns of subgraphs, the parent can be uniquely defined by using the total order on graphs given by code words. The parent of a set of subgraphs $G$, denoted as $p(G)$, is the set obtained by removing the smallest element with respect to $<_{lex}$ from $G$, i.e. $p(G) = G \setminus \{g \in G \mid \forall g' \in G,\ g' \neq g \Rightarrow g <_{lex} g'\}$.

Because of the anti-monotone property of hyperclique patterns with respect to specialization by addition shown in Lemmas 1 and 2, all subsets of a frequent hyperclique pattern must also be frequent hyperclique patterns. Furthermore, a hyperclique pattern should be enumerated via its parent to avoid repeated enumeration. Therefore, in our strategy, a new hyperclique pattern $G$ is generated by joining two hyperclique patterns $G_1 = G' \cup \{g_1\}$ and $G_2 = G' \cup \{g_2\}$ as $G = G' \cup \{g_1\} \cup \{g_2\} = G_1 \cup \{g_2\}$. Note that enumeration via the parent is naturally realized through this join operation.
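The parent function and the join of two patterns with a common parent can be sketched as below. This is a minimal illustration in which Python strings, compared with the built-in order, stand in for code words under $<_{lex}$; the function names and toy patterns are assumptions rather than part of the paper.

```python
def parent(G):
    """p(G): remove the smallest element with respect to the total order."""
    G = frozenset(G)
    if not G:
        raise ValueError("the empty pattern has no parent")
    return G - {min(G)}


def join(G1, G2):
    """Join two distinct patterns of equal size that share the same parent."""
    G1, G2 = frozenset(G1), frozenset(G2)
    if G1 == G2 or len(G1) != len(G2) or parent(G1) != parent(G2):
        return None
    return G1 | G2


# Two patterns with the common parent {"m"}; strings stand in for code words.
G1 = {"c", "m"}
G2 = {"b", "m"}
joined = join(G1, G2)
```

In this toy case the joined pattern is {"b", "c", "m"} and its parent is `G1`, the operand whose added element is the larger one, so every pattern is reached from exactly one parent.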
Since a hyperclique pattern is generated by joining two hyperclique patterns having the same parent, it is convenient to treat all hyperclique patterns that have the same parent as a unit. Furthermore, in order to use the pruning based on the generality ordering effectively, the hyperclique patterns in this unit should be organized according to the generality ordering. Motivated by this, we propose a tree-shaped data structure called a conditional prefix tree of hyperclique patterns, on which our algorithm HSG works, for storing hyperclique patterns that share the same parent.

A conditional prefix tree of hyperclique patterns $PT_G = (V_G, E_G, B_G, root)$ is an ordered tree, and it stores hyperclique patterns that have a hyperclique pattern $G$ as their parent. The root node $root$ is a dummy node. Each node $v$ in $V_G$, except for $root$, corresponds to a hyperclique pattern $G \cup \{g(v)\}$ and has a graph $g(v)$. $E_G \subseteq V_G \times V_G$ and $B_G \subseteq V_G \times V_G$ represent the sets of parent-child and sibling relationships, respectively. These are formally defined as follows:

$$E_G = \{(v_1, v_2) \mid g(v_1) <_{pfx} g(v_2),\ \nexists v' \in V_G\,[\,g(v_1) <_{pfx} g(v') \wedge g(v') <_{pfx} g(v_2)\,]\} \cup \{(root, v_3) \mid \nexists v' \in V_G\,[\,g(v') <_{pfx} g(v_3)\,]\}$$

$$B_G = \{(v_1, v_2) \mid g(v_1) <_{lex} g(v_2),\ \exists v' \in V_G\,[\,(v', v_1) \in E_G \wedge (v', v_2) \in E_G\,]\}$$
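The parent-child relation $E_G$ amounts to a longest-proper-prefix rule, which the following Python sketch spells out. It is a simplified illustration: strings stand in for code words, `None` plays the role of the dummy root, and the quadratic prefix search is chosen for clarity rather than speed. Function names and the six toy code words are hypothetical.

```python
def build_prefix_tree(code_words):
    """Map each node to its ordered list of children; None is the dummy root."""
    words = sorted(set(code_words))
    children = {None: []}
    for w in words:
        children[w] = []
    for w in words:  # visiting in sorted order keeps every child list sorted
        prefixes = [p for p in words if len(p) < len(w) and w[:len(p)] == p]
        par = max(prefixes, key=len) if prefixes else None
        children[par].append(w)
    return children


def preorder(children, node=None):
    """Preorder traversal that skips the dummy root."""
    out = [] if node is None else [node]
    for child in children[node]:
        out.extend(preorder(children, child))
    return out


# Six hypothetical code words sharing the same condition.
tree = build_prefix_tree(["b", "ab", "a", "bd", "abc", "ac"])
visit_order = preorder(tree)
```

Because children are kept in lexicographic order and a prefix sorts before its extensions, the preorder traversal of this tree visits the stored code words in lexicographic order.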

Fig. 2. An Example of Conditional Prefix Tree (nodes $g_0, \ldots, g_5$ for the hyperclique patterns $G \cup \{g_0\}, \ldots, G \cup \{g_5\}$ under the parent (condition) $G$)

Intuitively speaking, $v_1$ is the parent of $v_2$ if the code word of $g(v_1)$ is the longest prefix of that of $g(v_2)$. If $v_3$ has no such parent, then $root$ is assigned as the parent of $v_3$. Note that $(v_1, v_2) \in E_G \Rightarrow g(v_1) \sqsubseteq g(v_2)$ holds for every non-root $v_1$. The children of a node are ordered in the lexicographic order $<_{lex}$. An example of a conditional prefix tree is shown in Fig. 2. This tree is constructed from six hyperclique patterns that have $G$ as their common parent.

3.3 HSG: A Hyperclique Pattern Miner in Graph Databases

In this subsection, we propose the algorithm HSG and explain it in detail. The algorithm HSG for mining frequent hyperclique patterns in graph databases is shown in Fig. 3. In the following explanation, we use the notations below for the sake of simplicity: $G_x = G \cup \{g(g_x)\}$, $G_{x'} = G \cup \{g(g_{x'})\}$ and $G_{x,y} = G \cup \{g(g_x), g(g_y)\}$, where we assume $g(g_x) \sqsubseteq g(g_{x'})$.

As input, HSG takes an unconditional prefix tree $PT_\emptyset$ of hyperclique patterns that stores frequent hyperclique patterns of cardinality one, i.e. frequent subgraphs, which can be obtained by conventional graph miners [28,11,10,16]. Then, HSG calls a procedure LoopV with $T_a = T_b = PT_\emptyset$ (line 1 in HSG). HSG consists of two main procedures, LoopV and LoopH, which join the elements of a conditional prefix tree with one another while considering the generality ordering.

LoopV traverses a tree $T_a$ in preorder by using recursive calls (line 5 in LoopV). With the preorder traversal, the elements of $T_a$ are considered in the order of $<_{lex}$. During the traversal, LoopV calls LoopH with $G$, $g_a$ and $T_b$ (line 3 in LoopV). LoopH also traverses a tree $T_b$ in preorder (line 16 in LoopH). Since $T_a$ and $T_b$ refer to the same tree at the beginning, if no pruning is applied, all pairs of elements in a conditional prefix tree will be considered.
Note that no repeated enumeration occurs, owing to the check $g(g_a) \leq_{lex} g(g_b)$ (line 2 in LoopH). During the recursive calls, LoopH constructs two new conditional prefix trees $NT_a$ and $NT_b$, which form the search spaces used afterwards. $NT_a$ is a prefix tree under the condition $G_a$, and it is used as input for discovering hyperclique patterns whose parent is $G_{a,b}$ (line 4 in LoopV). $NT_a$ is constructed by adding a new hyperclique pattern $G_{a,b}$ whenever one is obtained (line 10 in LoopH). $NT_b$ is a prefix tree under the condition $G$, on which hyperclique patterns having $G_{a'}$ as parents will be mined (line 5 in LoopV). Conceptually, $NT_b$ is obtained by pruning some branches of $T_b$.

Algorithm HSG($PT_\emptyset$)
1: LoopV($\emptyset$, $PT_\emptyset$, $PT_\emptyset$)

Procedure LoopV($G$, $T_a$, $T_b$)
1: for each $g_a \in T_a$'s children            // $G \cup \{g(g_a)\}$ is a frequent hyperclique pattern
2:   $NT_a$ := new root node, $NT_b$ := new root node
3:   LoopH($G$, $g_a$, $T_b$, $NT_a$, $NT_b$)   // specialize $G \cup \{g(g_a)\}$ by addition
4:   LoopV($G \cup \{g(g_a)\}$, $NT_a$, $NT_a$) // search on the new conditional prefix tree
5:   LoopV($G$, $g_a$, $NT_b$)                  // preorder traversal in $T_a$; specialize $G \cup \{g(g_a)\}$ by replacement

Procedure LoopH($G$, $g_a$, $T_b$, $NT_a$, $NT_b$)
1: for each $g_b \in T_b$'s children            // check $G \cup \{g(g_a), g(g_b)\}$ and prune by it
2:   if ($g(g_a) \leq_{lex} g(g_b)$) then
3:     add $g_b$ to the last of $NT_b$'s children
4:     continue
5:   if ($\mathrm{sup}_D(G \cup \{g(g_a), g(g_b)\}) < \sigma$) then continue                           // pruning (1)
6:   if ($G \neq \emptyset \wedge \mathrm{up}(G \cup \{g(g_a), g(g_b)\}, G) < h_c$) then continue       // pruning (2)
7:   $N_a$ := $NT_a$
8:   if ($\mathrm{hconf}_D(G \cup \{g(g_a), g(g_b)\}) \geq h_c$) then                                   // pruning (3)
9:     output($G \cup \{g(g_a), g(g_b)\}$)      // output of a frequent hyperclique pattern
10:    $C$ := new node, $g(C)$ := $g(g_b)$, add $C$ to the last of $N_a$'s children
11:    $N_a$ := $C$                             // replacement of $N_a$
12:  $N_b$ := new node, $g(N_b)$ := $g(g_b)$, add $N_b$ to the last of $NT_b$'s children
13:  if ($\mathrm{up}(G \cup \{g(g_a), g(g_b)\}, G \cup \{g(g_a)\}) < h_c$) then                        // pruning (4)
14:    for each $g_c \in g_b$'s children: add $g_c$ to the last of $N_b$'s children
15:  else
16:    LoopH($G$, $g_a$, $g_b$, $N_a$, $N_b$)   // preorder traversal in $T_b$; specialize $G \cup \{g(g_a), g(g_b)\}$ by replacement

Fig. 3. An algorithm HSG for mining hyperclique patterns in graph databases

Four kinds of pruning are applied in LoopH. They are achieved partially by not adding new vertices to $NT_a$ and $NT_b$. The first pruning is based on the anti-monotone property of support value in Lemma 1 (line 5 in LoopH).
If the support of $G_{a,b}$ is less than the minimum support, then no pattern more specific than $G_{a,b}$ can satisfy the minimum support. Thus, we ignore the following specializations of $G_{a,b}$ by skipping the rest of the loop body of line 1 in LoopH: (1) $G_{a,b'}$, by not calling LoopH (line 16 in LoopH); (2) patterns obtained by specialization of $G_{a,b}$ by addition, by not updating $NT_a$; and (3) $G_{a',b}$ and $G_{a',b'}$, by not updating $NT_b$.

The second pruning is derived from the upper bound of h-confidence in Lemma 2 (line 6 in LoopH). As in the first pruning, all specializations of $G_{a,b}$ are ignored in the same way.

The third pruning is based on the anti-monotone property of h-confidence with respect to specialization by addition in Lemma 2 (line 8 in LoopH). If $G_{a,b}$ does not satisfy the minimum h-confidence, the search for patterns having $G_{a,b}$ as parent is avoided by not adding $G_{a,b}$ to $NT_a$.

The fourth pruning is based on the upper bound of h-confidence in Lemma 2 (line 13 in LoopH). The search for $G_{a,b'}$ can be avoided by not calling LoopH. Note that $G_{a',b'}$ as well as $G_{a',b}$ must still be considered. Therefore, $NT_b$ has to be updated. This is achieved through the update of $N_b$.

As shown above, HSG makes the best use of pruning based on the two kinds of specialization by using conditional prefix trees. For HSG, the following theorem holds.

Theorem 1. Given an unconditional prefix tree containing all frequent subgraphs, HSG discovers all frequent hyperclique patterns without any duplication.

Proof. Derived from the complete enumeration procedure by the double preorder traversals and the safe prunings guaranteed by Lemmas 1 and 2.

Although HSG can discover all frequent hyperclique patterns, the obtained set of hyperclique patterns may contain some redundancy. Since each frequent subgraph in the unconditional prefix tree is treated as an item, subgraphs in the tree that are equivalent in some sense cause redundancy. To eliminate obviously redundant patterns, we believe that the frequent subgraphs included in the unconditional prefix tree should be limited to representatives such as closed subgraphs (a graph $g_c$ such that $\nexists g'\,[\,g_c \sqsubset g' \wedge \mathrm{sup}_D(g_c) = \mathrm{sup}_D(g')\,]$) or minimal subgraphs (a graph $g_m$ such that $\nexists g'\,[\,g' \sqsubset g_m \wedge \mathrm{sup}_D(g_m) = \mathrm{sup}_D(g')\,]$). In particular, minimal subgraphs might be more suitable if edge-disjointness is considered in the joint occurrence. Although, to the best of our knowledge, no method that finds minimal subgraphs directly has been proposed yet, those subgraphs can be obtained by post-processing the output of conventional graph miners [28,11,10,16].
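The four tests applied to a candidate $G \cup \{g(g_a), g(g_b)\}$ in LoopH can be summarised in a short Python sketch. It is a simplified illustration and not the authors' Java implementation: the tree updates are left out, the support oracle is a toy that ignores edge-disjointness (as if no overlap restriction were imposed), and all names and the occurrence table are hypothetical.

```python
def up(sup, G1, GA):
    """Upper bound of Lemma 2: sup(G1) divided by the largest sup({g}) for g in GA."""
    return sup(frozenset(G1)) / max(sup(frozenset([g])) for g in GA)


def hconf(sup, G):
    """h-confidence, using the identity hconf(G) = up(G, G)."""
    return up(sup, G, G)


def loop_h_tests(sup, G, ga, gb, min_sup, min_hconf):
    """Evaluate pruning tests (1)-(4) for the candidate G + {ga, gb}."""
    G = frozenset(G)
    Gab = G | {ga, gb}
    if sup(Gab) < min_sup:                      # pruning (1), Lemma 1
        return {"pruned_by": 1, "output": False, "descend": False}
    if G and up(sup, Gab, G) < min_hconf:       # pruning (2), Lemma 2
        return {"pruned_by": 2, "output": False, "descend": False}
    return {
        "pruned_by": None,
        "output": hconf(sup, Gab) >= min_hconf,            # test (3)
        "descend": up(sup, Gab, G | {ga}) >= min_hconf,    # test (4)
    }


# Hypothetical occurrence table: pattern name -> ids of the graphs containing it.
OCC = {"x": set(range(8)), "y": set(range(6)), "z": {0, 1}}
N = 10


def toy_sup(G):
    """Toy support oracle: joint occurrence is plain co-occurrence in a graph."""
    ids = set(range(N))
    for g in G:
        ids &= OCC[g]
    return len(ids) / N
```

For instance, with thresholds 0.1 and 0.7 the candidate {z, x} (with `ga = "z"`) is not output because its h-confidence is 0.25, yet the bound of test (4) equals 1.0, so more specific replacements of x would still be explored.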
4 Related Work

The concept of HSG mining is inspired by hyperclique pattern discovery in transaction databases [26,27]. Methods for mining correlated pairs of items have been proposed [25,23,7]. Furthermore, correlated pattern mining based on a pattern-growth methodology in transaction databases has been proposed [15]. Compared with these methods, HSG differs in that it finds sets of affinitive structured patterns.

On correlation mining in graph databases, a new problem named Correlated Graph Search has been proposed recently [12]. In this problem, Pearson's correlation coefficient [20] is employed as the correlation measure, and all subgraphs correlated with a query graph are discovered. This framework is greatly different from our proposal, because a different measure is employed and only subgraphs correlated with a given query are considered.

A pattern team, proposed in [13], is a set of patterns that optimizes some quality measure. The discovery of pattern teams may look similar to HSG mining because both find a set or combination of patterns. However, pattern team discovery is done by selecting patterns from a given set. In addition, a pattern team usually consists of mutually dissimilar and independent patterns in order to optimize the quality measure. Similar to pattern teams in some respects, the concept of α-orthogonal patterns in graph databases has been proposed recently [6]. In this framework, a set of frequent maximal subgraphs that are mutually dissimilar is obtained by employing a randomized search. Although it also treats sets of subgraphs, this framework differs from HSG mining because HSG discovers the complete set of affinitive subgraph sets.

From the aspect of finding similar patterns, redescription mining [19,17,29] is closely related to HSG mining. In redescription mining, patterns consist of any combination of conjunction, disjunction and negation of items, and pairs of patterns that occur in almost the same transactions are discovered. While this framework is very general, neither its application to structured data nor precise algorithms that use the generality ordering have been proposed yet.

5 Experimental Evaluation

To assess the effectiveness of the proposed algorithm, we implement HSG in Java and conduct experiments with the datasets shown in Table 1 on a PC (CPU: Intel(R) Core2Quad 2.4GHz) with 4 Gbytes of main memory running Windows XP. Furthermore, another miner, phsg, which is HSG without prunings (2) and (4), is also prepared to demonstrate the effects of the pruning related to specialization by replacement.

Table 1. Statistics of Datasets
Synthetic: A synthetic dataset generated by the graph generator [5]
PTE: The Predictive Toxicology Evaluation Challenge [8]
DTP CM: The DTP AIDS Antiviral Screen dataset [4]
$|D|$: # of graphs in the dataset. $|V|_a$, $|E|_a$: average number of vertices and edges per graph. $|V|$, $|E|$: # of distinct labels of vertices and edges.
In the experiments, we construct the unconditional prefix trees $PT_\emptyset$ by using minimal subgraphs only. Experimental results are shown in Table 2. The number of obtained hyperclique patterns decreases when the value of $k$ is reduced. Furthermore, though not shown in Table 2, about 231 million and 17 thousand hyperclique patterns were obtained when we set $\sigma = 0.1$, $h_c = 0.9$ and $k = \infty$ in PTE and DTP CM, respectively. This means that the consideration of edge-disjointness succeeds in suppressing the generation of redundant patterns.

In all cases, phsg discovers all frequent hyperclique patterns in a reasonable time, even though at least $O(|PT_\emptyset|^2)$ candidates would be generated if no pruning were applied. Thus, the pruning by minimum support is effective enough. Note that this pruning eliminates patterns obtained by specialization by addition as well as by specialization by replacement.

Compared with phsg, the execution time of HSG for the real world problems decreases to 16.0% at best and to 33.9% on average. The number of candidate patterns is also reduced to 15.9% at best and to 30.8% on average. It is also observed that HSG runs about two times faster than phsg on the synthetic dataset on average. These reductions are strong evidence of the effectiveness of the pruning based on the generality ordering, especially for specialization by replacement.

Table 2. Experimental Results
Columns: $h_c$, $k$, and then $|P|$, Time and Cand. for each of the two values of $\sigma$.

Results for the synthetic dataset: $\sigma = 0.025$ ($|PT_\emptyset| = 1208$) | $\sigma = 0.01$ ($|PT_\emptyset| = 8946$)
(0.6) 17.4 (32.6) | (4.4) (337.9)
(0.6) 22.5 (38.2) | (6.1) (470.6)
(0.5) 22.7 (38.3) | (5.7) (513.1)
(0.6) 18.0 (32.6) | (4.4) (337.9)
(0.6) 23.2 (38.2) | (6.1) (470.6)
(0.5) 23.4 (38.4) | (5.7) (514.3)

Results for PTE: $\sigma = 0.1$ ($|PT_\emptyset| = 561$) | $\sigma = 0.05$ ($|PT_\emptyset| = 1441$)
(1.1) 2.3 (8.3) | (8.0) 9.6 (48.0)
(2.0) 4.2 (13.9) | (13.2) 16.2 (67.0)
(2.1) 2.9 (8.4) | (9.4) 13.8 (49.9)
(7.0) 5.9 (14.8) | (28.1) 29.5 (77.2)
(4.0) 3.5 (9.1) | (11.8) 17.0 (51.4)
(35.5) 7.9 (17.1) | (77.2) 38.5 (85.4)

Results for DTP CM: $\sigma = 0.1$ ($|PT_\emptyset| = 417$) | $\sigma = 0.05$ ($|PT_\emptyset| = 1592$)
(2.4) 2.2 (11.3) | (7.7) 9.9 (62.6)
(2.8) 3.1 (13.9) | (9.0) 15.8 (79.0)
(2.7) 3.3 (11.3) | (8.0) 14.7 (62.7)
(3.4) 4.4 (14.0) | (9.7) 22.7 (79.3)
(8.8) 4.3 (11.6) | (14.1) 19.0 (63.0)
(116.6) 5.7 (14.7) | (123.1) 28.5 (80.2)

$k$: # of edge overlaps permitted in the joint occurrence ($\infty$ means no restriction). $|P|$: # of obtained hyperclique patterns. Time: execution time after $PT_\emptyset$ is given (in seconds). Cand.: # of candidates enumerated during the search (in thousands). Numbers in parentheses in Time and Cand. are for phsg.

6 Conclusion

In this paper, we formulate the problem of hyperclique pattern discovery in graph databases. To solve this problem efficiently, a novel algorithm named HSG is proposed that uses depth-first/breadth-first search with effective pruning based on the generality ordering.
We believe that HSG can mine hyperclique patterns efficiently not only in other types of structured data but also in transaction databases with a conceptual hierarchy, because the conditional prefix trees on which HSG works can be constructed naturally from these kinds of datasets.

For future work, a theoretical analysis of the proposed algorithm and further experiments with large-scale datasets are necessary. In addition, a more efficient mechanism is required for computing the support value of a set of edge disjoint subgraphs. For this purpose, we plan to employ the idea of support value computation for edge disjoint subgraphs in a large graph [14]. We also plan to apply the proposed algorithm to top-k correlated pattern discovery as well as to redescription mining in structured databases.

References

1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proc. of 20th International Conference on Very Large Data Bases (VLDB 1994) (1994)
2. Avis, D., Fukuda, K.: Reverse search for enumeration. Discrete Applied Mathematics 65(1-3) (1996)
3. Borgelt, C.: On canonical forms for frequent graph mining. In: Working Notes of the 3rd International ECML/PKDD Workshop on Mining Graphs, Trees and Sequences (MGTS 2005) (2005)
4. Borgelt, C., Berthold, M.R.: Mining molecular fragments: Finding relevant substructures of molecules. In: Proc. of the 2002 IEEE International Conference on Data Mining (ICDM 2002) (2002)
5. Cheng, J., Ke, Y., Ng, W.: Graphgen: A graph synthetic generator (2006)
6. Hasan, M., Chaoji, V., Salem, S., Besson, J., Zaki, M.: ORIGAMI: Mining representative orthogonal graph patterns. In: Proc. of the 7th IEEE International Conference on Data Mining (2007)
7. He, Z., Xu, X., Deng, S.: Mining top-k strongly correlated item pairs without minimum correlation threshold. International Journal of Knowledge-based and Intelligent Engineering Systems 10(2) (2006)
8. Helma, C., King, R.D., Kramer, S., Srinivasan, A.: The predictive toxicology challenge 2000-2001. Bioinformatics 17(1) (2001)
9. Hu, T., Xiong, H., Sung, S.Y.: Co-preserving patterns in bipartite partitioning for topic identification. In: Proc. of the 7th SIAM International Conference on Data Mining (2007)
10. Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proc. of the 3rd IEEE International Conference on Data Mining (2003)
11. Inokuchi, A., Washio, T., Motoda, H.: Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning 50 (2003)
12. Ke, Y., Cheng, J., Ng, W.: Correlation search in graph databases. In: Proc. of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007) (2007)
13. Knobbe, A.J., Ho, E.K.Y.: Pattern teams. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213. Springer, Heidelberg (2006)
14. Kuramochi, M., Karypis, G.: Finding Frequent Patterns in a Large Sparse Graph. Data Mining and Knowledge Discovery 11(3) (2005)
15. Lee, Y.-K., Kim, W.-Y., Cai, Y.D., Han, J.: CoMine: Efficient mining of correlated patterns. In: Proc. of the 3rd IEEE International Conference on Data Mining (2003)
16. Nijssen, S., Kok, J.: A quickstart in frequent structure mining can make a difference. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004) (2004)
17. Parida, L., Ramakrishnan, N.: Redescription mining: Structure theory and algorithms. In: Proc. of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (2005)
18. Qian, T., Xiong, H., Wang, Y., Chen, E.: On the strength of hyperclique patterns for text categorization. Information Sciences 177(19) (2007)
19. Ramakrishnan, N., Kumar, D., Mishra, B., Potts, M., Helm, R.F.: Turning cartwheels: an alternating algorithm for mining redescriptions. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004)
20. Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York (2002)
21. Washio, T., Motoda, H.: State of the art of graph-based data mining. SIGKDD Explorations 5(1) (2003)
22. Washio, T., Kok, J.N., De Raedt, L. (eds.): Advances in Mining Graphs, Trees and Sequences. IOS Press, Amsterdam (2005)
23. Xiong, H., Brodie, M., Ma, S.: Top-cop: Mining top-k strongly correlated pairs in large databases. In: Proc. of the 6th International Conference on Data Mining (2006)
24. Xiong, H., He, X., Ding, C., Zhang, Y., Kumar, V., Holbrook, S.R.: Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proc. of the Pacific Symposium on Biocomputing (2005)
25. Xiong, H., Shekhar, S., Tan, P.-N., Kumar, V.: Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, New York (2004)
26. Xiong, H., Tan, P.-N., Kumar, V.: Mining strong affinity association patterns in data sets with skewed support distribution. In: Proc. of the 3rd IEEE International Conference on Data Mining (ICDM 2003) (2003)
27. Xiong, H., Tan, P.-N., Kumar, V.: Hyperclique pattern discovery. Data Mining and Knowledge Discovery 13(2) (2006)
28. Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. of the 2002 IEEE International Conference on Data Mining (ICDM 2002) (2002)
29. Zaki, M.J., Ramakrishnan, N.: Reasoning about sets using redescription mining. In: Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (2005)

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,

More information

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining The Minimum Consistent Subset Cover Problem and its Applications in Data Mining Byron J Gao 1,2, Martin Ester 1, Jin-Yi Cai 2, Oliver Schulte 1, and Hui Xiong 3 1 School of Computing Science, Simon Fraser

More information

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, kanak.saxena@gmail.com D.S. Rajpoot Registrar,

More information

Discovery of Frequent Episodes in Event Sequences

Discovery of Frequent Episodes in Event Sequences Data Mining and Knowledge Discovery 1, 259 289 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. Discovery of Frequent Episodes in Event Sequences HEIKKI MANNILA heikki.mannila@cs.helsinki.fi

More information

Integrating Pattern Mining in Relational Databases

Integrating Pattern Mining in Relational Databases Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

More information

On the k-path cover problem for cacti

On the k-path cover problem for cacti On the k-path cover problem for cacti Zemin Jin and Xueliang Li Center for Combinatorics and LPMC Nankai University Tianjin 300071, P.R. China zeminjin@eyou.com, x.li@eyou.com Abstract In this paper we

More information

A Survey of Graph Pattern Mining Algorithm and Techniques

A Survey of Graph Pattern Mining Algorithm and Techniques A Survey of Graph Pattern Mining Algorithm and Techniques Harsh J. Patel 1, Rakesh Prajapati 2, Prof. Mahesh Panchal 3, Dr. Monal J. Patel 4 1,2 M.E.C.S.E.,KITRC, KALOL 3 HOD, M.E.C.S.E., KITRC, KALOL

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

IncSpan: Incremental Mining of Sequential Patterns in Large Database

IncSpan: Incremental Mining of Sequential Patterns in Large Database IncSpan: Incremental Mining of Sequential Patterns in Large Database Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 hcheng3@uiuc.edu Xifeng

More information

Large induced subgraphs with all degrees odd

Large induced subgraphs with all degrees odd Large induced subgraphs with all degrees odd A.D. Scott Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, England Abstract: We prove that every connected graph of order

More information

Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14

Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14 Graph Mining and Social Network nalysis Data Mining; EES 4412 Darren Rolfe + Vince hu 11.06.14 genda Graph Mining Methods for Mining Frequent Subgraphs priori-based pproach: GM, FSG Pattern-Growth pproach:

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Implementing Graph Pattern Mining for Big Data in the Cloud

Implementing Graph Pattern Mining for Big Data in the Cloud Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com

More information

CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases

CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases CLAN: An Algorithm for Mining Closed Cliques from Large Dense Graph Databases Jianyong Wang, Zhiping Zeng, Lizhu Zhou Department of Computer Science and Technology Tsinghua University, Beijing, 100084,

More information

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,

More information

Molecular Fragment Mining for Drug Discovery

Molecular Fragment Mining for Drug Discovery Molecular Fragment Mining for Drug Discovery Christian Borgelt 1, Michael R. Berthold 2, and David E. Patterson 3 1 School of Computer Science, tto-von-guericke-university of Magdeburg, Universitätsplatz

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

Types of Degrees in Bipolar Fuzzy Graphs

Types of Degrees in Bipolar Fuzzy Graphs pplied Mathematical Sciences, Vol. 7, 2013, no. 98, 4857-4866 HIKRI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.37389 Types of Degrees in Bipolar Fuzzy Graphs Basheer hamed Mohideen Department

More information

ADI-Minebio: A Graph Mining Algorithm for Biomedical Data

ADI-Minebio: A Graph Mining Algorithm for Biomedical Data ADI-Minebio: A Graph Mining Algorithm for Biomedical Data Rodrigo de Sousa Gomide 1, Cristina Dutra de Aguiar Ciferri 2, Ricardo Rodrigues Ciferri 3, Marina Teresa Pires Vieira 4 1 Goiano Federal Institute

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

Frequent Subgraph Discovery in Large Attributed Streaming Graphs

Frequent Subgraph Discovery in Large Attributed Streaming Graphs JMLR: Workshop and Conference Proceedings 36:166 181, 2014 BIGMINE 2014 Frequent Subgraph Discovery in Large Attributed Streaming Graphs Abhik Ray abhik.ray@wsu.edu Lawrence B. Holder holder@wsu.edu Washington

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated

More information

Mining Mobile Group Patterns: A Trajectory-Based Approach

Mining Mobile Group Patterns: A Trajectory-Based Approach Mining Mobile Group Patterns: A Trajectory-Based Approach San-Yih Hwang, Ying-Han Liu, Jeng-Kuen Chiu, and Ee-Peng Lim Department of Information Management National Sun Yat-Sen University, Kaohsiung, Taiwan

More information

Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining

Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining Andrés Gago Alonso 1, Jesús A. Carrasco Ochoa 2, José E. Medina Pagola 1, and José F. Martínez Trinidad 2 1 Data Mining Department,

More information

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

More information

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.

More information

A 2-factor in which each cycle has long length in claw-free graphs

A 2-factor in which each cycle has long length in claw-free graphs A -factor in which each cycle has long length in claw-free graphs Roman Čada Shuya Chiba Kiyoshi Yoshimoto 3 Department of Mathematics University of West Bohemia and Institute of Theoretical Computer Science

More information

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific and

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Process Mining by Measuring Process Block Similarity

Process Mining by Measuring Process Block Similarity Process Mining by Measuring Process Block Similarity Joonsoo Bae, James Caverlee 2, Ling Liu 2, Bill Rouse 2, Hua Yan 2 Dept of Industrial & Sys Eng, Chonbuk National Univ, South Korea jsbae@chonbukackr

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

Chapter 6: Episode discovery process

Chapter 6: Episode discovery process Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing

More information

DEGREES OF CATEGORICITY AND THE HYPERARITHMETIC HIERARCHY

DEGREES OF CATEGORICITY AND THE HYPERARITHMETIC HIERARCHY DEGREES OF CATEGORICITY AND THE HYPERARITHMETIC HIERARCHY BARBARA F. CSIMA, JOHANNA N. Y. FRANKLIN, AND RICHARD A. SHORE Abstract. We study arithmetic and hyperarithmetic degrees of categoricity. We extend

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Every tree contains a large induced subgraph with all degrees odd

Every tree contains a large induced subgraph with all degrees odd Every tree contains a large induced subgraph with all degrees odd A.J. Radcliffe Carnegie Mellon University, Pittsburgh, PA A.D. Scott Department of Pure Mathematics and Mathematical Statistics University

More information

DualIso: Scalable Subgraph Pattern Matching On Large Labeled Graphs SUPPLEMENT. Computer Science Department

DualIso: Scalable Subgraph Pattern Matching On Large Labeled Graphs SUPPLEMENT. Computer Science Department DualIso: Scalable Subgraph Pattern Matching On Large Labeled Graphs SUPPLEMENT Matthew Saltz, Ayushi Jain, Abhishek Kothari, Arash Fard, John A. Miller and Lakshmish Ramaswamy Computer Science Department

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate

More information

Degrees that are not degrees of categoricity

Degrees that are not degrees of categoricity Degrees that are not degrees of categoricity Bernard A. Anderson Department of Mathematics and Physical Sciences Gordon State College banderson@gordonstate.edu www.gordonstate.edu/faculty/banderson Barbara

More information

A Sublinear Bipartiteness Tester for Bounded Degree Graphs

A Sublinear Bipartiteness Tester for Bounded Degree Graphs A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite

More information

Multi-table Association Rules Hiding

Multi-table Association Rules Hiding Multi-table Association Rules Hiding Shyue-Liang Wang 1 and Tzung-Pei Hong 2 1 Department of Information Management 2 Department of Computer Science and Information Engineering National University of Kaohsiung

More information

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING AN EFFIIENT APPROAH TO PERFORM PRE-PROESSING S. Prince Mary Research Scholar, Sathyabama University, hennai- 119 princemary26@gmail.com E. Baburaj Department of omputer Science & Engineering, Sun Engineering

More information

8.1 Min Degree Spanning Tree

8.1 Min Degree Spanning Tree CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

More information

A Time Efficient Algorithm for Web Log Analysis

A Time Efficient Algorithm for Web Log Analysis A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 3, Issue 5 (August 2012), PP. 56-61 A Fast and Efficient Method to Find the Conditional

More information

Procedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.

Procedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu. Procedia Computer Science 00 (2012) 1 21 Procedia Computer Science Top-k best probability queries and semantics ranking properties on probabilistic databases Trieu Minh Nhut Le, Jinli Cao, and Zhen He

More information

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES 136 TER 4. INDUCTION, GRHS ND TREES 4.3 Graphs In this chapter we introduce a fundamental structural idea of discrete mathematics, that of a graph. Many situations in the applications of discrete mathematics

More information

Lift-based search for significant dependencies in dense data sets

Lift-based search for significant dependencies in dense data sets Lift-based search for significant dependencies in dense data sets W. Hämäläinen epartment of omputer Science University of Helsinki Finland whamalai@cs.helsinki.fi ABSTRAT ependency analysis is an important

More information

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010 CS 598CSC: Combinatorial Optimization Lecture date: /4/010 Instructor: Chandra Chekuri Scribe: David Morrison Gomory-Hu Trees (The work in this section closely follows [3]) Let G = (V, E) be an undirected

More information

Fault Analysis in Software with the Data Interaction of Classes

Fault Analysis in Software with the Data Interaction of Classes , pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental

More information

Topology-based network security

Topology-based network security Topology-based network security Tiit Pikma Supervised by Vitaly Skachek Research Seminar in Cryptography University of Tartu, Spring 2013 1 Introduction In both wired and wireless networks, there is the

More information

Business Lead Generation for Online Real Estate Services: A Case Study

Business Lead Generation for Online Real Estate Services: A Case Study Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University

More information

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

A Performance Comparison of Five Algorithms for Graph Isomorphism

A Performance Comparison of Five Algorithms for Graph Isomorphism A Performance Comparison of Five Algorithms for Graph Isomorphism P. Foggia, C.Sansone, M. Vento Dipartimento di Informatica e Sistemistica Via Claudio, 21 - I 80125 - Napoli, Italy {foggiapa, carlosan,

More information

On Mining Group Patterns of Mobile Users

On Mining Group Patterns of Mobile Users On Mining Group Patterns of Mobile Users Yida Wang 1, Ee-Peng Lim 1, and San-Yih Hwang 2 1 Centre for Advanced Information Systems, School of Computer Engineering Nanyang Technological University, Singapore

More information

Generating models of a matched formula with a polynomial delay

Generating models of a matched formula with a polynomial delay Generating models of a matched formula with a polynomial delay Petr Savicky Institute of Computer Science, Academy of Sciences of Czech Republic, Pod Vodárenskou Věží 2, 182 07 Praha 8, Czech Republic

More information

GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering

GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering Yu Qian Kang Zhang Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083-0688, USA {yxq012100,

More information

Multi-layer Structure of Data Center Based on Steiner Triple System

Multi-layer Structure of Data Center Based on Steiner Triple System Journal of Computational Information Systems 9: 11 (2013) 4371 4378 Available at http://www.jofcis.com Multi-layer Structure of Data Center Based on Steiner Triple System Jianfei ZHANG 1, Zhiyi FANG 1,

More information

Computer Science Department. Technion - IIT, Haifa, Israel. Itai and Rodeh [IR] have proved that for any 2-connected graph G and any vertex s G there

Computer Science Department. Technion - IIT, Haifa, Israel. Itai and Rodeh [IR] have proved that for any 2-connected graph G and any vertex s G there - 1 - THREE TREE-PATHS Avram Zehavi Alon Itai Computer Science Department Technion - IIT, Haifa, Israel Abstract Itai and Rodeh [IR] have proved that for any 2-connected graph G and any vertex s G there

More information

An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

More information

A Graph-Theoretic Network Security Game

A Graph-Theoretic Network Security Game A Graph-Theoretic Network Security Game Marios Mavronicolas 1, Vicky Papadopoulou 1, Anna Philippou 1, and Paul Spirakis 2 1 Department of Computer Science, University of Cyprus, Nicosia CY-1678, Cyprus.

More information

Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

More information

Visual Analysis Tool for Bipartite Networks

Visual Analysis Tool for Bipartite Networks Visual Analysis Tool for Bipartite Networks Kazuo Misue Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, 305-8573 Japan misue@cs.tsukuba.ac.jp Abstract. To find hidden features

More information

Mining the Temporal Dimension of the Information Propagation

Mining the Temporal Dimension of the Information Propagation Mining the Temporal Dimension of the Information Propagation Michele Berlingerio, Michele Coscia 2, and Fosca Giannotti 3 IMT-Lucca, Lucca, Italy 2 Dipartimento di Informatica, Pisa, Italy {name.surname}@isti.cnr.it

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

Mining Association Rules: A Database Perspective

Mining Association Rules: A Database Perspective IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

Graphical degree sequences and realizations

Graphical degree sequences and realizations swap Graphical and realizations Péter L. Erdös Alfréd Rényi Institute of Mathematics Hungarian Academy of Sciences MAPCON 12 MPIPKS - Dresden, May 15, 2012 swap Graphical and realizations Péter L. Erdös

More information

Cacti with minimum, second-minimum, and third-minimum Kirchhoff indices

Cacti with minimum, second-minimum, and third-minimum Kirchhoff indices MATHEMATICAL COMMUNICATIONS 47 Math. Commun., Vol. 15, No. 2, pp. 47-58 (2010) Cacti with minimum, second-minimum, and third-minimum Kirchhoff indices Hongzhuan Wang 1, Hongbo Hua 1, and Dongdong Wang

More information

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor Online algorithms for combinatorial problems Ph.D. Thesis by Judit Nagy-György Supervisor: Péter Hajnal Associate Professor Doctoral School in Mathematics and Computer Science University of Szeged Bolyai

More information

Determination of the normalization level of database schemas through equivalence classes of attributes

Determination of the normalization level of database schemas through equivalence classes of attributes Computer Science Journal of Moldova, vol.17, no.2(50), 2009 Determination of the normalization level of database schemas through equivalence classes of attributes Cotelea Vitalie Abstract In this paper,

More information

Neovision2 Performance Evaluation Protocol

Neovision2 Performance Evaluation Protocol Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.

More information

Effective Data Mining Using Neural Networks

Effective Data Mining Using Neural Networks IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 8, NO. 6, DECEMBER 1996 957 Effective Data Mining Using Neural Networks Hongjun Lu, Member, IEEE Computer Society, Rudy Setiono, and Huan Liu,

More information

Web Usage Mining - Languages and Algorithms. 1 Introduction. John R. Punin, Mukkai S.Krishnamoorthy, Mohammed J.Zaki

Web Usage Mining - Languages and Algorithms. 1 Introduction. John R. Punin, Mukkai S.Krishnamoorthy, Mohammed J.Zaki Web Usage Mining - Languages and lgorithms John R. Punin, Mukkai S.Krishnamoorthy, Mohammed J.Zaki omputer Science Department Rensselaer Polytechnic Institute, Troy NY 12180, US bstract: Web Usage Mining

More information

EVENT CENTRIC MODELING APPROACH IN CO- LOCATION PATTERN ANALYSIS FROM SPATIAL DATA

EVENT CENTRIC MODELING APPROACH IN CO- LOCATION PATTERN ANALYSIS FROM SPATIAL DATA EVENT CENTRIC MODELING APPROACH IN CO- LOCATION PATTERN ANALYSIS FROM SPATIAL DATA Venkatesan.M 1, Arunkumar.Thangavelu 2, Prabhavathy.P 3 1& 2 School of Computing Science & Engineering, VIT University,

More information

Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison

Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison Tomi Janhunen 1, Ilkka Niemelä 1, Johannes Oetsch 2, Jörg Pührer 2, and Hans Tompits 2 1 Aalto University, Department

MapReduce Approach to Collective Classification for Networks

Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1. Wroclaw University of Technology, Wroclaw, Poland. Faculty

On the independence number of graphs with maximum degree 3

Iyad A. Kanj, Fenghui Zhang. Abstract: Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics, November 21, 2011. Lecturer: Anup Rao. 1 An Arithmetic Circuit Lower Bound. An arithmetic circuit is just like

RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases

Journal of Software, vol. 5, no. 9, September 2010, p. 998. Abdallah Alashqur, Faculty of Information Technology, Applied Science University

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Thanh-Nghi Do, College of Information Technology, Cantho University, 1 Ly Tu Trong Street, Ninh Kieu District, Cantho City, Vietnam

A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs

Risto Vaarandi, Department of Computer Engineering, Tallinn University of Technology, Estonia, risto.vaarandi@eyp.ee. Abstract. Today,

GameTime: A Toolkit for Timing Analysis of Software

Sanjit A. Seshia and Jonathan Kotker, EECS Department, UC Berkeley, {sseshia,jamhoot}@eecs.berkeley.edu. Abstract. Timing analysis is a key step in the

Mining the Most Interesting Web Access Associations

Li Shen, Ling Cheng, James Ford, Fillia Makedon, Vasileios Megalooikonomou, Tilmann Steinberg. The Dartmouth Experimental Visualization Laboratory (DEVLAB)

Optimal Index Codes for a Class of Multicast Networks with Receiver Side Information

Lawrence Ong, School of Electrical Engineering and Computer Science, The University of Newcastle, Australia. Email: lawrence.ong@cantab.net

A single minimal complement for the c.e. degrees

Andrew Lewis, Leeds University, April 2002. Abstract: We show that there exists a single minimal (Turing) degree b < 0′ s.t. for all c.e. degrees 0 < a < 0′,
