GRAPH PATTERN MINING: A SURVEY OF ISSUES AND APPROACHES
International Journal of Information Technology and Knowledge Management, July-December 2012, Volume 5, No. 2, pp.

B. Bhargavi (1) and K.P. Supreethi (2)
(1, 2) Department of Computer Science and Engineering, JNTU College of Engineering, Hyderabad, India

Abstract: Much of the publicly available internet data that is analysed or archived is graph-structured. Graphs are a powerful modelling tool in many areas, including chemistry, biology and the World Wide Web, so there is a demand for efficiently querying large graph data. The graph pattern matching problem is to find all the patterns in a large data graph that match a given graph pattern. This survey discusses tree pattern matching and graph pattern matching techniques, and the efficient computation of a compressed transitive closure using 2-hop labeling. It describes join-based algorithms made up of a filter step (R-semijoin) and a fetch step (R-join), implemented with a cluster-based index in a relational database context. Optimization techniques such as R-join order selection with R-semijoin enhancement, and interleaving of R-joins and R-semijoins, make graph pattern matching efficient.

Keywords: graph pattern, graph matching, 2-hop labeling, reachability join

1. INTRODUCTION

With the rapid growth of the Internet, most publicly available data that is archived and analysed is graph-structured. For example, RDF (Resource Description Framework) is a directed labeled graph used to represent resource/property/value triples that describe semantic resources on the web. Graphs are a powerful modelling tool for networks in areas such as chemistry, biology and social networks. In biology, complex protein-protein interaction networks can be conveniently expressed as graphs. According to a report by SciFinder (a scientific website about chemical compounds), 4000 new chemical compound structures are added each day, and these can be efficiently represented as large graphs. Twitter, an online social networking site, initially stored its data and maintained the relationships between its users' data in an open, fault-tolerant, distributed graph database called FlockDB. Thus, there is a demand for efficiently querying graph data.

A graph database is a large labeled directed graph or a collection of labeled directed graphs. A graph pattern is a graph query constructed by connecting nodes according to the links/relationships required by the user. Given a graph database and a graph pattern, the graph pattern matching problem is to find all the patterns that match the user-given graph pattern. The problem is challenging because the graph data can be large and the graph patterns can be large and complex.

Applications of graph pattern matching include finding research collaboration information (such as citation link analysis) in bibliographic data, finding relationships between persons in a social network and their proximity, finding patterns of interest to scientists in biological networks such as protein-protein interaction networks, and source code analysis.

Extensive research has been done, and many algorithms have been proposed and implemented, in the following areas of graph pattern mining: frequent subgraph mining using subgraph isomorphism, exact pattern matching, and pattern matching with wildcards based on distance.

1.1 Subgraph Isomorphism

Several algorithms implement subgraph isomorphism for frequent subgraph mining. In frequent subgraph mining, given a graph pattern and a large graph, the problem is to find all the subgraphs that are isomorphic to the given graph pattern. Many algorithms have been proposed for frequent subgraph mining on certain data and, more recently, on uncertain data, i.e. data that is incomplete and inaccurate due to noise. However, the subgraph isomorphism problem is NP-complete.

1.2 Graph Pattern Matching

The other type of graph pattern matching takes an n-node user-defined graph pattern and finds all the patterns in the graph database that match it exactly. That is, the label of every matched node in the graph database must equal the label of the corresponding node of the user-defined pattern, and every reachability relationship (edge) between nodes of the pattern must also hold between the matched nodes. Zou et al. [10] describe techniques for graph pattern matching based on a distance threshold. There, the problem is to find all the patterns that match exactly or inexactly, with the distance between nodes limited by the threshold.
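
As a rough illustration of the problem statement above, the following Python sketch enumerates matches by brute force. It assumes the reachability reading of a pattern edge that the survey adopts in Section 4, and it assumes that every pattern node carries a distinct label. The function names and the toy graph are hypothetical and are not taken from any of the cited papers; the join-based algorithms surveyed later exist precisely to avoid this kind of exhaustive enumeration.

```python
from collections import deque
from itertools import product


def reachable_from(adj, src):
    """Return the set of nodes reachable from src through one or more edges."""
    seen, queue = set(), deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def match_pattern(labels, adj, pattern_labels, pattern_edges):
    """Brute-force matcher (assumes distinct labels in the pattern).

    labels:         node -> label of the data graph
    adj:            node -> list of successor nodes
    pattern_labels: labels of the pattern nodes, in output order
    pattern_edges:  (X, Y) label pairs meaning "an X node reaches a Y node"
    """
    reach = {u: reachable_from(adj, u) for u in labels}
    # One candidate list per pattern node: all data nodes with that label.
    extents = [[u for u, lab in labels.items() if lab == p] for p in pattern_labels]
    results = []
    for candidate in product(*extents):
        binding = dict(zip(pattern_labels, candidate))
        if all(binding[y] in reach[binding[x]] for x, y in pattern_edges):
            results.append(candidate)
    return results


# Hypothetical data graph: a1 -> b1 -> c1 and a2 -> c1.
labels = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
adj = {"a1": ["b1"], "b1": ["c1"], "a2": ["c1"]}

chain = match_pattern(labels, adj, ["A", "B", "C"], [("A", "B"), ("B", "C")])
pairs = match_pattern(labels, adj, ["A", "C"], [("A", "C")])
```

Here `chain` keeps only the binding in which the A node reaches the B node, while `pairs` also accepts `a1`, which reaches `c1` only indirectly through `b1`.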

The contributions of this paper include:
- An overview of tree-pattern matching techniques.
- An overview of various techniques for computing the transitive closure and storing its compressed form efficiently.
- A survey of 2-hop labeling computation for directed graphs, leading to a faster and more efficient hierarchical geometry-based approach for 2-hop labeling computation.
- Suggested efficient join-based algorithms and optimization techniques for graph pattern matching.

In this paper, Section 2 covers tree pattern matching techniques. Section 3 briefly describes the datasets used for testing efficiency and scalability. Section 4 describes the graph pattern. Section 5 covers the internal representation of graph data. Section 6 covers transitive closure storage and computation techniques. Section 7 covers the existing approach, Section 8 presents join-based algorithms and optimization techniques, and Section 9 gives concluding remarks.

2. TREE-PATTERN MATCHING TECHNIQUES

Figure 1: A Sample XML Tree Representation (Bruno et al. 2002)
Figure 2: A Tree Pattern

Tree pattern matching finds all the patterns in an XML document, represented as a tree model, that match a user-given tree pattern. For instance, consider the XML tree model shown in Figure 1. The query is to find all the authors whose first name is jane and whose last name is doe, which is represented as a tree pattern in Figure 2. The tree pattern matching problem is to find the set of all tuples that match the user-defined tree pattern. In Figure 1, the tuple <author3, fn3, ln3> matches the given tree pattern. Tree pattern matching techniques include the binary structural join-based approach, tree pattern matching with stack encoding [8], and tree pattern matching with hierarchical stack encoding [9].
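
The example query above can be made concrete with a short Python sketch. The XML document below is a hypothetical stand-in written for this illustration (Figure 1 itself is not reproduced in the text), and the matching is a plain nested scan using the standard-library `xml.etree.ElementTree` module, not any of the join or stack-based techniques discussed in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely modelled on the author/fn/ln example.
DOC = (
    "<book>"
    "<title>XML</title>"
    "<allauthors>"
    "<author><fn>jane</fn><ln>poe</ln></author>"
    "<author><fn>john</fn><ln>doe</ln></author>"
    "<author><fn>jane</fn><ln>doe</ln></author>"
    "</allauthors>"
    "</book>"
)

root = ET.fromstring(DOC)
authors = list(root.iter("author"))

# Tree pattern: an author node with a child fn = 'jane' and a child ln = 'doe'.
matches = []
for author in authors:
    fns = [e for e in author.findall("fn") if e.text == "jane"]
    lns = [e for e in author.findall("ln") if e.text == "doe"]
    # Every (author, fn, ln) combination is one matching tuple.
    matches.extend((author, fn, ln) for fn in fns for ln in lns)

matched_positions = [authors.index(a) + 1 for a, _, _ in matches]
```

With this document only the third author satisfies both branches of the pattern, which mirrors the single tuple reported for Figure 1.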

2.1 Binary Structural Join-based Approach

This is the basic approach. It evaluates the tree pattern by decomposing it into binary structural relationships, using structural join algorithms to match each binary relationship against the XML database, and then combining the results to find the set of tuples that match the tree pattern. However, this approach produces a large number of intermediate results. The problem can be overcome by the stack encoding technique described by Bruno et al. [8].

2.2 Tree Pattern Matching with Stack Encoding

Tree pattern matching with stack encoding involves initializing a stack for each node required by the user, finding the intermediate results of twigs (a twig is a sub-query that ends at a leaf node), maintaining them in stacks for faster processing, and combining the results of the twigs efficiently to give the final set of matching tuples.

An improvement on this approach is the hierarchical stack encoding described by Chen et al. [9], in which the results for each node are arranged in stacks that follow their hierarchical order in the database, and results are enumerated efficiently by a hybrid bottom-up and top-down approach. The hierarchical stack encoding approach is applied to Generalized Tree Pattern queries, which are a generalized form of XQuery.

XQuery is a query language for XML documents, just as SQL is for relational databases. XQuery uses XPath expressions and operators that can traverse the elements and attributes of XML data forward or backward. With the FLWOR (for, let, where, order by, return) expressions of XQuery, the solution to a given query is evaluated and the results are retrieved. The queries are tree-structured because XML data is modelled as a tree, and the results are retrieved in a result-set tag in XQuery.

These techniques cannot be applied directly to graphs, because graphs lack the acyclic property that trees have. Hence, different approaches are used for graph pattern matching.

3. XML DATA AS GRAPH

A graph database represents data as a graph. The data can be a large real XML document such as DBLP, or synthetic XML data such as XMark, which is parsed with a SAX parser to derive the directed data graph. The DBLP XML document maintains bibliographic details of the research collaborations of authors published at various conferences and journals, together with citation information. It can be modelled as a large directed data graph and used to evaluate the efficiency and validity of proposed algorithms that work on graph data.

Figure 3: Elements and their Relationship in an XMark XML Document

XMark is a synthetic XML benchmark known for its irregular schema. It maintains auction site details such as when an auction opened, when it closed and who participated in it. Some elements refer internally to other elements in the document (for instance, in Figure 3 the itemref element refers to the item element), and these references can be used to encode the XMark XML document as a graph. The xmlgen tool of XMark provides a scaling factor that can be adjusted to generate very large XML documents of up to 10 GB for scalability testing.

4. GRAPH PATTERN

A graph pattern is a pattern of interest to the user. It contains a set of vertex labels and edges that represent the user-defined query. The nodes of a graph pattern can be element tags, attribute values or comparisons, and the edges can be parent-child relationships or referencing relationships. Vertex labels usually represent element tags, while edges represent relationships.

Figure 4: Graph Pattern

Consider the graph pattern of Figure 4, which represents the query of finding all the authors who have published papers at ICDE, VLDB, EDBT and SIGMOD in the same year. Each edge is defined as a reachability join (R-join), as it states whether the destination node label is reachable from the source node label. The graph pattern is evaluated as a sequence of R-joins by Cheng et al. [1].
The graph pattern matching problem finds the set of all tuples in a large directed data graph that match the user-given graph pattern. Note that if the graph pattern contains the R-joins X → Y and Y → Z, where X, Y and Z are labels, then the graph pattern also satisfies X → Z. This property is absent in subgraph isomorphism, which makes frequent subgraph mining intractable. For instance, in Figure 4 the R-joins year → ICDE and ICDE → author imply that the graph pattern satisfies year → author.

5. INTERNAL REPRESENTATIONS OF DIRECTED DATA GRAPH

A directed data graph is generally represented internally with either an adjacency matrix or an adjacency list.

In the adjacency matrix implementation, an n × n matrix is maintained for the n nodes of a directed graph G. Each element (i, j) of the matrix holds one of the values {-1, 0, 1}, which states whether the i-th node is adjacent to the j-th node. If the nodes are adjacent, the value is 1 or -1 (the negative sign means that the j-th node is the source of an edge incoming to the i-th node); otherwise it is 0. The adjacency matrix is generally used to represent dense graphs.

The adjacency list is used to represent sparse graphs. In this representation, each node maintains a list of all the target nodes that are adjacent to it.

Several libraries provide internal representations of graphs. In C++, the Boost Graph Library supports various representations and user-defined interfaces for graphs; by default, Boost offers its own representation class, adjacency_list. In Java, JDSL offers rich support for graphs in jdsl.graph, with a clear separation between interfaces, algorithms and representation. It offers an adjacency list representation that supports directed and undirected edges. JGraphT is a Java library that contains classes and interfaces for working with different graph algorithms.
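
The two representations can be compared side by side in a few lines of Python. This is a minimal sketch: the helper name `build_representations` is hypothetical, nodes are assumed to be numbered from 0, and the signed matrix follows the {-1, 0, 1} convention described above, which only works when no pair of nodes has edges in both directions.

```python
def build_representations(n, edges):
    """Build a signed adjacency matrix and an adjacency list for a digraph.

    Assumes nodes are numbered 0..n-1 and that no pair of nodes has edges in
    both directions (the signed convention cannot encode that case).
    """
    matrix = [[0] * n for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1    # outgoing edge u -> v, seen from row u
        matrix[v][u] = -1   # same edge seen from row v: u is the source
        adj[u].append(v)    # only the targets of u are stored
    return matrix, adj


# Hypothetical 4-node graph: 0 -> 1, 1 -> 2, 0 -> 3.
matrix, adj = build_representations(4, [(0, 1), (1, 2), (0, 3)])
```

The matrix always takes n × n cells regardless of how many edges exist, while the adjacency list grows with the number of edges, which is why the latter is preferred for sparse graphs.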
With the JGraph library, graphs can be visualized.

6. TRANSITIVE CLOSURE IMPLEMENTATION AND STORAGE TECHNIQUES

The transitive closure represents the set of paths that satisfy the transitivity property: if a directed graph G has a path (u, w) from node u to node w and a path (w, v), then by transitivity there is a path (u, v). By pre-computing the transitive closure, we can access shortest paths faster and determine whether a path exists between two nodes. The following techniques compute and store the transitive closure:
- Transitive closure computation by Warshall's algorithm
- Multi-interval encoding
- 2-hop labeling

Computing the transitive closure with Warshall's algorithm is a naïve approach that visits each node and computes the reachable paths from every node to every other node of the directed graph, with time complexity $O(|V|^3)$.

6.1 Multi-interval Encoding

Chen et al. [11] used a stack-based multi-interval encoding scheme on directed acyclic graphs (DAGs), in which each node is assigned a set of intervals and a postid that represent its position in the graph, using the algorithms of [12]. In each interval [s, e] of a node, e is the post-order number of the node (its postid), derived from a post-order traversal of the optimal tree cover of the DAG as described in [12], and s is the post-order number of the lowest descendant node assigned to the current node. Together they represent the ancestor-descendant relationship between two nodes. A node has multiple intervals if there are back edges to it (here, a back edge is an edge that is present in the DAG but not in the optimal tree cover).

Figure 5: Multi-interval Encoding for a Graph

Consider the example of Figure 5, in which a multi-interval encoding is assigned to a graph. First the graph is condensed to a DAG, and then each node is assigned intervals that represent the ancestor-descendant relationships between the nodes. For instance, the node b1 is encoded by the set of intervals {[0, 4]}. A node is a descendant of b1 if its postid lies within [0, 4]. In Figure 5, the nodes d1, d2, d3, c1, e1, e2, e3 and f1 are the descendants of b1. In general, there is a path $a \leadsto b$ if and only if $\exists i\ (0 \le i \le n)$ such that $a_i.x \le \mathit{postid}_b \le a_i.y$, where $[a_i.x, a_i.y]$ is the i-th interval of a. Thus, multi-interval encoding represents the compressed transitive closure of a graph.

6.2 2-Hop Labeling

Cohen et al. [3] developed the theory of 2-hop labeling and proposed a solution for computing it on general graphs. A hop h is defined as a pair (p, v), where p is a path in a graph G and v is one of the end vertices of p. Each node v of a directed graph G is assigned a label L(v) consisting of L_in(v), a set of nodes u in G that can reach v, and L_out(v), a set of nodes w in G that are reachable from v. This defines the 2-hop reachability labeling (hence the name 2-hop: one hop from u to v and another from v to w). A 2-hop cover is a collection of hops that covers all connections in G. The 2-hop labels are assigned to the nodes of the graph such that all connections of G are covered.

Figure 6: 2-hop Labeling for a Graph

For instance, consider the graph in Figure 6, where solid edges are edges of the graph and dashed edges are hops that are not edges of the graph. The dashed edge a → d, for instance, represents the hop (a → b → d, d). The table to the right of the graph in Figure 6 shows the 2-hop labels assigned to each node. For node g, L_in(g) = {d, f} and L_out(g) = {c}, which means that the nodes d and f can reach g, while g can reach c. Thus, by assigning 2-hop labels, all connections are covered.

Finding a minimum 2-hop cover (2-hop labeling) is an NP-hard problem. It is treated as a minimum set-cover problem, for which no efficient exact algorithm is known. Each candidate in the set-cover formulation is represented as S(U_w, w, V_w), and a centre graph is constructed in which w is the centre node of a bipartite graph. The centre with the maximum cost is selected.
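
Once labels have been assigned, a reachability query needs only the two label sets. Under the standard 2-hop query rule of Cohen et al. [3], u reaches v exactly when L_out(u) and L_in(v) share a node, with every node treated as implicitly present in its own labels. The Python sketch below checks this rule against a plain breadth-first search. The four-node graph and its labels are hypothetical and hand-built around the single centre b; they are not the labels of Figure 6.

```python
from collections import deque

# Hypothetical graph: a -> b, d -> b, b -> c.
nodes = ["a", "b", "c", "d"]
succ = {"a": ["b"], "d": ["b"], "b": ["c"]}

# Hand-assigned 2-hop labels using b as the only centre.
l_out = {"a": {"b"}, "b": set(), "c": set(), "d": {"b"}}
l_in = {"a": set(), "b": set(), "c": {"b"}, "d": set()}


def reaches(u, v):
    """2-hop query: u reaches v iff the two label sets intersect.

    Each node is treated as implicitly contained in its own labels.
    """
    return bool((l_out[u] | {u}) & (l_in[v] | {v}))


def bfs_reaches(u, v):
    """Reference answer computed directly on the graph."""
    if u == v:
        return True
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, ()):
            if y == v:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


# Every ordered pair should get the same answer from both methods.
mismatches = [(u, v) for u in nodes for v in nodes
              if reaches(u, v) != bfs_reaches(u, v)]
```

The label sets here hold three entries in total, whereas storing the transitive closure explicitly would need one entry per reachable pair, which is the compression that 2-hop labeling aims for.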

The cost assigned to a centre node is based on the maximum number of paths that the node can cover, and the set-cover problem is reduced to the densest subgraph problem. All the centre nodes are traversed in decreasing order of cost, and nodes are assigned to L_in and L_out according to whether they can reach, or are reachable from, the current centre. The covered connections and nodes are then pruned from the pre-computed transitive closure, and the process is repeated until all connections are covered. The computed 2-hop labeling is within a factor of $O(\log |V|)$ of the optimal solution and takes $O(|V| \log |E|)$ space, where |V| and |E| are the total numbers of vertices and edges of the directed graph respectively.

Schenkel et al. [4] used the 2-hop labeling technique to create a connection index for heterogeneous XML document collections. They improved the 2-hop labeling computation with a divide-and-conquer approach that includes the following steps:
1. Partition the original graph, with a memory bound on the size of each partition.
2. Compute the transitive closure and the 2-hop cover for each partition, and store the 2-hop cover on disk.
3. Merge the 2-hop covers of partitions that share one or more cross-partition edges, yielding the 2-hop cover of the whole graph.

The advantage of this approach is efficient use of memory. It is also faster, because only the transitive closure of each partition has to be pre-computed rather than that of the whole directed graph.

Schenkel et al. [5] made the following further improvements to compute 2-hop labeling faster and more efficiently:
- Partitioning is done such that the transitive closure of each partition fits into memory.
- While building partition covers, cross-link targets, i.e. the nodes of one XML document that are linked to nodes of another document, are selected as centre nodes.
- A skeleton graph that represents cross-link source and target edges is used while building the initial partitioning.

They also worked on updating 2-hop labels when XML elements or documents are inserted or deleted. Insertion is trivial: new labels are added for each inserted edge. For deletion of a node or edge, documents are classified as good documents, which separate the graph and are trivial to handle, and bad documents, which do not separate the graph.

Cheng et al. [6] used a geometry-based approach to compute 2-hop labeling. Its main advantage over previous approaches is that there is no need to compute the transitive closure. The approach has three steps:
1. For a directed graph G, construct a DAG by condensing each maximal strongly connected component into a single node.
2. Compute the 2-hop cover for the DAG.
3. Compute the 2-hop cover of G from the 2-hop cover computed for the DAG.

To compute the 2-hop cover of the DAG, a set of intervals [s, e] is assigned to each node, as in the multi-interval encoding scheme. Similarly, the DAG with reversed edges is constructed, and its multi-intervals are computed and stored in a reachability table. A reachability map is then constructed with the post-order numbers assigned to the nodes of the DAG on the x-axis and the post-order numbers of the nodes in the reversed DAG on the y-axis. Each point (x, y) in the map is assigned a rectangle. An R-tree, a height-balanced data structure, is used to store and retrieve the rectangles of the reachability map efficiently.

The 2-hop cover is computed by mapping the 2-hop cover problem onto a 2-dimensional grid and operating on rectangles with the help of the R-tree: a bipartite graph is mapped with a virtual centre in the rectangles, and the densest bipartite graph is computed. The nodes in the densest bipartite graph are assigned 2-hop labels, and the process is repeated with the next densest bipartite graph until all connections are covered. This approach focuses on finding a minimum number of large subsets rather than a minimum amount of overlap among subsets, which is found to be efficient.

Cheng et al. [7] further improved the algorithm. First, the directed graph is converted to a DAG and then partitioned into two subgraphs by bisection. The edges connecting the two subgraphs are stored in an induced subgraph, and its 2-hop cover is computed using either the R-tree based node-oriented approach or the edge-oriented approach. Then all the nodes and edges of the induced subgraph are removed from the two partitioned subgraphs to form two independent subgraphs, whose 2-hop covers can be computed independently without any need for merging. A subgraph is partitioned further if its number of vertices is greater than a threshold; otherwise its 2-hop cover is computed with the authors' earlier approach.
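
Both geometry-based approaches begin by turning the directed graph into a DAG, condensing each strongly connected component into one node. The Python sketch below shows only that preprocessing step. It uses Kosaraju's two-pass algorithm, which is a choice made for this illustration; the text does not say which SCC algorithm the cited papers use, and the function name `condense` and the sample graph are hypothetical.

```python
from collections import defaultdict


def condense(nodes, edges):
    """Map every node to an SCC representative and return the DAG edges."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: iterative DFS on the graph, recording nodes as they finish.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prv in pred[node]:
                if prv not in comp:
                    comp[prv] = root
                    stack.append(prv)

    # Keep only edges that connect two different components.
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges


# Hypothetical graph with two cycles: {a, b} and {c, d}, linked by b -> c.
comp, dag_edges = condense(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")],
)
```

All nodes in one component reach exactly the same set of nodes, so labels computed for the condensed DAG can be shared by every member of the component.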
This approach is found to be efficient in computing 2-hop labeling for the directed data graph.

7. AN EXISTING APPROACH OF GRAPH PATTERN MATCHING

The approach to graph pattern matching on XML documents used by Wang et al. [2] relies on the multi-interval encoding scheme for directed graphs defined in [12]. To find the possible tuples of a query edge A → D, two lists are maintained: an ancestor list AList and a descendant list DList. AList contains the intervals [x_i, y_i] of the nodes with label A, sorted in descending order of x value, and DList contains postids sorted in ascending order of post-order number. The lists are merged using a range search tree to find the required matching edges. Wang et al. [2] also worked on cyclic subgraph query patterns and devised algorithms based on an interval-stack data structure.

The drawback of using the multi-interval encoding scheme in graph pattern matching is that sorting has to be done while joining, and the interval encodings are lengthy, so computations over them are time-consuming. The approach proposed in the following section avoids both problems.

8. A JOIN BASED APPROACH OF GRAPH PATTERN MATCHING

Cheng et al. [1] used the fast hierarchical geometry-based algorithm defined in [7] to compute 2-hop labeling efficiently in the join/semijoin approach to graph pattern matching. Each reachability condition (edge) that appears in the user-given graph pattern is defined as an R-join. Each node of the directed graph has a label X and is uniquely identified by a name x_i with i in [1, n], meaning that there are n nodes with the same label X; this set is defined as extent(X). For each such label a table is maintained that represents its extent and the 2-hop labels of each node identifier of that label. This table is termed a base relation.

8.1 Cluster-based R-join Index

A cluster-based index is constructed from these base relations and is used to index the tuples that can join between two relations. The cluster-based index includes a B+ tree built over the centre nodes formed during 2-hop labeling computation. The B+ tree is a widely used indexing mechanism in relational databases. Each leaf node of the B+ tree points to two sets, termed the F-cluster and the T-cluster. A node in the F-cluster can reach a node in the T-cluster via the centre node. These clusters are further grouped into labeled subclusters according to the labels assigned to the nodes.

A W-table is constructed for edges: for every possible label pair, it holds a set of centres. A centre is included in W(X, Y) if the centre node has an X-labeled F-subcluster and a Y-labeled T-subcluster. To compute an R-join X → Y, the W-table gives the centres for the edge. For each such centre in the R-join index, the resulting set of tuples is the cartesian product of the X-labeled F-subcluster and the Y-labeled T-subcluster.

8.2 Join Based Algorithms

The following algorithms obtain the set of matching tuples for a user-given graph pattern. Algorithm 1 is applied first, to a single edge, and gives the result of the join between two base relations using the cluster-based R-join index. Algorithm 2 contains a filter step, which prunes unwanted tuples from a temporal relation, and a fetch step, which gives the resulting join of the temporal relation and a base relation.

Algorithm 1
Input: an edge E(X → Y), T_X, T_Y
Output: a set of binary tuples <x_i, y_i>
1. Get the centres for E from the W-table.
2. For each centre w, get the X-labeled F-subcluster nodes and the Y-labeled T-subcluster nodes from the R-join index.
3. Compute the cartesian product of the two subclusters.

Algorithm 2
Input: an edge X → Y, a temporal relation T_r containing X, T_Y
Output: a set of tuples
1. Filter() step: prune the tuples from T_r that contain an x_i that cannot reach any y_i, and store the remaining tuples with their centres as <r_i, w_i> in T_w.
2. Fetch() step: for each w_i, get the set of Y-labeled T-subcluster nodes {y_i} and compute the cartesian product of {r_i} and {y_i}, which gives the resulting set of tuples.

Algorithm 1 can be computed by accessing only the cluster-based index, while Algorithm 2 requires access to the base relations.

8.3 R-semijoin

The filter step of Algorithm 2 is considered to be an R-semijoin. A unique feature of the proposed R-semijoin is that the R-join algorithms must first process the R-semijoin in order to complete the R-join. A sequence of R-semijoins involves only one scan of the temporal relation, provided the sequence contains no R-join, so the processing cost of R-semijoins is estimated to be small.

8.4 Order Selection and Optimization

R-join/R-semijoin order selection is important for optimizing the join-based algorithms. For R-join order selection, dynamic programming is used to optimize the cost of join processing. The cost of the joins is an estimate of the join/semijoin sizes based on statuses and moves. A status defines the cost of a subquery in the directed data graph. A move adds an edge to the current status to obtain a new status. The greedy approach proposed in [1] is the better approach for R-join order selection. Optimization is achieved with the following techniques.

R-join order selection followed by R-semijoin enhancement: In this technique, the optimal R-join order is identified with the proposed approach. R-semijoins are added to the same sequence of R-joins based on the transitive closure of R-semijoins.
With join-semijoin reordering, the semijoins that are movable are moved to the left in the sequence. Additional R-joins can be computed from the edge transitive closure of the given graph pattern and added as R-semijoins.

Interleaving R-joins with R-semijoins: Using statuses and moves, R-joins can be interleaved with R-semijoins. There are three possible moves: a filter move, through which semijoins are added to the current sequence; a fetch move, by which a self R-join can be processed; and an R-join move, which is only allowed to move from the initial status to the final status. Interleaving R-joins with R-semijoins through these moves is observed to perform better than the former optimization approach of R-join order selection with R-semijoin enhancement.

9. CONCLUDING REMARKS

A new join-based approach is proposed for implementing graph pattern matching efficiently. The graph pattern matching problem is to find the patterns in a large directed data graph that match the user-defined graph pattern. The user-given n-node graph pattern is processed as a sequence of reachability joins (R-joins), where each edge represents an R-join, to find a set of n-ary tuples. First, a large directed graph is constructed from the XML document used to represent the graph data. An extensive survey has been done of techniques for faster computation of 2-hop labeling for transitive closure representation. A fast and efficient hierarchical geometry-based algorithm is used to compute the compressed transitive closure by assigning 2-hop labels to each node. Then base relations containing the 2-hop labeling information are constructed for each label. A cluster-based index is constructed from the 2-hop labels for faster access to tuples. Efficient algorithms are designed that use the base relations and the cluster-based index to find the edge reachability joins. A unique feature of this approach is that R-semijoins have to be processed before R-joins. Additionally, optimization techniques are used that include join-order selection with semijoin enhancement, or interleaving R-joins with R-semijoins. The proposed approach is faster than the existing approach because it does not require sorting and computations over multi-intervals during graph pattern matching.

References
[1] J. Cheng, J.X. Yu, and P.S. Yu, Graph Pattern Matching: A Join/Semijoin Approach, IEEE Transactions on Knowledge and Data Engineering, 23(7), pp.
[2] H. Wang, W. Wang, X. Lin, and J. Li, Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document, Springer Publications.
[3] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick, Reachability and Distance Queries via 2-Hop Labels, Proc. Ann. ACM-SIAM Symp. Discrete Algorithms (SODA '02).
[4] R. Schenkel et al., HOPI: An Efficient Connection Index for Complex XML Document Collections, Proc. Int'l Conf. Extending Database Technology: Advances in Database Technology (EDBT '04).
[5] R. Schenkel, A. Theobald, and G. Weikum, Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections, Proc. 21st Int'l Conf. Data Eng. (ICDE '05).
[6] J. Cheng, J.X. Yu, X. Lin, H. Wang, and P.S. Yu, Fast Computation of Reachability Labeling for Large Graphs, Proc. Int'l Conf. Extending Database Technology: Advances in Database Technology (EDBT '06).
[7] J. Cheng, J.X. Yu, X. Lin, H. Wang, and P.S. Yu, Fast Computing Reachability Labelings for Large Graphs with High Compression Rate, Proc. Int'l Conf. Extending Database Technology: Advances in Database Technology (EDBT '08).
[8] N. Bruno, N. Koudas, and D. Srivastava, Holistic Twig Joins: Optimal XML Pattern Matching, Proc. ACM SIGMOD.
[9] S. Chen, H.G. Li, J. Tatemura, W.P. Hsiung, D. Agrawal, and K.S. Candan, Twig2Stack: Bottom-Up Processing of Generalized-Tree-Pattern Queries over XML Documents, Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06).
[10] L. Zou, L. Chen, and M.T. Ozsu, DistanceJoin: Pattern Match Query in a Large Graph Database, Proc. 35th Int'l Conf. Very Large Data Bases (VLDB '09).
[11] L. Chen, A. Gupta, and M.E. Kurul, Stack-Based Algorithms for Pattern Matching on DAGs, Proc. 31st Int'l Conf. Very Large Data Bases (VLDB '05).
[12] R. Agrawal, A. Borgida, and H.V. Jagadish, Efficient Management of Transitive Relationships in Large Data and Knowledge Bases, Proc. ACM SIGMOD, 1989.
