On Network Tools for Network Motif Finding: A Survey Study

Elisabeth A. Wong 1,2, Brittany Baur 1,
NSF Bio-Grid REU Research Fellows at Univ of Connecticut
2 Bowdoin College
3 Manhattanville College

Abstract. Network motifs have been called the building blocks of networks [1]. Graph theory is used to represent and search networks computationally. Much effort has gone into developing motif discovery tools that find network motifs: patterns, or subgraphs, that occur more frequently in the input network than in randomized networks, where patterns occur by chance [2]. Complications in network motif discovery include the graph isomorphism problem, which is computationally hard (no polynomial-time algorithm is known for it, and the closely related subgraph isomorphism problem is NP-complete). A myriad of tools and algorithms have been developed, both for full enumeration of subgraphs and for avoiding full enumeration in order to reduce runtimes and required computational power. Experimental data from various tools is provided in this paper, including (1) runtimes for different subgraph sizes, network sizes, and numbers of random networks generated, (2) differences in frequencies under different search restrictions, and (3) protein-protein interaction (PPI) network results. The limitations that still exist, especially concerning the size of motifs and networks that can be searched, are also included. This paper presents a survey study of current network motif discovery tools; the algorithms, experimental data, limitations, and pros and cons of the tools are examined and discussed.

Keywords: network, motif, algorithm, isomorphism

1 Introduction

Networks are integral parts of many real systems, and analyzing them has become a priority in many research fields. Emphasis has been placed on studying small aspects of networks in order to gain a better understanding of the entire network. Recently, graph theory has been used to allow for computational analysis of networks.
Through graph theory, it has been found that numerous networks contain network motifs: small subgraphs that appear more frequently than expected in randomized networks [1]. Because these motifs are statistically significant, it has been hypothesized that they are also significant to their respective networks and systems; a subgraph frequency higher than chance would produce suggests that the motif is present for reasons such as evolutionary conservation or an important function or purpose [1]. Different networks have different motifs that are more frequent, and thus presumably more important, in the system or organism they belong to. For example, gene regulation transcriptional networks and neuronal connectivity networks have both been found to contain motifs known as feed-forward loops [2] and bifans [2]. This suggests that these two kinds of networks are similar in some design aspects.

The feed-forward loop is thought to be used in information processing, where its design helps to control connections and signals [1]. In contrast, food web networks, which do not deal with information processing, have motifs distinct from those found in networks such as neuronal connectivity and gene regulation networks. This demonstrates how motifs are biologically significant in their ability to help analyze, explain, and classify networks. Because of this significance, much effort has been put into developing tools that can detect network motifs.

In order for graph theory to be applied to the study of networks and motifs, networks need to be represented by graphs. Each entity in a network (e.g. a protein, a gene, a person) is represented by a node (or vertex), while the connections between the entities (e.g. an interaction, a regulatory signal, a correspondence) are represented by edges. In some networks the nodes and edges have different characteristics (e.g. different types of genes or different signals passed from gene to gene). In these cases the nodes or edges are colored, with each color representing a different kind of entity or connection. Another aspect of a network that must be included in the graph representation is whether each connection, or edge, is directed or undirected. Gene regulatory networks have directed connections because it matters that the signal travels from one specific gene to another. On the other hand, a social network that counts handshaking as a connection is undirected because the shaking of hands is not direction specific. Directed and undirected edges can be distinguished on a graph by different edge and node colors or by arrows.

A graph can be split up into smaller graphs known as subgraphs. As stated before, statistically overrepresented subgraphs are defined as network motifs.
The number of edges that enter and exit a node are summed to determine the node's degree, and the number of nodes that make up a motif determines the subgraph's size.

Tools for network motif discovery have proved very difficult to develop so that they are both efficient and able to find motifs of all sizes (not just small motifs). One of the larger obstacles to an efficient and thorough algorithm is the graph isomorphism problem. This problem entails determining whether there is a bijection between the nodes of two graphs such that corresponding nodes are adjacent to the same corresponding nodes [3]. Two isomorphic graphs have the same number of nodes and edges and the same degrees for corresponding nodes. The graph isomorphism problem is computationally hard: it belongs to NP, but it is not known to be solvable in polynomial time, nor is it known to be NP-complete (the related subgraph isomorphism problem is NP-complete).

The NAUTY algorithm is a well known and powerful algorithm that has been developed to test for graph isomorphism. It is used by multiple motif discovery tools. NAUTY uses canonical labeling to tell which graphs are isomorphic to each other [3]: two graphs are isomorphic exactly when their canonical labels are the same. A label for a graph is formed by taking the adjacency matrix of the graph and concatenating it row by row to form a binary number. By refining partitions of the vertices down to leaf partitions (partitions made up of singleton sets), automorphisms of the graph can be found by checking whether different orderings of the vertices yield the same adjacency matrix. NAUTY then uses the automorphisms to compute the canonical label, which is the largest or smallest possible concatenated adjacency matrix [4].

Additional difficulties in developing a network motif discovery algorithm include the fact that the number of network motifs increases exponentially with network size and that the downward closure property is absent in many networks [5]. These difficulties mean that full enumeration of subgraphs can be extremely time consuming and may require large amounts of computational power.

In order to study a network in context, randomized networks are used for comparison. These randomized networks are generated in such a way that their structure is random and thus not a result of any constraints or significant design elements [6]. Aspects of the network in question (such as motifs) can then be compared to the randomized networks to see whether they result from the intrinsic properties of the network or whether they indicate real world functional constraints and/or design principles due to selection [2].

Various parameters are used to describe network motif occurrences and to determine whether they are statistically significant. The frequency of a motif is the number of times the motif appears in a network [7]. Different tools use different restrictions for counting frequencies, depending on whether overlapping of nodes and edges is allowed. Some motif discovery tools ask the user for a threshold that sets how many occurrences are required for a motif to be considered frequent. To determine whether a motif is significant in a specific network, and not just present due to intrinsic properties of the network, a uniqueness factor is sometimes applied [7]: if the motif has a higher frequency in the network in question than in a certain number of random networks (a threshold set by the user), the motif is considered unique. In addition, statistics such as the z-score and p-value are often used to determine whether the frequency of a motif is statistically significant.
The z-score is the difference between the frequency of the motif in the target network and the mean frequency of the motif in the randomized networks, divided by the standard deviation of the frequency in the randomized networks [3]. A higher z-score corresponds to a more overrepresented motif. The z-score threshold above which motifs are considered overrepresented is often 2. The p-value is the probability that a motif appears in a randomized network at least as many times as it appears in the network in question [2]. The lower the p-value, the more significant the motif, and the p-value must fall below a chosen threshold for the motif to be considered significant. All of these parameters set standards that help distinguish subgraphs that are overrepresented from those that are not.

Multiple algorithms and tools have been developed to identify network motifs, each with different advantages and disadvantages. Network motif discovery is a crucial problem to solve in order to gain further insight into the important characteristics, functions, and inner workings of systems with networks, so it has been the goal of many researchers to develop ways to identify network motifs efficiently. Our goal in this paper is to summarize, collect experimental data about, and analyze the various network motif discovery tools and algorithms that have been created.
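The two significance measures described above can be made concrete with a short sketch. The function names, the example counts, and the choice of the population standard deviation are assumptions made for illustration; individual tools may compute these quantities differently.

```python
import statistics

def z_score(observed, random_counts):
    """Z-score of a motif count against its counts in randomized networks."""
    mean = statistics.mean(random_counts)
    # Assumption: population standard deviation over the randomized networks.
    std = statistics.pstdev(random_counts)
    if std == 0:
        # Degenerate case: every randomized network gave the same count.
        if observed == mean:
            return 0.0
        return float("inf") if observed > mean else float("-inf")
    return (observed - mean) / std

def empirical_p_value(observed, random_counts):
    """Fraction of randomized networks with a count >= the observed count."""
    hits = sum(1 for count in random_counts if count >= observed)
    return hits / len(random_counts)

# Hypothetical counts of one subgraph in 8 randomized networks.
random_counts = [4, 6, 5, 5, 4, 6, 5, 5]
observed = 9  # hypothetical count in the input network

z = z_score(observed, random_counts)
p = empirical_p_value(observed, random_counts)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only a handful of randomized networks the empirical p-value is coarse (it can only take values that are multiples of 1 divided by the number of networks), which is one reason tools let the user choose how many randomized networks to generate.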

2 Methodology

Major aspects that must be considered when examining motif discovery tools are the methods of determining motif frequencies, the ways of generating randomized networks, the algorithms used for full enumeration, the strategies for identifying motifs without full enumeration, and the data sets the tools can be applied to. All of these factors must be considered when developing a motif discovery tool, and each tool uses a different combination and variation of them.

I) Restrictions for Determining Frequencies

An important aspect of each tool and algorithm is the method of determining motif frequencies. Frequency refers to the number of matches of a motif in a network [8]. Methods for determining motif frequency differ in their restrictions on how network elements may be shared between matches [2], and different methods lead to different frequency results. There are three frequency concepts: F1, F2, and F3. F1 allows arbitrary overlapping of nodes and edges. F2 allows only nodes to overlap. F3 does not allow any overlapping of nodes or edges [3].

The method of determining motif frequencies is very important because motif frequency is used in the calculation of statistics such as the z-score and p-value [7], and numerous tools use these statistics to indicate whether motifs are statistically significant. It is important to note which frequency concept is used by which tool, because the different restrictions cause the frequencies calculated under the different concepts to differ significantly [2]. It can also matter that the frequency concept of a tool suits the network being studied: in some networks the overlapping of edges and nodes may be an important aspect of motifs, whereas in others only motifs that do not overlap at all may be relevant.
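To illustrate how the three concepts can give different counts for the same pattern, the sketch below takes a list of already-found occurrences (each a set of edges) and counts them under each concept. The helper names and the example are hypothetical. Under F2 and F3 the frequency is the size of a largest set of non-overlapping occurrences; the greedy selection used here is an assumption for brevity and in general yields only a lower bound on that maximum.

```python
def nodes_of(occurrence):
    """All nodes touched by the edges of one occurrence."""
    return {node for edge in occurrence for node in edge}

def greedy_disjoint(occurrences, key):
    """Greedily keep occurrences whose key sets are pairwise disjoint."""
    used, chosen = set(), []
    for occ in occurrences:
        items = key(occ)
        if used.isdisjoint(items):
            chosen.append(occ)
            used |= items
    return chosen

def f1(occurrences):
    # F1: every occurrence counts; nodes and edges may overlap freely.
    return len(occurrences)

def f2_lower_bound(occurrences):
    # F2: occurrences may share nodes but not edges.
    return len(greedy_disjoint(occurrences, key=set))

def f3_lower_bound(occurrences):
    # F3: occurrences share no nodes (and therefore no edges either).
    return len(greedy_disjoint(occurrences, key=nodes_of))

# Occurrences of a two-edge path in the hypothetical chain a-b-c-d-e.
occurrences = [
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("b", "c"), ("c", "d")}),
    frozenset({("c", "d"), ("d", "e")}),
]
print(f1(occurrences), f2_lower_bound(occurrences), f3_lower_bound(occurrences))
```

In this example the first and third occurrences share only node c, so both count under F2 but only one of them counts under F3.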
Thus, paying attention to the frequency concepts is very important when using and designing motif discovery tools.

II) Random network generation

As mentioned previously, random networks are essential in network motif discovery because they are needed for comparison with the input network. Subgraph occurrences in the input network are compared to those in the random networks to see whether differences are present that would indicate a significant motif. Multiple methods are used to generate randomized networks. Common randomization techniques include the switching method, the stubs method, and the "go with the winners" algorithm [6].

(1) The switching method is a Markov chain method. It takes the nodes of the input network, preserves their in-degrees and out-degrees, and switches the edges between the nodes numerous times to obtain randomization. The drawback of the switching method is that the time the Markov chain requires for proper mixing is not known [9].

(2) The stubs method keeps the same in-degrees and out-degrees as the nodes of the input network. Each node has in-stubs (one for each incoming edge of the node) and out-stubs (one for each outgoing edge of the node). A matching algorithm pairs every in-stub with an out-stub. In theory this creates random edges between nodes while still preserving the in-degrees and out-degrees of all nodes. The method discards any graph that contains self edges or multiple edges. This becomes a problem because numerous real world networks have nodes with degrees such that more than one edge between two nodes is very likely to arise [9].

(3) The "go with the winners" algorithm starts with multiple graphs and carries out the stubs method on them. To compensate for the graphs that are eliminated (due to self or multiple edges), the algorithm periodically copies all of its surviving graphs, which keeps the number of graphs constant on average. Once all stubs have been linked the process stops, and a random network is chosen from the remaining graphs. This algorithm can be very slow, especially with large scale networks [9].

The switching algorithm has been found to be the most suitable method for random graph generation and is often used in network motif discovery tools.

III) Classification of tools based on algorithms

Network centric tools require that the entire network and all of its subgraphs be enumerated. Non-network centric tools (motif centric tools), on the other hand, allow a single specific motif to be examined [10]. The major motif discovery tools have been classified into these two groups and divided further within each group based on aspects of their algorithms.

NETWORK CENTRIC ALGORITHMS:

Algorithms that use trees:

NeMoFINDER

NeMoFINDER is a motif discovery algorithm used specifically to find motifs in PPI networks [5]. This tool uses trees to partition the network in question. It uses frequency concept F1. By allowing arbitrary node and edge overlap it ensures uniqueness, and it is not downward closed. The required inputs for the algorithm are the maximum motif size, the number of randomized networks, the target PPI network, a frequency threshold, and a uniqueness threshold [7].
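The switching method from the random network generation discussion above can be sketched in a few lines. This is a minimal illustration for a simple directed graph, not the implementation used by any particular tool; the function names, the `num_switches` parameter, and the attempt limit are assumptions. Because the mixing time is not known, the number of switches has to be chosen by the user.

```python
import random
from collections import Counter

def switch_randomize(edges, num_switches, seed=None, max_attempts_factor=100):
    """Degree-preserving randomization of a simple directed graph."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done = attempts = 0
    max_attempts = num_switches * max_attempts_factor  # avoid endless loops
    while done < num_switches and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed switch: a->b and c->d become a->d and c->b.
        if a == d or c == b:
            continue  # would create a self edge
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a multiple edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degree_sequences(edge_list):
    """Out-degree and in-degree of every node, as two Counters."""
    out_deg = Counter(u for u, _ in edge_list)
    in_deg = Counter(v for _, v in edge_list)
    return out_deg, in_deg

# Hypothetical input network.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
randomized = switch_randomize(edges, num_switches=20, seed=0)
print(degree_sequences(edges) == degree_sequences(randomized))
```

Each accepted switch leaves the out-degrees of a and c and the in-degrees of b and d unchanged, which is why the degree sequences of the input and the randomized network agree.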
NeMoFINDER generates randomized networks via the switching method [5], and the Apriori algorithm is used for subgraph frequency determination [8]. The algorithm can be divided into three main steps [5].

The first step finds all occurrences of size-2 trees and then of successively larger trees, until trees of every size from 2 to k have been found. This ensures that all of the repeated subgraphs have been found. If the number of size-k trees is larger than a user given frequency threshold, then the subgraph in question is considered statistically significant and is designated as a motif.

In step 2 the size-k trees are used to partition the graph, so that each section of the graph contains trees of size 2 through k.

In step 3, for each size-k tree a subgraph is generated with k-1 edges and k nodes. A new set of subgraphs is then generated by combining each (k-1)-edge subgraph with a size-k tree, resulting in subgraphs with k edges. This new set contains subgraphs that are all candidates for being a motif. The number of occurrences of each candidate subgraph is found in the partition of the network by the size-k trees. If the number of occurrences is more than a given threshold, the subgraph is added to a set of repeated subgraphs. These subgraphs are then combined with newly generated subgraphs to find subgraphs of size k+1. This process continues until all repeated subgraphs of size 2 through k are detected. Because the network is partitioned by trees, the algorithm is scalable [5].

NeMoFINDER also uses the concept of graph cousins to generate possible motif candidates [5]. However, graph cousin generation can be ambiguous, and symmetry breaking is not used in the NeMoFINDER algorithm, which results in the discovery of redundant subgraphs [8].

Performance studies have been carried out on NeMoFINDER by ranking PPI network motifs of different sizes by frequency, uniqueness, and individual motif size. Motif strengths were generated and scored from these parameters. The scores were compared by function homogeneity, localization coherence, and gene expression correlation. The reliability of each motif was determined using this scoring method [5].

Kavosh

Kavosh is a network motif discovery tool that uses trees to enable the detection of motifs. It can handle both directed and undirected networks [3]. There are four main parts of the Kavosh algorithm: (1) enumeration, (2) classification, (3) random graph generation, and (4) motif identification [3].
Enumeration looks at the network in question and finds all subgraphs of the given sizes (it is also performed on the random graphs). This is achieved by selecting one node and exploring all the combinations of connections with its neighboring nodes via a tree representation. The first level of the tree is the selected node, the second level consists of the neighbors of this node, the third level is made up of the neighbors of those neighbors, and so on. If a subgraph of size k is being searched for, all compositions of k-1 are found. The revolving door algorithm is used to go through all of the nodes at each level, ascending from the bottom level and labeling each node as visited. This ensures that no tree or subgraph is constructed more than once. The algorithm finds all of the combinations of the nodes, including subgraphs with several nodes in the same level (e.g. a subgraph of size 3 can be made up of an initial node and two neighbors, or of an initial node, a neighbor, and a neighbor of that neighbor). After these subgraphs are found, the node is removed and a new node is selected. This process is also carried out on the randomized networks to find the frequencies of the subgraphs in the randomized cases for comparison. Constraints are placed on the construction of these trees (some explained above) so that each specific tree is generated only once. This avoids redundancy and extra computational time.
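Once subgraphs have been enumerated, the classification step has to group them into isomorphism classes, which is where the canonical labels described in the Introduction come in. The sketch below shows the idea by brute force: it tries every vertex ordering, concatenates the adjacency matrix row by row, and keeps the largest string. This is an illustration only and is not how NAUTY works internally (NAUTY avoids trying all k! orderings); the function name and example graphs are assumptions.

```python
from itertools import permutations

def canonical_label(nodes, edges, directed=True):
    """Largest row-by-row adjacency string over all vertex orderings.

    Brute force: the cost grows as k! in the number of nodes k, so this is
    only usable for very small subgraphs.
    """
    edge_set = set(edges)
    if not directed:
        edge_set |= {(v, u) for u, v in edges}
    best = None
    for order in permutations(list(nodes)):
        bits = "".join(
            "1" if (u, v) in edge_set else "0"
            for u in order
            for v in order
        )
        # All strings have equal length, so string order equals numeric order.
        if best is None or bits > best:
            best = bits
    return best

# Two differently named feed-forward loops and one 3-cycle (hypothetical data).
ffl_1 = canonical_label("abc", [("a", "b"), ("b", "c"), ("a", "c")])
ffl_2 = canonical_label("xyz", [("x", "z"), ("z", "y"), ("x", "y")])
cycle = canonical_label("abc", [("a", "b"), ("b", "c"), ("c", "a")])
print(ffl_1 == ffl_2, ffl_1 == cycle)
```

Subgraphs can then be classified by using the label as a dictionary key, so that all members of one isomorphism class accumulate under the same entry.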

Classification involves placing the subgraphs found in the enumeration step into isomorphism classes. This is done using the NAUTY algorithm [3]. Random graphs are generated in Kavosh using the switching method. The frequencies of subgraphs in the input network are compared to their frequencies in the random networks, and subgraphs are designated as motifs if their frequencies are higher in the input network than in the random networks. Parameters often used include the p-value, frequency level, and z-score [3].

MA Visto

MA Visto is able to consider all three frequency concepts when enumerating subgraphs [2]. This allows for an effective visual representation of the frequency concepts. MA Visto finds all of the subgraphs of a certain size and finds the frequency of each subgraph under all three frequency concepts.

The flexible pattern finder (FPF) algorithm is used by MA Visto to search for motifs [13]. The FPF algorithm looks at patterns of the given target size (e.g. it looks for all patterns of size 4 when searching for size 4 motifs). As the size of the pattern increases, the number of possible patterns of that size also increases, so finding all of the patterns of one size would be computationally costly. A tree is constructed in which each level is made up of patterns of a certain size, up to the level where the desired size is reached. To avoid generating all the possible patterns of a given size, FPF eliminates a pattern that is not supported by (cannot be mapped to) the input network as soon as it appears in the tree. Such a pattern is not extended any further, which eliminates unnecessary branches [14].
Also, since the frequency of a pattern decreases with increasing pattern size, if an intermediate (smaller) pattern is found to have a lower frequency than patterns of the desired (larger) size, that branch of the tree is discontinued because its descendants will never have a high enough frequency [13]. MA Visto uses the frequencies of the subgraphs in the input network as well as their frequencies in the randomized networks to find z-scores and p-values for the different motifs [2].

Probabilistic algorithms:

Full enumeration can be computationally costly and can require a lot of time. As the size of the subgraphs being searched for increases, the number of possible isomorphism types increases. This makes exhaustive enumeration algorithms extremely time consuming and costly, because they need to find the frequency of each different isomorphism type of every size in both the input network and the randomized networks.

Kashtan et al developed a sampling method for subgraph counting, which is a probabilistic algorithm [11]. This algorithm estimates subgraph frequencies by sampling subgraphs, which is less time consuming than full enumeration. The runtime of the algorithm does not increase asymptotically as network size increases. With Kashtan's sampling algorithm, larger networks can be analyzed and larger motifs can be identified than full enumeration algorithms can handle.

The sampling algorithm finds a random subgraph of size n as follows. An edge is picked at random, and its neighboring edges all become candidates to be the next edge. One of the candidates is picked at random, and its neighbors are added to the candidates. This process continues, with one edge being chosen at random from all the neighboring edges, until a subgraph of size n is created. All of the nodes from these edges and all of the edges that connect these nodes make up the sampled subgraph [11].

An ordered set of n-1 edges needs to be picked for a subgraph of size n to be found. The probabilities of obtaining these ordered sets are used to find the probability that a given subgraph of size n will be sampled. From this and a few additional calculations, the estimated subgraph concentrations are found [11].

A major problem with the method of Kashtan et al is that its sampling is biased [8]: each subgraph does not have a uniform probability of being sampled [10]. Therefore, the occurrences of a subgraph cannot be estimated impartially [8]. The algorithm tries to take this into account by weighting each sampled subgraph by 1/(probability of the subgraph being chosen) [10].

Other tools that use probabilistic sampling algorithms as alternatives to full enumeration are MFinder and FANMOD. MFinder uses a biased algorithm like that of Kashtan et al, while FANMOD uses an improved method that achieves unbiased sampling [10].

MFinder

MFinder is capable of analyzing directed and undirected networks [2]. Concept F1 is used when finding the frequencies of the subgraphs. Concept F3 is also applied in order to determine a lower bound for the uniqueness value [2]. MFinder fully enumerates subgraphs by starting with an edge and finding all motifs of the different sizes that contain this edge [6].
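The edge-sampling procedure of Kashtan et al described above can be sketched as follows. This minimal version draws a single subgraph and omits the probability weighting needed to correct the sampling bias; the function name, the edge-list representation, and the example network are assumptions made for illustration.

```python
import random

def sample_subgraph(edges, n, rng):
    """Sample one connected n-node subgraph by growing a random edge set.

    Returns (nodes, induced_edges), or None if the component containing the
    first edge has fewer than n nodes. Assumes n >= 2.
    """
    edges = list(edges)
    nodes = set(rng.choice(edges))  # start from a random edge
    while len(nodes) < n:
        # Candidate edges touch the current node set and add one new node.
        candidates = [e for e in edges if (e[0] in nodes) != (e[1] in nodes)]
        if not candidates:
            return None
        nodes |= set(rng.choice(candidates))
    # The sample is the subgraph induced by the chosen nodes.
    induced = {e for e in edges if e[0] in nodes and e[1] in nodes}
    return frozenset(nodes), frozenset(induced)

# Hypothetical small network: a triangle with one pendant node.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
rng = random.Random(0)
sample = sample_subgraph(edges, 3, rng)
print(sample)
```

Different subgraphs can be reached through different numbers of edge sequences, which is the source of the bias noted above and the reason each sample has to be weighted by the inverse of its sampling probability.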
Once a subset of nodes connected to the initial edge is found, the subset is added to a hash table so that it cannot be revisited [10]. When no more subgraphs can be identified, the hash table is cleared and the process begins again with a different edge. This is repeated until all edges have been used. Because a specific subgraph is counted each time one of its edges is examined, there is redundancy: the number of times the subgraph is counted is a multiple of its number of edges [6]. Therefore, the count for a subgraph must be divided by the number of edges in the motif. Since MFinder looks at so many motifs and has this redundancy, it requires large amounts of memory. This causes long runtimes and makes it hard to search for large motifs [10]. For this reason, MFinder also offers the biased sampling method that Kashtan et al developed.

FANMOD

FANMOD is a tool that can be used to analyze both directed and undirected networks [2]. It is able to identify motifs of sizes 3 to 8. FANMOD finds only induced subgraphs. It determines the frequencies of subgraphs with concept F1 and uses the z-score and p-value to decide whether a motif is statistically significant [2].

The full enumeration part of the FANMOD algorithm begins with one node and a list of possible vertices to which this node can be extended (i.e. the node's neighbors). Once the subgraph is extended to one of these vertices, that vertex is removed from the list of possible extensions and its neighbors are added to the candidates that the subgraph can be extended to next. Different combinations of possible extensions are chosen in order to form subgraphs of different sizes. Since the list of possible extensions is constantly changed, each subgraph is enumerated only once. Like Kavosh, FANMOD uses the NAUTY algorithm to test for graph isomorphism [10].

FANMOD's alternative method uses probabilistic sampling to reduce the runtime of identifying motifs. It uses a randomized enumeration algorithm known as RAND-ESU. This sampling works by changing the full enumeration algorithm so that it randomly skips subgraphs. The FANMOD sampling algorithm chooses each subgraph of size k with a certain probability [12]. This means that all subgraphs have the same probability of being sampled and that all samples are different subgraphs. Because of these changes relative to the algorithm of Kashtan et al, FANMOD is unbiased [10].

MOTIF CENTRIC ALGORITHMS:

Mapping algorithms:

Grochow

Grochow is a motif centric tool that can be applied to directed and undirected networks [15]. The algorithm progressively maps a specific target subgraph onto the global network. By doing this, Grochow checks for isomorphism as it maps the query graph onto the network [10]. This eliminates the extra time and memory it would take to check for isomorphism separately, and it avoids full enumeration. The mapping algorithm goes through the query subgraph node by node in order to map the subgraph onto the network.
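Returning briefly to FANMOD's enumeration scheme described above, the extension-list idea and its randomized variant can be sketched as below. This follows the general ESU and RAND-ESU scheme as commonly presented and is not FANMOD's own code; the function name, the `probs` parameter, and the example graph are assumptions. Node labels must be comparable, since extensions are restricted to labels larger than the starting node.

```python
import random

def esu(adj, k, probs=None, seed=None):
    """Enumerate (or sample) the connected k-node vertex sets of a graph.

    adj:   dict mapping each node to the set of its neighbours (edge
           directions are ignored for the purpose of connectivity).
    probs: optional list of k probabilities; a branch at depth d is explored
           with probability probs[d]. None means full enumeration.
    """
    rng = random.Random(seed)
    found = []

    def keep(depth):
        return probs is None or rng.random() < probs[depth]

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        # Nodes already in the subgraph or adjacent to it.
        closed = sub | {n for s in sub for n in adj[s]}
        while ext:
            w = ext.pop()  # w is removed from the possible extensions
            if not keep(len(sub)):
                continue  # randomly skip this branch when sampling
            # Add neighbours of w that are new to the subgraph's surroundings.
            new_ext = ext | {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, new_ext, root)

    for v in adj:
        if keep(0):
            extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# Hypothetical small network: a triangle (1, 2, 3) with a pendant node 4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
all_sets = esu(adj, 3)
sampled = esu(adj, 3, probs=[1.0, 1.0, 0.5], seed=1)
print(sorted(sorted(s) for s in all_sets))
```

With sampling enabled, every size-k subgraph is reached with the same probability (the product of the entries of `probs`), so dividing the sampled counts by that product gives an estimate of the full counts.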
For a given node of the query subgraph, the tool finds all the candidate nodes: nodes in the network that have the same characteristics (i.e. the same degree and neighbors with the correct degrees). As the algorithm goes through each node in the query subgraph, possible matches in the network are kept, while partial matches are eliminated as soon as any inconsistency is found. This mapping ensures that only subgraphs of the network that are exactly isomorphic to the query are detected [15].

Grochow uses a method known as symmetry breaking to make sure that each subgraph is mapped to only once, in order to reduce runtime and redundancy [15]. A graph that can be mapped onto itself in more than one way is said to have symmetries. Nodes that can be mapped to one another under such a self-mapping are defined as equivalent, so the nodes of a specific subgraph can be separated into equivalence classes. The Grochow algorithm ensures that mapping begins from only one representative of each equivalence class, so that multiple mappings are not carried out beginning with equivalent nodes. Restrictions are also added to the labeling of each vertex so that symmetric mappings are avoided [10].

MODA

MODA uses a pattern growth algorithm that takes in a query graph [8]. By maintaining information about previously found query graphs and their mappings, it reduces computational time. It uses the concept of expansion trees, which are similar to the pattern trees used in MA Visto but are applicable to frequency concept F1. The expansion tree starts with a root node at level 0. Level 1 consists of all minimally connected graphs of size k, that is, all size-k trees. An edge is then added at each subsequent level until a complete graph is obtained. The first level of the expansion tree therefore contains the non-isomorphic trees of size k. Each node of the expansion tree can be represented by an adjacency matrix consisting of 0's and 1's. For undirected graphs, whose adjacency matrices are symmetric, only the entries below the main diagonal are stored. An expansion tree is stored for every graph size k. Expansion trees are a static data structure that can be stored and retrieved, so they do not have to be computed each time [8].

The mapping algorithm takes the query graphs from the first level of the expansion tree, which are themselves trees, and maps them onto the network. It retains their calculated frequencies. The frequencies at the second level of the expansion tree can then be found with respect to the graphs at the first level, which are their parent nodes. MODA uses the symmetry-breaking conditions of the Grochow algorithm, but it uses the Grochow algorithm only for the first level of the expansion tree. All the information the algorithm finds about the first level can be exploited to find the frequencies at all the following levels, whose graphs are supergraphs of those at the first level. By exploiting information about previously found mappings, MODA reduces computational costs.
Additionally, MODA has a sampling method that can be used to reduce runtimes at the cost of accuracy [8].

3 Experiments and Analysis

Data from experiments on the runtimes of various algorithms are presented here, along with MA Visto frequency concept data and experimental motif results from PPI networks.

I) Runtime Experiments

Many experimental runs have been carried out to determine runtimes for network motif discovery tools. As shown in Table 1, Omidi et al compared the runtimes of MODA, MFinder, Grochow, FPF (the algorithm used in MA Visto), and FANMOD [8].

Searches were carried out for subgraphs of size 3-9. Table 2 shows the comparison by Chen et al of the runtimes of NeMoFINDER and FPF (the algorithm used in MA Visto) [5]. Kashani et al compared the runtimes of Kavosh, FANMOD, MA Visto, and MFinder for subgraphs between size 3 and size 10, as shown in Table 3 [3].

Table 1. Data from Figure 7 of Omidi et al [8] showing runtimes (in seconds) for size 3-9 subgraphs in the E. coli transcription network. Tools compared include MODA, MFinder, Grochow, the FPF algorithm, and FANMOD [8].
MFinder: x x10 3
FPF (MA Visto): x x x10 4
FANMOD: x x10 2
MODA: x x x x10 4
Grochow: x x x10 4

Table 2. Data from Figure 11 of Chen et al [5] showing runtimes (in seconds) for size 3-13 subgraphs in the Uetz PPI network. Tools compared include NeMoFINDER, the FPF algorithm, a sampling algorithm, and a full enumeration algorithm.
FPF: 2.2x x x x x x x10 6
NeMoFINDER: 2.2x x x x x x x x x x x10 4

Table 3. Data compiled from Table 4 of Kashani et al [3]. Runtimes (in seconds) for identifying subgraphs of sizes between 3 and 10 in the yeast S. cerevisiae transcription network are shown. Tools compared include Kavosh, FANMOD, MA Visto, and MFinder.
Kavosh: 3.0x x x x x x x10 6
FANMOD: 8.1x x x x x10 3
MA Visto (FPF): 1.4x10 4
MFinder: 3.1x x x10 4

Kashtan et al compared the time taken by their probabilistic sampling method with the time taken by full enumeration when identifying motifs in networks of different sizes (Figure 1) [11]. The network sizes for which these comparisons were made were between 1000 and 8000 nodes.

Figure 1. Figure 4 from Kashtan et al [11] showing runtimes for different network sizes (on a log-log scale). Kashtan's probabilistic algorithm and a full enumeration algorithm were compared.

Runtimes were found for MA Visto when finding subgraphs of size 3-4 and 4-5 (Table 4). The networks analyzed were the E. coli transcription network and the yeast transcription network [16].

Table 4. Examples of runtimes for MA Visto analyzing the E. coli and yeast transcription networks [16]. Subgraphs of size 3-4 and 4-5 were searched for. For each run, 100 randomized networks were generated.
3-4 Nodes 4-5 Nodes
E. coli transcription network (418 nodes, 519 edges)
Yeast transcription network (688 nodes, 1079 edges) >25200

Runtimes were found for FANMOD when finding subgraphs of size 3-7 (Table 5). A protein structure network [16], a PPI network [17], the yeast transcription network [16], and the E. coli transcription network [16] were used.

Table 5. Runtimes (in seconds) for the FANMOD tool finding subgraphs of size 3-7 in networks including protein structure [16], PPI [17], yeast transcription [16], and E. coli transcription [16]. Random networks were generated in all of the runs.
Protein structure (undirected, 96 nodes, 213 edges)
Protein-protein interaction (undirected, 4470 nodes, 3886 edges)
Yeast transcription network (directed, 689 nodes, 1078 edges)
E. coli transcription network (directed, 418 nodes, 519 edges)

Omidi et al. [8] and Kashani et al. [3] both carried out experimental runs on FANMOD, MA Visto (or the FPF algorithm), and MFinder. Kashani et al. measured the runtimes of the tools to fully enumerate the input network and to generate and enumerate 100 random networks. Omidi et al. measured the runtimes only for full enumeration of the input network.

Table 6. Data from Figure 7 of Omidi et al. [8] and Table 4 of Kashani et al. [3]. Runtimes (in seconds) for motif searches done by FANMOD, MA Visto (FPF algorithm), and MFinder. Times are given for runs that fully enumerated the input network and 100 random networks with a 3.2 GHz AMD Opteron processor and 8 GB RAM (shown with no shading) [3], and for runs that only fully enumerated the input network with an IBM R50e laptop with a 1.8 GHz Intel Pentium processor and 1 GB RAM (shown with grey shading) [8].
FANMOD MA Visto/FPF MFinder
With 100 random networks With 0 random networks With 100 random networks With 0 random networks With 100 random networks With 0 random networks
8.1x x x x x x x x x x x x x x x x10 3

Analysis and Limitations: Experimental runs carried out by Omidi et al. [8], Chen et al. [5], and Kashani et al. [3] allow a variety of network motif discovery tools to be compared. A consistent trend in these experiments is the inability of MA Visto and MFinder to handle subgraphs as large as those handled by the other tools. Often they were only able to find subgraphs of size 5 or less, and they usually had longer runtimes than most other tools. FANMOD was able to identify motifs up to size 8, whereas NeMoFinder, Kavosh, MODA, and Grochow were able to deal with subgraphs larger than 8. Although some tools can handle subgraphs larger than 8, the runtimes for these experiments are very large. Overall, current motif discovery tools are very limited in the size of subgraphs that they can handle in reasonable amounts of time. NeMoFinder shows promise in being able to search for larger motifs.

Another limitation of motif discovery tools is the size of the network that can be analyzed. As seen in Kashtan et al.'s [11] experiment, the runtimes for motif searches increase exponentially as network sizes increase. All of the tools discussed have difficulty searching larger networks (thousands of nodes) in reasonable amounts of time. Therefore, networks such as PPI networks and most social networks, which have thousands of nodes, are difficult to fully enumerate in a reasonable time. Kashtan's probabilistic sampling method has been shown to produce a fairly consistent runtime as network size increases, and it takes significantly less time than exhaustive enumeration on larger networks. However, as discussed above, Kashtan's sampling algorithm is biased and results in a loss of accuracy.

The number of randomly generated networks is also a limitation to be considered. Random networks are used by the tools for comparison with the input network, and multiple random networks are needed for an accurate comparison. However, runtimes increase with the number of random networks that need to be generated and searched for motifs. Omidi et al.'s [8] experiments only involved full enumeration of the input network, whereas Kashani et al.'s [3] experiments fully enumerated the input network and generated and enumerated 100 random networks. Despite the fact that Kashani et al. used a computer with greater computational power, the runtimes for Kashani et al.'s experiments were significantly larger than those for Omidi et al.'s experiments. Therefore, even with a more powerful computer, the generation and enumeration of many random networks adds significant time to searches.

II) Frequency Concepts

Experiments: MA Visto was used to compare the frequency results for concepts F1, F2, and F3 for the same subgraph within the same network (Table 6, Table 7).

Table 6. Values for concepts F1, F2, and F3 for each size 3 motif found by MA Visto in the E. coli gene transcription network [16].
F1 F2 F3

Table 7. Values for concepts F1, F2, and F3 for each size 3 motif found by MA Visto in the yeast gene transcription network [16].
F1 F2 F3

Analysis and Limitations: MA Visto's ability to calculate the frequency of motifs under all three frequency concepts, F1, F2, and F3, allows the discrepancies between them to be compared. Experimental runs on two different data sets (E. coli transcription and yeast transcription) demonstrate that the frequencies calculated under different concepts can differ. When only a few occurrences of a certain subgraph are in the network the discrepancy is not large, but for more frequent subgraphs the frequency concept results vary substantially.
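The gap between the frequency concepts can be made concrete with a small sketch. It assumes the usual reading of the concepts from Schreiber and Schwobbermeyer [14]: F1 counts every match, F2 counts the largest set of matches that share no edges, and F3 counts the largest set of matches that share no nodes. The function names, the restriction to undirected triangles, and the brute-force search are illustrative choices only; this is not MA Visto's implementation, and the brute-force search is only practical for tiny graphs.

```python
from itertools import combinations


def triangles(edges):
    """Return every triangle in an undirected graph as (node set, edge set)."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edges for v in e})
    found = []
    for trio in combinations(nodes, 3):
        tri_edges = {frozenset(p) for p in combinations(trio, 2)}
        if tri_edges <= edge_set:
            found.append((frozenset(trio), frozenset(tri_edges)))
    return found


def max_compatible(matches, conflict):
    """Size of the largest subset of matches with no pairwise conflict.

    Brute force over all subsets, so only suitable for very small inputs.
    """
    for size in range(len(matches), 0, -1):
        for subset in combinations(matches, size):
            if not any(conflict(a, b) for a, b in combinations(subset, 2)):
                return size
    return 0


def frequencies(edges):
    """Return (F1, F2, F3) for the triangle pattern under the assumed definitions."""
    matches = triangles(edges)
    f1 = len(matches)                                              # all matches
    f2 = max_compatible(matches, lambda a, b: bool(a[1] & b[1]))   # edge-disjoint
    f3 = max_compatible(matches, lambda a, b: bool(a[0] & b[0]))   # node-disjoint
    return f1, f2, f3


# Two triangles sharing one node, and a complete graph on four nodes.
bowtie = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(frequencies(bowtie), frequencies(k4))
```

In the complete graph every pair of triangles shares an edge, so the unrestricted count is four times the restricted ones. This is the kind of divergence for frequent, heavily overlapping subgraphs that the analysis above describes.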

III) Protein-Protein Interaction Network Motifs

Experiments: An E. coli PPI network [17] was analyzed by FANMOD. Motifs of size 3 and 4 were found (Figure 2).

Figure 2. Motifs of size 3 and 4 from the E. coli PPI network [17] as identified by FANMOD. Z-scores for each motif are shown.

Analysis and Limitations: Although NeMoFinder had success identifying larger motifs, other tools struggled to find motifs larger than size 5. All tools had difficulty identifying motifs in a reasonable amount of time. Because of their large size, PPI networks are harder to analyze than smaller networks such as E. coli gene transcription networks. Preliminary findings from runs done by FANMOD show the size 3 and size 4 motifs found in an E. coli PPI network. The motifs and their z-scores are listed. For both the size 3 and size 4 motifs, the most over-represented motif (the motif with the largest z-score) was the complete graph, a graph with an edge between every pair of nodes. Previous studies have also found complete graphs with high frequencies in PPI networks [18]. Further studies involving PPI networks may provide additional support for these findings. Although some methods have been used to predict PPI network motifs [19], there is still much about the biological significance of PPI motifs to be explored.

4 Conclusion

Increasing interest in network motifs and emphasis on motif significance has led to an ongoing process of network motif discovery tool development and continual revision of previous work. As wet lab techniques have become more advanced, increasing amounts of information about different biological systems and organisms have been collected. This has allowed databases to be developed that provide full information sets concerning networks. The study of networks provides insights into how organisms and systems work as a whole. Network motifs are the building blocks of networks [1] and are often biologically significant, which makes the identification of motifs extremely important for understanding networks.
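Motif significance, as in the z-scores reported with Figure 2, rests on comparing a subgraph's count in the input network with its counts in randomized networks. The following is a minimal sketch of that comparison for undirected triangles. It assumes degree-preserving edge switching as the randomization and the sample standard deviation in the z-score; the surveyed tools may differ on both points, and every function name here is hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def randomize(edges, swaps, rng):
    """Degree-preserving randomization by repeated edge switching."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the switch would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_score(observed, random_counts):
    """(observed - mean) / sample standard deviation of the random counts."""
    sigma = stdev(random_counts)
    if sigma == 0:
        raise ValueError("random counts have zero variance; z-score is undefined")
    return (observed - mean(random_counts)) / sigma


def triangle_z_score(edges, n_random=100, seed=0):
    """Z-score of the triangle count against n_random randomized networks."""
    rng = random.Random(seed)
    counts = [count_triangles(randomize(edges, 10 * len(edges), rng))
              for _ in range(n_random)]
    return z_score(count_triangles(edges), counts)
```

Each additional random network costs one more randomization and one more full count, which is why the number of random networks weighs so heavily on the runtimes discussed in Section 3.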

Researchers have struggled to overcome the difficulties in developing network motif discovery tools. The graph isomorphism problem makes finding all the motifs in a network computationally impractical [2]. In addition, dealing with large networks, discovering large motifs, and generating and searching numerous random networks all make network motif discovery extremely computationally costly. Although these factors are costly, they are also integral parts of the network motif discovery process and of understanding networks.

Results of various experimental runs carried out using different network motif discovery tools have helped to determine which tools are more efficient and useful. Furthermore, these comparisons help to highlight which algorithmic methods improve tool performance.

Overall, MA Visto and MFinder were computationally costly and had long runtimes in comparison to other tools searching the same network for the same subgraph sizes. MFinder's algorithm requires full enumeration and searches exhaustively using a technique that counts the same subgraph multiple times [6]. This redundancy contributes to increased computational cost and runtimes. MA Visto calculates the frequencies for all three frequency concepts, which requires more time than searching with only one frequency concept [13]. The other tools performed better than both MFinder and MA Visto; all had shorter runtimes and were able to search for larger subgraphs.

FANMOD is a well-established and well-known tool that performs relatively well, partially because of its use of the NAUTY algorithm to test for graph isomorphism. This, along with the fact that FANMOD's algorithm ensures that each subgraph is only counted once, makes full enumeration with FANMOD relatively reasonable [10]. FANMOD also uses an unbiased sampling algorithm that helps to reduce runtimes in comparison to full enumeration [12].
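The idea of enumerating each subgraph only once can be sketched as follows. The code is written in the spirit of the enumeration strategy described by Wernicke [12]: a subgraph is only grown from its smallest-labelled node, and only with nodes that are not already adjacent to the partial subgraph. It is an illustrative sketch under those assumptions, not FANMOD's code, and it omits the isomorphism classification step (done with NAUTY in FANMOD). The function name and the adjacency-dictionary input format are hypothetical.

```python
def connected_subgraphs(adj, k):
    """Yield every connected k-node vertex set of an undirected graph exactly once.

    adj maps each node (comparable labels such as ints) to the set of its neighbours.
    """
    def extend(sub, ext, root):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)  # work on a copy so the caller's set is untouched
        while ext:
            w = ext.pop()
            # Nodes in the partial subgraph or adjacent to it are not "exclusive"
            # neighbours of w; they are reached through another branch.
            covered = set(sub).union(*(adj[u] for u in sub))
            exclusive = {u for u in adj[w] if u > root and u not in covered}
            yield from extend(sub | {w}, ext | exclusive, root)

    for v in sorted(adj):
        # Only nodes with a larger label than the root may be added.
        yield from extend({v}, {u for u in adj[v] if u > v}, v)


# Complete graph on four nodes: list the connected 3-node subgraphs.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
found = list(connected_subgraphs(k4, 3))
print(len(found), len(set(found)))
```

Because no vertex set is produced twice, no deduplication pass is needed before counting, which is the property the text credits for making full enumeration with FANMOD relatively reasonable.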
FANMOD has been shown to have shorter runtimes than Grochow and MODA, but it can only search for subgraphs up to size 8.

Kavosh, like FANMOD, uses the NAUTY algorithm and has also shown relatively good runtimes in experimental runs. The restrictions placed on the tree structures formed while searching for motifs ensure that the algorithm enumerates each subgraph only once [3]. This, along with the use of the NAUTY algorithm, gives the Kavosh tool relatively good search efficiency.

Grochow achieves some efficiency through its symmetry-breaking techniques. With symmetry breaking, Grochow is able to reduce redundant counts of subgraphs. The algorithm's ability to eliminate the subgraphs that are being mapped as soon as it is discovered that they do not match any patterns in the input network also helps boost efficiency. This prevents irrelevant subgraphs from being generated, which saves time and computational power. It also ensures that the subgraphs identified are isomorphic to the subgraph in question, which means that an isomorphism test is not required [15].

MODA uses some of the techniques from Grochow, such as symmetry breaking, and uses the actual Grochow algorithm to find the frequencies of some of the patterns in question [8]. MODA's algorithm also uses expansion trees to build patterns that make subgraphs. These expansion trees and the mapping information for the patterns are stored so that redundancy does not occur and computational time is saved.

NeMoFinder has been found to be able to identify meso-scale motifs (specifically, up to size 12), although it is limited to analyzing PPI networks and thus only undirected networks [5]. By partitioning networks into sets of graphs with repeated trees, the algorithm is more efficient than some other tools. NeMoFinder differs from many other network motif discovery tools in its use of graph cousins to generate possible subgraphs and to determine subgraph frequencies. Although graph cousins allow candidate graphs to be generated, their use also causes redundancy, which adds more time to the runs [7].

The strengths and weaknesses of each tool are important to note so that algorithmic shortcomings can be avoided in future tools while successful aspects can be capitalized on. Sampling techniques have shown promise in reducing runtimes and should be considered when developing algorithms (along with the accompanying sacrifice in accuracy). The frequency concept that each tool uses is also important to note because, as seen from the experimental runs, the frequencies vary greatly between the different concepts.

Network motif discovery has proved to be a very complex task. Although many tools, algorithms, and methods have been created for finding network motifs, further improvements and new developments are necessary in order to increase motif discovery capabilities.

Future directions: Future directions include: (1) the ability to intelligently search networks for possible biologically relevant motifs that have been identified as significant subgraphs from experimental runs and literature review, and (2) the use of modern computing infrastructure to search concurrently for network motifs that are larger than those that presently available tools can search.

Acknowledgements: We would like to thank the National Science Foundation for providing the funding for the Bio-Grid REU program and making this research possible. We would also like to thank the University of Connecticut for hosting this program and especially Dr. Chun-Hsi Huang for advising and mentoring.

5 References

1. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network Motifs: Simple Building Blocks of Complex Networks. Science. 298 (2002)
2. Schwobbermeyer, H.: Network Motifs. In: Junker, B., Schreiber, F. (eds.) Analysis of Biological Networks. John Wiley & Sons, Inc., NJ (2008)
3. Kashani, Z., Ahrabian, H., Elahi, E., Nowzari-Dalini, A., Ansari, E., Asadi, S., Mohammadi, S., Schreiber, F., Masoudi-Nejad, A.: Kavosh: a new algorithm for finding network motifs. BMC Bioinf. 10:318 (2009)
4. Fortin, S.: The Graph Isomorphism Problem. University of Alberta: Dept of Computing Science, Alberta (1996)
5. Chen, J., Hsu, M., Lee, L., Ng, SK.: NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. KDD (2006)
6. Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Network Motif Detection Tool: mfinder Tool Guide. Weizmann Institute of Science: Depts of Mol Cell Bio and Comp Sci & Applied Math, Rehovot, Israel
7. Ciriello, G., Guerra, C.: A review on models and algorithms for motif discovery in protein-protein interaction networks. Briefings in Functional Genomics and Proteomics Advance Access. (2008)
8. Omidi, S., Schreiber, F., Masoudi-Nejad, A.: MODA: An efficient algorithm for network motif discovery in biological networks. Genes Genet. Syst. 84 (2009)
9. Milo, R., Kashtan, N., Itzkovitz, S., Newman, M., Alon, U.: Uniform generation of random graphs with arbitrary degree sequences. (2004)
10. Ribeiro, P., Silva, F., Kaiser, M.: Strategies for network motifs discovery. IEEE International Conference (2009)
11. Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics. 20 (2004)
12. Wernicke, S.: A Faster Algorithm for Detecting Network Motifs. In: Casadio, R., Myers, G. (eds.) Algorithms in Bioinformatics: 5th International Workshop. Springer (2005)
13. Schreiber, F., Schwobbermeyer, H.: MAVisto: a tool for the exploration of network motifs. Bioinformatics Applications Note. 21 (2005)
14. Schreiber, F., Schwobbermeyer, H.: Frequency Concepts and Pattern Detection for the Analysis of Motifs in Networks. Trans. on Comput. Syst. Biol. III (2005)
15. Grochow, J., Kellis, M.: Network motif discovery using subgraph enumeration and symmetry-breaking. RECOMB (2007)
16. Collections of complex networks
17. Bacteriome
18. Przulj, N., Wigle, D., Jurisica, I.: Functional topology in a network of protein interactions. Bioinformatics. 20 (2004)
19. Albert, I., Albert, R.: Conserved network motifs allow protein-protein interaction prediction. Bioinformatics. 20 (2004)

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Graph Pattern Analysis with PatternGravisto

Graph Pattern Analysis with PatternGravisto Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 9, no. 1, pp. 19 29 (2005) Graph Pattern Analysis with PatternGravisto Christian Klukas Dirk Koschützki Falk Schreiber Institute of Plant

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

BIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding

BIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding BIOINFORMATICS Vol. 00 no. 00 2008 Pages 9 Biomolecular Network Motif Counting and Discovery by Color Coding Noga Alon, Phuong Dao 2, Iman Hajirasouliha 2, Fereydoun Hormozdiari 2, and S. Cenk Sahinalp

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34 Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,

More information

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Understanding the dynamics and function of cellular networks

Understanding the dynamics and function of cellular networks Understanding the dynamics and function of cellular networks Cells are complex systems functionally diverse elements diverse interactions that form networks signal transduction-, gene regulatory-, metabolic-

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti

More information

Mining Social-Network Graphs

Mining Social-Network Graphs 342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

More information

Feed Forward Loops in Biological Systems

Feed Forward Loops in Biological Systems Feed Forward Loops in Biological Systems Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by MHRD Page 1 of 7 Table of Contents 1 INTRODUCTION...

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

Implementing Graph Pattern Mining for Big Data in the Cloud

Implementing Graph Pattern Mining for Big Data in the Cloud Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

High-dimensional labeled data analysis with Gabriel graphs

High-dimensional labeled data analysis with Gabriel graphs High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use

More information

Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed

More information

Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks Ruoming Jin, Scott McCallen Department of Computer Science,Kent State University, Kent, OH, 44241 {jin,smccalle}@cs.kent.edu

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

The Open University s repository of research publications and other research outputs

The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs The degree-diameter problem for circulant graphs of degree 8 and 9 Journal Article How to cite:

More information

A Performance Comparison of Five Algorithms for Graph Isomorphism

A Performance Comparison of Five Algorithms for Graph Isomorphism A Performance Comparison of Five Algorithms for Graph Isomorphism P. Foggia, C.Sansone, M. Vento Dipartimento di Informatica e Sistemistica Via Claudio, 21 - I 80125 - Napoli, Italy {foggiapa, carlosan,

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph

More information

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University park.764@osu.edu yilmaz.15@osu.edu ABSTRACT Depending

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate

More information

KEYWORD SEARCH IN RELATIONAL DATABASES

KEYWORD SEARCH IN RELATIONAL DATABASES KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites, microrna target prediction

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Graph Theory Problems and Solutions

Graph Theory Problems and Solutions raph Theory Problems and Solutions Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles November, 005 Problems. Prove that the sum of the degrees of the vertices of any finite graph is

More information

CAD Algorithms. P and NP

CAD Algorithms. P and NP CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using

More information

How To Balance In Cloud Computing

How To Balance In Cloud Computing A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,

More information
