4 Techniques for Analyzing Large Data Sets


Pablo A. Goloboff

Contents

1 Introduction
2 Traditional Techniques
3 Composite Optima: Why Do Traditional Techniques Fail?
4 Techniques for Analyzing Large Data Sets
4.1 Ratchet
4.2 Sectorial searches
4.3 Tree-fusing
4.4 Tree-drifting
4.5 Combined methods
4.6 Minimum length: multiple trees or multiple hits?
5 TNT: Implementation of the New Methods
6 Remarks and Conclusions
Acknowledgments
References

1 Introduction

Parsimony problems with medium or large numbers of taxa can be analyzed only by means of trial-and-error or "heuristic" methods. Traditional strategies for finding most parsimonious trees have long been in use, implemented in the programs Hennig86 [1], PAUP [2], and NONA [3]. Although successful for small and medium-sized data sets, these techniques normally fail when analyzing very large data sets, i.e., data sets with 200 or more taxa. This is because, rather than simply requiring more of the same kind of work used to analyze smaller data sets, very large data sets require the use of qualitatively different techniques. The techniques described here have so far been used only for prealigned sequences, but they could be adapted for other methods of analysis, like the direct optimization method of Wheeler [4].

Methods and Tools in Biosciences and Medicine: Techniques in molecular systematics and evolution, ed. by Rob DeSalle et al., Birkhäuser Verlag, Basel, Switzerland.

2 Traditional Techniques

The two basic heuristic techniques for finding most parsimonious trees are Wagner trees and branch-swapping. A Wagner tree is a tree created by sequentially adding the taxa at the most parsimonious available branch. At each point during the addition of taxa, only part of the data are actually used. A taxon may be placed best in some part of the tree when only some taxa are present, but it may be placed best somewhere else when all the taxa are considered. Therefore, the order in which taxa are added determines the outcome of a Wagner tree, so that different addition sequences will lead - for large data sets - to different results.

Branch-swapping is a widely used technique for improving the trees produced by the Wagner method. Branch-swapping takes a tree and evaluates the parsimony of each of a series of branch rearrangements (discarding, adding, or replacing the new tree if it is, respectively, worse than, equal to, or better than previously found trees). The number of rearrangements needed to complete swapping depends strongly on the number of taxa. The most widely used branch-swapping algorithm is "tree bisection reconnection" or TBR ([5]; called "branch-breaking" in Hennig86 [1]). In TBR, the tree is clipped in two, and the two subtrees are rejoined in each possible way. The number of rearrangements to complete TBR increases with the cube of the number of taxa, and thus completing TBR on a tree with twice the taxa takes much more than twice as long. Thus, if a tree of 10 taxa requires x rearrangements for complete swapping, a tree of 20 taxa will require 8x, one of 40 taxa will require 50x, and one of 80 taxa will require 400x. Because of special short-cuts, which allow deducing tree length for rearrangements without unnecessary calculations (see [6, 7] for basic descriptions, and [8] for a description of techniques for multi-character optimization), the rearrangements for larger trees can in many cases be evaluated more quickly than the rearrangements for smaller trees. Therefore, the time for swapping increases in those cases with less than the cube of the number of taxa (although it is still more than the square). In implementations which do not (or cannot) use some of these short-cuts, the time to complete TBR may well increase with the cube of the number of taxa (the use of some of the techniques described here, like sectorial searches and tree-fusing, would be even more beneficial under those circumstances).

For even relatively small data sets (i.e., 30 or 40 taxa), TBR may be unable, given some starting trees, to find the most parsimonious trees. In computer science, this is known as the problem of local optima (known in systematics as the problem of "islands" of trees; [9]). This is easily visualized by thinking of the parsimony of the trees as a "landscape" with peaks and valleys. The goal of the analysis is to get to the highest possible peak; this is done by taking a series of "steps" in several possible directions, going back if the step took us to a lower elevation, continuing from the new point if the step took us higher. Note that if the "steps" with which the swapping algorithm "walks" in this landscape are too short, it may easily get trapped on an isolated peak of non-maximal height. To reach higher peaks, the algorithm would have to descend and then go up again - but the algorithm does not do so, by virtue of its own design.
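
To make the addition-order dependence concrete, here is a minimal Python sketch of Wagner-style stepwise addition under Fitch parsimony, using nested tuples as trees and a small hypothetical character matrix. It is only an illustration of the idea, not the implementation used in the programs cited above; the rooted representation and the toy data are assumptions made for the sketch.

    import random

    def fitch(tree, matrix):
        """Return (state_sets, length) of a (sub)tree under Fitch parsimony."""
        if isinstance(tree, str):                      # leaf: taxon name
            return [{s} for s in matrix[tree]], 0
        left, right = tree
        lsets, llen = fitch(left, matrix)
        rsets, rlen = fitch(right, matrix)
        length, sets = llen + rlen, []
        for a, b in zip(lsets, rsets):
            if a & b:
                sets.append(a & b)
            else:
                sets.append(a | b)
                length += 1                            # one extra step for this character
        return sets, length

    def tree_length(tree, matrix):
        return fitch(tree, matrix)[1]

    def attachments(tree, leaf):
        """Yield every tree obtained by attaching `leaf` on some branch of `tree`
        (rooted nested-tuple representation; a few placements are duplicated,
        which is harmless when taking the minimum)."""
        yield (tree, leaf)
        if not isinstance(tree, str):
            left, right = tree
            for t in attachments(left, leaf):
                yield (t, right)
            for t in attachments(right, leaf):
                yield (left, t)

    def wagner_tree(matrix, order):
        """Stepwise addition: add taxa in `order`, each at its best available branch."""
        tree = (order[0], order[1])
        for taxon in order[2:]:
            tree = min(attachments(tree, taxon),
                       key=lambda t: tree_length(t, matrix))
        return tree

    # Hypothetical toy matrix (taxon -> character states); not data from the chapter.
    matrix = {
        "A": "00000", "B": "00011", "C": "01101",
        "D": "11100", "E": "11010", "F": "10111",
    }
    random.seed(1)
    for _ in range(3):                                 # three random addition sequences
        order = random.sample(sorted(matrix), len(matrix))
        t = wagner_tree(matrix, order)
        print(order, "length:", tree_length(t, matrix), t)

Running the loop shows that the tree obtained (and sometimes its length) can change with the addition order, which is why multiple starting points are useful in the first place.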

The two traditional strategies around the problem of local optima for the TBR algorithm are the use of multiple starting points for TBR and the retention of suboptimal trees during swapping. The first is more efficient and is thus the only one that will be considered here. The multiple starting points for TBR are best obtained by building Wagner trees with different addition sequences. Typically, the addition sequence is randomized to obtain many different Wagner trees to be later input to TBR - this has been termed a "random addition sequence" or RAS. The expectation is that some of the initial trees will eventually be near or on the slopes of the highest peaks. For data sets of 50 to 150 taxa, this method generally works well, although it may require large numbers of RAS+TBR replications. The strategy of RAS+TBR, however, is very inefficient for data sets of much larger size. It might appear that larger data sets simply require a larger number of replications, but the number of RAS+TBR replications needed to actually find optimal trees for data sets with 500 or more taxa seems to increase exponentially.

3 Composite Optima: Why Do Traditional Techniques Fail?

Traditional techniques fail because very large trees can exhibit what Goloboff [10] termed composite optima. The TBR algorithm can get stuck in local optima even for data sets of moderate size. But a tree with (say) 500 taxa has many regions or sectors that can be seen as sub-problems of 50 taxa. Each of these sub-problems might have its own "local" and "global" optima. Whether a given sector is in a globally optimal configuration will be, to some extent, independent of whether other sectors in the tree are in their optimal configurations. For a tree to be optimal, all sectors in the tree have to be in a globally optimal configuration, but the chances of achieving this result in a given RAS+TBR may be extremely low. If five sectors of the tree are in an optimal configuration, simply starting a new RAS+TBR will possibly place other sectors of the tree in optimal configurations, but it is unlikely also to place those same five sectors in optimal configurations again.

Consider the following analogy: you have six dice, and the goal is to achieve the highest sum of values by throwing them. You can either take the six dice and throw all of them at once, in which case the probability of getting the highest value on a given throw is (1/6)^6, or about 2 in 100,000. Or, you can use a divisive strategy: throw all the dice together only once, and then take each of the six dice and, in turn, throw it 50 times, keeping the highest value in each case. In the first case, you may well not find the highest possible value in 100,000 throws. With the divisive strategy of the second case, you would be almost guaranteed to find the highest possible value with a total of 301 throws.
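
The dice analogy can be checked directly by simulation. The short Python script below is an illustrative toy (not part of the chapter) that compares the two strategies using the same total of 301 throws per trial.

    import random

    def all_at_once(throws):
        """Did any of `throws` joint throws of six dice reach the maximum sum of 36?"""
        return any(sum(random.randint(1, 6) for _ in range(6)) == 36
                   for _ in range(throws))

    def divisive(throws_per_die=50):
        """One joint throw, then improve each die independently, keeping its best value
        (1 + 6*50 = 301 throws in total)."""
        dice = [random.randint(1, 6) for _ in range(6)]
        for i in range(6):
            for _ in range(throws_per_die):
                dice[i] = max(dice[i], random.randint(1, 6))
        return sum(dice) == 36

    random.seed(0)
    trials = 2000
    print("all-at-once, 301 throws:", sum(all_at_once(301) for _ in range(trials)) / trials)
    print("divisive,    301 throws:", sum(divisive() for _ in range(trials)) / trials)

With 301 throws per trial, the all-at-once strategy almost never reaches the maximum, while the divisive strategy almost always does.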

In the real world, parsimony problems do not have sectors as clearly identified as the dice, and the resolution of different sectors is often not really independent. This simply makes the problem more difficult. It is then easy to understand why finding a shortest tree using RAS+TBR may become so difficult for large real data sets. Consider a tree of 500 taxa; such a tree could have 10 different sectors, each of which can have its own local optima. If a given RAS+TBR has a chance of 0.5 of finding a globally optimal configuration for a given sector, then the chance that a given RAS+TBR finds a most parsimonious tree is (0.5)^10, or less than 1 in 1,000. Thus, as the number of taxa grows, not only does the number of rearrangements necessary to complete TBR swapping increase steeply, but so does the number of RAS+TBR replications that have to be done in order to find optimal trees.

4 Techniques for Analyzing Large Data Sets

The best way to analyze data sets with composite optima is by means of a strategy analogous to the divisive strategy described above for the dice. Simply re-starting a new replication every time a replication of RAS+TBR gets stuck will not do the job in a reasonable time. Four basic methods have been proposed to cope with the problem of local optima. The first one to be developed was the parsimony ratchet ([11], originally presented at a symposium in 1998; see [12]). Subsequently developed methods are sectorial searches, tree-fusing, and tree-drifting [10]. The expected difference in performance between the traditional and these new techniques is about as large as the difference between the two strategies for throwing the dice.

4.1 Ratchet

The ratchet is based on slightly perturbing the data once TBR gets stuck, repeating a TBR search for the perturbed data using the same tree as the starting point, and then using the resulting tree as the starting point for a new search under the original data. The perturbation is normally done either by increasing the weights of a proportion (10 to 15%) of the characters, or by eliminating some characters, as in jackknifing (but with lower probabilities of deletion). The TBR searches for both the perturbed and the original data must be made saving only one (or very few) trees; the effectiveness of the ratchet is not significantly increased by saving more trees, but run times are (see [11] for details). The ratchet works because the perturbation phase makes partial changes to the tree without changing its entire structure. The changes are made, at each round, to only part of the tree, improving, it is hoped, the tree a few parts at a time. In the end, the changes made by the ratchet are determined by character conflict: a given TBR rearrangement can improve the tree for the perturbed data only if some characters actually favor the alternative groupings. Since it is character conflict in the first place that determines the existence of local optima, the ratchet addresses the problem of local optima at its very heart.
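
The control flow of the ratchet is easy to express. The sketch below is a deliberately simplified stand-in, not a parsimony implementation: a greedy bit-flip search plays the role of TBR, and weighted mismatch counts against hypothetical "characters" play the role of tree length, so only the alternation between perturbed and original weights is shown.

    import random

    def hill_climb(state, score, neighbor, steps=300):
        """Greedy local search standing in for TBR: keep only a single best state."""
        best, best_s = state, score(state)
        for _ in range(steps):
            cand = neighbor(best)
            s = score(cand)
            if s <= best_s:                            # lower "length" is better
                best, best_s = cand, s
        return best

    def ratchet(start, weights, char_len, neighbor, cycles=10, frac=0.15):
        """Ratchet control flow: alternate searches under reweighted and original
        characters, always continuing from the current tree and keeping one tree."""
        def total(w):                                  # weighted "tree length"
            return lambda s: sum(wi * f(s) for wi, f in zip(w, char_len))
        best = hill_climb(start, total(weights), neighbor)
        for _ in range(cycles):
            # perturbation phase: upweight a random ~15% of the characters
            perturbed = [w * 3 if random.random() < frac else w for w in weights]
            best = hill_climb(best, total(perturbed), neighbor)
            # search phase: same starting tree, original weights
            best = hill_climb(best, total(weights), neighbor)
        return best

    # Toy stand-in: a "tree" is a bit vector; each "character" counts mismatches
    # against its own preferred pattern, so the characters conflict with each other.
    random.seed(2)
    patterns = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    char_len = [lambda s, p=p: sum(a != b for a, b in zip(s, p)) for p in patterns]
    weights = [1] * len(char_len)

    def neighbor(s):                                   # stand-in for one rearrangement
        i = random.randrange(len(s))
        return s[:i] + [1 - s[i]] + s[i + 1:]

    best = ratchet([0] * 20, weights, char_len, neighbor)
    print("final length:", sum(f(best) for f in char_len))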

The ratchet is very effective for finding shortest trees. In the case of the 500-taxon data set of Chase et al. [13], the ratchet can find a shortest tree in about 2 hours (on a 266 MHz Pentium II machine). Using only multiple RAS+TBR, it takes from 48 to 72 hours to find minimum length for that data set.

4.2 Sectorial searches

A sectorial search chooses a sector of the tree of a size that can be properly handled by the TBR algorithm, creates a reduced data set for that part of the tree, and analyzes that sector with some number of RAS+TBR (without saving multiple trees). The best tree found for the sector then replaces that sector in the entire tree. The process is repeated several times, choosing different sectors. The sectors can be chosen at random, or based on a consensus previously calculated by some means. Details are given in Goloboff [10]. Sectorial searches find short trees much more effectively than TBR alone; in the case of Chase et al.'s data set, finding trees below a given step threshold using TBR alone would require over 10 times more replications than when using sectorial searches, and this would take about 7 times longer. Sectorial searches alone rarely find an optimal tree for large data sets. Used alone, they are less effective than the ratchet, normally going down to some non-minimal length (much lower than TBR alone) and then getting stuck. Sectorial searches, however, analyze many reduced data sets, which take almost no time at all. They thus have the advantage of getting down to a non-minimal length faster than the ratchet, and they are therefore useful as initial stages of the search, in combination with other methods.

4.3 Tree-fusing

Tree-fusing takes two trees and evaluates all possible exchanges of sub-trees with identical taxon composition. The sub-tree exchanges that improve the tree are then actually made. See Goloboff [10] for details. Tree-fusing is best done by successively fusing pairs of trees and thus needs several trees as input to produce results; getting those trees will require several replications of RAS+TBR, possibly followed by some other method (like a sectorial search, ratchet, or tree-drifting). Once several close-to-optimal trees have been obtained, tree-fusing produces dramatic improvements in almost no time. It is easy to see why: each of the sectors will be in an optimal configuration in at least some of the trees, and tree-fusing simply merges together those optimal sectors to achieve a globally optimal tree. In this sense, tree-fusing makes it possible to make good use of trees which are not globally optimal, as long as they have at least some sectors in optimal configuration.
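
The first step of tree-fusing - identifying subtrees of two trees that have identical taxon composition but different resolution - can be sketched as follows. The nested-tuple representation and the example topologies are hypothetical; evaluating the exchanges by tree length and accepting the improving ones is not shown.

    def subtrees(tree):
        """Map each subtree's leaf set (a frozenset) to the subtree itself."""
        out = {}
        def walk(t):
            if isinstance(t, str):
                leaves = frozenset([t])
            else:
                leaves = frozenset.union(*map(walk, t))
            out[leaves] = t
            return leaves
        walk(tree)
        return out

    def exchange_candidates(tree_a, tree_b):
        """Pairs of subtrees with identical taxon composition but different shape:
        the exchanges a tree-fusing step would evaluate."""
        sa, sb = subtrees(tree_a), subtrees(tree_b)
        shared = set(sa) & set(sb)
        full = max(shared, key=len)                    # the complete taxon set itself
        # Note: tuple comparison is order-sensitive, a simplification of this toy.
        return [(sa[k], sb[k]) for k in shared
                if 1 < len(k) < len(full) and sa[k] != sb[k]]

    # Hypothetical example trees (nested tuples); not data from the chapter.
    t1 = ((("A", "B"), ("C", "D")), (("E", "F"), "G"))
    t2 = ((("A", "C"), ("B", "D")), (("E", "G"), "F"))
    for a, b in exchange_candidates(t1, t2):
        print(a, "<->", b)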

4.4 Tree-drifting

Tree-drifting is based on an idea quite similar to that of the ratchet. It consists of rounds of TBR that alternately accept only optimal trees, and suboptimal as well as optimal trees. The suboptimal trees are accepted, during the drift phase, with a probability that depends on how suboptimal they are. One of the key components of the method is the function for determining the probability of acceptance, which is based on both the absolute step difference and a measure of character conflict (the relative fit difference, which is the ratio of steps gained and saved over all characters between the two trees; see [14]). Trees as good as or better than the one being swapped are always accepted. Once a given number of rearrangements has been accepted, a round of TBR accepting only optimal trees is made, and the process is repeated (as in the ratchet) a certain number of times. Tree-drifting is about as effective as the ratchet at finding shortest trees, although in current implementations tree-drifting seems to find minimum length about two to three times faster than the ratchet itself. This difference is probably a consequence of the fact that the ratchet analyzes the perturbed data set until completion of TBR, while the equivalent phase in tree-drifting only accepts a fixed number of rearrangements. Since there is no point in having the ratchet find the actually optimal trees for the perturbed data, the ratchet could easily be modified so that the perturbed phase finishes as soon as a certain number of rearrangements has been accepted. Most likely this would make the ratchet about as fast as tree-drifting.
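
A sketch of the drift-phase acceptance rule is given below. The exponential form and the constants are hypothetical stand-ins chosen only to show the shape of the rule (worse trees accepted with a probability that falls with both the step difference and the relative fit difference); the actual function used by tree-drifting is the one given in Goloboff [10].

    import math
    import random

    def drift_accept(delta_length, rfd, temperature=2.0):
        """Decide whether to accept a rearrangement during a drift phase.

        delta_length: extra steps of the candidate tree (<= 0 means equal or better).
        rfd: relative fit difference between the two trees, assumed scaled to [0, 1].
        Equal or better trees are always accepted; worse trees are accepted with a
        probability that decreases with both quantities.
        """
        if delta_length <= 0:
            return True
        return random.random() < math.exp(-(delta_length + 3.0 * rfd) / temperature)

    # Example: a tree one step longer with little character conflict is accepted
    # fairly often; a much longer, strongly contradicted tree almost never is.
    random.seed(0)
    print(sum(drift_accept(1, 0.1) for _ in range(1000)) / 1000)
    print(sum(drift_accept(5, 0.9) for _ in range(1000)) / 1000)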

4.5 Combined methods

The methods described above can be combined. The best results have been obtained when RAS+TBR is first followed by sectorial searches, then some drifting or ratcheting, and the resulting trees are fused. Repeating this procedure will sometimes find minimum length much more quickly than at other times. If the procedure uses (say) ten initial replications, on occasion the first four or five replications will find a shortest tree, the rest of the time effectively being wasted - at least as far as hitting minimum length is concerned. On other occasions, the ten replications will not be enough to find minimum length, but then there is no point in starting from scratch with another ten replications: just adding a few more, and tree-fusing those new replications with the previous ten, may do the job. The most efficient results, unsurprisingly, are therefore obtained when the methods described above are combined, and the parameters for the entire search are supervised and changed at run time, as in the driver sketched below. At each point, the number of initial replications is changed according to how many replications had to be used in previous hits to minimum length; if fewer replications were needed, the number is decreased, and vice versa. Goloboff [10] suggested that it would also be beneficial to change the number of sectorial searches, and the number of drift cycles, to be done within each replication (although this has not actually been implemented so far).

The process just described also makes it likely, in the end, that the best results obtained correspond to the actual minimum length. Each hit to minimum length will use as many initial replications as necessary to reproduce the previously found best length; if the length used so far as a bound is in fact not optimal, shorter trees will eventually be found. After every few hits to minimum length, the results from all previous replications can be submitted to tree-fusing. If the trees from several independent hits to some length do not produce shorter trees when subjected to fusing, it is likely that that length indeed represents the minimum possible (and thus tree-fusing provides an additional criterion, beyond mere convergence, to determine whether the actual minimum length has been found in a particular case). Alternatively, the search parameters can be made very aggressive (i.e., many replications, with lots of drifting and fusing, etc.) at first, to make sure that one has the actual minimum length, and subsequently they can be switched to a more effort-saving strategy when it comes to determining the consensus tree for the data set being analyzed.
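
The following sketch shows one way such a self-adjusting driver could be organized. The helpers new_replication and fuse are assumed to exist (they stand in for a RAS+TBR/sectorial/drift round and for tree-fusing, respectively); the toy demo at the bottom replaces them with random stand-ins just to make the control flow runnable. None of this reproduces TNT's actual parameter-adjustment rules.

    import random

    def driver(new_replication, fuse, reps=5, target_hits=10):
        """Repeat rounds of independent replications, fuse each round's trees, and
        adapt the number of replications to how easily minimum length is being hit."""
        best_len, best_trees, hits = float("inf"), [], 0
        while hits < target_hits:
            pool = [new_replication() for _ in range(reps)]    # independent starts
            tree, length = fuse(pool)                          # fuse this round's trees
            if length < best_len:        # shorter than the old bound: reset the count
                best_len, best_trees, hits = length, [tree], 1
                reps += 1                # the old bound was beatable: search harder
            elif length == best_len:     # another independent hit to the best length
                best_trees.append(tree)
                hits += 1
                reps = max(2, reps - 1)  # hits are coming easily: spend less per round
            else:
                reps += 1                # missed the bound: use more replications
        return best_len, best_trees

    # Toy demo with random stand-ins (no real trees): lengths near a "true" minimum.
    random.seed(3)
    def new_replication():
        return ("tree", 100 + random.choice([0, 0, 1, 2]))
    def fuse(pool):
        return min(pool, key=lambda tl: tl[1])
    print(driver(new_replication, fuse))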

4.6 Minimum length: multiple trees or multiple hits?

The approach to parsimony analysis for many years has been to try to actually find each and every possible most parsimonious tree for the data. Finding all possible most parsimonious trees for large data sets can be a difficult task (since there can be millions of them). More importantly, for the purpose of taxonomic studies there is absolutely no point in doing so. Since the trees found are to be used to create a (strict) consensus tree, it is much less wasteful to simply gather the minimum number of trees necessary to produce the same consensus that would be produced by all possible most parsimonious trees. In this sense, it is more fruitful to find additional trees of minimum length by producing new, independent hits to minimum length than to find trees from the same hit by doing TBR saving multiple trees. Doing TBR saving multiple trees will produce, by necessity, trees which are in the same local optimum or island, differing by few rearrangements, while the trees from new hits to minimum length could potentially be more different - possibly belonging to different islands. The consensus from a few trees from independent hits to minimum length is likely to be the same as the consensus from every possible most parsimonious tree, especially when the trees are collapsed more stringently. The trees can be collapsed by applying the TBR algorithm, not to find multiple trees, but rather to collapse all the nodes between source and destination node when a rearrangement produces a tree of the same length as the tree being swapped. This produces the same results as saving large numbers of trees would, but more quickly and using less RAM. This is one of the main ideas in Farris et al.'s [15] paper, further explored in Goloboff and Farris [14], and the current implementation of the methods described here exploits it.

As minimum length is successively hit, the consensus for the results obtained so far can be calculated. The consensus will become less and less resolved with additional hits to minimum length, up to a point where it becomes stable. Once additional hits to minimum length do not further de-resolve the consensus, the search can be stopped, and the consensus is likely to correspond to the consensus that would be obtained if each and every most parsimonious tree were used. If the user wants more confidence that the actual consensus has been obtained, once the consensus becomes stable it is possible to restart calculating a consensus from the new (subsequent) hits to minimum length, until it becomes stable again; the grand consensus of both consensuses is less likely to contain spurious groups (i.e., actually unsupported groups, present in some most parsimonious trees but not in all of them). For Chase et al.'s data set, when the consensus is calculated every three hits to minimum length, until stability is achieved twice, the analysis takes (on a 266 MHz Pentium II) an average time of only 4 hours (minimum length being hit 20 to 40 times). The exact consensus is obtained 80% of the time, and the 20% of cases where the consensus is not exact exhibit only one or two spurious nodes. The consensus could be made more reliable by re-calculating it until stability is reached more times, and by re-calculating it less frequently (e.g., every five hits to minimum length instead of three). This is in stark contrast with a search like Rice et al.'s [16] analysis, based on the trees (found in 3.5 months of analysis) from a single replication, which produced 46 spurious nodes.
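
The stopping rule based on consensus stability can be sketched as follows, with each tree represented simply by its set of clades. The helper next_hit() is an assumed callable that produces one more independent hit to minimum length, and the "island" data in the demo are hypothetical; the grand-consensus refinement described above is not shown.

    import random

    def strict_consensus(trees):
        """Strict consensus, with each tree represented by its set of clades."""
        return set.intersection(*[set(t) for t in trees])

    def stable_consensus(next_hit, check_every=3, stabilities=2):
        """Stop once adding `check_every` more independent hits leaves the strict
        consensus unchanged `stabilities` times in a row."""
        trees = [next_hit() for _ in range(check_every)]
        consensus, stable = strict_consensus(trees), 0
        while stable < stabilities:
            trees += [next_hit() for _ in range(check_every)]
            new = strict_consensus(trees)
            stable = stable + 1 if new == consensus else 0
            consensus = new
        return consensus

    # Toy demo: minimum-length "trees" drawn from three islands (hypothetical clades).
    random.seed(4)
    islands = [
        {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
        {frozenset("AB"), frozenset("ABD"), frozenset("DE")},
        {frozenset("AB"), frozenset("ABC"), frozenset("CE")},
    ]
    print(stable_consensus(lambda: random.choice(islands)))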

5 TNT: Implementation of the New Methods

The techniques described here have been implemented in "Tree analysis using New Technology" (TNT), a new program by P. Goloboff, J. Farris, and K. Nixon [17]. The program is still a prototype, but demonstration versions are available. The program has a full Windows interface (although command-driven versions for other operating systems are anticipated). The input format is as for Hennig86 and NONA (see Siddall, this volume). The program allows the user to change the parameters of the search, either by hand or by letting the program try to identify the best parameters for a given size of data set and degree of exhaustiveness.

In general, a few recommendations can be made. Data sets with fewer than 100 taxa will be difficult to analyze only when extremely incongruent. In those cases, the methods of tree-fusing and sectorial searches perform more poorly (these methods assume that some isolated sectors in the tree can indeed be identified, but this is unlikely to be the case for such data sets). Therefore, smaller data sets are best analyzed by means of extensive ratcheting and/or tree-drifting, reducing tree-fusing and sectorial searches to a minimum. Larger data sets can be analyzed with essentially only sectorial searches plus tree-fusing if they are rather clean. However, as data sets become more difficult, it is necessary to increase not only the number of initial replications, but also the exhaustiveness of each replication. This is best done by selecting (at some point in each of the initial replications) sectors of larger size and analyzing them with tree-drifting instead of simply RAS+TBR (this is the "DSS" option in the sectorial-search dialogue of TNT). Larger sectors are more likely to identify areas of conflict, and it is less likely that better solutions will be missed because they would require that some taxon be moved outside the sector being analyzed. After a certain number of sectors have been analyzed with tree-drifting, several cycles of global tree-drifting further improve the trees, before submitting them to tree-fusing.

The tree-drifting can be done faster if some nodes are constrained during the search (the constraint is created from a consensus of the previous tree and the tree resulting from the perturbed round of TBR; see [10]). This might conceivably decrease the effectiveness of the drift, but that can be countered by doing an unconstrained cycle of drift with some periodicity, and since it means more cycles of drift per unit time, in the end it means an increase in effectiveness. The "hard cycles" option in the "Drift" dialogue box of TNT sets the number of hard (constrained) drift cycles to do before an unconstrained cycle is done. If large numbers of drift cycles are to be done, it is advisable to set the hard cycles so that a large portion of the drift cycles are constrained (e.g., eight or nine out of ten). For difficult data sets, making the searches more exhaustive will take more time per replication, but in the end will mean that minimum length can be found much more quickly.

The number of hits to re-check for consensus stability, and the number of times the consensus should reach stability, are changed from the main dialogue box of the "New Technology Search". As discussed above, these determine the reliability of the consensus tree obtained, with larger numbers meaning more reliable results. If the user so prefers, he may simply decide to hit minimum length a certain number of times and then let the program stop.

6 Remarks and Conclusions

New methods for analysis of large data sets perform at speeds that were unimaginable only a few years ago. Parsimony problems of a few hundred taxa had been considered "intractable" by many authors, but they can now be easily analyzed. No doubt the enormous progress made in the last few years in this area has been facilitated by the fact that people have recently started publishing and openly discussing new algorithms and ideas. Although at present it is difficult to predict whether the currently used methods will be further improved, the possibility certainly exists: the field of computational cladistics is still an area of active discussion and ferment.

Acknowledgments

The author wishes to thank Martin Ramirez and Gonzalo Giribet for comments and help during the preparation of the manuscript. Part of the research was carried out with the deeply appreciated support from PICT (Agencia Nacional de Promoción Científica y Tecnológica) and from PEI 0324/97 (CONICET).

References

[1] Farris JS (1988) Hennig86, v. 1.5, program and documentation. Port Jefferson, NY
[2] Swofford DL (1993) PAUP: Phylogenetic analysis using parsimony, program and documentation. Illinois
[3] Goloboff PA (1994b) NONA, program and documentation. Available at ftp.unt.edu.ar/pub/parsimony
[4] Wheeler WC (1996) Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12: 1-9
[5] Swofford D, Olsen G (1990) Phylogeny reconstruction. In: D Hillis and C Moritz (eds): Molecular Systematics
[6] Goloboff PA (1994a) Character optimization and calculation of tree lengths. Cladistics 9
[7] Goloboff PA (1996) Methods for faster parsimony analysis. Cladistics 12
[8] Moilanen A (1999) Searching for most parsimonious trees with simulated evolutionary optimization. Cladistics 15
[9] Maddison D (1991) The discovery and importance of multiple islands of most parsimonious trees. Syst Zool 40
[10] Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428
[11] Nixon KC (1999) The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15
[12] Horovitz I (1999) A report on "One Day Symposium on Numerical Cladistics". Cladistics 15
[13] Chase MW, Soltis DE, Olmstead RG, Morgan D et al. (1993) Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Ann Mo Bot Gard 80
[14] Goloboff PA, Farris JS (2001) Methods for quick consensus estimation. Cladistics 17: S26-S34
[15] Farris JS, Albert VA, Källersjö M, Lipscomb D et al. (1996) Parsimony jackknifing outperforms neighbor-joining. Cladistics 12
[16] Rice KA, Donoghue MJ, Olmstead RG (1997) Analyzing large data sets: rbcL 500 revisited. Syst Biol 46
[17] Goloboff PA, Farris JS, Nixon KC (1999) T.N.T.: Tree analysis using New Technology. Program and documentation


More information

Anomaly Detection in Predictive Maintenance

Anomaly Detection in Predictive Maintenance Anomaly Detection in Predictive Maintenance Anomaly Detection with Time Series Analysis Phil Winters Iris Adae Rosaria Silipo Phil.Winters@knime.com Iris.Adae@uni-konstanz.de Rosaria.Silipo@knime.com Copyright

More information

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL What mathematical optimization can, and cannot, do for biologists Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL Introduction There is no shortage of literature about the

More information

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,

More information

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions

More information

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

More information

8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION

8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION - 1-8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION 8.1 Introduction 8.1.1 Summary introduction The first part of this section gives a brief overview of some of the different uses of expert systems

More information

Snapshots in the Data Warehouse BY W. H. Inmon

Snapshots in the Data Warehouse BY W. H. Inmon Snapshots in the Data Warehouse BY W. H. Inmon There are three types of modes that a data warehouse is loaded in: loads from archival data loads of data from existing systems loads of data into the warehouse

More information

In the IEEE Standard Glossary of Software Engineering Terminology the Software Life Cycle is:

In the IEEE Standard Glossary of Software Engineering Terminology the Software Life Cycle is: In the IEEE Standard Glossary of Software Engineering Terminology the Software Life Cycle is: The period of time that starts when a software product is conceived and ends when the product is no longer

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results Evaluation of a New Method for Measuring the Internet Distribution: Simulation Results Christophe Crespelle and Fabien Tarissan LIP6 CNRS and Université Pierre et Marie Curie Paris 6 4 avenue du président

More information