Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations


1 Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations. AlCoB 2014, First International Conference on Algorithms for Computational Biology. Thiago da Silva Arruda, Ulisses Dias and Zanoni Dias, Institute of Computing, University of Campinas, Campinas, SP, Brazil. (Thiago da Silva Arruda, Ulisses Dias and Zanoni Dias, AlCoB 2014, July 1-3, 2014)
2 Outline: Genome Rearrangement Field; Length-Weighted Inversions; Greedy Randomized Search Procedure; Experimental Results; Conclusions.
3 Part I: Genome Rearrangement Field
4 Genome Rearrangement Field: Genome Rearrangements. Genes are shared by genomes from different species. Given two contemporary genomes, their gene order and orientation may differ. (Figure: gene orders of Yersinia pestis Nepal and Yersinia pestis Angola.)
5 Genome Rearrangement Field: Genome Rearrangements. Genomes undergo large-scale mutations during evolution, which move DNA sequences from one place to another. An inversion occurs when a chromosome breaks at two locations, called breakpoints, and the DNA between the breakpoints is reversed. (Figure: an inversion relating Pseudomonas putida KT and Pseudomonas putida F1.) The inversion $\rho(i,j)$ reverses the segment from position $i$ to position $j$; on signed permutations each element of the segment also changes sign: $(0 \dots i{-}1 \;\; i \dots j \;\; j{+}1 \dots n) \cdot \rho(i,j) = (0 \dots i{-}1 \;\; {-j} \dots {-i} \;\; j{+}1 \dots n)$.
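As an illustration only, the inversion $\rho(i,j)$ can be sketched in Python, assuming a signed permutation is stored as a tuple of nonzero integers and that $i$ and $j$ are 1-based, inclusive positions. The function name `apply_inversion` is hypothetical, not taken from the authors' code.

```python
def apply_inversion(perm, i, j):
    """Apply the signed inversion rho(i, j) to perm (1-based, inclusive).

    The segment perm[i..j] is reversed and each element in it changes sign;
    elements outside the segment are left untouched.
    """
    n = len(perm)
    if not 1 <= i <= j <= n:
        raise ValueError(f"need 1 <= i <= j <= {n}, got i={i}, j={j}")
    lo, hi = i - 1, j  # 0-based, half-open slice bounds
    flipped = tuple(-x for x in reversed(perm[lo:hi]))
    return tuple(perm[:lo]) + flipped + tuple(perm[hi:])
```

For example, `apply_inversion((1, 2, 3, 4, 5), 2, 4)` yields `(1, -4, -3, -2, 5)`, and applying the same inversion a second time restores the original tuple.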
6 Genome Rearrangement Field: Evolutionary Distance. The genetic data available for many organisms allows accurate evolutionary inference. The traditional approach compares nucleotide (or amino acid) sequences to find an edit distance. What if mutational events (such as inversions) affect very large stretches of DNA? Whole-genome distance measure: compute the minimum number of large-scale events needed to transform one genome into the other (parsimony criterion).
7 Genome Rearrangement Field: Evolutionary Distance. When only inversions are considered, computing a parsimonious scenario is a polynomial problem (Hannenhalli and Pevzner, 1998), and GRIMM is the most widely used tool for this job. However, several studies have shown that inversions are shorter than expected under a neutral model, while the Sorting by Inversions Problem does not take the length of the reversed sequence into account. The sequence of operations that most likely happened during evolution may not involve moving many long sequences.
8 Part II: Length-Weighted Inversions
9 Length-Weighted Inversions: Genome Representation. We regard genomes as signed permutations $\pi = (\pi_1\ \pi_2\ \dots\ \pi_n)$, where $\pi_i \in \mathbb{Z}$, $1 \le |\pi_i| \le n$, and $i \ne j \Rightarrow |\pi_i| \ne |\pi_j|$. The sign of each element encodes the orientation of the gene. (Figure: genes in the same orientation and in different orientations.)
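A small Python check of this representation, as a sketch: it assumes tuples of integers and reads a positive sign as the reference orientation and a negative sign as the opposite one. The helper names `is_signed_permutation` and `orientations` are hypothetical.

```python
def is_signed_permutation(perm):
    """Check that perm is a signed permutation of {1, ..., n}.

    Every entry must be an integer with 1 <= |perm[i]| <= n, and no absolute
    value may repeat (i != j implies |perm[i]| != |perm[j]|).
    """
    n = len(perm)
    magnitudes = [abs(x) for x in perm]
    return (
        all(isinstance(x, int) for x in perm)
        and all(1 <= m <= n for m in magnitudes)
        and len(set(magnitudes)) == n
    )


def orientations(perm):
    """Label each gene by its sign (assumed: positive = same orientation)."""
    return ["same" if x > 0 else "different" for x in perm]
```

For instance, `(3, -1, 2)` passes the check, while `(3, -3, 2)` fails because the absolute value 3 appears twice.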
10 Length-Weighted Inversions: Sorting by Inversions Problem. Given the gene orders of two contemporary genomes, the task of finding the minimum number of inversions that transform one genome into the other is called the Sorting by Inversions Problem.
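To make the problem statement concrete, a brute-force breadth-first search can compute the inversion distance from a tiny signed permutation to the identity. This is neither the polynomial Hannenhalli-Pevzner algorithm nor GRIMM: it tries every inversion at every step and grows exponentially with $n$, so treat it as a hypothetical sanity check for very small inputs only.

```python
from collections import deque


def apply_inversion(perm, i, j):
    # Reverse perm[i..j] (1-based, inclusive) and flip the signs inside it.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def inversion_distance(perm):
    """Minimum number of signed inversions that turn perm into (1 2 ... n).

    Plain breadth-first search over signed permutations; exponential in n.
    """
    start = tuple(perm)
    n = len(start)
    target = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_inversion(current, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    raise ValueError("input is not a signed permutation")
```

For example, `inversion_distance((2, 1))` returns 3: the two genes must swap places and both must end up with a positive sign.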
11 Length-Weighted Inversions: Sorting by Length-Weighted Inversions Problem. Each inversion has a cost that depends on its length, and the goal is to minimize the total cost. In the example, five inversions with costs 2, 5, 4, 1 and 2 are applied, for a total cost of 14.
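With the weight function used in this work (the number of elements in the reversed segment, as stated in the conclusions), the cost of $\rho(i,j)$ is $j - i + 1$, and the cost of a scenario is the sum over its inversions, as in the slide's $2 + 5 + 4 + 1 + 2 = 14$. A hedged Python sketch; the names `inversion_cost` and `sort_cost` are hypothetical.

```python
def inversion_cost(i, j):
    """Length-weighted cost of rho(i, j): the number of elements reversed."""
    return j - i + 1


def apply_inversion(perm, i, j):
    # Reverse perm[i..j] (1-based, inclusive) and flip the signs inside it.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def sort_cost(perm, inversions):
    """Apply the (i, j) pairs in order; return (final permutation, total cost)."""
    perm = tuple(perm)
    total = 0
    for i, j in inversions:
        perm = apply_inversion(perm, i, j)
        total += inversion_cost(i, j)
    return perm, total
```

For example, `sort_cost((2, 1), [(1, 2), (1, 1), (2, 2)])` returns `((1, 2), 4)`: one inversion of length 2 followed by two of length 1.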
12 Part III: Greedy Randomized Search Procedure
13 Greedy Randomized Search Procedure: Building Blocks. Initial Solution; Neighborhood; Local Search.
14 Greedy Randomized Search Procedure, Building Blocks: Initial Solution. A solution is a sequence of permutations $s = \langle s_0, s_1, \dots, s_m \rangle$ such that $s_k$ differs from $s_{k-1}$ by one inversion for $1 \le k \le m$, with $s_0 = \pi$ and $s_m = \iota$, the identity permutation. For example, a solution may consist of five permutations, $s = \langle s_0, s_1, s_2, s_3, s_4 \rangle$. We use an optimal solution for the Sorting by Inversions Problem as the initial solution.
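Given only the permutations of a solution, each step's inversion can be recovered: in a signed permutation every element inside $\rho(i,j)$ changes, either in position or in sign, so the positions that differ between $s_{k-1}$ and $s_k$ are exactly $i..j$. A sketch built on that observation, with hypothetical names, that validates a solution and computes its length-weighted cost:

```python
def find_inversion(a, b):
    """Return (i, j), 1-based, if b == a . rho(i, j); otherwise None."""
    a, b = tuple(a), tuple(b)
    if len(a) != len(b):
        return None
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    if not diff:
        return None  # identical permutations are not one inversion apart
    lo, hi = diff[0], diff[-1]
    expected = tuple(-x for x in reversed(a[lo:hi + 1]))
    return (lo + 1, hi + 1) if b[lo:hi + 1] == expected else None


def is_valid_solution(s):
    """Consecutive permutations differ by one inversion and s_m is the identity."""
    if not s:
        return False
    identity = tuple(range(1, len(s[0]) + 1))
    if tuple(s[-1]) != identity:
        return False
    return all(find_inversion(s[k - 1], s[k]) for k in range(1, len(s)))


def solution_cost(s):
    """Total length-weighted cost of a valid solution."""
    total = 0
    for k in range(1, len(s)):
        i, j = find_inversion(s[k - 1], s[k])
        total += j - i + 1
    return total
```

For `s = [(2, 1), (-1, -2), (1, -2), (1, 2)]`, the recovered inversions are $\rho(1,2)$, $\rho(1,1)$ and $\rho(2,2)$, so the solution is valid and costs 4.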
15 Greedy Randomized Search Procedure, Building Blocks: Neighborhood. Let $s = \langle s_0, s_1, \dots, s_m \rangle$ be the current solution. Another solution $s' = \langle s'_0, s'_1, \dots, s'_{m'} \rangle$ is in the neighborhood of $s$, denoted $N_f(s)$, if the two differ only within a frame $\langle s_i, \dots, s_j \rangle$ that has no more than $f$ elements. (Figure: a frame with $j - i + 1 = f$.)
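A sketch of how a neighbor in $N_f(s)$ can be formed: pick a random frame of at most $f$ permutations and splice in a replacement path with the same two endpoints. The names `random_frame` and `splice` are hypothetical, and building the replacement path itself is the job of the local search described in the next slides.

```python
import random


def random_frame(s, f, rng):
    """Return (i, j) delimiting a random frame <s_i, ..., s_j>.

    The frame has j - i + 1 = min(f, len(s)) elements, so never more than f.
    """
    if f < 1:
        raise ValueError("frame size f must be at least 1")
    size = min(f, len(s))
    i = rng.randrange(len(s) - size + 1)
    return i, i + size - 1


def splice(s, i, j, new_frame):
    """Replace s[i..j] by new_frame, which must keep both endpoints."""
    if new_frame[0] != s[i] or new_frame[-1] != s[j]:
        raise ValueError("replacement frame must start at s_i and end at s_j")
    return s[:i] + list(new_frame) + s[j + 1:]
```

Because the endpoints are kept, the spliced sequence still starts at $\pi$ and ends at $\iota$; only the number of permutations inside the frame may change.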
16 Greedy Randomized Search Procedure, Building Blocks: Local Search. Our method iteratively improves the current solution. Each step makes a local change, restricted to a frame chosen at random. If a less costly sequence is found, it becomes the current solution.
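The improvement loop can be sketched as follows, assuming solutions are lists of tuples. The frame builder here, `naive_frame`, is a hypothetical stand-in that simply places the target's elements left to right; it is not the breakpoint- and benefit-driven construction of the following slides, which the sketch leaves pluggable through `build_frame`.

```python
import random


def apply_inversion(perm, i, j):
    # Reverse perm[i..j] (1-based, inclusive) and flip the signs inside it.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def solution_cost(s):
    # Each step's cost j - i + 1 equals the number of positions that differ,
    # since every element inside a signed inversion changes.
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in zip(s, s[1:]))


def naive_frame(alpha, beta):
    """Hypothetical frame builder: a path of inversions from alpha to beta."""
    path, current = [alpha], alpha
    for k, target in enumerate(beta):
        p = next(q for q in range(k, len(current)) if abs(current[q]) == abs(target))
        if p != k:
            current = apply_inversion(current, k + 1, p + 1)  # move it to slot k
            path.append(current)
        if current[k] != target:
            current = apply_inversion(current, k + 1, k + 1)  # fix its sign
            path.append(current)
    return path


def local_search(s, f, iterations, rng, build_frame=naive_frame):
    """Rebuild a random frame of at most f permutations per iteration and
    keep the result only if the whole solution becomes strictly cheaper."""
    s = list(s)
    for _ in range(iterations):
        size = min(f, len(s))
        i = rng.randrange(len(s) - size + 1)
        j = i + size - 1
        candidate = s[:i] + build_frame(s[i], s[j]) + s[j + 1:]
        if solution_cost(candidate) < solution_cost(s):
            s = candidate
    return s
```

Starting from the wasteful solution `[(2, 1), (2, -1), (2, 1), (-1, -2), (1, -2), (1, 2)]` (cost 6), one iteration with `f = 6` rebuilds the whole sequence and accepts the cheaper path `[(2, 1), (-1, -2), (1, -2), (1, 2)]` (cost 4).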
17 Greedy Randomized Search Procedure: Local Search, Definitions. Let $\langle s_i, \dots, s_j \rangle$ be the frame where the change will be performed, and let $\alpha = s_i$ and $\beta = s_j$.
Breakpoint: a pair of elements that are consecutive in $\alpha$ but not consecutive in $\beta$.
Entropy: how far each element in $\alpha$ is from its position in $\beta$, $\mathrm{ent}(\alpha,\beta) = \sum_{i=1}^{n} |p(\alpha,i) - p(\beta,i)|$, where $p(\alpha,i)$ is the position of element $i$ in $\alpha$.
Benefit: for any inversion $\rho$ applied to a permutation $\pi$, the benefit $\delta$ is given by $\delta(\alpha,\beta,\rho) = \mathrm{ent}(\pi,\beta) - \mathrm{ent}(\pi \cdot \rho,\beta) - \mathrm{cost}(\rho)$.
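The three definitions can be sketched in Python under several assumptions of our own: positions are 1-based and looked up by absolute value, so entropy ignores signs; a pair (a, b) of α counts as consecutive in β if β contains a, b or −b, −a side by side, the usual convention for signed permutations; and the benefit is read as ent(π, β) − ent(π·ρ, β) − cost(ρ) with cost(ρ) = j − i + 1. The operators of the benefit formula were lost in the slide transcript, so that last reading is an assumption too.

```python
def apply_inversion(perm, i, j):
    # Reverse perm[i..j] (1-based, inclusive) and flip the signs inside it.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def positions(perm):
    # p(perm, e): 1-based position of element e, looked up by absolute value.
    return {abs(x): k for k, x in enumerate(perm, start=1)}


def breakpoints(alpha, beta):
    """Number of pairs consecutive in alpha but not consecutive in beta."""
    adjacent = set()
    for a, b in zip(beta, beta[1:]):
        adjacent.add((a, b))
        adjacent.add((-b, -a))  # the same adjacency read on the other strand
    return sum((a, b) not in adjacent for a, b in zip(alpha, alpha[1:]))


def entropy(alpha, beta):
    """ent(alpha, beta) = sum over elements e of |p(alpha, e) - p(beta, e)|."""
    pa, pb = positions(alpha), positions(beta)
    return sum(abs(pa[e] - pb[e]) for e in pa)


def benefit(perm, beta, i, j):
    """Assumed reading: ent(perm, beta) - ent(perm . rho(i, j), beta) - (j - i + 1)."""
    after = apply_inversion(perm, i, j)
    return entropy(perm, beta) - entropy(after, beta) - (j - i + 1)
```

For α = (3, 1, 2) and β = (1, 2, 3), `entropy` gives 2 + 1 + 1 = 4, and `benefit(alpha, beta, 1, 3)` evaluates to 4 − 2 − 3 = −1.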
18 Greedy Randomized Search Procedure: Local Search. We construct a new frame that starts with α and ends with β. At each step we select inversions that decrease the number of breakpoints; when no such inversion exists, we execute one step of the algorithm proposed by Bergeron (2006) for the Sorting by Inversions Problem. (Figure: building a frame from α to β; candidate inversions Inv 1, Inv 2, …, Inv 5, Inv 6, Inv 7, … are ranked by benefit, and the five best scored permutations are kept.)
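The candidate-selection step can be sketched by trying every inversion and keeping those that lower the breakpoint count relative to β. Bergeron's algorithm is not reproduced here: the `fallback` argument is a hypothetical placeholder for "one step of Bergeron's algorithm", and the breakpoint count uses the same assumed signed-adjacency convention as a plain pairwise comparison without framing elements.

```python
def apply_inversion(perm, i, j):
    # Reverse perm[i..j] (1-based, inclusive) and flip the signs inside it.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def breakpoints(alpha, beta):
    # Pairs consecutive in alpha that are not consecutive (a, b or -b, -a) in beta.
    adjacent = set()
    for a, b in zip(beta, beta[1:]):
        adjacent.update({(a, b), (-b, -a)})
    return sum((a, b) not in adjacent for a, b in zip(alpha, alpha[1:]))


def breakpoint_reducing_inversions(perm, beta):
    """All rho(i, j), 1-based, after which perm has fewer breakpoints w.r.t. beta."""
    n = len(perm)
    base = breakpoints(perm, beta)
    return [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i, n + 1)
        if breakpoints(apply_inversion(perm, i, j), beta) < base
    ]


def candidate_inversions(perm, beta, fallback):
    """Breakpoint-reducing inversions, or the single move proposed by fallback."""
    found = breakpoint_reducing_inversions(perm, beta)
    return found if found else [fallback(perm, beta)]
```

For (2, 1) against β = (1, 2), no single inversion lowers the breakpoint count under this definition, which is exactly the situation where the slides fall back on Bergeron's algorithm.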
19 Greedy Randomized Search Procedure: Local Search. We rank the selected inversions by their benefit, and the five highest-ranked inversions qualify for the second phase. We then choose one of them using a random process called the roulette-wheel selection mechanism, in which each inversion has a selection likelihood proportional to the square of its benefit.
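The two-phase choice can be sketched with Python's `random.Random.choices`, which draws with probability proportional to the given weights. The input format (a list of `(inversion, benefit)` pairs) and the uniform fallback when every finalist has zero benefit are our assumptions; `roulette_select` is a hypothetical name.

```python
import random


def roulette_select(scored, rng, top=5):
    """Pick one inversion from (inversion, benefit) pairs.

    Phase 1: keep the `top` candidates with the highest benefit.
    Phase 2: roulette wheel, each finalist weighted by benefit ** 2.
    """
    if not scored:
        raise ValueError("no candidate inversions")
    finalists = sorted(scored, key=lambda item: item[1], reverse=True)[:top]
    weights = [benefit * benefit for _, benefit in finalists]
    if sum(weights) == 0:
        return rng.choice(finalists)[0]  # assumption: uniform if all weights are 0
    return rng.choices(finalists, weights=weights, k=1)[0][0]
```

Squaring the benefit sharpens the preference for the best-ranked inversions while still leaving every finalist with a nonzero chance, as long as its benefit is nonzero.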
20 Part IV: Experimental Results
21 Experimental Results: Parameters. Frame sizes: ⟨14, 12, 10, 8, 6, 4⟩. Iterations: 900 (150 for each frame size). Instances: we generated 1000 permutations whose sizes range from 10 to 100 in steps of 5.
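The parameters can be written down as a small configuration sketch. The slides do not say in which order the frame sizes are used or how instances were drawn, so the largest-first schedule and the uniform random generator below are assumptions, and `random_signed_permutation` is a hypothetical helper.

```python
import random

# Values taken from the slide; the order of use is an assumption.
FRAME_SIZES = (14, 12, 10, 8, 6, 4)
ITERATIONS_PER_SIZE = 150
PERMUTATION_SIZES = list(range(10, 101, 5))  # 10, 15, ..., 100


def iteration_schedule():
    """Frame size used at each iteration: 150 per size, 900 in total."""
    return [f for f in FRAME_SIZES for _ in range(ITERATIONS_PER_SIZE)]


def random_signed_permutation(n, rng):
    """Hypothetical instance generator: a uniformly random signed permutation."""
    perm = list(range(1, n + 1))
    rng.shuffle(perm)  # uniform order
    return tuple(x if rng.random() < 0.5 else -x for x in perm)  # independent signs
```

Six frame sizes at 150 iterations each give the 900 iterations reported, and the sizes 10 to 100 in steps of 5 give 19 permutation sizes.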
22 Experimental Results: Number of Improved Initial Solutions. (Figure: percentage of times our heuristic improved the initial solution, by permutation size; y-axis from 0% to 100%.)
23 Experimental Results: Average Improvement. (Figure: percentage of improvement on the initial solution, by permutation size; y-axis from 0% to 15%.)
24 Experimental Results: Average Cost. (Figure: comparative analysis of the average cost obtained by Swidan, GRASP and GRIMM, by permutation size.)
25 Part V: Conclusions
26 Conclusions. We presented a new method for the length-weighted inversion problem on signed permutations. We considered the case where the weight function is simply the number of elements in the reversed segment. Our method improved the initial solution in 94% of the cases, and our solutions cost 12% less than the initial solution, on average. We also showed that our method provides solutions that are less costly than those of a previous algorithm.
27 Acknowledgments