Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations
AlCoB 2014: First International Conference on Algorithms for Computational Biology, July 1-3, 2014
Thiago da Silva Arruda, Institute of Computing, University of Campinas, Campinas, SP, Brazil, thiago.arruda@students.ic.unicamp.br
Ulisses Dias, Institute of Computing, University of Campinas, Campinas, SP, Brazil, udias@ic.unicamp.br
Zanoni Dias, Institute of Computing, University of Campinas, Campinas, SP, Brazil, zanoni@ic.unicamp.br

Outline: Genome Rearrangement Field; Length-Weighted Inversions; Greedy Randomized Search Procedure; Experimental Results; Conclusions

Part I Genome Rearrangement Field

Genome Rearrangements
Genes are shared by genomes from different species.
Given two contemporary genomes, gene order and orientation may differ.
[Figure: genome comparison plot of Yersinia pestis Nepal516 against Yersinia pestis Angola; both axes run from 0 to 4500000.]

Genome Rearrangements
Genomes undergo large-scale mutations during the evolutionary process, which move DNA sequences from one place to another.
An inversion occurs when a chromosome breaks at two locations, called breakpoints, and the DNA between the breakpoints is reversed.
[Figure: genome comparison plot of Pseudomonas putida KT2440 (axis from 0 to 6000000) against Pseudomonas putida F1 (axis from 0 to 5000000).]
Inversion $\rho(i,j)$: $(0\ \ldots\ i-1\ \ i\ \ldots\ j\ \ j+1\ \ldots\ n) \rightarrow (0\ \ldots\ i-1\ \ -j\ \ldots\ -i\ \ j+1\ \ldots\ n)$

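The inversion operation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `apply_inversion`, the list representation, and the 1-based inclusive indices are assumptions made here, not something the slides prescribe.

```python
def apply_inversion(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive)
    reversed in order and flipped in sign."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


# Inverting positions 3..4 reverses the segment and negates each element.
print(apply_inversion([-5, 2, -3, 1, 4], 3, 4))  # [-5, 2, -1, 3, 4]
```

Returning a new list instead of mutating the input keeps earlier permutations of a scenario intact, which is convenient when a whole sequence of permutations is stored.
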
Evolutionary Distance
The genetic data available for many organisms allows accurate evolutionary inference.
The traditional approach compares nucleotide (or amino acid) sequences to find the edit distance.
What if mutational events (like inversions) affect very large stretches of DNA sequence? Then a whole-genome distance measure is used.
Compute the minimum number of large-scale events needed to transform one genome into the other (parsimony criterion).

Evolutionary Distance
Consider the computation of parsimonious scenarios when only inversions are allowed.
The problem can be solved in polynomial time (Hannenhalli and Pevzner, 1998).
GRIMM is the most widely used tool for this job.
Several studies have shown that inversions are shorter than expected under a neutral model.
The Sorting by Inversions Problem does not take into account the length of the reversed sequence.
The sequence of operations that most likely happened during evolution may not involve the movement of many long sequences.

Part II Length-Weighted Inversions

Genome Representation
We regard genomes as permutations: $\pi = (\pi_1\ \pi_2\ \ldots\ \pi_n)$, for $\pi_i \in \mathbb{Z}$, $1 \le |\pi_i| \le n$, and $i \ne j \Rightarrow |\pi_i| \ne |\pi_j|$.
[Figure panels: Same Orientation, Different Orientation.]

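As a small illustration of this representation, the check below tests whether a list of integers is a signed permutation. It assumes the conditions are read on absolute values (each of 1..n appears exactly once, with either sign); the function name `is_signed_permutation` is made up for this sketch.

```python
def is_signed_permutation(perm):
    """True if the absolute values of perm are exactly 1..n, each once."""
    n = len(perm)
    return sorted(abs(x) for x in perm) == list(range(1, n + 1))


print(is_signed_permutation([-5, 2, -3, 1, 4]))  # True
print(is_signed_permutation([1, -1, 3]))         # False
```
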
Sorting by Inversions Problem
Given the gene order of two contemporary genomes, the task of finding the minimum number of inversions that transform one genome into the other is called the Sorting by Inversions Problem.
(-5 +2 -3 +1 +4)
(-5 +2 -1 +3 +4)
(-4 -3 +1 -2 +5)
(+2 -1 +3 +4 +5)
(-2 -1 +3 +4 +5)
(+1 +2 +3 +4 +5)

Sorting by Length-Weighted Inversions Problem
(-5 +2 -3 +1 +4)
(-5 +2 -1 +3 +4)   cost = 2
(-4 -3 +1 -2 +5)   cost = 5
(+2 -1 +3 +4 +5)   cost = 4
(-2 -1 +3 +4 +5)   cost = 1
(+1 +2 +3 +4 +5)   cost = 2
Total = 14

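The costs on this slide can be rechecked with a short script. It reads the example as six permutations, each obtained from the previous one by a single inversion, and takes the cost of an inversion to be the number of elements it reverses. The helper names `find_inversion` and `inversion_costs` are illustrative, not taken from the talk.

```python
def find_inversion(a, b):
    """Return (i, j), 1-based, if b equals a with positions i..j inverted;
    return None if no single inversion turns a into b."""
    if len(a) != len(b):
        return None
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    if not diff:
        return None
    lo, hi = diff[0], diff[-1]
    flipped = a[:lo] + [-x for x in reversed(a[lo:hi + 1])] + a[hi + 1:]
    return (lo + 1, hi + 1) if flipped == b else None


def inversion_costs(scenario):
    """Length of the inversion between each pair of consecutive permutations."""
    costs = []
    for a, b in zip(scenario, scenario[1:]):
        found = find_inversion(a, b)
        if found is None:
            raise ValueError(f"{a} -> {b} is not a single inversion")
        i, j = found
        costs.append(j - i + 1)
    return costs


SCENARIO = [
    [-5, 2, -3, 1, 4],
    [-5, 2, -1, 3, 4],
    [-4, -3, 1, -2, 5],
    [2, -1, 3, 4, 5],
    [-2, -1, 3, 4, 5],
    [1, 2, 3, 4, 5],
]

costs = inversion_costs(SCENARIO)
print(costs)       # [2, 5, 4, 1, 2]
print(sum(costs))  # 14
```

Because a signed inversion changes every position it touches, the first and last differing positions identify the inverted segment uniquely, so one comparison per step is enough.
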
Part III Greedy Randomized Search Procedure

Building Blocks: Initial Solution, Neighborhood, Local Search

Building Blocks: Initial Solution
A solution is a sequence of permutations $s = \langle s_0, s_1, \ldots, s_m \rangle$ such that $s_k$ differs from $s_{k-1}$ by one inversion, $1 \le k \le m$, $s_0 = \pi$ and $s_m = \iota$.
$s = \langle (-5\ +2\ -3\ +1\ +4), (-5\ +2\ -1\ +3\ +4), (-4\ -3\ +1\ -2\ +5), (+2\ -1\ +3\ +4\ +5), (-2\ -1\ +3\ +4\ +5), (+1\ +2\ +3\ +4\ +5) \rangle$.
We use an optimal solution for the Sorting by Inversions Problem as the initial solution.

Building Blocks: Neighborhood
Let $s = \langle s_0, s_1, \ldots, s_m \rangle$ be the current solution.
Another solution $s' = \langle s'_0, s'_1, \ldots, s'_m \rangle$ is in the neighborhood of $s$, namely $N_f(s)$, if they differ by a frame that has no more than $f$ elements.
[Figure: a frame from $s_i$ to $s_j$, with $j - i + 1 = f$.]

Building Blocks: Local Search
Our method iteratively improves the current solution.
Each step makes a local change.
The local change is restricted to a frame chosen at random.
If a less costly sequence is found, it becomes the current solution.

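The loop described on this slide might look roughly like the sketch below. It is a skeleton under stated assumptions, not the authors' implementation: `rebuild_frame` is a hypothetical callback that receives the two endpoint permutations of a randomly chosen frame and returns a replacement sequence with the same endpoints (or `None` to skip), and `direct_step` is only a toy stand-in for it.

```python
import random


def scenario_cost(scenario):
    """Sum of inversion lengths, assuming consecutive permutations differ
    by exactly one inversion. An inversion changes every position it
    touches, so its length is the number of differing positions."""
    return sum(
        sum(x != y for x, y in zip(a, b))
        for a, b in zip(scenario, scenario[1:])
    )


def local_search(solution, frame_size, iterations, rebuild_frame, rng):
    """Replace random frames and keep a change only if the cost drops.
    Assumes a non-empty solution."""
    current = [list(p) for p in solution]
    for _ in range(iterations):
        f = min(frame_size, len(current))
        i = rng.randrange(len(current) - f + 1)  # frame start, 0-based
        j = i + f - 1                            # frame end, inclusive
        new_frame = rebuild_frame(current[i], current[j], rng)
        if new_frame is None:
            continue
        candidate = current[:i] + new_frame + current[j + 1:]
        if scenario_cost(candidate) < scenario_cost(current):
            current = candidate
    return current


def direct_step(alpha, beta, rng):
    """Toy stand-in: go from alpha to beta directly if one inversion suffices."""
    diff = [k for k in range(len(alpha)) if alpha[k] != beta[k]]
    if not diff:
        return [alpha]
    lo, hi = diff[0], diff[-1]
    flipped = alpha[:lo] + [-x for x in reversed(alpha[lo:hi + 1])] + alpha[hi + 1:]
    return [alpha, beta] if flipped == beta else None


# A wasteful scenario of cost 4 that a frame covering it can shorten to cost 2.
wasteful = [[-2, -1], [2, -1], [-2, -1], [1, 2]]
improved = local_search(wasteful, 4, 1, direct_step, random.Random(0))
print(improved, scenario_cost(improved))  # [[-2, -1], [1, 2]] 2
```

Passing the random generator in explicitly makes runs repeatable, which helps when comparing frame sizes.
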
Local Search - Definitions
Let $\langle s_i, \ldots, s_j \rangle$ be the frame where the change will be performed. We call $s_i = \alpha$ and $s_j = \beta$.
Breakpoint: a pair of elements that are consecutive in $\alpha$ but not consecutive in $\beta$. Example: $\alpha = (+1\ +2\ -4\ -3\ +5)$, $\beta = (+1\ +2\ +3\ +4\ +5)$.
Entropy: how far each element in $\alpha$ is from its position in $\beta$: $\mathrm{ent}(\alpha,\beta) = \sum_{i=1}^{n} |p(\alpha,i) - p(\beta,i)|$, where $p(\alpha,i)$ is the position of element $i$ in $\alpha$.
Benefit: for any inversion $\rho$, the benefit $\delta$ is given by $\delta(\alpha,\beta,\rho) = \dfrac{\mathrm{ent}(\alpha,\beta) - \mathrm{ent}(\alpha \cdot \rho,\beta)}{\mathrm{cost}(\rho)}$.

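The three definitions can be tried out on the slide's example with the sketch below. Several readings are assumptions made here: breakpoints are counted in the usual signed way (a pair (x, y) counts as consecutive in β if β contains x y or -y -x, with 0 and n+1 added at the ends), positions ignore signs, and the benefit is taken as the entropy reduction divided by the inversion length. All function names are illustrative.

```python
def apply_inversion(perm, i, j):
    """Invert positions i..j (1-based, inclusive): reverse order, flip signs."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


def breakpoints(alpha, beta):
    """Count adjacent pairs of alpha that are not adjacent in beta."""
    n = len(alpha)
    ext_a = [0] + list(alpha) + [n + 1]
    ext_b = [0] + list(beta) + [n + 1]
    adjacent = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacent.add((x, y))
        adjacent.add((-y, -x))  # same adjacency read on the reverse strand
    return sum((x, y) not in adjacent for x, y in zip(ext_a, ext_a[1:]))


def entropy(alpha, beta):
    """Sum over elements of the distance between their positions in alpha and beta."""
    pos_in_beta = {abs(x): k for k, x in enumerate(beta)}
    return sum(abs(k - pos_in_beta[abs(x)]) for k, x in enumerate(alpha))


def benefit(alpha, beta, i, j):
    """Entropy reduction of inversion (i, j) on alpha, per unit of cost."""
    after = apply_inversion(alpha, i, j)
    return (entropy(alpha, beta) - entropy(after, beta)) / (j - i + 1)


alpha = [1, 2, -4, -3, 5]
beta = [1, 2, 3, 4, 5]
print(breakpoints(alpha, beta))    # 2
print(entropy(alpha, beta))        # 2
print(benefit(alpha, beta, 3, 4))  # 1.0
```

In the example, the two breakpoints are the pairs (+2, -4) and (-3, +5), and inverting positions 3..4 removes both while bringing the entropy to zero at a cost of 2.
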
Local Search
We start constructing a new frame that begins with $\alpha$ and ends with $\beta$.
We select inversions that decrease the number of breakpoints.
When no such inversion exists, we execute one step of the algorithm proposed by Bergeron (2006) for the Sorting by Inversions Problem.
[Figure: Building a Frame. Between $\alpha$ and $\beta$, candidate inversions (Inv 1, Inv 2, ..., Inv 7, ...) are listed by benefit, and the five best scored are marked.]

Local Search
We rank the selected inversions by benefit.
The inversions ranked fifth or higher qualify for the second phase.
We choose one inversion through a random process called roulette wheel selection.
Each inversion has a selection likelihood proportional to the square of its benefit.

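The ranking and roulette wheel step could be sketched as follows. The input format (a list of `(inversion, benefit)` pairs) and the name `select_inversion` are assumptions of this sketch. The slides do not say how non-positive benefits are treated; here they are clamped to zero weight, and a uniform choice is used if every weight is zero, which is a choice made for this example only.

```python
import random


def select_inversion(scored, rng, top_k=5):
    """Keep the top_k candidates by benefit, then pick one with
    probability proportional to the square of its benefit."""
    if not scored:
        raise ValueError("no candidate inversions")
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    # Assumption: negative benefits get zero weight instead of being squared.
    weights = [max(b, 0.0) ** 2 for _, b in ranked]
    if sum(weights) == 0:
        return rng.choice(ranked)[0]
    inversions = [inv for inv, _ in ranked]
    return rng.choices(inversions, weights=weights, k=1)[0]


rng = random.Random(42)
candidates = [((1, 2), 0.0), ((3, 4), 1.0), ((2, 5), 0.0)]
print(select_inversion(candidates, rng))  # (3, 4), the only positive weight
```

Squaring the benefit sharpens the preference for the best candidates while still leaving the others a chance, which keeps the search randomized.
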
Part IV Experimental Results

Parameters
Frame sizes: $\langle 14, 12, 10, 8, 6, 4 \rangle$
Iterations: 900 (150 for each frame size).
Instances: we generated 1000 permutations whose sizes range from 10 to 100 in intervals of 5.

Number of Improved Initial Solutions
[Figure: Percentage of Times our Heuristic Improved the Initial Solution. Horizontal axis: permutation size, 10 to 100 in steps of 5. Vertical axis: 0% to 100%. Legend: 14, 12, 10, 8, 6, 4.]

Average Improvement
[Figure: Percentage of Improvement on the Initial Solution. Horizontal axis: permutation size, 10 to 100 in steps of 5. Vertical axis: 0% to 15%. Legend: 14, 12, 10, 8, 6, 4.]

Average Cost
[Figure: Comparative Analysis: Average Cost. Horizontal axis: permutation size, 10 to 100. Vertical axis: average cost, 0 to 1800. Legend: Swidan, GRASP, GRIMM.]

Part V Conclusions

Conclusions
We presented a new method for the length-weighted inversion problem on signed permutations.
We considered the case where the weight function is simply the number of elements in the reversed segment.
We were able to improve the initial solution in 94% of the cases.
Our solutions cost 12% less than the initial solution, on average.
We also showed that our method provides solutions that are less costly than those of a previous algorithm.

Acknowledgments