Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations


1 Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations
AlCoB 2014, First International Conference on Algorithms for Computational Biology, July 1-3
Thiago da Silva Arruda, Ulisses Dias (udias@ic.unicamp.br) and Zanoni Dias
Institute of Computing, University of Campinas, Campinas, SP, Brazil

2 Outline
Genome Rearrangement Field
Length-Weighted Inversions
Greedy Randomized Search Procedure
Experimental Results
Conclusions

3 Part I: Genome Rearrangement Field

4 Genome Rearrangement Field: Genome Rearrangements
Genes are shared by genomes from different species.
Given two contemporary genomes, gene order and orientation may differ.
[Figure: comparison of the genomes of Yersinia pestis Nepal 516 and Yersinia pestis Angola]

5 Genome Rearrangement Field: Genome Rearrangements
Genomes undergo large-scale mutations during the evolutionary process, which move DNA sequences from one place to another.
An inversion occurs when a chromosome breaks at two locations, called breakpoints, and the DNA between the breakpoints is reversed.
[Figure: inversion between Pseudomonas putida KT and Pseudomonas putida F1]
Inversion ρ(i,j): (0 ... i-1 i ... j j+1 ... n) becomes (0 ... i-1 -j ... -i j+1 ... n).
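
The inversion ρ(i,j) described above can be sketched in a few lines. This is a minimal illustration, assuming a genome is stored as a Python list of signed integers and that positions are 1-based and inclusive; the function name `apply_inversion` is hypothetical and not taken from the slides.

```python
def apply_inversion(perm, i, j):
    """Apply rho(i, j): reverse positions i..j (1-based, inclusive) and flip their signs."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("expected 1 <= i <= j <= len(perm)")
    # Reversing the segment also reverses the orientation of every gene in it.
    reversed_segment = [-gene for gene in reversed(perm[i - 1:j])]
    return perm[:i - 1] + reversed_segment + perm[j:]


genome = [-5, +2, -3, +1, +4]
print(apply_inversion(genome, 3, 4))  # [-5, 2, -1, 3, 4]
```

The function returns a new list and leaves its input untouched, which makes it easy to keep every intermediate permutation of a sorting scenario.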

6 Genome Rearrangement Field: Evolutionary Distance
The genetic data available for many organisms allows accurate evolutionary inference.
The traditional approach uses nucleotide (or amino acid) comparison to find the edit distance.
What if mutational events (like inversions) affect very large stretches of DNA sequence? Then a whole-genome distance measure is needed.
Compute the minimum number of large-scale events needed to transform one genome into the other (parsimony criterion).

7 Genome Rearrangement Field: Evolutionary Distance
Computing parsimonious scenarios when only inversions are considered is a polynomial problem (Hannenhalli and Pevzner, 1998).
GRIMM is the most widely used tool for this job.
Several studies have shown that inversions are shorter than expected under a neutral model.
The Sorting by Inversions Problem does not take into account the length of the reversed sequence.
The sequence of operations that most likely happened during evolution may not involve the movement of many long sequences.

8 Part II: Length-Weighted Inversions

9 Length-Weighted Inversions: Genome Representation
We regard genomes as permutations: $\pi = (\pi_1\ \pi_2 \ldots \pi_n)$, for $\pi_i \in \mathbb{Z}$, $1 \le |\pi_i| \le n$ and $i \ne j \Rightarrow |\pi_i| \ne |\pi_j|$.
[Figure: genes with the same orientation and genes with different orientation]
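
As a small illustration of this representation, the check below tests whether a list of integers is a signed permutation, that is, whether the absolute values are exactly 1..n with no repeats. The name `is_signed_permutation` is hypothetical, and the list-of-integers encoding is an assumption made for the sketch.

```python
def is_signed_permutation(perm):
    """True if the absolute values of perm are exactly 1..n, each used once."""
    n = len(perm)
    return sorted(abs(gene) for gene in perm) == list(range(1, n + 1))


print(is_signed_permutation([-5, +2, -3, +1, +4]))  # True
print(is_signed_permutation([+1, -1, +3]))          # False
```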

10 Length-Weighted Inversions: Sorting by Inversions Problem
Given the gene order of two contemporary genomes, the task of finding the minimum number of inversions that transform one genome into the other is called the Sorting by Inversions Problem.
Example: (-5 +2 -3 +1 +4), (-5 +2 -1 +3 +4), (-4 -3 +1 -2 +5), (+2 -1 +3 +4 +5), ...

11 Length-Weighted Inversions: Sorting by Length-Weighted Inversions Problem
[Figure: a sorting scenario ending in (+1 +2 +3 +4 +5), with each inversion annotated with its cost]
cost = 2, cost = 5, cost = 4, cost = 1, cost = 2
Total = 14
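
The total above can be reproduced with a short sketch, assuming (as stated in the Conclusions) that the cost of an inversion is the number of elements in the reversed segment. The first three inversions below match the permutations listed in the Sorting by Inversions example; the positions of the last two are an assumption, chosen only so that their lengths agree with the listed costs of 1 and 2. All function names are hypothetical.

```python
def apply_inversion(perm, i, j):
    """Reverse positions i..j (1-based, inclusive) and flip their signs."""
    return perm[:i - 1] + [-gene for gene in reversed(perm[i - 1:j])] + perm[j:]


def inversion_cost(i, j):
    """Length-weighted cost: the number of elements in the reversed segment."""
    return j - i + 1


def scenario_cost(inversions):
    """Total cost of a list of (i, j) inversions."""
    return sum(inversion_cost(i, j) for i, j in inversions)


start = [-5, +2, -3, +1, +4]
# The last two (i, j) pairs are assumed, not read from the slide.
scenario = [(3, 4), (1, 5), (1, 4), (1, 1), (1, 2)]

perm = start
for i, j in scenario:
    perm = apply_inversion(perm, i, j)

print(perm)                     # [1, 2, 3, 4, 5]
print(scenario_cost(scenario))  # 14
```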

12 Part III: Greedy Randomized Search Procedure

13 Greedy Randomized Search Procedure: Building Blocks
Initial Solution
Neighborhood
Local Search

14 Greedy Randomized Search Procedure: Building Blocks, Initial Solution
A solution is a sequence of permutations $s = \langle s_0, s_1, \ldots, s_m \rangle$ such that $s_k$ differs from $s_{k-1}$ by one inversion, $1 \le k \le m$, $s_0 = \pi$ and $s_m = \iota$.
s = < (-5 +2 -3 +1 +4), (-5 +2 -1 +3 +4), (-4 -3 +1 -2 +5), (+2 -1 +3 +4 +5), (+1 +2 +3 +4 +5) >.
We use an optimal solution for the Sorting by Inversions Problem as the Initial Solution.
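
The condition that consecutive permutations differ by one inversion can be checked mechanically. The sketch below recovers the inversion between two permutations and sums the length-weighted cost along a sequence; it is illustrated on the first four permutations of the example. The names `find_inversion` and `solution_cost` are hypothetical, and the cost (segment length) follows the weight function stated in the Conclusions.

```python
def find_inversion(a, b):
    """Return (i, j), 1-based, if b equals a after a single inversion; otherwise None."""
    if len(a) != len(b) or a == b:
        return None
    differing = [k for k in range(len(a)) if a[k] != b[k]]
    lo, hi = differing[0], differing[-1]
    # In a signed permutation both ends of an inverted segment always change,
    # so the segment must span exactly the first to the last differing position.
    if b[lo:hi + 1] == [-gene for gene in reversed(a[lo:hi + 1])]:
        return lo + 1, hi + 1
    return None


def solution_cost(solution):
    """Sum of inverted-segment lengths along a sequence of permutations."""
    total = 0
    for prev, curr in zip(solution, solution[1:]):
        step = find_inversion(prev, curr)
        if step is None:
            raise ValueError("consecutive permutations must differ by one inversion")
        i, j = step
        total += j - i + 1
    return total


prefix = [
    [-5, +2, -3, +1, +4],
    [-5, +2, -1, +3, +4],
    [-4, -3, +1, -2, +5],
    [+2, -1, +3, +4, +5],
]
print([find_inversion(a, b) for a, b in zip(prefix, prefix[1:])])  # [(3, 4), (1, 5), (1, 4)]
print(solution_cost(prefix))  # 11
```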

15 Greedy Randomized Search Procedure: Building Blocks, Neighborhood
Let $s = \langle s_0, s_1, \ldots, s_m \rangle$ be the current solution.
Another solution $s' = \langle s'_0, s'_1, \ldots, s'_{m'} \rangle$ is in the neighborhood of $s$, namely $N_f(s)$, if they differ by a frame that has no more than $f$ elements.
[Figure: a frame from position i to position j of the solution, with j - i + 1 = f]

16 Greedy Randomized Search Procedure: Building Blocks, Local Search
Our method iteratively improves the current solution.
Each step requires a local change.
The local change is restricted to a given frame chosen at random.
If a less costly sequence is found, it becomes the current solution.
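
The loop described above might be organized as follows. This is only a sketch of the control flow, not the authors' implementation: `rebuild_frame` and `cost` are hypothetical callbacks supplied by the caller, where `rebuild_frame(alpha, beta)` is assumed to return a new list of permutations that starts with `alpha` and ends with `beta`.

```python
import random


def local_search(solution, frame_size, iterations, rebuild_frame, cost, rng):
    """Repeatedly rebuild a randomly placed frame and keep the result if it is cheaper."""
    if frame_size < 2:
        raise ValueError("a frame needs at least two permutations")
    current = list(solution)
    for _ in range(iterations):
        last = len(current) - 1
        if last < 1:
            break
        i = rng.randrange(last)            # frame start, chosen at random
        j = min(last, i + frame_size - 1)  # frame holds at most frame_size elements
        new_frame = rebuild_frame(current[i], current[j])
        candidate = current[:i] + new_frame + current[j + 1:]
        if cost(candidate) < cost(current):
            current = candidate            # accept only strict improvements
    return current
```

Passing the random generator explicitly (for example `random.Random(42)`) keeps runs reproducible, which helps when comparing frame sizes.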

17 Greedy Randomized Search Procedure: Local Search, Definitions
Let $\langle s_i, \ldots, s_j \rangle$ be the frame where the change will be performed. We will call $s_i = \alpha$ and $s_j = \beta$.
Breakpoint: a pair of elements that are consecutive in α but not consecutive in β.
α = (+1 +2 -4 -3 +5), β = (+1 +2 +3 +4 +5)
Entropy: how far each element in α is from its position in β, where $p(\alpha, i)$ is the position of element $i$ in α:
$\mathrm{ent}(\alpha,\beta) = \sum_{i=1}^{n} |p(\alpha,i) - p(\beta,i)|$
Benefit: for any inversion ρ, the benefit δ is given by:
$\delta(\alpha,\beta,\rho) = \dfrac{\mathrm{ent}(\pi,\beta) - \mathrm{ent}(\pi \cdot \rho,\beta)}{\mathrm{cost}(\rho)}$
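
The three definitions can be tried out on the α and β of the example. The sketch below rests on several assumptions that are not spelled out on the slide: positions are 1-based and ignore sign, a pair (x, y) counts as consecutive in β if either (x, y) or its reversed, sign-flipped form (-y, -x) appears in β, the permutations are not extended with boundary elements, and the benefit is the entropy reduction divided by the length of the inverted segment. All function names are hypothetical.

```python
def apply_inversion(perm, i, j):
    """Reverse positions i..j (1-based, inclusive) and flip their signs."""
    return perm[:i - 1] + [-gene for gene in reversed(perm[i - 1:j])] + perm[j:]


def position(perm, element):
    """1-based position of `element` in perm, ignoring orientation."""
    return [abs(gene) for gene in perm].index(element) + 1


def entropy(alpha, beta):
    """Sum over all elements of the distance between their positions in alpha and beta."""
    n = len(alpha)
    return sum(abs(position(alpha, e) - position(beta, e)) for e in range(1, n + 1))


def breakpoints(alpha, beta):
    """Pairs that are consecutive in alpha but not consecutive in beta (signed convention)."""
    adjacent = set()
    for x, y in zip(beta, beta[1:]):
        adjacent.add((x, y))
        adjacent.add((-y, -x))  # the same adjacency read on the opposite strand
    return [(x, y) for x, y in zip(alpha, alpha[1:]) if (x, y) not in adjacent]


def benefit(pi, beta, i, j):
    """Entropy reduction achieved by rho(i, j), per unit of cost (segment length)."""
    after = apply_inversion(pi, i, j)
    return (entropy(pi, beta) - entropy(after, beta)) / (j - i + 1)


alpha = [+1, +2, -4, -3, +5]
beta = [+1, +2, +3, +4, +5]
print(breakpoints(alpha, beta))    # [(2, -4), (-3, 5)]
print(entropy(alpha, beta))        # 2
print(benefit(alpha, beta, 3, 4))  # 1.0
```

Here the inversion ρ(3,4) removes both breakpoints and brings the entropy to zero at a cost of 2, so its benefit is 1.0.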

18 Greedy Randomized Search Procedure: Local Search
We start the construction of a new frame that starts with α and ends with β.
We select inversions that decrease the number of breakpoints.
When no inversion of this kind exists, we execute one step of the algorithm proposed by Bergeron, 2006 for the Sorting by Inversions Problem.
[Figure: Building a Frame from α to β; candidate inversions Inv 1, Inv 2, ... ranked by benefit, with the five best scored marked]

19 Greedy Randomized Search Procedure: Local Search
We rank the selected inversions based on their benefit.
The five highest-ranked inversions qualify for the second phase.
We choose one inversion using a random process called the roulette wheel selection mechanism.
Each inversion has a selection likelihood proportional to the square of its benefit.
[Figure: Building a Frame from α to β; candidate inversions Inv 1, Inv 2, ... ranked by benefit, with the five best scored marked]
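
The two-phase choice described above (keep the five best, then draw one with probability proportional to squared benefit) can be sketched with the standard library. The function name `choose_inversion` and the `(inversion, benefit)` pair layout are hypothetical. The sketch assumes benefits are non-negative, since squaring would otherwise give a negative benefit a positive weight; the fallback to a uniform draw when all weights are zero is also an assumption.

```python
import random


def choose_inversion(candidates, rng, top_k=5):
    """Pick one inversion from the top_k by benefit, weighted by benefit squared.

    candidates: list of (inversion, benefit) pairs with non-negative benefits.
    """
    if not candidates:
        raise ValueError("no candidate inversions")
    # Phase 1: rank by benefit and keep the top_k.
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_k]
    # Phase 2: roulette wheel, with weight equal to the squared benefit.
    weights = [b * b for _, b in ranked]
    if sum(weights) == 0:
        return rng.choice(ranked)[0]  # assumed fallback: uniform draw
    return rng.choices(ranked, weights=weights, k=1)[0][0]


rng = random.Random(2014)
candidates = [((3, 4), 1.0), ((1, 5), 0.4), ((2, 3), 0.5), ((1, 2), 0.0)]
print(choose_inversion(candidates, rng))
```

Squaring the benefit sharpens the preference for the best candidates while still leaving room for the others to be picked occasionally.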

20 Part IV: Experimental Results

21 Experimental Results: Parameters
Frame Size: < 14, 12, 10, 8, 6, 4 >
Iterations: 900 (150 for each frame size).
Instances: we generated 1000 permutations whose sizes range from 10 to 100 in intervals of 5.
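
These parameters translate into a simple schedule. The sketch below assumes that the frame sizes are applied in the order listed, with 150 iterations each, and that instances are uniformly random signed permutations; neither detail is stated on the slide. The function names are hypothetical.

```python
import random

FRAME_SIZES = [14, 12, 10, 8, 6, 4]
ITERATIONS_PER_FRAME_SIZE = 150
PERMUTATION_SIZES = list(range(10, 101, 5))  # 10, 15, ..., 100


def frame_schedule():
    """Yield the frame size to use at each of the 900 iterations."""
    for frame_size in FRAME_SIZES:
        for _ in range(ITERATIONS_PER_FRAME_SIZE):
            yield frame_size


def random_signed_permutation(n, rng):
    """Shuffle 1..n and give each element a random orientation (assumed uniform)."""
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    return [gene if rng.random() < 0.5 else -gene for gene in perm]


print(sum(1 for _ in frame_schedule()))  # 900
print(len(PERMUTATION_SIZES))            # 19
```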

22 Experimental Results: Number of Improved Initial Solutions
[Figure: Percentage of Times our Heuristic Improved the Initial Solution, by permutation size (vertical axis from 0% to 100%)]

23 Experimental Results: Average Improvement
[Figure: Percentage of Improvement on the Initial Solution, by permutation size (vertical axis from 0% to 15%)]

24 Experimental Results: Average Cost
[Figure: Comparative Analysis: Average Cost, by permutation size, for Swidan, GRASP and GRIMM]

25 Part V: Conclusions

26 Conclusions
We presented a new method for the length-weighted inversion problem on signed permutations.
We considered the case where the weight function is simply the number of elements in the reversed segment.
We were able to improve the initial solution in 94% of the cases.
Our solutions cost 12% less than the initial solution, on average.
We also showed that our method provides solutions that are less costly than those of a previous algorithm.

27 Acknowledgments
