Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations




AlCoB 2014, First International Conference on Algorithms for Computational Biology, July 1-3, 2014

Thiago da Silva Arruda, Institute of Computing, University of Campinas, Campinas, SP, Brazil (thiago.arruda@students.ic.unicamp.br)
Ulisses Dias, Institute of Computing, University of Campinas, Campinas, SP, Brazil (udias@ic.unicamp.br)
Zanoni Dias, Institute of Computing, University of Campinas, Campinas, SP, Brazil (zanoni@ic.unicamp.br)

Outline

I. Genome Rearrangement Field
II. Length-Weighted Inversions
III. Greedy Randomized Search Procedure
IV. Experimental Results
V. Conclusions

Part I: Genome Rearrangement Field

Genome Rearrangements

Genes are shared by the genomes of different species. Given two contemporary genomes, the order and orientation of their genes may differ.

[Figure: comparison of gene positions in Yersinia pestis Nepal516 and Yersinia pestis Angola.]

Genome Rearrangements

During evolution, genomes undergo large-scale mutations that move DNA sequences from one place to another. An inversion occurs when a chromosome breaks at two locations, called breakpoints, and the DNA between the breakpoints is reversed.

[Figure: comparison of gene positions in Pseudomonas putida KT2440 and Pseudomonas putida F1.]

An inversion $\rho(i,j)$ acts as follows:

$$(0 \ldots i{-}1\;\; i \ldots j\;\; j{+}1 \ldots n) \;\xrightarrow{\;\rho(i,j)\;}\; (0 \ldots i{-}1\;\; {-j} \ldots {-i}\;\; j{+}1 \ldots n)$$

that is, the segment between positions $i$ and $j$ is reversed and the sign of each of its elements is flipped.
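A minimal Python sketch of $\rho(i,j)$, assuming 1-indexed, inclusive positions and signed permutations stored as tuples of integers. The name `apply_inversion` is our own illustration, not taken from the slides.

```python
def apply_inversion(perm, i, j):
    """Return perm . rho(i, j): reverse positions i..j and flip their signs.

    Positions are 1-indexed and inclusive, matching rho(i, j) above.
    """
    if not 1 <= i <= j <= len(perm):
        raise ValueError("expected 1 <= i <= j <= n")
    p = list(perm)
    # Positions i..j correspond to the 0-indexed, half-open slice p[i-1:j].
    p[i - 1:j] = [-x for x in reversed(p[i - 1:j])]
    return tuple(p)


print(apply_inversion((-5, 2, -3, 1, 4), 3, 4))  # (-5, 2, -1, 3, 4)
```

A single-element inversion $\rho(i,i)$ only flips the orientation of one gene, which is why signed inversions can change orientation without moving anything.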

Evolutionary Distance

The genetic data available for many organisms allow accurate evolutionary inference. The traditional approach compares nucleotide (or amino acid) sequences to compute an edit distance. But what if mutational events, such as inversions, affect very large stretches of DNA? Then we need a whole-genome distance measure: the minimum number of large-scale events needed to transform one genome into the other (parsimony criterion).

Evolutionary Distance

When only inversions are considered, computing a parsimonious scenario is a polynomial-time problem (Hannenhalli and Pevzner, 1998). GRIMM is the most widely used tool for this task. However, several studies have shown that inversions are shorter than expected under a neutral model, and the Sorting by Inversions Problem does not take into account the length of the reversed sequence. The sequence of operations that most likely happened during evolution may therefore not involve moving many long sequences.

Part II: Length-Weighted Inversions

Genome Representation

We regard genomes as signed permutations $\pi = (\pi_1\ \pi_2 \ldots \pi_n)$, where $\pi_i \in \mathbb{Z}$, $1 \le |\pi_i| \le n$, and $i \neq j \Rightarrow |\pi_i| \neq |\pi_j|$. The sign of each element encodes the orientation of the corresponding gene.

[Figure: genes with the same orientation vs. genes with different orientations.]
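The definition above can be checked mechanically. This is a small sketch; the helper names are hypothetical, and `orientations` assumes the usual convention that a positive sign means the forward orientation.

```python
def is_signed_permutation(perm):
    """True if perm satisfies the definition above: integer entries with
    1 <= |pi_i| <= n and no absolute value repeated."""
    n = len(perm)
    magnitudes = [abs(x) for x in perm]
    return (
        all(isinstance(x, int) for x in perm)
        and all(1 <= m <= n for m in magnitudes)
        and len(set(magnitudes)) == n
    )


def orientations(perm):
    """'+' for genes in the forward orientation, '-' for reversed ones (assumed convention)."""
    return "".join("+" if x > 0 else "-" for x in perm)


print(is_signed_permutation((-5, 2, -3, 1, 4)))  # True
print(orientations((-5, 2, -3, 1, 4)))           # -+-++
```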

Sorting by Inversions Problem

Given the gene orders of two contemporary genomes, the task of finding the minimum number of inversions that transform one genome into the other is called the Sorting by Inversions Problem. Example:

(-5 +2 -3 +1 +4)
→ (-5 +2 -1 +3 +4)
→ (-4 -3 +1 -2 +5)
→ (+2 -1 +3 +4 +5)
→ (-2 -1 +3 +4 +5)
→ (+1 +2 +3 +4 +5)
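For very small permutations the minimum can be found by brute force. The sketch below runs a breadth-first search over all $2^n \cdot n!$ signed permutations; it only illustrates the problem and is not the polynomial-time algorithm of Hannenhalli and Pevzner or the one in GRIMM. The function names are ours.

```python
from collections import deque


def apply_inversion(perm, i, j):
    # Reverse positions i..j (1-indexed, inclusive) and flip their signs.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def inversion_distance(perm):
    """Minimum number of inversions that sort perm into the identity.

    Exhaustive breadth-first search: exponential, so only for small n.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        n = len(current)
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = apply_inversion(current, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    raise ValueError("input is not a signed permutation")


print(inversion_distance((-5, 2, -3, 1, 4)))  # 5
```

For the example above the search returns 5, so the five-inversion scenario shown is a minimum one.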

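Sorting by Length-Weighted Inversions Problem

Here each inversion costs the number of elements in the reversed segment, and the total cost of a scenario is the sum of the costs of its inversions. For the same example:

(-5 +2 -3 +1 +4) → (-5 +2 -1 +3 +4): cost = 2
(-5 +2 -1 +3 +4) → (-4 -3 +1 -2 +5): cost = 5
(-4 -3 +1 -2 +5) → (+2 -1 +3 +4 +5): cost = 4
(+2 -1 +3 +4 +5) → (-2 -1 +3 +4 +5): cost = 1
(-2 -1 +3 +4 +5) → (+1 +2 +3 +4 +5): cost = 2
Total = 14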
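The total can be reproduced with a short sketch. It assumes the weight function used in this work (the number of reversed elements, $j - i + 1$); the $(i, j)$ pairs below are our reading of the permutations listed above, and `length_weighted_cost` is a hypothetical helper name.

```python
def apply_inversion(perm, i, j):
    # Reverse positions i..j (1-indexed, inclusive) and flip their signs.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def length_weighted_cost(perm, inversions):
    """Apply the inversions (i, j) in order and return the summed cost j - i + 1.

    Raises ValueError if the scenario does not end at the identity.
    """
    current = tuple(perm)
    total = 0
    for i, j in inversions:
        current = apply_inversion(current, i, j)
        total += j - i + 1
    if current != tuple(range(1, len(current) + 1)):
        raise ValueError("the inversions do not sort the permutation")
    return total


scenario = [(3, 4), (1, 5), (1, 4), (1, 1), (1, 2)]
print(length_weighted_cost((-5, 2, -3, 1, 4), scenario))  # 14
```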

Part III: Greedy Randomized Search Procedure

Building Blocks

The procedure has three building blocks: the initial solution, the neighborhood, and the local search.

Building Blocks: Initial Solution

A solution is a sequence of permutations $s = \langle s_0, s_1, \ldots, s_m \rangle$ such that $s_k$ differs from $s_{k-1}$ by exactly one inversion for $1 \le k \le m$, with $s_0 = \pi$ and $s_m = \iota$ (the identity permutation). For example:

s = ⟨(-5 +2 -3 +1 +4), (-5 +2 -1 +3 +4), (-4 -3 +1 -2 +5), (+2 -1 +3 +4 +5), (-2 -1 +3 +4 +5), (+1 +2 +3 +4 +5)⟩

As the initial solution, we use an optimal solution for the Sorting by Inversions Problem.
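A solution in this representation can be validated by recovering the inversion between each pair of consecutive permutations. This is a sketch under our own naming (`inversion_between`, `inversions_of_solution` are hypothetical); it relies on the fact that, in a signed permutation, every position inside an inversion changes value.

```python
def inversion_between(a, b):
    """Return (i, j) if b = a . rho(i, j) for one inversion, otherwise None."""
    if len(a) != len(b):
        return None
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    if not diff:
        return None
    lo, hi = diff[0], diff[-1]
    # Every position inside an inversion changes, so lo..hi must be the reversed, negated segment.
    if list(b[lo:hi + 1]) == [-x for x in reversed(a[lo:hi + 1])]:
        return lo + 1, hi + 1  # back to 1-indexed positions
    return None


def inversions_of_solution(solution):
    """Validate s = <s_0, ..., s_m> (s_m must be the identity) and return
    the inversion (i, j) applied at each step."""
    if not solution:
        raise ValueError("empty solution")
    steps = []
    for prev, nxt in zip(solution, solution[1:]):
        step = inversion_between(prev, nxt)
        if step is None:
            raise ValueError(f"{prev} -> {nxt} is not a single inversion")
        steps.append(step)
    if tuple(solution[-1]) != tuple(range(1, len(solution[-1]) + 1)):
        raise ValueError("the last permutation is not the identity")
    return steps
```

Applied to the example solution, it returns `[(3, 4), (1, 5), (1, 4), (1, 1), (1, 2)]`.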

Building Blocks: Neighborhood

Let $s = \langle s_0, s_1, \ldots, s_m \rangle$ be the current solution. Another solution $s' = \langle s'_0, s'_1, \ldots, s'_{m'} \rangle$ is in the neighborhood of $s$, denoted $N_f(s)$, if the two solutions differ only inside a frame with no more than $f$ elements, where the frame $\langle s_i, \ldots, s_j \rangle$ has $j - i + 1$ elements.
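A sketch of how a neighbor could be produced, assuming solutions are Python lists of permutations and indices are 0-based like $s_0, \ldots, s_m$. The helpers `random_frame` and `replace_frame` are hypothetical; the replacement frame must keep the frame's endpoints, and it may contain a different number of permutations.

```python
import random


def random_frame(solution, f, rng=random):
    """Pick a random frame <s_i, ..., s_j> with j - i + 1 = f elements,
    shrunk to the whole solution if the solution is shorter."""
    size = min(f, len(solution))
    i = rng.randrange(len(solution) - size + 1)
    return i, i + size - 1


def replace_frame(solution, i, j, new_frame):
    """Return the neighbor obtained by swapping s[i..j] for new_frame,
    which must start at s_i and end at s_j."""
    if new_frame[0] != solution[i] or new_frame[-1] != solution[j]:
        raise ValueError("the new frame must keep the endpoints s_i and s_j")
    return solution[:i] + list(new_frame) + solution[j + 1:]
```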

Building Blocks: Local Search

Our method iteratively improves the current solution. Each step makes a local change restricted to a randomly chosen frame. If a less costly sequence is found, it becomes the current solution.
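The accept-if-cheaper loop can be sketched independently of how a frame is rebuilt. In this sketch `cost(solution)` and `rebuild(alpha, beta, rng)` are hypothetical caller-supplied hooks, not names from the slides; the frame construction the authors use is described in the following slides.

```python
import random


def local_search(solution, cost, rebuild, f, iterations, rng=random):
    """Repeatedly pick a random frame of at most f permutations, rebuild a
    path between its endpoints, and keep the result only if it is cheaper."""
    best = list(solution)
    best_cost = cost(best)
    for _ in range(iterations):
        size = min(f, len(best))
        i = rng.randrange(len(best) - size + 1)
        j = i + size - 1
        # The rebuilt frame must start at best[i] and end at best[j].
        new_frame = rebuild(best[i], best[j], rng)
        candidate = best[:i] + list(new_frame) + best[j + 1:]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

With a toy cost such as `len` and a `rebuild` that returns only the two endpoints, the loop shrinks any list down to its first and last elements, which is a quick way to check the bookkeeping.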

Local Search: Definitions

Let $\langle s_i, \ldots, s_j \rangle$ be the frame where the change will be performed, and write $\alpha = s_i$ and $\beta = s_j$.

Breakpoint: a pair of elements that are consecutive in $\alpha$ but not consecutive in $\beta$. Example: α = (+1 +2 -4 -3 +5) and β = (+1 +2 +3 +4 +5).

Entropy: how far each element of $\alpha$ is from its position in $\beta$, where $p(\alpha, i)$ denotes the position of element $i$ in $\alpha$:

$$\mathrm{ent}(\alpha,\beta) = \sum_{i=1}^{n} \left| p(\alpha,i) - p(\beta,i) \right|$$

Benefit: for any inversion $\rho$, the benefit $\delta$ is given by

$$\delta(\alpha,\beta,\rho) = \mathrm{ent}(\pi,\beta) - \mathrm{ent}(\pi \cdot \rho,\beta) - \mathrm{cost}(\rho)$$
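A sketch of the three definitions. Several points are our assumptions and are marked in the code: a pair $(a, b)$ counts as consecutive in $\beta$ if $\beta$ contains $a, b$ or $-b, -a$ side by side; $p(\cdot, i)$ locates the element whose absolute value is $i$; no framing elements are added; and the operators in the benefit formula are read as subtractions, with $\mathrm{cost}(\rho) = j - i + 1$.

```python
def breakpoints(alpha, beta):
    """Count pairs consecutive in alpha that are not consecutive in beta.
    Assumption: (a, b) is consecutive in beta if beta has a, b or -b, -a side by side."""
    adjacent = set()
    for a, b in zip(beta, beta[1:]):
        adjacent.add((a, b))
        adjacent.add((-b, -a))
    return sum(1 for pair in zip(alpha, alpha[1:]) if pair not in adjacent)


def entropy(alpha, beta):
    """ent(alpha, beta) = sum over i of |p(alpha, i) - p(beta, i)|,
    where p locates the element with absolute value i (assumption)."""
    pos_alpha = {abs(x): k for k, x in enumerate(alpha)}
    pos_beta = {abs(x): k for k, x in enumerate(beta)}
    return sum(abs(pos_alpha[v] - pos_beta[v]) for v in pos_alpha)


def benefit(pi, beta, i, j):
    """delta = ent(pi, beta) - ent(pi . rho(i, j), beta) - cost(rho(i, j)).
    The minus signs and cost = j - i + 1 are our reading of the slide."""
    after = pi[:i - 1] + tuple(-x for x in reversed(pi[i - 1:j])) + pi[j:]
    return entropy(pi, beta) - entropy(after, beta) - (j - i + 1)


alpha, beta = (1, 2, -4, -3, 5), (1, 2, 3, 4, 5)
print(breakpoints(alpha, beta), entropy(alpha, beta), benefit(alpha, beta, 3, 4))  # 2 2 0
```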

Local Search

We build a new frame that starts with α and ends with β. At each step, we select the inversions that decrease the number of breakpoints. When no such inversion exists, we execute one step of the algorithm proposed by Bergeron (2006) for the Sorting by Inversions Problem.

[Figure: building a frame from α to β; candidate inversions (Inv 1, Inv 2, …, Inv 5, Inv 6, Inv 7, …) are ranked by benefit, and the five best-scored ones are kept.]
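The candidate-selection step can be sketched as an exhaustive scan over all $O(n^2)$ inversions. The breakpoint convention is the same assumption as in our definitions sketch (no framing elements; $a, b$ and $-b, -a$ count as the same adjacency). When the returned list is empty, the slide calls for one step of Bergeron's algorithm, which this sketch leaves to the caller.

```python
def apply_inversion(perm, i, j):
    # Reverse positions i..j (1-indexed, inclusive) and flip their signs.
    return perm[:i - 1] + tuple(-x for x in reversed(perm[i - 1:j])) + perm[j:]


def breakpoints(alpha, beta):
    # Pairs consecutive in alpha but not in beta (a, b and -b, -a treated alike: assumption).
    adjacent = {(a, b) for a, b in zip(beta, beta[1:])}
    adjacent |= {(-b, -a) for a, b in zip(beta, beta[1:])}
    return sum(1 for pair in zip(alpha, alpha[1:]) if pair not in adjacent)


def breakpoint_reducing_inversions(pi, beta):
    """All inversions (i, j) such that pi . rho(i, j) has fewer breakpoints
    with respect to beta than pi has."""
    current = breakpoints(pi, beta)
    n = len(pi)
    return [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i, n + 1)
        if breakpoints(apply_inversion(pi, i, j), beta) < current
    ]
```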

Local Search (continued)

We rank the selected inversions by their benefit, and the five best-ranked inversions qualify for the second phase. We then choose one of them with a random process called roulette wheel selection, in which each inversion is selected with probability proportional to the square of its benefit.
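A sketch of the second phase using Python's `random.choices` with weights. It follows the slide literally (top five by benefit, weight equal to the squared benefit), so a negative benefit weighs as much as a positive one of the same magnitude. The uniform fallback when every weight is zero is our own assumption, and `roulette_choice` is a hypothetical name.

```python
import random


def roulette_choice(scored, keep=5, rng=random):
    """Keep the `keep` best (candidate, benefit) pairs and draw one candidate
    with probability proportional to the square of its benefit."""
    top = sorted(scored, key=lambda cb: cb[1], reverse=True)[:keep]
    if not top:
        raise ValueError("no candidates to choose from")
    weights = [b * b for _, b in top]
    if sum(weights) == 0:
        # All benefits are zero: fall back to a uniform draw (our assumption).
        return rng.choice(top)[0]
    return rng.choices([c for c, _ in top], weights=weights, k=1)[0]
```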

Part IV: Experimental Results

Parameters

Frame sizes: ⟨14, 12, 10, 8, 6, 4⟩
Iterations: 900 (150 for each frame size)
Instances: we generated 1000 permutations whose sizes range from 10 to 100 in steps of 5.
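The parameter setup can be written down as a small sketch. We assume the frame sizes are used in the listed order, largest first, and we use uniformly random signed permutations for instances; the slides do not say how the instances were generated, so `random_signed_permutation` is purely illustrative.

```python
import random

FRAME_SIZES = (14, 12, 10, 8, 6, 4)
ITERATIONS_PER_SIZE = 150
PERMUTATION_SIZES = range(10, 101, 5)  # 10, 15, ..., 100


def frame_schedule(sizes=FRAME_SIZES, per_size=ITERATIONS_PER_SIZE):
    """Frame size used at each iteration, in the listed order (assumption)."""
    return [f for f in sizes for _ in range(per_size)]


def random_signed_permutation(n, rng=random):
    """A uniformly random signed permutation of 1..n (illustrative only)."""
    values = list(range(1, n + 1))
    rng.shuffle(values)
    return tuple(v if rng.random() < 0.5 else -v for v in values)


print(len(frame_schedule()))  # 900
```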

Number of Improved Initial Solutions

[Figure: percentage of times our heuristic improved the initial solution (0% to 100%) versus permutation size (10 to 100), one series per frame size (14, 12, 10, 8, 6, 4).]

Average Improvement

[Figure: percentage of improvement on the initial solution (0% to 15%) versus permutation size (10 to 100), one series per frame size (14, 12, 10, 8, 6, 4).]

Average Cost

[Figure: comparative analysis of the average cost (0 to 1800) versus permutation size (10 to 100) for Swidan, GRASP, and GRIMM.]

Part V: Conclusions

Conclusions

We presented a new method for the length-weighted inversion problem on signed permutations. We considered the case where the weight function is simply the number of elements in the reversed segment. Our method improved the initial solution in 94% of the cases, and on average our solutions cost 12% less than the initial solution. We also showed that our method provides solutions that are less costly than those of a previous algorithm.

Acknowledgments