UN EXEMPLE D INFÉRENCES BAYÉSIENNES

Similar documents
Pérégrination d un modèle alose au rythme d une méthode ABC

BAPS: Bayesian Analysis of Population Structure

BayeScan v2.1 User Manual

Biology Notes for exam 5 - Population genetics Ch 13, 14, 15

Summary Genes and Variation Evolution as Genetic Change. Name Class Date

Conservation genetics in Amentotaxus formosana

Seed4C: A Cloud Security Infrastructure validated on Grid 5000

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company

REVIEWS. Computer programs for population genetics data analysis: a survival guide FOCUS ON STATISTICAL ANALYSIS

Lecture 10 Friday, March 20, 2009

PRINCIPLES OF POPULATION GENETICS

Basic Principles of Forensic Molecular Biology and Genetics. Population Genetics

Documentation for structure software: Version 2.3

Forensic DNA Testing Terminology

Genetics and Evolution: An ios Application to Supplement Introductory Courses in. Transmission and Evolutionary Genetics

Principles of Evolution - Origin of Species

DNA for Defense Attorneys. Chapter 6

Cloud Ready for Bioinformatics?

AP Biology Essential Knowledge Student Diagnostic

Paola Guzzo. The influence of French feminism on English feminism during the XXst century

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy Page 1 of 8

GC3 Use cases for the Cloud

Development of Lygus Management Strategies for Texas Cotton

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted

Genetic evidence reveals temporal change in hybridization

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

Introduction au BIM. ESEB Seyssinet-Pariset Economie de la construction contact@eseb.fr

Big Data and Cloud Computing for GHRSST

Gene Mapping Techniques

LRmix tutorial, version 4.1

Microsatellite variation and population structure of a recovering Tree frog (Hyla arborea L.) metapopulation

CRIME SCENE INVESTIGATION THROUGH DNA TRACES USING BAYESIAN NETWORKS

GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015

Grid Computing Perspectives for IBM

Bayesian coalescent inference of population size history

CHAPTER 6 ANTIBODY GENETICS: ISOTYPES, ALLOTYPES, IDIOTYPES

Understanding by Design. Title: BIOLOGY/LAB. Established Goal(s) / Content Standard(s): Essential Question(s) Understanding(s):

Contents. Dedication List of Figures List of Tables. Acknowledgments

E-SCIENCE IN WESTERN FRANCE : THE BEGINNING

URGI and ELIXIR France for plants and food

NatureServe Canada. Towards a Quebec Biodiversity Observation Network. Douglas Hyde Executive Director

Modelling neutral agricultural landscapes with tessellation methods: the GENEXP-LANDSITES software - Application to the simulation of gene flow

The SIST-GIRE Plate-form, an example of link between research and communication for the development

Language Maintenance and Transmission: The Case of Cajun French

Genomics GENterprise

INTRODUCING A SECTORIAL FRAMEWORK TO BETTER EVALUATE LIFE CYCLE THINKING MATURITY

Seeing Faces and History through Human Genome Sequences

Présentation de la KIC Raw Materials

UCLG POLICY PAPER ON URBAN STRATEGIC PLANNING INPUTS FROM THE CITIES

Evolution, Natural Selection, and Adaptation

Transboundary river contract Semois/Semoy in France and Wallonia

Session 1 Somerset Wildlife Trust Marais de Redon et de Vilaine

Grades 3-5. Benchmark A: Use map elements or coordinates to locate physical and human features of North America.

MapReduce Détails Optimisation de la phase Reduce avec le Combiner

Detecting the number of clusters of individuals using

Dynamic Case-Based Reasoning Based on the Multi-Agent Systems: Individualized Follow-Up of Learners in Distance Learning

Ricardo-AEA. AirDART. Air quality Data Analysis & Retrieval Tool. DMUG, 9 th December Andrea Fraser.

Mechanisms of Evolution

App coverage. ericsson White paper Uen Rev B August 2015

Institut Français de Bioinformatique, Un Cloud pour les Sciences du Vivant

DevOps with Containers. for Microservices

Security Requirements Analysis of Web Applications using UML

New challenges for research in biodiversity and ecology of micro- and macro-algae

A Hands-On Exercise To Demonstrate Evolution

Summer School: New challenges for research in biodiversity and ecology of micro- and macro-algae. Date: from 27/11 to 06/12/2015,

IMWA 2010 Sydney, Nova Scotia Mine Water & Innovative Thinking. 3 main divisions : Slide 2 DCO - 08/09/ titre - 2. Context

Stochastic control for underwater optimal trajectories CQFD & DCNS. Inria Bordeaux Sud Ouest & University of Bordeaux France

Approximating the Coalescent with Recombination. Niall Cardin Corpus Christi College, University of Oxford April 2, 2007

Eclipse Exam Tutorial - Pros and Cons

An In-Context and Collaborative Software Localisation Model: Demonstration

EIT ICT Labs Information & Communication Technology Labs

Two-locus population genetics

Using the Coherence Cloud Service

Extensive Cryptic Diversity in Indo-Australian Rainbowfishes Revealed by DNA Barcoding

Offsets and biodiversity conservation in France: limits and perspectives

PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015

Annex to the Accreditation Certificate D-PL according to DIN EN ISO/IEC 17025:2005

An Energy-aware Multi-start Local Search Metaheuristic for Scheduling VMs within the OpenNebula Cloud Distribution

Composable cost estimation and monitoring for computational applications in cloud computing environments

Development and application of molecular and bioinformatic tools for the genetic monitoring of beets

Energy efficiency in HPC :

How many of you have checked out the web site on protein-dna interactions?

P. ramorum diagnostics - update. USDA APHIS PPQ CPHST March, 2006

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)

Big Data Management in the Clouds and HPC Systems

1. 2. OUR MISSION SERVIR L AVENIR * OUR EXPERTISE ACCOMPANYING ENTREPRENEURS OUR STRATEGY BRINGING TOGETHER THE BEST OF THE PUBLIC AND PRIVATE SECTORS

Data mining and official statistics

D où vient le plomb? :

PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data

Basic Concepts Recombinant DNA Use with Chapter 13, Section 13.2

ATP Co C pyr y ight 2013 B l B ue C o C at S y S s y tems I nc. All R i R ghts R e R serve v d. 1

Extinction; Lecture-8

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)

2nd Singapore Heritage Science Conference

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference

Lab 2/Phylogenetics/September 16, PHYLOGENETICS

Transcription:

Journées SUCCES France Grilles Institut de Physique du Globe de Paris 5 et 6 novembre 2015 UN EXEMPLE D INFÉRENCES BAYÉSIENNES EN GÉNÉTIQUE DES POPULATIONS, LE CAS DU CRAPAUD CALAMITE (EPIDALEA CALAMITA) DANS LE NORD DE LA FRANCE Leslie Faucher, Sophie Gallina & Jean-François Arnaud Université de Lille Sciences & Technologies 1

Introduction Material & Methods Results Conclusion Perspectives North of France (Nord Pas De Calais district) Mining area Three centuries (XVIIIth to XXth century) of coal extraction Contrasting landscape shaped by the human activity from the coastline to the mining area. 2

Introduction Material & Methods Results Conclusion Perspectives Ponds in the top of the slap heap or in its slope By contrast, mining activity had created new open field habitats suitable for pioneering species 3

Introduction Material & Methods Results Conclusion Perspectives Natterjack toad (Epidalea calamita) Pioneer species High patrimonial interest Protection status Found in the «Nord Pas de Calais» district in coastal natural but fragmented habitats and more recently in new habitats within the mining area 4

Introduction Material & Methods Results Conclusion Perspectives Landscape fragmentation isolation of populations Decrease of levels of gene flow Increase genetic drift Decrease of adaptative capacity Loss of intraspecific genetic diversity Higher inbreeding Reduction of population size It questions the probability of species persistence Fragmentation is a major factor implicated in population declines (e.g. Fischer & Lindenmayer 2007) 5

Introduction Material & Methods Results Conclusion Perspectives What is population genetics? Genetic : Study of the transmission of inherited information Discipline qui étudie la transmission de l'information héréditaire et son utilisation dans le développement et le fonctionnement des organismes 6

Introduction Material & Methods Results Conclusion Perspectives What is population genetics? Population Genetics : How and why genetic information evolve over time [and space] within species and populations? Discipline qui étudie la transmission de l'information héréditaire et son utilisation dans le développement et le fonctionnement des organismes 7

Introduction Material & Methods Results Conclusion Perspectives What is population genetics? Main goal : estimating the reproductive system, migration rates from the information carried by neutral genetic Discipline qui étudie la transmission de diversity l'information héréditaire et son utilisation dans In our le case développement et le fonctionnement des organismes observed estimated m 1 m 2 Analysing dispersal movements over time and space through the spatial distribution of neutral genetic variability in populations 8

Introduction Material & Methods Results Conclusion Perspectives Main question : do coastal and inland populations differ in terms of genetic structuring? Are there contrasting patterns of inbreeding and/or level of genetic diversity between these two areas? Where do the mining area populations come from? Have these populations been founded by accidental human-mediated dispersal events during mining exploitation? 9

Introduction Material & Methods Results Conclusion Perspectives How do we proceed? 10

Introduction Material & Methods Results Conclusion Perspectives SAMPLING DNA extractions 11

Latitude 50.0 50.2 50.4 50.6 50.8 51.0 51.2 Introduction Material & Methods Results Conclusion Perspectives SAMPLING 5 25 45 Sample size 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Longitude 959 natterjack toads sampled for 72 ponds (29 along the coastline and 43 in the mining area) 12

Introduction Material & Methods Results Conclusion Perspectives MOLECULAR MARKERS 37 nuclear microsatellite loci (15 already published, 22 recently isolated) 13

Introduction Material & Methods Results Conclusion Perspectives CLUSTERING BAYÉSIEN : The method attempts to assign individuals to clusters on the basis of their multilocus genotype, while simultaneously estimating cluster allele frequencies K cluster : each of which is characterized by a set of allele frequencies at each locus. Assignation of individuals to one cluster (or jointly to two or more clusters if their genotypes indicate that they are admixed). Within clusters, the loci are at Hardy-Weinberg equilibrium, and linkage equilibrium. STRUCTURE (Pritchard et al. 2000) 30 replicates K = from 2 to total number of populations sampled 14

Introduction Material & Methods Results Conclusion Perspectives TECHNICAL DETAILS : Grid Infrastructure VO Biomed Send job for structure program Get output Server EEP Results synthesis and analysis 15

Introduction Material & Methods Results Conclusion Perspectives What did we observed? 16

LnP(D) Introduction Material & Methods Results Conclusion Perspectives Bayesian clustering analysis (Pritchard et al. 2000) DeltaK- 72 northern populations and 32 loci -60000-62000 3000-64000 2000 K -66000 1000-68000 -70000 0 0 10 20 30 40 50 60 70 Mean population probabilities of membership to belong to K=2 genetic clusters 17

Latitude 50.0 50.2 50.4 50.6 50.8 51.0 51.2 Introduction Material & Methods Results Conclusion Perspectives Bayesian clustering analysis (Pritchard et al. 2000) Sample size 2 15 50 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Longitude Mean population probabilities of membership to belong to K=2 genetic clusters 18

Latitude 50.0 50.2 50.4 50.6 50.8 51.0 51.2 Introduction Material & Methods Results Conclusion Perspectives Bayesian clustering analysis (Pritchard et al. 2000) Substructuring in inland populations with a few admixed populations 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Longitude Clear genetic distinctiveness of coastline and inland populations 19

Introduction Material & Methods Results Conclusion Perspectives Bayesian clustering analysis (Pritchard et al. 2000) Number of K clusters tested 859 individus Number of replicats Number of Realised job Loci Kmax rep1 rep2 rep3 jobs hours days months years Time needed 35locus_68pop- 35 68 40 35 30 2380 71400 2975 99.17 8.15 35locus_44pop- 35 44 40 35 30 1540 30800 1283.33 42.78 3.52 35locus_24pop 35 24 40 35 30 840 5880 245 8.17 0.67 26locus_TOTAL 26 85 40 35 30 2975 8925 371.87 12.40 1.02 Need of high throughput computation 20

Latitude 50.0 50.2 50.4 50.6 50.8 51.0 51.2 Latitude 50.0 50.2 50.4 50.6 50.8 51.0 Introduction Material & Methods Results Conclusion Perspectives Coastal populations: Isolation by distance through a classical stepping stone model -0.3-0.1 0.1 0.3 following a scenario of post-glacial recolonisation 4.0 4.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Eigenvalues Longitude Synthetic map of the first three global scores 4.0 4.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Inland populations: Higher observed levels of genetic admixture and levels of genetic diversity suggest a metapopulation structure with multiple introductions 21

Introduction Material & Methods Results Conclusion Perspectives Tracing back the history of colonization of the coal basin: preliminary results 22

Introduction Material & Methods Results Conclusion Perspectives Tracing back the history of colonization of the coal basin: preliminary results Synthetic map of the first two global scores.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Longitude Populations from the mining area may have stronger genetic affinities with continental inland populations 23

Introduction Material & Methods Results Conclusion Perspectives PERSPECTIVES: Test different evolutionary scenarios : End of the last glaciation (-11 000 BP ) Scenario 1 Scenario 2 Scenario 3 t3 t2 1-m m t1 t0 Actual Na N1 N2 N3 N4 N5 Several parameters and even more scenarios which one is the best? 24

Introduction Material & Methods Results Conclusion Perspectives We plan to proceed to Approximate Bayesian Computation (ABC) analysis ABC analysis workflow Ressources needed: Pipeline using many tools, with dependences for example library python numpy Perform simulations for calibrations very long task (~2-4 weeks) incompatible with grid usage Perform simulations Perform likelihood free MCMC chain need the use of virtual machines on cloud Estimate posterior density ABCtoolbox manual, Wegmann et al., 2009 25

Introduction Material & Methods Results Conclusion Perspectives We are currently setting up an appliance with pipeline Virtual Machines using this appliance will be deployed on three clouds Institut Français de Bioinformatique (IFB) Paris (3000 cores) Univ Lille - IFB & France Grilles (FG) plateform - Lille (320 cores) IPHC, FG plateform - Strasbourg (176 cores) It will allow us to deploye each different scenarios on one virtual machine 26

Thank for your attention! Many thanks to all the people who participated to the sampling, Conservatoire d Espaces Naturels du Nord-Pas de Calais, CPIE Chaine des terrils, Groupement ornithologique et naturaliste du Nord, Julie Jacquiery (Université de Rennes), Baptiste Faure (Biotop), Conservatoire des Espaces Naturels de Lorraine, Bretagne Vivante, LPO Franche Comté, LPO Lot, Conservatoire d Espaces Naturels de Bourgogne Many thanks to Cécile Godé and Laura Henocq for help in genotyping And many thanks to Geneviève Romier, Jérôme Pansanel at France Grilles, Christophe Blanchet at Institut Français de Bioinformatique and Matthieu Marquillie, Cyrille Toulet at Univ Lille for the computing infrastucture. 27