Physical map of the wheat chromosome arm 3DS Jan Bartoš

Jan Bartoš
Centre of Region Haná for Biotechnological and Agricultural Research
Institute of Experimental Botany
Šlechtitelů 31, 783 71 Olomouc - Holice

Wheat chromosome arm 3DS

Chromosome arm 3DS characteristics
- Estimated size: 321 Mbp
- Less than 2% of the wheat genome
- Low level of polymorphism in the D genome

Important genes
- Ph2 locus (pairing homologs)
- Yr49 (yellow rust resistance)

[Figure: plot with axis "Relative fluorescence intensity"; labels I, II, III, 3B, 3DS, 3DL]

3DS physical map

BAC library and fingerprinting
- 36,864 clones
- 11x chromosome coverage
- 27,880 useful fingerprints

Automated assembly
- FPC based
- Following IWGSC rules
- Cut-off: 1e-75 => 1e-45

                          Automated assembly
Cut-off                   1e-45
Contigs                   1,360
Q-clones                  282
Assembly length (Mb)      310 (97%)
Longest contig (kb)       1,092
N50 contig length (kb)    244
MTP (clones)              3,823
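The table reports an N50 contig length. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembly length. The function name and the toy contig lengths are illustrative assumptions, not data from the 3DS map.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example (hypothetical contig lengths in kb, not the 3DS contigs).
toy_contigs_kb = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs_kb)
print(toy_n50)
```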

3DS physical map

Manual assembly
- FPC based
- Cut-off: 1e-45 => 1e-15
- Correction using LTC

[Figure: Distribution of contig sizes; axis: Contig size (kb)]

                          Automated assembly    Manual assembly
Cut-off                   1e-45                 1e-15
Contigs                   1,360                 918
Q-clones                  282                   499
Assembly length (Mb)      310 (97%)             278 (87%)
Longest contig (kb)       1,092                 1,870
N50 contig length (kb)    244                   412
MTP (clones)              3,823                 ---

In silico anchoring workflow

- MTP definition (FPC)
- Markers, e.g. IWGSC survey sequence
- MTP 3-D pool sequencing
- Read mapping to marker sequences (bwa)
- Resolving unique reads, quality control
- Positive pool detection (for each sequence)
- BAC address deconvolution

MTP pool sequencing

- MTP: 3,823 clones
- Fifty 3D MTP pools (10 plates, 16 rows, 24 columns)
- Pools of each dimension sequenced as indexed libraries on Illumina HiSeq
- 367,907,030 reads (2 x 100 bp)
- Unequal pool coverage: 6-166x (mean 35x; median 23.5x)
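The pool numbers on this slide follow from the plate geometry: 10 plates of 16 rows by 24 columns give 3,840 wells, enough for the 3,823 MTP clones, and one pool per plate, row and column gives 10 + 16 + 24 = 50 pools. The sketch below works through that arithmetic and shows which three pools a well contributes to. The pool name format mirrors the "Plate07 RowC Column18" example used later in the talk; the function name is an assumption made for illustration.

```python
import string

# Layout taken from the slide: 10 plates, 16 rows, 24 columns.
N_PLATES, N_ROWS, N_COLUMNS = 10, 16, 24
ROW_LETTERS = string.ascii_uppercase[:N_ROWS]  # 'A' .. 'P'


def pools_for_well(plate, row, column):
    """Return the plate, row and column pools that contain one MTP well."""
    valid = (
        1 <= plate <= N_PLATES
        and len(row) == 1
        and row in ROW_LETTERS
        and 1 <= column <= N_COLUMNS
    )
    if not valid:
        raise ValueError("well outside the 10 x 16 x 24 layout")
    return (f"Plate{plate:02d}", f"Row{row}", f"Column{column:02d}")


n_wells = N_PLATES * N_ROWS * N_COLUMNS   # wells available for MTP clones
n_pools = N_PLATES + N_ROWS + N_COLUMNS   # one pool per plate, row, column

print(n_wells, n_pools)
print(pools_for_well(7, "C", 18))
```

Every clone is therefore sequenced three times, once in each dimension, which is what makes the later address deconvolution possible.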

Read mapping to marker sequences

Reference sequence
- IWGSC 3DS survey sequence
- 314,944 sequences
- Total length 145,374,274 bp (45% of chromosome arm)

Read alignment
- Using Burrows-Wheeler Aligner
- Reads of each pool renamed to track their origin
- Maximal coverage 30x/pool

[Figure: reads from pools (Plate 01, Plate 02, Row A, Row B, Column 01, Column 02) aligned to reference sequences sequence_1 to sequence_4]
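Renaming reads so that each one carries its pool of origin could look roughly like the sketch below. It assumes plain four-line FASTQ records and a hypothetical `pool|readname` naming scheme; the talk does not say which convention or tool was actually used.

```python
def tag_fastq_with_pool(lines, pool):
    """Yield FASTQ lines with the pool name prefixed to every read name.

    Assumes strict four-line records. The header is found by position
    (every fourth line) because quality strings may also start with '@'.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"expected a FASTQ header, got: {line!r}")
            yield f"@{pool}|{line[1:]}"
        else:
            yield line


def pool_of(read_name):
    """Recover the pool name from a read name tagged as above."""
    return read_name.lstrip("@").split("|", 1)[0]


# Toy record (hypothetical read, not real data).
record = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
tagged = list(tag_fastq_with_pool(record, "Plate07"))
print(tagged[0], pool_of(tagged[0]))
```

With the pool encoded in the read name, reads from all pools can be aligned together and still be counted per pool afterwards.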

Positive pool identification

- Only reads mapped to a unique position with no mismatch are used
- Positive pools are identified individually for each sequence
- Aligned reads are counted for each pool
- The number of aligned reads is normalized by pool coverage
- A pool is positive if its normalized read number is at least 20% of the average for pools with at least one aligned read
- At least 1 plate, 1 row and 1 column pool found for 258,146 sequences

[Figure: Reads aligned at unique positions; x-axis: Positive pools (0 to >13); y-axes: No. sequences (x 1,000) and percentage]
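The positive-pool rule can be sketched for a single survey sequence as follows. This is an illustration under stated assumptions, not the authors' code: normalization is taken to be read count divided by pool coverage, the average is taken over pools with at least one aligned read, and the comparison with the 20% threshold is assumed to be "greater than or equal". All pool names and numbers in the example are made up.

```python
def positive_pools(read_counts, pool_coverage, threshold=0.20):
    """Return the set of positive pools for one survey sequence.

    read_counts:   pool name -> number of unique, mismatch-free aligned reads
    pool_coverage: pool name -> sequencing coverage of that pool
    """
    # Normalize counts by pool coverage, ignoring pools with no aligned read.
    normalized = {
        pool: count / pool_coverage[pool]
        for pool, count in read_counts.items()
        if count > 0
    }
    if not normalized:
        return set()
    average = sum(normalized.values()) / len(normalized)
    # Assumed comparison: positive when at or above 20% of the average.
    return {pool for pool, value in normalized.items() if value >= threshold * average}


# Hypothetical counts and coverages for one sequence.
counts = {"Plate07": 30, "RowC": 12, "Column18": 20, "Column03": 1, "Plate02": 0}
coverage = {"Plate07": 30, "RowC": 12, "Column18": 20, "Column03": 20, "Plate02": 35}
hits = positive_pools(counts, coverage)
print(sorted(hits))
```

In this toy case the single stray read in Column03 falls well below 20% of the average normalized count, so only one plate, one row and one column pool remain positive.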

BAC address deconvolution

1) One positive pool in each dimension (1 1 1)
   Direct BAC clone identification
   Plate07 RowC Column18 --> TaaCsp3DShA_0055B07
2) Multiple positive pools in at least one dimension (e.g. 2 2 2)
   Identification of all candidate BAC clones
   a) Check contig information for all clones
   b) Check possible overlap in case of end clones
3) Sequence not anchored if:
   a) Positive pool is missing for plate, row or column.
   b) Five or more positive pools in at least one dimension.
   c) No positive clone was found in step 2).
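The decision rules above can be sketched as one function. Several parts are assumptions made for illustration: the lookup tables `well_to_clone` and `clone_to_contig` are hypothetical, the contig check in step 2a is interpreted as "at least two candidate clones fall in the same contig", and the end-clone overlap check of step 2b is not implemented (such cases are only flagged). Clone and contig names other than TaaCsp3DShA_0055B07 are invented.

```python
from collections import defaultdict
from itertools import product

MAX_POSITIVE_PER_DIMENSION = 4  # slide: five or more positive pools -> not anchored


def deconvolve(plates, rows, columns, well_to_clone, clone_to_contig):
    """Classify one survey sequence from its positive plate/row/column pools."""
    dims = (plates, rows, columns)
    # Rule 3a: a positive pool is missing in some dimension.
    if any(len(d) == 0 for d in dims):
        return "not_anchored", []
    # Rule 3b: five or more positive pools in at least one dimension.
    if any(len(d) > MAX_POSITIVE_PER_DIMENSION for d in dims):
        return "not_anchored", []

    # All wells at the intersections of the positive pools.
    wells = product(sorted(plates), sorted(rows), sorted(columns))
    candidates = [well_to_clone[w] for w in wells if w in well_to_clone]

    # Rule 1: exactly one positive pool per dimension gives a direct address.
    if all(len(d) == 1 for d in dims):
        return ("type_1", candidates) if candidates else ("not_anchored", [])

    # Rule 2a (assumed interpretation): keep clones sharing a contig.
    by_contig = defaultdict(list)
    for clone in candidates:
        contig = clone_to_contig.get(clone)
        if contig is not None:
            by_contig[contig].append(clone)
    supported = [clones for clones in by_contig.values() if len(clones) >= 2]
    if len(supported) == 1:
        return "type_2a", supported[0]
    # Rule 2b (overlap of end clones) is not sketched here.
    return "needs_step_2b", candidates


well_to_clone = {
    ("Plate07", "RowC", "Column18"): "TaaCsp3DShA_0055B07",  # from the slide
    ("Plate01", "RowA", "Column01"): "cloneA",                # hypothetical
    ("Plate02", "RowB", "Column02"): "cloneB",                # hypothetical
    ("Plate01", "RowB", "Column02"): "cloneC",                # hypothetical
}
clone_to_contig = {"cloneA": "ctg1", "cloneB": "ctg1", "cloneC": "ctg9"}

direct = deconvolve({"Plate07"}, {"RowC"}, {"Column18"}, well_to_clone, clone_to_contig)
multi = deconvolve({"Plate01", "Plate02"}, {"RowA", "RowB"},
                   {"Column01", "Column02"}, well_to_clone, clone_to_contig)
print(direct)
print(multi)
```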

Anchoring results

Anchored
- 184,880 sequences (58.7% of survey sequences)
- 96,784,747 bp anchored
  - 66.6% of survey sequence length
  - 30.2% of estimated arm length
- 878 contigs with at least one sequence
- 1-2,514 sequences per contig

[Figure: Anchored sequences by type (type_1, type_2a, type_2b); shares shown: 3%, 34%, 63%]

Analysis of anchored sequences

184,880 anchored sequences

DArT
- 194 DArTs identified in 182 sequences
- 125 contigs anchored by DArT markers (1-6 markers/contig)

Gene fragments
- 1,906 gene models/fragments identified in 3DS survey sequences
- 1,408 (73.9%) genes/fragments anchored (by 1,372 sequences)
- 377 contigs contain at least one gene (1-24/contig)
- 793 organized using GenomeZipper approach
- 291 contigs anchored by GenomeZipper (1-17 gene fragments/contig)

319 contigs anchored (53.4% of physical map length)

[Figure: two plots of contig size (kb); one labelled "anchored contigs"]

Analysis of anchored sequences

184,880 anchored sequences

Repeat junctions
- IsbpFinder used to identify repeat junctions (potential ISBP markers)
- 24,517 TE insertions with preserved ends were found in 3DS survey sequences
- 17,684 (72.1%) ISBPs anchored to contigs (in 13,870 sequences)
- Up to 232 ISBPs in one contig
- 652 contigs (85.6% of physical map length) have at least one insertion site

[Figure: contig size (kb); series labelled "ISBP contigs"]

Quality control

DArT
- 40 contigs with more than one DArT
- 74% at the same or a close position on the DArT map

GenomeZipper
- 192 contigs with more than one gene fragment
- 70% at neighbouring positions on the GenomeZipper

Hmmm

Anchoring error rate is overestimated

Additional sources of error
- BAC contig misassembly
- Genetic mapping of DArT markers
- Incorrect position of gene fragment in GenomeZipper

3DS015A12

Contig misassembly is a significant source of error

Physical localization of gene fragments at identical GenomeZipper position

- 178 GenomeZipper positions with multiple gene fragments
- For 161 (90.5%) of these positions, the fragments have an identical physical position

Additional assembly improvement

- 6,362 sequences of anchoring type 2b could be used to merge contigs
- Sequences anchored to clones in different contigs
- Match of the clones at e-10

Conclusion

- We developed a protocol for high-throughput contig anchoring
- 66% of the survey sequence (97 Mbp) anchored to the physical map
- 74% of genes identified in survey sequences localized in BAC clones
- 53% of the physical map organized through anchoring to the DArT genetic map and the 3DS GenomeZipper

Future perspective

- Additional validation of results (including wet lab)
- Cleaning and integration of ISBP markers, polymorphism identification within the CS x Renan population
- Sequencing of 3,823 clones of the MTP

Acknowledgement

- Jaroslav Doležel
- Kateřina Cvikova
- Jan Šafář
- Hana Šimková
- Andrzej Killian
- Federica Cattonaro
- Michael Alaux