An example of bioinformatics application on plant breeding projects in Rijk Zwaan



Similar documents
Towards Integrating the Detection of Genetic Variants into an In-Memory Database

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Next Generation Sequencing: Technology, Mapping, and Analysis

Practical Guideline for Whole Genome Sequencing

Introduction to NGS data analysis

How-To: SNP and INDEL detection

-> Integration of MAPHiTS in Galaxy

How Sequencing Experiments Fail

Analysis of NGS Data

School of Nursing. Presented by Yvette Conley, PhD

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

A Primer of Genome Science THIRD

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Genomes and SNPs in Malaria and Sickle Cell Anemia

Custom TaqMan Assays For New SNP Genotyping and Gene Expression Assays. Design and Ordering Guide

Introduction to Bioinformatics 3. DNA editing and contig assembly

Text file One header line meta information lines One line : variant/position

New solutions for Big Data Analysis and Visualization

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

Challenges associated with analysis and storage of NGS data

Next generation sequencing (NGS)

Single Nucleotide Polymorphisms (SNPs)

CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company

Big Data Healthcare. Fei Wang Associate Professor Department of Computer Science and Engineering School of Engineering University of Connecticut

CCR Biology - Chapter 9 Practice Test - Summer 2012

CHALLENGES IN NEXT-GENERATION SEQUENCING

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.

Typing in the NGS era: The way forward!

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

HiSeq Analysis Software v0.9 User Guide

Genomic Testing: Actionability, Validation, and Standard of Lab Reports

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Genetics Module B, Anchor 3

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Accelerating variant calling

SAP HANA Enabling Genome Analysis

Biomedical Big Data and Precision Medicine

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

Research Roadmap for the Future. National Grape and Wine Initiative March 2013

Delivering the power of the world s most successful genomics platform

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

Welcome to the Plant Breeding and Genomics Webinar Series

The Human Genome Project. From genome to health From human genome to other genomes and to gene function Structural Genomics initiative

Next Generation Sequencing

Data Analysis for Ion Torrent Sequencing

NGS data analysis. Bernardo J. Clavijo

Module 1. Sequence Formats and Retrieval. Charles Steward

Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology

Commonly Used STR Markers

DNA Sequencing & The Human Genome Project

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

European Medicines Agency

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

SEQUENCING. From Sample to Sequence-Ready

Innovations in Molecular Epidemiology

G E N OM I C S S E RV I C ES

Bioinformatics Unit Department of Biological Services. Get to know us

Next generation DNA sequencing technologies. theory & prac-ce

Workflow. Reference Genome. Variant Calling. Galaxy Format Conversion Groomer. Mapping BWA GATK Preprocess

Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows

DNA-Analytik III. Genetische Variabilität

Genomic Selection in. Applied Training Workshop, Sterling. Hans Daetwyler, The Roslin Institute and R(D)SVS

Milk protein genetic variation in Butana cattle

Arabidopsis. A Practical Approach. Edited by ZOE A. WILSON Plant Science Division, School of Biological Sciences, University of Nottingham

Overview of Next Generation Sequencing platform technologies

Version 5.0 Release Notes

Simplifying Data Interpretation with Nexus Copy Number

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Bioinformatics Resources at a Glance

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Accelerating Data-Intensive Genome Analysis in the Cloud

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

Biological Sciences Initiative. Human Genome

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Lectures 1 and February 7, Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

A Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here

Comparing Methods for Identifying Transcription Factor Target Genes

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Services. Updated 05/31/2016

How To Find Rare Variants In The Human Genome

Recombinant DNA and Biotechnology

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015

SOP 3 v2: web-based selection of oligonucleotide primer trios for genotyping of human and mouse polymorphisms

What s New in Pathway Studio Web 11.1

Polymorphism of retrotransposons in Bos taurus

UNIVERSITÀ DEGLI STUDI DI SASSARI

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA

Human Genome Organization: An Update. Genome Organization: An Update

NGS and complex genetics

Core Facility Genomics

Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director

Transcription:

An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012

Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on the development of high-quality vegetable varieties for professional growers in food-producing horticulture, be that in glasshouses, tunnels or outdoors. Middle position in world top 10 globally-operating vegetable breeding companies Around1900 employees all over the world 1.000 varieties in approximately 30 different vegetable crops Sell seeds in more than 100 different countries all over the world

RZ NGS projects overview. Gene prediction. SNP calling: RNA-seq, DNA. Gene expression analysis. Transcription binding site Prediction (mirna). Protein functional analysis (domain prediction, structural analysis). Orthologue gene prediction.server maintenance.g browser.snp database.species: Tomato Cucumber Melon / water melon Pepper Lettuce

SNP Identification for watermelon Single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide A, T, C or G in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in an individual. Genetic marker

General information Resources: RZ 900 RZ 901 (parents lines with different Phenotypes) Illumina hiseq 2000 paired end reads Reference assembly version: Watermelon V70 Purposes *Used as genetic markers for genetic mapping. *Used as genetic markers associate with certain genes or phenotypes The interesting SNPs 1. Homozygous SNPs which are different between 2 parents lines 2. Evenly distributed on the Chromosome 3. Can be uniquely amplified by PCR, then proper for SNP validation assey

SNP Identification workflow for watermelon BWA + picard + GATK SNP validation

SNP calling by GATK BWA picard

SNP calling result VCF format scaffold9 721024. T C 752.98. AC=2;AF=0.50;AN=4;BaseQRankSum=1.087;DP=34;Dels=0.00;FS=25.118;HRun=1;HaplotypeScore=1.4816;MQ=59.32;MQ0=0;MQRankSum=- 0.845;QD=41.83;ReadPosRankSum=-0.742;SB=-354.80 GT:AD:DP:GQ:PL 1/1:0,18:18:54.18:794,54,0 0/0:16,0:16:48.16:0,48,716 SNPs selection criteria 1. DP > 10 GT:AD:DP:GQ:PL 1/1:0,18:18:54.18:794,54,0 0/0:16,0:16:48.16:0,48,716 2. AD ratio <=0.2 GT:AD:DP:GQ:PL 1/1:0,18:18:54.18:794,54,0 0/0:16,0:16:48.16:0,48,716 AD ratio sample1: 0/(0+18) < 0.2 alternative homozygous AD ratio sample1: 16/(0+16) > 0.8 reference homozygous 1122778 homozygous SNPs found differing between RZ900 and RZ901

SNPs chromosome distribution

SNP selection Step1. generating maker sequence without flanking SNP Length: 101bp, SNP in middle with 50bp flanking marker Format: >scaffold1_10477 TAGTACATTTCTATTATTCAACTGTGAGTTATTTTCGAAGTTTTATTAAT[T/G]TTCGTTTTTTATTTATAACTTTCAATTAATTAGAAAAATAGTAAAAACT Excluding the markers with flanking SNPs >scaffold1_13888 AAATATTTTTAAATATAAGAAAGTGTCATTGTTTATCAATAATAGACACT[G/A]ATGGATAAAnATTTTGTTATGTTTGTAACTATTTTGGTTTATTGCTGT

SNP selection Step2. Marker selection Doing the Blast of markers against v1 genome Excluding the repetitive marker With 100% identical hits in v1 genome database

SNP selection Step 1 Step 2

SNPs Calling Reads preprocess & alignment BWA Samtools SNP calling & genotyping GATK Picard Last step selection 840 SNPs 11 Chr

SNP calling question 1 How to define the DP cutoff? Minimum? Maximum?

SNP calling question 2 What is the best measurement for the SNP quality? QUAL: The Phred scaled probability of Probability that REF/ALT polymorphism exists at this site given sequencing data. DP: The DP field describe the total depth of reads that passed the Unified Genotypers internal quality control metrics GQ: The Genotype Quality, or Phred-scaled confidence that the true genotype is the one provided in GT MQ: Root Mean Square of the mapping quality of the reads across all samples QUAL vs MQ QUAL vs DP RZ900 QUAL vs GQ RZ900

Conflicting between GT and AD? SNP flanking consensus sequence quality? Using non-redundant reads?

Other Markers (Genetic variation)? Identification of : InDels Structural variants Copy number variants Transposable elements And the prediction of their functional consequences.

Marcel Kerkveld Sara Movahedi Remco Ursem Raoul Frijters Xiangyu Rao