Analysis of DNA methylation: bisulfite libraries and SOLiD sequencing




An easy view of the bisulfite approach
Methylated C (CH3): genome TAGTACGTTGAT, read TAGTACGTTGAT
Unmethylated C: genome TAGTACGTTGAT, read TAGTATGTTGAT
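The genome/read pair on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `bisulfite_convert` and its `methylated_positions` argument are hypothetical, and the model assumes complete conversion of every unmethylated C.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the read expected after bisulfite treatment and PCR.

    An unmethylated C is read as T; a C at a methylated position stays C.
    Positions are 0-based indices into seq (hypothetical interface).
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


genome = "TAGTACGTTGAT"                      # sequence from the slide
methylated_read = bisulfite_convert(genome, methylated_positions=[5])
unmethylated_read = bisulfite_convert(genome)

print(methylated_read)
print(unmethylated_read)
```

The only C in the slide's sequence sits at index 5, so protecting that position leaves the read identical to the genome, while the unprotected case turns it into a T.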

Three main problems
1. We need software specifically designed to align bisulfite reads
2. Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
3. Need for special strategies for making the shotgun libraries

Three main problems: the two strands before and after bisulfite conversion
Before
5' ATGCTGCACTGACACGTGAT 3'
3' TACGACGTGACTGTGCACTA 5'
After
5' ATGUTGUAUTGAUAUGTGAT 3'
3' TAUGAUGTGAUTGTGUAUTA 5'
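The Before/After pair above also explains why the reference grows: once converted, the two strands are no longer complementary, so each has to be represented separately. The sketch below checks this on the slide's sequence. It writes the converted base as T (what a sequencer reads after PCR) instead of U, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_c_to_t(seq):
    """Full bisulfite conversion of an unmethylated strand (C read as T)."""
    return seq.replace("C", "T")


top = "ATGCTGCACTGACACGTGAT"            # top strand from the slide, 5'->3'
bottom = reverse_complement(top)        # bottom strand, written 5'->3'

top_converted = convert_c_to_t(top)
bottom_converted = convert_c_to_t(bottom)

# Before conversion the strands are reverse complements of each other;
# after conversion they are not, so both converted strands must be indexed.
strands_still_complementary = reverse_complement(top_converted) == bottom_converted

print(top_converted)
print(bottom_converted[::-1], "(written 3'->5' as on the slide)")
print(strands_still_complementary)
```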

Need for special strategies for making the shotgun libraries
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462:315-322.

CRIBI method for bisulfite library preparation: MeSS (Methylome SOLiD Sequencing)
Lisa Marchioretto and Robin Targon
Workflow: Cells -> Nuclei -> DNA -> Bisulfite treatment -> Adaptor ligation -> PCR -> Sequencing

Optimization of the fragmentation and bisulfite treatment

Optimization of adaptor ligation
Compared to other Bis-seq methods, MeSS requires ten times less starting genomic DNA, avoids intermediate purification steps between enzymatic reactions, and allows efficient amplification with fewer PCR cycles.

Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
Directional cloning would halve the mapping complexity.
Before
5' ATGCTGCACTGACACGTGAT 3'
3' TACGACGTGACTGTGCACTA 5'
After
5' ATGUTGUAUTGAUAUGTGAT 3'
3' TAUGAUGTGAUTGTGUAUTA 5'
SOLiD color space maintains the full set of 4 colors after C/U conversion:
>882_4_710_F3
T12303201320002311102023132033102120101
>882_4_840_F3
T30132200013022300130131231321021133033
>882_4_1657_F3
T33213100102312210311012322012203112333
>882_5_1275_F3
T31201000021203112332021200212201223112
>882_6_553_F3
T31321031020123002032223323001301333313...
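The claim that color space keeps all 4 colors can be checked on the converted top strand shown above. The sketch assumes the standard SOLiD 2-base code (A=0, C=1, G=2, T=3, color = XOR of the two base codes), which the slides do not spell out; `to_colors` is a hypothetical helper name.

```python
# Assumed standard 2-base code: color = code(base1) XOR code(base2).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as the list of colors of its adjacent pairs."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


original = "ATGCTGCACTGACACGTGAT"        # top strand from the slide
converted = original.replace("C", "T")   # fully converted (unmethylated) strand

letters_after = set(converted)             # only three letters remain
colors_after = set(to_colors(converted))   # but all four colors still occur

print(sorted(letters_after))
print(sorted(colors_after))
```

A three-letter sequence can still produce four colors because a color describes a pair of adjacent bases, not a single base.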

software specifically designed to align bisulfite reads

Exhaustive approach to bisulfite alignment
STEP 1: Virtual bisulfite conversion of the genome
Genome: ...ATGCTGCACTGACACGTGATGTCGTA...
Converted (AGT) genome: ...atgttgtattgatatgtgatgttgta...
STEP 2: Virtual bisulfite conversion of any C in the reads, remembering the original
Read #1: TGTTGTATTG, converted TGTTGTATTG
Read #2: TGATGTCGTA, converted TGATGTTGTA
STEP 3: Alignment of the three-base sequences (converted reads against the converted genome)
STEP 4/5: If the original read had any C, check that the genome also had a C at that position and label it as Met (CH3)
Original genome: ...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome: ...ATGTTGTATTGATATGTGATGTTGTA...
Converted read: TGATGTTGTA
Original read: TGATGTCGTA
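The steps above can be sketched in Python using the genome and the two reads from the slide. This is a toy illustration, not the PASS implementation: it handles only forward-strand reads, uses an exact string search in place of a real aligner, and the function names are hypothetical.

```python
def three_letter(seq):
    """Virtual bisulfite conversion: every C becomes T (steps 1 and 2)."""
    return seq.upper().replace("C", "T")


def call_methylation(genome, read):
    """Map one forward-strand read by exact match in 3-letter space.

    Returns (start, methylated, unmethylated) with 0-based genome
    coordinates, or None when the converted read is not found.
    """
    start = three_letter(genome).find(three_letter(read))    # step 3
    if start < 0:
        return None
    methylated, unmethylated = [], []
    for offset, read_base in enumerate(read.upper()):         # steps 4/5
        pos = start + offset
        if genome[pos].upper() != "C":
            continue                      # only genomic C positions are informative
        if read_base == "C":
            methylated.append(pos)        # C retained in the read: methylated
        elif read_base == "T":
            unmethylated.append(pos)      # C read as T: unmethylated
    return start, methylated, unmethylated


genome = "ATGCTGCACTGACACGTGATGTCGTA"
read1 = call_methylation(genome, "TGTTGTATTG")
read2 = call_methylation(genome, "TGATGTCGTA")

print(read1)
print(read2)
```

Keeping the original read next to its converted copy is what makes the final step possible: the alignment is found in three-letter space, while the methylation call comes from comparing the unconverted read with the unconverted genome.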

PASS implementation of bisulfite alignment
Simulated test set
Starting from 3 simulated hg19 reference genomes whose cytosines were randomly methylated on both DNA strands to obtain 3 cytosine methylation levels (0%, 50% and 100%), we generated 6 test sets of 1 million reads each (3 for color space and 3 for base space data) using the dwgsim-0.1.8 (ref.) program. The same procedure was applied to obtain the non-bisulfite-treated simulated test sets, except that the unmodified hg19 reference genome was used as input to dwgsim-0.1.8.
Parameters used: [ -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000 ]
The per base/color/flow error rate and the mutation rate were set to their default values (0.02 and 0.001, respectively). All simulated test sets were produced using the same seed, so they are comparable in number of reads, position and strand relative to the human reference genome (hg19).
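The idea of a reference with a chosen methylation level and a fixed seed can be illustrated with a small sketch. It is not the procedure used with dwgsim: it treats only one strand, works on a toy sequence, and the function name and parameters are hypothetical.

```python
import random


def simulate_bisulfite_reference(seq, methylation_level, seed=0):
    """Convert each C to T unless it is drawn as methylated.

    methylation_level is the probability (0.0 to 1.0) that a given C is
    methylated and therefore protected. A fixed seed makes runs repeatable.
    """
    rng = random.Random(seed)
    converted = []
    for base in seq:
        if base == "C" and rng.random() >= methylation_level:
            converted.append("T")     # unmethylated C is read as T
        else:
            converted.append(base)    # methylated C and all other bases are kept
    return "".join(converted)


toy_reference = "ATGCTGCACTGACACGTGATGTCGTA"
level_0 = simulate_bisulfite_reference(toy_reference, 0.0)
level_50 = simulate_bisulfite_reference(toy_reference, 0.5, seed=42)
level_100 = simulate_bisulfite_reference(toy_reference, 1.0)

print(level_0)
print(level_50)
print(level_100)
```

Reusing the same seed gives the same methylation pattern on every run, which is the property that makes separately generated test sets comparable.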

PASS implementation of bisulfite alignment
General strategy
1. Find seeds in base space
2. Extend alignment in color space

SOLiD chemistry: ligation probes
Ligation probes are octamers. The ligation site, cleavage site and dye are spatially separated.
Probe structure: A T n n n z z z (n = degenerate bases, z = universal bases). The fluorescent dye interrogates the bases in the 1st and 2nd positions.
4^5 = 1024 probes (256 probes per color)
[Figure: color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T)]
2-base encoding is based on ligation sequencing rather than sequencing by synthesis. It takes advantage of fluorescently labeled 8-mer probes that distinguish the two 3'-most bases (AT in the figure). To obtain full coverage, repeated cycles of ligation are performed, using primers that anneal to different positions of the adapter sequence (see next slides).
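The probe counts on this slide follow from simple enumeration: five positions carry a specific base (the two interrogated bases plus three degenerate ones), while the three universal positions are the same in every probe. The sketch below counts them, assuming the standard 2-base color code (A=0, C=1, G=2, T=3, color = XOR), which is not stated on the slide.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
BASE_CODE = {base: code for code, base in enumerate(BASES)}

# Enumerate the 5 specified positions of each octamer: 2 interrogated + 3 degenerate.
probes = ["".join(p) for p in product(BASES, repeat=5)]

# The dye color depends only on the first two bases (assumed XOR code).
probes_per_color = Counter(BASE_CODE[p[0]] ^ BASE_CODE[p[1]] for p in probes)

print(len(probes))
print(dict(probes_per_color))
```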

SOLiD 4-color ligation: ligation reaction
[Figure: 1 µm bead carrying the P1 primer and template sequence; universal seq primer; ligase; Y-, B-, G- and R-probes, each of the form XXnnnzzz]

SOLiD 4-color ligation: visualization
[Figure: color Y read for positions 1-2]

SOLiD ligation-based sequencing chemistry (2)
Image; cap unextended strands; cleave off fluor

SOLiD 4-color ligation: cleavage
[Figure: color Y recorded for positions 1-2]

SOLiD 4-color ligation: ligation (2nd cycle)
[Figure: ligase; Y-, B-, G- and R-probes (XXnnnzzz); adapter oligo sequence and template sequence; color Y recorded for positions 1-2]

SOLiD 4-color ligation: visualization (2nd cycle)
[Figure: colors Y, R read for positions 1-2, 6-7]

SOLiD 4-color ligation: cleavage (2nd cycle)
[Figure: colors Y, R recorded for positions 1-2, 6-7]

SOLiD 4-color ligation interrogates every 4th-5th base
[Figure: colors Y, R, R, B, G read for positions 1-2, 6-7, 11-12, 16-17, 21-22]

SOLiD 4-color ligation: reset
[Figure: 1 µm bead with adapter oligo sequence and template sequence]

SOLiD 4-color ligation (1st cycle after reset)
[Figure: universal seq primer n-1; ligase; Y-, B-, G- and R-probes (XXnnnzzz); color R read for positions 0-1]

SOLiD 4-color ligation (2nd round)
[Figure: universal seq primer n-1; colors R, R, R, B, G read for positions 0-1, 5-6, 10-11, 15-16, 20-21]

Sequential rounds of sequencing
Multiple cycles per round, with a reset between rounds
universal seq primer n: 1-2, 6-7, 11-12, 16-17, 21-22
universal seq primer n-1: 0-1, 5-6, 10-11, 15-16, 20-21
universal seq primer n+3 (spacer): 4-5, 9-10, 14-15, 19-20, 24-25
universal seq primer n+2 (spacer): 3-4, 8-9, 13-14, 18-19, 23-24
universal seq primer n+1 (spacer): 2-3, 7-8, 12-13, 17-18, 22-23
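The position labels on this slide follow a simple pattern: each round starts at a different offset and each ligation cycle advances by 5 bases. The sketch below regenerates the schedule and counts how often each position is interrogated. The primer-to-offset mapping is read off the slide's position labels, and the names `ROUND_OFFSETS` and `interrogated_pairs` are hypothetical.

```python
from collections import Counter

# First interrogated position of each round, as labeled on the slide.
ROUND_OFFSETS = {"n": 1, "n-1": 0, "n+3": 4, "n+2": 3, "n+1": 2}
CYCLES_PER_ROUND = 5
STEP = 5  # each ligated probe extends the strand by 5 bases


def interrogated_pairs(offset, cycles=CYCLES_PER_ROUND):
    """Return the (1st base, 2nd base) positions read in each cycle of a round."""
    return [(offset + STEP * k, offset + STEP * k + 1) for k in range(cycles)]


schedule = {primer: interrogated_pairs(off) for primer, off in ROUND_OFFSETS.items()}

# Count how many times each position is interrogated over the five rounds.
coverage = Counter(pos for pairs in schedule.values() for pair in pairs for pos in pair)

for primer, pairs in schedule.items():
    print(primer, pairs)
print(all(coverage[pos] == 2 for pos in range(1, 25)))
```

In this model every position from 1 to 24 is read twice, once as the first base of a pair and once as the second, which is what makes the 2-base encoding possible.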

SOLiD Chemistry: Double Base Encoding

2 Base Pair Encoding Using 4 Dyes
Red probe: A T n n n z z z
Blue probe: T T n n n z z z
[Figure: color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T)]

2 base pair encoding: reference alignment in color space
Base reference: A C G G T C G T C G T G T G C G T
Color reference: [shown as colored dots in the original figure]

2 base pair encoding: reference alignment in color space
Reference: A C G G T C G T C G T G T G C G T
Observed: A C G G T C G C C G T G T G C G T
[Figure rows: reference, expected, observed colors]
To be real, a SNP must be encoded by two adjacent color changes.
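The two-color signature of a SNP can be verified on the reference and observed sequences from this slide. The sketch assumes the standard 2-base code (A=0, C=1, G=2, T=3, color = XOR); the helper names are hypothetical.

```python
# Assumed standard 2-base code: color = code(base1) XOR code(base2).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as the list of colors of its adjacent pairs."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def color_differences(seq_a, seq_b):
    """Return the 0-based color indices at which two sequences differ."""
    pairs = zip(to_colors(seq_a), to_colors(seq_b))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


reference = "ACGGTCGTCGTGTGCGT"
observed = "ACGGTCGCCGTGTGCGT"     # single base change T -> C at index 7

snp_diffs = color_differences(reference, observed)
print(snp_diffs)
```

The changed base takes part in exactly two overlapping pairs, so exactly two neighboring colors differ and everything else in the color read is untouched.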

Advantages of 2 base pair encoding: miscall
Reference: A C G G T C G T C G T G T G C G T
Observed: A C G G T C G C T A C A C A T A C
[Figure rows: reference, expected, observed colors]
A single color change represents a sequencing error.
[Figure: color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T)]
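The observed sequence on this slide is what a naive decoder produces when one color is miscalled: every base downstream of the error changes. The sketch below reproduces it, again assuming the standard 2-base code (A=0, C=1, G=2, T=3, color = XOR); `decode` is a hypothetical helper.

```python
BASES = "ACGT"
BASE_CODE = {base: code for code, base in enumerate(BASES)}


def to_colors(seq):
    """Encode a base sequence as the list of colors of its adjacent pairs."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


reference = "ACGGTCGTCGTGTGCGT"
colors = to_colors(reference)
colors[6] = 3                      # one miscalled color (the true color is 1)

miscalled = decode("A", colors)
print(miscalled)
```

Because the damage is confined to a single color, comparing reads to the reference in color space separates this case cleanly from a true SNP, which changes two adjacent colors.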

But there is more
Only certain transitions are allowed for a real SNP.
Consider a triplet of bases, which defines 2 colors: C A T
There are only 3 possibilities for a change in the middle base, hence only 3 possibilities for the 2 colors to change to. Any of the other 6 possibilities for a 2-color change is not allowed and most probably represents a measurement error.

The Only Allowed Transitions
C A T -> C G T: reverse colors
C A T -> C C T or C T T: other two colors (both orientations)
Any other transition would require the outer two bases to change.

Not Allowed Transitions
[Figure: color-code matrix (1st base by 2nd base) and example triplets for C A T; letters as extracted: A G T T C T G T T C G C C C A C T G]
1/3rd allowed vs 2/3rd not allowed
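The 1/3 versus 2/3 split can be checked by enumeration for the triplet C A T. The sketch assumes the standard 2-base code (A=0, C=1, G=2, T=3, color = XOR); the variable names are hypothetical.

```python
BASES = "ACGT"
BASE_CODE = {base: code for code, base in enumerate(BASES)}


def triplet_colors(triplet):
    """Return the two colors defined by a triplet of bases."""
    a, b, c = (BASE_CODE[x] for x in triplet)
    return (a ^ b, b ^ c)


original = "CAT"
original_colors = triplet_colors(original)

# Color pairs reachable by changing only the middle base (a real SNP).
allowed = {
    triplet_colors(original[0] + middle + original[2])
    for middle in BASES
    if middle != original[1]
}

# Every way in which both colors can differ from the original pair.
all_double_changes = {
    (c1, c2)
    for c1 in range(4)
    for c2 in range(4)
    if c1 != original_colors[0] and c2 != original_colors[1]
}
not_allowed = all_double_changes - allowed

print(original_colors, sorted(allowed), sorted(not_allowed))
```

Of the 9 possible double color changes, 3 correspond to a change of the middle base; the remaining 6 cannot be produced without altering the outer bases.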

SOLiD Exact Call Chemistry (ECC)
ECC performs an extra round of ligations with 3-base encoding. This serves as an accuracy control and so improves the quality of the sequence in color space. It can also return a sequence in base space with good accuracy.

PASS implementation of bisulfite alignment (Davide Campagna)
General strategy
1. Find seeds in base space
2. Extend alignment in color space
3. Rescue unaligned reads using a reference with the combination of methylated patterns