Analysis of DNA methylation: bisulfite libraries and SOLiD sequencing
An easy view of the bisulfite approach
Methylated cytosine (CH3): genome TAGTACGTTGAT, read TAGTACGTTGAT
Unmethylated cytosine: genome TAGTACGTTGAT, read TAGTATGTTGAT
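The example on this slide can be reproduced with a minimal sketch of in-silico bisulfite conversion. The function name `bisulfite_convert` and the use of 0-based positions are assumptions made for illustration; the read is written with T in place of U, as it would appear after PCR.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the read expected after bisulfite treatment and PCR.

    Unmethylated C is read as T; a methylated C (0-based position listed
    in methylated_positions) is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


genome = "TAGTACGTTGAT"
read_methylated = bisulfite_convert(genome, {5})   # the only C is methylated
read_unmethylated = bisulfite_convert(genome)      # the only C is unmethylated
print(read_methylated, read_unmethylated)
```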
Three main problems
1. We need software specifically designed to align bisulfite reads
2. Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
3. Need for special strategies for making the shotgun libraries
Before bisulfite treatment:
5' ATGCTGCACTGACACGTGAT 3'
3' TACGACGTGACTGTGCACTA 5'
After bisulfite treatment:
5' ATGUTGUAUTGAUAUGTGAT 3'
3' TAUGAUGTGAUTGTGUAUTA 5'
Need for special strategies for making the shotgun libraries
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462:315-322.
CRIBI method for bisulfite library preparation - MeSS (Methylome SOLiD Sequencing)
Lisa Marchioretto and Robin Targon
Workflow: Cells → Nuclei → DNA → Bisulfite treatment → Adaptor ligation → PCR → Sequencing
Optimization of the fragmentation and bisulfite treatment
Optimization of adaptor ligation
Compared with other Bis-seq methods, MeSS requires ten times less starting genomic DNA, avoids intermediate purification steps between enzymatic reactions, and allows efficient amplification with fewer PCR cycles.
Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
Directional cloning would halve the mapping complexity.
Before:
5' ATGCTGCACTGACACGTGAT 3'
3' TACGACGTGACTGTGCACTA 5'
After:
5' ATGUTGUAUTGAUAUGTGAT 3'
3' TAUGAUGTGAUTGTGUAUTA 5'
SOLiD color space maintains the full set of 4 colors after C/U conversion:
>882_4_710_F3
T12303201320002311102023132033102120101
>882_4_840_F3
T30132200013022300130131231321021133033
>882_4_1657_F3
T33213100102312210311012322012203112333
>882_5_1275_F3
T31201000021203112332021200212201223112
>882_6_553_F3
T31321031020123002032223323001301333313...
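The claim that color space keeps all 4 colors can be checked with a short sketch. It assumes the standard SOLiD dinucleotide code (A=0, C=1, G=2, T=3, color = XOR of the two base codes) and writes the converted strand with T in place of U; the helper name `to_colors` is hypothetical.

```python
# Assumed 2-bit base codes; the color of a dinucleotide is the XOR of its codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as a list of colors, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


top_strand = "ATGCTGCACTGACACGTGAT"
converted = top_strand.replace("C", "T")   # fully unmethylated top strand

letters_used = set(converted)              # only three letters remain
colors_used = set(to_colors(converted))    # all four colors still occur
print(sorted(letters_used), sorted(colors_used))
```

The base alphabet shrinks to A, G and T, but the dinucleotides AA/TT, GT/TG, AG/GA and AT/TA still map to four distinct colors.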
Software specifically designed to align bisulfite reads
Exhaustive approach of bisulfite alignment
STEP 1 Virtual bisulfite conversion of the genome
Genome: ...ATGCTGCACTGACACGTGATGTCGTA...
Converted (A, G, T) genome: ...atgttgtattgatatgtgatgttgta...
STEP 2 Virtual bisulfite conversion of any C in the reads, remembering the original
Read #1: TGTTGTATTG, converted: TGTTGTATTG
Read #2: TGATGTCGTA, converted: TGATGTTGTA
STEP 3 Alignment of the three-base sequences (converted reads against converted genome)
STEP 4/5 If the original read had any C, check that the genome also had a C at that position and label it as Met
Original genome: ...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome: ...atgttgtattgatatgtgatgttgta...
Converted read: TGATGTTGTA
Original read: TGATGTCGTA (CH3 on the retained C)
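The steps above can be sketched in a few lines. This is a simplified illustration, not the implementation of any aligner: it assumes exact matching on the forward strand only, uses plain string search in place of a real alignment, and the names `three_letter` and `align_and_call` are hypothetical.

```python
def three_letter(seq):
    """Steps 1 and 2: virtual bisulfite conversion (every C becomes T)."""
    return seq.upper().replace("C", "T")


def align_and_call(genome, read):
    """Steps 3 to 5: align in three-letter space, then call methylation.

    Returns a list of (start, methylated, unmethylated) tuples, where the
    two lists hold 0-based genome positions of cytosines covered by the read.
    """
    conv_genome, conv_read = three_letter(genome), three_letter(read)
    hits = []
    start = conv_genome.find(conv_read)
    while start != -1:
        window = genome[start:start + len(read)]
        pairs = list(zip(read, window))
        # A C in the read over a non-C in the genome is a real mismatch.
        if all(g == "C" for r, g in pairs if r == "C"):
            methylated = [start + i for i, (r, g) in enumerate(pairs)
                          if r == "C" and g == "C"]
            unmethylated = [start + i for i, (r, g) in enumerate(pairs)
                            if r == "T" and g == "C"]
            hits.append((start, methylated, unmethylated))
        start = conv_genome.find(conv_read, start + 1)
    return hits


genome = "ATGCTGCACTGACACGTGATGTCGTA"
print(align_and_call(genome, "TGATGTCGTA"))  # read that retained a C
print(align_and_call(genome, "TGATGTTGTA"))  # same locus, C converted
```

Keeping the original read alongside its converted copy is what allows the methylation call after the three-letter alignment.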
PASS implementation of bisulfite alignment
Simulated test set
Starting from three simulated hg19 reference genomes whose cytosines were randomly methylated on both DNA strands to obtain three methylation levels (0%, 50% and 100%), we generated six test sets of 1 million reads each (three for color-space and three for base-space data) using the dwgsim-0.1.8 program (ref.). The same procedure was applied to obtain the non-bisulfite-treated simulated test sets, except that the unmodified hg19 reference genome was used as input to dwgsim-0.1.8.
Parameters used: [ -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000 ]
The per base/color/flow error rate and the mutation rate were left at their default values (0.02 and 0.001, respectively). All simulated test sets were produced using the same seed, so they are comparable in number of reads, position and strand with respect to the human reference genome (hg19).
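How the cytosines were randomly methylated is not detailed on the slide. The sketch below shows one possible way, assuming each cytosine is methylated independently with probability equal to the target level; the function name `random_methylation` and the seed handling are hypothetical. Cytosines of the bottom strand are tracked as G positions on the top strand.

```python
import random


def random_methylation(seq, level, seed=0):
    """Pick methylated cytosines on both strands at the given level.

    Returns the set of 0-based top-strand positions that carry a
    methylated cytosine: C positions belong to the top strand, G positions
    to the bottom strand (where the complementary base is a C).
    """
    rng = random.Random(seed)  # fixed seed keeps the test sets comparable
    return {i for i, base in enumerate(seq.upper())
            if base in "CG" and rng.random() < level}


reference = "ATGCTGCACTGACACGTGATGTCGTA"
for level in (0.0, 0.5, 1.0):
    print(level, sorted(random_methylation(reference, level)))
```

With independent draws the 50% level is only reached on average; an exact fraction would require sampling a fixed number of sites instead.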
PASS implementation of bisulfite alignment General strategy 1. Find seeds in base space 2. Extend alignment in color space
SOLiD chemistry: ligation probes
Ligation probes are octamers. Ligation site, cleavage site and dye are spatially separated. The fluorescent dye interrogates the bases at the 1st + 2nd position.
Probe: A T n n n z z z (n = degenerate bases, z = universal bases)
4^5 = 1024 probes (256 probes per color)
[Figure: color matrix of 1st base (A, C, G, T) against 2nd base (A, C, G, T)]
2-base encoding is based on ligation sequencing rather than sequencing by synthesis. It takes advantage of fluorescently labeled 8-mer probes that distinguish the two 3'-most bases (AT in the figure). To obtain full coverage, repeated cycles of ligation are performed, using primers that anneal to different positions of the adapter sequence (see next slides).
SOLiD 4-color ligation: ligation reaction
[Figure: a 1 µm bead carries the P1 adapter and the template sequence; the universal seq primer is annealed to the adapter, and ligase joins one of the four dye-labeled probes (Y-, B-, G- or R-probe, format XXnnnzzz) to the primer.]
SOLiD 4-color ligation: visualization
[Figure: the ligated Y-probe is imaged; color Y is recorded for template positions 1-2.]
SOLiD ligation-based sequencing chemistry (2): image, cap unextended strands, cleave off fluor
SOLiD 4-color ligation: cleavage
[Figure: the dye-carrying end of the probe is cleaved off, leaving a phosphate (p) for the next ligation; color Y recorded for positions 1-2.]
SOLiD 4-color ligation: ligation (2nd cycle)
[Figure: ligase joins a second probe (Y-, B-, G- or R-probe, format XXnnnzzz) to the extended strand.]
SOLiD 4-color ligation: visualization (2nd cycle)
[Figure: colors Y and R recorded for positions 1-2 and 6-7.]
SOLiD 4-color ligation: cleavage (2nd cycle)
[Figure: the dye is cleaved off again; colors Y and R recorded for positions 1-2 and 6-7.]
SOLiD 4-color ligation interrogates every 4th-5th base
[Figure: with the universal seq primer, five ligation cycles record colors for positions 1-2, 6-7, 11-12, 16-17 and 21-22 (Y, R, R, B, G in the example).]
SOLiD 4-color ligation: reset
[Figure: the extended strand is removed, leaving the adapter oligo sequence and the template sequence on the bead.]
SOLiD 4-color ligation (1st cycle after reset)
[Figure: universal seq primer n-1, offset by one base relative to the first primer, is annealed to the adapter oligo sequence; the first ligated probe records color R for positions 0-1.]
SOLiD 4-color ligation (2nd round)
[Figure: with universal seq primer n-1, five cycles record colors R, R, R, B, G for positions 0-1, 5-6, 10-11, 15-16 and 20-21.]
Sequential rounds of sequencing: multiple cycles per round, with a reset between rounds
universal seq primer n: 1-2, 6-7, 11-12, 16-17, 21-22
universal seq primer n-1: 0-1, 5-6, 10-11, 15-16, 20-21
universal seq primer n+1 (spacer): 2-3, 7-8, 12-13, 17-18, 22-23
universal seq primer n+2 (spacer): 3-4, 8-9, 13-14, 18-19, 23-24
universal seq primer n+3 (spacer): 4-5, 9-10, 14-15, 19-20, 24-25
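The position pattern of the five rounds can be generated with a small sketch. The mapping from primer name to first interrogated position is read off the slide and should be treated as an assumption, as are the names `PRIMER_START` and `interrogated_pairs`.

```python
from collections import Counter

# First interrogated position for each primer (assumed from the slide labels).
PRIMER_START = {"n": 1, "n-1": 0, "n+1": 2, "n+2": 3, "n+3": 4}


def interrogated_pairs(start, cycles=5, step=5):
    """Base pairs read in one round: one dinucleotide every `step` bases."""
    return [(start + step * k, start + step * k + 1) for k in range(cycles)]


rounds = {primer: interrogated_pairs(start)
          for primer, start in PRIMER_START.items()}

# Count how many times each template position is interrogated overall.
coverage = Counter(pos for pairs in rounds.values()
                   for pair in pairs for pos in pair)

for primer, pairs in rounds.items():
    print(primer, pairs)
```

Taken together, the five rounds read every adjacent pair from 0-1 to 24-25 exactly once, so each internal base is interrogated twice, once in each of two different rounds.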
SOLiD Chemistry: Double Base Encoding
2 Base Pair Encoding Using 4 Dyes
Red-probe: A T n n n z z z
Blue-probe: T T n n n z z z
[Figure: color matrix of 1st base (A, C, G, T) against 2nd base (A, C, G, T)]
2 base pair encoding: reference alignment in color space
Base reference: A C G G T C G T C G T G T G C G T
Color reference: [row of colors in the original figure]
2 base pair encoding: reference alignment in color space
Reference: A C G G T C G T C G T G T G C G T
Observed: A C G G T C G C C G T G T G C G T
[Figure: reference, expected and observed color rows]
To be real, a SNP must be encoded by two adjacent color changes.
Advantages of 2 base pair encoding: miscall
Reference: A C G G T C G T C G T G T G C G T
Observed: A C G G T C G C T A C A C A T A C
[Figure: reference, expected and observed color rows, with the color matrix of 1st base against 2nd base]
A single color change represents a sequencing error.
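Both cases can be reproduced with a short sketch. It assumes the standard SOLiD dinucleotide code (A=0, C=1, G=2, T=3, color = XOR of the two base codes); the helper names `encode` and `decode` are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit base codes
BASE = {v: k for k, v in CODE.items()}


def encode(seq):
    """Base space to color space: one color per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Color space to base space, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


reference = "ACGGTCGTCGTGTGCGT"
ref_colors = encode(reference)

# Real SNP (T to C at base 7): two adjacent colors change.
snp_colors = encode("ACGGTCGCCGTGTGCGT")
snp_diff = [i for i, (r, o) in enumerate(zip(ref_colors, snp_colors)) if r != o]

# Miscall: one color changes, and naive decoding corrupts every later base.
miscall_colors = list(ref_colors)
miscall_colors[6] = 3
miscall_bases = decode("A", miscall_colors)
print(snp_diff, miscall_bases)
```

The decoded miscall matches the observed row on the slide, which is why an isolated color mismatch is treated as an error and corrected in color space before translating to bases.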
But there is more
Only certain transitions are allowed for a real SNP.
Consider a triplet of bases, which defines 2 colors: C A T
There are only 3 possibilities for a change in the middle base, hence only 3 possibilities for the 2 colors to change to. Any of the other 6 possibilities for a 2-color change is not allowed and most probably represents a measurement error.
The only allowed transitions
C A T → C G T: reverse colors
C A T → C C T or C T T: the other two colors (both orientations)
Any other transition would require the outer two bases to change.
Not allowed transitions
[Figure: color matrix of 1st base against 2nd base, with the triplets A G T, T C T, G T T, C G C, C C A and C T G]
1/3 allowed vs 2/3 not allowed
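The 1/3 vs 2/3 split can be verified by enumeration. The sketch assumes the standard SOLiD dinucleotide code (A=0, C=1, G=2, T=3, color = XOR of the two base codes); the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit base codes


def colors(triplet):
    """The two colors defined by a triplet of bases."""
    a, b, c = (CODE[x] for x in triplet)
    return (a ^ b, b ^ c)


def allowed_changes(triplet):
    """Color pairs reachable by changing only the middle base."""
    first, middle, last = triplet
    return sorted(colors(first + b + last) for b in "ACGT" if b != middle)


def two_color_changes(triplet):
    """All color pairs in which both colors differ from the original."""
    c1, c2 = colors(triplet)
    return [(x, y) for x in range(4) for y in range(4) if x != c1 and y != c2]


original = colors("CAT")
allowed = allowed_changes("CAT")
not_allowed = [p for p in two_color_changes("CAT") if p not in allowed]
print(original, allowed, len(not_allowed))
```

With this code, an allowed pair keeps the XOR of the two colors unchanged, because that value depends only on the two outer bases.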
SOLiD Exact Call Chemistry (ECC)
ECC performs an extra round of ligations with 3-base encoding. This serves as an accuracy control, improving the quality of the sequence in color space. It can also return a sequence in base space with good accuracy.
PASS implementation of bisulfite alignment (Davide Campagna) General strategy 1. Find seeds in base space 2. Extend alignment in color space 3. Rescue unaligned reads using a reference with the combination of methylated patterns