Overview sequence projects
Bioassist NGS meeting, 15-01-2010
Barbera van Schaik
KEBB - Bioinformatics Laboratory
b.d.vanschaik@amc.uva.nl
NGS at the Academic Medical Center
  Sequence facility, Laboratory Division
  Bioinformatics Laboratory - KEBB
  Roche (454) sequencer
  ABI SOLiD sequencer
IT resources
  Sequence laboratory
    Roche on-rig
    Roche cluster
    SOLiD on-instrument cluster
    Data analysis server (4 dual core, 5 TB)
  Bioinformatics laboratory
    Linux server 1: 16 quad core
    Linux server 2: 8 quad core
    Shared file system, 2 TB
  Other
    Biostatistics cluster (specs?)
    Central storage (ICT department)
    Backups at SARA
    Dutch grid (VLe, EBioInfra, VBrowser, Moteur)
Software
  Software we use
    blat (local and on grid)
    blast (on grid)
    Roche package
    Celera Assembler (Cabog)
    R
    SOLiD RNApipeline
  Programming language
    Perl and shell scripting
    Java
NGS data analysis at the AMC
  Basics
    Group sequences per MID/barcode, primer, or any other sequence
    Count things
    Run existing analysis software (loops, file handling)
    Calculate read coverage (to load into the genome browser)
  Projects
    Mutation screening
    T- and B-cell variants
    Virus discovery
    Alternative splicing
    Bacterial genomes
    microRNA expression
    ...
  Several departments
    Rheumatology / Immunology
    Virus Discovery Unit
    Neurogenetics
    Neurology / Medical microbiology
    Experimental virology / Sequence lab
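The "calculate read coverage" step can be illustrated with a minimal Python sketch. It assumes that read alignments are already available as (chromosome, start, end) tuples with 0-based, half-open coordinates; the function names and the choice of bedGraph-style output are illustrative assumptions, not the scripts used at the AMC.

```python
from collections import defaultdict


def coverage_intervals(alignments):
    """Turn aligned read intervals into (chrom, start, end, depth) rows.

    alignments: iterable of (chrom, start, end), 0-based and half-open
    (an assumed input format). Regions with zero depth are omitted.
    """
    # Record +1 where a read starts and -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in alignments:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    rows = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0 and pos > prev:
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # Extend the previous row when the depth did not change.
                    rows[-1] = (chrom, last[1], pos, depth)
                else:
                    rows.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return rows


def to_bedgraph_lines(rows):
    """Format rows as tab-separated bedGraph-style lines."""
    return ["\t".join(str(field) for field in row) for row in rows]


if __name__ == "__main__":
    example = [("chr1", 0, 10), ("chr1", 5, 15), ("chr2", 3, 8)]
    for line in to_bedgraph_lines(coverage_intervals(example)):
        print(line)
```

The sweep over start and end events avoids storing one counter per base, which keeps memory use proportional to the number of reads rather than to the genome length.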
T and B cell variation Rheumatology / Immunology
TCR
Rheumatoid arthritis
http://en.wikipedia.org/wiki/t_cell_receptor
Total theoretical variation
Paul Klarenbeek
Goal: identify and enumerate TCR variants
  Thymocytes
  Germline DNA
  mRNA
  CDR3 region: unique for each clonal expansion
  Paul Klarenbeek
T-cell pipeline
  [Figure: transcript layout, labels 5', C, V, N, D, N, J, polyA; variable region 30-60 bp]
  Steps
    Convert sff to fasta + quality scores
    Identify: MIDs, primers
    Sort sequences based on MID and region
    Identify the V, J and C segments
    Count variants
    Locate highly variable area
    Quality control
  Tools
    Perl scripts
    Roche software
    BLAT
    Access/Excel
  Also applicable to B-cell variants (Marieke Doorenspleet)
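The "count variants" step could look roughly like the Python sketch below. It assumes that earlier steps have already assigned each read a MID, a V segment, a J segment and a CDR3 sequence; the record layout, the segment names in the example and both function names are hypothetical and only serve the illustration.

```python
from collections import Counter, defaultdict


def count_variants(records):
    """Count identical (V, J, CDR3) combinations per MID.

    records: iterable of (mid, v_segment, j_segment, cdr3_sequence),
    an assumed layout for reads that were already annotated.
    """
    counts = defaultdict(Counter)
    for mid, v_segment, j_segment, cdr3 in records:
        # Upper-case the sequence so that case differences do not split a variant.
        counts[mid][(v_segment, j_segment, cdr3.upper())] += 1
    return counts


def clone_frequencies(counter):
    """Return (variant, read_count, fraction_of_reads), most abundant first."""
    total = sum(counter.values())
    return [(variant, n, n / total) for variant, n in counter.most_common()]


if __name__ == "__main__":
    # Hypothetical annotated reads, not real data.
    reads = [
        ("MID1", "V1", "J1", "TGTGCC"),
        ("MID1", "V1", "J1", "tgtgcc"),
        ("MID1", "V2", "J1", "TGTAAA"),
        ("MID2", "V1", "J1", "TGTGCC"),
    ]
    for mid, counter in sorted(count_variants(reads).items()):
        for variant, n, fraction in clone_frequencies(counter):
            print(mid, variant, n, round(fraction, 3))
```

Keeping the counts per MID means that each sample can be inspected separately, and the sorted frequency list makes dominant clonal expansions easy to spot.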
Virus discovery Virus discovery unit
VIDISCA
  Extract virus DNA and RNA
  Digest DNA
  Amplify
  Sequence with selective primers
  Selective PCR (16 primer combinations)
  Direct 454 sequencing
  Michel de Vries
Blast on grid with e-bioinfra
Splice variant detection Neurogenetics
Splice variant detection
  [Figure: exon diagrams]
  WT: 1 2 3 4 5 6 7 8 9 10 11 11b 12
  tissue-specific: 1 2 3 4 5 6 7 8 9 10 11 11b 12
  tissue-specific frameshift: 1 2 3 4 5 6 7 8 9 10 11 11c 12
  Katja Ritz
Program overview
  [Flowchart]
  Sequence run: all data
  Split data
    Sequence set 1, Sequence set 2, Sequence set 3, ..., Sequence set n
  Merge identical sequences
    Reduced sequence set 1, Reduced sequence set 2, Reduced sequence set 3, ..., Reduced sequence set n
  Compare all sequences within one set
    Submit grid jobs for each combination
      cmd> blat S1 S1 > blat.out
      cmd> R_graph.pl blat.out
    All jobs finished? yes / no (Wait)
    Collect output
  Check if groups are correct
  Blat groups against genome
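The data preparation around the grid jobs (merge identical sequences, split the data, list the combinations to compare) can be sketched in Python as follows. The function names, the chunking by a fixed size and the pairing scheme are assumptions made for illustration; the sketch does not submit jobs or call blat.

```python
from itertools import combinations_with_replacement


def merge_identical(reads):
    """Collapse identical sequences.

    reads: iterable of (read_id, sequence).
    Returns a dict mapping each distinct sequence to the ids of its reads.
    """
    groups = {}
    for read_id, sequence in reads:
        groups.setdefault(sequence.upper(), []).append(read_id)
    return groups


def split_into_chunks(items, chunk_size):
    """Split a collection into consecutive chunks of at most chunk_size items."""
    items = list(items)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def job_pairs(n_chunks):
    """List every chunk-versus-chunk combination, including a chunk against itself."""
    return list(combinations_with_replacement(range(n_chunks), 2))


if __name__ == "__main__":
    reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTTT")]
    reduced = merge_identical(reads)
    chunks = split_into_chunks(reduced, chunk_size=1)
    for a, b in job_pairs(len(chunks)):
        # One comparison job per pair of chunks.
        print("compare chunk", a, "against chunk", b)
```

Merging identical reads first reduces the number of pairwise comparisons, and enumerating unordered pairs avoids running the same comparison twice.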
Example of output
Comparison of bacterial genomes Neurology / Medical microbiology
Comparison of bacterial strains between two groups of patients with meningitis (good vs bad outcome)
  Whole genome sequencing (20 strains are sequenced)
  Sort samples per MID
  Genome assembly with Cabog and Newbler (de novo and with reference sequences)
  Genome annotation using the Comprehensive Microbial Resource
  Differences between strains will be detected using a DNA-only detection tool
  Jurgen Piet
Sequence assembly
  MID sorting: allowing for 0 or 2 errors
  Newbler 2.0
  Newbler 2.3
  Cabog 5.4
  Cabog 6-beta
  Note that the command-line and GUI interfaces of Newbler give different results! (Check tips&tricks on Bioassist wiki)
  Combine assemblies?
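MID sorting with an error allowance can be illustrated with the Python sketch below. It assumes that the MID sits at the very start of each read and counts substitutions only (Hamming distance); handling insertions and deletions would need an edit distance instead. The MID sequences in the example are made up, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_mid(read, mids, max_errors=0):
    """Return the name of the best-matching MID, or None.

    mids: dict mapping MID name to MID sequence.
    A read is left unassigned when no MID is within max_errors
    mismatches or when two MIDs match equally well.
    """
    best_name, best_dist, tie = None, None, False
    for name, tag in mids.items():
        prefix = read[:len(tag)]
        if len(prefix) < len(tag):
            continue  # read is shorter than the MID
        dist = hamming(prefix, tag)
        if dist > max_errors:
            continue
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True
    return None if tie else best_name


def sort_by_mid(reads, mids, max_errors=0):
    """Group read ids per assigned MID; unassigned reads go under None."""
    bins = {}
    for read_id, sequence in reads:
        bins.setdefault(assign_mid(sequence, mids, max_errors), []).append(read_id)
    return bins


if __name__ == "__main__":
    # Made-up MID sequences for illustration only.
    mids = {"MID_A": "ACGTACGT", "MID_B": "TGCATGCA"}
    reads = [("r1", "ACGTACGTGGGG"), ("r2", "ACGAACGAGGGG")]
    print(sort_by_mid(reads, mids, max_errors=0))
    print(sort_by_mid(reads, mids, max_errors=2))
```

Running the sort with both settings shows the trade-off: a stricter allowance discards more reads, while a looser one recovers reads at the risk of assigning them to the wrong sample.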
Keep track of projects Bioinformatics laboratory
Communication - log - file sharing
People and resources
  [Diagram labels: Sequencing facility, operators, SeqLab server, DNA sequencers, workstations, sequence data, results, analyses, algorithm developers, workflow users, BioLab server, Bioinformaticians, biomedical researchers, Research laboratories]
People and labs (selection)
  Rheumatology / immunology: Paul Klarenbeek, Marieke Doorenspleet, Niek de Vries
  Virus discovery unit: Michel de Vries, Martin Deijs, Lia van der Hoek
  Neurology / Medical microbiology: Jurgen Piet, Ewout Jansen, Diederik van de Beek, Arie van der Ende
  Clinical genetics: Olaf Mook, Jean Soucy
  Neurogenetics / Sequence facility: Katja Ritz, Marja Jakobs, Ted Bradley, Frank Baas
  Bioinformatics laboratory - KEBB: Angela Luyf, Marcel Willemsen, Barbera van Schaik, Silvia D Olabarriaga, Antoine van Kampen