Introduction to NGS data analysis

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Introduction to NGS data analysis"

Transcription

1 Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics

2 Sequencing Illumina platforms Characteristics: High throughput (3 genomes). Paired end. High accuracy. Read length 2 125bp. Relatively long run time (6 days). Relatively expensive. Figure 1: HiSeq Workshop NGS, Hogeschool Leiden 1/50 Thursday, May 22, 2014

3 Sequencing Illumina platforms Characteristics: Moderate throughput (3 exomes). Paired end. High accuracy. Read length 2 300bp. Relatively short run time (3 days). Relatively expensive. Figure 2: MiSeq. Workshop NGS, Hogeschool Leiden 2/50 Thursday, May 22, 2014

4 Sequencing Illumina platforms Figure 3: Flowcell. Figure 4: Amplification. Workshop NGS, Hogeschool Leiden 3/50 Thursday, May 22, 2014

5 Sequencing Illumina platforms Figure 3: Flowcell. Optical system. DNA fragments are loaded on a flowcell. Fragments are amplified. Clusters of fragments give the same signal. Figure 4: Amplification. Workshop NGS, Hogeschool Leiden 3/50 Thursday, May 22, 2014

6 Sequencing Illumina platforms Figure 5: Image of tile (part of the flowcell). Workshop NGS, Hogeschool Leiden 4/50 Thursday, May 22, 2014

7 Sequencing Illumina platforms Figure 6: Base calling on Illumina systems. Workshop NGS, Hogeschool Leiden 5/50 Thursday, May 22, 2014

8 Sequencing Illumina platforms Figure 7: Paired end. Workshop NGS, Hogeschool Leiden 6/50 Thursday, May 22, 2014

9 Sequencing Illumina platforms Advantages: Mapping to non-unique regions. Structural variation. De novo assembly (scaffolding). Figure 7: Paired end. Workshop NGS, Hogeschool Leiden 6/50 Thursday, May 22, 2014

10 Life-Tech Sequencing Characteristics: Low throughput. Single end (for now). High accuracy. Read length ±400bp. Short run time (1 day). Cheap runs. Figure 8: Ion torrent. Workshop NGS, Hogeschool Leiden 7/50 Thursday, May 22, 2014

11 Life-Tech Sequencing Figure 9: Ion proton. Characteristics: Moderate throughput (1 exome). Single end (for now). High accuracy. Read length ±175bp. Short run time (1 day). Cheap runs. Workshop NGS, Hogeschool Leiden 8/50 Thursday, May 22, 2014

12 Life-Tech Sequencing Figure 10: Ion Torrent chip. Figure 11: Ion Proton chip. Workshop NGS, Hogeschool Leiden 9/50 Thursday, May 22, 2014

13 Life-Tech Sequencing Figure 12: Close up of an Ion chip. Workshop NGS, Hogeschool Leiden 10/50 Thursday, May 22, 2014

14 Life-Tech Sequencing Figure 14: Base calling. Figure 13: Loading a sample. Problems with mono nucleotide stretches. Workshop NGS, Hogeschool Leiden 11/50 Thursday, May 22, 2014

15 Sequencing Pacific Biosciences Figure 15: PacBio RS. Workshop NGS, Hogeschool Leiden 12/50 Thursday, May 22, 2014

16 Sequencing Pacific Biosciences Characteristics: Long reads (50% is longer than 10,000 bp). High error rate (15-20%). Low throughput (comparable with the Roche 454). Workshop NGS, Hogeschool Leiden 13/50 Thursday, May 22, 2014

17 Sequencing Pacific Biosciences Characteristics: Long reads (50% is longer than 10,000 bp). High error rate (15-20%). Low throughput (comparable with the Roche 454). Circular consensus sequencing. Sequence the same molecule several times. Extremely high accuracy. Acceptable read length (±250bp). Workshop NGS, Hogeschool Leiden 13/50 Thursday, May 22, 2014

18 Sequencing Pacific Biosciences Figure 16: Real time polymerase monitoring. Workshop NGS, Hogeschool Leiden 14/50 Thursday, May 22, 2014

19 Sequencing Pacific Biosciences Figure 17: Base calling on the PacBio. Workshop NGS, Hogeschool Leiden 15/50 Thursday, May 22, 2014

20 Data analysis Next generation sequencing data 2 TTCGGGGGCTGGCAAATCCACTTCCGTGACACGCTACCATTCGCTGGTGGT , &07+03:54/ CGGTAAACCACCCTGCTGACGGAACCCTAATGCGCCTGAAAGACAGCGTTC / (.55:;:99(0(+2(22(0316;185;;0;:<<>=AA59 10 TCGTTAACGACTTTGTTCGCCACCGCAACCGCCTGTTTCGGGTCACAGGCA ;5? 14 TTGATGAATATATTATTTCAGGGAATAATTATGACACCTTTAGAACGCATT Listing 1: A FastQ file. Workshop NGS, Hogeschool Leiden 16/50 Thursday, May 22, 2014

21 Data analysis Common data analysis techniques Reference based techniques: Resequencing. Exome sequencing. Transcriptome sequencing (RNA). Full genome sequencing. Structural variation. Copy number variation. Not reference based: De novo assembly. k-mer profiling. Workshop NGS, Hogeschool Leiden 17/50 Thursday, May 22, 2014

22 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

23 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

24 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

25 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

26 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. 4. Filtering. Post-variant calling quality control. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

27 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. 4. Filtering. Post-variant calling quality control. 5. Annotation. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

28 Pre-alignment Data cleaning Depending on the sequencing platform, parts of the reads need to be removed. Remove linker sequences (Cutadapt, FASTX toolkit). Clip low quality reads at the end of the read (Sickle, Trimmomatic, FASTX toolkit). Length filtering (Fastools). toolkit/ Workshop NGS, Hogeschool Leiden 19/50 Thursday, May 22, 2014

29 Trimming Pre-alignment Figure 18: Quality score per position. Workshop NGS, Hogeschool Leiden 20/50 Thursday, May 22, 2014

30 Clipping Pre-alignment Figure 19: Sequencing linkers. Workshop NGS, Hogeschool Leiden 21/50 Thursday, May 22, 2014

31 Pre-alignment Quality control The FastQC toolkit can be used for quality control (both before and after the data cleaning step). GC content. GC distribution. Quality scores distribution Workshop NGS, Hogeschool Leiden 22/50 Thursday, May 22, 2014

32 Figure 21: Per sequence quality. Workshop NGS, Hogeschool Leiden 23/50 Thursday, May 22, 2014 Pre-alignment Example QC output Figure 20: Per base sequence content.

33 Alignment The best match to the reference genome Very efficient. The reference genome needs to be indexed. Finding an alignment is as easy as looking up a word in a dictionary. Figure 22: Visualisation of an alignment. Workshop NGS, Hogeschool Leiden 24/50 Thursday, May 22, 2014

34 Alignment Choose an aligner Not all aligners can deal with indels. Only a couple of years ago, only SNPs were considered. Bowtie Workshop NGS, Hogeschool Leiden 25/50 Thursday, May 22, 2014

35 Alignment Choose an aligner Not all aligners can deal with indels. Only a couple of years ago, only SNPs were considered. Bowtie. Few aligners can work with large deletions. Spliced RNA. GMAP / GSNAP. Tophat. BWA-MEM Workshop NGS, Hogeschool Leiden 25/50 Thursday, May 22, 2014

36 Alignment Choose an aligner The choice of aligner may be restricted by the sequencer. For the Ion Torrent: Tmap. Combination of three different aligners. Deals with errors in homopolymer stretches. For the PacBio: BLASR. Workshop NGS, Hogeschool Leiden 26/50 Thursday, May 22, 2014

37 Variant calling Consistent deviations from the reference Figure 23: Result of an alignment. Workshop NGS, Hogeschool Leiden 27/50 Thursday, May 22, 2014

38 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

39 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

40 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. RNA. Allele specific expression. RNA editing. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

41 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. RNA. Allele specific expression. RNA editing. Strand specific sampleprep. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

42 Variant calling Choice of variant caller Rules of thumb: Well known organism and experiment: Statistical model. Use a simpler variant caller otherwise Workshop NGS, Hogeschool Leiden 29/50 Thursday, May 22, 2014

43 Variant calling Choice of variant caller Rules of thumb: Well known organism and experiment: Statistical model. Use a simpler variant caller otherwise. Popular variant callers: Samtools. GATK. VarScan Workshop NGS, Hogeschool Leiden 29/50 Thursday, May 22, 2014

44 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

45 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. We filter for a maximum coverage because of copy number variation. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

46 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. We filter for a maximum coverage because of copy number variation. A good way to calculate the maximum: Calculate the mean coverage. Only of the covered (targeted) regions. Multiply this number with a reasonable factor e.g., 2.5. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

47 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

48 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

49 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

50 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Is it in the 5 /3 UTR of a gene?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

51 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Is it in the 5 /3 UTR of a gene?... Is it in a regulatory region?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

52 Full genome sequencing Copy number variation Figure 24: Coverage patterns over a whole chromosome. Workshop NGS, Hogeschool Leiden 32/50 Thursday, May 22, 2014

53 Full genome sequencing Copy number variation Per sample: The reference needs to be very good. Sequencability biases. Mapping biases. Workshop NGS, Hogeschool Leiden 33/50 Thursday, May 22, 2014

54 Full genome sequencing Copy number variation Per sample: The reference needs to be very good. Sequencability biases. Mapping biases. Within a population: Mixture of distributions. Not sensitive to aforementioned biases. Needs a lot of controls. Workshop NGS, Hogeschool Leiden 33/50 Thursday, May 22, 2014

55 Full genome sequencing Structural variation Figure 25: Multiple issues while mapping. Workshop NGS, Hogeschool Leiden 34/50 Thursday, May 22, 2014

56 Full genome sequencing Structural variation Figure 26: Discordant and split reads. Workshop NGS, Hogeschool Leiden 35/50 Thursday, May 22, 2014

57 Full genome sequencing Structural variation Figure 27: Different types of structural variation. Workshop NGS, Hogeschool Leiden 36/50 Thursday, May 22, 2014

58 Assesmbly De Novo assembly Figure 28: General overview of de novo assembly. Workshop NGS, Hogeschool Leiden 37/50 Thursday, May 22, 2014

59 Assesmbly De Novo assembly Figure 29: Overlaps are used to reconstruct a genome. Workshop NGS, Hogeschool Leiden 38/50 Thursday, May 22, 2014

60 Scaffolding De Novo assembly Figure 30: Paired end or mate pair reads can be used. Workshop NGS, Hogeschool Leiden 39/50 Thursday, May 22, 2014

61 De Novo assembly Easier assembly with PacBio Figure 31: Correcting PacBio reads. Workshop NGS, Hogeschool Leiden 40/50 Thursday, May 22, 2014

62 Pipelines Pipelines Figure 32: A real-life pipeline. Workshop NGS, Hogeschool Leiden 41/50 Thursday, May 22, 2014

63 Pipelines Pipelines Figure 33: Scene from Modern times. Workshop NGS, Hogeschool Leiden 42/50 Thursday, May 22, 2014

64 Pipelines Pipelines Combining tools: The output of one tool can serve as the input for another. Not necessarily linear.... Workshop NGS, Hogeschool Leiden 43/50 Thursday, May 22, 2014

65 Pipelines Pipelines Combining tools: The output of one tool can serve as the input for another. Not necessarily linear.... Running various different tools: Two or three different aligners. A couple of variant callers.... Workshop NGS, Hogeschool Leiden 43/50 Thursday, May 22, 2014

66 Pipelines Combining tools 1 bwa aln t 8 $reference $i > $i. sai 2 bwa samse $reference $i. sai $i > $i.sam 3 samtools view bt $reference o $i.bam $i.sam Listing 2: Shell script Workshop NGS, Hogeschool Leiden 44/50 Thursday, May 22, 2014

67 Pipelines Combining tools 1 bwa aln t 8 $reference $i > $i. sai 2 bwa samse $reference $i. sai $i > $i.sam 3 samtools view bt $reference o $i.bam $i.sam Listing 2: Shell script 1 %. sai : %.fq 2 $ (BWA) aln t $ (THREADS) $ ( c a l l MKREF, $< > 3 4 %.sam: %. sai %.fq 5 $ (BWA) samse $ ( c a l l MKREF, $ ˆ > 6 7 %.bam: %.sam 8 $ (SAMTOOLS) view bt $ ( c a l l MKREF, o $< Listing 3: Makefile Workshop NGS, Hogeschool Leiden 44/50 Thursday, May 22, 2014

68 Graphical interfaces Galaxy Galaxy: a graphical user interface: Wrapper for command line utilities. User friendly. Point and click. Workshop NGS, Hogeschool Leiden 45/50 Thursday, May 22, 2014

69 Galaxy Graphical interfaces Galaxy: a graphical user interface: Wrapper for command line utilities. User friendly. Point and click. Workflows. Save all the steps you did in your analysis. Rerun the entire analysis on a new dataset. Share your workflow with other people Workshop NGS, Hogeschool Leiden 45/50 Thursday, May 22, 2014

70 Galaxy Graphical interfaces Figure 34: Galaxy main user interface Workshop NGS, Hogeschool Leiden 46/50 Thursday, May 22, 2014

71 Galaxy Graphical interfaces Figure 35: User friendly interface with Galaxy Workshop NGS, Hogeschool Leiden 47/50 Thursday, May 22, 2014

72 Graphical interfaces Workflow of a parallel pipeline Makefile blueprint all create_report.secondary Makefile.SUFFIXES.DEFAULT recursive clean MC0184ACXX_s_1.h_cov MC0184ACXX_s_1.tar.gz MC0184ACXX_s_1.final.snponly.vcf.annotation MC0184ACXX_s_1.final.indelonly.vcf.annotation MC0184ACXX_s_1.covPerTarget MC0184ACXX_s_1.hsMetrics MC0184ACXX_s_1.mtdna_mincov.bed MC0184ACXX_s_1.asomes_avgcov.bed MC0184ACXX_s_1.xysomes_avgcov.bed MC0184ACXX_s_1.final.vcf.ti-tv MC0184ACXX_s_1.final.snponly.vcf MC0184ACXX_s_1.final.indelonly.vcf MC0184ACXX_s_1.final.vcf.vdb_annotation /sureselect_whole_exome_v2.hg19.cbr.bed.interval MC0184ACXX_s_1.asomes_mincov.bed MC0184ACXX_s_1.xysomes_mincov.bed MC0184ACXX_s_1.asomes_cnvhigh.bed MC0184ACXX_s_1.xysomes_cnvhigh.bed MC0184ACXX_s_1.final.vcf MC0184ACXX_s_1.insertSizesHist.pdf MC0184ACXX_s_1.pileup MC0184ACXX_s_1.insertSizes MC0184ACXX_s_1.mincov.bed MC0184ACXX_s_1.asomes_cutoff_steepcov.bed MC0184ACXX_s_1.xysomes_cutoff_steepcov.bed MC0184ACXX_s_1.asome.vcf MC0184ACXX_s_1.xysome.vcf MC0184ACXX_s_1.mtdna.vcf MC0184ACXX_s_1.bases.aligned MC0184ACXX_s_1.wig MC0184ACXX_s_1.asome_cutoff.vcf MC0184ACXX_s_1.xysome_cutoff.vcf MC0184ACXX_s_1.mtdna_cutoff.vcf MC0184ACXX_s_1.sorted.dedup.bam.flagstat MC0184ACXX_s_1.readsPerChr MC0184ACXX_s_1.sorted.dedup.bam.bai MC0184ACXX_s_1.sorted.dedup.bam.pileup MC0184ACXX_s_1.asome.targeted.avg-cov MC0184ACXX_s_1.xysome.targeted.avg-cov MC0184ACXX_s_1.bases.loose-ontarget MC0184ACXX_s_1.loose-ontarget.bam.flagstat MC0184ACXX_s_1.bases.strong-ontarget MC0184ACXX_s_1.strong-ontarget.bam.flagstat MC0184ACXX_s_1.bcf MC0184ACXX_s_1.sorted.dedup.bam.sorted.dedup.bam H.Sapiens/hg19/reference.fa /sureselect_whole_exome_v2.hg19.cbr.bed MC0184ACXX_s_1.sorted.dedup.bam MC0184ACXX_s_1.sorted.dedup.bam.sorted.bam MC0184ACXX_s_1.sorted.bam MC0184ACXX_s_1_1_sequence.filtered_fastqc MC0184ACXX_s_1_1_sequence_fastqc MC0184ACXX_s_1.sorted.dedup.bam.parts MC0184ACXX_s_1.parts 000.MC0184ACXX_s_1_1_sequence.part 000.MC0184ACXX_s_1_2_sequence.part MC0184ACXX_s_1_2_sequence.filtered_fastqc MC0184ACXX_s_1_1_sequence.part000 MC0184ACXX_s_1_2_sequence.part000 MC0184ACXX_s_1_2_sequence_fastqc MC0184ACXX_s_1_1_sequence.sync MC0184ACXX_s_1_2_sequence.sync MC0184ACXX_s_1_1_sequence.filtered MC0184ACXX_s_1_2_sequence.filtered MC0184ACXX_s_1_1_sequence.trimmed MC0184ACXX_s_1_2_sequence.txt.recaldata MC0184ACXX_s_1_1_sequence.txt.recaldata MC0184ACXX_s_1_2_sequence.trimmed MC0184ACXX_s_1_1_sequence.txt MC0184ACXX_s_1_2_sequence.txt Figure 36: Dependency diagram. Workshop NGS, Hogeschool Leiden 48/50 Thursday, May 22, 2014

73 Graphical interfaces Workflow of a parallel pipeline MC0184ACXX_s_1.sorted.dedup.bam MC0184ACXX_s_1.sorted.bam MC0184ACXX_s_1_1_sequenc MC0184ACXX_s_1.sorted.dedup.bam.parts MC0184ACXX_s_1.parts 000.MC0184ACXX_s_1_1_sequence.part 000.MC0184ACXX_s_1_2_sequence.part MC0184ACXX_s_1_1_sequence.part000 MC0184ACXX_s_1_2_sequence.part000 MC0184ACXX_s_1_1_sequence.sync MC0184ACXX_s_1_2_sequence.sync MC0184ACXX_s_1_1_sequence.filtered MC0184ACXX_s_1_2_sequence.filtered MC0184ACXX_s_1_1_sequence.trimmed MC0184ACXX_s_1_2_sequence.txt.recaldata MC0184ACXX_s_1 MC0184ACXX_s_1_1_sequence.txt Figure 37: Zoomed in. Workshop NGS, Hogeschool Leiden 49/50 Thursday, May 22, 2014

74 Questions? Acknowledgements: Michiel van Galen Martijn Vermaat Johan den Dunnen Workshop NGS, Hogeschool Leiden 50/50 Thursday, May 22, 2014

Introduction to next-generation sequencing data

Introduction to next-generation sequencing data Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977

More information

Next Gen Sequencing Summary of the short course Next Gen Sequencing at Avans hogeschool, Breda. 24/04/2013 Next gen Sequencing technologies

Next Gen Sequencing Summary of the short course Next Gen Sequencing at Avans hogeschool, Breda. 24/04/2013 Next gen Sequencing technologies Next Gen Sequencing Summary of the short course Next Gen Sequencing at Avans hogeschool, Breda 24/04/2013 Next gen Sequencing technologies 1 2nd Gen Sequencing Summary of the short course Next Gen Sequencing

More information

Writing & Running Pipelines on the Open Grid Engine using QMake. Wibowo Arindrarto DTLS Focus Meeting 15.04.2014

Writing & Running Pipelines on the Open Grid Engine using QMake. Wibowo Arindrarto DTLS Focus Meeting 15.04.2014 Writing & Running Pipelines on the Open Grid Engine using QMake Wibowo Arindrarto DTLS Focus Meeting 15.04.2014 Makefile (re)introduction Atomic recipes / rules that define full pipelines Initially written

More information

Next generation sequencing (NGS) Bioinformatics Challenges and strategies. Urmi Trivedi Lead Bioinformatician

Next generation sequencing (NGS) Bioinformatics Challenges and strategies. Urmi Trivedi Lead Bioinformatician Next generation sequencing (NGS) Bioinformatics Challenges and strategies Urmi Trivedi Lead Bioinformatician urmi.trivedi@ed.ac.uk Major Bottlenecks Data volume Data complexity Data noise Overview Solutions

More information

NGS data analysis. Bernardo J. Clavijo

NGS data analysis. Bernardo J. Clavijo NGS data analysis Bernardo J. Clavijo 1 A brief history of DNA sequencing 1953 double helix structure, Watson & Crick! 1977 rapid DNA sequencing, Sanger! 1977 first full (5k) genome bacteriophage Phi X!

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Next generation DNA sequencing technologies. theory & prac-ce

Next generation DNA sequencing technologies. theory & prac-ce Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

INTRODUCTION TO NGS VARIANT CALLING ANALYSIS

INTRODUCTION TO NGS VARIANT CALLING ANALYSIS Hospital Universitari Vall d Hebron Institut de Recerca - VHIR Institut d Investigació Sanitària de l Instituto de Salud Carlos III (ISCIII) INTRODUCTION TO NGS VARIANT CALLING ANALYSIS Bioinformàtica

More information

Copy Number Variation: available tools

Copy Number Variation: available tools Copy Number Variation: available tools Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction A literature review of available

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

E. coli plasmid and gene profiling using Next Generation Sequencing

E. coli plasmid and gene profiling using Next Generation Sequencing E. coli plasmid and gene profiling using Next Generation Sequencing Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction General

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Matthew D. Clark PhD Group leader Genomics, The Genome Analysis Centre Norwich, UK Costs & disrup,ve technologies 454 & polony Solexa & SOLiD End of the gold rush? GAII HiSeq

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

Next Generation Sequencing. Tobias Österlund

Next Generation Sequencing. Tobias Österlund Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 12/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 25/2 08.00-09.45

More information

MiSeq: Imaging and Base Calling

MiSeq: Imaging and Base Calling MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please

More information

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Data Analysis & Management of High-throughput Sequencing Data Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Current Issues Current Issues The QSEQ file Number files per

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Avans Hogeschool Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome

More information

Next Generation Sequencing I: Technologies. Jim Noonan Department of Genetics

Next Generation Sequencing I: Technologies. Jim Noonan Department of Genetics Next Generation Sequencing I: Technologies Jim Noonan Department of Genetics Sequence as the readout for biological processes Determining the biological state of cells, tissues and organisms requires the

More information

Analysis of NGS Data

Analysis of NGS Data Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference

More information

FastQC 1. Introduction 1.1 What is FastQC

FastQC 1. Introduction 1.1 What is FastQC FastQC 1. Introduction 1.1 What is FastQC Modern high throughput sequencers can generate tens of millions of sequences in a single run. Before analysing this sequence to draw biological conclusions you

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

Next Generation Sequencing for Invertebrate Virus Discovery

Next Generation Sequencing for Invertebrate Virus Discovery Next Generation Sequencing for Invertebrate Virus Discovery -a practical approach Sijun Liu & Bryony C. Bonning Iowa State University, USA 8-14-2013 SIP Pittsburgh Outline Introduction: Why use NGS? Traditional

More information

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. To published results faster With proven scalability To the forefront of discovery To limitless applications

More information

INTRODUCTION TO NEXT-GENERATION SEQUENCING

INTRODUCTION TO NEXT-GENERATION SEQUENCING INTRODUCTION TO NEXT-GENERATION SEQUENCING Louis Letourneau, eng. and Mathieu Bourgey, Ph.D CAGEKID Cancer Genomics Workshop March 18 th - 21 st CAGEKID M12 Meeting - 6 May 2011 My work Many facets of

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

Single Nucleotide Polymorphism (SNP) Calling from Next-Gen Sequencing (NGS) data for Bacterial Phylogenetics

Single Nucleotide Polymorphism (SNP) Calling from Next-Gen Sequencing (NGS) data for Bacterial Phylogenetics Single Nucleotide Polymorphism (SNP) Calling from Next-Gen Sequencing (NGS) data for Bacterial Phylogenetics Taj Azarian, MPH Doctoral Student Department of Epidemiology College of Medicine and College

More information

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office 2013 Laboratory Accreditation Program Audioconferences and Webinars Implementing Next Generation Sequencing (NGS) as a Clinical Tool in the Laboratory Nazneen Aziz, PhD Director, Molecular Medicine Transformation

More information

Text file One header line meta information lines One line : variant/position

Text file One header line meta information lines One line : variant/position Software Calling: GATK SAMTOOLS mpileup Varscan SOAP VCF format Text file One header line meta information lines One line : variant/position ##fileformat=vcfv4.1! ##filedate=20090805! ##source=myimputationprogramv3.1!

More information

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis By the end of this lab students should be able to: Describe the uses for each line of the DNA subway program (Red/Yellow/Blue/Green) Describe

More information

Bioinformatics Summer School Konstantin Okonechnikov Max Planck Institute For Infection Biology

Bioinformatics Summer School Konstantin Okonechnikov Max Planck Institute For Infection Biology Bioinformatics Summer School 2014 Konstantin Okonechnikov Max Planck Institute For Infection Biology Quality Control of High Throughput Sequencing Data Летняя Школа Биоинформатики 2014 If we lived in a

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design) Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process

More information

RNA-Seq Software, Tools, and Workflows

RNA-Seq Software, Tools, and Workflows RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 2016 Workshop Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption:

More information

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless

More information

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).

More information

NGS Data Analysis: An Intro to RNA-Seq

NGS Data Analysis: An Intro to RNA-Seq NGS Data Analysis: An Intro to RNA-Seq March 25th, 2014 GST Colloquim: March 25th, 2014 1 / 1 Workshop Design Basics of NGS Sample Prep RNA-Seq Analysis GST Colloquim: March 25th, 2014 2 / 1 Experimental

More information

LifeScope Genomic Analysis Software 2.5

LifeScope Genomic Analysis Software 2.5 USER GUIDE LifeScope Genomic Analysis Software 2.5 Graphical User Interface DATA ANALYSIS METHODS AND INTERPRETATION Publication Part Number 4471877 Rev. A Revision Date November 2011 For Research Use

More information

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling Lectures 1 and 8 15 February 7, 2013 This is a review of the material from lectures 1 and 8 14. Note that the material from lecture 15 is not relevant for the final exam. Today we will go over the material

More information

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,

More information

Analysis of ChIP-seq data in Galaxy

Analysis of ChIP-seq data in Galaxy Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

RNA-Seq Data Analysis. I-Hsuan Lin

RNA-Seq Data Analysis. I-Hsuan Lin RNA-Seq Data Analysis I-Hsuan Lin LSL Next-Generation Sequencing Workshop (Day 3) 19 Nov 2015 Transcriptome 2 The complete set of RNA species in a cell and their quantities Transcriptomics To catalogue

More information

Challenges associated with analysis and storage of NGS data

Challenges associated with analysis and storage of NGS data Challenges associated with analysis and storage of NGS data Gabriella Rustici Research and training coordinator Functional Genomics Group gabry@ebi.ac.uk Next-generation sequencing Next-generation sequencing

More information

Computational Genomics. Next generation sequencing (NGS)

Computational Genomics. Next generation sequencing (NGS) Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years

More information

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice Next-generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Workgroup Principles and Guidelines Supplementary

More information

Deep Sequencing Data Analysis

Deep Sequencing Data Analysis Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Avans Hogeschool Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome

More information

Galaxy for Next Generation Sequencing 初探次世代序列分析平台 蘇聖堯 2013/9/12

Galaxy for Next Generation Sequencing 初探次世代序列分析平台 蘇聖堯 2013/9/12 Galaxy for Next Generation Sequencing 初探次世代序列分析平台 蘇聖堯 2013/9/12 What s Galaxy? Bringing Developers And Biologists Together. Reproducible Science Is Our Goal An open, web-based platform for data intensive

More information

BioHPC Web Computing Resources at CBSU

BioHPC Web Computing Resources at CBSU BioHPC Web Computing Resources at CBSU 3CPG workshop Robert Bukowski Computational Biology Service Unit http://cbsu.tc.cornell.edu/lab/doc/biohpc_web_tutorial.pdf BioHPC infrastructure at CBSU BioHPC Web

More information

Comparing Methods for Identifying Transcription Factor Target Genes

Comparing Methods for Identifying Transcription Factor Target Genes Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF

More information

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples DATA Sheet Single-Cell DNA Sequencing with the C 1 Single-Cell Auto Prep System Reveal hidden populations and genetic diversity within complex samples Single-cell sensitivity Discover and detect SNPs,

More information

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v2.2.0. 1.1 SMRT Analysis v2.2.0 Overview. Notes:

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v2.2.0. 1.1 SMRT Analysis v2.2.0 Overview. Notes: SMRT Analysis v2.2.0 Overview 100 338 400 01 1. SMRT Analysis v2.2.0 1.1 SMRT Analysis v2.2.0 Overview Welcome to Pacific Biosciences' SMRT Analysis v2.2.0 Overview 1.2 Contents This module will introduce

More information

-> Integration of MAPHiTS in Galaxy

-> Integration of MAPHiTS in Galaxy Enabling NGS Analysis with(out) the Infrastructure, 12:0512 Development of a workflow for SNPs detection in grapevine From Sets to Graphs: Towards a Realistic Enrichment Analy species: MAPHiTS -> Integration

More information

How-To: SNP and INDEL detection

How-To: SNP and INDEL detection How-To: SNP and INDEL detection April 23, 2014 Lumenogix NGS SNP and INDEL detection Mutation Analysis Identifying known, and discovering novel genomic mutations, has been one of the most popular applications

More information

Practical Guideline for Whole Genome Sequencing

Practical Guideline for Whole Genome Sequencing Practical Guideline for Whole Genome Sequencing Disclosure Kwangsik Nho Assistant Professor Center for Neuroimaging Department of Radiology and Imaging Sciences Center for Computational Biology and Bioinformatics

More information

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova New generation sequencing: current limits and future perspectives Giorgio Valle CRIBI Università di Padova Around 2004 the Race for the 1000$ Genome started A few questions... When? How? Why? Standard

More information

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each

More information

Bioinformatics in next generation sequencing projects

Bioinformatics in next generation sequencing projects Once sequenced the problem becomes computational Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet

More information

Written test: Analysis of data from high-throughput molecular biology experiments BB2490 (BIO) or DD2399 (CSC)

Written test: Analysis of data from high-throughput molecular biology experiments BB2490 (BIO) or DD2399 (CSC) Written test: Analysis of data from high-throughput molecular biology experiments BB2490 (BIO) or DD2399 (CSC) Name: Pnr: Wednesday the 18th of February, 13.00-15.00, Albanova FB52 Instructions: The test

More information

PLNT2530 Unit 6e DNA Sequencing

PLNT2530 Unit 6e DNA Sequencing PLNT2530 Unit 6e DNA Sequencing Unless otherwise cited or referenced, all content of this presenataion is licensed under the Creative Commons License Attribution Share-Alike 2.5 Canada 1 High-throughput

More information

Genomics for Dummies. Bio informatics and Comparative Genomes Analysis: Jean Michel Claverie 6 18 mai 2013

Genomics for Dummies. Bio informatics and Comparative Genomes Analysis: Jean Michel Claverie 6 18 mai 2013 Genomics for Dummies Bio informatics and Comparative Genomes Analysis: Jean Michel Claverie 6 18 mai 2013 Do I need to know more than just this telephone number? Introduction Try to learn/understand things

More information

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011 NECC History Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011 EPSCoR Cyberinfrastructure Workshop First regional NENI (now NECC) Workshop held in Vermont in August 2007 Workshop heldinkentucky

More information

COURSE OF BIOINFORMATICS

COURSE OF BIOINFORMATICS COURSE OF BIOINFORMATICS a.a. 2015-2016 Bioinformatic Analysis of Next Generation Sequencing Data What is massively parallel sequencing? Next-generation sequencing (NGS), also known as high-throughput

More information

NSilico Life Science Introductory Bioinformatics Course

NSilico Life Science Introductory Bioinformatics Course NSilico Life Science Introductory Bioinformatics Course INTRODUCTORY BIOINFORMATICS COURSE A public course delivered over three days on the fundamentals of bioinformatics and illustrated with lectures,

More information

SEQUENCING. From Sample to Sequence-Ready

SEQUENCING. From Sample to Sequence-Ready SEQUENCING From Sample to Sequence-Ready ACCESS ARRAY SYSTEM HIGH-QUALITY LIBRARIES, NOT ONCE, BUT EVERY TIME The highest-quality amplicons more sensitive, accurate, and specific Full support for all major

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance RNA Express Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance ILLUMINA PROPRIETARY 15052918 Rev. A February 2014 This document and its contents are

More information

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity

More information

Towards Integrating the Detection of Genetic Variants into an In-Memory Database

Towards Integrating the Detection of Genetic Variants into an In-Memory Database Towards Integrating the Detection of Genetic Variants into an 2nd International Workshop on Big Data in Bioinformatics and Healthcare Oct 27, 2014 Motivation Genome Data Analysis Process DNA Sample Base

More information

NGS Technologies for Genomics and Transcriptomics

NGS Technologies for Genomics and Transcriptomics NGS Technologies for Genomics and Transcriptomics Massimo Delledonne Department of Biotechnologies - University of Verona http://profs.sci.univr.it/delledonne 13 years and $3 billion required for the Human

More information

Data formats and file conversions

Data formats and file conversions Building Excellence in Genomics and Computational Bioscience s Richard Leggett (TGAC) John Walshaw (IFR) Common file formats FASTQ FASTA BAM SAM Raw sequence Alignments MSF EMBL UniProt BED WIG Databases

More information

Next generation sequencing (NGS)

Next generation sequencing (NGS) Next generation sequencing (NGS) Vijayachitra Modhukur BIIT modhukur@ut.ee 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known

More information

July 7th 2009 DNA sequencing

July 7th 2009 DNA sequencing July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer

More information

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012 RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided

More information

Practical Solutions for Big Data Analytics

Practical Solutions for Big Data Analytics Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)

More information

Basic processing of next-generation sequencing (NGS) data

Basic processing of next-generation sequencing (NGS) data Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance

More information

Introduction to NGS Technologies

Introduction to NGS Technologies Introduction to NGS Technologies Ignacio Medina im411@cam.ac.uk Head of Computational Biology Lab HPC Service, University of Cambridge, UK EMBL-EBI Scientific collaborator Genome Campus, Hinxton, Cambridge,

More information

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow Barry Bolding Cray Inc Seattle, WA 1 CUG 2013 Paper Genomic Applications on Cray supercomputers: Next Generation Sequencing

More information

Ion Torrent Amplicon Sequencing

Ion Torrent Amplicon Sequencing APPLICATION NOTE Amplicon Sequencing Ion Torrent Amplicon Sequencing Introduction The ability to sequence a genome or a portion of a genome has enabled researchers to begin to understand how the genetic

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

MiSeq Reporter Library QC Workflow Guide

MiSeq Reporter Library QC Workflow Guide MiSeq Reporter Library QC Workflow Guide For Research Use Only. Not for use in diagnostic procedures. Revision History 3 Introduction 4 Library QC Workflow Overview 5 Library QC Summary Tab 6 Library QC

More information

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic

More information

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAAC GTGCAC GTGAAC Wouter Coppieters Head of the genomics core facility GIGA center, University of Liège Bioruptor NGS: Unbiased DNA

More information

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute for Age Research - Fritz Lipmann Institute (FLI)

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute for Age Research - Fritz Lipmann Institute (FLI) DNA-Sequencing Technologies & Devices Matthias Platzer Genome Analysis Leibniz Institute for Age Research - Fritz Lipmann Institute (FLI) Genome analysis DNA sequencing platforms ABI 3730xl 4/2004 & 6/2006

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Molecular Methods Sylvain Forêt March 2010 http://dayhoff.anu.edu.au/~sf/next_gen_seq 1 Introduction 2 Sanger 3 Illumina 4 454 5 SOLiD 6 Summary The Genomic Age Recent landmarks

More information

RNAseq / ChipSeq / Methylseq and personalized genomics

RNAseq / ChipSeq / Methylseq and personalized genomics RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School

More information

Detecting the Sardinian Specific Variability Trough Next Generation Sequencing of 2120 Individuals

Detecting the Sardinian Specific Variability Trough Next Generation Sequencing of 2120 Individuals UNIVERSITÀ DEGLI STUDI DI SASSARI Scuola di Dottorato in Scienze Biomediche XXV CICLO DOTTORATO DI RICERCA IN SCIENZE BIOMEDICHE INDIRIZZO DI GENETICA MEDICA, MALATTIE METABOLICHE E NUTRIGENOMICA Direttore:

More information

Welcome to the Plant Breeding and Genomics Webinar Series

Welcome to the Plant Breeding and Genomics Webinar Series Welcome to the Plant Breeding and Genomics Webinar Series Today s Presenter: Dr. Candice Hansey Presentation: http://www.extension.org/pages/ 60428 Host: Heather Merk Technical Production: John McQueen

More information

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production Page 1 of 6 UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production February 05, 2010 Newsletter: BioInform BioInform - February 5, 2010 By Vivien Marx Scientists at the department

More information

ABiL. Workforce Development Course Description. A unique bioinformatics resource for the translation of molecular data into

ABiL. Workforce Development Course Description. A unique bioinformatics resource for the translation of molecular data into Workforce Development Course Description ABiL A unique bioinformatics resource for the translation of molecular data into Applied BioInformatics Laboratory actionable public health intelligence ABiL is

More information

How Sequencing Experiments Fail

How Sequencing Experiments Fail How Sequencing Experiments Fail v1.0 Simon Andrews simon.andrews@babraham.ac.uk Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine

More information

Genomic Testing: Actionability, Validation, and Standard of Lab Reports

Genomic Testing: Actionability, Validation, and Standard of Lab Reports Genomic Testing: Actionability, Validation, and Standard of Lab Reports emerge: Laura Rasmussen-Torvik Reaction: Heidi Rehm Summary: Dick Weinshilboum Panel: Murray Brilliant, David Carey, John Carpten,

More information

Next Generation Sequencing data Analysis at Genoscope

Next Generation Sequencing data Analysis at Genoscope Next Generation Sequencing data Analysis at Genoscope Genoscope (National Sequencing center) Among the largest sequencing center in Europe Part of the CEA Institut de Génomique since 05/2007 Provide high-throughput

More information

Bioinformatics Unit Department of Biological Services. Get to know us

Bioinformatics Unit Department of Biological Services. Get to know us Bioinformatics Unit Department of Biological Services Get to know us Domains of Activity IT & programming Microarray analysis Sequence analysis Bioinformatics Team Biostatistical support NGS data analysis

More information

Exercise 11 - Understanding the Output for a blastn Search (excerpted from a document created by Wilson Leung, Washington University)

Exercise 11 - Understanding the Output for a blastn Search (excerpted from a document created by Wilson Leung, Washington University) Exercise 11 - Understanding the Output for a blastn Search (excerpted from a document created by Wilson Leung, Washington University) Read the following tutorial to better understand the BLAST report for

More information

Improving MAKER Gene Annotations in Grasses through the Use of GC Specific Hidden Markov Models

Improving MAKER Gene Annotations in Grasses through the Use of GC Specific Hidden Markov Models Improving MAKER Gene Annotations in Grasses through the Use of GC Specific Hidden Markov Models Megan Bowman Childs Lab Bioinformatics Seminar 22 April 2015 Outline GC content in plant genomes Codon usage

More information