Introduction to NGS data analysis

Size: px
Start display at page:

Download "Introduction to NGS data analysis"

Transcription

1 Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics

2 Sequencing Illumina platforms Characteristics: High throughput (3 genomes). Paired end. High accuracy. Read length 2 125bp. Relatively long run time (6 days). Relatively expensive. Figure 1: HiSeq Workshop NGS, Hogeschool Leiden 1/50 Thursday, May 22, 2014

3 Sequencing Illumina platforms Characteristics: Moderate throughput (3 exomes). Paired end. High accuracy. Read length 2 300bp. Relatively short run time (3 days). Relatively expensive. Figure 2: MiSeq. Workshop NGS, Hogeschool Leiden 2/50 Thursday, May 22, 2014

4 Sequencing Illumina platforms Figure 3: Flowcell. Figure 4: Amplification. Workshop NGS, Hogeschool Leiden 3/50 Thursday, May 22, 2014

5 Sequencing Illumina platforms Figure 3: Flowcell. Optical system. DNA fragments are loaded on a flowcell. Fragments are amplified. Clusters of fragments give the same signal. Figure 4: Amplification. Workshop NGS, Hogeschool Leiden 3/50 Thursday, May 22, 2014

6 Sequencing Illumina platforms Figure 5: Image of tile (part of the flowcell). Workshop NGS, Hogeschool Leiden 4/50 Thursday, May 22, 2014

7 Sequencing Illumina platforms Figure 6: Base calling on Illumina systems. Workshop NGS, Hogeschool Leiden 5/50 Thursday, May 22, 2014

8 Sequencing Illumina platforms Figure 7: Paired end. Workshop NGS, Hogeschool Leiden 6/50 Thursday, May 22, 2014

9 Sequencing Illumina platforms Advantages: Mapping to non-unique regions. Structural variation. De novo assembly (scaffolding). Figure 7: Paired end. Workshop NGS, Hogeschool Leiden 6/50 Thursday, May 22, 2014

10 Life-Tech Sequencing Characteristics: Low throughput. Single end (for now). High accuracy. Read length ±400bp. Short run time (1 day). Cheap runs. Figure 8: Ion torrent. Workshop NGS, Hogeschool Leiden 7/50 Thursday, May 22, 2014

11 Life-Tech Sequencing Figure 9: Ion proton. Characteristics: Moderate throughput (1 exome). Single end (for now). High accuracy. Read length ±175bp. Short run time (1 day). Cheap runs. Workshop NGS, Hogeschool Leiden 8/50 Thursday, May 22, 2014

12 Life-Tech Sequencing Figure 10: Ion Torrent chip. Figure 11: Ion Proton chip. Workshop NGS, Hogeschool Leiden 9/50 Thursday, May 22, 2014

13 Life-Tech Sequencing Figure 12: Close up of an Ion chip. Workshop NGS, Hogeschool Leiden 10/50 Thursday, May 22, 2014

14 Life-Tech Sequencing Figure 14: Base calling. Figure 13: Loading a sample. Problems with mono nucleotide stretches. Workshop NGS, Hogeschool Leiden 11/50 Thursday, May 22, 2014

15 Sequencing Pacific Biosciences Figure 15: PacBio RS. Workshop NGS, Hogeschool Leiden 12/50 Thursday, May 22, 2014

16 Sequencing Pacific Biosciences Characteristics: Long reads (50% is longer than 10,000 bp). High error rate (15-20%). Low throughput (comparable with the Roche 454). Workshop NGS, Hogeschool Leiden 13/50 Thursday, May 22, 2014

17 Sequencing Pacific Biosciences Characteristics: Long reads (50% is longer than 10,000 bp). High error rate (15-20%). Low throughput (comparable with the Roche 454). Circular consensus sequencing. Sequence the same molecule several times. Extremely high accuracy. Acceptable read length (±250bp). Workshop NGS, Hogeschool Leiden 13/50 Thursday, May 22, 2014

18 Sequencing Pacific Biosciences Figure 16: Real time polymerase monitoring. Workshop NGS, Hogeschool Leiden 14/50 Thursday, May 22, 2014

19 Sequencing Pacific Biosciences Figure 17: Base calling on the PacBio. Workshop NGS, Hogeschool Leiden 15/50 Thursday, May 22, 2014

20 Data analysis Next generation sequencing data 2 TTCGGGGGCTGGCAAATCCACTTCCGTGACACGCTACCATTCGCTGGTGGT , &07+03:54/ >@/ (+ 6 CGGTAAACCACCCTGCTGACGGAACCCTAATGCGCCTGAAAGACAGCGTTC / (.55:;:99(0(+2(22(0316;185;;0;:<<>=AA59 10 TCGTTAACGACTTTGTTCGCCACCGCAACCGCCTGTTTCGGGTCACAGGCA ;5? <;?@A4?B:BBB<AA>CCC>C>BB0. >= :@5@< 14 TTGATGAATATATTATTTCAGGGAATAATTATGACACCTTTAGAACGCATT <<@::5:<;==7;>>/79<:.:494.8(,,8:753/5@5??C>B???B7 Listing 1: A FastQ file. Workshop NGS, Hogeschool Leiden 16/50 Thursday, May 22, 2014

21 Data analysis Common data analysis techniques Reference based techniques: Resequencing. Exome sequencing. Transcriptome sequencing (RNA). Full genome sequencing. Structural variation. Copy number variation. Not reference based: De novo assembly. k-mer profiling. Workshop NGS, Hogeschool Leiden 17/50 Thursday, May 22, 2014

22 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

23 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

24 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

25 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

26 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. 4. Filtering. Post-variant calling quality control. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

27 Resequencing Data analysis Resequencing pipelines can roughly be divided in five steps. 1. Pre-alignment. Quality control. Data cleaning. 2. Alignment. Post-alignment quality control. 3. Variant calling. 4. Filtering. Post-variant calling quality control. 5. Annotation. Workshop NGS, Hogeschool Leiden 18/50 Thursday, May 22, 2014

28 Pre-alignment Data cleaning Depending on the sequencing platform, parts of the reads need to be removed. Remove linker sequences (Cutadapt, FASTX toolkit). Clip low quality reads at the end of the read (Sickle, Trimmomatic, FASTX toolkit). Length filtering (Fastools). toolkit/ Workshop NGS, Hogeschool Leiden 19/50 Thursday, May 22, 2014

29 Trimming Pre-alignment Figure 18: Quality score per position. Workshop NGS, Hogeschool Leiden 20/50 Thursday, May 22, 2014

30 Clipping Pre-alignment Figure 19: Sequencing linkers. Workshop NGS, Hogeschool Leiden 21/50 Thursday, May 22, 2014

31 Pre-alignment Quality control The FastQC toolkit can be used for quality control (both before and after the data cleaning step). GC content. GC distribution. Quality scores distribution Workshop NGS, Hogeschool Leiden 22/50 Thursday, May 22, 2014

32 Figure 21: Per sequence quality. Workshop NGS, Hogeschool Leiden 23/50 Thursday, May 22, 2014 Pre-alignment Example QC output Figure 20: Per base sequence content.

33 Alignment The best match to the reference genome Very efficient. The reference genome needs to be indexed. Finding an alignment is as easy as looking up a word in a dictionary. Figure 22: Visualisation of an alignment. Workshop NGS, Hogeschool Leiden 24/50 Thursday, May 22, 2014

34 Alignment Choose an aligner Not all aligners can deal with indels. Only a couple of years ago, only SNPs were considered. Bowtie Workshop NGS, Hogeschool Leiden 25/50 Thursday, May 22, 2014

35 Alignment Choose an aligner Not all aligners can deal with indels. Only a couple of years ago, only SNPs were considered. Bowtie. Few aligners can work with large deletions. Spliced RNA. GMAP / GSNAP. Tophat. BWA-MEM Workshop NGS, Hogeschool Leiden 25/50 Thursday, May 22, 2014

36 Alignment Choose an aligner The choice of aligner may be restricted by the sequencer. For the Ion Torrent: Tmap. Combination of three different aligners. Deals with errors in homopolymer stretches. For the PacBio: BLASR. Workshop NGS, Hogeschool Leiden 26/50 Thursday, May 22, 2014

37 Variant calling Consistent deviations from the reference Figure 23: Result of an alignment. Workshop NGS, Hogeschool Leiden 27/50 Thursday, May 22, 2014

38 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

39 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

40 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. RNA. Allele specific expression. RNA editing. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

41 Variant calling Some considerations Things a variant caller might take into account: Strand balance. Base quality. Mapping quality. Distribution within the reads. Ploidity of the organism in question. Complicating factors: Pooled samples. RNA. Allele specific expression. RNA editing. Strand specific sampleprep. Workshop NGS, Hogeschool Leiden 28/50 Thursday, May 22, 2014

42 Variant calling Choice of variant caller Rules of thumb: Well known organism and experiment: Statistical model. Use a simpler variant caller otherwise Workshop NGS, Hogeschool Leiden 29/50 Thursday, May 22, 2014

43 Variant calling Choice of variant caller Rules of thumb: Well known organism and experiment: Statistical model. Use a simpler variant caller otherwise. Popular variant callers: Samtools. GATK. VarScan Workshop NGS, Hogeschool Leiden 29/50 Thursday, May 22, 2014

44 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

45 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. We filter for a maximum coverage because of copy number variation. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

46 Variant filtering Filtering on coverage We can set some thresholds: Minimum. Maximum. We filter for a maximum coverage because of copy number variation. A good way to calculate the maximum: Calculate the mean coverage. Only of the covered (targeted) regions. Multiply this number with a reasonable factor e.g., 2.5. Workshop NGS, Hogeschool Leiden 30/50 Thursday, May 22, 2014

47 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

48 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

49 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

50 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Is it in the 5 /3 UTR of a gene?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

51 Annotation What is already known about a variant A selection of SeattleSeq annotation: Is the variant known? Does it hit a gene? Is it in an intron? Does it hit a splice site? Is it in the coding region? Is there a gain/loss of a stop codon? Does the variant result in a frameshift?... Is it in the 5 /3 UTR of a gene?... Is it in a regulatory region?... Workshop NGS, Hogeschool Leiden 31/50 Thursday, May 22, 2014

52 Full genome sequencing Copy number variation Figure 24: Coverage patterns over a whole chromosome. Workshop NGS, Hogeschool Leiden 32/50 Thursday, May 22, 2014

53 Full genome sequencing Copy number variation Per sample: The reference needs to be very good. Sequencability biases. Mapping biases. Workshop NGS, Hogeschool Leiden 33/50 Thursday, May 22, 2014

54 Full genome sequencing Copy number variation Per sample: The reference needs to be very good. Sequencability biases. Mapping biases. Within a population: Mixture of distributions. Not sensitive to aforementioned biases. Needs a lot of controls. Workshop NGS, Hogeschool Leiden 33/50 Thursday, May 22, 2014

55 Full genome sequencing Structural variation Figure 25: Multiple issues while mapping. Workshop NGS, Hogeschool Leiden 34/50 Thursday, May 22, 2014

56 Full genome sequencing Structural variation Figure 26: Discordant and split reads. Workshop NGS, Hogeschool Leiden 35/50 Thursday, May 22, 2014

57 Full genome sequencing Structural variation Figure 27: Different types of structural variation. Workshop NGS, Hogeschool Leiden 36/50 Thursday, May 22, 2014

58 Assesmbly De Novo assembly Figure 28: General overview of de novo assembly. Workshop NGS, Hogeschool Leiden 37/50 Thursday, May 22, 2014

59 Assesmbly De Novo assembly Figure 29: Overlaps are used to reconstruct a genome. Workshop NGS, Hogeschool Leiden 38/50 Thursday, May 22, 2014

60 Scaffolding De Novo assembly Figure 30: Paired end or mate pair reads can be used. Workshop NGS, Hogeschool Leiden 39/50 Thursday, May 22, 2014

61 De Novo assembly Easier assembly with PacBio Figure 31: Correcting PacBio reads. Workshop NGS, Hogeschool Leiden 40/50 Thursday, May 22, 2014

62 Pipelines Pipelines Figure 32: A real-life pipeline. Workshop NGS, Hogeschool Leiden 41/50 Thursday, May 22, 2014

63 Pipelines Pipelines Figure 33: Scene from Modern times. Workshop NGS, Hogeschool Leiden 42/50 Thursday, May 22, 2014

64 Pipelines Pipelines Combining tools: The output of one tool can serve as the input for another. Not necessarily linear.... Workshop NGS, Hogeschool Leiden 43/50 Thursday, May 22, 2014

65 Pipelines Pipelines Combining tools: The output of one tool can serve as the input for another. Not necessarily linear.... Running various different tools: Two or three different aligners. A couple of variant callers.... Workshop NGS, Hogeschool Leiden 43/50 Thursday, May 22, 2014

66 Pipelines Combining tools 1 bwa aln t 8 $reference $i > $i. sai 2 bwa samse $reference $i. sai $i > $i.sam 3 samtools view bt $reference o $i.bam $i.sam Listing 2: Shell script Workshop NGS, Hogeschool Leiden 44/50 Thursday, May 22, 2014

67 Pipelines Combining tools 1 bwa aln t 8 $reference $i > $i. sai 2 bwa samse $reference $i. sai $i > $i.sam 3 samtools view bt $reference o $i.bam $i.sam Listing 2: Shell script 1 %. sai : %.fq 2 $ (BWA) aln t $ (THREADS) $ ( c a l l MKREF, $@) $< > $@ 3 4 %.sam: %. sai %.fq 5 $ (BWA) samse $ ( c a l l MKREF, $@) $ ˆ > $@ 6 7 %.bam: %.sam 8 $ (SAMTOOLS) view bt $ ( c a l l MKREF, $@) o $@ $< Listing 3: Makefile Workshop NGS, Hogeschool Leiden 44/50 Thursday, May 22, 2014

68 Graphical interfaces Galaxy Galaxy: a graphical user interface: Wrapper for command line utilities. User friendly. Point and click. Workshop NGS, Hogeschool Leiden 45/50 Thursday, May 22, 2014

69 Galaxy Graphical interfaces Galaxy: a graphical user interface: Wrapper for command line utilities. User friendly. Point and click. Workflows. Save all the steps you did in your analysis. Rerun the entire analysis on a new dataset. Share your workflow with other people Workshop NGS, Hogeschool Leiden 45/50 Thursday, May 22, 2014

70 Galaxy Graphical interfaces Figure 34: Galaxy main user interface Workshop NGS, Hogeschool Leiden 46/50 Thursday, May 22, 2014

71 Galaxy Graphical interfaces Figure 35: User friendly interface with Galaxy Workshop NGS, Hogeschool Leiden 47/50 Thursday, May 22, 2014

72 Graphical interfaces Workflow of a parallel pipeline Makefile blueprint all create_report.secondary Makefile.SUFFIXES.DEFAULT recursive clean MC0184ACXX_s_1.h_cov MC0184ACXX_s_1.tar.gz MC0184ACXX_s_1.final.snponly.vcf.annotation MC0184ACXX_s_1.final.indelonly.vcf.annotation MC0184ACXX_s_1.covPerTarget MC0184ACXX_s_1.hsMetrics MC0184ACXX_s_1.mtdna_mincov.bed MC0184ACXX_s_1.asomes_avgcov.bed MC0184ACXX_s_1.xysomes_avgcov.bed MC0184ACXX_s_1.final.vcf.ti-tv MC0184ACXX_s_1.final.snponly.vcf MC0184ACXX_s_1.final.indelonly.vcf MC0184ACXX_s_1.final.vcf.vdb_annotation /sureselect_whole_exome_v2.hg19.cbr.bed.interval MC0184ACXX_s_1.asomes_mincov.bed MC0184ACXX_s_1.xysomes_mincov.bed MC0184ACXX_s_1.asomes_cnvhigh.bed MC0184ACXX_s_1.xysomes_cnvhigh.bed MC0184ACXX_s_1.final.vcf MC0184ACXX_s_1.insertSizesHist.pdf MC0184ACXX_s_1.pileup MC0184ACXX_s_1.insertSizes MC0184ACXX_s_1.mincov.bed MC0184ACXX_s_1.asomes_cutoff_steepcov.bed MC0184ACXX_s_1.xysomes_cutoff_steepcov.bed MC0184ACXX_s_1.asome.vcf MC0184ACXX_s_1.xysome.vcf MC0184ACXX_s_1.mtdna.vcf MC0184ACXX_s_1.bases.aligned MC0184ACXX_s_1.wig MC0184ACXX_s_1.asome_cutoff.vcf MC0184ACXX_s_1.xysome_cutoff.vcf MC0184ACXX_s_1.mtdna_cutoff.vcf MC0184ACXX_s_1.sorted.dedup.bam.flagstat MC0184ACXX_s_1.readsPerChr MC0184ACXX_s_1.sorted.dedup.bam.bai MC0184ACXX_s_1.sorted.dedup.bam.pileup MC0184ACXX_s_1.asome.targeted.avg-cov MC0184ACXX_s_1.xysome.targeted.avg-cov MC0184ACXX_s_1.bases.loose-ontarget MC0184ACXX_s_1.loose-ontarget.bam.flagstat MC0184ACXX_s_1.bases.strong-ontarget MC0184ACXX_s_1.strong-ontarget.bam.flagstat MC0184ACXX_s_1.bcf MC0184ACXX_s_1.sorted.dedup.bam.sorted.dedup.bam H.Sapiens/hg19/reference.fa /sureselect_whole_exome_v2.hg19.cbr.bed MC0184ACXX_s_1.sorted.dedup.bam MC0184ACXX_s_1.sorted.dedup.bam.sorted.bam MC0184ACXX_s_1.sorted.bam MC0184ACXX_s_1_1_sequence.filtered_fastqc MC0184ACXX_s_1_1_sequence_fastqc MC0184ACXX_s_1.sorted.dedup.bam.parts MC0184ACXX_s_1.parts 000.MC0184ACXX_s_1_1_sequence.part 000.MC0184ACXX_s_1_2_sequence.part MC0184ACXX_s_1_2_sequence.filtered_fastqc MC0184ACXX_s_1_1_sequence.part000 MC0184ACXX_s_1_2_sequence.part000 MC0184ACXX_s_1_2_sequence_fastqc MC0184ACXX_s_1_1_sequence.sync MC0184ACXX_s_1_2_sequence.sync MC0184ACXX_s_1_1_sequence.filtered MC0184ACXX_s_1_2_sequence.filtered MC0184ACXX_s_1_1_sequence.trimmed MC0184ACXX_s_1_2_sequence.txt.recaldata MC0184ACXX_s_1_1_sequence.txt.recaldata MC0184ACXX_s_1_2_sequence.trimmed MC0184ACXX_s_1_1_sequence.txt MC0184ACXX_s_1_2_sequence.txt Figure 36: Dependency diagram. Workshop NGS, Hogeschool Leiden 48/50 Thursday, May 22, 2014

73 Graphical interfaces Workflow of a parallel pipeline MC0184ACXX_s_1.sorted.dedup.bam MC0184ACXX_s_1.sorted.bam MC0184ACXX_s_1_1_sequenc MC0184ACXX_s_1.sorted.dedup.bam.parts MC0184ACXX_s_1.parts 000.MC0184ACXX_s_1_1_sequence.part 000.MC0184ACXX_s_1_2_sequence.part MC0184ACXX_s_1_1_sequence.part000 MC0184ACXX_s_1_2_sequence.part000 MC0184ACXX_s_1_1_sequence.sync MC0184ACXX_s_1_2_sequence.sync MC0184ACXX_s_1_1_sequence.filtered MC0184ACXX_s_1_2_sequence.filtered MC0184ACXX_s_1_1_sequence.trimmed MC0184ACXX_s_1_2_sequence.txt.recaldata MC0184ACXX_s_1 MC0184ACXX_s_1_1_sequence.txt Figure 37: Zoomed in. Workshop NGS, Hogeschool Leiden 49/50 Thursday, May 22, 2014

74 Questions? Acknowledgements: Michiel van Galen Martijn Vermaat Johan den Dunnen Workshop NGS, Hogeschool Leiden 50/50 Thursday, May 22, 2014

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977

More information

Introduction to next-generation sequencing data

Introduction to next-generation sequencing data Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS

More information

Writing & Running Pipelines on the Open Grid Engine using QMake. Wibowo Arindrarto DTLS Focus Meeting 15.04.2014

Writing & Running Pipelines on the Open Grid Engine using QMake. Wibowo Arindrarto DTLS Focus Meeting 15.04.2014 Writing & Running Pipelines on the Open Grid Engine using QMake Wibowo Arindrarto DTLS Focus Meeting 15.04.2014 Makefile (re)introduction Atomic recipes / rules that define full pipelines Initially written

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

NGS data analysis. Bernardo J. Clavijo

NGS data analysis. Bernardo J. Clavijo NGS data analysis Bernardo J. Clavijo 1 A brief history of DNA sequencing 1953 double helix structure, Watson & Crick! 1977 rapid DNA sequencing, Sanger! 1977 first full (5k) genome bacteriophage Phi X!

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Next generation DNA sequencing technologies. theory & prac-ce

Next generation DNA sequencing technologies. theory & prac-ce Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

Copy Number Variation: available tools

Copy Number Variation: available tools Copy Number Variation: available tools Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction A literature review of available

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

E. coli plasmid and gene profiling using Next Generation Sequencing

E. coli plasmid and gene profiling using Next Generation Sequencing E. coli plasmid and gene profiling using Next Generation Sequencing Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction General

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

MiSeq: Imaging and Base Calling

MiSeq: Imaging and Base Calling MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please

More information

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Data Analysis & Management of High-throughput Sequencing Data Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Current Issues Current Issues The QSEQ file Number files per

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Avans Hogeschool Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome

More information

Analysis of NGS Data

Analysis of NGS Data Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office 2013 Laboratory Accreditation Program Audioconferences and Webinars Implementing Next Generation Sequencing (NGS) as a Clinical Tool in the Laboratory Nazneen Aziz, PhD Director, Molecular Medicine Transformation

More information

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. To published results faster With proven scalability To the forefront of discovery To limitless applications

More information

Text file One header line meta information lines One line : variant/position

Text file One header line meta information lines One line : variant/position Software Calling: GATK SAMTOOLS mpileup Varscan SOAP VCF format Text file One header line meta information lines One line : variant/position ##fileformat=vcfv4.1! ##filedate=20090805! ##source=myimputationprogramv3.1!

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design) Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis By the end of this lab students should be able to: Describe the uses for each line of the DNA subway program (Red/Yellow/Blue/Green) Describe

More information

NGS Data Analysis: An Intro to RNA-Seq

NGS Data Analysis: An Intro to RNA-Seq NGS Data Analysis: An Intro to RNA-Seq March 25th, 2014 GST Colloquim: March 25th, 2014 1 / 1 Workshop Design Basics of NGS Sample Prep RNA-Seq Analysis GST Colloquim: March 25th, 2014 2 / 1 Experimental

More information

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice Next-generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Workgroup Principles and Guidelines Supplementary

More information

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling Lectures 1 and 8 15 February 7, 2013 This is a review of the material from lectures 1 and 8 14. Note that the material from lecture 15 is not relevant for the final exam. Today we will go over the material

More information

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).

More information

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless

More information

Challenges associated with analysis and storage of NGS data

Challenges associated with analysis and storage of NGS data Challenges associated with analysis and storage of NGS data Gabriella Rustici Research and training coordinator Functional Genomics Group gabry@ebi.ac.uk Next-generation sequencing Next-generation sequencing

More information

LifeScope Genomic Analysis Software 2.5

LifeScope Genomic Analysis Software 2.5 USER GUIDE LifeScope Genomic Analysis Software 2.5 Graphical User Interface DATA ANALYSIS METHODS AND INTERPRETATION Publication Part Number 4471877 Rev. A Revision Date November 2011 For Research Use

More information

Computational Genomics. Next generation sequencing (NGS)

Computational Genomics. Next generation sequencing (NGS) Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years

More information

Deep Sequencing Data Analysis

Deep Sequencing Data Analysis Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist

More information

How-To: SNP and INDEL detection

How-To: SNP and INDEL detection How-To: SNP and INDEL detection April 23, 2014 Lumenogix NGS SNP and INDEL detection Mutation Analysis Identifying known, and discovering novel genomic mutations, has been one of the most popular applications

More information

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,

More information

Comparing Methods for Identifying Transcription Factor Target Genes

Comparing Methods for Identifying Transcription Factor Target Genes Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Avans Hogeschool Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome

More information

BioHPC Web Computing Resources at CBSU

BioHPC Web Computing Resources at CBSU BioHPC Web Computing Resources at CBSU 3CPG workshop Robert Bukowski Computational Biology Service Unit http://cbsu.tc.cornell.edu/lab/doc/biohpc_web_tutorial.pdf BioHPC infrastructure at CBSU BioHPC Web

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

Analysis of ChIP-seq data in Galaxy

Analysis of ChIP-seq data in Galaxy Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers

More information

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples DATA Sheet Single-Cell DNA Sequencing with the C 1 Single-Cell Auto Prep System Reveal hidden populations and genetic diversity within complex samples Single-cell sensitivity Discover and detect SNPs,

More information

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each

More information

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance

RNA Express. Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance RNA Express Introduction 3 Run RNA Express 4 RNA Express App Output 6 RNA Express Workflow 12 Technical Assistance ILLUMINA PROPRIETARY 15052918 Rev. A February 2014 This document and its contents are

More information

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v2.2.0. 1.1 SMRT Analysis v2.2.0 Overview. Notes:

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v2.2.0. 1.1 SMRT Analysis v2.2.0 Overview. Notes: SMRT Analysis v2.2.0 Overview 100 338 400 01 1. SMRT Analysis v2.2.0 1.1 SMRT Analysis v2.2.0 Overview Welcome to Pacific Biosciences' SMRT Analysis v2.2.0 Overview 1.2 Contents This module will introduce

More information

Practical Guideline for Whole Genome Sequencing

Practical Guideline for Whole Genome Sequencing Practical Guideline for Whole Genome Sequencing Disclosure Kwangsik Nho Assistant Professor Center for Neuroimaging Department of Radiology and Imaging Sciences Center for Computational Biology and Bioinformatics

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity

More information

Data formats and file conversions

Data formats and file conversions Building Excellence in Genomics and Computational Bioscience s Richard Leggett (TGAC) John Walshaw (IFR) Common file formats FASTQ FASTA BAM SAM Raw sequence Alignments MSF EMBL UniProt BED WIG Databases

More information

Next generation sequencing (NGS)

Next generation sequencing (NGS) Next generation sequencing (NGS) Vijayachitra Modhukur BIIT modhukur@ut.ee 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known

More information

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova New generation sequencing: current limits and future perspectives Giorgio Valle CRIBI Università di Padova Around 2004 the Race for the 1000$ Genome started A few questions... When? How? Why? Standard

More information

Welcome to the Plant Breeding and Genomics Webinar Series

Welcome to the Plant Breeding and Genomics Webinar Series Welcome to the Plant Breeding and Genomics Webinar Series Today s Presenter: Dr. Candice Hansey Presentation: http://www.extension.org/pages/ 60428 Host: Heather Merk Technical Production: John McQueen

More information

-> Integration of MAPHiTS in Galaxy

-> Integration of MAPHiTS in Galaxy Enabling NGS Analysis with(out) the Infrastructure, 12:0512 Development of a workflow for SNPs detection in grapevine From Sets to Graphs: Towards a Realistic Enrichment Analy species: MAPHiTS -> Integration

More information

SEQUENCING. From Sample to Sequence-Ready

SEQUENCING. From Sample to Sequence-Ready SEQUENCING From Sample to Sequence-Ready ACCESS ARRAY SYSTEM HIGH-QUALITY LIBRARIES, NOT ONCE, BUT EVERY TIME The highest-quality amplicons more sensitive, accurate, and specific Full support for all major

More information

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow Barry Bolding Cray Inc Seattle, WA 1 CUG 2013 Paper Genomic Applications on Cray supercomputers: Next Generation Sequencing

More information

Basic processing of next-generation sequencing (NGS) data

Basic processing of next-generation sequencing (NGS) data Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance

More information

NGS Technologies for Genomics and Transcriptomics

NGS Technologies for Genomics and Transcriptomics NGS Technologies for Genomics and Transcriptomics Massimo Delledonne Department of Biotechnologies - University of Verona http://profs.sci.univr.it/delledonne 13 years and $3 billion required for the Human

More information

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012

RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012 RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided

More information

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic

More information

RNAseq / ChipSeq / Methylseq and personalized genomics

RNAseq / ChipSeq / Methylseq and personalized genomics RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School

More information

How To Find Rare Variants In The Human Genome

How To Find Rare Variants In The Human Genome UNIVERSITÀ DEGLI STUDI DI SASSARI Scuola di Dottorato in Scienze Biomediche XXV CICLO DOTTORATO DI RICERCA IN SCIENZE BIOMEDICHE INDIRIZZO DI GENETICA MEDICA, MALATTIE METABOLICHE E NUTRIGENOMICA Direttore:

More information

July 7th 2009 DNA sequencing

July 7th 2009 DNA sequencing July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer

More information

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011 NECC History Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011 EPSCoR Cyberinfrastructure Workshop First regional NENI (now NECC) Workshop held in Vermont in August 2007 Workshop heldinkentucky

More information

How Sequencing Experiments Fail

How Sequencing Experiments Fail How Sequencing Experiments Fail v1.0 Simon Andrews simon.andrews@babraham.ac.uk Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine

More information

Bioinformatics Unit Department of Biological Services. Get to know us

Bioinformatics Unit Department of Biological Services. Get to know us Bioinformatics Unit Department of Biological Services Get to know us Domains of Activity IT & programming Microarray analysis Sequence analysis Bioinformatics Team Biostatistical support NGS data analysis

More information

Next Generation Sequencing; Technologies, applications and data analysis

Next Generation Sequencing; Technologies, applications and data analysis ; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,

More information

High Performance Compu2ng Facility

High Performance Compu2ng Facility High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,

More information

Practical Solutions for Big Data Analytics

Practical Solutions for Big Data Analytics Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)

More information

Databases and mapping BWA. Samtools

Databases and mapping BWA. Samtools Databases and mapping BWA Samtools FASTQ, SFF, bax.h5 ACE, FASTG FASTA BAM/SAM GFF, BED GenBank/Embl/DDJB many more File formats FASTQ Output format from Illumina and IonTorrent sequencers. Quality scores:

More information

About the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster

About the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster Cluster Info Sheet About the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster Welcome to the PMCBRC cluster! We are happy to provide and manage this compute cluster as a resource

More information

Towards Integrating the Detection of Genetic Variants into an In-Memory Database

Towards Integrating the Detection of Genetic Variants into an In-Memory Database Towards Integrating the Detection of Genetic Variants into an 2nd International Workshop on Big Data in Bioinformatics and Healthcare Oct 27, 2014 Motivation Genome Data Analysis Process DNA Sample Base

More information

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAAC GTGCAC GTGAAC Wouter Coppieters Head of the genomics core facility GIGA center, University of Liège Bioruptor NGS: Unbiased DNA

More information

Genomic Testing: Actionability, Validation, and Standard of Lab Reports

Genomic Testing: Actionability, Validation, and Standard of Lab Reports Genomic Testing: Actionability, Validation, and Standard of Lab Reports emerge: Laura Rasmussen-Torvik Reaction: Heidi Rehm Summary: Dick Weinshilboum Panel: Murray Brilliant, David Carey, John Carpten,

More information

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics The GS Junior System The Power of Next-Generation Sequencing on Your Benchtop Proven technology: Uses the same long

More information

NEXT GENERATION SEQUENCING

NEXT GENERATION SEQUENCING NEXT GENERATION SEQUENCING Dr. R. Piazza SANGER SEQUENCING + DNA NEXT GENERATION SEQUENCING Flowcell NEXT GENERATION SEQUENCING Library di DNA Genomic DNA NEXT GENERATION SEQUENCING NEXT GENERATION SEQUENCING

More information

HiSeq Analysis Software v0.9 User Guide

HiSeq Analysis Software v0.9 User Guide HiSeq Analysis Software v0.9 User Guide FOR RESEARCH USE ONLY Quick Start 4 Introduction 5 Enrichment Analysis Workflow 6 Whole Genome Sequencing Analysis Workflow 8 Additional Software 12 Installing HiSeq

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production Page 1 of 6 UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production February 05, 2010 Newsletter: BioInform BioInform - February 5, 2010 By Vivien Marx Scientists at the department

More information

New solutions for Big Data Analysis and Visualization

New solutions for Big Data Analysis and Visualization New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina imedina@cipf.es http://bioinfo.cipf.es/imedina Head of the Computational Biology

More information

Q&A: Kevin Shianna on Ramping up Sequencing for the New York Genome Center

Q&A: Kevin Shianna on Ramping up Sequencing for the New York Genome Center Q&A: Kevin Shianna on Ramping up Sequencing for the New York Genome Center Name: Kevin Shianna Age: 39 Position: Senior vice president, sequencing operations, New York Genome Center, since July 2012 Experience

More information

Targeted. sequencing solutions. Accurate, scalable, fast TARGETED

Targeted. sequencing solutions. Accurate, scalable, fast TARGETED Targeted TARGETED Sequencing sequencing solutions Accurate, scalable, fast Sequencing for every lab, every budget, every application Ion Torrent semiconductor sequencing Ion Torrent technology has pioneered

More information

Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows

Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows Genes 2012, 3, 545-575; doi:10.3390/genes3030545 Article OPEN ACCESS genes ISSN 2073-4425 www.mdpi.com/journal/genes Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline

More information

A Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here

A Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here A Complete Example of Next- Gen DNA Sequencing Read Alignment Presentation Title Goes Here 1 FASTQ Format: The de- facto file format for sharing sequence read data Sequence and a per- base quality score

More information

Software Getting Started Guide

Software Getting Started Guide Software Getting Started Guide For Research Use Only. Not for use in diagnostic procedures. P/N 001-097-569-03 Copyright 2010-2013, Pacific Biosciences of California, Inc. All rights reserved. Information

More information

Core Facility Genomics

Core Facility Genomics Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray

More information

CHALLENGES IN NEXT-GENERATION SEQUENCING

CHALLENGES IN NEXT-GENERATION SEQUENCING CHALLENGES IN NEXT-GENERATION SEQUENCING BASIC TENETS OF DATA AND HPC Gray s Laws of data engineering 1 : Scientific computing is very dataintensive, with no real limits. The solution is scale-out architecture

More information

Disease gene identification with exome sequencing

Disease gene identification with exome sequencing Disease gene identification with exome sequencing Christian Gilissen Dept. of Human Genetics Radboud University Nijmegen Medical Centre c.gilissen@antrg.umcn.nl Contents Infrastructure Exome sequencing

More information

PreciseTM Whitepaper

PreciseTM Whitepaper Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis

More information

Illumina Sequencing Technology

Illumina Sequencing Technology Illumina Sequencing Technology Highest data accuracy, simple workflow, and a broad range of applications. Introduction Figure 1: Illumina Flow Cell Illumina sequencing technology leverages clonal array

More information

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop André Schumacher, Luca Pireddu, Matti Niemenmaa, Aleksi Kallio, Eija Korpelainen, Gianluigi Zanetti and Keijo Heljanko Abstract

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Overview sequence projects

Overview sequence projects Overview sequence projects Bioassist NGS meeting 15-01-2010 Barbera van Schaik KEBB - Bioinformatics Laboratory b.d.vanschaik@amc.uva.nl NGS at the Academic Medical Center Sequence facility Laboratory

More information

DNA Sequencing & The Human Genome Project

DNA Sequencing & The Human Genome Project DNA Sequencing & The Human Genome Project An Endeavor Revolutionizing Modern Biology Jutta Marzillier, Ph.D Lehigh University Biological Sciences November 13 th, 2013 Guess, who turned 60 earlier this

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Globus Genomics Tutorial GlobusWorld 2014

Globus Genomics Tutorial GlobusWorld 2014 Globus Genomics Tutorial GlobusWorld 2014 Agenda Overview of Globus Genomics Example Collaborations Demonstration Globus Genomics interface Globus Online integration Scenario 1: Using Globus Genomics for

More information

Deep Sequencing Data Analysis: Challenges and Solutions

Deep Sequencing Data Analysis: Challenges and Solutions 29 Deep Sequencing Data Analysis: Challenges and Solutions Ofer Isakov and Noam Shomron Sackler Faculty of Medicine, Tel Aviv University, Israel 1. Introduction Ultra high throughput sequencing, also known

More information

AS4.1 190509 Replaces 260806 Page 1 of 50 ATF. Software for. DNA Sequencing. Operators Manual. Assign-ATF is intended for Research Use Only (RUO):

AS4.1 190509 Replaces 260806 Page 1 of 50 ATF. Software for. DNA Sequencing. Operators Manual. Assign-ATF is intended for Research Use Only (RUO): Replaces 260806 Page 1 of 50 ATF Software for DNA Sequencing Operators Manual Replaces 260806 Page 2 of 50 1 About ATF...5 1.1 Compatibility...5 1.1.1 Computer Operator Systems...5 1.1.2 DNA Sequencing

More information

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT Kimberly Bishop Lilly 1,2, Truong Luu 1,2, Regina Cer 1,2, and LT Vishwesh Mokashi 1 1 Naval Medical Research Center, NMRC Frederick, 8400 Research Plaza,

More information

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

Automated DNA sequencing 20/12/2009. Next Generation Sequencing DNA sequencing the beginnings Ghent University (Fiers et al) pioneers sequencing first complete gene (1972) first complete genome (1976) Next Generation Sequencing Fred Sanger develops dideoxy sequencing

More information

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis Genetic Analysis Phenotype analysis: biological-biochemical analysis Behaviour under specific environmental conditions Behaviour of specific genetic configurations Behaviour of progeny in crosses - Genotype

More information