8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)
Experimental Design & Intro to NGS Data Analysis
Ryan Peters, Field Application Specialist, Partek, Incorporated

Agenda
- Experimental Design
- Examples
- ANOVA
- What assays are possible?
- NGS Analytical Process
- Alignment of NGS Data
- Challenges of NGS Analysis
- Partek Flow Demonstration

Examples
- Shoe Example
- Breast Cancer Example
- Rat Example (Experimental Design)
- Tips on setting up your next experiment
The Role of Experimental Design
- The goal of statistics is to find signals in a sea of noise.
- The goal of experimental design is to reduce that noise so that true biological signals can be found with as small a sample size as possible.

Shoe Example
Question: Do shoes affect height?
Hypothesis: Yes, shoes affect height.
Assay: Measure the height of 10 people with and without shoes. (Change only one variable.)
Sample size: 10 (5 male, 5 female)
Analysis: Use a two-sample t-test to see whether there is a difference between the means of the two groups: with shoes and without shoes.

t-test
A simple t-test does not have the power to correctly identify this pattern, because it assumes that multiple samples from the same individual are independent when they are not.
p = 0.51, fold-change = 1.02
Conclusion: no statistically significant difference in height due to shoes.
Paired t-test
The paired t-test provides substantially more statistical power by removing person-to-person differences from the noise.
p(shoes) = 1e-5, p(person) = 2e-9

Introducing Gender
Once person is known, gender is already known; thus the p-value for shoes remains unchanged. We get the estimate of the gender effect for free!

Add Gender (3-way ANOVA)
p(shoes) = 1e-5, p(gender) = 0.04, p(person) = 2e-9
It appears (p = 0.04) that men (at Partek) are significantly taller than women.
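The contrast between the two-sample and the paired t-test can be sketched with a few lines of standard-library Python. The heights and per-person shoe lifts below are hypothetical values chosen for illustration, not the data behind the slides, and only the t statistics are computed (p-values would need a t-distribution, which the standard library does not provide).

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical heights in cm for 10 people (illustrative, not the slide data).
barefoot = [160.0, 165.0, 170.0, 172.0, 175.0, 178.0, 180.0, 183.0, 185.0, 190.0]
# Assumed height gained from shoes for each person.
lift = [2.5, 3.0, 2.8, 3.2, 2.6, 3.1, 2.9, 3.3, 2.7, 3.0]
with_shoes = [h + d for h, d in zip(barefoot, lift)]


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic; treats every measurement as independent."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic; uses per-person differences, so person-to-person spread drops out."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_unpaired = two_sample_t(with_shoes, barefoot)
t_paired = paired_t(with_shoes, barefoot)
print(f"two-sample t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

With these made-up numbers the same shoe effect gives a small two-sample statistic and a very large paired statistic, because the person-to-person variation no longer sits in the denominator.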
Explore Gender/Shoe Interaction
Do shoes have the same effect on men and women?
p(shoes) = 1e-8, p(gender) = 0.04, p(person) = 2e-12, p(shoe*gender) = 7e-5
Wow! Shoes affect women's height more than men's! Also note that the p-values for the shoe effect are even smaller because we explained more noise.

Breast Cancer Example: Example of a Large Batch Effect
Example data: GEO experiment GSE848
Control (E2) plus drug treatment of breast cancer cells
5 treatments x 3 time points x 2 replicates
Biological replicates were processed in 2 batches.
Treatments: Control, Estrogen (E2), E2 + ICI, E2 + Raloxifene, E2 + Tamoxifen
Time points: 0 hr, 2 hr, 8 hr
Fortunately, treatments were perfectly balanced across processing batches.
As Seen Using PCA
As Seen Using Hierarchical Clustering

What is Analysis of Variance?
Analysis (source: m-w.com). Etymology: New Latin, from Greek, from analyein, to break up. Separation of a whole into its component parts.
[Figure: sources of variation. Legible labels: Treatment, Time. Values: 58.36%, 17.49%, 17.40%, 1.64%, 1.15%]
Analysis of Variance (ANOVA): a technique that partitions the variance in data into separate components or factors.
Good News! Balanced Experimental Design
The treatments were perfectly balanced with the batches, so batch can be included as a blocking factor in ANOVA, and the batch effect (noise) can be removed from the data.
In terms of p-values for this gene, the difference is dramatic. With a simple 2-way ANOVA, this gene was #228 on the gene list and would not pass multiple test correction for significance. With a 3-way ANOVA including batch, it was #2 on the gene list.
[Table: p-values by factor (Treatment, Time, Treatment*Time) under 2-way ANOVA and 3-way ANOVA. Surviving fragments: Treatment E-07, Treatment*Time E-05]

#2 Most Significant Gene
[Figure: Monday vs. Tuesday. Median A = 8.5, Median B = 9.7. Tue vs. Mon: more than 2-fold difference]

ANOVA Partitions Variability
Total variance is partitioned into variability due to influencing factors, and the rest is assumed to be due to random error (noise).
R^2 = 81% for 2-way ANOVA
R^2 = 99% when batch is included
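The idea that adding batch as a blocking factor moves variance out of the error term can be illustrated with a minimal sketch. The expression values below are hypothetical (one gene, three treatments, two batches, one value per cell), and the helper name `factor_sum_of_squares` is an assumption made for this example. In a balanced layout like this one, the treatment and batch sums of squares add up cleanly, so R^2 can be compared with and without batch in the model.

```python
from statistics import mean

# Hypothetical log2 expression of one gene: (treatment, batch) -> value.
data = {
    ("Control", "batch1"): 8.0, ("Control", "batch2"): 9.1,
    ("E2", "batch1"): 8.6, ("E2", "batch2"): 9.8,
    ("E2+ICI", "batch1"): 8.2, ("E2+ICI", "batch2"): 9.2,
}

grand_mean = mean(data.values())


def factor_sum_of_squares(factor_index):
    """Sum of squares explained by one factor (0 = treatment, 1 = batch)."""
    groups = {}
    for key, value in data.items():
        groups.setdefault(key[factor_index], []).append(value)
    return sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())


ss_total = sum((v - grand_mean) ** 2 for v in data.values())
ss_treatment = factor_sum_of_squares(0)
ss_batch = factor_sum_of_squares(1)

# Without batch in the model, batch variability is left in the error term.
r2_without_batch = ss_treatment / ss_total
# With batch as a blocking factor, that variability is explained.
r2_with_batch = (ss_treatment + ss_batch) / ss_total

print(f"R^2 without batch: {r2_without_batch:.2f}")
print(f"R^2 with batch:    {r2_with_batch:.2f}")
```

The additive split used here relies on the design being balanced; with unbalanced data the factor sums of squares overlap and a proper linear-model fit is needed.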
Batch Effect Remover
[Figure: Before Batch Removal / After Batch Removal]

Batch Effect Remover
- For visualization purposes only!
- Factors you would normally add for ANOVA
How do we account for batch without Partek Batch Remover?

Building Blocks of Experimental Design
- No Randomization
- Completely Randomized: subjects randomly assigned to treatment groups
- Randomized Block: subjects randomly assigned to treatment groups within similar blocks (e.g. gender, litters). Requires a priori knowledge of differences between the blocks.
Simplest Design: Not Randomized
8 male rats: 4 treated, 4 control
Stripe-coated rats are faster or more alert.

Completely Randomized
8 male rats: 4 treated, 4 control

A Better Approach: Randomized Block Design
First divide into blocks, then randomly assign to treatment groups.
Randomized Block Design
8 male rats: 4 treated, 4 control

Technical Blocks in Microarray Experiments
Litter is an example of a biological block.
Examples of technical/processing blocks:
- RNA isolation batch
- Hybridization batch
- Operator
As well as (although less so):
- Wash and stain batch
- Reagent, cocktail batches
- Chip lot

In Summary
"Block what you can and randomize what you cannot." Box, Hunter, & Hunter (1978)
- Blocking ensures that the differences in treatment cannot possibly be due to the blocking factor.
- Blocking completely eliminates noise due to blocks.
- Randomization gives approximate balance across other variables unaccounted for.
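"First divide into blocks, then randomly assign to treatment groups" can be sketched as follows. The litter names, rat IDs, and the function name `randomized_block_assignment` are hypothetical, and the fixed seed is only there to make the example reproducible.

```python
import random

# Hypothetical blocks: two litters of four male rats each.
blocks = {
    "litter_A": ["rat1", "rat2", "rat3", "rat4"],
    "litter_B": ["rat5", "rat6", "rat7", "rat8"],
}
treatments = ["treated", "control"]


def randomized_block_assignment(blocks, treatments, seed=0):
    """Shuffle subjects within each block, then deal treatments out in turn.

    Every block ends up with the same number of subjects per treatment
    (when block size is a multiple of the number of treatments).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block, treatments[i % len(treatments)])
    return assignment


assignment = randomized_block_assignment(blocks, treatments)
for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```

Because randomization happens inside each litter, litter can never be mistaken for a treatment effect, which is the point of the Box, Hunter, and Hunter rule quoted above.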
Analysis of Variance
Also known as: ANOVA, ANCOVA, Linear Model, Mixed Linear Model
Invented in 1900, 1908, 1923
Still remains the most commonly used statistical method to analyze clinical trials!

Simple ANOVA: Student's t-test
t and F Statistics (Student/Gosset, Fisher)
Fun fact: an equal-variance t-test is mathematically equivalent to a 1-way ANOVA with two groups.
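The equivalence between the equal-variance t-test and a 1-way ANOVA with two groups shows up numerically as t^2 = F. The sketch below checks this with hypothetical measurements; the function names are assumptions made for the example.

```python
from math import isclose, sqrt
from statistics import mean, variance

# Hypothetical measurements for two groups (unequal sizes on purpose).
group_a = [8.1, 8.4, 7.9, 8.6, 8.2]
group_b = [9.0, 9.3, 8.8, 9.5, 9.1, 9.2]


def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(groups):
    """F statistic of a 1-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])
print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The identity only holds for the pooled-variance t-test; the unequal-variance (Welch) version does not reduce to the ordinary F statistic.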
Assumptions of ANOVA
- Data is normally distributed (bell shaped) within different treatment groups: ensure data is log transformed.
- Variance is equal within different treatment groups: design balanced experiments.
- Sample groups are independent: don't make the shoe mistake.
* Replicates are required to get a p-value.

Random vs Fixed Effects
If the experiment were to be performed again, would the same levels of the factor be used?
- Yes: fixed effect (e.g. gender, dose, time, dye)
- No: random effect (e.g. hyb batch, wash batch, litter, subject)
Why do I have to worry about this? In general, treating a random effect as a fixed effect will produce an overoptimistic p-value, leading to a false discovery.

What Factors Belong in the Model?
- Obviously, the factors of interest to the researcher, e.g. strain, time, strain*time
- Any factor needed to account for dependence of samples (don't violate the assumption of independence!), e.g. donor
- Any additional blocking factors for noise reduction, e.g. batch
Partek Expression Philosophy
- Use PCA to aid in quality control and sample grouping.
- Use ANOVA to detect significantly expressed genes. Fold change is interesting for ranking, but not a great primary filtering metric.
- Incorporate as much phenotypic and experimental design information into the ANOVA model as possible. Measure the experimental technical components.*
- Make sense of gene lists through functional groups.

How NOT to Run/Ruin Your Next Experiment!
Samples are frequently organized by treatment groups. Samples are then processed in batches corresponding to treatment groups. But please do NOT process your control samples on Monday and then process your treated samples on Tuesday. You will confound these two variables. ANOVA is powerful but not magical.

Summary: Experimental Design & Analysis
- Understand how separating variables in your analysis is critical to your success.
- Design balanced experiments.
- Let p-values rank your data, but don't be a slave to FDR.
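The Monday/Tuesday warning can be turned into a quick check on a sample sheet before any samples are processed. This is a minimal sketch with hypothetical sample lists; the function name `is_confounded` and its definition of confounding (every batch holds a single treatment group) are assumptions made for illustration.

```python
from collections import defaultdict


def is_confounded(samples):
    """Return True if every batch contains only one treatment group.

    `samples` is a list of (treatment, batch) pairs. When each batch holds a
    single treatment, treatment and batch effects cannot be told apart.
    """
    treatments_per_batch = defaultdict(set)
    for treatment, batch in samples:
        treatments_per_batch[batch].add(treatment)
    return all(len(t) == 1 for t in treatments_per_batch.values())


# Hypothetical layouts: controls on Monday and treated on Tuesday (bad),
# versus both groups split evenly across the two days (balanced).
bad_design = [("control", "Monday")] * 3 + [("treated", "Tuesday")] * 3
balanced_design = [
    ("control", "Monday"), ("treated", "Monday"),
    ("control", "Tuesday"), ("treated", "Tuesday"),
]

print(is_confounded(bad_design))       # True
print(is_confounded(balanced_design))  # False
```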
What kinds of assays are possible?
DNA-Seq
- Copy number
- SNP
- Structural variants
- Whole genome sequencing
- Metagenomics
- Targeted/amplicon sequencing
ChIP-Seq
- Transcription factor binding sites
- Methylation sites
- Histone modifications
- RIP-Seq (RNA-binding proteins)
RNA-Seq
- Transcriptome
- Differential gene expression
- Alternative splicing
- SNP detection
- Indel detection
- Novel exons/genes
miRNA-Seq
- Identify regulatory (non-coding) RNAs

NGS Analysis Phases
[Figure: primary, secondary, and tertiary analysis. Legible labels: control software; data file (reads + quality), FASTQ; Bowtie/BioScope/BWA, etc.; reads aligned to genome, BAM. Modified from Strand Life Sciences]

NGS Analytical Process
- Sequencing (Illumina HiSeq, SOLiD, Roche 454, Ion Torrent, PacBio)
- Genome Alignment
- QC & Exploratory Analysis
- Powerful Statistics
- Intuitive Visualizations
- Integrated Genomics
- Biological Interpretation
- Publication
[Figure: example aligned reads, each with chromosome and strand (F/R), e.g. GAGGTTGCAGTTTG chr R, ACTGCTCCGCCTCA chr F]
Comprehensive Analysis of NGS Data
DNA-Seq, Small RNA-Seq, RNA-Seq, Methylation-Seq, ChIP-Seq

Read Types for NGS
- Single-end reads
- Paired-end reads
- Junction reads
- Multiple aligned reads
- Strand-specific reads

Paired End & Single End Reads
[Figure: single-end and paired-end reads in DNA space; a multiple-aligned read shown on chr2 and chr5]
Junction Reads
Because they are derived from transcripts, some RNA-Seq reads will read through splice junctions (single-end or paired-end). They will not align well to a genomic reference, since the two ends are many nucleotides apart (separated by the intronic region).

Next Gen File Formats
- Unaligned: FASTA, FASTQ, SCARF, QSEQ, SRA, RAW, TXT, others
- Alignment tools: ELAND, BFAST, Bowtie, TMAP, BWA, TopHat, SOAP, etc.
- Aligned: SAM, BAM, vendor-specific formats/color space
- Variant call file (VCF, BCF): SNPs, indels

What to expect?
Cluster, Cloud, Laptop
- File size depends on read length and read type
- 4 GB single lane (~100 million reads)
- Bowtie w/ 8 cores = 20/25 minutes; reference genome - read length = 33 bp (older)
- TopHat, same file: 1 day
- Read length x number of reads x 8 = file size (FASTA; double for FASTQ)
- BAM file is ~3-4x smaller than the unaligned file
FASTQ Format (Unaligned)
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters (ACGT).
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.

What is Alignment?
A read comes off a sequencing machine: ATGGTCA
Goal: determine where on the genome that read belongs.
Method: match the sequence of the read to the sequence of a reference genome.
  Reference genome: GGCATGGTCATTC
  Read:                ATGGTCA
Result: genomic location of the read.

Align junction reads
Gene/transcript with an exon junction
  Reference genome (DNA space): GATGCACGGATTGTCAT
  Read (RNA space): ATGGTCA
1) Align to the genome: gapped alignment is time expensive; breaks the read up into pieces (25-mers).
2) Align to the transcriptome: lose genomic context!
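The four-line FASTQ layout and the Sanger quality encoding described above can be sketched in a few lines of Python. The identifier `@SEQ_ID` is a placeholder assumed for this example, and the function names are hypothetical; the sequence and quality strings are the ones shown in the slide.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read_id, sequence, quality)."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@"):
        raise ValueError("line 1 must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("line 3 must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode Sanger-style quality characters: score = ASCII code - 33."""
    return [ord(ch) - offset for ch in quality]


record = [
    "@SEQ_ID",  # placeholder identifier, assumed for this example
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]

read_id, sequence, quality = parse_fastq_record(record)
scores = phred_scores(quality)
print(read_id, len(sequence), min(scores), max(scores))
```

The first quality character '!' is ASCII 33 and therefore decodes to a Phred score of 0, the lowest value the encoding can express.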
SAM Format (Aligned reads)
Sequence Alignment/Map (SAM) format is TAB-delimited; BAM is binary SAM.
[Figure: annotated SAM example. Labels: header line (header), reference sequence, read ID, bitwise flag, reference genome, position, quality score, CIGAR (M = match, I = insertion, D = deletion), reference name of mate, position of mate, sequence length, quality, optional fields]
Explain flags

VCF Format (Variant Call Format)
The Variant Call Format (VCF) is a TAB-delimited format in which each data line consists of the following fields: chromosome, position, variant ID, reference/alternative alleles, quality, information (read depth), event, sample ID (optional), format (optional).
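The CIGAR field uses the M/I/D operation codes listed on the SAM slide. As a minimal sketch, the functions below (hypothetical names) split a CIGAR string into operations and work out how many read bases and how many reference bases it covers. The extra operation letters accepted by the pattern (N, S, H, P, =, X) come from the SAM specification rather than from the slide, and the example strings are made up.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string such as '5M1I3M2D4M' into [(length, op), ...]."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def read_length(cigar):
    """Bases of the read consumed: matches, insertions, soft clips."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def reference_span(cigar):
    """Bases of the reference covered: matches, deletions, skipped regions."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


example = "5M1I3M2D4M"  # hypothetical alignment with one insertion and one deletion
print(parse_cigar(example))
print(read_length(example), reference_span(example))

# A junction read: 3 matched bases, an 8-base skipped intron, 4 matched bases.
junction = "3M8N4M"
print(read_length(junction), reference_span(junction))
```

An insertion adds to the read length but not to the reference span, while a deletion or a skipped intron does the opposite, which is why a 7-base junction read can cover 15 bases of the genome.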
Partek Flow
- Web-based application: cloud, desktop, server
- Chrome, Firefox, Safari
- Access from any terminal, smartphone
- Project centric
- Protocols
- Collaborate with others
- Current release 1.0 / 2.1 beta
- Alignment, QA/QC, GSA
- Export results to PGS
- Coming soon: SGE, ...

Challenge: Data volume is a bottleneck
Help, I'm drowning in data! How do I handle all this data?

Solution: Schedule Tasks
- Schedule and queue tasks
- Emails you when tasks are complete
- Keep your hardware running 24/7/365
Challenge: The quality of the data will affect the alignment
- How do I determine data quality?
- Do I have outliers?
- Can I move forward with my analysis?
- Do I need to trim/filter my reads?

Solution: Pre & Post Alignment QA/QC
- Group and individual QA/QC for excluding outliers
- Quality score per read/position
- Look for a drop in quality scores
- Make intelligent decisions for trimming/filtering adaptors, barcodes, and low-quality reads

Challenge: Alignment
Different people and different parameters will result in different alignments.
- Which aligner to use?
- Some aligners have more than 50 different options. How do I know what to set?
- What options do I choose for RNA-Seq, ChIP-Seq, DNA-Seq, miRNA-Seq, MeDIP-Seq?
- What options do I choose for the different read types? Junction reads? Paired-end reads? Multiple aligned reads?
Solution: Multiple aligners with recommended defaults
- Vendor-specific default options
- Automatic download of reference genomes
- Assay-specific default options (RNA-Seq, ChIP-Seq, DNA-Seq)
- Advanced options also available through the GUI (no command line)

Challenge: How do I keep track of my samples?
Which samples are Tumor? Control? Age? Sex/Gender? How am I ever going to keep track of this clinical information?

Solution: Advanced Sample Management
- Manage files associated with a sample throughout the life of the project
- Keep track of the reference genome
- Controlled vocabulary, SNOMED list
- In-place editing of sample info
Plug-in for Torrent Suite
Perform QA/QC within Torrent Suite and seamlessly upload data to Partek Flow for comprehensive data analysis and visualization.
- Performs QA/QC within Torrent Suite
- Uploads data to Partek Flow

Comprehensive Solution for RNA-Seq
Alignment, Mapping QC, Statistics, Visualization, Integrated Genomics, Biological Interpretation

Acknowledgements

Partek Flow Demonstration
More informationGenotyping by sequencing and data analysis. Ross Whetten North Carolina State University
Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity
More informationDelivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
More informationMolecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
More informationQuality Assessment of Exon and Gene Arrays
Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such
More informationA Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here
A Complete Example of Next- Gen DNA Sequencing Read Alignment Presentation Title Goes Here 1 FASTQ Format: The de- facto file format for sharing sequence read data Sequence and a per- base quality score
More informationUsing Galaxy for NGS Analysis. Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org
Using Galaxy for NGS Analysis Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org Overview NGS Data Galaxy tools for NGS Data Galaxy for Sequencing Facilities Overview
More informationBioinformatics Unit Department of Biological Services. Get to know us
Bioinformatics Unit Department of Biological Services Get to know us Domains of Activity IT & programming Microarray analysis Sequence analysis Bioinformatics Team Biostatistical support NGS data analysis
More informationData Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms
Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).
More informationMicroarray Data Analysis. A step by step analysis using BRB-Array Tools
Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.
More informationSchool of Nursing. Presented by Yvette Conley, PhD
Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression
More informationAppendix 2 Molecular Biology Core Curriculum. Websites and Other Resources
Appendix 2 Molecular Biology Core Curriculum Websites and Other Resources Chapter 1 - The Molecular Basis of Cancer 1. Inside Cancer http://www.insidecancer.org/ From the Dolan DNA Learning Center Cold
More informationUser Manual. Transcriptome Analysis Console (TAC) Software. For Research Use Only. Not for use in diagnostic procedures. P/N 703150 Rev.
User Manual Transcriptome Analysis Console (TAC) Software For Research Use Only. Not for use in diagnostic procedures. P/N 703150 Rev. 1 Trademarks Affymetrix, Axiom, Command Console, DMET, GeneAtlas,
More information17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg (hackenberg@ugr.es)
WEB-SERVER MANUAL Contact: Michael Hackenberg (hackenberg@ugr.es) 1 1 Introduction srnabench is a free web-server tool and standalone application for processing small- RNA data obtained from next generation
More informationData Analysis for Ion Torrent Sequencing
IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page
More informationSubread/Rsubread Users Guide
Subread/Rsubread Users Guide Subread v1.5.0-p1/rsubread v1.20.3 1 February 2016 Wei Shi and Yang Liao Bioinformatics Division The Walter and Eliza Hall Institute of Medical Research The University of Melbourne
More informationSingle-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation
PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic
More informationFOR REFERENCE PURPOSES
BIOO LIFE SCIENCE PRODUCTS FOR REFERENCE PURPOSES This manual is for Reference Purposes Only. DO NOT use this protocol to run your assays. Periodically, optimizations and revisions are made to the kit
More informationBiological Sciences Initiative. Human Genome
Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
More informationNECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011
NECC History Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011 EPSCoR Cyberinfrastructure Workshop First regional NENI (now NECC) Workshop held in Vermont in August 2007 Workshop heldinkentucky
More informationSeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
More informationHow-To: SNP and INDEL detection
How-To: SNP and INDEL detection April 23, 2014 Lumenogix NGS SNP and INDEL detection Mutation Analysis Identifying known, and discovering novel genomic mutations, has been one of the most popular applications
More informationVisualisation tools for next-generation sequencing
Visualisation tools for next-generation sequencing Simon Anders EBI is an Outstation of the European Molecular Biology Laboratory. Outline Exploring and checking alignment with alignment viewers Using
More informationINTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE QUALITY OF BIOTECHNOLOGICAL PRODUCTS: ANALYSIS
More informationRNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012
RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided
More informationGeneProf and the new GeneProf Web Services
GeneProf and the new GeneProf Web Services Florian Halbritter florian.halbritter@ed.ac.uk Stem Cell Bioinformatics Group (Simon R. Tomlinson) simon.tomlinson@ed.ac.uk December 10, 2012 Florian Halbritter
More informationAdvances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage
Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research March 17, 2011 Rendez-Vous Séquençage Presentation Overview Core Technology Review Sequence Enrichment Application
More informationAnalysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics
Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,
More informationAthanasia Pavlopoulou University of Thessaly, Lamia June 2015
Athanasia Pavlopoulou University of Thessaly, Lamia June 2015 Early DNA Sequencing Technologies Early efforts at DNA sequencing were: o tedious o time consuming o labor intensive Frederick Sanger (Sanger
More information