Workflow. Reference Genome. Variant Calling. Galaxy Format Conversion Groomer. Mapping BWA GATK Preprocess
|
|
- Laurence Hutchinson
- 8 years ago
- Views:
Transcription
1 Workflow Fastq Reference Genome Galaxy Format Conversion Groomer Quality Control FastQC Mapping BWA Format conversion Sam-to-Bam Removing PCR duplicates MarkDup Preprocess GATK Base Recalibration Preprocess GATK Indel Realignment Variant Calling GATK Unified Genotyper Mpileup Variant Calling VarScan VCF Filtering VCF Annotation
2 Genome Analysis Toolkit
3 Plan Introduction Prétraitements des données NGS Recherche de Variants Pourquoi faire la Real/Recab? Travaux pratiques
4 Introduction
5 GATK GATK : Genome Analysis ToolKit The Genome Analysis Toolkit : A MapReduce framework for analyzing next-generation DNA sequencing data, McKenna et al. (2010) Développé par l'équipe de développement du Broad Institute (USA) Utilisé dans de nombreux projets (1000 Genomes Project, The Cancer Genome Atlas...) A la base développé pour génetique humaine mais maintenant générique Développé en Java Citations : Sources GATK Website* Google Scholar * Nature, Science, Nature Genetics, Nature Biotechnology, New England Journal of Medicine, Cell, and Genome Research.
6 GATK
7 Comment est détecté un SNP?
8 Comment est détecté un SNP?
9 Comment est détecté un SNP?
10 Comment est détecté un SNP? Complex bayesian algorithms based on : Base scale Read scale Position scale Genotype scale Phred-Quality Base Mapping quality Forward/Reverse ALT allele count REF allele count Overall genotype association ALT / REF Read Depth SNP quality 10 => P error = 1 / => P error = 1 / 1000
11 Comment est détecté un SNP? Biais de séquençage connus: GA / Hi-Seq : Base Quality 454 : Homopolymères SOLiD : Base Quality + Color space traduction Base scale Read scale Position scale Genotype scale Phred-Quality Base Mapping quality Forward/Reverse ALT allele count REF allele count Overall genotype association ALT / REF Read Depth SNP quality
12 Prétraitement des données NGS
13 Raw reads Produits par les logiciels des Séquenceurs Une première étape de recalibration/correction des reads peut être effectuée : 454 : Pyrobayes / Pyrocleaner SOLiD : Rsolid Illumina : Ibis /BayesCall + Taux erreur amélioré de 5 à 30 % - Temps de calcul
14 Raw reads NATAAATGCTGTCATACAGACTTGTTGGTGTTGTAAGGCAGCAGACTCCTTTGAGCTTTCATCCGAGAACAATTGAGACTAAATTCCTGGTGCAAAGTCCA +HISEQ4_0105:4:1101:1533:1998#TAGCTT/1 NAAGAAGGCACGAAGCAACTACTTCACTGCATGCTGCCTGTCCTTGGGCTGTTTGCTGCCTTTGGCTAACACCTTTGATTATTTCTGGCTAAGTAGATAGG +HISEQ4_0105:4:1101:2421:1947#TAGCTT/1 NAGAGCTATTTATGAAAACGAGGATGACTAAAACTGCCCAGAAAAAAAACCAACCAACCACGTTTCCAGTGACTGCCACCCTTAGCAAGCAAGGTAATAAC csfasta + Qual
15 Mapping Alignement reads VS Génome de référence Tout logiciel produisant des BAM Ex: BWA, Bowtie, Gsnap, SOAP, SSAHA 1 fichier par lane / individu / condition ou groupé avec Read group (obligatoire)
16 Mapping PHOSPHORE:181:C0KD3ACXX:8:2101:3676: M = X0:i:1 X1:i:0 MD:Z:101 RG:Z:ind1 XG:i:0 AM:i:0 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U PHOSPHORE:181:C0KD3ACXX:8:2101:3676: * = RG:Z:ind1 PHOSPHORE:181:C0KD3ACXX:8:1206:13256: M = X0:i:1 X1:i:0 MD:Z:101 RG:Z:ind2 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U PHOSPHORE:181:C0KD3ACXX:8:1206:13256: M = TCCTTACTTTCAACAGCCTCCATTACCAATTCCAGGGAAAGTCTCCATCAACCAGGAATGCATCAGTATAAGGCACTCTGAAAGAAAGCAATCTAAATCCC :>DCDDDECAA>>@BFFEC@EIHE;GBHF=GFGHGGGGIIHFHGDG@GDB9IIJIIGHHGGGHIIGDIIHFHHEFGEIIJHGH?GBGIHHGGDFFDFFCC@ X0:i:1 X1:i:0 MD:Z:101 RG:Z:ind2 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U PHOSPHORE:181:C0KD3ACXX:8:1202:6947: M = GCAGGCTTTTAAGAATATGTTCTGTTTTCAAATAGTAACCCAAAAAGGGGTGGGGGCGGGGGCAAAGTGCTGTGTGTGTGTGTGTGTGTGTGTGTGTGT CC@FFFFFGHGFHFGGGII>JHGGEHIJIIEHHEGHIGHIJJGGIJFGIJ@FHIIHFBDBDDBB@BC44@:@4?><8A2<2?8?<B<<2<2<<A<ABB? X0:i:1 X1:i:0 MD:Z:99 RG:Z:ind2 XG:i:0 AM:i:29 NM:i:0 SM:i:29 XM:i:0 XO:i:0 XT:A:U SAM spécifications:
17 Duplicate Marking/Removing Duplicats PCR (construction des librairies) Samtools rmdup Picard MarkDuplicates Identification Removing
18 Local Realignment Identification des régions à réaligner : The algorithm begins by first identifying regions for realignment where 1) at least one read contains an indel, 2) there exists a cluster of mismatching bases or 3) an already known indel segregates at the site DePristo et al (2011) Réalignement des reads Next, all reads are realigned against just the best haplotype Hi and the reference (H0), and each read Rj is assigned to Hi or H0 DePristo et al (2011)
19 Local Realignment
20 Raw data Base quality recalibration «The per-base quality scores, which convey the probability that the called base in the read is the true sequenced base, are quite inaccurate and co-vary with features like sequencing technology, machine cycle and sequence context» DePristo et al. (2011) Ewing and Green (1998) Li et al. (2004 ; 2009) Mean BQ = 32,8 - Median = 36,7
21 Raw data Recalibrated data Base quality recalibration Conséquences Mean BQ = 32,8 - Median = 36,7 Mean BQ = 28,8 Median = 28,7 Baisse de la variabilité Baisse de la qualité moyenne
22 Base quality recalibration DePristo et al (2011)
23 Raw data Analysis-ready reads Nouveau fichier BAM Peut être utilisé ensuite avec d autre outils pour la suite des analyses (Samtools mpileup, Popoolation, etc )
24 Recherche de Variants
25 Single vs Multiple sample analysis Data processing and analysis of genetic variation using nextgeneration sequencing Mark DePristo Dec. 8th, 2011 (
26 Unified Genotyper Outil GATK Multiple sample analysis Différents modes de détection SNP Indels
27 Format VCF
28 Pourquoi faire le Real/Recab?
29 Comparaison d outils de SNP calling SIGENAE Team LGC - INRA APACHE Project (Alain Vignal) To find SNPs (Single Nucleotide Polymorphism) which differentiate populations Barbary Duck : no reference genome (Beijing duck genome is available) Beijing duck Journée Bioinfo Génotoul 29/03/2012 Barbarie duck
30 Impact of realignment / recalibration on SNP count More homogenous SNP count Δ = 777% Δ = 714% Δ = 42% Δ = 45% Mpileup Mpileup -B Mpileup -E GATK Popoolation raw data realigned data recalibrated data Realigned & recalibrated data Higher impact of recalibration on SNP count
31 raw data Reliable results with other species? DUCK realigned data Realigned & recalibrated data recalibrated data Mpileup Mpileup -B Mpileup -E GATK Raw data Realigned/Recal data Δ tools 777% 20% BAMs bruts CHICKEN BAMs réalignés BAMs réalignés/recalibrés BAMs recalibrés Mpileup Mpileup -B Mpileup -E GATK Raw data Realigned/Recal data Δ tools 234% 4% PIG Mpileup Mpileup -B Mpileup -E GATK Raw data Realigned/Recal data Δ tools 454% 9% 0 0 BAMs réalignés BAMs réalignés/recalibrés BAMs bruts BAMs recalibrés Not the same proportion but huge impact on realignment/recalibration
32 Conclusion Variability between called SNP by different tools GATK realignment/recalibration greatly helps to reduce this variability High impact of base quality score Reliable on various DNA data, but not on RNA data Nature Genetics 2012 «We recommend a recalibration of per-base quality scores as in GATK or SOAPsnp» «Several additional steps can be taken to improve genotype calls, such as local realignments...»
33 Bilan GATK nécessite un peu d'habitude Points forts : Assez rapide d'exécution grâce à la parallélisation possible Comptage allélique Prise en compte des positions multi-alléliques Beaucoup de fonctionnalités et d'options SNPs semblent être fiables Améliorations fréquentes Site Internet Points faibles : Recalibration basée sur des SNPs connus... À l'origine créé pour l'analyse de génomes humains Beaucoup d'étapes avant de lancer l'unifiedgenotyper Nécessite beaucoup d'espace disque pour suivre le pipeline de bout en bout
34 Travaux Pratiques Galaxy
35 Le site de référence GATK Download logiciels + ressources (vcf) Guide Analyse Best Practices Forum Documentation Technique Etc
36 References Samtools : Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup - The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, (2009). Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18: (2008). GATK A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43, 491 (2011). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. McKenna AH, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, Depristo M. Genome Res. (2010). Popoolation2 R. Kofler, R. V. Pandey, C. Schlotterer. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics (2011). Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Quinlan AR, Stewart DA, Strömberg MP, Marth GT. Nat Methods (2008) BayesCall: a model-based base-calling algorithm for high-throughput short-read sequencing. Kao W-C, Stevens K, Song YS. Genome Res (2009). Ibis Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Kircher M, Stenzel U, Kelso J.. Genome Biol. (2009). Pyrocleaner Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool. Mariette J, Noirot C, Klopp C. BMC Research Notes 2011 Genotype and SNP calling from next-generation sequencing data. Nielsen, R. et al. Nature Reviews Genetics 12: (2011).
Practical Guideline for Whole Genome Sequencing
Practical Guideline for Whole Genome Sequencing Disclosure Kwangsik Nho Assistant Professor Center for Neuroimaging Department of Radiology and Imaging Sciences Center for Computational Biology and Bioinformatics
More informationHow-To: SNP and INDEL detection
How-To: SNP and INDEL detection April 23, 2014 Lumenogix NGS SNP and INDEL detection Mutation Analysis Identifying known, and discovering novel genomic mutations, has been one of the most popular applications
More informationText file One header line meta information lines One line : variant/position
Software Calling: GATK SAMTOOLS mpileup Varscan SOAP VCF format Text file One header line meta information lines One line : variant/position ##fileformat=vcfv4.1! ##filedate=20090805! ##source=myimputationprogramv3.1!
More informationAn example of bioinformatics application on plant breeding projects in Rijk Zwaan
An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on
More informationA Complete Example of Next- Gen DNA Sequencing Read Alignment. Presentation Title Goes Here
A Complete Example of Next- Gen DNA Sequencing Read Alignment Presentation Title Goes Here 1 FASTQ Format: The de- facto file format for sharing sequence read data Sequence and a per- base quality score
More informationIntroduction to NGS data analysis
Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Sequencing Illumina platforms Characteristics: High
More informationAnalysis of NGS Data
Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference
More informationWorkload Characteristics of DNA Sequence Analysis: from Storage Systems Perspective
Workload Characteristics of DNA Sequence Analysis: from Storage Systems Perspective Kyeongyeol Lim, Geehan Park, Minsuk Choi, Youjip Won Hanyang University 7 Seongdonggu Hangdangdong, Seoul, Korea {lkyeol,
More informationAccelerating variant calling
Accelerating variant calling Mauricio Carneiro GSA Broad Institute Intel Genomic Sequencing Pipeline Workshop Mount Sinai 12/10/2013 This is the work of many Genome sequencing and analysis team Mark DePristo
More informationTowards Integrating the Detection of Genetic Variants into an In-Memory Database
Towards Integrating the Detection of Genetic Variants into an 2nd International Workshop on Big Data in Bioinformatics and Healthcare Oct 27, 2014 Motivation Genome Data Analysis Process DNA Sample Base
More informationsnp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing
Al-Shahib and Underwood BMC Bioinformatics 2013, 14:326 SOFTWARE Open Access snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing Ali Al-Shahib * and Anthony
More informationNext generation sequencing (NGS)
Next generation sequencing (NGS) Vijayachitra Modhukur BIIT modhukur@ut.ee 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known
More informationt e c h n i c a l r e p o r t s
t e c h n i c a l r e p o r t s A framework for variation discovery and genotyping using next-generation DNA sequencing data 2 Nature America, Inc. All rights reserved. Mark A DePristo, Eric Banks, Ryan
More informationAccelerating Data-Intensive Genome Analysis in the Cloud
Accelerating Data-Intensive Genome Analysis in the Cloud Nabeel M Mohamed Heshan Lin Wu-chun Feng Department of Computer Science Virginia Tech Blacksburg, VA 24060 {nabeel, hlin2, wfeng}@vt.edu Abstract
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationTutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment
Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249
More informationNew solutions for Big Data Analysis and Visualization
New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina imedina@cipf.es http://bioinfo.cipf.es/imedina Head of the Computational Biology
More informationChallenges associated with analysis and storage of NGS data
Challenges associated with analysis and storage of NGS data Gabriella Rustici Research and training coordinator Functional Genomics Group gabry@ebi.ac.uk Next-generation sequencing Next-generation sequencing
More informationData Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute
Data Analysis & Management of High-throughput Sequencing Data Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute Current Issues Current Issues The QSEQ file Number files per
More informationAbout the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster
Cluster Info Sheet About the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster Welcome to the PMCBRC cluster! We are happy to provide and manage this compute cluster as a resource
More informationVersion 5.0 Release Notes
Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com
More information-> Integration of MAPHiTS in Galaxy
Enabling NGS Analysis with(out) the Infrastructure, 12:0512 Development of a workflow for SNPs detection in grapevine From Sets to Graphs: Towards a Realistic Enrichment Analy species: MAPHiTS -> Integration
More information«Object-Oriented Multi-Methods in Cecil» Craig Chambers (Cours IFT6310, H08)
«Object-Oriented Multi-Methods in Cecil» Craig Chambers (Cours IFT6310, H08) Mathieu Lemoine 2008/02/25 Craig Chambers : Professeur à l Université de Washington au département de Computer Science and Engineering,
More informationUsing Galaxy for NGS Analysis. Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org
Using Galaxy for NGS Analysis Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org Overview NGS Data Galaxy tools for NGS Data Galaxy for Sequencing Facilities Overview
More informationAnalysis of ChIP-seq data in Galaxy
Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers
More informationDeep Sequencing Data Analysis
Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist
More informationEoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille
Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Journées SUCCES Stéphane Le Crom (UPMC IBENS) stephane.le_crom@upmc.fr Paris November 2013 The Sanger DNA sequencing method Sequencing
More informationHadoop. Bioinformatics Big Data
Hadoop Bioinformatics Big Data Paolo D Onorio De Meo Mattia D Antonio p.donoriodemeo@cineca.it m.dantonio@cineca.it Big Data Too much information! Big Data Explosive data growth proliferation of data capture
More informationCopy Number Variation: available tools
Copy Number Variation: available tools Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction A literature review of available
More informationESMA REGISTERS OJ/26/06/2012-PROC/2012/004. Questions/ Answers
ESMA REGISTERS OJ/26/06/2012-PROC/2012/004 Questions/ Answers Question n.10 (dated 18/07/2012) In the Annex VII Financial Proposal, an estimated budget of 1,500,000 Euro is mentioned for the total duration
More informationCloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences
Genetics: Advance Online Publication, published on October 10, 2012 as 10.1534/genetics.112.144204 CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences Gregory Minevich 1,, Danny S.
More informationBasic processing of next-generation sequencing (NGS) data
Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance
More informationSingle-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation
PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic
More informationMapReducing a Genomic Sequencing Workflow
MapReducing a Genomic Sequencing Workflow Luca Pireddu CRS4 Pula, CA, Italy luca.pireddu@crs4.it Simone Leo CRS4 Pula, CA, Italy simone.leo@crs4.it Gianluigi Zanetti CRS4 Pula, CA, Italy gianluigi.zanetti@crs4.it
More informationDelivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
More informationBuilding Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis
Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis Yanlei Diao, Abhishek Roy University of Massachusetts Amherst {yanlei,aroy}@cs.umass.edu Toby Bloom New York Genome Center tbloom@nygenome.org
More informationIntegrated Rule-based Data Management System for Genome Sequencing Data
Integrated Rule-based Data Management System for Genome Sequencing Data A Research Data Management (RDM) Green Shoots Pilots Project Report by Michael Mueller, Simon Burbidge, Steven Lawlor and Jorge Ferrer
More informationIntroduction au BIM. ESEB 38170 Seyssinet-Pariset Economie de la construction email : contact@eseb.fr
Quel est l objectif? 1 La France n est pas le seul pays impliqué 2 Une démarche obligatoire 3 Une organisation plus efficace 4 Le contexte 5 Risque d erreur INTERVENANTS : - Architecte - Économiste - Contrôleur
More informationHADOOP IN THE LIFE SCIENCES:
White Paper HADOOP IN THE LIFE SCIENCES: An Introduction Abstract This introductory white paper reviews the Apache Hadoop TM technology, its components MapReduce and Hadoop Distributed File System (HDFS)
More informationAudit de sécurité avec Backtrack 5
Audit de sécurité avec Backtrack 5 DUMITRESCU Andrei EL RAOUSTI Habib Université de Versailles Saint-Quentin-En-Yvelines 24-05-2012 UVSQ - Audit de sécurité avec Backtrack 5 DUMITRESCU Andrei EL RAOUSTI
More informationNext Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows
Genes 2012, 3, 545-575; doi:10.3390/genes3030545 Article OPEN ACCESS genes ISSN 2073-4425 www.mdpi.com/journal/genes Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline
More informationHiSeq Analysis Software v0.9 User Guide
HiSeq Analysis Software v0.9 User Guide FOR RESEARCH USE ONLY Quick Start 4 Introduction 5 Enrichment Analysis Workflow 6 Whole Genome Sequencing Analysis Workflow 8 Additional Software 12 Installing HiSeq
More informationNext Generation Sequencing: Technology, Mapping, and Analysis
Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took
More informationCloud-Based Big Data Analytics in Bioinformatics
Cloud-Based Big Data Analytics in Bioinformatics Presented By Cephas Mawere Harare Institute of Technology, Zimbabwe 1 Introduction 2 Big Data Analytics Big Data are a collection of data sets so large
More informationRemoving Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data
Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Yi Wang, Gagan Agrawal, Gulcin Ozer and Kun Huang The Ohio State University HiCOMB 2014 May 19 th, Phoenix, Arizona 1 Outline
More informationHadoop-BAM and SeqPig
Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer
More informationMapReduce Détails Optimisation de la phase Reduce avec le Combiner
MapReduce Détails Optimisation de la phase Reduce avec le Combiner S'il est présent, le framework insère le Combiner dans la pipeline de traitement sur les noeuds qui viennent de terminer la phase Map.
More informationAccessing the 1000 Genomes Data. Paul Flicek European BioinformaMcs InsMtute
Accessing the 1000 Genomes Data Paul Flicek European BioinformaMcs InsMtute Data access General informamon File access 1000 Genomes Browser Tools Where to find help www.1000genomes.org www.1000genomes.org
More informationComparing Methods for Identifying Transcription Factor Target Genes
Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF
More informationLarge-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri
Large-scale Research Data Management and Analysis Using Globus Services Ravi Madduri Argonne National Lab University of Chicago @madduri Outline Who we are Challenges in Big Data Management and Analysis
More informationGenotyping by sequencing and data analysis. Ross Whetten North Carolina State University
Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity
More informationCore Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
More informationA Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
More informationAssuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines
Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice Next-generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Workgroup Principles and Guidelines Supplementary
More informationHadoopizer : a cloud environment for bioinformatics data analysis
Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,
More informationDeep Sequencing Data Analysis: Challenges and Solutions
29 Deep Sequencing Data Analysis: Challenges and Solutions Ofer Isakov and Noam Shomron Sackler Faculty of Medicine, Tel Aviv University, Israel 1. Introduction Ultra high throughput sequencing, also known
More informationCOMPARISON OF BIG DATA ANALYTICS TOOLS: A BIOINFORMATICS CASE STUDY
SHAHZAD AND AHSAN (2014), FUUAST J. BIOL., 4(1): 113-118 COMPARISON OF BIG DATA ANALYTICS TOOLS: A BIOINFORMATICS CASE STUDY MUHAMMAD SHAHZAD 1 AND KAMRAN AHSAN 2 1 Department of Computer Science, PAF
More informationIntroduction ToIP/Asterisk Quelques applications Trixbox/FOP Autres distributions Conclusion. Asterisk et la ToIP. Projet tuteuré
Asterisk et la ToIP Projet tuteuré Luis Alonso Domínguez López, Romain Gegout, Quentin Hourlier, Benoit Henryon IUT Charlemagne, Licence ASRALL 2008-2009 31 mars 2009 Asterisk et la ToIP 31 mars 2009 1
More informationCahier de réalisation
Référence : cahier_realisation_mini_projet-sepia-theba-1.0 Page : 1/8 Cahier de réalisation SEPIA THEBA REDACTION Nom, prénom Gildas Le Corguillé Erwan Corre Unité ABiMS ABiMS Version Date Nature des modifications
More informationRNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012
RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided
More informationCSE-E5430 Scalable Cloud Computing. Lecture 4
Lecture 4 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 5.10-2015 1/23 Hadoop - Linux of Big Data Hadoop = Open Source Distributed Operating System
More informationHadoop s Rise in Life Sciences
Exploring EMC Isilon scale-out storage solutions Hadoop s Rise in Life Sciences By John Russell, Contributing Editor, Bio IT World Produced by Cambridge Healthtech Media Group By now the Big Data challenge
More informationSeqPig: simple and scalable scripting for large sequencing data sets in Hadoop
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop André Schumacher, Luca Pireddu, Matti Niemenmaa, Aleksi Kallio, Eija Korpelainen, Gianluigi Zanetti and Keijo Heljanko Abstract
More informationHENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT
HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT Kimberly Bishop Lilly 1,2, Truong Luu 1,2, Regina Cer 1,2, and LT Vishwesh Mokashi 1 1 Naval Medical Research Center, NMRC Frederick, 8400 Research Plaza,
More informationToward Efficient Variant Calling inside Main-Memory Database Systems
Toward Efficient Variant Calling inside Main-Memory Database Systems Sebastian Dorok Bayer Pharma AG and sebastian.dorok@ovgu.de Sebastian Breß sebastian.bress@ovgu.de Gunter Saake gunter.saake@ovgu.de
More informationHigh Throughput Sequencing Data Analysis using Cloud Computing
High Throughput Sequencing Data Analysis using Cloud Computing Stéphane Le Crom (stephane.le_crom@upmc.fr) LBD - Université Pierre et Marie Curie (UPMC) Institut de Biologie de l École normale supérieure
More informationStockage distribué sous Linux
Félix Simon Ludovic Gauthier IUT Nancy-Charlemagne - LP ASRALL Mars 2009 1 / 18 Introduction Répartition sur plusieurs machines Accessibilité depuis plusieurs clients Vu comme un seul et énorme espace
More informationSAP HANA Enabling Genome Analysis
SAP HANA Enabling Genome Analysis Joanna L. Kelley, PhD Postdoctoral Scholar, Stanford University Enakshi Singh, MSc HANA Product Management, SAP Labs LLC Outline Use cases Genomics review Challenges in
More informationSeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
More informationIntroduction to next-generation sequencing data
Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS
More informationReconstruction d un modèle géométrique à partir d un maillage 3D issu d un scanner surfacique
Reconstruction d un modèle géométrique à partir d un maillage 3D issu d un scanner surfacique Silvère Gauthier R. Bénière, W. Puech, G. Pouessel, G. Subsol LIRMM, CNRS, Université Montpellier, France C4W,
More informationIntroduction. GEAL Bibliothèque Java pour écrire des algorithmes évolutionnaires. Objectifs. Simplicité Evolution et coévolution Parallélisme
GEAL 1.2 Generic Evolutionary Algorithm Library http://dpt-info.u-strasbg.fr/~blansche/fr/geal.html 1 /38 Introduction GEAL Bibliothèque Java pour écrire des algorithmes évolutionnaires Objectifs Généricité
More information8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)
Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process
More informationSun Enterprise Optional Power Sequencer Installation Guide
Sun Enterprise Optional Power Sequencer Installation Guide For the Sun Enterprise 6500/5500 System Cabinet and the Sun Enterprise 68-inch Expansion Cabinet Sun Microsystems, Inc. 901 San Antonio Road Palo
More informationCloudflow A Framework for MapReduce Pipeline Development in Biomedical Research
Cloudflow A Framework for MapReduce Pipeline Development in Biomedical Research Lukas Forer 1,2, Enis Afgan 3,4, Hansi Weißensteiner 1,2, Davor Davidović 3, Günther Specht 2, Florian Kronenberg 1, Sebastian
More informationSun Management Center 3.6 Version 5 Add-On Software Release Notes
Sun Management Center 3.6 Version 5 Add-On Software Release Notes For Sun Fire, Sun Blade, Netra, and Sun Ultra Systems Sun Microsystems, Inc. www.sun.com Part No. 819-7977-10 October 2006, Revision A
More informationProcessing NGS Data with Hadoop-BAM and SeqPig
Processing NGS Data with Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3
More informationLe projet européen ECOLABEL
4/02/2015 Le projet européen ECOLABEL Agnès JULLIEN (Ifsttar) WP1 leader (key performance indicators) agnes.jullien@ifsttar.fr ECOLABEL PROJECT PROPOSED METHODOLOGY and KPIs Main concepts Development of
More informationN1 Grid Service Provisioning System 5.0 User s Guide for the Linux Plug-In
N1 Grid Service Provisioning System 5.0 User s Guide for the Linux Plug-In Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 819 0735 December 2004 Copyright 2004 Sun Microsystems,
More informationSun StorEdge Availability Suite Software Point-in-Time Copy Software Maximizing Backup Performance
Sun StorEdge Availability Suite Software Point-in-Time Copy Software Maximizing Backup Performance A Best Practice Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. 650-960-1300 Part
More informationComputational Requirements
Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Computational Requirements Steve Sherry, Lisa Brooks, Paul Flicek, Anton Nekrutenko, Kenna Shaw, Heidi Sofia High-density
More informationPractical Solutions for Big Data Analytics
Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)
More informationData formats and file conversions
Building Excellence in Genomics and Computational Bioscience s Richard Leggett (TGAC) John Walshaw (IFR) Common file formats FASTQ FASTA BAM SAM Raw sequence Alignments MSF EMBL UniProt BED WIG Databases
More informationSetting up a monitoring and remote control tool
Setting up a monitoring and remote control tool Oral examination for internship - Second year of Master in Computer Sciences Kevin TAOCHY Department of Mathematics and Computer Sciences University of Reunion
More informationEfficient storage of high throughput DNA sequencing data using reference-based compression
Method Efficient storage of high throughput DNA sequencing data using reference-based compression Markus Hsi-Yang Fritz, Rasko Leinonen, Guy Cochrane, and Ewan Birney 1 European Molecular Biology Laboratory
More informationHow To Find Rare Variants In The Human Genome
UNIVERSITÀ DEGLI STUDI DI SASSARI Scuola di Dottorato in Scienze Biomediche XXV CICLO DOTTORATO DI RICERCA IN SCIENZE BIOMEDICHE INDIRIZZO DI GENETICA MEDICA, MALATTIE METABOLICHE E NUTRIGENOMICA Direttore:
More informationListe d'adresses URL
Liste de sites Internet concernés dans l' étude Le 25/02/2014 Information à propos de contrefacon.fr Le site Internet https://www.contrefacon.fr/ permet de vérifier dans une base de donnée de plus d' 1
More informationGuide Share France Groupe de Travail MQ sept 2013
Guide Share France Groupe de Travail MQ sept 2013 Carl Farkas Pan-EMEA zwebsphere Application Integration Consultant IBM France D/2708 Paris, France Internet : farkas@fr.ibm.com 2013 IBM Corporation p1
More informationNext generation DNA sequencing technologies. theory & prac-ce
Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing
More informationCOLLABORATIVE LCA. Rachel Arnould and Thomas Albisser. Hop-Cube, France
COLLABORATIVE LCA Rachel Arnould and Thomas Albisser Hop-Cube, France Abstract Ecolabels, standards, environmental labeling: product category rules supporting the desire for transparency on products environmental
More informationFactors for success in big data science
Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)
More informationBioinforma)cs workpackages
Bioinforma)cs workpackages IFB General Assembly January 2016 Valen)n Loux Valen)n.loux@jouy.inra.fr WP organiza)on WP1: two tasks oriented towards the e- infrastructure and regional sequencing facili)es
More informationUpgrading the Solaris PC NetLink Software
Upgrading the Solaris PC NetLink Software By Don DeVitt - Enterprise Engineering Sun BluePrints OnLine - January 2000 http://www.sun.com/blueprints Sun Microsystems, Inc. 901 San Antonio Road Palo Alto,
More informationBioinformatics Unit Department of Biological Services. Get to know us
Bioinformatics Unit Department of Biological Services Get to know us Domains of Activity IT & programming Microarray analysis Sequence analysis Bioinformatics Team Biostatistical support NGS data analysis
More informationRNAseq / ChipSeq / Methylseq and personalized genomics
RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School
More informationA Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System
A Design of Resource Fault Handling Mechanism using Dynamic Resource Reallocation for the Resource and Job Management System Young-Ho Kim, Eun-Ji Lim, Gyu-Il Cha, Seung-Jo Bae Electronics and Telecommunications
More informationHigh Performance Compu2ng Facility
High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,
More informationModifier le texte d'un élément d'un feuillet, en le spécifiant par son numéro d'index:
Bezier Curve Une courbe de "Bézier" (fondé sur "drawing object"). select polygon 1 of page 1 of layout "Feuillet 1" of document 1 set class of selection to Bezier curve select Bezier curve 1 of page 1
More informationBioinformatique sur Cloud Cas d usage avec le portail Galaxy
Bioinformatique sur Cloud Cas d usage avec le portail Galaxy Christophe Blanchet Institute of Biology and Chemistry of Proteins Head of Service Infrastructure for Biology - IDB CNRS-IBCP FR3302 - LYON
More informationSRA File Formats Guide
SRA File Formats Guide Version 1.1 10 Mar 2010 National Center for Biotechnology Information National Library of Medicine EMBL European Bioinformatics Institute DNA Databank of Japan 1 Contents SRA File
More information