PAGANTEC: OPENMP PARALLEL ERROR CORRECTION FOR NEXT-GENERATION SEQUENCING DATA
|
|
|
- Baldric Robinson
- 10 years ago
- Views:
Transcription
1 PAGANTEC: OPENMP PARALLEL ERROR CORRECTION FOR NEXT-GENERATION SEQUENCING DATA Markus Joppich, Tony Bolger, Dirk Schmidl Björn Usadel and Torsten Kuhlen Markus Joppich Lehr- und Forschungseinheit Bioinformatik Institut für Informatik LMU München
2 PAGANtec PAGANtec Probabilistic Algorithm for Genome Assembly of Next-Generation Sequencing Data transcriptome error correction 1. Biological Background 2. PAGANtec 3. OpenMP Parallelism in PAGANtec 4. Conclusion Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 2
3 Biology / Sequencing DNA consists of words w = Σ = A, T, G, C from a 4 letter alphabet bases or base-pairs (bp) Sequencing tries to translate from molecule to textual representation Many technologies Sanger (<1000bp) NGS ( ~ bp) Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 3
4 Raw Read Data >SequenceID AABASDA >SequenceID AABASDA Fragments/Sequences are called reads Sequencing is error-prone Up to 10% for early Illumina, 30% Nanopore Huge amount of data: several 100GB Not easy to spot mistakes by eye Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 4
5 Assembly Assembly: put reads together to get original sequence Assembly Problem: Maximize the likelihood that the assembly is the source of the set of reads Gene/Transcript Reads Aligned Reads Genome Several Methods: Overlap / Layout / Consenus or De Bruijn Graph Problem complexity ~ Graph complexity Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 5
6 DBG: Graph Structure 1. Take every read 2. Split read into k long subwords k-mers s-mers 3. Connect unique k-mers if consecutive in read Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 6
7 Error Correction Several Structures can be observed in k-mer graph: Tangles, Spurs and Bubbles due to errors in reads, make assembly harder Essentially error correction removes such artifacts Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 7
8 Error Correction There are several error correctors available Mostly for genomes Different characteristics SEECER error corrector for transcriptomes [Le 2013] Results not satisfying enough Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 8
9 PAGAN Graph Structure Allows lossless storage of reads in a k-mer-graph Single reads can be extracted Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 10
10 PAGANtec: algorithmic Correction strategey depends on the location of error Error at an end Or forming a bubble or long spur Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 11
11 PAGANtec: algorithmic Short tip correction Detect erroneous sequence Replace by correct sequence Attention: tails can be used by other reads! Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 12
12 PAGANtec: algorithmic Long tip/bubble correction Detect erroneous sequence Store correction and corrected sequence Not depending on other reads! Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 13
13 PAGANtec: results SEECER PAGANtec Good correction in general and in detail! No structures forming However slower. Easy parallelization Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 14
14 PAGANtec: filter Each algorithm for Short tips Long tips Bubbles Essentially same steps Detect Errors Find correction Store/Apply correction Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 15
15 PAGANtec: parallelism Apply #pragma omp parallel for ~ ~ < 10 1 > 10 6 Width of smer FilterTips: race conditions / data race! Correction Storage: Threadsafe Hash-Map Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 16
16 PAGANtec: load imbalance Several stages identifiable Problems detected: Tips have a low CPU usage! One thread takes essentially longer than others! Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 17
17 PAGANtec: load imbalance Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 18
18 PAGANtec: characteristics Low CPU usage? Locking! Load Imbalance? Very coarse grained! First reason: omp parallel for Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 19
19 PAGANtec: characteristics Second reason: busy k-mers Low average route width for whole dataset O 10 1 Accumulated Width explains behaviour Very few s-mers account major part of width! Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 20
20 Lessons Learned OpenMP parallel task construct Create batches of s-mers with 2000 route entries Create extra tasks for routes with > 500 entries Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 21
21 PAGANtec: parallelism Make use of parallel tasks Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 22
22 PAGANtec: revised Why is there still a load imbalance? How many s-mers are there per task? How many route entries (width) are per task? Red Ticks: > 1 inner task Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 23
23 Conclusion Convoluted graph structure Not easily splittable into same chunks Locking poses a problem #pragma omp task greatly improves performance Not possible with #pragma omp parallel for Manual creation of chunks Still difficulties from coarse grained structure Prioritize chunk to avoid imbalance Instead of leaving it at end of queue Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 24
24 Thank you for your attention! Dirk Schmidl & Torsten Kuhlen Tony Bolger & Björn Usadel Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 25
25 Literature [Le 2013] Le, H.S., Schulz, M.H., McCauley, B.M., Hinman, V.F., Bar-Joseph, Z.: Probabilistic error correction for RNA sequencing. Nucleic acids research 41(10), e109 (May 2013) Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 26
26 Chunk Creation Manual effort to create chunks Pattern observed frequently with tasks! Often leads to some (minor) code duplication Integrate into OpenMP-clause? Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 27
27 Frequency Flows per s-mer G = (V, E) V = n n k mers E = i j suff j, k 1 == pref i, k 1 Markus Joppich, LFE Bioinformatik, Institut für Informatik, LMU München 28
Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center
Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing
Using the Intel Inspector XE
Using the Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) Race Condition Data Race: the typical OpenMP programming error, when: two or more threads access the same memory
The NGS IT notes. George Magklaras PhD RHCE
The NGS IT notes George Magklaras PhD RHCE Biotechnology Center of Oslo & The Norwegian Center of Molecular Medicine University of Oslo, Norway http://www.biotek.uio.no http://www.ncmm.uio.no http://www.no.embnet.org
Computational Genomics. Next generation sequencing (NGS)
Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years
Introduction to next-generation sequencing data
Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS
How Sequencing Experiments Fail
How Sequencing Experiments Fail v1.0 Simon Andrews [email protected] Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine
Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment
Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249
Analysis of ChIP-seq data in Galaxy
Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers
Visualization of Phylogenetic Trees and Metadata
Visualization of Phylogenetic Trees and Metadata November 27, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com [email protected]
Spring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem
Hadoopizer : a cloud environment for bioinformatics data analysis
Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) [email protected], INRIA/Irisa, Campus de Beaulieu, 35042,
Introduction to NGS data analysis
Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Sequencing Illumina platforms Characteristics: High
Basic processing of next-generation sequencing (NGS) data
Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance
Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms
Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).
DNA Mapping/Alignment. Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky
DNA Mapping/Alignment Team: I Thought You GNU? Lars Olsen, Venkata Aditya Kovuri, Nick Merowsky Overview Summary Research Paper 1 Research Paper 2 Research Paper 3 Current Progress Software Designs to
Big Data Challenges. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres.
Big Data Challenges technology basics for data scientists Spring - 2014 Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Data Deluge: Due to the changes in big data generation Example: Biomedicine
MiSeq: Imaging and Base Calling
MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please
When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want
1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very
Bioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
July 7th 2009 DNA sequencing
July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer
SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp
MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source
NGS data analysis. Bernardo J. Clavijo
NGS data analysis Bernardo J. Clavijo 1 A brief history of DNA sequencing 1953 double helix structure, Watson & Crick! 1977 rapid DNA sequencing, Sanger! 1977 first full (5k) genome bacteriophage Phi X!
Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
Delivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
Next generation sequencing (NGS)
Next generation sequencing (NGS) Vijayachitra Modhukur BIIT [email protected] 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known
Frequently Asked Questions Next Generation Sequencing
Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided
Next Generation Sequencing: Technology, Mapping, and Analysis
Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University [email protected] http://tandem.bu.edu/ The Human Genome Project took
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript
De Novo Assembly Using Illumina Reads
De Novo Assembly Using Illumina Reads High quality de novo sequence assembly using Illumina Genome Analyzer reads is possible today using publicly available short-read assemblers. Here we summarize the
Reading DNA Sequences:
Reading DNA Sequences: 18-th Century Mathematics for 21-st Century Technology Michael Waterman University of Southern California Tsinghua University DNA Genetic information of an organism Double helix,
NGS Technologies for Genomics and Transcriptomics
NGS Technologies for Genomics and Transcriptomics Massimo Delledonne Department of Biotechnologies - University of Verona http://profs.sci.univr.it/delledonne 13 years and $3 billion required for the Human
Big Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set
Prepare the environment Practical Part 1.1
Prepare the environment Practical Part 1.1 The first exercise should get you comfortable with the computer environment. I am going to assume that either you have some minimal experience with command line
Biological Sequence Data Formats
Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA
Next Generation Sequencing
Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977
CHALLENGES IN NEXT-GENERATION SEQUENCING
CHALLENGES IN NEXT-GENERATION SEQUENCING BASIC TENETS OF DATA AND HPC Gray s Laws of data engineering 1 : Scientific computing is very dataintensive, with no real limits. The solution is scale-out architecture
Integrated Rule-based Data Management System for Genome Sequencing Data
Integrated Rule-based Data Management System for Genome Sequencing Data A Research Data Management (RDM) Green Shoots Pilots Project Report by Michael Mueller, Simon Burbidge, Steven Lawlor and Jorge Ferrer
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
Analysis of NGS Data
Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference
Managing and Conducting Biomedical Research on the Cloud Prasad Patil
Managing and Conducting Biomedical Research on the Cloud Prasad Patil Laboratory for Personalized Medicine Center for Biomedical Informatics Harvard Medical School SaaS & PaaS gmail google docs app engine
Cancer Genomics: What Does It Mean for You?
Cancer Genomics: What Does It Mean for You? The Connection Between Cancer and DNA One person dies from cancer each minute in the United States. That s 1,500 deaths each day. As the population ages, this
Module 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
Next generation DNA sequencing technologies. theory & prac-ce
Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing
Non-invasive prenatal detection of chromosome aneuploidies using next generation sequencing: First steps towards clinical application
Non-invasive prenatal detection of chromosome aneuploidies using next generation sequencing: First steps towards clinical application PD Dr. rer. nat. Markus Stumm Zentrum für Pränataldiagnostik Kudamm-199
Big Data Visualization on the MIC
Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth [email protected] Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth
Automated and Scalable Data Management System for Genome Sequencing Data
Automated and Scalable Data Management System for Genome Sequencing Data Michael Mueller NIHR Imperial BRC Informatics Facility Faculty of Medicine Hammersmith Hospital Campus Continuously falling costs
Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
The International Journal Of Science & Technoledge (ISSN 2321 919X) www.theijst.com
THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE Efficient Parallel Processing on Public Cloud Servers using Load Balancing Manjunath K. C. M.Tech IV Sem, Department of CSE, SEA College of Engineering
Deep Sequencing Data Analysis
Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist
GC3 Use cases for the Cloud
GC3: Grid Computing Competence Center GC3 Use cases for the Cloud Some real world examples suited for cloud systems Antonio Messina Trieste, 24.10.2013 Who am I System Architect
G E N OM I C S S E RV I C ES
GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E
Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis
Genetic Analysis Phenotype analysis: biological-biochemical analysis Behaviour under specific environmental conditions Behaviour of specific genetic configurations Behaviour of progeny in crosses - Genotype
Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille
Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Journées SUCCES Stéphane Le Crom (UPMC IBENS) [email protected] Paris November 2013 The Sanger DNA sequencing method Sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format
, pp.91-100 http://dx.doi.org/10.14257/ijhit.2014.7.4.09 Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format Jingjing Zheng 1,* and Ting Wang 1, 2 1,* Parallel Software and Computational
Original-page small file oriented EXT3 file storage system
Original-page small file oriented EXT3 file storage system Zhang Weizhe, Hui He, Zhang Qizhen School of Computer Science and Technology, Harbin Institute of Technology, Harbin E-mail: [email protected]
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
Efficient Parallel Processing on Public Cloud Servers Using Load Balancing
Efficient Parallel Processing on Public Cloud Servers Using Load Balancing Valluripalli Srinath 1, Sudheer Shetty 2 1 M.Tech IV Sem CSE, Sahyadri College of Engineering & Management, Mangalore. 2 Asso.
Storage Solutions for Bioinformatics
Storage Solutions for Bioinformatics Li Yan Director of FlexLab, Bioinformatics core technology laboratory [email protected] http://www.genomics.cn/flexlab/index.html Science and Technology Division,
How To Cluster Of Complex Systems
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
Next Generation Sequencing; Technologies, applications and data analysis
; Technologies, applications and data analysis Course 2542 Dr. Martie C.M. Verschuren Research group Analysis techniques in Life Science, Breda Prof. dr. Johan T. den Dunnen Leiden Genome Technology Center,
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Performance Characteristics of Large SMP Machines
Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller [email protected] Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark
BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis
BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis By the end of this lab students should be able to: Describe the uses for each line of the DNA subway program (Red/Yellow/Blue/Green) Describe
Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar
Computational infrastructure for NGS data analysis José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS Cluster definition: A computer cluster is a group of linked computers, working
Windows Server Performance Monitoring
Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING Dr. R. Piazza SANGER SEQUENCING + DNA NEXT GENERATION SEQUENCING Flowcell NEXT GENERATION SEQUENCING Library di DNA Genomic DNA NEXT GENERATION SEQUENCING NEXT GENERATION SEQUENCING
Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA
Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow Barry Bolding Cray Inc Seattle, WA 1 CUG 2013 Paper Genomic Applications on Cray supercomputers: Next Generation Sequencing
WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed
WinBioinfTools: Bioinformatics Tools for Windows Cluster Done By: Hisham Adel Mohamed Objective Implement and Modify Bioinformatics Tools To run under Windows Cluster Project : Research Project between
CRAC: An integrated approach to analyse RNA-seq reads Additional File 3 Results on simulated RNA-seq data.
: An integrated approach to analyse RNA-seq reads Additional File 3 Results on simulated RNA-seq data. Nicolas Philippe and Mikael Salson and Thérèse Commes and Eric Rivals February 13, 2013 1 Results
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010
Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet
<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization
An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended
Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing
STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAAC GTGCAC GTGAAC Wouter Coppieters Head of the genomics core facility GIGA center, University of Liège Bioruptor NGS: Unbiased DNA
Distributed Image Processing using Hadoop MapReduce framework. Binoy A Fernandez (200950006) Sameer Kumar (200950031)
using Hadoop MapReduce framework Binoy A Fernandez (200950006) Sameer Kumar (200950031) Objective To demonstrate how the hadoop mapreduce framework can be extended to work with image data for distributed
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Local Alignment Tool Based on Hadoop Framework and GPU Architecture Che-Lun Hung * Department of Computer Science and Communication Engineering Providence University Taichung, Taiwan [email protected] *
FQbin: a compatible and optimized format for storing and managing sequence data
FQbin: a compatible and optimized format for storing and managing sequence data Darío Guerrero-Fernández, Rafael Larrosa, and M. Gonzalo Claros University of Málaga, Plataforma Andaluza de Bioinformática-SCBI,
Searching Nucleotide Databases
Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames
OpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
Installation Guide for Windows
Installation Guide for Windows Overview: Getting Ready Installing Sequencher Activating and Installing the License Registering Sequencher GETTING READY Trying Sequencher: Sequencher 5.2 and newer requires
RNA Structure and folding
RNA Structure and folding Overview: The main functional biomolecules in cells are polymers DNA, RNA and proteins For RNA and Proteins, the specific sequence of the polymer dictates its final structure
Hybrid and Custom Data Structures: Evolution of the Data Structures Course
Hybrid and Custom Data Structures: Evolution of the Data Structures Course Daniel J. Ernst, Daniel E. Stevenson, and Paul Wagner Department of Computer Science University of Wisconsin Eau Claire Eau Claire,
Automated Library Preparation for Next-Generation Sequencing
Buyer s Guide: Automated Library Preparation for Next-Generation Sequencing What to consider as you evaluate options for automating library preparation. Yes, success can be automated. Next-generation sequencing
Scaling up to Production
1 Scaling up to Production Overview Productionize then Scale Building Production Systems Scaling Production Systems Use Case: Scaling a Production Galaxy Instance Infrastructure Advice 2 PRODUCTIONIZE
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing
Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,
8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)
Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process
Intro to Bioinformatics
Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate
Data-Flow Awareness in Parallel Data Processing
Data-Flow Awareness in Parallel Data Processing D. Bednárek, J. Dokulil *, J. Yaghob, F. Zavoral Charles University Prague, Czech Republic * University of Vienna, Austria 6 th International Symposium on
PARALLELIZED SUDOKU SOLVING ALGORITHM USING OpenMP
PARALLELIZED SUDOKU SOLVING ALGORITHM USING OpenMP Sruthi Sankar CSE 633: Parallel Algorithms Spring 2014 Professor: Dr. Russ Miller Sudoku: the puzzle A standard Sudoku puzzles contains 81 grids :9 rows
Illumina Sequencing Technology
Illumina Sequencing Technology Highest data accuracy, simple workflow, and a broad range of applications. Introduction Figure 1: Illumina Flow Cell Illumina sequencing technology leverages clonal array
LabGenius. Technical design notes. The world s most advanced synthetic DNA libraries. [email protected] V1.5 NOV 15
LabGenius The world s most advanced synthetic DNA libraries Technical design notes [email protected] V1.5 NOV 15 Introduction OUR APPROACH LabGenius is a gene synthesis company focussed on the design and manufacture
New solutions for Big Data Analysis and Visualization
New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina [email protected] http://bioinfo.cipf.es/imedina Head of the Computational Biology
Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute [email protected].
Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute [email protected] Outline Motivation why do we need GPUs? Past - how was GPU programming
