Overview sequence projects




Bioassist NGS meeting, 15-01-2010
Barbera van Schaik
KEBB - Bioinformatics Laboratory
b.d.vanschaik@amc.uva.nl

NGS at the Academic Medical Center
- Sequence facility
- Laboratory Division
- Bioinformatics Laboratory - KEBB
- Roche (454) sequencer
- ABI SOLiD sequencer

IT resources
Sequence laboratory
- Roche onrig
- Roche cluster
- SOLiD on-instrument cluster
- Data analysis server (4 dual core, 5TB)
Bioinformatics laboratory
- Linux server 1: 16 quad core
- Linux server 2: 8 quad core
- Shared file system 2TB
Other
- Biostatistics cluster (specs?)
- Central storage (ICT department)
- Backups at SARA
- Dutch grid (VLe, EBioInfra, VBrowser, Moteur)

Software
Software we use:
- blat (local and on grid)
- blast (on grid)
- roche package
- celera assembler (cabog)
- R
- solid RNApipeline
Programming language
- Perl and shell scripting
- Java

NGS data analysis at the AMC
Basics:
- Group sequences per MID/barcode, primer, or any other sequence
- Count things
- Run existing analysis software (loops, file handling)
- Calculate read coverage (to load into the genome browser)
Projects
- Mutation screening
- T- and B-cell variants
- Virus discovery
- Alternative splicing
- Bacterial genomes
- microRNA expression
- ...
Several departments
- Rheumatology / Immunology
- Virus Discovery Unit
- Neurogenetics
- Neurology / Medical microbiology
- Experimental virology / Sequence lab
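One of the basics listed above, calculating read coverage, can be sketched as follows. This is a minimal illustration, not the lab's own Perl code: it assumes each aligned read is already available as a 0-based, half-open (start, end) interval on a single reference, and the function name `read_coverage` is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_coverage(alignments: Iterable[Tuple[int, int]], ref_length: int) -> List[int]:
    """Return per-base read depth for a reference of length ref_length.

    Assumption: alignments are 0-based, half-open (start, end) intervals.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, ref_length)
        if start >= end:
            continue  # read lies outside the reference or is empty
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth
```

The resulting list has one depth value per reference position; writing it out in a format a genome browser accepts is a separate step that depends on the browser used.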

T and B cell variation
Rheumatology / Immunology

TCR
- Rheumatoid arthritis
- http://en.wikipedia.org/wiki/t_cell_receptor

Total theoretical variation
Paul Klarenbeek

Goal: identify and enumerate TCR variants
- Thymocytes
- Germline DNA
- mRNA
- CDR3 region: unique for each clonal expansion
Paul Klarenbeek

T-cell pipeline
5' C V N D N J polyA (variable region 30-60 bp)
- Convert sff to fasta + quality scores
- Identify: MIDs, primers
- Sort sequences based on MID and region
- Identify the V, J and C segments
- Count variants
- Locate highly variable area
- Quality control
Tools: Perl scripts, Roche software, BLAT, Access/Excel
Also applicable to B-cell variants (Marieke Doorenspleet)
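The "count variants" step of the pipeline can be illustrated with a short sketch. It assumes the earlier steps have already reduced each read to a V segment, a J segment, and a CDR3 sequence. The `Clonotype` type, the function name, and the segment labels in the usage note are hypothetical; the slides name Perl scripts and Access/Excel for this work, not Python.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Clonotype(NamedTuple):
    """One read reduced to the fields that define a variant (assumed layout)."""
    v_segment: str
    j_segment: str
    cdr3: str


def count_variants(reads: Iterable[Clonotype]) -> List[Tuple[Clonotype, int, float]]:
    """Count identical clonotypes and report each with its read fraction.

    Results are sorted from most to least abundant.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return [(clone, n, n / total) for clone, n in counts.most_common()]
```

For example, `count_variants([Clonotype("V_a", "J_a", "CASS"), Clonotype("V_a", "J_a", "CASS"), Clonotype("V_b", "J_a", "CAST")])` reports the first clonotype with two reads and the second with one.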

Virus discovery
Virus discovery unit

VIDISCA
- Extract virus DNA and RNA
- Digest DNA
- Amplify
- Sequence with selective primers
- Selective PCR (16 primer combinations)
- Direct 454 sequencing
Michel de Vries

Blast on grid with e-bioinfra

Splice variant detection
Neurogenetics

Splice variant detection
[Figure: exon diagrams (exons 1-12, with alternative exons 11b and 11c) labelled WT, tissue-specific, and tissue-specific frameshift]
Katja Ritz

Program overview
- Sequence run: all data
- Split data: sequence set 1, sequence set 2, sequence set 3, ..., sequence set n
- Submit grid jobs for each combination (compare all sequences within one set):
    cmd> blat S1 S1 > blat.out
    cmd> R_graph.pl blat.out
- All jobs finished? No: wait. Yes: collect output
- Merge identical sequences: reduced sequence set 1, reduced sequence set 2, reduced sequence set 3, ..., reduced sequence set n
- Blat groups against genome
- Check if groups are correct
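The "split data" and "merge identical sequences" steps of this overview can be sketched as below. This is an illustration under stated assumptions, not the grid workflow itself: the slides do not say how the data is split, so a simple round-robin split is assumed; "identical" is taken to mean the same sequence ignoring case; and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def split_into_sets(sequences: Sequence[str], n_sets: int) -> List[List[str]]:
    """Distribute sequences over n_sets lists in round-robin order (assumed scheme)."""
    if n_sets < 1:
        raise ValueError("n_sets must be at least 1")
    return [list(sequences[i::n_sets]) for i in range(n_sets)]


def merge_identical(sequences: Iterable[str]) -> Dict[str, int]:
    """Collapse identical sequences into one entry each, keeping the copy number.

    Sequences are upper-cased first so that 'acgt' and 'ACGT' merge.
    """
    return dict(Counter(seq.upper() for seq in sequences))
```

Keeping the copy number alongside each unique sequence means the reduced set can still be used for counting after the comparison step.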

Example of output

Comparison of bacterial genomes
Neurology / Medical microbiology

- Comparison of bacterial strains between two groups of patients with meningitis (good vs bad outcome)
- Whole genome sequencing (20 strains are sequenced)
- Sort samples per MID
- Genome assembly with Cabog and Newbler (de novo and with reference sequences)
- Genome annotation using the Comprehensive Microbial Resource
- Differences between strains will be detected using a DNA-only detection tool
Jurgen Piet

Sequence assembly
- MID sorting: allowing for 0 or 2 errors
- Newbler 2.0, Newbler 2.3, Cabog 5.4, Cabog 6-beta
- Note that the command-line and GUI interfaces of Newbler give different results! (Check tips & tricks on the Bioassist wiki)
- Combine assemblies?
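MID sorting with an error allowance, as mentioned above, can be sketched as follows. The assumptions: the MID tag sits at the very start of the read, errors are counted as mismatches only (no insertions or deletions), and a read that matches two tags equally well is left unassigned. The tag sequences in the usage note are made up for illustration, and the function names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_mid(read: str, mids: Dict[str, str], max_errors: int) -> Optional[str]:
    """Return the name of the MID whose tag best matches the read prefix.

    Returns None when no tag is within max_errors mismatches, or when the
    best match is shared by more than one tag.
    """
    best_name = None
    best_dist = max_errors + 1
    ambiguous = False
    for name, tag in mids.items():
        prefix = read[:len(tag)]
        if len(prefix) < len(tag):
            continue  # read is shorter than the tag
        dist = hamming(prefix, tag)
        if dist < best_dist:
            best_name, best_dist, ambiguous = name, dist, False
        elif dist == best_dist and dist <= max_errors:
            ambiguous = True
    return None if ambiguous else best_name
```

With the made-up tags `{"MID_A": "ACGTAC", "MID_B": "TGCATG"}`, the read `"ACCTAAGGGG"` is unassigned at `max_errors=0` and assigned to `MID_A` at `max_errors=2`, which shows how the allowance changes the number of reads that reach the assembler.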

Keep track of projects
Bioinformatics laboratory

Communication - log - file sharing

People and resources
[Diagram labels: Sequencing facility (operators, SeqLab server, DNA sequencers, workstations); Bioinformaticians (algorithm developers, workflow users, BioLab server, workstations); Research laboratories (biomedical researchers); flows: sequence data, results, analyses]

People and labs (selection)
- Rheumatology / immunology: Paul Klarenbeek, Marieke Doorenspleet, Niek de Vries
- Virus discovery unit: Michel de Vries, Martin Deijs, Lia van der Hoek
- Neurology / Medical microbiology: Jurgen Piet, Ewout Jansen, Diederik van de Beek, Arie van der Ende
- Clinical genetics: Olaf Mook, Jean Soucy
- Neurogenetics / Sequence facility: Katja Ritz, Marja Jakobs, Ted Bradley, Frank Baas
- Bioinformatics laboratory - KEBB: Angela Luyf, Marcel Willemsen, Barbera van Schaik, Silvia D. Olabarriaga, Antoine van Kampen