Cool Informatics Tools and Services for Biomedical Research. David Ruau, PhD. August 1st, 2012
1 Cool Informatics Tools and Services for Biomedical Research. David Ruau, PhD. August 1st, 2012. Sponsored by the Office of Postdoctoral Affairs and the Lane Medical Library.
2 Big Data in Biomedicine
http:// data-to-hit-milestone
We live in a Big Data world

Course outline
1. Analyzing genomic data
   1. Traditional bioinformatics tools
   2. Microarrays/gene lists without any code
   3. Microarrays/gene lists with code
   4. NGS and mRNA-seq
2. Beyond genomics
   1. Protein-protein interaction network
3. General data handling tools
   1. Storing your data
   2. Data are dirty
4. Statistics made easy
5. Graphics rules!
6. Demystifying the work! (the code)
7. Conclusion + Q&A
3 Traditional bioinformatics tools
Bioinformatics software to solve everyday problems.
The EMBOSS tool suite: http://emboss.sourceforge.net/
One web portal is: http://mobyle.pasteur.fr/cgi-bin/portal.py
- DNA / AA pairwise global and local alignment
- Sequence feature analysis (CpG island, gene scan, restriction enzyme site, 2D/3D structure...)
- Protein structure and domains
- Similarity search (BLAST, PHI-BLAST, PSI-BLAST, DELTA-BLAST...)
- Phylogenetics (trees from multiple alignments)
- ... UPGMA joining method

Traditional bioinformatics tools
Bioinformatics software to solve everyday problems. Some tools are provided through database interfaces such as NCBI Entrez.
- The UCSC genome browser
- The ENCODE project results
- For example: visualize GC content and restriction enzyme sites in your gene of interest.
Stating the obvious: having a GUI does not make the analysis brain-dead simple.
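The GC-content view mentioned on the slide above can also be approximated offline. The following is a minimal Python sketch, not the genome browser's own method: the function name `gc_content_windows`, the window size, and the toy sequence are all assumptions made for illustration.

```python
def gc_content_windows(seq, window=10, step=10):
    """Return (start, gc_fraction) for each full window along seq.

    Hypothetical helper: windows that would run past the end are skipped.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results


# Made-up sequence, only to show the call.
toy_sequence = "ATGCGCGCTAATATATATGCGGCCGCAT"
for start, fraction in gc_content_windows(toy_sequence, window=10, step=10):
    print(start, round(fraction, 2))
```

Non-overlapping windows keep the sketch short; a smaller `step` would give a smoother, overlapping profile.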
4 Analyzing genomic data
Analyzing gene expression microarrays without any code.
GenePattern: http://genepattern.broadinstitute.org/gp/

Analyzing genomic data
Upload your expression data as a text file. GenePattern takes RES and GCT files. Conversion tools are provided to transform CEL files to GCT or RES.

Analyzing genomic data
StarBiogene http://web.mit.edu/star/biogene/index.html (Java web app)
- Part of GenePattern but provides a pipeline-style process online
SeqExpress http:// (Windows only)
- Alternative independent application (less activity than GenePattern)
Expander http://acgt.cs.tau.ac.il/expander/
- Alternative independent application (less activity than GenePattern)
RMAExpress http://rmaexpress.bmbolstad.com/
- Useful for performing quality control of your microarrays.
Cluster http://bonsai.hgc.jp/~mdehoon/software/cluster/
- This is the original program to analyze microarray results. No pre-processing functionality. You need to pre-process separately (using RMAExpress, for example).
SAM http://www-stat.stanford.edu/~tibs/sam/ (Significance Analysis of Microarrays)
- To extract the DE (differentially expressed) genes. This is an Excel plugin. Again, you need to pre-process separately.
5 Analyzing genomic data
Commercial solution: GeneSpring GX (first 20 days are free). Access through Stanford with CMGM: http://cmgm3.stanford.edu

Interpreting your results
Interpreting a gene list relies on external knowledge. Several resources and tools are available to help.
- KEGG: http:// pathway database
- REACTOME: http:// pathway 2.0 database
- Gene Ontology: http:// the ultimate resource for gene function, processes, localization
- BioMart: http:// portal providing access to multiple databases
- GSEA: http:// part of GenePattern but also R
- DAVID: http://david.abcc.ncifcrf.gov/ to perform an over-representation analysis
- BiNGO: http:// over-representation analysis, but produces graphical results (Cytoscape)
- BioGPS: http://biogps.org/ to find out where your gene is expressed in the body or in which cell line

Interpreting your results
Reactome: made to be used programmatically. Cytoscape (a network tool) has a plugin for Reactome. Just give a gene list, or a list of genes plus the number of samples where each gene is mutated (for Cox survival analysis).
- Retrieve a network from a gene list
- Do network analysis
- Perform Gene Ontology analysis
- Survival analysis
http://
6 Interpreting your results
DAVID database
Performs fast over-representation analysis against different databases
- KEGG; Reactome; OMIM (diseases); GeneRIF (literature); protein domains; etc.
Protein domains

Interpreting your results
BioGPS: exploring expression across tissues and cell lines. Look at other libraries of tissues.

Analyzing public gene expression data
Analyzing public microarray data with code (kind of...)
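The over-representation analysis that tools such as DAVID perform on a gene list rests on a simple question: is a category hit more often than chance would predict? Below is a minimal Python sketch of the plain one-sided hypergeometric test. It is not DAVID's exact statistic, and the function name and all the numbers in the usage example are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: number of genes in the background
    annotated:  background genes that belong to the category
    sample:     size of the gene list
    overlap:    genes in the list that belong to the category
    """
    max_k = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail; comb() returns 0 for impossible combinations.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 200 in the pathway,
# a list of 100 genes of which 8 fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 200, 100, 8)
print(p_value)
```

When many categories are tested at once, as DAVID does, the resulting p-values need a multiple-testing correction before they are interpreted.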
7 Analyzing public gene expression data
Then click on the TOP 250 button.

Analyzing public gene expression data
Top 250 genes; R code

Next Generation Sequencing
The main NGS platforms are:
- Roche/454 (Genome Sequencer; GS)
- Illumina/Solexa (Genome Analyzer software)
- SOLiD (Applied Biosystems)
Upcoming challengers:
- Ion Torrent (Life Technologies)
- Oxford Nanopore
What you should request:
Sequencer
- Sequence of contigs (FASTA format)
- SAM/BAM alignment files
Done by the core facility.
8 Analyzing mRNA-seq
Analyzing mRNA-seq data: 4 steps.
1- Alignment and trimming of reads:
- [with GUI and commercial] Genome Studio from Illumina; Genomequest [looks pretty awesome.]
- [no GUI] Tophat (assembly and splice junction mapper); Cufflinks (assembly and RPKM estimates)
- GALAXY provides access to Tophat and Cufflinks.
2- Calling variants and indels:
- GATK (http:// )
- VarScan (http://varscan.sourceforge.net/)
- SHRiMP2; VARiD; Atlas-SNP2; SomaticSniper...
- Interpretation of variants: SIFT (Galaxy)
3- Finding differentially expressed genes:
- Cuffdiff (Galaxy)
- DEXSeq (R)
4- Visualization:
- SAVANT (http://genomesavant.com/savant/)
- IGV (http://

How to use Galaxy?
Analyzing mRNA-seq data: introducing GALAXY http://galaxy.psu.edu/

Working in the cloud
Dudley JT, and Butte AJ. In silico research in the era of cloud computing. Nat Biotechnol 28:
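The RPKM estimates mentioned for Cufflinks normalize a read count by gene length and sequencing depth. The following Python sketch shows only that normalization formula (reads per kilobase of transcript per million mapped reads); Cufflinks' actual estimation procedure is more involved, and the gene names and counts below are made up for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical counts: gene name -> (reads mapped to the gene, gene length in bp).
toy_counts = {
    "geneA": (1000, 2000),
    "geneB": (1000, 500),
}
total_reads = 10_000_000  # assumed library size

for gene, (reads, length_bp) in toy_counts.items():
    print(gene, rpkm(reads, length_bp, total_reads))
```

With the same read count, the shorter gene gets the higher RPKM, which is the point of the length normalization.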
9 Summary mRNA-seq
GALAXY: this is a compendium of software. You even have UNIX tools and EMBOSS in it.
Take-home message:
FASTQ files > Tophat > Cuffdiff > IGV (for differential expression)
FASTQ files > Tophat > GATK > IGV (for variant detection)
Where to find help: http://seqanswers.com
Analyzing RNA-seq using R: DEXSeq is an R / Bioconductor package. R is a statistical programming software widely used in bioinformatics.

Additional tools for genomics
- GenomeSpace: http:// Collection of tools: GenePattern, Galaxy, Cytoscape, Genomica, etc. (free, apparently). Data are stored in the cloud on Amazon VMs.
If you do not want to do it yourself:
- Science Exchange: https:// Science jobs for hire! This is where top core facilities compete to provide the best service.
- Assay Depot: https:// like Home Depot but for science
- TaskRabbit: http:// If science takes too much of your time!

Beyond genomics: results interpretation
Interpreting your gene list with protein-protein interaction networks.
- iHOP: http:// net.org/unipub/ihop/
- Ingenuity Pathway Analysis (commercial): access through Stanford
10 Beyond genomics: results interpretation
Looking into PPI databases:
- IntAct: http://
- BioGRID: http://thebiogrid.org/ (soon multigene search)
- HPRD: http://
What about open-source solutions for searching the interactions between the genes in your gene list?
- Cytoscape http://cytoscape.org
- BioNetBuilder http://chianti.ucsd.edu/cyto_web/plugins/...
- R for programmatic access to databases http://brainchronicle.blogspot.com
The plus of using R is that results are reproducible and you can share your method more easily than with a point-and-click interface.

Data management and manipulation
REDCap: http://project-redcap.org/ Web app for building and managing online surveys and databases.
To find participants: https://
MySQL for a professional relational database. Requires some programming skills in SQL and database design.
Applications to query and build databases (goodbye command line):
- [OS X]: Sequel Pro
- [Windows]: SQLyog; Toad for MySQL...

Data are dirty...
How to clean your data more efficiently than doing everything by hand?
12:10: POCT Comment GLUCOSE BY METER
21:24:00 51 O2 Saturation, ISTAT (Ven) ISTAT EG7, VENOUS
5:39:00 91 Glu GLUCOSE BY METER
10:58: Comments BLOOD CULTURE (2 AEROBIC BOTTLES)
9:36: Report Status BLOOD CULTURE (2 AEROBIC BOTTLES)
16:25:00 25 CO2, Ser/Plas METABOLIC PANEL, COMPREHENSIVE
8:12: Glucose, Ser/Plas METABOLIC PANEL, BASIC
8:06: MONO, % CBC WITH DIFF
8:01: Glucose METABOLIC PANEL, BASIC
13:22: CO2 (a) BLOOD GASES, ARTERIAL
4:45: MONO CBC WITH DIFF
Stanford Wrangler: http://vimeo.com/
Google - down the road. A bit less intuitive than Wrangler.
For more complex data transformation: reshape2 package in R.
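The kind of transformation reshape2 is recommended for above is typically a move between "long" records (one measurement per row) and a "wide" table (one row per subject). The following Python sketch illustrates the idea behind a long-to-wide cast; it does not reproduce the reshape2 API, and the helper name `cast_wide`, the field names, and the patient records are hypothetical.

```python
from collections import defaultdict


def cast_wide(records, row_key, col_key, value_key):
    """Pivot long records into {row: {column: value}}.

    If the same (row, column) pair occurs twice, the last value wins.
    """
    wide = defaultdict(dict)
    for rec in records:
        wide[rec[row_key]][rec[col_key]] = rec[value_key]
    return {row: dict(cols) for row, cols in wide.items()}


# Made-up lab results in long format, loosely modeled on the slide's example.
long_records = [
    {"patient": "P1", "test": "Glucose", "value": 91},
    {"patient": "P1", "test": "CO2", "value": 25},
    {"patient": "P2", "test": "Glucose", "value": 104},
]

wide = cast_wide(long_records, "patient", "test", "value")
for patient, tests in wide.items():
    print(patient, tests)
```

Real lab exports usually need an aggregation rule for repeated measurements (first, last, mean); the sketch simply keeps the last one seen.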
11 Statistics made easy...
Excel... Obviously. But what else when you want something more powerful? Switch to a statistical software like R.
R graphical interface: Deducer (http:// http://
The case for starting to use R
1. Powerful statistics procedures. R has become the lingua franca for statistical programming.
2. Packages for everything, from flow cytometry, DNA microarrays, RNA-seq, Google graph API... See http://goo.gl/rwer7
3. Graphics, graphics, graphics... R graphical manual: http://goo.gl/qshmq

Graphics in R
Data Science Visualization: Circos
CIRCOS: http://circos.ca/ To visualize genome-scale interaction and functional information. CIRCOS is a Perl program. Some light programming is needed. But it is worth it!
12 Data Science Visualization
Tableau: http:// Great for geo-localized data.

Data Science Visualization
Google Visualization: https://developers.google.com/chart/interactive/docs/gallery
Requires data in JSON format. Fortunately a bridge with R is possible.
Earthquake in Japan

Data Science Visualization
Google Visualization: https://developers.google.com/chart/interactive/docs/gallery
Motion chart http:// 7TCIe08
R commands:
> library(googleVis)
> M1 <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
> plot(M1)
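The JSON requirement noted above means tabular data has to be serialized before a chart can read it. The sketch below, in Python, builds the literal `cols`/`rows` layout that Google Charts DataTable objects accept, as far as that format is documented; it is only an illustration of what the R bridge automates, and the helper name `to_datatable_json` and the sample values are assumptions.

```python
import json


def to_datatable_json(columns, rows):
    """Serialize a small table into a DataTable-style JSON string.

    columns: list of (name, type) pairs, e.g. ("Year", "number")
    rows:    list of row value lists, in the same order as columns
    """
    table = {
        "cols": [
            {"id": name, "label": name, "type": col_type}
            for name, col_type in columns
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }
    return json.dumps(table)


# Made-up values, shaped like the Fruit/Year example used in the R commands.
columns = [("Fruit", "string"), ("Year", "number"), ("Sales", "number")]
rows = [["Apples", 2008, 98], ["Oranges", 2008, 93]]
print(to_datatable_json(columns, rows))
```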
13 Demystifying the work
It's all about reproducible research.
Sharing your analytical process (aka what you did) is as important as the final manuscript. How do you share what you did with a graphical interface? The solution is to use a programming language, like R if suitable, and share your code.
Several tools can make your life easier: RStudio or Deducer. Come to the workshop in 2 weeks!

The kitchen
- TextMate and Notepad++ for coding
- Use version control, with hosting services like GitHub or Bitbucket
- To make research reproducible when data are not available: DataThief: http://
- To follow the latest buzz in science:
- Some R books. Most of those books are available online for free through the Stanford Library.

Q&A
This class was sponsored by the Office of Postdoctoral Affairs and the Lane Library. Offline questions to druau@stanford.edu. Thanks!
More informationAbout the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster
Cluster Info Sheet About the Princess Margaret Computational Biology Resource Centre (PMCBRC) cluster Welcome to the PMCBRC cluster! We are happy to provide and manage this compute cluster as a resource
More informationGlobus Genomics Tutorial GlobusWorld 2014
Globus Genomics Tutorial GlobusWorld 2014 Agenda Overview of Globus Genomics Example Collaborations Demonstration Globus Genomics interface Globus Online integration Scenario 1: Using Globus Genomics for
More informationAnalysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics
Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,
More informationTHE UNIVERSITY OF MANCHESTER Unit Specification
1. GENERAL INFORMATION Title Unit code Credit rating 15 Level 7 Contact hours 30 Other Scheduled teaching and learning activities* Pre-requisite units Co-requisite units School responsible Member of staff
More informationModule 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
More informationNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING Dr. R. Piazza SANGER SEQUENCING + DNA NEXT GENERATION SEQUENCING Flowcell NEXT GENERATION SEQUENCING Library di DNA Genomic DNA NEXT GENERATION SEQUENCING NEXT GENERATION SEQUENCING
More informationBioHPC Web Computing Resources at CBSU
BioHPC Web Computing Resources at CBSU 3CPG workshop Robert Bukowski Computational Biology Service Unit http://cbsu.tc.cornell.edu/lab/doc/biohpc_web_tutorial.pdf BioHPC infrastructure at CBSU BioHPC Web
More informationHow To Use The Assembly Database In A Microarray (Perl) With A Microarcode) (Perperl 2) (For Macrogenome) (Genome 2)
The Ensembl Core databases and API Useful links Installation instructions: http://www.ensembl.org/info/docs/api/api_installation.html Schema description: http://www.ensembl.org/info/docs/api/core/core_schema.html
More informationAnalysis of Illumina Gene Expression Microarray Data
Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray
More informationA Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques
Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 402 A Multiple DNA Sequence Translation Tool Incorporating Web
More informationPreciseTM Whitepaper
Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis
More informationLecture 11 Data storage and LIMS solutions. Stéphane LE CROM lecrom@biologie.ens.fr
Lecture 11 Data storage and LIMS solutions Stéphane LE CROM lecrom@biologie.ens.fr Various steps of a DNA microarray experiment Experimental steps Data analysis Experimental design set up Chips on catalog
More informationBio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
More informationBIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
More informationFlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript
More informationRNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012
RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided
More informationProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
More informationBioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
More informationMODULE 2: Advanced methodologies and tools for research. Research funding and innovation.
MODULE 2: Advanced methodologies and tools for research. Research funding and innovation. Code: 43642 Credits: 6 ECTS Type: Compulsory Language: English/Spanish Module s Coordinator: Àlex Sánchez alex.sanchez@vhir.org
More informationBrian Connolly Systems Engineer, LabKey Software brian@labkey.com. LabKey Server in the Cloud
Brian Connolly Systems Engineer, LabKey Software brian@labkey.com LabKey Server in the Cloud 1 Agenda What is the Cloud? Why would I want to use the cloud? What will it cost? Using LabKey in the cloud
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
More informationHadoopizer : a cloud environment for bioinformatics data analysis
Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,
More informationNext Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
More information