Introduction to the BioConductor framework Algorithmic Analysis of Flow Cytometry Data (Part 1)

Ryan Brinkman
Senior Scientist, BC Cancer Agency
Associate Professor, Department of Medical Genetics, UBC
Vancouver, British Columbia, Canada
May 17, 2014

Outline
- BioConductor for Automated Flow Cytometry Analysis
- Mini-tutorial on BioConductor won't teach you how to fish
- It will teach you that there is such a thing as fishing
- And that fish are tasty
- Even when eaten raw and wriggling
www.isac-net.org

What Automated Analysis Needs to Deal With
- Large number of dimensions, events, samples
- Multifactorial formats
- Need quick, robust processing
- Need to maintain data & metadata relationships
- No commercially available software solves these issues*
Bashashati et al. Adv Bioinformatics (2009); PMC2798157
*Robinson et al. Expert Opinion Drug Discovery (2012); PMID 22708834
Le Meur Curr Opin Biotechnol (2013); PMID 23062230
O'Neill et al. PLoS Computational Biology (2014); PMC3867282

Solution: Free, Open Source Statistical Programming
- R is a free/libre open source, robust statistical programming environment for Windows, Mac & Linux that offers a wide range of statistical and visualization methods
- BioConductor provides R software modules for biological and clinical data analysis
- A scripted approach to high throughput data analysis
- Non-interactive, self-documented, reproducible
- Breaks problem into smaller pieces (packages)
- Modules can plug-in & swap-out
- Integrates with other software tools via open data standards
- Collaborative development
http://bioconductor.org

38 R packages for Flow Analysis
Data processing & visualization (19/38)
- flowCore*: Read/write & process flow data
- plateCore*: Analyze multiwell plates
- flowUtils*: Import gates, transformation and compensation
- flowQ*: Quality control of ungated data
- flowStats*: Advanced statistical methods and functions
- ncdfFlow: Advanced methods for large dataset processing
- QUALIFIER: Quality control and assessment of gated data
- flowViz: Visualization (e.g., histograms, dot plots, density plots)
- flowPlots*: Graphical displays with statistical tests
- flowWorkspace*: Importing FlowJo workspaces
- flowTrans*: Estimates parameters for data transformation
- OpenCyto: Simplifies data processing
- flowBeads: Analysis of flow bead data
- flowCL: Semantic labelling of cell populations
- flowCyBar: Visualize correlations between cell number & abiotic parameters
- flowFit: Estimate proliferation of a cell population in cell-tracking dye studies
- flowMap: Match and compare multiple flow cytometry samples
- flowMatch*: Matching and meta-clustering
- FCSClean: Fluorescence vs. time gating for stream issues
*Peer-reviewed manuscript available

15 R packages for Automated Gating
- flowClust*: Clustering using t-mixture model with Box-Cox transformation
- flowMerge*: flowClust + entropy-based merging
- flowMeans*: k-means clustering and merging using the Mahalanobis distance
- SamSpectral*: Efficient spectral clustering using density-based down-sampling
- flowQB: Q&B analysis
- flowBin: Combining multitube data by binning
- flowPeaks*: Unsupervised clustering using k-means & mixture model
- flowFP*: Fingerprint generation
- flowPhyto*: Analysis of marine biology data
- FLAME*: Multivariate finite mixtures of skew & tailed distributions
- flowKoh: Self-organizing maps
- NMF-curvHDR*: Density-based clustering and non-negative matrix factorization
- flowCore/Stats*: Sequential gating and normalization w/ Beta-Binomial model
- PRAMS*: 2D Clustering and logistic regression
- SPADE*: Density-based sampling, k-means clustering & minimum spanning trees
*Peer-reviewed manuscript available

4 Packages for Post-Gating Significance Assessment
- flowType*: Automated phenotyping using 1D gates extrapolated to multiple dimensions
- RchyOptimyx*: Cellular hierarchies correlated with outcome of interest
- COMPASS: Unbiased analysis of antigen-specific T-cell subsets
- MIMOSA*: Mixture Models to model count data
*Peer-reviewed manuscript available

BioConductor's Open, Extensible Infrastructure
Packages are Interoperable & Interchangeable & Separable

QA: One of These Samples is Not Like the Others
Le Meur et al., Cytometry A, 2007
Hahne et al., BMC Bioinformatics, 2009

flowQ: Summary web page

QA with QUALIFIER: Fluorescence Stability
Finak et al., BMC Bioinformatics 2012

FCSClean
Remove outlier events due to stream issues
Kipper Fletez-Brant & Pratip Chattopadhyay @NIH/VRC

Normalization
- Examples: Laser Change or Multiple Centres
- Normalization will also remove differences due to biology
- Movement of a subset of populations -> gating and cell population matching problem

Data Normalization
[Figure: CD3 panels for raw data, gaussnorm and fdanorm]
Hahne et al., Cytometry A, 2009
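To make the idea of aligning samples concrete, here is a minimal base-R sketch on simulated data. It is a toy landmark alignment that shifts one sample so its highest density peak matches a reference sample. It is not the gaussNorm or fdaNorm algorithm named on the slide, and the sample values, the +80 shift and the helper name `main_peak` are assumptions made up for illustration.

```r
set.seed(1)

# Simulated marker intensities for two samples (hypothetical values);
# sample B carries a technical shift of +80 relative to sample A
sample_a <- c(rnorm(3000, mean = 300, sd = 40), rnorm(1000, mean = 700, sd = 50))
sample_b <- c(rnorm(3000, mean = 380, sd = 40), rnorm(1000, mean = 780, sd = 50))

# Landmark: location of the highest peak of the kernel density estimate
main_peak <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

# Shift sample B so that its main peak coincides with that of sample A
offset <- main_peak(sample_a) - main_peak(sample_b)
sample_b_aligned <- sample_b + offset

# Peak positions before and after alignment, next to the reference
c(before = main_peak(sample_b),
  after = main_peak(sample_b_aligned),
  reference = main_peak(sample_a))
```

The same arithmetic would remove a shift caused by biology just as readily as one caused by a laser change, which is the caveat raised on the normalization slide.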

Automated Cell Population Identification
- 15 different R-based algorithms for gating
- Many different approaches available to the problem
- No best solution; more is better
- Can be as accurate as (or more accurate than?!) human gating
- Better chance of finding interesting populations in high-d data
- Allow scientists to do valuable science
- See Part 2 of Tutorial 5 for details of some methods; see also the FlowCAP Workshop & Aghaeepour et al., FlowCAP, Nature Methods 2013
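As a rough illustration of what clustering-based gating does, the sketch below runs plain k-means from base R (`stats::kmeans`) on simulated two-marker data. This is not flowMeans or any other package from the list above; the marker names, population sizes and the choice of k = 2 are all assumptions for the example.

```r
set.seed(42)

# Simulated two-marker data with two cell populations (250 and 150 events)
pop1 <- cbind(marker1 = rnorm(250, 200, 50), marker2 = rnorm(250, 200, 50))
pop2 <- cbind(marker1 = rnorm(150, 700, 50), marker2 = rnorm(150, 600, 50))
events <- rbind(pop1, pop2)
truth <- rep(1:2, times = c(250, 150))

# k-means with several random starts; k = 2 is assumed known here
km <- kmeans(events, centers = 2, nstart = 10)

# Cluster labels are arbitrary, so compare against both possible labelings
agreement <- max(mean(km$cluster == truth), mean(km$cluster == 3 - truth))

km$size            # number of events assigned to each cluster
round(km$centers)  # estimated population centres
agreement          # fraction of events assigned to the correct population
```

Real data rarely has well-separated, spherical populations or a known number of clusters, which is one reason so many different approaches exist.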

Getting Started: r-project.org

Getting Started: bioconductor.org

bioconductor.org/install (OSX, Windows, Linux)
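A minimal installation sketch follows. Current Bioconductor releases install packages through the BiocManager package from CRAN; when this talk was given the installer was the biocLite() script instead. The three packages chosen here are only an illustrative selection from the lists above.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install a few of the flow cytometry packages (illustrative selection)
BiocManager::install(c("flowCore", "flowViz", "flowStats"))

# Load flowCore to confirm the installation worked
library(flowCore)
```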

bioconductor.org/help

bioconductor.org/help/workflows/high-throughput-assays/

BioConductor Vignettes
- Each Bioconductor package contains at least one vignette
- Vignettes provide a task-oriented description of functionality
- Vignettes contain interactive, executable examples
- You can access the PDF version of a vignette from R: browseVignettes(package = "flowMeans")
- This opens a browser with links to the vignette PDF and a plain-text R file containing the code used in the vignette

Example Package Page

Vignettes: Peer-reviewed Executable Documentation

Documentation Peer-reviewed by Scientists

Working With R For Real: Getting Started RStudio

FCM data: Reading FCS files in R
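The slide itself was a screenshot, so the following is only a sketch of how reading an FCS file with flowCore typically looks. It assumes flowCore is installed and uses an example file that ships with the package; replace `fcs_path` with the path to your own file.

```r
library(flowCore)

# Path to an example FCS file bundled with flowCore (swap in your own file)
fcs_path <- system.file("extdata", "0877408774.B08", package = "flowCore")

# Read the file without applying any transformation to the stored values
fcs <- read.FCS(fcs_path, transformation = FALSE)

fcs                    # prints a short summary of the flowFrame
colnames(fcs)          # channel names
events <- exprs(fcs)   # numeric matrix: one row per event, one column per channel
dim(events)
keyword(fcs, "$TOT")   # metadata from the FCS header; $TOT is the event count
```

The resulting flowFrame keeps the event matrix together with the file's metadata, which is the data and metadata relationship mentioned earlier in the talk.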

Plotting FCM Data in R
This is the simplest example; plots can be made to mimic those of commercial software
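The original slide showed a screenshot rather than code. As a hedged sketch, the example below draws a dot plot and a histogram with base R graphics from the event matrix of a flowCore example file. It assumes flowCore is installed and simply picks the first two channels in the file; the flowViz package listed earlier offers dedicated plotting functions for richer displays.

```r
library(flowCore)

# Example file bundled with flowCore (swap in your own file)
fcs_path <- system.file("extdata", "0877408774.B08", package = "flowCore")
fcs <- read.FCS(fcs_path, transformation = FALSE)
events <- exprs(fcs)

# Use the first two channels in the file (often forward and side scatter)
chan <- colnames(events)[1:2]

# Dot plot: one point per event
plot(events[, chan[1]], events[, chan[2]],
     pch = ".", col = "grey30",
     xlab = chan[1], ylab = chan[2],
     main = "Dot plot of the first two channels")

# Histogram of a single channel
hist(events[, chan[1]], breaks = 100,
     xlab = chan[1], main = paste("Histogram of", chan[1]))
```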

Manipulating Multiple FCS Files: fsApply
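A short sketch of working with several files at once, assuming flowCore is installed. It reads the example files bundled with the package into a flowSet and uses fsApply to run a function on each file; the file pattern is specific to those bundled examples.

```r
library(flowCore)

# Example files from the same plate that ship with flowCore
fcs_dir <- system.file("extdata", package = "flowCore")
fcs_files <- list.files(fcs_dir, pattern = "^0877408774", full.names = TRUE)

# Read all files into a single flowSet
fs <- read.flowSet(files = fcs_files, transformation = FALSE)

# Apply a function to every flowFrame in the set: number of events per file
event_counts <- fsApply(fs, nrow)

# With use.exprs = TRUE the function receives each file's data matrix instead
channel_medians <- fsApply(fs, function(m) apply(m, 2, median), use.exprs = TRUE)

event_counts
channel_medians
```

Because fsApply returns one row per file, per-sample summaries like these can be compared directly across a whole experiment.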

What Next?
- Bioinformatics.ca: Free 2-day in-depth walkthrough tutorial
- BioConductor.org: Mailing list of friendly people
- GenePattern.org: BioConductor packages in a webpage
- Collaborate

Acknowledgements Genentech Worldwide BCCA Funding R/BioConductor.org flow cytometry infrastructure Robert Gentleman All BioConductor contributors Bioinformatics.ca Tutorial Radina Droumeva $ NIH (NIBIB, NIAID), HIP-C, TFRI & TFF, CCS, MSFHR, WHCF, NSERC