Introduction to the BioConductor Framework
  Algorithmic Analysis of Flow Cytometry Data (Part 1)
  Ryan Brinkman
  Senior Scientist, BC Cancer Agency
  Associate Professor, Department of Medical Genetics, UBC
  Vancouver, British Columbia, Canada
  May 17, 2014
Outline
  BioConductor for Automated Flow Cytometry Analysis
  Mini-tutorial on BioConductor
    Won't teach you how to fish
    It will teach you that there is such a thing as fishing
    And that fish are tasty
    Even when eaten raw and wriggling
  www.isac-net.org
What Automated Analysis Needs to Deal With
  Large number of dimensions, events, samples
  Multifactorial formats
  Need quick, robust processing
  Need to maintain data & metadata relationships
  No commercially available software solves these issues*
  Bashashati et al., Adv Bioinformatics (2009); PMC2798157
  *Robinson et al., Expert Opinion Drug Discovery (2012); PMID 22708834
  Le Meur, Curr Opin Biotechnol (2013); PMID 23062230
  O'Neill et al., PLoS Computational Biology (2014); PMC3867282
Solution: Free, Open Source Statistical Programming
  R is a free/libre, open source, robust statistical programming environment for Windows, Mac & Linux that offers a wide range of statistical and visualization methods
  BioConductor provides R software modules for biological and clinical data analysis
  A scripted approach to high-throughput data analysis
    Non-interactive, self-documented, reproducible
    Breaks the problem into smaller pieces (packages)
    Modules can plug in & swap out
    Integrates with other software tools via open data standards
    Collaborative development
  http://bioconductor.org
38 R Packages for Flow Analysis
  Data processing & visualization (19/38)
    flowCore*       Read/write & process flow data
    plateCore*      Analyze multiwell plates
    flowUtils*      Import gates, transformation and compensation
    flowQ*          Quality control of ungated data
    flowStats*      Advanced statistical methods and functions
    ncdfFlow        Advanced methods for large dataset processing
    QUALIFIER       Quality control and assessment of gated data
    flowViz         Visualization (e.g., histograms, dot plots, density plots)
    flowPlots*      Graphical displays with statistical tests
    flowWorkspace*  Importing FlowJo workspaces
    flowTrans*      Estimates parameters for data transformation
    openCyto        Simplifies data processing
    flowBeads       Analysis of flow bead data
    flowCL          Semantic labelling of cell populations
    flowCyBar       Visualize correlations between cell number & abiotic parameters
    flowFit         Estimate proliferation of a cell population in cell-tracking dye studies
    flowMap         Match and compare multiple flow cytometry samples
    flowMatch*      Matching and meta-clustering
    FCSClean        Fluorescence vs. time gating for stream issues
  *Peer-reviewed manuscript available
15 R Packages for Automated Gating
    flowClust*           Clustering using t-mixture model with Box-Cox transformation
    flowMerge*           flowClust + entropy-based merging
    flowMeans*           k-means clustering and merging using the Mahalanobis distance
    SamSPECTRAL*         Efficient spectral clustering using density-based down-sampling
    flowQB               Q&B analysis
    flowBin              Combining multitube data by binning
    flowPeaks*           Unsupervised clustering using k-means & mixture model
    flowFP*              Fingerprint generation
    flowPhyto*           Analysis of marine biology data
    FLAME*               Multivariate finite mixtures of skew & tailed distributions
    flowKoh              Self-organizing maps
    NMF-curvHDR*         Density-based clustering and non-negative matrix factorization
    flowCore/flowStats*  Sequential gating and normalization w/ Beta-Binomial model
    PRAMS*               2D clustering and logistic regression
    SPADE*               Density-based sampling, k-means clustering & minimum spanning trees
  *Peer-reviewed manuscript available
4 Packages for Post-Gating Significance Assessment
    flowType*     Automated phenotyping using 1D gates extrapolated to multiple dimensions
    RchyOptimyx*  Cellular hierarchies correlated with outcome of interest
    COMPASS       Unbiased analysis of antigen-specific T-cell subsets
    MIMOSA*       Mixture models to model count data
  *Peer-reviewed manuscript available
BioConductor's Open, Extensible Infrastructure
  Packages are interoperable, interchangeable & separable
QA: One of These Samples is Not Like the Others
  Le Meur et al., Cytometry A, 2007
  Hahne et al., BMC Bioinformatics, 2009
flowQ: Summary web page
QA with QUALIFIER: Fluorescence Stability
  Finak et al., BMC Bioinformatics, 2012
FCSClean
  Remove outlier events due to stream issues
  Kipper Fletez-Brant & Pratip Chattopadhyay @NIH/VRC
Normalization
  Examples: laser change or multiple centres
  Normalization will also remove differences due to biology
  Movement of a subset of populations -> gating and cell population matching problem
Data Normalization
  [Figure: CD3 intensity (0 to 1000) for three panels: raw data, gaussnorm, fdanorm, each with the corresponding density curves]
  Hahne et al., Cytometry A, 2009
Automated Cell Population Identification
  15 different R-based algorithms for gating
    Many different approaches available to the problem
    No best solution; more is better
  Can be as accurate as (or more accurate than?!) human gating
  Better chance of finding interesting populations in high-dimensional data
  Allows scientists to do valuable science
  See Part 2 of Tutorial 5 for details of some methods; also the FlowCAP Workshop & Aghaeepour et al., FlowCAP, Nature Methods, 2013
Getting Started: r-project.org
Getting Started: bioconductor.org
bioconductor.org/install
  OSX, Windows, Linux
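  As a minimal sketch of what installation looks like from the R prompt: current Bioconductor releases install packages through the BiocManager package from CRAN (at the time of this talk the installer was the biocLite() script). The three package names are taken from the package slides; which ones you install is up to you.

```r
# Install the BiocManager helper from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install a few of the flow cytometry packages named in this tutorial.
# The same call works on OSX, Windows and Linux.
BiocManager::install(c("flowCore", "flowViz", "flowStats"))
```

  Re-running the same call later is a common way to update the listed packages to the versions matching the installed Bioconductor release.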
bioconductor.org/help
bioconductor.org/help/workflows/high-throughput-assays/
BioConductor Vignettes
  Each Bioconductor package contains at least one vignette
  Vignettes provide a task-oriented description of functionality
  Vignettes contain interactive, executable examples
  You can access the PDF version of a vignette from R:
    browseVignettes(package = "flowMeans")
  This opens a browser with links to the vignette PDF & a plain-text R file containing the code used in the vignette
Example Package Page
Vignettes: Peer-reviewed Executable Documentation
Documentation Peer-reviewed by Scientists
Working With R For Real: Getting Started
  RStudio
FCM Data: Reading FCS Files in R
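  A minimal sketch of reading one FCS file with flowCore's read.FCS(). It assumes flowCore is installed and uses an example file that ships in the package's extdata folder; the file name is an assumption about what the installed version contains, so substitute the path to your own .fcs file if it is missing.

```r
library(flowCore)

# Path to an example FCS file bundled with flowCore (assumed to be present).
fcs_path <- system.file("extdata", "0877408774.B08", package = "flowCore")

# Read the file into a flowFrame without transforming the raw values.
ff <- read.FCS(fcs_path, transformation = FALSE)

ff                    # printed summary: channels and number of events
colnames(ff)          # channel (parameter) names
dim(exprs(ff))        # events x channels matrix of measurements
head(exprs(ff))       # first few events
keyword(ff, "$TOT")   # a metadata keyword stored in the file header
```

  The flowFrame keeps the measurements and the file's metadata together, which is the "data & metadata relationship" mentioned earlier in the deck.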
Plotting FCM Data in R
  This is the simplest example; it can mimic commercial software plots
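  One way to sketch a simple plot, assuming flowCore is installed and the bundled example file is present. Base R graphics are used here only to keep dependencies minimal; the flowViz package listed earlier provides dedicated flow cytometry plots. The first two channels are picked by position rather than by name, since channel names vary between files.

```r
library(flowCore)

# Example file bundled with flowCore (assumed present); use your own path otherwise.
fcs_path <- system.file("extdata", "0877408774.B08", package = "flowCore")
ff <- read.FCS(fcs_path, transformation = FALSE)

events <- exprs(ff)            # numeric matrix: one row per event
chans  <- colnames(ff)[1:2]    # first two channels, chosen by position

# Dot plot of the two channels, one point per event.
plot(events[, chans[1]], events[, chans[2]],
     pch = ".", xlab = chans[1], ylab = chans[2],
     main = "Dot plot of the first two channels")

# Histogram of a single channel.
hist(events[, chans[1]], breaks = 100,
     xlab = chans[1], main = "Histogram of the first channel")
```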
Manipulating Multiple FCS Files
  fsApply
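  A minimal sketch of fsApply(), flowCore's apply-style function for running the same operation on every sample in a flowSet. It assumes flowCore is installed and uses the GvHD example data set that ships with it; the four-sample subset is arbitrary and only keeps the example fast.

```r
library(flowCore)

# Example flowSet bundled with flowCore; take the first four samples.
data(GvHD)
fs <- GvHD[1:4]

# Number of events in each sample (the function receives a flowFrame).
counts <- fsApply(fs, nrow)

# Median of every channel in every sample (rows: samples, columns: channels).
medians <- fsApply(fs, each_col, median)

# With use.exprs = TRUE the function receives the raw event matrix instead.
mean_first_channel <- fsApply(fs, function(m) mean(m[, 1]), use.exprs = TRUE)

counts
medians
```

  To build a flowSet from your own files rather than the example data, read.flowSet() reads a set of FCS files in one call.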
What Next?
  Bioinformatics.ca: Free 2-day in-depth walkthrough tutorial
  BioConductor.org: Mailing list of friendly people
  GenePattern.org: BioConductor packages in a webpage
  Collaborate
Acknowledgements
  Genentech: R/BioConductor.org flow cytometry infrastructure, Robert Gentleman
  Worldwide: All BioConductor contributors
  BCCA: Bioinformatics.ca Tutorial, Radina Droumeva
  Funding: NIH (NIBIB, NIAID), HIP-C, TFRI & TFF, CCS, MSFHR, WHCF, NSERC