Analyzing Flow Cytometry Data with Bioconductor

Similar documents

THE BIOCONDUCTOR PACKAGE FLOWCORE, A SHARED DEVELOPMENT PLATFORM FOR FLOW CYTOMETRY DATA ANALYSIS IN R

flowcore: data structures package for flow cytometry data

flowcore: data structures package for flow cytometry data

Gates/filters in Flow Cytometry Data Visualization

FlowMergeCluster Documentation

COMPENSATION MIT Flow Cytometry Core Facility

Automated Quadratic Characterization of Flow Cytometer Instrument Sensitivity (flowqb Package: Introductory Processing Using Data NIH))

Using self-organizing maps for visualization and interpretation of cytometry data

flowtrans: A Package for Optimizing Data Transformations for Flow Cytometry

Deep profiling of multitube flow cytometry data Supplemental information

GWAS Data Cleaning. GENEVA Coordinating Center Department of Biostatistics University of Washington. January 13, 2016.

DELPHI 27 V 2016 CYTOMETRY STRATEGIES IN THE DIAGNOSIS OF HEMATOLOGICAL DISEASES

INSIDE THE BLACK BOX

CELL CYCLE BASICS. G0/1 = 1X S Phase G2/M = 2X DYE FLUORESCENCE

Flow Cytometry for Everyone Else Susan McQuiston, J.D., MLS(ASCP), C.Cy.

Flow Data Analysis. Qianjun Zhang Application Scientist, Tree Star Inc. Oregon, USA FLOWJO CYTOMETRY DATA ANALYSIS SOFTWARE

CELL CYCLE BASICS. G0/1 = 1X S Phase G2/M = 2X DYE FLUORESCENCE

Science is hard. Flow cytometry should be easy.

CyAn : 11 Parameter Desktop Flow Cytometer

R Language Fundamentals

Immunophenotyping Flow Cytometry Tutorial. Contents. Experimental Requirements...1. Data Storage...2. Voltage Adjustments...3. Compensation...

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

--Verity Software House, ModFit for Proliferation offers online tutorials

No-wash, no-lyse detection of leukocytes in human whole blood on the Attune NxT Flow Cytometer

Exploratory Data Analysis

Instructions for Use. CyAn ADP. High-speed Analyzer. Summit G June Beckman Coulter, Inc N. Harbor Blvd. Fullerton, CA 92835

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Environmental Remote Sensing GEOG 2021

APPLICATION INFORMATION

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Classes and Methods for Spatial Data: the sp Package

Immunophenotyping peripheral blood cells

BD FACSDiva TUTORIAL TSRI FLOW CYTOMETRY CORE FACILITY

GETTING STARTED. 6. Click the New Specimen icon.

Technical Bulletin. Threshold and Analysis of Small Particles on the BD Accuri C6 Flow Cytometer

Computation of the Aggregate Claim Amount Distribution Using R and actuar. Vincent Goulet, Ph.D.

Compensation Controls Data Visualization Panel Development Gating Quantum Dots

1 Descriptive statistics: mode, mean and median

FACSCanto RUO Special Order QUICK REFERENCE GUIDE

LEGENDplex Data Analysis Software

Data Exploration Data Visualization

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Package MBA. February 19, Index 7. Canopy LIDAR data

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Multiple Linear Regression

Final Exam Practice Problem Answers

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Katharina Lückerath (AG Dr. Martin Zörnig) adapted from Dr. Jörg Hildmann BD Biosciences,Customer Service

DEA implementation and clustering analysis using the K-Means algorithm

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

BD CellQuest Pro Software Analysis Tutorial

Normalization of RNA-Seq

Classification and Regression by randomforest

Introduction to the BioConductor framework Algorithmic Analysis of Flow Cytometry Data (Part 1)

Data exploration with Microsoft Excel: analysing more than one variable

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Analysis of Bead-summary Data using beadarray

Fluorescence Compensation. Jennifer Wilshire, PhD

Technical Bulletin. Standardizing Application Setup Across Multiple Flow Cytometers Using BD FACSDiva Version 6 Software. Abstract

Descriptive Statistics

Microsoft Excel 2010 Charts and Graphs

123count ebeads Catalog Number: Also known as: Absolute cell count beads GPR: General Purpose Reagents. For Laboratory Use.

Selected Topics in Electrical Engineering: Flow Cytometry Data Analysis

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Exploratory data analysis (Chapter 2) Fall 2011

Anchorage School District/Alaska Sr. High Math Performance Standards Algebra

Structural Health Monitoring Tools (SHMTools)

Hierarchical Clustering Analysis

Deploying Predictive Analytics Solutions Dr. Stephan Gerali Lockheed Martin Dr. Rafael Pacheco SAP

Navios Quick Reference

These particles have something in common

R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

BD FACSComp Software Tutorial

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

MODULE 7: FINANCIAL REPORTING AND ANALYSIS

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Application Information Bulletin: Set-Up of the CytoFLEX Set-Up of the CytoFLEX* for Extracellular Vesicle Measurement

Scientific Visualization with ParaView

A survey analysis example

The NIAID Flow Cytometry Advisory Committee; the Guidelines Subcommittee

Visualization and descriptive statistics. D.A. Forsyth

Package retrosheet. April 13, 2015

Linear Programming. March 14, 2014

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC Politecnico di Milano)

Introduction to Data Visualization

Contents WEKA Microsoft SQL Database

From The Little SAS Book, Fifth Edition. Full book available for purchase here.

Transcription:

Introduction Data Analysis Analyzing Flow Cytometry Data with Bioconductor Nolwenn Le Meur, Deepayan Sarkar, Errol Strain, Byron Ellis, Perry Haaland, Florian Hahne Fred Hutchinson Cancer Research Center 6 August 2007

Introduction Data Analysis Flow Cytometry ˆ High throughput, high volume, high dimensional data ˆ Software challenges ˆ implementing standard methods ˆ developing novel methods ˆ R + Bioconductor ˆ open source ˆ supports rapid prototyping ˆ large existing base of users

Introduction Data Analysis Bioconductor packages for FCM data ˆ Existing R packages: prada, rflowcyt ˆ underlying data structures different ˆ : attempt at standardization ˆ data input ˆ object model ˆ standard tools (gates, transformations) ˆ flowviz ˆ flexible visualization using infrastructure ˆ Potentially more packages ˆ building on top of the basic infrastructure ˆ implementing future developments

Introduction Data Analysis Goals of Lab Session ˆ Introduction to ˆ Some simple analysis and visualization

File formats ˆ Standard formats ˆ FCS 2, FCS 3 ˆ LMD (list mode) files ˆ can read all these formats ˆ read.fcs() single file ˆ read.fcsheader() header only ˆ read.flowset() collection of files

Reading in a file read.fcs() reads individual files > ff <- read.fcs("../data/0877408774.f06") > ff flowframe object with 10000 cells and 8 observables: <FSC-H> FSC-H <SSC-H> SSC-H <FL1-H> <FL2-H> <FL3-H> <FL1-A> <FL4- slot 'description' has 147 elements

> exprs(ff)[c(1:5, 9996:10000), c("fsc-h", "SSC-H", "FL1-H", "FL2-H", "Time")] FSC-H SSC-H FL1-H FL2-H Time [1,] 528 672 8.446683 6.163591 7 [2,] 519 645 11.785791 5.336699 7 [3,] 449 136 10.022534 15.440480 7 [4,] 269 319 95.168807 35.992104 7 [5,] 361 138 10.113176 1318.969048 7 [6,] 373 122 22.134185 1260.910502 464 [7,] 349 131 24.002372 1.154944 464 [8,] 213 179 43.093186 31.163494 464 [9,] 388 183 15.580123 4.110368 464 [10,] 426 177 13.249199 840.874501 464

> pdata(parameters(ff)) name desc range $P1 FSC-H FSC-H 1024 $P2 SSC-H SSC-H 1024 $P3 FL1-H 1024 $P4 FL2-H 1024 $P5 FL3-H 1024 $P6 FL1-A <NA> 1024 $P7 FL4-H 1024 $P8 Time Time (51.20 sec.) 1024

> str(head(keyword(ff), 15)) List of 15 $ FCSversion: chr "2" $ $BYTEORD : chr "4,3,2,1" $ $DATATYPE : chr "I" $ $NEXTDATA : chr "0" $ $SYS : chr [1:4] "Macintosh" "System" "Software" "9.2.2" $ CREATOR : chr [1:2] "CELLQuest<aa>" "3.3" $ $TOT : chr "10000" $ $MODE : chr "L" $ $PAR : chr "8" $ $P1N : chr "FSC-H" $ $P1R : chr "1024" $ $P1B : chr "16" $ $P1E : chr "0,0" $ $P2N : chr "SSC-H" $ $P2R : chr "1024"

read.flowset() reads in a collection of files > fset <- read.flowset(...) > fset A flowset with 35 experiments. Reading in multiple files rownames: s5a01, s5a02,..., s10a07 (35 total) varlabels and varmetadata: Patient: Patient code Visit: Visit number...:... name: NA (5 total) column names: FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A FL4-H Time

Individual frames can be accessed using [[ and $ > fset[[2]] flowframe object with 3405 cells and 8 observables: <FSC-H> FSC-Height <SSC-H> SSC-Height <FL1-H> CD15 FITC <FL2-H> C slot 'description' has 153 elements > fset$"s5a02" flowframe object with 3405 cells and 8 observables: <FSC-H> FSC-Height <SSC-H> SSC-Height <FL1-H> CD15 FITC <FL2-H> C slot 'description' has 153 elements

Subsets of a flowset can be extracted using [ > fset[1:5] A flowset with 5 experiments. rownames: s5a01, s5a02,..., s5a05 (5 total) varlabels and varmetadata: Patient: Patient code Visit: Visit number...:... name: NA (5 total) column names: FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A FL4-H Time

fsapply(): applies function on all frames in a flowset > head(fsapply(fset, nrow)) [,1] s5a01 3420 s5a02 3405 s5a03 3435 s5a04 8550 s5a05 10410 s5a06 3750

Phenodata manipulation > varmetadata(phenodata(fset)) labeldescription Patient Patient code Visit Visit number Days Days since graft Grade Grade (leukemia) name <NA> > varmetadata(phenodata(fset))["name", "labeldescription"] <- "File name" > phenodata(fset) rownames: s5a01, s5a02,..., s10a07 (35 total) varlabels and varmetadata: Patient: Patient code Visit: Visit number...:... name: File name (5 total)

Visualization ˆ has some basic plots ˆ flowviz has more flexible methods based on lattice

> plot(fset$"s10a06", c("ssc-h", "FSC-H")) FSC H 200 400 600 800 1000 0 200 400 600 800 1000 SSC H

> xyplot(`fsc-h` ~ `SSC-H`, data = fset$"s10a06") 1000 800 FSC H 600 400 200 0 200 400 600 800 1000 SSC H

> xyplot(fset$"s10a06") Time 0 2000 4000 6000 8000 10000 0 100 200 300 FL1 H 0 200 400 600 800 1000 FL2 A 0 2000 4000 6000 8000 10000 FL2 H 0 2000 4000 6000 8000 10000 FL3 H 0 1000 2000 3000 4000 FL4 H 200 400 600 800 1000 FSC H 0 200 400 600 800 1000 SSC H

> splom(fset$"s10a06") FL4 H FL2 A FL3 H FL2 H FL1 H SSC H FSC H Scatter Plot Matrix

Transformations ˆ Original scale often not very good for visualization ˆ The transform() function creates transformed versions > summary(transform(fset$"s10a06", asinh.fsc.h = asinh(`fsc-h`))) FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A FL4-H Min. 60.0 0.0 1.000 1.0 1.000 0.0 1.000 1st Qu. 103.0 77.0 3.656 338.7 2.460 83.0 1.828 Median 145.0 113.0 11.680 708.7 5.684 175.0 10.770 Mean 175.9 145.1 431.500 871.7 13.270 218.2 27.270 3rd Qu. 232.0 149.0 94.320 1307.0 13.010 325.0 49.320 Max. 1023.0 1023.0 10000.000 10000.0 10000.000 1023.0 3921.000 asinh.fsc.h Min. 4.788 1st Qu. 5.328 Median 5.670 Mean 5.729 3rd Qu. 6.140 Max. 7.624

Transformations > summary(transform("fsc-h" = asinh) %on% fset$s10a06) FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A FL4-H Min. 4.788 0.0 1.000 1.0 1.000 0.0 1.000 1st Qu. 5.328 77.0 3.656 338.7 2.460 83.0 1.828 Median 5.670 113.0 11.680 708.7 5.684 175.0 10.770 Mean 5.729 145.1 431.500 871.7 13.270 218.2 27.270 3rd Qu. 6.140 149.0 94.320 1307.0 13.010 325.0 49.320 Max. 7.624 1023.0 10000.000 10000.0 10000.000 1023.0 3921.000

> s10a06 <- fset$"s10a06"[, 1:5] > splom(transform("fsc-h" = asinh, "SSC-H" = asinh, "FL1-H" = asinh, "FL2-H" = asinh, "FL3-H" = asinh) %on% s10a06) FL3 H FL2 H FL1 H SSC H FSC H Scatter Plot Matrix

{ a x < a truncatetransform y = x x a scaletransform f (x) = x a b a lineartransform f (x) = a + bx quadratictransform f (x) = ax 2 + bx + c lntransform f (x) = log (x) r d Standard Transformations logtransform f (x) = log b (x) r d biexponentialtransform f 1 (x) = ae bx ce dx + f logicletransform A special form of the biexponential transform with parameters selected by the data. arcsinhtransform f (x) = asinh (a + bx) + c

Transforms can be applied on an entire flowframe > fset.trans <- transform("fsc-h" = asinh, "SSC-H" = asinh, "FL1-H" = asinh, "FL2-H" = asinh, "FL3-H" = asinh) %on% fset

Quality Assessment ˆ Conditional plots (lattice + flowviz) ˆ Numerical summaries

> ecdfplot(~ `FL2-H` Patient, fset.trans, groups = Visit) 2 4 6 8 10 9 10 1.0 0.8 0.6 0.4 0.2 Empirical CDF 1.0 0.8 5 6 0.0 7 0.6 0.4 0.2 0.0 2 4 6 8 10 FL2 H 2 4 6 8 10

> ecdfplot(~ `FL2-H` Visit, fset.trans, subset = Patient == "10") 2 4 6 8 10 5 6 7 1.0 0.8 0.6 0.4 0.2 Empirical CDF 1.0 0.8 1 2 0.0 3 4 0.6 0.4 0.2 0.0 2 4 6 8 10 2 4 6 8 10 FL2 H

Numerical quality measures > fsapply(fset.trans, each_col, IQR)[29:35, 1:4] FSC-H SSC-H FL1-H FL2-H s10a01 0.6044971 0.8265640 1.5110417 1.098386 s10a02 0.8865547 0.9066027 1.1048425 2.608522 s10a03 0.5024268 1.9429009 0.7189971 5.341477 s10a04 0.7710728 1.5766873 1.1472946 5.871970 s10a05 0.6435293 0.4883394 3.3556510 1.404507 s10a06 0.8119895 0.6601100 3.2320107 1.350488 s10a07 0.7519678 0.8953221 2.9799957 1.557562

> xyplot(`ssc-h` ~ `FL2-H` factor(days), fset.trans, subset = Patient == "10") 6 2 4 6 8 10 0 6 2 4 6 8 10 13 6 4 2 SSC H 0 20 27 34 6 4 2 0 2 4 6 8 10 FL2 H 2 4 6 8 10

Standard Filters (a.k.a. Gates) rectanglegate Cubic shape in one or more dimensions (e.g. interval gates) polygongate Two dimensional polygonal gate polytopegate Convex hull of a set of points, possibly in more than two dimensions ellipsoidgate Ellipsoidal region in two or more dimensions

Data-driven Filters norm2filter Finds a subregion that most resembles a bivariate Gaussian distribution kmeansfilter (Multiple) populations using one-dimensional k-means clustering

> xyplot(`ssc-h` ~ `FL2-H` factor(days), fset.trans, subset = Patient == "10", filter = norm2filter("ssc-h", "FL2-H")) 6 2 4 6 8 10 0 6 2 4 6 8 10 13 6 4 2 SSC H 0 20 27 34 6 4 2 0 2 4 6 8 10 FL2 H 2 4 6 8 10

> rect.gate <- rectanglegate("rangegate", "FL2-H" = c(4, 10)) > rect.results <- filter(fset.trans, rect.gate) > lapply(rect.results[29:35], summary) $s10a01 RangeGate: 17117 of 17289 (99.01%) $s10a02 RangeGate: 3609 of 4530 (79.67%) $s10a03 RangeGate: 2496 of 6765 (36.90%) $s10a04 RangeGate: 5402 of 9540 (56.62%) $s10a05 RangeGate: 24924 of 25515 (97.68%) $s10a06 RangeGate: 23597 of 24656 (95.70%)

> xyplot(`ssc-h` ~ `FL2-H` factor(days), subset = Patient == "10", data = Subset(fset.trans, rect.results), filter = norm2filter("ssc-h", "FL2-H", scale = 2)) 4 5 6 7 8 9 10 4 5 6 7 8 9 10 6 0 6 13 6 4 2 SSC H 0 20 27 34 6 4 2 0 4 5 6 7 8 9 10 FL2 H 4 5 6 7 8 9 10