Design of microarray experiments

Size: px
Start display at page:

Download "Design of microarray experiments"

Transcription

1 Practical microarray analysis experimental design Design of microarray experiments Ulrich Mansmann Practical microarray analysis October 2003 Heidelberg Heidelberg, October 2003

2 Practical microarray analysis experimental design Experiments Scientists deal mostly with experiments of the following form: A number of alternative conditions / treatments one of which is applied to each experimental unit an observation (or several observations) then being made on each unit. The objective is: Separate out differences between the conditions / treatments from the uncontrolled variation that is assumed to be present. Take steps towards understanding the phenomena under investigation. Heidelberg, October

3 Practical microarray analysis experimental design Statistical thinking Uncertain knowledge Knowledge of the extent of uncertainty in it + = Useable knowledge Measurement model m = µ + e m measurement with error, µ - true but unknown value What is the mean of e? What is the variance of e? Is there dependence between e and µ? What is the distribution of e (and µ)? Typically but not always: e ~ N(0,σ²) Gaussian / Normal measurement model Decisions on the experimental design influence the measurement model. Heidelberg, October

4 Practical microarray analysis experimental design Main requirements for experiments Once the conditions / treatments, experimental units, and the nature of the observations have been fixed, the main requirements are: Experimental units receiving different treatments should differ in no systematic way from one another Assumptions that certain sources of variation are absent or negligible should, as far as practical, be avoided; Random errors of estimation should be suitably small, and this should be achieved with as few experimental units as possible; The conclusions of the experiment should have a wide range of validity; The experiment should be simple in design and analysis; A proper statistical analysis of the results should be possible without making artificial assumptions. Taken from Cox DR (958) Planning of experiments, Wiley & Sons, New York (page 3) Heidelberg, October

5 Information dilemma: too many or too few? # of variables of interest # of observational units Classical situation of a clinical research project: Statistical methods, principles of clinical epidemiology and principles of experimental design allow to give a confirmatory answer, if results of the study describe reality or are caused by random fluctuations. Working with micro-arrays 4

6 Information dilemma: too many or too few? # of variables of interest # of observational units The use of micro-array technology turns the classical situation upside down. There is the need for orientation how to perform microarray experiments. A new methodological consciousness is put to work: False detection rate Validation to avoid overfitting Working with micro-arrays 6

7 Biometrical practice Biological, medical framework Experimental design, clinical epidemiology Statistical methods Working with micro-arrays 7

8 Micro-array experiments Bioinformatics Biological, medical framework Technology Experimental design, clinical epidemiology Statistical methods Complex statistical methods Data collections Working with micro-arrays 8

9 Example LPS: The setting Problem: Differential reaction on LPS stimulation in peripheral blood of stroke patients and controls? Blood Gene expres. Patient Blood + LPS Gene expres. Pat Difference? Blood Gene expres. Control Blood + LPS Gene expres. Kon Sample size has to be chosen with respect to financial restrictions Peripheral blood is a special tissue, possible confounder Chosen technology: Affymetrix (22283 genes) PNAS, 00: Working with micro-arrays 5

10 Example LPS: Design - Pooling Assume a linear model for appropriately transformed gene expression: y Pat,Gen = transformed abundance + confounder effect + biol. var. + techn. var. Pat No LPS Gene expres. Pat 5 Pool LPS Gene expres. Pool Correction for confounding - if composition of pools is homogeneous over possible confounder Reduction of biological variability: σ biol No reduction of technological / array specific variability: σ tech Reduction of arrays is determined by Ψ = σ tech / σ biol. Working with micro-arrays 6

11 Example LPS: Design - Gene exclusion Array used codes for ~ 8000 genes Do we have good rules to reduce the set of interesting genes? How can we introduce a hierarchy into the gene list without manipulating the result of our analysis? Possible solutions: Bioinformatics: Integration of pathway information into the analysis Statistics: Use of genes with high inter-array variability - set cut-point Meta-genes (West et al.) - predefine # of meta-genes define cluster strategy Working with micro-arrays 7

12 Example LPS: Design - Gene exclusion Array used codes for ~ 8000 genes Do we have good rules to reduce the set of interesting genes? How can we introduce a hierarchy into the gene list without manipulating the result of our analysis? Possible solutions: Bioinformatics: Statistics: Integration of pathway information into the analysis Only possible for small problems Use of genes with high inter-array variability - set cut-point Meta-genes (West et al.) - predefine # of meta-genes define cluster strategy Working with micro-arrays 8

13 Example LPS: Design - Gene exclusion Array used codes for ~ 8000 genes Do we have good rules to reduce the set of interesting genes? How can we introduce a hierarchy into the gene list without manipulating the result of our analysis? Possible solutions: Bioinformatics: Statistics: Integration of pathway information into the analysis Only possible for small problems Use of genes with high inter-array variability - set cut-point Mostly heuristic procedures / Kropf et al. Meta-genes (West et al.) - predefine # of meta-genes define cluster strategy Working with micro-arrays 9

14 Example LPS: Design - Gene exclusion Array used codes for ~ 8000 genes Do we have good rules to reduce the set of interesting genes? How can we introduce a hierarchy into the gene list without manipulating the result of our analysis? Possible solutions: Bioinformatics: Statistics: Integration of pathway information into the analysis Only possible for small problems Use of genes with high inter-array variability - set cut-point Mostly heuristic procedures / Kropf et al. Meta-genes (West et al.) - predefine # of meta-genes define cluster strategy Not well evaluated Working with micro-arrays 20

15 Meta-genes Patient 2 Patient Working with micro-arrays 2

16 Meta-genes Patient 2 Patient Working with micro-arrays 22

17 Meta-genes Patient 2 Meta-gene 6 Meta-gene 5 Meta-gene Meta-gene 7 Meta-gene 4 Patient Meta-gene 3 Meta-gene 2 Working with micro-arrays 23

18 Meta-genes Patient 2 Meta-gene 6 Meta-gene 5 Meta-gene Meta-gene 7 Meta-gene 4 Patient Meta-gene 3 Meta-gene 2 Working with micro-arrays 24

19 Example LPS: Differential reaction DR LPS stimulated LPS stimulated FC = 3 FC = 0.25 Control Patient Differential reaction (DR): log(0.25 / 3) = log(0.25) - log(3) = DR = Pat - Kon Working with micro-arrays 29

20 Example LPS: Data Pool (5 subjects) Group Sex distribution (Male:Female) mean age Control : Control : Control 2: Patient 4: Patient 5: Patient 3: Working with micro-arrays 30

21 Example LPS: Expression Summaries Quantification of expression MSA5: Tukey bi-weight signal of PM/MM, which is log-transformed RMA: VSN: linear additive model for log(pm), Median polish to aggregate over probes arsinh - transformation for PM values Rock - Blythe model for expression Working with micro-arrays 3

22 Example LPS: Expression Summaries MAS5 RMA Kontrollpool 2 - transf. Expression Kontrollpool 2 - transf. Expression Kontrollpool - transf. Expression Kontrollpool - transf. Expression Working with micro-arrays 32

23 Example LPS: First look on the data All genes 000 meta-genes Differential reaction * * * * * * * * * ** * * * * * * * * * * * * * * * * * ** * * * * Mean diff. reaction Mean diff. response Mean diff. expression Working with micro-arrays 33

24 Example LPS: Metagenes Mean diff. reaction > meta.gene.rma.summary $metagene.64 "2067_x_at" "204270_at" "23606_s_at "29273_at" "220557_s_at" "34478_at" $metagene.80 "207425_s_at" "26234_s_at" "26629_at" $metagene.343 "200935_at" "20556_s_at" "20579_s_at" "207824_s_at "2790_s_at "24792_x_at" "27793_at" "28600_at" $metagene.352 "20625_s_at" "20627_s_at" "207387_s_at" "20692_s_at "239_s_at "22206_at" $metagene.604 "204747_at" "205569_at" "205660_at" "2063_at" "20797_s_at" "222_s_at" "27502_at" $metagene.69 "AFFX-HUMRGE/M0098_3_at" "AFFX-HUMRGE/M0098_5_at "AFFX-HUMRGE/M0098_M_at" "AFFX-r2-Hs8SrRNA-3_s_at" "AFFX-r2-Hs8SrRNA-5_at" "AFFX-r2-Hs8SrRNA-M_x_at" Mean diff. expression Working with micro-arrays 34

25 Example LPS: Metagenes - multiple testing > round(meta.gene.mult.test.rma[[]][:30,],5) rawp Bonferroni Holm Hochberg SidakSS SidakSD BH BY [,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,] [0,] [,] Working with micro-arrays 35

26 Example LPS: Predictive analysis observed t-statistics Mixture components - without DR - negative DR - positive DR t Statistik Working with micro-arrays 36

27 Example LPS: Predictive analysis observed t-statistics Mixture components - without DR - negative DR - positive DR Inference of mixture components by EMalgorithm or Gibbssampler t Statistik Working with micro-arrays 37

28 Example LPS: Predictive analysis Prob(Diff. reaction t-value) P negative = P positive = Value of t statistics Working with micro-arrays 38

29 Example LPS: Adjusted p-values Predictive analysis is closely related with frequentist test-theory: Procedure by Benjamini Hochberg (Efron, Storey, Tibshirani, 200) i-ter p-wert Index * FDR / # of genes FDR: 0. BH: red area contains in mean at most 0% false positive decisions Index Working with micro-arrays 39

30 Example LPS: Interpretation Genes with differential reaction (RMA) BH-procedure: # Genes: 22 PA with PPV>0.99: # Genes: Genes with differential reaction (MAS5) BH-procedure: # Genes: 42 PA with PPV>0.99: # Genes: 62 Set of genes common to RMA and MAS5 result: 27 Working with micro-arrays 40

31 Example LPS: Interpretation Confounder, Covariates, Population variability - Pools are unbalanced between cases and controls Sample size calculation + Setting allows with high chance to detect absolute effects above 2. cdna arrays may have resulted in a more efficient analysis. Generality +/- Few pools may not give a representative sample of the patient group of interest. Interpretability - Inhomogeneities with respect to sex and age make it difficult to interpret DR as related to the disease. Artificial assumptions - Assumption of a linear model for confounder effects allows to assume an effect measurement fully attributable to the disease. Use of cdna arrays would have automatically eliminated the confounder effects. Working with micro-arrays 4

32 Practical microarray analysis experimental design The most simple measurement model in microarray experiments Situation: m arrays (Affimetrix) from control population n arrays (Affimetrix) from population with special condition /treatment Observation of interest: Mean difference of log-transformed gene expression ( logfc) logfc obs = logfc true + e e ~ N(0, σ² [/n+/m]) In an experiment with 5 arrays per population and the same variance for the expression of a gene of interest, the above formula implies that the variance of the logfc is only 40% (/5+/5 = 2/5 = 0.4) of the variability of a single measurement taming of uncertainty. Heidelberg, October

33 Practical microarray analysis experimental design Separate out differences between the conditions / treatments from the uncontrolled variation that is assumed to be present. Is logfc true 0? How to decide? Special Decision rules: Statistical Tests When the probability model for the mechanism generating the observed data is known, hypotheses about the model can be tested. This involves the question: Could the presented data reasonable have come from the model if the hypothesis is correct? Usually a decision must be made on the basis of the available data, and some degree of uncertainty is tolerated about the correctness of that decision. These four components: data, model, hypothesis, and decision are basic to the statistical problem of hypothesis testing. Heidelberg, October

34 Practical microarray analysis experimental design Quality of decision True state of gene Decision Gene is diff. expr. Gene is not diff. expr. Gene is diff. expr. OK false positive decision happens with probability α Gene is not diff. expr. OK false negative decision happens with probability β Two sources of error: Power of a test: False positive rate α False negative rate β Ability to detect a difference if there is a true difference Power true positive rate or Power = - β Heidelberg, October

35 Practical microarray analysis experimental design Question of interest (Alterative): The Statistical test Is the gene G differentially expressed between two cell populations? Answer the question via a proof by contradiction: Show that there is no evidence to support the logical contrary of the alternative. The logical contrary of the alternative is called null hypothesis. Null hypothesis: The gene G is not differentially expressed between two cell populations of interest. A test statistic T is introduced which measures the fit of the observed data to the null hypothesis. The test statistics T implies a prob. distribution P to quantify its variability when the null hypothesis is true. It will be checked if the test statistic evaluated at the observed data t obs behaves typically (not extreme) with respect to the test distribution. The p-value is the probability under the null hypothesis of an observation which is more extreme as the observation given by the data: P( T t obs ) = p. A criteria is needed to asses extreme behaviour of the test statistic via the p value which is called the level of the test: a. The observed data does not fit to the null hypothesis if p < a or t obs > t* where t* is the -α or -α/2 quantile of the prob. distribution P. t* is also called the critical value. The conditions p < a and t obs > t* are equivalent. If p < a or t obs > t* the null hypothesis will be rejected. If p a or t obs t* the null hypothesis can not be rejected this does not mean that it is true Absence of evidence for a difference is no evidence for an absence of difference. Heidelberg, October

36 Practical microarray analysis experimental design Controlling the power sample size calculations The test should produce a significant result (level α) with a power of -β if logfc true = δ null hypothesis: logfc true = 0 0 alternative: logfc true = δ δ z -α/2 σ n,m z -β σ n,m σ 2 n,m = σ 2 (/n+/m) The above requirement is fulfilled if: δ = (z -α/2 + z -β ) σ n,m or 2 n m (z α / 2 + z β) = n + m δ 2 σ 2 Heidelberg, October

37 Practical microarray analysis experimental design Controlling the power sample size calculations 2 n m (z α / 2 + z β) = n + m δ 2 σ 2 n = N γ and m = N (-γ) with M total size of experiment and γ ]0,[ 2 (z α / 2 + z β ) N = γ ( γ) 2 δ 2 σ The size of the experiment is minimal if g = ½ Gamma Heidelberg, October

38 Practical microarray analysis experimental design Sample size calculation for a microarray experiment I Truth Test result diff. expr. (H ) not diff. expr. (H 0 ) diff. expr. D D 0 D not diff. expr. U U 0 U Number of genes on array G G 0 G α 0 = E[D 0 ]/G 0 β = E[U ]/G FDR=E[D 0 /D] E: expectation / mean number family type I error probability: α F = P[D 0 >0] family type II error probability: β F = P[U >0] Heidelberg, October 2003

39 Practical microarray analysis experimental design Sample size calculation for a microarray experiment II Independent genes Dependent Genes P[D 0 =0] = (-α 0 ) Go = -α F D 0 ~ Binomial(G 0, α 0 ) E[D 0 ] = G 0 α 0 Poissonapprox.: E[D 0 ] ~ -ln(-α F ) P[U =0] = (-β ) G = -β F E[U ] = G (-β ) Bonferroni: α 0 = α F / G 0 No direct link between the probability for D 0 and α F. -β F max{0,- G β } No direct link between the probability for U and β F. Heidelberg, October

40 Practical microarray analysis experimental design Sample size calculation for a microarray experiment III for an array with independent genes What are useful α 0 and β? α F = 0.8 E[D 0 ] = - ln(-0.8) =.6 = λ P(exactly k false pos.) = exp(-λ) λ k / (k!) false pos Prob P(at least six false positives) = unexpressed genes: α 0 =.6/32500 = expressed genes, set E[D ] = 450 -β = 450/500 = 0.9 β = 0. -β F = (-β ) G < 0-23 E[FDR] = % quantile of FDR: (calculated by simulation) Heidelberg, October

41 Practical microarray analysis experimental design Sample size calculation for a microarray experiment IV In order to complete the sample size calculation for a microarray experiment, information on σ 2 is needed. The size of the experiment, N, needed to detect a logfc true of δ on a significance level α and with power -β is: 2 2 (z α / 2 + z β ) σ N = 4 2 δ In a similar set of experiments σ 2 for a set of 20 VSN transformed arrays was between.55 and.85. One may choose the value σ 2 = 2. δ log(.5) log(2) log(3) log(5) log(0) N (σ 2 = 2) N (σ 2 = ) Sample size with α = , β = 0. Heidelberg, October

42 Practical microarray analysis experimental design Sample size formula for a one group test The test should produce a significant result (level α) with a power of -β if T = δ null hypothesis: T = 0 0 alternative: T = δ δ z -α/2 σ n z -β σ n σ 2 n = σ 2 /n The above requirement is fulfilled if: δ = (z -α/2 + z -β ) σ n or 2 2 (z α / 2 + z β ) σ n = 2 δ Heidelberg, October

43 Practical microarray analysis experimental design Measurement model for cdna arrays Gene expression under condition A intensity of red colour, Gene expression under condition B intensity of green colour Measurement: m A/B = I red,a 2 I green,b Log = γ A/B + δ + e γ A/B log-transformed true fold change of gene of condition A with respect to condition B δ - dye effect, e measurement error with E[e] = 0 and Var(e) = σ 2 Measurement m A/B is used to estimate unknown γ A/B Vertices Edges mrna samples hybridization Direction Dye assignment Green Red B A Heidelberg, October

44 Practical microarray analysis experimental design Estimation of log fold change g A/B Reference Design Dye swap design B A B A R R B A/ Estimate of γ A/B DS g = m A/R m B/R g A/ B = (m A/B m B/A )/2 Variability of estimate g ) = 2 σ 2 Var( g DS A/ B) = 0.5 σ 2 Sample Size increases proportional to the variance of the measurement! Var( R A/ B Heidelberg, October

45 Practical microarray analysis experimental design 2x2 factorial experiments I treatment / condition Wild type Mutation before treatment β β+µ after treatment β+τ β+τ+µ+ψ β - baseline effect; τ - effect of treatment; µ - effect of mutation ψ - differential effect on treatment between WT and MUT treatment effect on gene expr. in WT cells: WT = (β + τ) - β = τ treatment effect on gene expr. in MUT cells: MUT = (β + τ + µ + ψ) (β + µ)= τ + ψ differential treatment effect: MUT WT or ψ 0 How many cdna arrays are needed to show ψ 0 with significance α and power -β if ψ > ln(5)? Heidelberg, October

46 Practical microarray analysis experimental design 2x2 factorial experiments II Study the joint effect of two conditions / treatment, A and B, on the gene expression of a cell population of interest. There are four possible condition / treatment combinations: AB: treatment applied to MUT cells A: treatment applied to WT cells B: no treatment applied to MUT cells 0: no treatment applied to WT cells 0 A B AB Design with 2 slides Heidelberg, October

47 Practical microarray analysis experimental design 2x2 factorial experiments III Array m A/0 m 0/A m B/0 m 0/B m AB/0 m 0/AB m AB/A m A/AB m AB/B m B/AB m A/B m B/A Measurement γ A/0 + δ + e = τ + δ + e -γ A/0 + δ + e = -τ + δ + e γ B/0 + δ + e = µ + δ + e -γ B/0 + δ + e = -µ + δ + e γ AB/0 + δ + e = µ + τ + ψ + δ + e -γ AB/0 + δ + e = - (µ + τ + ψ) + δ + e γ AB/A + δ + e = µ + ψ + δ + e -γ AB/A + δ + e = - (µ + ψ) + δ + e γ AB/B + δ + e = µ + ψ + δ + e -γ AB/B + δ + e = - (µ + ψ) + δ + e γ A/B + δ + e = τ - µ + δ + e -γ A/B + δ + e = - (τ - µ) + δ + e Each measurement has variance σ 2 Parameter β is confounded with the dye effect Heidelberg, October

48 Practical microarray analysis experimental design Heidelberg, October Regression analysis ψ µ τ δ = M M M M M M M M M M M M E B A / A B / AB B/ B AB / AB A / A AB / AB 0/ 0 AB / B 0/ 0 B / A 0/ 0 A / AB B 0 A For parameter θ = (δ, τ, µ, ψ) define the design matrix X such that E(M) = Xθ. For each gene, compute least square estimate θ* = (X X) - X M (BLUE) Obtain measures of precision of estimated effects. Use all possibilities of the theory of linear models. Design problem: Each measurement M is made with variability σ 2. How precise can we estimate the components or contrasts of θ? Answer: Look at (X X) -

49 Practical microarray analysis experimental design 0 A 2 x 2 factorial designs IV B AB total.2.by.2.design.mat delta alpha beta psi > precision.2.by.2.rfc(x.mat) A/0 0 0 $inv.mat 0/A B/0 0 0 tau mu psi 0/B 0-0 tau AB/0 mu /AB psi AB/A 0 A/AB $effects AB/B 0 tau mu psi tau-mu B/AB B/A - 0 A/B - 0 Var(A-B) = Var(A) + Var(B) 2 Cov(A,B) Heidelberg, October

50 Practical microarray analysis experimental design Sample size for differential treatment effect (DTE) in a 2 x 2 factorial designs I Array has genes: 9500 without DTE, 500 with DTE α F = 0.9, using Bonferroni adjustment: α = 0.9/ = Mean number of correct positives is set to 450: -β = 0.9 σ 2 = 0.7, taken from similar experiments A total dye swap design (2 arrays) estimates ψ with precision σ 2 /2 = 0.35 N = [ ] / ln(5) 2 = The experiment would need in total 4 x 2 = 48 arrays Is there a chance to get the same result cheaper? Heidelberg, October

51 Practical microarray analysis experimental design 2 x 2 factorial designs V Design I Common ref. Design II Common ref. Design III Connected Design IV Connected AB AB AB AB A B A B A B A B A Design V All-pairs AB B Scaled variances of estimated effects D.I D.II D.III D.IV D.V D.tot tau mu psi # chips Heidelberg, October

52 Practical microarray analysis experimental design Sample size for differential treatment effect (DTE) in a 2 x 2 factorial designs II Is there a chance to get the same result cheaper? Using total dye swap design, the experiment would need in total 4 x 2 = 48 arrays Using Design III, the effect of interest is estimated with doubled variance (4 8) but by using a design which need only 4 arrays (2 4). This reduces the number of arrays needed from 48 to 32. Heidelberg, October

53 Practical microarray analysis experimental design Designs for time course experiments Experimental Design - Conclusions In addition to experimental constraints, design decisions should be guided by knowledge of which effects are of greater interest to the investigator. The unrealistic planning based on independent genes may be put into a more realistic framework by using simulation studies speak to your bio statistician/informatician How to collect and present experience from performed microarray experiments on which to base assumptions for planing (σ 2 )? Further reading: Kerr MK, Churchill GA (200) Experimental design for gene expression microarrays, Biostatistics, 2:83-20 Lee MLT, Whitmore GA (2002), Power and sample size for DNA microarray studies, Stat. in Med., 2: Heidelberg, October

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

Statistical issues in the analysis of microarray data

Statistical issues in the analysis of microarray data Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Cancer Biostatistics Workshop Science of Doing Science - Biostatistics

Cancer Biostatistics Workshop Science of Doing Science - Biostatistics Cancer Biostatistics Workshop Science of Doing Science - Biostatistics Yu Shyr, PhD Jan. 18, 2008 Cancer Biostatistics Center Vanderbilt-Ingram Cancer Center Yu.Shyr@vanderbilt.edu Aims Cancer Biostatistics

More information

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

The Wondrous World of fmri statistics

The Wondrous World of fmri statistics Outline The Wondrous World of fmri statistics FMRI data and Statistics course, Leiden, 11-3-2008 The General Linear Model Overview of fmri data analysis steps fmri timeseries Modeling effects of interest

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

Analysis of Illumina Gene Expression Microarray Data

Analysis of Illumina Gene Expression Microarray Data Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray

More information

Gene expression analysis. Ulf Leser and Karin Zimmermann

Gene expression analysis. Ulf Leser and Karin Zimmermann Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis Data Acquisition DNA microarrays The functional genomics pipeline Experimental design affects outcome data analysis Data acquisition microarray processing Data preprocessing scaling/normalization/filtering

More information

Analysis of gene expression data. Ulf Leser and Philippe Thomas

Analysis of gene expression data. Ulf Leser and Philippe Thomas Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Study Design and Statistical Analysis

Study Design and Statistical Analysis Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Error Type, Power, Assumptions. Parametric Tests. Parametric vs. Nonparametric Tests

Error Type, Power, Assumptions. Parametric Tests. Parametric vs. Nonparametric Tests Error Type, Power, Assumptions Parametric vs. Nonparametric tests Type-I & -II Error Power Revisited Meeting the Normality Assumption - Outliers, Winsorizing, Trimming - Data Transformation 1 Parametric

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012. Abstract. Review session.

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012. Abstract. Review session. June 23, 2012 1 review session Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012 Review session. Abstract Quantitative methods in business Accounting

More information

Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

More information

Randomized Block Analysis of Variance

Randomized Block Analysis of Variance Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

ANOVA ANOVA. Two-Way ANOVA. One-Way ANOVA. When to use ANOVA ANOVA. Analysis of Variance. Chapter 16. A procedure for comparing more than two groups

ANOVA ANOVA. Two-Way ANOVA. One-Way ANOVA. When to use ANOVA ANOVA. Analysis of Variance. Chapter 16. A procedure for comparing more than two groups ANOVA ANOVA Analysis of Variance Chapter 6 A procedure for comparing more than two groups independent variable: smoking status non-smoking one pack a day > two packs a day dependent variable: number of

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Statistical analysis of modern sequencing data quality control, modelling and interpretation

Statistical analysis of modern sequencing data quality control, modelling and interpretation Statistical analysis of modern sequencing data quality control, modelling and interpretation Jörg Rahnenführer Technische Universität Dortmund, Fakultät Statistik Email: rahnenfuehrer@statistik.tu-.de

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING: POWER OF THE TEST HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,

More information

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine 2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

More information

Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients

Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients by Li Liu A practicum report submitted to the Department of Public Health Sciences in conformity with

More information

"Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals." 1

Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals. 1 BASIC STATISTICAL THEORY / 3 CHAPTER ONE BASIC STATISTICAL THEORY "Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals." 1 Medicine

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

individualdifferences

individualdifferences 1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

More information

Redwood Building, Room T204, Stanford University School of Medicine, Stanford, CA 94305-5405.

Redwood Building, Room T204, Stanford University School of Medicine, Stanford, CA 94305-5405. W hittemoretxt050806.tex A Bayesian False Discovery Rate for Multiple Testing Alice S. Whittemore Department of Health Research and Policy Stanford University School of Medicine Correspondence Address:

More information

Concepts of Experimental Design

Concepts of Experimental Design Design Institute for Six Sigma A SAS White Paper Table of Contents Introduction...1 Basic Concepts... 1 Designing an Experiment... 2 Write Down Research Problem and Questions... 2 Define Population...

More information

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table ANALYSIS OF DISCRT VARIABLS / 5 CHAPTR FIV ANALYSIS OF DISCRT VARIABLS Discrete variables are those which can only assume certain fixed values. xamples include outcome variables with results such as live

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis

Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis Hulin Wu, PhD, Professor (with Dr. Shuang Wu) Department of Biostatistics &

More information

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics Analysis of Data Claudia J. Stanny PSY 67 Research Design Organizing Data Files in SPSS All data for one subject entered on the same line Identification data Between-subjects manipulations: variable to

More information

Introduction to SAGEnhaft

Introduction to SAGEnhaft Introduction to SAGEnhaft Tim Beissbarth October 13, 2015 1 Overview Serial Analysis of Gene Expression (SAGE) is a gene expression profiling technique that estimates the abundance of thousands of gene

More information

Introduction To Real Time Quantitative PCR (qpcr)

Introduction To Real Time Quantitative PCR (qpcr) Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

Measuring gene expression (Microarrays) Ulf Leser

Measuring gene expression (Microarrays) Ulf Leser Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Tests for Two Proportions

Tests for Two Proportions Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

More information

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistics in Medicine Research Lecture Series CSMC Fall 2014 Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Testing for Granger causality between stock prices and economic growth

Testing for Granger causality between stock prices and economic growth MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7. THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

More information

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST UNDERSTANDING THE DEPENDENT-SAMPLES t TEST A dependent-samples t test (a.k.a. matched or paired-samples, matched-pairs, samples, or subjects, simple repeated-measures or within-groups, or correlated groups)

More information

Guide to Biostatistics

Guide to Biostatistics MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Obesity in America: A Growing Trend

Obesity in America: A Growing Trend Obesity in America: A Growing Trend David Todd P e n n s y l v a n i a S t a t e U n i v e r s i t y Utilizing Geographic Information Systems (GIS) to explore obesity in America, this study aims to determine

More information

Real-time PCR: Understanding C t

Real-time PCR: Understanding C t APPLICATION NOTE Real-Time PCR Real-time PCR: Understanding C t Real-time PCR, also called quantitative PCR or qpcr, can provide a simple and elegant method for determining the amount of a target sequence

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

More information

Microarray Data Analysis. A step by step analysis using BRB-Array Tools

Microarray Data Analysis. A step by step analysis using BRB-Array Tools Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.

More information

Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

More information

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Philip J. Ramsey, Ph.D., Mia L. Stephens, MS, Marie Gaudard, Ph.D. North Haven Group, http://www.northhavengroup.com/

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Table 1" Cancer Analysis No Yes Diff. S EE p-value OLS 62.8 61.3 1.5 0.6 0.013. Design- Based 63.6 62.7 0.9 0.9 0.29

Table 1 Cancer Analysis No Yes Diff. S EE p-value OLS 62.8 61.3 1.5 0.6 0.013. Design- Based 63.6 62.7 0.9 0.9 0.29 Epidemiologic Studies Utilizing Surveys: Accounting for the Sampling Design Edward L. Korn, Barry I. Graubard Edward L. Korn, Biometric Research Branch, National Cancer Institute, EPN-739, Bethesda, MD

More information

Random effects and nested models with SAS

Random effects and nested models with SAS Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/

More information

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

REAL TIME PCR USING SYBR GREEN

REAL TIME PCR USING SYBR GREEN REAL TIME PCR USING SYBR GREEN 1 THE PROBLEM NEED TO QUANTITATE DIFFERENCES IN mrna EXPRESSION SMALL AMOUNTS OF mrna LASER CAPTURE SMALL AMOUNTS OF TISSUE PRIMARY CELLS PRECIOUS REAGENTS 2 THE PROBLEM

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Inclusion and Exclusion Criteria

Inclusion and Exclusion Criteria Inclusion and Exclusion Criteria Inclusion criteria = attributes of subjects that are essential for their selection to participate. Inclusion criteria function remove the influence of specific confounding

More information

Package ERP. December 14, 2015

Package ERP. December 14, 2015 Type Package Package ERP December 14, 2015 Title Significance Analysis of Event-Related Potentials Data Version 1.1 Date 2015-12-11 Author David Causeur (Agrocampus, Rennes, France) and Ching-Fan Sheu

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

People have thought about, and defined, probability in different ways. important to note the consequences of the definition: PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

Classification and Regression by randomforest

Classification and Regression by randomforest Vol. 2/3, December 02 18 Classification and Regression by randomforest Andy Liaw and Matthew Wiener Introduction Recently there has been a lot of interest in ensemble learning methods that generate many

More information