sensR - Part 2: Similarity testing and replicated data (and sensR)




Similarity testing and replicated data (and sensR)
Per Bruun Brockhoff, Professor, Statistics, DTU, Copenhagen
August 17 2015
http://www.staff.dtu.dk/perbb

sensR - Part 2:
1. Analysing similarity test data
2. Planning similarity tests (power and sample size for similarity testing)
3. Analysing replicated difference test data

The three levels of interpretation, Figure 7.1 (book):
Level 0: Sensory scale; the intensity difference between A and B (d-prime)
Level 1: Proportion of correct answers in a test of A vs B (pc)
Level 2: Proportion of discriminators in the population (pd)
The two proportions are linked through the guessing probability:
pc = 1/2 + (1/2) pd when pGuess = 1/2, and
pc = 1/3 + (2/3) pd for the triangle test (pGuess = 1/3).
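The level 1 / level 2 link above is a simple linear rescaling by the guessing probability. A minimal sketch in plain Python (a stand-in for sensR's rescale; the function names are illustrative):

```python
# Link between the proportion of correct answers (pc, level 1) and the
# proportion of discriminators (pd, level 2), given the guessing
# probability of the protocol (1/3 for the triangle test).

def pd_to_pc(pd, p_guess):
    """pc = p_guess + (1 - p_guess) * pd."""
    return p_guess + (1.0 - p_guess) * pd

def pc_to_pd(pc, p_guess):
    """Inverse mapping; a pc below chance level is truncated to pd = 0."""
    return max(0.0, (pc - p_guess) / (1.0 - p_guess))

# Triangle test: pd = 0.25 corresponds to pc = 1/3 + (2/3)*0.25 = 0.5
pc = pd_to_pc(0.25, 1/3)
pd = pc_to_pd(pc, 1/3)
```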

Analysing similarity test data

1. Aim: proof that products are (sufficiently) similar!
2. Traditionally, the "power approach":
   1. Claim similarity if NOT different.
   2. That is, acceptance of the difference test hypothesis of no difference.
3. Better: equivalence tests and/or confidence limits.

Equivalence tests interchange the roles of the hypotheses:
H0: Products are NOT similar (pd >= pd0)
H1: Products are similar (pd < pd0)
Example with a specified level of similarity: pd0 = 0.25.

For such one-tailed situations this is equivalent to:
1. Specify the wanted degree of similarity.
2. Claim similarity (with 95% confidence) IF the upper limit of the
   two-tailed 90% confidence interval is within this specification.

Doing the equivalence test: does 40 out of 100 in a triangle test prove
pd-equivalence at pd0 = 0.25?

discrim(40, 100, pd0 = 0.25, conf.level = 0.90, method = "triangle",
        test = "similarity")

This can be used on level 0, level 1 or level 2, BUT discrim requires
the similarity limit pd0 as input!
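Under the hood, the exact version of this similarity test is a lower-tail binomial test at the pc implied by pd0. A plain-Python sketch (a stand-in for sensR's discrim, assuming the exact binomial statistic; helper names are illustrative):

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def similarity_pvalue(correct, total, pd0, p_guess=1/3):
    """One-sided p-value for H0: pd >= pd0 vs H1: pd < pd0.
    Small counts of correct answers are evidence FOR similarity."""
    pc0 = p_guess + (1 - p_guess) * pd0   # pc at the H0 boundary
    return binom_cdf(correct, total, pc0)

# 40 correct out of 100 triangle tests, similarity limit pd0 = 0.25,
# so pc0 = 1/3 + (2/3)*0.25 = 0.5
p = similarity_pvalue(40, 100, pd0=0.25)
# p < 0.05, so pd-equivalence at 0.25 can be claimed
```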

Similarity and difference tests

The SAME data may provide non-significant results for BOTH the
difference and the similarity test:

discrim(40, 100, pd0 = 0.2, conf.level = 0.90, method = "triangle",
        test = "similarity")
discrim(40, 100, conf.level = 0.90, method = "triangle")

And the SAME data may provide significant results for BOTH the
difference and the similarity test:

discrim(80, 200, pd0 = 0.25, conf.level = 0.90, method = "triangle",
        test = "similarity")
discrim(80, 200, conf.level = 0.90, method = "triangle")

Doing the equivalence test on the d-prime scale: does 35 out of 100 in
a 3-AFC test prove a 0.5 d-prime equivalence?

mypd0 <- rescale(d.prime = 0.5, method = "threeAFC")$pd
discrim(35, 100, pd0 = mypd0, conf.level = 0.90, method = "threeAFC",
        test = "similarity")
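Both triangle claims can be checked with a plain-Python stand-in for the two one-sided exact binomial tests (illustrative helper names, not sensR's API):

```python
from math import comb

def binom_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def difference_pvalue(correct, total, p_guess=1/3):
    """H0: pd = 0; large counts are evidence of a difference."""
    return 1.0 - binom_cdf(correct - 1, total, p_guess)

def similarity_pvalue(correct, total, pd0, p_guess=1/3):
    """H0: pd >= pd0; small counts are evidence of similarity."""
    pc0 = p_guess + (1 - p_guess) * pd0
    return binom_cdf(correct, total, pc0)

# 40/100: neither test is significant at the 5% level ...
p_diff_1 = difference_pvalue(40, 100)            # > 0.05
p_sim_1 = similarity_pvalue(40, 100, pd0=0.2)    # > 0.05
# ... while 80/200 is significant in BOTH tests
p_diff_2 = difference_pvalue(80, 200)            # < 0.05
p_sim_2 = similarity_pvalue(80, 200, pd0=0.25)   # < 0.05
```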

Power of the similarity test

The ISO triangle standard does (a version of) the following:

discrimPwr(pdA = 0, pd0 = 0.25, sample.size = 100, alpha = 0.05,
           pGuess = 1/3, test = "similarity")

So it assumes the true pd to be zero (pdA = 0): the best-case scenario only!

But what if pdA > 0, e.g. pdA = 0.2? (The ISO triangle standard ignores
this.) sensR does the job:

discrimPwr(pdA = 0.2, pd0 = 0.25, sample.size = 100, alpha = 0.05,
           pGuess = 1/3, test = "similarity")

Explanation of the power computation: the critical value of the test
(the upper limit on the number of correct answers still able to prove
similarity) can be found as:

findcr(sample.size = 100, alpha = 0.05, p0 = 1/3, pd0 = 0.25,
       test = "similarity")
# Power (for pdA = 0, i.e. pc = 1/3):
pbinom(41, 100, 1/3)

What if similarity is defined on the d-prime scale, e.g. d.prime0 = 0.5?

d.primePwr(d.primeA = 0, d.prime0 = 0.5, sample.size = 100,
           alpha = 0.05, method = "triangle", test = "similarity")
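The logic behind the findcr call and the pbinom power line can be sketched in plain Python (exact binomial; a stand-in, not sensR code): the critical value is the largest count whose lower tail under the similarity limit stays at or below alpha, and power is the chance of landing at or below it under the true pd.

```python
from math import comb

def binom_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def find_critical(n, pd0, alpha=0.05, p_guess=1/3):
    """Largest count c with P(X <= c | pc0) <= alpha, where pc0 is the
    pc implied by the similarity limit pd0."""
    pc0 = p_guess + (1 - p_guess) * pd0
    c = -1
    while binom_cdf(c + 1, n, pc0) <= alpha:
        c += 1
    return c

def similarity_power(n, pd0, pdA, alpha=0.05, p_guess=1/3):
    """Power = P(X <= critical value) under the true pd = pdA."""
    c = find_critical(n, pd0, alpha, p_guess)
    pcA = p_guess + (1 - p_guess) * pdA
    return binom_cdf(c, n, pcA)

crit = find_critical(100, pd0=0.25)             # 41, as in the slide
power = similarity_power(100, pd0=0.25, pdA=0)  # best-case power
```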

Sample size for the similarity test

# First on the pd-scale with pd0 = 0.25, best case (pdA = 0):
discrimSS(pdA = 0, pd0 = 0.25, target.power = 0.9, alpha = 0.05,
          pGuess = 1/3, test = "similarity")

# Still on the pd-scale with pd0 = 0.25, "bad" case (pdA = 0.2):
discrimSS(pdA = 0.2, pd0 = 0.25, target.power = 0.9, alpha = 0.05,
          pGuess = 1/3, test = "similarity")

# Then on the d.prime-scale with d.prime0 = 0.5, best case:
d.primeSS(d.primeA = 0, d.prime0 = 0.5, target.power = 0.9,
          alpha = 0.05, method = "triangle", test = "similarity")

# Then on the d.prime-scale with d.prime0 = 0.5, "bad" case:
d.primeSS(d.primeA = 0.4, d.prime0 = 0.5, target.power = 0.9,
          alpha = 0.05, method = "triangle", test = "similarity")

These calculations also indicate the smallest reasonable equivalence
margin (what is at all possible?).
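Sample-size determination is then just a search over n for the target power. A plain-Python sketch for the best case (pdA = 0), assuming the exact binomial test (a stand-in for discrimSS, not sensR code):

```python
from math import comb

def binom_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def similarity_power(n, pd0, pdA, alpha=0.05, p_guess=1/3):
    pc0 = p_guess + (1 - p_guess) * pd0
    c = -1
    while binom_cdf(c + 1, n, pc0) <= alpha:   # largest rejection count
        c += 1
    pcA = p_guess + (1 - p_guess) * pdA
    return binom_cdf(c, n, pcA)

def similarity_sample_size(pd0, pdA, target_power=0.9, alpha=0.05,
                           p_guess=1/3, n_max=10000):
    """Smallest n whose exact power reaches the target."""
    for n in range(5, n_max):
        if similarity_power(n, pd0, pdA, alpha, p_guess) >= target_power:
            return n
    return None

n_best = similarity_sample_size(pd0=0.25, pdA=0)   # best case
```

Note that the exact binomial power is not monotone in n (a sawtooth), which is why the search simply walks upward instead of bisecting.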

Replicated difference test situations

n individuals each performed k tests:
- Individuals may be different; there may be heterogeneity.
- The n*k observations are not independent.
- Actually, the naive (pooled) difference hypothesis test is NOT wrong,
  BUT it is NOT enough for extracting the complete information.

Replicated triangle tests. Example, n = 15, k = 12 correct answers:
2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 6, 10, 11

Naive analysis: discrim(70, 180, method = "triangle")
- The confidence limits are generally NOT OK.
- The test for detecting a product difference may NOT be the strongest
  possible!

How do replicated triangle tests come out? Simulation:

# First a case where individuals are "clones" - they are NOT really
# different, AND there is NO product difference:
discrimSim(20, replicates = 12, d.prime = 0, method = "triangle",
           sd.indiv = 0)

# Next a case where individuals are "clones" - they are NOT really
# different, but there IS a product difference, d.prime = 2:
discrimSim(20, replicates = 12, d.prime = 2, method = "triangle",
           sd.indiv = 0)

# Finally a case where individuals ARE really different (sd.indiv = 2),
# AND there IS a product difference, d.prime = 2:
discrimSim(20, replicates = 12, d.prime = 2, method = "triangle",
           sd.indiv = 2)

TWO aspects are now in play:
1. The average level of difference.
2. The variability of the individual differences:
   usual (binomial) variability vs EXCESS variability ("over-dispersion").

Different possible approaches cope with this (all with random individuals).
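A stripped-down version of what discrimSim illustrates: when assessors differ, the counts show beta-binomial over-dispersion relative to ordinary binomial variability. A small plain-Python simulation (the Beta parameters and sizes are illustrative assumptions, not sensR's parameterisation):

```python
import random

random.seed(2015)

def simulate_counts(n_indiv, k, mean_pc, var_between=0.0):
    """Correct-answer counts for n_indiv assessors doing k tests each.
    var_between > 0 gives each assessor their own pc drawn from a Beta
    distribution (a beta-binomial model)."""
    counts = []
    for _ in range(n_indiv):
        if var_between > 0:
            # Beta with the given mean and variance (method of moments)
            m, v = mean_pc, var_between
            s = m * (1 - m) / v - 1
            p = random.betavariate(m * s, (1 - m) * s)
        else:
            p = mean_pc
        counts.append(sum(random.random() < p for _ in range(k)))
    return counts

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

same = simulate_counts(2000, 12, mean_pc=0.5)                    # clones
hetero = simulate_counts(2000, 12, mean_pc=0.5, var_between=0.04)
# variance(same) is close to the binomial value k*p*(1-p) = 3, while
# variance(hetero) is clearly inflated: over-dispersion.
```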

Replicated triangle difference tests in light of the three overall
analysis levels (Figure 7.4):

Level 0 (discrimSim): Sensory scale; random varying d-primes,
  delta_i ~ N(delta, sigma^2_Assessor)
Level 1 (Beta-binomial): Proportion of correct answers in a test of
  A vs B; random varying actual proportions, pc_i ~ beta(a, b)
Level 2 (Corrected beta-binomial): Chance-corrected proportions;
  random varying pd proportions, pd_i ~ beta(a, b)

Table 7.4 analysis (corrected beta-binomial):

> summary(betabin(x2, method = "triangle"))

Chance-corrected beta-binomial model for the triangle protocol
with 95 percent confidence intervals

          Estimate Std. Error     Lower     Upper
mu      0.09779917 0.06599972 0.0000000 0.2271562
gamma   0.62516874 0.20607101 0.2212770 1.0000000
pc      0.39853278 0.04399981 0.3333333 0.4847708
pd      0.09779917 0.06599972 0.0000000 0.2271562
d-prime 0.86877203 0.31203861 0.0000000 1.3855903

log-likelihood: -30.98173
LR-test of over-dispersion, G^2: 13.05338 df: 1 p-value: 0.0003027369
LR-test of association, G^2: 15.49198 df: 2 p-value: 0.0004324741

This output gives:
- Proper confidence intervals!
- An excess-variability (over-dispersion) measure AND test!
- A JOINT product difference test (the LR-test of association):
  H0: no mean effect AND no excess variability
  HA: the products are different
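As a sanity check on the betabin output above, the pc and pd rows (and their confidence limits) are tied together by the triangle guessing probability, pc = 1/3 + (2/3) pd; plain Python:

```python
# Verify the triangle-test link pc = 1/3 + (2/3) * pd for the
# estimates and upper confidence limits reported by betabin.

def pd_to_pc(pd, p_guess=1/3):
    return p_guess + (1 - p_guess) * pd

mu = 0.09779917          # estimated mean pd (chance-corrected scale)
pc = 0.39853278          # estimated pc from the same output
upper_pd = 0.2271562
upper_pc = 0.4847708

assert abs(pd_to_pc(mu) - pc) < 1e-7
assert abs(pd_to_pc(upper_pd) - upper_pc) < 1e-6
assert abs(pd_to_pc(0.0) - 0.3333333) < 1e-6   # lower limits: pd=0, pc=1/3
```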