Proficiency Testing with FAPAS. Understanding PT Statistics Ken Mathieson Senior Proficiency Analyst


Statistics?!

Where do you want to go?
- If you don't know where you are going, you won't get there!
- The International Harmonised Protocol for the Proficiency Testing of Analytical Chemistry Laboratories [1]: the point of a PT is to give a performance assessment to its participants.
- Against what standard will such a performance assessment be made?

Fitness-for-Purpose
- Fitness-for-purpose is at the heart of the statistical model used by FAPAS.
- Definition: a simple expression that takes lots of words, much arm waving and at least one diagram to explain but, once grasped, is a very succinct way of conveying the concept.

Making Data Usable
- All analyses are variable; you never get the same answer twice.
- The end use of the data should dictate the limits of acceptable variability.
- In simple terms, the more effort you put into the analysis, the lower the uncertainty of the final answer.
- On the other hand, the greater the effort, the greater the cost (time = money!).

Fitness-for-Purpose Illustrated
[Figure: axes labelled Uncertainty, Time and Effort Expended]

Fitness-for-Purpose Quantified
- Fitness-for-purpose represents reasonable uncertainty: a tolerance on a result that is small enough to make the data meaningful and useful.
- To put it another way, it represents a range / spread of possible results.
- Statistically, this is a standard deviation, σ_p, the standard deviation for proficiency assessment.

Std. Deviation for Proficiency Assessment
- FAPAS sets σ_p using data external to the observed performance, i.e. σ_p is prescriptive, NOT descriptive.
- Not all PT schemes do this.
[Figure: two panels with axes spanning 898 to 902 and 850 to 950]

Sources of σ_p
- The inter-laboratory variance (reproducibility) from a method validation study, which is an indicator of best practice: sd_R, RSD_R, R
- Predictive models, e.g. the modified Horwitz equation [2]
- Expert judgement of what is fit-for-purpose

Original Horwitz Equation
$\sigma = 0.02\,c^{0.8495}$
[Figure: Horwitz (original), RSD (%) against concentration from 1 ppb upwards]

Modification Below 120 ppb
$\sigma = 0.22\,c$
[Figure: RSD (%) against concentration from 1 ppb to 200 ppb, comparing Horwitz (original) with Horwitz (<120 ppb)]

Modification Above 13.8 %
$\sigma = 0.01\,c^{0.5}$
[Figure: RSD (%) against concentration from 1 % to 50 %, comparing Horwitz (original) with Horwitz (>13.8 %)]
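The three Horwitz slides can be read as one piecewise function. The sketch below is illustrative only: it assumes c is expressed as a dimensionless mass fraction (1 ppm = 1e-6, 1 % = 0.01), and the function names are hypothetical rather than anything FAPAS publishes.

```python
def horwitz_sigma(c: float) -> float:
    """Predicted standard deviation (as a mass fraction) at concentration c.

    Assumption: c is a dimensionless mass fraction, e.g. 1 ppm = 1e-6.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be a mass fraction in (0, 1]")
    if c < 1.2e-7:              # below 120 ppb
        return 0.22 * c
    if c <= 0.138:              # 120 ppb up to 13.8 %: original Horwitz
        return 0.02 * c ** 0.8495
    return 0.01 * c ** 0.5      # above 13.8 %


def horwitz_rsd_percent(c: float) -> float:
    """Relative standard deviation in percent."""
    return 100.0 * horwitz_sigma(c) / c


for label, c in [("10 ppb", 1e-8), ("1 ppm", 1e-6), ("50 %", 0.5)]:
    print(f"{label:>7}: RSD = {horwitz_rsd_percent(c):.1f} %")
```

With these inputs the RSD comes out at about 22 %, 16 % and 1.4 % respectively, which matches the flat lower plateau and the reduced upper branch shown in the plots.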

Are You Sitting Comfortably? Then I'll begin

Homogeneity Testing
- All test materials are heterogeneous.
- What we want is sufficient homogeneity.
- In other words, the differences between individual test portions must not be large enough to materially affect the outcome of the PT.
- Otherwise the results would simply reflect the differing levels in the test portions and not the accuracy of the participating lab.

Testing for sufficient homogeneity - 1
- The statistical test is based around ANOVA, typically using the results from 10 samples analysed in duplicate.
- For the statistics to be meaningful, the results have to be obtained under specified conditions (random selection, random analytical order, same time, etc.), i.e. under repeatability conditions.

Testing for sufficient homogeneity - 2
- It is NOT just a comparison of the variance within and between pairs.
- Experience has shown that this type of simple test is limited: over-sensitive when repeatability is very good, under-sensitive when repeatability is poor.

Testing for sufficient homogeneity - 3
[Figure: two panels plotting duplicate results (rep 1, rep 2) against sample number]

Testing for sufficient homogeneity - 4
- FAPAS uses a more sophisticated protocol: Fearn and Thompson [3].
- It seeks to reject material that displays heterogeneity above a set limit, with the limit derived from fitness-for-purpose.
- A fully worked example is given in Prof. Thompson's published paper.

Testing for sufficient homogeneity - 5
- Assume homogeneity: this is reasonable, as lots of time and effort goes into making test materials.
- Scrutinise the results for any obvious problems, e.g. trends, possible outliers.
- Use Cochran's test to formally check the variance of the worst pair.
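For duplicate results, Cochran's statistic reduces to the largest squared difference divided by the sum of all squared differences. The sketch below computes only the statistic; the function name is hypothetical, and the critical value is left to a published table (for 10 pairs at 95 % confidence the commonly tabulated value is 0.602, which should be checked against the table in use).

```python
def cochran_statistic(pairs):
    """Cochran's C for duplicate results: max(D^2) / sum(D^2).

    pairs is a sequence of (rep1, rep2) tuples, one per sample.
    """
    squared_diffs = [(a - b) ** 2 for a, b in pairs]
    total = sum(squared_diffs)
    if total == 0:
        return 0.0          # all duplicates identical, no worst pair
    return max(squared_diffs) / total


# Hypothetical data: nine well-behaved pairs and one discordant pair.
pairs = [(10.0, 10.1)] * 9 + [(10.0, 11.0)]
print(f"C = {cochran_statistic(pairs):.3f}")
```

A statistic above the tabulated critical value flags the worst pair as having an unusually large variance, so it can be examined before the ANOVA is run.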

Testing for sufficient homogeneity - 6
[Figure: two panels plotting duplicate results (rep 1, rep 2) against sample number]

Testing for sufficient homogeneity - 7
- Carry out ANOVA to obtain the analytical variance, $s_{an}^2$, and the sampling variance, $s_{sam}^2$.
- Calculate the allowable sampling variance, $\sigma_{all}^2 = (0.3\,\sigma_p)^2$.
- Calculate the critical value, $c = F_1 \sigma_{all}^2 + F_2 s_{an}^2$ ($F_1$ and $F_2$ from a given table).
- If $s_{sam}^2 > c$, the test indicates a lack of sufficient homogeneity.
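The calculation on this slide can be sketched for duplicate data as follows. This is a minimal illustration, not the FAPAS implementation: the function name is hypothetical, the variances are obtained from the sums and differences of each pair (equivalent to one-way ANOVA for duplicates), and the default F1 = 1.88 and F2 = 1.01 are the values usually tabulated for 10 samples, an assumption to verify against the published table.

```python
from statistics import variance


def sufficient_homogeneity(pairs, sigma_p, f1=1.88, f2=1.01):
    """Test for sufficient homogeneity on duplicate results.

    pairs   : sequence of (rep1, rep2), one tuple per sample
    sigma_p : standard deviation for proficiency assessment
    f1, f2  : table factors (defaults assume 10 samples)
    """
    m = len(pairs)
    if m < 2:
        raise ValueError("need at least two samples")
    diffs = [a - b for a, b in pairs]
    sums = [a + b for a, b in pairs]

    # Analytical (within-sample) variance from the duplicate differences.
    s_an2 = sum(d * d for d in diffs) / (2 * m)
    # Between-sample mean square from the variance of the pair sums.
    ms_between = variance(sums) / 2
    # Sampling variance; negative estimates are set to zero.
    s_sam2 = max(0.0, (ms_between - s_an2) / 2)

    sigma_all2 = (0.3 * sigma_p) ** 2
    critical = f1 * sigma_all2 + f2 * s_an2
    return {
        "s_an2": s_an2,
        "s_sam2": s_sam2,
        "critical": critical,
        "sufficient": s_sam2 <= critical,
    }


# Hypothetical data: 10 samples whose duplicates agree but whose levels differ.
levels = [8.0, 8.0, 8.0, 10.0, 10.0, 10.0, 10.0, 12.0, 12.0, 12.0]
pairs = [(v, v) for v in levels]
print(sufficient_homogeneity(pairs, sigma_p=1.0)["sufficient"])
print(sufficient_homogeneity(pairs, sigma_p=10.0)["sufficient"])
```

The same material fails with a tight σ_p and passes with a generous one, which is the point of tying the limit to fitness-for-purpose rather than to the analytical precision alone.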

Testing for sufficient homogeneity - 8
[Figure: duplicate results (rep 1, rep 2) plotted against sample number]

The Assigned Value
- Note: the assigned value, not the true value. The true value is an ideal we'll never know.
- The use of the word "assigned" indicates that we are setting the value.
- The assigned value should be the best estimate of the true value.

Deriving the Assigned Value
- FAPAS usually derives the assigned value from the consensus of submitted results; other options are a certified reference value or a formulation value.
- The consensus uses the most appropriate measure of central tendency: robust mean [4, 5], median, or mode [6], but not necessarily in that order.

Simple vs Robust Mean
Descriptive Statistics, variable: ass. value
Anderson-Darling Normality Test: A-Squared 2.450, P-Value 0.000
N: 61
Mean: 8.11113
Robust Mean: 7.82879
StDev: 3.06012
Variance: 9.36431
Skewness: 2.08479
Kurtosis: 7.53429
Minimum: 3.4700
1st Quartile: 6.5900
Median: 7.8100
3rd Quartile: 9.0000
Maximum: 21.0000
95% Confidence Interval for Mu: 7.3274 to 8.8949
95% Confidence Interval for Sigma: 2.5972 to 3.7255
95% Confidence Interval for Median: 7.4405 to 8.2757
[Figure: distribution of results, axis 3 to 19; 95% confidence intervals for Mu and Median, axis 7.5 to 9.0]
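One common way to obtain a robust mean is the iterative winsorisation described as Algorithm A in ISO 13528, Annex C [5]. The sketch below follows that scheme under the assumption that this is the kind of estimator behind the "Robust Mean" figure; the slides do not say which variant FAPAS uses, and the function name and data are hypothetical.

```python
from statistics import mean, median, stdev


def robust_mean_sd(values, tol=1e-6, max_iter=100):
    """Robust mean and standard deviation by iterative winsorisation."""
    x = [float(v) for v in values]
    if len(x) < 2:
        raise ValueError("need at least two results")

    # Starting values: median and scaled median absolute deviation.
    x_star = median(x)
    s_star = 1.483 * median([abs(v - x_star) for v in x])

    for _ in range(max_iter):
        if s_star == 0.0:
            break                       # no spread left to work with
        delta = 1.5 * s_star
        lo, hi = x_star - delta, x_star + delta
        # Pull results outside [lo, hi] back to the nearest limit.
        w = [min(max(v, lo), hi) for v in x]
        new_x, new_s = mean(w), 1.134 * stdev(w)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


# Hypothetical results with one gross outlier.
data = [9.8, 9.9, 10.0, 10.1, 10.2, 50.0]
rm, rs = robust_mean_sd(data)
print(f"simple mean = {mean(data):.2f}, robust mean = {rm:.2f}")
```

The outlier is down-weighted rather than rejected, so the robust mean stays close to the bulk of the data while the simple mean is dragged upwards, the same pattern as 7.83 against 8.11 in the summary above.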

Limitations of a Robust Mean / Median
Descriptive Statistics, variable: afm1
Anderson-Darling Normality Test: A-Squared 5.339, P-Value 0.000
N: 46
Mean: 0.248691
Robust Mean: 0.184686
StDev: 0.314717
Variance: 9.90E-02
Skewness: 2.78245
Kurtosis: 8.74819
Minimum: 0.01700
1st Quartile: 0.08100
Median: 0.11900
3rd Quartile: 0.29500
Maximum: 1.60000
95% Confidence Interval for Mu: 0.15523 to 0.34215
95% Confidence Interval for Sigma: 0.26104 to 0.39639
95% Confidence Interval for Median: 0.08998 to 0.17190
[Figure: distribution of results, axis 0.0 to 1.6; 95% confidence intervals for Mu and Median, axis 0.1 to 0.3]

Bump-hunting
mode = 0.087012
[Figure: adaptive kernel density plot for afm1; density (0 to 6) against analytical result (0.0 to 1.0)]
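The idea of locating modes on a density plot can be sketched with a plain fixed-bandwidth Gaussian kernel. Note that this is a simplification: the slide uses an adaptive kernel density estimate [6], whereas the hypothetical function below keeps one bandwidth for all points and leaves its choice to the caller.

```python
import math


def kde_modes(values, bandwidth, grid_points=512):
    """Return the local maxima of a Gaussian kernel density, highest first."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    x = sorted(float(v) for v in values)
    lo, hi = x[0] - 3 * bandwidth, x[-1] + 3 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]

    norm = 1.0 / (len(x) * bandwidth * math.sqrt(2 * math.pi))
    dens = [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in x)
        for g in grid
    ]

    # A grid point is a mode if it rises from the left and does not rise to the right.
    peaks = [
        (dens[i], grid[i])
        for i in range(1, grid_points - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1]
    ]
    peaks.sort(reverse=True)
    return [g for _, g in peaks]


# Hypothetical results: a main cluster near 0.10 and a minor one near 0.30.
data = [0.08, 0.09, 0.10, 0.10, 0.11, 0.12, 0.30, 0.31]
modes = kde_modes(data, bandwidth=0.02)
print([round(m, 3) for m in modes])
```

The first entry is the major mode, a candidate assigned value when the mean and median are pulled by a long tail; further entries are minor bumps that may point to a different method or a subgroup of laboratories.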

Identifying Poor Methodology
[Figure: adaptive kernel density plot for chloride; density (0.000 to 0.005) against analytical result (500 to 1500)]

Poor or Just Different Performance?
[Figure: adaptive kernel density plot for peanut prote; density (0.0 to 0.2) against analytical result (0 to 20)]

z-scores (at last!)
- This is a score that compares a participant's result to the assigned value (the best estimate of the true value): x - X
- It then standardises the difference against a measure of acceptable analytical variation: (x - X)/sd

More formally
$z = \dfrac{x - \hat{X}}{\sigma_p}$
where:
$x$ = participant's result
$\hat{X}$ = the assigned value
$\sigma_p$ = std dev for proficiency assessment
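As a minimal sketch, the formula translates directly into code. The function name and the numbers in the example are hypothetical.

```python
def z_score(x: float, assigned: float, sigma_p: float) -> float:
    """z = (x - assigned value) / sigma_p."""
    if sigma_p <= 0:
        raise ValueError("sigma_p must be positive")
    return (x - assigned) / sigma_p


# Hypothetical round: assigned value 8.0, sigma_p 0.5, participant reports 9.0.
print(z_score(9.0, assigned=8.0, sigma_p=0.5))
```

Because σ_p is set from fitness-for-purpose rather than from the spread of submitted results, the same result always earns the same score regardless of how the other participants performed.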

Non-Normal Distributions - 1
- z-scores rely on the results being normally distributed.
- Microbiological results are known to be non-normally distributed (Poisson distribution).
- Remedy: log-transformation prior to calculating z-scores.

Non-Normal Distributions - 2
- GeMMA PT results are invariably skewed, with a long tail to the high end.
- A review [7] of two GM schemes, commissioned by the UK Food Standards Agency, confirmed the non-normal distributions and identified log-transformation prior to calculating z-scores as the most appropriate way to treat the results.

Non-Normal Distributions - 3
More formally
$z = \dfrac{\log_{10} x - \log_{10} \hat{X}}{\sigma_p}$
where:
$x$ = participant's result
$\hat{X}$ = the assigned value
$\sigma_p$ = std dev for proficiency assessment, expressed in $\log_{10}$
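A sketch of the log-transformed score follows, assuming strictly positive results and a σ_p already expressed in log10 units. The function name and example values are hypothetical.

```python
import math


def log_z_score(x: float, assigned: float, sigma_p_log10: float) -> float:
    """z = (log10(x) - log10(assigned value)) / sigma_p, sigma_p in log10 units."""
    if x <= 0 or assigned <= 0:
        raise ValueError("results must be positive to take logarithms")
    if sigma_p_log10 <= 0:
        raise ValueError("sigma_p must be positive")
    return (math.log10(x) - math.log10(assigned)) / sigma_p_log10


# Hypothetical count data: assigned value 10, participant reports 100.
print(log_z_score(100.0, assigned=10.0, sigma_p_log10=0.5))
```

On the log scale a result ten times too high and a result ten times too low receive scores of equal size and opposite sign, which suits data with a long tail to the high end.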

Understanding z-scores
- z-scores embody the concept of fitness-for-purpose.
- If the level of the determinand and/or the allowable variation around this level are inappropriate for your work, your z-scores have no worth.
- E.g. oil content of soya beans: assaying this to determine commercial value is not the same as checking the oil content for nutritional purposes.

Interpreting z-scores - 1
- z-scores look simple, but z-scores are statistics and, as with any statistic, interpretation requires experience.
- Such experience gives you the edge with your managers, competitors, customers and accreditation assessors.

Interpreting z-scores - 2
Superficially, z-scores can be interpreted as:
- |z| <= 2: satisfactory
- |z| > 2 but <= 3: questionable
- |z| > 3: unsatisfactory
However, there is more to it! You must consider the probabilities: about 1 in 20 results from a laboratory performing perfectly well will fall beyond |z| = 2 by chance alone, at the edge of the distribution!
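The bands and the "1 in 20" figure can both be checked with a few lines, assuming the scores of a competent laboratory follow a standard normal distribution. The function names are hypothetical.

```python
import math


def classify(z: float) -> str:
    """Conventional interpretation bands for a z-score."""
    size = abs(z)
    if size <= 2:
        return "satisfactory"
    if size <= 3:
        return "questionable"
    return "unsatisfactory"


def two_sided_tail(limit: float) -> float:
    """P(|Z| > limit) for a standard normal variable Z."""
    return math.erfc(limit / math.sqrt(2))


print(classify(-2.4))
print(f"P(|z| > 2) = {two_sided_tail(2):.4f}")
print(f"P(|z| > 3) = {two_sided_tail(3):.4f}")
```

The tail beyond |z| = 2 holds about 4.6 % of results, close to 1 in 20, while the tail beyond |z| = 3 holds about 0.3 %, which is why a single questionable score is a prompt to investigate rather than proof of a problem.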

Interpreting z-scores - 3
[Figure: normal distribution with the axis marked from -4 to +4 standard deviations about the mean]

Your z-score
What is fit for YOUR purpose?

References
[1] M. Thompson, S. Ellison and R. Wood, The International Harmonised Protocol for the Proficiency Testing of (Chemical) Analytical Laboratories, Pure Appl. Chem., 2006, 78(1), 145-196. http://www.iupac.org/publications/pac/2006/pdf/7801x0145.pdf
[2] M. Thompson, Recent trends in inter-laboratory precision at ppb and sub-ppb concentrations in relation to fitness for purpose criteria in proficiency testing, Analyst, 2000, 125, 385-386.
[3] T. Fearn and M. Thompson, A New Test for Sufficient Homogeneity, Analyst, 2001, 126, 1414-1417.
[4] Analytical Methods Committee, Robust Statistics: How not to reject outliers. Part 1. Basic Concepts, Analyst, 1989, 114, 1693-1697.
[5] ISO 13528:2005, Statistical methods for use in proficiency testing by interlaboratory comparisons, Annex C.
[6] P.J. Lowthian and M. Thompson, Bump-hunting for the proficiency tester: searching for multimodality, Analyst, 2002, 127, 1359-1364.
[7] M. Thompson et al., Scoring in GMO Proficiency Tests based on log-transformed results, J. AOAC Int., 2006, 89(1), 232-239.