Data handling and descriptive statistics in Proficiency Testing Microbiology




In relation to the standards ISO/IEC 17043 and ISO 13528
by PhD Microbiology division, Science department

Descriptive statistics for PT participants' results
- Location (mode, median, mean)
- Assigned value for an analysis ($x_{pt}$; "mean")
- Measurement uncertainty for the assigned value
- Scale (standard deviation, range, MAD)
- Standard deviation for proficiency assessment ($s_{pt}$, sigma-pt)
- Description of the performance of individual laboratories (z scores, plots etc.)

[Figure: two histograms of participant results for coliform bacteria (MF). Horizontal axis: number of colonies per ml; vertical axis: number of results. Results are marked as without remark, false negative, or outliers, and the median is indicated.]

[Figure: distribution plot of a normal distribution. Horizontal axis: X; vertical axis: density.]

Methods for calculation of $x_{pt}$ and $s_{pt}$

Assigned value ($x_{pt}$)
- Value from a (certified) reference material
- Consensus value from expert laboratories
- Consensus value from participant results ($x_i$)

Standard deviation ($s_{pt}$)
- Fitness-for-purpose value determined in advance
- Horwitz curve (mainly in chemistry)
- From participant results ($x_i$)

Statistics for assessment of performance
- Difference: $D = x_i - x_{pt}$ ($x_i$ is the participant result)
- Per cent difference: $D\% = 100\,(x_i - x_{pt})/x_{pt}$
- z score: $z_i = (x_i - x_{pt})/s_{pt}$, where $s_{pt}$ is the standard deviation for proficiency assessment
- z′ scores, zeta ($\zeta$) scores and $E_n$ scores: used when measurement uncertainties are considered
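The three basic performance statistics above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `performance_scores` and the example values are hypothetical, and the factor 100 in the per cent difference is assumed from the definition of a percentage.

```python
def performance_scores(x_i, x_pt, s_pt):
    """Return (D, D%, z) for one participant result.

    x_i  : participant result
    x_pt : assigned value
    s_pt : standard deviation for proficiency assessment
    """
    d = x_i - x_pt              # difference D
    d_pct = 100.0 * d / x_pt    # per cent difference D%
    z = d / s_pt                # z score
    return d, d_pct, z


# Hypothetical example: result 2.9, assigned value 2.5, s_pt 0.2
d, d_pct, z = performance_scores(2.9, 2.5, 0.2)
print(round(d, 3), round(d_pct, 1), round(z, 2))
```

The z′, zeta and $E_n$ scores are not covered here because they also need the measurement uncertainties of the assigned value and of the participant result.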

Questions
1. How to calculate appropriate assigned values, standard deviations and z scores?
2. Traditional methods or robust statistical methods?

Basic considerations
- False positive results are removed without any calculation.
- False negative results are removed without any calculation when the cfu concentration is high, or after calculation (e.g. as outliers) when the cfu concentration is low.
- A plausible mean ($x_{pt}$) and a legitimate standard deviation ($s_{pt}$) should be determined with low or limited impact from false and/or extreme results.

[Figure: frequency histogram with estimated normal distributions for yeast results, comparing all results with trimmed results. The median and mode are marked.]

How to get relevant statistical measures for location ($x_{pt}$) and scale ($s_{pt}$)
- Traditional method (TM): remove outliers before calculating the mean and SD (after prior removal of obviously false results).
- Robust statistics (RSM) in the strict sense: calculate without identifying and removing deviating results, using an iterative method that reduces the effect of moderately or highly deviating results on the mean and SD (false results are removed first).

Traditional outlier removal
- Assumes an approximately normal distribution, at least after an appropriate transformation [$\log_{10}(\text{cfu})$ or $\sqrt{\text{cfu}}$].
- Extreme results are usually present: blunders (false results, mixed-up samples or dilutions etc.) or results with unclear reasons.
- Outlier tests for normal distributions (e.g. Grubbs' test) are usually applied even when the distribution is not perfectly normal.
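As a rough illustration of the traditional approach, the sketch below transforms colony counts and computes the Grubbs test statistic $G = \max_i |x_i - \bar{x}| / s$ for the most extreme result. It assumes the two transformations meant on the slide are log10 and square root; the function names and example data are hypothetical. The critical value that $G$ must exceed depends on the number of results and the significance level and has to be taken from a table, so it is deliberately left out.

```python
import math
import statistics


def transform_counts(cfu_counts, method="log10"):
    """Transform colony counts (all > 0 for log10) towards normality."""
    if method == "log10":
        return [math.log10(c) for c in cfu_counts]
    if method == "sqrt":
        return [math.sqrt(c) for c in cfu_counts]
    raise ValueError("method must be 'log10' or 'sqrt'")


def grubbs_statistic(values):
    """Grubbs statistic for the single most extreme value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return max(abs(v - mean) for v in values) / sd


# Hypothetical counts from seven laboratories
logs = transform_counts([120, 150, 180, 200, 210, 260, 9000])
print(round(grubbs_statistic(logs), 2))
```

A single application of this statistic tests one suspect value; two or more outliers in the same direction inflate the SD and can hide each other.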

Robust statistical methods (RSM)
- RSM works well even when the results are only roughly normally distributed (e.g. "long tails").
- There are many different estimators of robustly calculated location ("mean") and scale ("standard deviation").
- With RSM there is no need to look for and identify results as outliers.
- Ordinary outlier tests used with TM often have problems when there are two or more outliers in one direction.

The principles for calculating a robust mean and standard deviation by an iterative (= repeated) process called Huber's method, including the use of MAD, the Median Absolute Deviation (sometimes the word Difference or Distance is used instead).

Huber's method, first steps (acc. to ISO/FDIS 13528:2014)

Robust estimation of the mean = assigned value ($x^*$)
1. Find the median of the results after sorting them. The median is insensitive to how far from it the deviating results are. The median is the initial $x^*$, a robust estimate of the mean.

Robust estimation of the standard deviation ($s^*$)
2. Calculate the absolute differences between the participants' results $x_i$ and the median: $|x_i - x^*|$
3. Sort the absolute differences in ascending order and find the median of these differences = MAD. The MAD is also insensitive to how far from the median the deviating results are.
4. Initial $s^* = 1.5 \cdot \text{MAD}$ (or more exactly: $1.483 \cdot \text{MAD}$)
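Steps 1 to 4 can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `initial_robust_estimates` and the example data are hypothetical, and the more exact factor 1.483 is used for the initial $s^*$.

```python
import statistics


def initial_robust_estimates(results):
    """Return (x_star, mad, s_star) as starting values for Huber's method."""
    x_star = statistics.median(results)                  # step 1: median
    abs_diffs = sorted(abs(x - x_star) for x in results) # steps 2 and 3
    mad = statistics.median(abs_diffs)                   # median of |x_i - x*|
    s_star = 1.483 * mad                                 # step 4: scaled MAD
    return x_star, mad, s_star


# Hypothetical results with one clearly deviating value (4.9)
x_star, mad, s_star = initial_robust_estimates(
    [2.1, 2.3, 2.4, 2.5, 2.6, 2.8, 4.9]
)
print(x_star, round(mad, 3), round(s_star, 3))
```

Replacing 4.9 by any larger value leaves all three numbers unchanged, which is the insensitivity to extreme results mentioned in steps 1 and 3.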

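Last steps: iterative process

5. Calculate: $\delta = 1.5\,s^*$ ($\delta$ = delta; a difference)
6. For each $x_i$ ($i = 1, 2, \ldots, p$), calculate:

$$x_i^* = \begin{cases} x^* - \delta, & \text{when } x_i < x^* - \delta \\ x^* + \delta, & \text{when } x_i > x^* + \delta \\ x_i, & \text{in other cases} \end{cases}$$

7. Calculate new values for $x^*$ and $s^*$ ($p$ = number of results):

$$x^* = \frac{1}{p}\sum_{i=1}^{p} x_i^*, \qquad s^* = 1.134 \cdot \mathrm{SD}(x_i^*)$$

8. Repeat steps 5 to 7 until convergence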
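The complete iteration can be sketched in Python as below. This is a hedged illustration rather than a reference implementation: the function name `huber_robust_mean_sd`, the convergence tolerance `tol`, the iteration cap `max_iter` and the example data are assumptions, while the constants 1.483, 1.5 and 1.134 are those of the method. It assumes at least two results and a MAD greater than zero.

```python
import statistics


def huber_robust_mean_sd(results, tol=1e-6, max_iter=100):
    """Iteratively estimate the robust mean x* and robust SD s*."""
    p = len(results)
    # Initial values: median and scaled MAD
    x_star = statistics.median(results)
    s_star = 1.483 * statistics.median([abs(x - x_star) for x in results])

    for _ in range(max_iter):
        delta = 1.5 * s_star
        low, high = x_star - delta, x_star + delta
        # Pull results outside [x* - delta, x* + delta] back to the limits
        adjusted = [min(max(x, low), high) for x in results]
        new_x = sum(adjusted) / p
        new_s = 1.134 * statistics.stdev(adjusted)  # sample SD
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


# Hypothetical results with one deviating value (4.9)
data = [2.1, 2.3, 2.4, 2.5, 2.6, 2.8, 4.9]
x_star, s_star = huber_robust_mean_sd(data)
print(round(x_star, 3), round(s_star, 3))
print(round(statistics.mean(data), 3), round(statistics.stdev(data), 3))
```

In this example the ordinary mean and SD are pulled upwards by the single high result, while the robust estimates stay close to the bulk of the data. The deviating result is kept in the data set but limited to $x^* + \delta$ in each round.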

Implications for performance

More z scores will fall beyond the limits when the influence of deviating results is reduced during (RSM) or the results are removed before (TM) the calculation of the mean and SD. As a consequence, more participant results are rated unsatisfactory.

Usual performance criteria:
- $|z| \le 2.0$: satisfactory
- $2.0 < |z| < 3.0$: questionable
- $|z| \ge 3.0$: unsatisfactory
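The usual performance criteria translate directly into a small classification function. This sketch assumes the limits apply to the absolute value of z, with boundaries at 2.0 and 3.0 as in ISO 13528; the function name `classify_z` is hypothetical.

```python
def classify_z(z):
    """Classify a z score with the usual performance criteria."""
    abs_z = abs(z)
    if abs_z <= 2.0:
        return "satisfactory"
    if abs_z < 3.0:
        return "questionable"
    return "unsatisfactory"


# Hypothetical z scores
for z in (0.4, -2.0, 2.5, -3.0, 4.1):
    print(z, classify_z(z))
```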

Limitations of Huber's method
- The underlying distribution should be roughly normal (= unimodal and symmetrical).
- The number of deviating results must not be > 2% of all results.
- Outliers are not directly removed as such.
- Outliers can be characterized as those $x_i$ where $|x_i - x^*| \ge 3\,s^*$ (or 2. $s^*$) when $s^*$ is used as $s_{pt}$.
- This corresponds to z scores where $|z| \ge 3$ (or $|z| \ge$ 2.).
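Characterizing outliers after a robust calculation can be sketched as a simple comparison against $k \cdot s^*$. The function name `flag_outliers`, the default multiplier `k = 3.0`, the use of "greater than or equal" and the example values are assumptions made for illustration.

```python
def flag_outliers(results, x_star, s_star, k=3.0):
    """Return True for each result with |x_i - x*| >= k * s*."""
    limit = k * s_star
    return [abs(x - x_star) >= limit for x in results]


# Hypothetical robust estimates x* = 2.5 and s* = 0.4
print(flag_outliers([2.5, 3.0, 4.9], x_star=2.5, s_star=0.4))
```

When $s^*$ is used as the standard deviation for proficiency assessment, the flagged results are exactly those whose absolute z score reaches the chosen multiplier.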

Examples from EURL Campylobacter trial PT 13, 2014

[Figure: histograms with estimated normal distributions, one panel per sample. Vertical axis: frequency. Panels: No 1. C. coli; No 2. E. coli; No 3. C. lari; No 4. C. jejuni; No 5. C. jejuni + E. coli; No 6. C. coli; No 7. C. jejuni + E. coli; No 8. C. jejuni; No 9. Blank; No 10. C. lari. The legend gives the mean, standard deviation and number of results for each sample.]

[Table: z scores per participant and sample when all results are used (deviating results included). Columns: samples No 1 to No 10.]

[Table: z scores per participant and sample calculated by the robust method (Huber's). Columns: samples No 1 to No 10.]

[Table: participant results per sample. Columns: No 1. C. coli; No 2. E. coli; No 3. C. lari; No 4. C. jejuni; No 5. C. jejuni + E. coli; No 6. C. coli; No 7. C. jejuni + E. coli; No 8. C. jejuni; No 9. Blank; No 10. C. lari.]

Other robust estimators

Location (central tendency)
- Median (= initial $x^*$)

Scale (dispersion of results, e.g. SD)
- MAD (Median Absolute Difference)
- Scaled MAD: $\text{MADe} = 1.483 \cdot \text{MAD}$ (= initial $s^*$)
- IQR (interquartile range = the middle 50%): 75th percentile of $x_i$ minus 25th percentile of $x_i$ ($i = 1, 2, \ldots, p$)
- Normalized (scaled) IQR: $\text{IQRn} = 0.7413 \cdot \text{IQR}$
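The two scaled estimators can be sketched with the Python standard library. Note that percentile definitions differ between software packages; this sketch assumes the "inclusive" interpolation of `statistics.quantiles`, which may give slightly different quartiles than other tools. The function names and example data are hypothetical.

```python
import statistics


def scaled_mad(results):
    """MADe = 1.483 * median of absolute differences from the median."""
    med = statistics.median(results)
    mad = statistics.median([abs(x - med) for x in results])
    return 1.483 * mad


def normalized_iqr(results):
    """IQRn = 0.7413 * (75th percentile - 25th percentile)."""
    q1, _, q3 = statistics.quantiles(results, n=4, method="inclusive")
    return 0.7413 * (q3 - q1)


# Hypothetical data set
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round(scaled_mad(data), 4), round(normalized_iqr(data), 4))
```

For normally distributed data both estimators approximate the standard deviation, which is why the scaling factors are applied.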

References to Huber's method
- ISO 5725-5:2003
- ISO 13528:2005 (will be replaced by ISO (FDIS) 13528:2014)
- AMC Technical Brief No. 6, April 2001 (Analytical Methods Committee, Royal Society of Chemistry, 2001)