Data handling and descriptive statistics in Proficiency Testing: Microbiology
In relation to the standards ISO/IEC 17043 and ISO 13528
by PhD, Microbiology division, Science department
Descriptive statistics for PT participants' results
- Location (mode, median, mean)
  - Assigned value for an analysis ($x_{pt}$; "mean")
  - Measurement uncertainty for the assigned value
- Scale (standard deviation, range, MAD)
  - Standard deviation for proficiency assessment ($s_{pt}$, sigma-pt)
- Description of performance of individual laboratories (z scores, plots etc.)
[Figure: two histograms of participant results for coliform bacteria 35/36/37 °C (MF). x-axis: no. of colonies per 100 ml; y-axis: no. of results. Results are marked as without remark, false negative or outliers, and the median is indicated in each panel.]
[Figure: density distribution plot of a normal distribution; x-axis: X, y-axis: density.]
Methods for calculation of $x_{pt}$ and $s_{pt}$
Assigned value ($x_{pt}$):
- Value from a (certified) reference material
- Consensus value from expert laboratories
- Consensus value from participant results ($x_i$)
Standard deviation ($s_{pt}$):
- Fitness-for-purpose value determined in advance
- Horwitz curve (mainly in chemistry)
- From participant results ($x_i$)
Statistics for assessment of performance
- Difference: $D = x_i - x_{pt}$ ($x_i$ is the participant result)
- Per cent difference: $D\% = 100\,(x_i - x_{pt})/x_{pt}$
- z score: $z_i = (x_i - x_{pt})/s_{pt}$, where $s_{pt}$ is the standard deviation for proficiency assessment
- z' scores, zeta ($\zeta$) scores and $E_n$ scores: used when measurement uncertainties are considered
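The three basic statistics above can be sketched in a few lines of Python. The function name `performance_stats` and the example numbers are illustrative assumptions, not values from any PT round.

```python
def performance_stats(x_i, x_pt, s_pt):
    """Return (D, D%, z) for one participant result.

    x_i  : participant result (e.g. log10 cfu)
    x_pt : assigned value
    s_pt : standard deviation for proficiency assessment
    """
    d = x_i - x_pt                 # difference D
    d_percent = 100.0 * d / x_pt   # per cent difference D%
    z = d / s_pt                   # z score
    return d, d_percent, z


# Hypothetical example: result 2.9, assigned value 2.5, s_pt 0.25
d, d_percent, z = performance_stats(2.9, 2.5, 0.25)
print(f"D = {d:.2f}, D% = {d_percent:.1f}, z = {z:.2f}")
```

The scores that take measurement uncertainty into account (z', zeta, $E_n$) need additional inputs and are not covered by this sketch.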
Questions
1. How to calculate appropriate assigned values, standard deviations and z scores?
2. Traditional methods or robust statistical methods?
Basic considerations
- False positive results are removed without any calculation
- False negative results are removed without any calculation when the cfu concentration is high, or after calculation (e.g. as outliers) when the cfu concentration is low
- A plausible mean ($x_{pt}$) and a legitimate standard deviation ($s_{pt}$) should be determined with low/limited impact from false and/or extreme results
[Figure: frequency histogram and estimated normal distribution for yeast, all results compared with trimmed results; the median and mode are marked. The legend gives mean, StDev and N for each data set.]
How to get relevant statistical measures for location ($x_{pt}$) and scale ($s_{pt}$)
- Traditional method (TM): remove outliers before calculating the mean and SD (after prior removal of obviously false results)
- Robust statistics (RSM) in the strict sense: calculate without identifying and removing deviating results, using an iterative method that reduces the effect of moderately or highly deviating results on the mean and SD (false results are removed first)
Traditional outlier removal
- Assumes an approximately normal distribution, at least after an appropriate transformation [$\log_{10}(\mathrm{cfu})$ or $\sqrt{\mathrm{cfu}}$]
- Extreme results are usually present: blunders (false results, mixed-up samples/dilutions etc.) or results with unclear causes
- Outlier tests for normal distributions (e.g. Grubbs' test) are usually applied even when the distribution is not perfectly normal
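As a minimal sketch of the traditional approach, the Python below transforms colony counts and computes the Grubbs test statistic for the most extreme result. The function names and the counts are hypothetical. The critical value that the statistic has to be compared with depends on the number of results and the chosen significance level, and has to be taken from a table; it is deliberately left out here.

```python
import math
import statistics


def transform(counts, method="log10"):
    """Transform colony counts towards an approximately normal distribution."""
    if method == "log10":
        return [math.log10(c) for c in counts]  # requires all counts > 0
    if method == "sqrt":
        return [math.sqrt(c) for c in counts]
    raise ValueError("method must be 'log10' or 'sqrt'")


def grubbs_statistic(values):
    """Return (G, suspect): the Grubbs statistic and the most extreme value.

    G = |suspect - mean| / SD, using the ordinary mean and sample SD.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / sd, suspect


# Hypothetical counts with one clearly deviating result
counts = [100, 120, 90, 110, 10000]
g, suspect = grubbs_statistic(transform(counts, "log10"))
print(f"G = {g:.3f} for suspect value {suspect:.2f} (compare with a tabulated critical value)")
```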
Robust statistical methods (RSM)
- RSM works well even when the results are only roughly normally distributed (e.g. "long tails")
- There are many different estimators of robustly calculated location ("mean") and scale ("standard deviation")
- With RSM there is no need to look for and identify results as outliers
- Ordinary outlier tests used with TM often have problems when there are two or more outliers in one direction
The principles for calculating a robust mean and standard deviation by an iterative (= repeated) process called Huber's method
Including the use of MAD, Median Absolute Deviation (sometimes "Difference" or "Distance" is used instead of "Deviation")
Huber's method: first steps (acc. to ISO FDIS 13528:2014)
Robust estimation of the mean = assigned value ($x^*$)
1. Find the median of the results after sorting them. The median is insensitive to how far from it the deviating results lie, and it is the initial $x^*$, a robust estimate of the mean.
Robust estimation of the standard deviation ($s^*$)
2. Calculate the absolute differences between the participants' results $x_i$ and the median: $|x_i - x^*|$
3. Sort the absolute differences in ascending order and find their median = MAD. It is likewise insensitive to how far from the median the deviating results lie.
4. Initial $s^* = 1.5 \times \mathrm{MAD}$ (or more exactly: $1.483 \times \mathrm{MAD}$)
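Steps 1 to 4 can be sketched as follows. The function name `initial_robust_estimates` and the example data are assumptions made for illustration; the factor 1.483 is the "more exact" value given in step 4.

```python
import statistics


def initial_robust_estimates(results):
    """Return the initial (x*, s*) from the median and the scaled MAD."""
    x_star = statistics.median(results)              # step 1: median
    abs_diffs = [abs(x - x_star) for x in results]   # step 2: |x_i - x*|
    mad = statistics.median(abs_diffs)               # step 3: MAD
    s_star = 1.483 * mad                             # step 4: initial s*
    return x_star, s_star


# Hypothetical log10 cfu results with one deviating value
results = [2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 5.0]
x_star, s_star = initial_robust_estimates(results)
print(f"initial x* = {x_star:.3f}, initial s* = {s_star:.3f}")
```

Replacing 5.0 by any larger value leaves both estimates unchanged, which is the insensitivity described in steps 1 and 3.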
Last steps: iterative process
5. Calculate: $d = 1.5\,s^*$ (d = delta; a difference)
6. For each $x_i$ ($i = 1, 2, \ldots, p$), calculate:
   $x_i^* = x^* - d$, when $x_i < x^* - d$
   $x_i^* = x^* + d$, when $x_i > x^* + d$
   $x_i^* = x_i$, in other cases
7. Calculate new values for $x^*$ and $s^*$:
   $x^* = \sum x_i^* / p$ (p = number of results)
   $s^* = 1.134 \times \mathrm{SD}(x_i^*)$
8. Repeat steps 5 to 7 until convergence
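The complete iteration, starting from the median and scaled MAD, might look like the Python sketch below. The function name, the absolute tolerance `tol` used as the convergence criterion and the cap `max_iter` are assumptions of this sketch, not requirements quoted from the standard.

```python
import statistics


def huber_robust_mean_sd(results, tol=1e-6, max_iter=100):
    """Iteratively estimate the robust mean x* and robust SD s*.

    Needs at least two results. If the MAD is zero, s* stays zero and
    the function simply returns the median with s* = 0.
    """
    # Initial values: median and 1.483 * MAD
    x_star = statistics.median(results)
    s_star = 1.483 * statistics.median([abs(x - x_star) for x in results])

    for _ in range(max_iter):
        d = 1.5 * s_star                                      # step 5
        low, high = x_star - d, x_star + d
        pulled = [min(max(x, low), high) for x in results]    # step 6
        new_x = statistics.fmean(pulled)                      # step 7: new x*
        new_s = 1.134 * statistics.stdev(pulled)              # step 7: new s*
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:                                         # step 8
            break
    return x_star, s_star


# Hypothetical log10 cfu results with one deviating value
results = [2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 5.0]
x_star, s_star = huber_robust_mean_sd(results)
print(f"robust mean x* = {x_star:.3f}, robust SD s* = {s_star:.3f}")
print(f"ordinary mean = {statistics.mean(results):.3f}, ordinary SD = {statistics.stdev(results):.3f}")
```

In step 6 the deviating results are not discarded but pulled in to the nearest limit $x^* \pm d$, so they still count among the p results while contributing less to the mean and SD.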
Implications for performance
- More z scores will be beyond the limits when deviating results are removed at (RSM) or before (TM) the calculation of mean and SD, so more participant results become unsatisfactory
- Usual performance criteria:
  $|z| \le 2.0$: satisfactory
  $2.0 < |z| < 3.0$: questionable
  $|z| \ge 3.0$: unsatisfactory
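The usual performance criteria translate directly into a small classification function. The name `classify_z` is an illustrative assumption.

```python
def classify_z(z):
    """Classify a z score by the usual performance criteria."""
    abs_z = abs(z)
    if abs_z <= 2.0:
        return "satisfactory"
    if abs_z < 3.0:
        return "questionable"
    return "unsatisfactory"


# Hypothetical z scores
for z in (0.7, -2.4, 3.9):
    print(z, classify_z(z))
```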
Limitations of Huber's method
- The underlying distribution should be roughly normal (= unimodal and symmetrical)
- The number of deviating results must not be > 2% of all results
- Outliers are not directly removed as such
- Outliers can be characterized as those $x_i$ where $|x_i - x^*| \ge 3\,s^*$ (or $2.5\,s^*$) when $s^*$ is used as $s_{pt}$
- This corresponds to z scores where $|z| \ge 3$ (or $|z| \ge 2.5$)
Examples from the EURL Campylobacter trial PT 13, 2014
[Figure: histograms with estimated normal distributions, one panel per sample: No 1. C. coli; No 2. E. coli; No 3. C. lari; No 4. C. jejuni; No 5. C. jejuni + E. coli; No 6. C. coli; No 7. C. jejuni + E. coli; No 8. C. jejuni; No 9. Blank; No 10. C. lari. y-axis: frequency. The legend gives mean, StDev and N for each sample.]
Z scores when all results are used (deviating results included)
[Table: z scores per participant, one column per sample from No 1. C. coli to No 10. C. lari; "#" is printed where no z score is given.]
Z scores by robust method (Huber's)
[Table: z scores per participant, one column per sample from No 1. C. coli to No 10. C. lari.]
[Table: participants' reported results, one column per sample: No 1. C. coli; No 2. E. coli; No 3. C. lari; No 4. C. jejuni; No 5. C. jejuni + E. coli; No 6. C. coli; No 7. C. jejuni + E. coli; No 8. C. jejuni; No 9. Blank; No 10. C. lari.]
Other robust estimators
Location (central tendency)
- Median (= initial $x^*$)
Scale (dispersion of results, e.g. SD)
- MAD (Median Absolute Difference)
- Scaled MAD: $\mathrm{MADe} = 1.483 \times \mathrm{MAD}$ (= initial $s^*$)
- IQR (interquartile range = the 50% in the middle) = 75th percentile of $x_i$ minus 25th percentile of $x_i$ ($i = 1, 2, \ldots, p$)
- Normalized (scaled) IQR: $\mathrm{IQRn} = 0.7413 \times \mathrm{IQR}$
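The two scale estimators can be compared with a short Python sketch. The function names are assumptions, and so is the percentile convention: `statistics.quantiles` with `method="inclusive"` is only one of several common ways to define the 25th and 75th percentiles, and other conventions give slightly different IQR values for small data sets.

```python
import statistics


def scaled_mad(results):
    """MADe = 1.483 * median(|x_i - median|)."""
    med = statistics.median(results)
    return 1.483 * statistics.median([abs(x - med) for x in results])


def normalized_iqr(results):
    """IQRn = 0.7413 * (75th percentile - 25th percentile)."""
    q1, _median, q3 = statistics.quantiles(results, n=4, method="inclusive")
    return 0.7413 * (q3 - q1)


# Hypothetical log10 cfu results with one deviating value
results = [2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 5.0]
print(f"MADe = {scaled_mad(results):.3f}")
print(f"IQRn = {normalized_iqr(results):.3f}")
print(f"ordinary SD = {statistics.stdev(results):.3f}")
```

Both factors scale the estimator so that it is comparable to the standard deviation for normally distributed results.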
References to Huber's method
- ISO 5725-5:2003
- ISO 13528:2005 (will be replaced by ISO (FDIS) 13528:2014)
- AMC Technical Brief No. 6, April 2001 (Analytical Methods Committee, Royal Society of Chemistry, 2001)