Magruder Statistics & Data Analysis

Caution: There will be Equations!

Program Model
Based closely on: The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories, 2006 (IHP), Michael Thompson, Stephen L. R. Ellison and Roger Wood.
AMC supported (Analytical Methods Committee of the RSC).
Uses ISO statistical models: ISO 13528, 2005 and ISO 5725-2, 1994.
Robust statistics used as described in the IHP and ISO 13528.
Duplicate analysis supports method precision calculations.
Proficiency testing is often required for Laboratory Accreditation.
Independent documentation on how it all works. IHP is free!
Makes full use of Web-based data transfer.

Magruder Proficiency Testing Reports Overview
True Proficiency Testing: Analyte reports and report cards using your Method of choice. Support for Guarantees. Support for IAs.
Individual Method Proficiency Testing: Method reports and report cards.
Method Precision Data: Duplicates allow calculation of Repeatability and Reproducibility. Method Precision is calculated for each Sample run.
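To illustrate how duplicates give Repeatability and Reproducibility, here is a minimal sketch using the classical balanced formulas of ISO 5725-2 for two replicates per laboratory. It is an assumption that the program's calculation resembles this; the function name and the sample numbers are hypothetical, and the program may well use robust variants instead.

```python
import math
import statistics

def precision_from_duplicates(pairs):
    """Classical ISO 5725-2 style estimates from duplicate results.

    pairs: list of (a, b) tuples, one duplicate pair per laboratory.
    Returns (s_r, s_R): repeatability and reproducibility standard deviations.
    """
    p = len(pairs)
    # Repeatability variance: pooled within-laboratory variance of the duplicates.
    s_r2 = sum((a - b) ** 2 for a, b in pairs) / (2 * p)
    # Variance of the laboratory means.
    lab_means = [(a + b) / 2 for a, b in pairs]
    s_mean2 = statistics.variance(lab_means)
    # Between-laboratory variance, floored at zero.
    s_L2 = max(s_mean2 - s_r2 / 2, 0.0)
    # Reproducibility variance combines between- and within-laboratory parts.
    s_R2 = s_L2 + s_r2
    return math.sqrt(s_r2), math.sqrt(s_R2)

# Hypothetical duplicate results from three laboratories.
s_r, s_R = precision_from_duplicates([(10.0, 10.2), (10.4, 10.6), (9.8, 10.0)])
print(f"s_r = {s_r:.3f}, s_R = {s_R:.3f}")
```

Reproducibility can never be smaller than Repeatability in this model, because it adds the between-laboratory component on top of the within-laboratory one.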

Magruder Check Sample Program: Robust Statistics
The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories, 2006.
ISO 13528, Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparisons, 2005: Algorithm A.

Why Robust Statistics?
Most real-world data distributions do not follow the Normal (Gaussian) model; they are more like contaminated Normals.
Distributions have Fat Tails and Outliers that skew the Mean and inflate the Standard Deviation (Normal estimators are very sensitive!).
Even Outliers contain information. We need to weight it properly.
We need a Robust estimate of Location for the data center.
We need a Robust estimate of the data Dispersion.
We need to identify and weight the Reliable data.
John Tukey, Peter Huber and Frank Hampel are credited with founding the discipline, all since Tukey's landmark 1960 paper: Tukey, J. W. (1960). A survey of sampling from contaminated distributions.
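The sensitivity of the Normal estimators can be seen in a small simulation. This is only a sketch: the 5% contamination fraction, the wide standard deviation of 10, and the use of the scaled median absolute deviation as the robust comparison are all assumptions chosen for illustration, not values from the program.

```python
import random
import statistics

def contaminated_normal(n, frac=0.05, wide_sd=10.0, seed=1):
    """Draw n values: mostly N(0, 1), with a fraction `frac` from N(0, wide_sd)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, wide_sd if rng.random() < frac else 1.0)
            for _ in range(n)]

def mad_sd(data):
    """Robust dispersion: median absolute deviation scaled to match a Normal SD."""
    med = statistics.median(data)
    return 1.4826 * statistics.median([abs(x - med) for x in data])

data = contaminated_normal(2000)
print("classical SD:", round(statistics.stdev(data), 2))
print("robust SD   :", round(mad_sd(data), 2))
```

The Reliable 95% of the data has a standard deviation of 1, yet the classical estimate is inflated well beyond that by the small contaminated fraction, while the robust estimate stays close to 1.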

But First: We must remove The Pathological Data!

Robust Statistics
Figure: Raw data with avg, -2 sd and +2 sd marked. Annotations: Unwarranted influence; Needs fair representation. Per: Frank Sikora, 2015

Robust Statistics
Figure: Raw and Robust panels, each with avg, -2 sd and +2 sd marked. Per: Frank Sikora, 2015

Robust Statistics
Figure: Robust data with avg, -2 sd and +2 sd marked; Fat Tails points labeled Z. Per: Frank Sikora, 2015

Contaminated Normal
Figure: Observed Distribution made up of Reliable Data plus Contamination (Fat Tails); x-axis: SD, from -8 to 8.

Calculating Robust Statistics
We use Peter Huber's H15 method and Winsorize the Data.
Winsorizing sequentially brings the outer data in towards the Median. It down-weights Outliers and Fat Tails and draws the Data towards a reliable standard Normal.
Iterate this process until the mean converges.
The new Mean X_a = Robust estimate of Location for the data center (Assigned Value).
The new Standard Deviation = σ_rob, a fit-for-purpose Robust estimate of the data Dispersion.
Uncertainty in X_a: $U_a = \sigma_{rob} / \sqrt{2n}$
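The iteration can be sketched as follows. This follows Algorithm A as published in ISO 13528 (start from the median and scaled MAD, Winsorize at 1.5 standard deviations, rescale by 1.134, repeat); whether the program's implementation matches it in every detail is an assumption, and the function name, tolerance and sample values are hypothetical.

```python
import statistics

def algorithm_a(data, tol=1e-6, max_iter=100):
    """Huber H15 style robust mean and SD by iterative Winsorization.

    Returns (x_a, sigma_rob). Requires at least two values.
    """
    # Starting values: median and scaled median absolute deviation.
    x = statistics.median(data)
    s = 1.483 * statistics.median([abs(v - x) for v in data])
    for _ in range(max_iter):
        delta = 1.5 * s
        # Winsorize: pull values outside x +/- delta in to the boundary.
        w = [min(max(v, x - delta), x + delta) for v in data]
        new_x = statistics.mean(w)
        # 1.134 corrects the SD for the shrinkage caused by Winsorizing.
        new_s = 1.134 * statistics.stdev(w)
        converged = abs(new_x - x) < tol and abs(new_s - s) < tol
        x, s = new_x, new_s
        if converged:
            break
    return x, s

# Hypothetical results with one outlier at 9.0.
results = [5.1, 5.2, 5.0, 5.3, 5.1, 5.2, 9.0]
x_a, sigma_rob = algorithm_a(results)
print(f"X_a = {x_a:.3f}, sigma_rob = {sigma_rob:.3f}")
print(f"classical mean = {statistics.mean(results):.3f}, "
      f"classical SD = {statistics.stdev(results):.3f}")
```

Note that the outlier is never deleted. It is pulled in to the Winsorizing boundary on each pass, so it still contributes, but with limited weight.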

Tools I use: Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Figure: Soluble Potash. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

Kernel Density Plot
Let's call it a more precise Histogram.
$\hat{f}(X, h) = \frac{1}{nh} \sum_{i=1}^{n} \Phi\left(\frac{X - X_i}{h}\right)$
Φ is the Standard Normal density function. h is the Bandwidth.
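The kernel density estimate above can be computed directly. This is a minimal sketch of the formula with a Gaussian kernel; the function name, the sample values and the bandwidth of 0.2 are hypothetical, and how the bandwidth is chosen for the actual plots is not stated in the slides.

```python
import math

def gaussian_kde(data, h):
    """Return f(x) = 1/(n*h) * sum(phi((x - x_i) / h)) for the given data."""
    n = len(data)

    def phi(u):
        # Standard Normal density function.
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def f(x):
        return sum(phi((x - xi) / h) for xi in data) / (n * h)

    return f

# Hypothetical Soluble Potash results and bandwidth.
density = gaussian_kde([4.9, 5.3, 5.4, 5.6], h=0.2)
for x in (4.6, 5.35, 6.1):
    print(x, round(density(x), 3))
```

A smaller h follows the individual data points more closely; a larger h gives a smoother envelope.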

Tools I use: Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Winsorizing Squeezes Some Data Points In.
Figure: Soluble Potash. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

Tools I use: Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Winsorizing Squeezes Some Data Points In. Robust Normal Is Calculated.
Figure: Soluble Potash. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

Tools I use: QQ Plot for Soluble Potash
Compare: Raw Data (Green), Robust Data (Red), Normal Quantile (Blue).
Raw Data is ranked. Robust Data is ranked. Normal Quantiles are calculated. All are plotted against the Rank Based Z Value.
The Sweet Spot! Where the curves overlap is Reliable Data.
Figure: QQ plot. x-axis: Normal Theoretical Quantiles or Rank Based Z Value; y-axis: Data Quantiles.

Normal QQ Plot
Random Normal Data, zero centered, SD = 1.
The Blue Line: Replace Y axis with Normalized Data values, X_a + Z * σ_rob. For the standard Normal this is 0 + Z * 1.
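The coordinates for such a QQ plot can be sketched as below. The plotting position (i - 0.5) / n used to turn ranks into Z values is an assumption, since the slides do not say which convention is used; the function names and the sample values for X_a and σ_rob are hypothetical.

```python
from statistics import NormalDist

def rank_based_z(n):
    """Z value for each rank 1..n, using the plotting position (i - 0.5) / n."""
    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def normal_line(z_values, x_a, sigma_rob):
    """The Blue Line: X_a + Z * sigma_rob at each Z value."""
    return [x_a + z * sigma_rob for z in z_values]

# Hypothetical ranked data with hypothetical robust estimates.
ranked = sorted([5.1, 5.2, 5.0, 5.3, 9.0])
z = rank_based_z(len(ranked))
line = normal_line(z, x_a=5.19, sigma_rob=0.2)
for zi, yi, li in zip(z, ranked, line):
    print(f"z = {zi:+.2f}  data = {yi:.2f}  normal = {li:.2f}")
```

Plotting the ranked data and the line against the same Z values shows where the data follow the Robust Normal and where the tails depart from it.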

Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Figure: Acid Soluble Iron. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Winsorizing Squeezes Some Data Points In.
Figure: Acid Soluble Iron. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

Data (Red) on Kernel Density Envelope. Normal Curve (Grey).
Winsorizing Squeezes Some Data Points In. Robust Normal Is Calculated.
Figure: Acid Soluble Iron. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.

QQ Plot for Acid Soluble Iron
Figure: QQ plot. Legend: Raw Data, Robust Data, Normal Q.

In summary, from the Huber H15 process we now have:
An Assigned Value X_a (robust measure of location).
A fit-for-purpose standard deviation σ_rob (robust measure of dispersion).
An estimate of the uncertainty in the assigned value, U_a.
All based on the Reliable Data.

Sulfur Analysis in 150611: QQ Plots Reveal A Problem
10% Guarantee & 5% Guarantee

Elemental Sulfur (5%) QQ Plot
Where's The Sweet Spot??
Figure: QQ plot. Legend: Raw Data, Robust Data, Normal Q.

Sulfate Sulfur (5%) QQ Plot
Figure: QQ plot. Legend: Raw Data, Robust Data, Normal Q.

Total Sulfur (10%) QQ Plot
Figure: QQ plot. Legend: Raw Data, Robust Data, Normal Q.

Total Sulfur (10%) QQ Plot, Adjusted
Figure: QQ plot. Legend: Raw Data, Robust Data, Normal Q.

Imagine if the discrepancy were not so obvious. Not Statistically Discernible!
It is vitally important for Clients to submit Data for the CORRECT Analyte with the CORRECT Method Code!

Reporting Data Below the LOD: A Word About Detection Limits

Detection Limits: Definitions are not standardized!
Blank set to 0: Establishes the Noise of the instrument or method. Let's call S_BLANK = Noise.
Figure: scale in Units (Standard Deviation of the Blank).

Detection Limits: Definitions are not standardized!
LOD, Limit Of Detection, 3 x Noise: Above the Noise but still 50% chance of a false negative.
Blank set to 0: Establishes the Noise of the instrument or method. Let's call S_BLANK = Noise.
Figure: scale in Units (Standard Deviation of the Blank).

Detection Limits: Definitions are not standardized!
Reporting Limit, 6 x Noise: Protects against false negatives.
LOD, Limit Of Detection, 3 x Noise: Above the Noise but still 50% chance of a false negative.
Blank set to 0: Establishes the Noise of the instrument or method. Let's call S_BLANK = Noise.
Figure: scale in Units (Standard Deviation of the Blank).

Detection Limits: Definitions are not standardized!
LOQ, Limit Of Quantitation, 10 x Noise: Safe limit for reporting reliable quantities.
Reporting Limit, 6 x Noise: Protects against false negatives.
LOD, Limit Of Detection, 3 x Noise: Above the Noise but still 50% chance of a false negative.
Blank set to 0: Establishes the Noise of the instrument or method. Let's call S_BLANK = Noise.
Figure: scale in Units (Standard Deviation of the Blank).
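The ladder of limits on this slide is simple arithmetic on the Noise. A minimal sketch, assuming the Noise is estimated as the sample standard deviation of replicate blank readings and that the blank has been set to 0, as on the slide; the function name and the blank readings are hypothetical, and since definitions are not standardized, other multipliers are in use elsewhere.

```python
import statistics

def detection_limits(blank_readings):
    """Limits as multiples of the Noise (standard deviation of the blank)."""
    noise = statistics.stdev(blank_readings)  # S_BLANK
    return {
        "noise": noise,
        "LOD": 3 * noise,              # Limit Of Detection
        "reporting_limit": 6 * noise,  # Reporting Limit
        "LOQ": 10 * noise,             # Limit Of Quantitation
    }

# Hypothetical blank readings, already corrected so the blank averages 0.
limits = detection_limits([-0.02, 0.01, 0.00, 0.03, -0.01, -0.01])
for name, value in limits.items():
    print(f"{name}: {value:.4f}")
```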

Detection Limits: Definitions are not standardized!
LOQ, Limit Of Quantitation, 10 x Noise. Reporting Limit, 6 x Noise. LOD, Limit Of Detection, 3 x Noise. Got it! Blank set to 0.
CYA, only useful in litigation.
Repeated measurement of values in here can produce a usable estimate. This is similar to signal averaging.
If you are not comfortable with your result, do not report 0: report nothing!
Figure: scale in Units (Standard Deviation of the Blank).
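The signal averaging remark can be made concrete with the standard result for the mean of n repeated measurements. This assumes the measurements are independent and share the same Noise, which the slides do not state explicitly.

```latex
s_{\bar{x}} = \frac{S_{BLANK}}{\sqrt{n}}
\qquad \text{e.g. } n = 9 \;\Rightarrow\; s_{\bar{x}} = \frac{S_{BLANK}}{3}
```

Under that assumption, averaging nine repeats cuts the Noise on the mean to one third, which is why repeated measurement of a low value can produce a usable estimate where a single reading cannot.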

Questions?