Magruder Statistics & Data Analysis

Caution: There will be Equations!

Program Model: based closely on The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories, 2006 (IHP), by Michael Thompson, Stephen L. R. Ellison and Roger Wood.
Supported by the AMC (Analytical Methods Committee of the Royal Society of Chemistry).
Uses ISO statistical models: ISO 13528:2005 and ISO 5725-2:1994.
Robust statistics are used as described in the IHP and ISO 13528.
Duplicate analysis supports method precision calculations.
Proficiency testing is often required for laboratory accreditation.
Independent documentation on how it all works: the IHP is free!
Makes full use of web-based data transfer.

Magruder Proficiency Testing Reports Overview
True Proficiency Testing: analyte reports and report cards using your method of choice. Support for guarantees. Support for IAs.
Individual Method Proficiency Testing: method reports and report cards.
Method Precision Data: duplicates allow calculation of repeatability and reproducibility. Method precision is calculated for each sample run.
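The slide says duplicates allow repeatability and reproducibility to be calculated, but does not show the calculation. As a sketch, assuming each laboratory reports one duplicate pair and the classical balanced one-way ANOVA of ISO 5725-2 is applied (Magruder's own procedure may differ, for example by using robust estimates), the two precision figures could be computed as below. The function name `precision_from_duplicates` and the example data are hypothetical.

```python
import math
import statistics

def precision_from_duplicates(pairs):
    """Repeatability s_r and reproducibility s_R from one duplicate pair per lab.

    Sketch of the classical balanced design of ISO 5725-2 with n = 2 replicates.
    """
    p = len(pairs)
    if p < 2:
        raise ValueError("need results from at least two laboratories")
    # Within-lab: each duplicate pair contributes d^2 / 2 to the pooled variance.
    s_r2 = sum((a - b) ** 2 for a, b in pairs) / (2 * p)
    # Between-lab: variance of the lab means, minus the part due to repeatability.
    means = [(a + b) / 2 for a, b in pairs]
    s_d2 = statistics.variance(means)
    s_L2 = max(s_d2 - s_r2 / 2, 0.0)  # truncate a negative estimate at zero
    s_R2 = s_L2 + s_r2
    return math.sqrt(s_r2), math.sqrt(s_R2)

# Hypothetical duplicate results from three laboratories
s_r, s_R = precision_from_duplicates([(10.0, 10.2), (10.4, 10.6), (9.8, 10.0)])
print(f"s_r = {s_r:.4f}, s_R = {s_R:.4f}")
```

Reproducibility always includes repeatability (s_R is never smaller than s_r), because it adds the between-laboratory component to the within-laboratory scatter.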

Magruder Check Sample Program Robust Statistics
The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories, 2006.
ISO 13528, Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparisons, 2005: Algorithm A.

Why Robust Statistics?
Most real-world data distributions do not follow the Normal (Gaussian) model; they are more like contaminated Normals. These distributions have fat tails and outliers that shift the mean and inflate the standard deviation (the Normal estimators are very sensitive!). Even outliers contain information, so we need to weight them properly. We need a robust estimate of location for the data center, a robust estimate of the data dispersion, and a way to identify and weight the reliable data. John Tukey, Peter Huber and Frank Hampel are credited with founding the discipline, which dates from Tukey's landmark 1960 paper: Tukey, J. W. (1960). A survey of sampling from contaminated distributions.
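A small illustration of that sensitivity, using invented data (seven results near 5.0 plus one gross outlier): the mean and standard deviation move a lot, while the median and the scaled median absolute deviation (MAD, multiplied by 1.483 so that it estimates the SD for Normal data) barely move. The data are hypothetical and chosen only to make the contrast visible.

```python
import statistics

# Hypothetical results: seven reliable values near 5.0 plus one gross outlier.
results = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 9.5]

# Classical (Normal) estimators: both are pulled by the single outlier.
mean = statistics.mean(results)
sd = statistics.stdev(results)

# Simple robust counterparts: median and scaled MAD.
median = statistics.median(results)
mad = statistics.median([abs(x - median) for x in results])
s_mad = 1.483 * mad  # scaling makes MAD comparable to an SD for Normal data

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"median = {median:.3f}, scaled MAD = {s_mad:.3f}")
```

One bad value moves the mean by more than half a unit and inflates the SD roughly tenfold, while the median stays at 5.0.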

But First: We must remove The Pathological Data!

[Figure: Robust Statistics, per Frank Sikora, 2015. Raw data with the average and ±2 sd limits: outliers exert unwarranted influence on the average, yet they need fair representation.]

[Figure: Robust Statistics, per Frank Sikora, 2015. Raw versus robust average and ±2 sd limits.]

[Figure: Robust Statistics, per Frank Sikora, 2015. Fat tails lie outside the robust ±2 sd limits, where results are marked with Z (z-score) flags.]

[Figure: Contaminated Normal. The observed distribution consists of reliable data plus contamination, which shows up as fat tails; x axis in SD units from -8 to 8.]

Calculating Robust Statistics
We use Peter Huber's H15 method and winsorize the data. The process sequentially brings the outer data in towards the median, down-weights outliers and fat tails, and draws the data towards a reliable standard Normal. Iterate this process until the mean converges.
The new mean, X_a, is the robust estimate of location for the data center (the Assigned Value).
The new standard deviation, σ_rob, is a fit-for-purpose robust estimate of the data dispersion.
The uncertainty in X_a, U_a, is calculated from σ_rob and the number of results n.
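A minimal sketch of the iterative winsorization described above, written with the constants that ISO 13528 publishes for its Algorithm A, a Huber-type estimator of this kind: start from the median and 1.483 × MAD, clip results at x* ± 1.5 s*, recompute the mean and 1.134 × SD of the clipped data, and repeat until both settle. It is not presented as Magruder's production code; the function name `algorithm_a`, the tolerance, and the example data are assumptions for illustration.

```python
import statistics

def algorithm_a(data, tol=1e-6, max_iter=100):
    """Robust mean x* and SD s* by iterative winsorization (ISO 13528 Algorithm A style)."""
    x = list(data)
    if len(x) < 2:
        raise ValueError("need at least two results")
    # Initial robust estimates: median and scaled median absolute deviation.
    x_star = statistics.median(x)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in x])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        lo, hi = x_star - delta, x_star + delta
        # Winsorize: pull values beyond x* +/- delta in to the boundary.
        w = [min(max(v, lo), hi) for v in x]
        new_x = statistics.mean(w)
        # 1.134 corrects the SD of clipped Normal data back to the full SD.
        new_s = 1.134 * statistics.stdev(w)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star

# Hypothetical results with one gross outlier
data = [4.9, 5.0, 5.0, 5.0, 5.1, 9.0]
x_a, s_rob = algorithm_a(data)
print(f"raw mean = {statistics.mean(data):.3f}, X_a = {x_a:.3f}, sigma_rob = {s_rob:.3f}")
```

The outlier still contributes, but only at the clipping boundary, which is the "weight it properly" idea from the previous slide.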

Tools I use: data (red) on a kernel density envelope, with the Normal curve (grey).
[Figure: Soluble Potash. Kernel density of the data and of the winsorized data, with the Normal and robust Normal curves; x axis from 4.6 to 6.1.]

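Kernel Density Plot
Let's call it a smoothed histogram:
$$\hat{f}(X, h) = \frac{1}{nh}\sum_{i=1}^{n} \phi\left(\frac{X - X_i}{h}\right)$$
where φ is the standard Normal density function, h is the bandwidth, n is the number of results and X_i are the individual results.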
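The kernel density formula can be evaluated directly. This sketch uses the Gaussian kernel φ from the formula with a fixed bandwidth h; the slide does not say how the bandwidth is chosen, so h is simply a parameter here, and the data and h = 0.15 in the usage lines are hypothetical.

```python
import math

def gaussian_kde(data, h):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i) / h) for a fixed bandwidth h."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)

    def phi(u):
        # Standard Normal density function
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def f(x):
        return sum(phi((x - xi) / h) for xi in data) / (n * h)

    return f

# Hypothetical results and bandwidth
f = gaussian_kde([5.2, 5.4, 5.5, 5.5, 5.9], h=0.15)
# A density should integrate to about 1; check with a simple Riemann sum.
dx = 0.001
area = sum(f(4.0 + k * dx) for k in range(int(3.0 / dx))) * dx
print(f"f(5.5) = {f(5.5):.3f}, area = {area:.4f}")
```

A smaller h follows individual results more closely; a larger h gives a smoother envelope, much like choosing wider histogram bins.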

[Figure: Soluble Potash, same plot. Winsorizing squeezes some data points in, and then the robust Normal is calculated.]

Tools I use: data quantiles. Compare raw data (green), robust data (red) and Normal quantiles (blue) in a QQ plot for Soluble Potash. The raw data are ranked, the robust data are ranked, and the Normal quantiles are calculated; all are plotted against the rank-based Z value (the Normal theoretical quantiles, from -5 to 5). The sweet spot! Where the curves overlap is the reliable data.

Normal QQ plot of random Normal data (zero-centred, SD = 1). The blue line: replace the y axis with normalized data values,
$$X_a + Z\,\sigma_{rob},$$
which for this zero-centred, unit-SD data is simply $0 + Z \times 1$.
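The QQ plot construction on these slides can be sketched as follows. Two details are assumptions, since the slides do not specify them: the rank-based Z value uses the plotting position (i + 0.5)/n for zero-based rank i, and the robust data are the raw results winsorized at X_a ± 1.5σ_rob, the clipping step used in ISO 13528 Algorithm A. X_a and σ_rob are passed in, for example from a robust estimation step; the function name `qq_points` and the usage data are hypothetical.

```python
from statistics import NormalDist

def qq_points(data, x_a, s_rob, c=1.5):
    """Return (z, raw, robust, normal_line) tuples, one per rank."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, sigma 1
    raw = sorted(data)
    # Robust data: results winsorized at x_a +/- c * s_rob (assumed c = 1.5).
    lo, hi = x_a - c * s_rob, x_a + c * s_rob
    robust = sorted(min(max(v, lo), hi) for v in data)
    points = []
    for i in range(n):
        # Rank-based z from plotting position (i + 0.5) / n (an assumed convention).
        z = std_normal.inv_cdf((i + 0.5) / n)
        # Blue reference line: X_a + Z * sigma_rob.
        points.append((z, raw[i], robust[i], x_a + z * s_rob))
    return points

# Hypothetical results, assigned value and robust SD
for z, raw, rob, line in qq_points([5.2, 5.4, 5.5, 5.5, 5.9, 7.1], x_a=5.5, s_rob=0.2):
    print(f"z = {z:+.2f}  raw = {raw:.2f}  robust = {rob:.2f}  normal = {line:.2f}")
```

Where the raw, robust and normal columns agree is the "sweet spot"; ranks where the raw column pulls away from the normal line are the fat tails.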

[Figure: Acid Soluble Iron. Data (red) on a kernel density envelope with the Normal curve (grey); x axis from 0.27 to 2.1. Winsorizing squeezes some data points in, and the robust Normal is calculated.]

[Figure: QQ Plot for Acid Soluble Iron. Raw data, robust data and Normal quantiles against the rank-based Z value, from -5 to 5.]

In summary, from the Huber H15 process we now have:
An Assigned Value, X_a (robust measure of location).
A fit-for-purpose standard deviation, σ_rob (robust measure of dispersion).
An estimate of the uncertainty in the assigned value, U_a.
All based on the reliable data.
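With X_a and σ_rob in hand, each result can be scored. The slides do not give Magruder's report-card formula, so this sketch assumes the usual proficiency-testing z-score, z = (x - X_a)/σ, with σ_rob as the standard deviation, and the widely used bands |z| ≤ 2 satisfactory, 2 < |z| < 3 questionable, |z| ≥ 3 unsatisfactory. The function names, laboratory names and values are hypothetical.

```python
def z_score(x, x_a, sigma):
    """z = (x - X_a) / sigma."""
    return (x - x_a) / sigma

def classify(z):
    """Conventional proficiency-testing bands for a z-score."""
    a = abs(z)
    if a <= 2:
        return "satisfactory"
    if a < 3:
        return "questionable"
    return "unsatisfactory"

# Hypothetical assigned value, robust SD and laboratory results
x_a, s_rob = 5.03, 0.12
for lab, x in {"Lab A": 5.10, "Lab B": 5.35, "Lab C": 9.0}.items():
    z = z_score(x, x_a, s_rob)
    print(f"{lab}: z = {z:+.2f} ({classify(z)})")
```

Because σ_rob is estimated from the reliable data, a gross outlier such as Lab C receives a very large z-score instead of hiding inside an SD that it has inflated itself.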

Sulfur Analysis in 150611: QQ Plots Reveal a Problem (10% Guarantee & 5% Guarantee)

[Figure: Elemental Sulfur (5%) QQ plot. Raw data, robust data and Normal quantiles; y axis from 4.7 to 10.6. Where's the sweet spot??]

[Figure: Sulfate Sulfur (5%) QQ plot. Raw data, robust data and Normal quantiles; y axis from 4.9 to 8.6.]

[Figure: Total Sulfur (10%) QQ plot. Raw data, robust data and Normal quantiles; y axis from 4.5 to 11.6.]

[Figure: Total Sulfur (10%) QQ plot, adjusted. Raw data, robust data and Normal quantiles; y axis from 7 to 12.]

Imagine if the discrepancy were not so obvious: it would not be statistically discernible! It is vitally important for clients to submit data for the CORRECT analyte with the CORRECT method code!

Reporting Data Below the LOD: A Word About Detection Limits

Detection Limits (y axis in units of the standard deviation of the blank, from -3 to 11). Definitions are not standardized!
Blank, set to 0: establishes the noise of the instrument or method. Let's call S_BLANK = Noise.
LOD (Limit Of Detection), 3 × Noise: got it! Above the noise, but still a 50% chance of a false negative.
Reporting Limit, 6 × Noise: protects against false negatives.
LOQ (Limit Of Quantitation), 10 × Noise: safe limit for reporting reliable quantities.
Below the LOD: CYA, only useful in litigation. Repeated measurement of values in this region can produce a usable estimate; this is similar to signal averaging. If you are not comfortable with your result, do not report 0: report nothing!
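The multipliers on the slide become numbers once S_BLANK is estimated. This sketch assumes blank-corrected readings (blank set to 0), estimates S_BLANK as the sample standard deviation of replicate blanks, and models measurement noise as Normal. Under that model it reproduces the 50% false-negative rate for a sample sitting at the LOD and shows why the 6 × Noise reporting limit protects against false negatives. It also shows the signal-averaging effect: the mean of m replicates has noise S_BLANK/√m. The function names and blank readings are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def detection_limits(blank_readings):
    """Return (limits, noise) using the slide's multipliers of S_BLANK."""
    # Assumes blank-corrected readings, so the blank is set to 0.
    noise = statistics.stdev(blank_readings)  # S_BLANK
    limits = {"LOD": 3 * noise, "Reporting Limit": 6 * noise, "LOQ": 10 * noise}
    return limits, noise

def false_negative_prob(true_value, threshold, noise):
    """Chance that a sample at true_value reads below threshold (Normal noise)."""
    return NormalDist(mu=true_value, sigma=noise).cdf(threshold)

def averaged_noise(noise, m):
    """Noise of the mean of m replicate measurements (signal averaging)."""
    return noise / math.sqrt(m)

# Hypothetical blank-corrected blank readings
blanks = [0.02, -0.01, 0.00, 0.03, -0.02, 0.01, -0.03]
limits, noise = detection_limits(blanks)
for name, value in limits.items():
    p = false_negative_prob(value, limits["LOD"], noise)
    print(f"{name}: {value:.4f}, P(reads below LOD) = {p:.4f}")
print(f"noise of a 4-replicate mean: {averaged_noise(noise, 4):.4f}")
```

A sample exactly at the LOD reads below it half the time; a sample at the reporting limit sits 3 noise units above the LOD, so the chance drops to about 0.0013.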

Questions?