John Zorich, MS CQE


Used In Non-clinical Inspection & Test In the Pharma & Med-device Industries
John Zorich, MS CQE
www.johnzorich.com
JOHNZORICH@YAHOO.COM

Outline of topics:
- Regulatory Requirements
- Basic Vocabulary
- Confidence Intervals
- Statistical Process Control (SPC)
- Confidence / Reliability Calculations
- Process Capability Indices
- Mean Time Between Failures
- Tests of Statistical Significance
- QC Sampling Plans
- Examples of Valid Statistical Rationale statements

Regulatory Requirements

21CFR211 (2016): "Stability testing ... shall include ... sample size ... based on statistical criteria ..."

FDA's Guidance for ... Process Validation (2011): "The number of samples should be adequate to provide sufficient statistical confidence of quality both within a batch and between batches. The confidence level selected can be based on risk analysis as it relates to the particular attribute under examination."

ICH Harmonised Tripartite Guideline, Quality Risk Management, Q9 (2005): the following statistical tools are recommended "to support and facilitate quality risk management": [statistical process] control charts, process capability analysis.

FDA's "GMP" (21CFR820.250): "Sampling plans ... shall be ... based on a valid statistical rationale ..."

FDA's "Medical Device Quality Systems Manual": "... all sampling plans have a built-in risk of accepting a bad lot. This sampling risk is typically determined ... by ... the 'operating characteristic' [OC] curve, [which] ... can be used to determine the risk a sampling plan presents ... A manufacturer shall be prepared to demonstrate the statistical rationale for any sampling plan used."

ISO 13485:2016 (7.3.6, Design & Development Verification; 7.3.7, Design & Development Validation): "Records of the results and conclusions ... including, as appropriate, rationale for sample size ... shall be maintained."

Basic Vocabulary

A Statistic is a mathematical summary value calculated from data taken from a Sample. All of the following are statistics:
- Average thickness of every 100th cable produced last week
- Range of thicknesses in that sample
- Median thickness in that sample

A Parameter is a mathematical summary value calculated from data taken from the entire Population, that is, from every data point in the entire population (e.g., average thickness of all cables produced last week).

"Statistics" as a science is the mathematical analysis of "statistics", not of parameters. Statistics is the science of using "statistics" to guesstimate "parameters".

A Sample is part of a Population. The purpose of obtaining information about the sample statistic (e.g., Average, Std Deviation, or % in-spec) is that we trust it will give us information about the population parameter (which is what we really care about).

Confidence Intervals (they are the basis of most statistical methods)

95% Confidence Interval (& Limits)

A 95% confidence interval is an interval around the observed mean of a sample, in which interval you can expect (95% of the time) to find the true mean (= parameter) of the population from which the sample was taken. Confidence Limits are the values on either end of the interval.

[Diagram: the Sample Mean, with confidence limits on either side of it.]

Where is the true mean (= the Parameter Mean)? Answer: somewhere in the Confidence Interval (at the chosen confidence level, typically 95%).

[Diagram: confidence intervals around the Sample Mean, based on a LARGE, a MEDIUM, and a SMALL sample size; the smaller the sample, the wider the interval.]

The choice of sample size is arbitrary, based upon how narrow you want the confidence interval (i.e., how accurately you want to know the parameter). Because confidence limits are automatically adjusted based upon sample size, ANY sample size is valid!

Confidence Limits for a Sample Average:

Standard Error of the (sample) Mean (estimated from 1 sample) = Sample Standard Deviation / sqrt( Sample Size )

The 95% confidence interval for the Population Mean can be estimated using this equation:

Sample Average +/- "t" x (Std Error of Mean)
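The equation above can be sketched in pure Python. The readings below, and the hard-coded critical value t = 2.262 (the two-sided 95% value for 9 degrees of freedom, taken from a standard t-table), are hypothetical illustrations, not values from the slides:

```python
import statistics

def mean_confidence_interval(data, t_crit):
    """Two-sided confidence interval for the population mean:
    sample average +/- t * (standard error of the mean)."""
    n = len(data)
    avg = statistics.mean(data)
    std_err = statistics.stdev(data) / n ** 0.5   # s / sqrt(n)
    return avg - t_crit * std_err, avg + t_crit * std_err

# Hypothetical sample of 10 thickness readings; t = 2.262 is the
# two-sided 95% critical value for n - 1 = 9 degrees of freedom.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
low, high = mean_confidence_interval(readings, t_crit=2.262)
# roughly (9.87, 10.13) around the sample average of 10.0
```

Note that the half-width of the interval shrinks as n grows, both through the sqrt(n) in the standard error and through the smaller t value at higher degrees of freedom; that is the mechanism behind "any sample size is valid" for confidence-limit reporting.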

The t-table provides the "t" used to calculate Confidence Intervals; for 2-sided intervals, use A = 1 - Confidence. For 1-sided confidence interval limits, use A = 2 x (1 - Confidence). "v" or "d.f." (= degrees of freedom) is commonly equal to the denominator in the calculation of the relevant standard deviation (e.g., the classic sample standard deviation has n - 1 = Sample_Size - 1 in its denominator).

Confidence Limits for Sample Proportions (%)

Over a dozen different methods exist for calculating confidence limits for a sample % defective; each of those methods gives a different length and different confidence limits! The classic method is called the "Exact" binomial; it can be calculated via Excel's "Beta" function, for example:

UPPER 2-tailed Confidence Limit: =BETA.INV( 1 - (1 - C) / 2, k + 1, N - k )
LOWER 2-tailed Confidence Limit: =BETA.INV( (1 - C) / 2, k, N - k + 1 )

where C = Confidence, N = Sample size (e.g., 100), and k = observed number of items of interest (e.g., defectives).

95% confidence limits for 10 observed defects in a sample of size 100:
=BETA.INV( 1 - (1 - 0.95) / 2, 10 + 1, 100 - 10 ) = 0.176 = 17.6%
=BETA.INV( (1 - 0.95) / 2, 10, 100 - 10 + 1 ) = 0.049 = 4.9%
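The same "Exact" (Clopper-Pearson) limits can be reproduced without Excel. The sketch below (function names are mine) finds them by bisection on the binomial CDF, which is the definition underlying the BETA.INV formulas:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target):
    """Bisection on p in (0, 1) for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_limits(k, n, conf):
    """Two-sided 'exact' (Clopper-Pearson) confidence limits for the
    sample proportion k/n; same answers as the BETA.INV formulas."""
    alpha = 1 - conf
    # Upper limit: the p at which P(X <= k) = alpha/2
    upper = solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    # Lower limit: the p at which P(X >= k) = alpha/2,
    # i.e. P(X <= k - 1) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p),
                                     1 - alpha / 2)
    return lower, upper

lower, upper = exact_binomial_limits(k=10, n=100, conf=0.95)
# matches the slide: lower = 0.049 (4.9%), upper = 0.176 (17.6%)
```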

One-sided confidence limits are calculated like so:

UPPER 1-tailed Confidence Limit: =BETA.INV( C, k + 1, N - k )
LOWER 1-tailed Confidence Limit: =BETA.INV( 1 - C, k, N - k + 1 )

95% 1-sided limits for 10 observed defects when N = 100:
upper = BETA.INV( 0.95, 10 + 1, 100 - 10 ) = 0.16372
lower = BETA.INV( 1 - 0.95, 10, 100 - 10 + 1 ) = 0.05526

Statistical Process Control (SPC)

Harold F. Dodge helped invent SPC while working in the QA department at Bell Laboratories from 1917 to 1958. Of his early experiences Dodge wrote: "Initially, the basic procedures for variables called for samples of four, with one [SPC] chart for the average, and another for the standard deviation... We proposed shop use of samples of five instead of four; it is easier to divide by five than by four." That text is from: http://www.asq.org/join/about/history/dodge.html

SPC Xbar charts work with normally distributed data at ANY sample size, because the Control Limits are simply 2-sided 99% confidence limits on the current process mean.

[Chart: three Xbar charts of the same normally distributed data, plotted with subgroup sizes n = 2, n = 4, and n = 10.]

Confidence / Reliability Calculations, where "Reliability" means % in-specification (these are a type of process capability calculation).

Sample Sizes: because the output of a reliability/confidence calculation is a lower 1-sided confidence limit, any sample size is valid; the smaller the sample size, the farther below the observed %-in-spec is the %-in-spec that you can claim.

[Diagram: a scale from 90% to 100%; the confidence limit based upon a small sample sits farther below the % reliability observed in the sample than the confidence limit based upon a large sample.]

ATTRIBUTE DATA (i.e., pass/fail testing, not measurement results):

Reliability = BETA.INV( 1 - C, N - F, F + 1 )

where BETA.INV = Microsoft Excel function, C = confidence desired (expressed as a decimal fraction), N = sample size, and F = # of failures seen in the sample. That formula outputs the lower 1-tailed "exact" binomial confidence limit on the % in-specification observed in the sample.

If no failures in a sample of 299, then 95% confidence in...
=BETA.INV( 1 - 0.95, 299 - 0, 0 + 1 ) = 0.99 = 99% reliability
If 2 failures in a sample of 30, then 95% confidence in...
=BETA.INV( 1 - 0.95, 30 - 2, 2 + 1 ) = 0.80 = 80% reliability

Why does a sample of 299, with zero failures, equal 95% confidence of at least 99% reliability? A reliability calculation on a binomial proportion is, in effect, a lower 1-sided confidence limit on the observed proportion. It's the lower-most edge of the interval in which we predict we will find the true ("parameter") proportion.

[Diagram: we are 95% sure that the Parameter is somewhere in the interval from 99% up to the Sample Statistic of 100%; the lower 1-tailed 95% Confidence Limit, when N = 299 and no failures are found in the sample, is 99%. For reliability, we get to claim the worst value in that interval (in this case, 99%).]
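A minimal pure-Python check of those two worked examples (function names are mine). The bisection searches for the lower 1-sided limit directly from the binomial tail probability, which is equivalent to the BETA.INV formula above:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def reliability(n, f, conf):
    """Lower 1-sided exact binomial confidence limit on the in-spec
    proportion; equivalent to Excel's =BETA.INV(1 - C, N - F, F + 1).
    Finds the in-spec proportion p at which observing f or fewer
    failures in n trials has probability exactly 1 - conf."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        p = (lo + hi) / 2
        # probability of seeing f or fewer failures when the true
        # in-spec proportion is p (failure rate 1 - p)
        tail = binom_cdf(f, n, 1 - p)
        if tail > 1 - conf:
            hi = p
        else:
            lo = p
    return (lo + hi) / 2

r299 = reliability(299, 0, 0.95)   # 0 failures in 299: about 0.99
r30 = reliability(30, 2, 0.95)     # 2 failures in 30:  about 0.80
```

For the zero-failure case the limit also has a closed form, (1 - C) ** (1 / N), which is where the well-known "sample 299 with zero failures for 95/99" rule comes from.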

VARIABLES DATA, Normally Distributed (or transformed to Normality): use a K-factor table, a.k.a. a Normal Tolerance Factor table or a Statistical Tolerance Factor table.

The procedure is:
- Calculate the Observed K
- Compare the Observed K to the K in the K-table

Observed K = number of Std Deviations that the Process Mean is from the nearest side of a 1- or 2-sided specification, i.e.:

Observed K = | SmplAvg - NearestSpecLimit | / SmplStdDev

Example, from the K-table in Juran's Quality Handbook: for confidence = 95%, reliability = 99%, and sample size = 15, K = 3.520. If the "observed K" is at least 3.520, and the population is "Normally Distributed", and the sample size is 15, then we are 95% confident that the Lot from which the sample came has at least 99% in-spec parts.

Why does a sample of 15, whose average is 3.52 std deviations away from a 1-sided QC spec, equal 95% confidence of at least 99% reliability? A reliability calculation on a sample from a Normal population is, in effect, a lower 1-sided confidence limit on the observed % in-spec.

[Diagram: we are 95% sure that the Parameter % in-spec is somewhere in the interval from 99% up to the observed 99.98%. For reliability, we get to claim the worst value in that interval (in this case, 99%).]

When the sample Avg is 3.52 stdevs from the one-sided spec, we can claim 99% reliability at 95% confidence. The observed Sample Statistic is =NORMSDIST(3.52) = 99.98% in-spec, but no confidence can be claimed for that statement! (This requires normally distributed data, or data that has been transformed to normality.)
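The point estimate in that argument can be checked with Python's standard library; the contrast between it and the defensible K-table claim is the point of the slide:

```python
from statistics import NormalDist

# Point estimate of % in-spec when the sample average sits 3.52 sample
# standard deviations inside a one-sided spec limit (Excel: NORMSDIST):
point_estimate = NormalDist().cdf(3.52)   # about 0.9998, i.e. 99.98%

# But that 99.98% is only the sample statistic. Per the K-table
# (n = 15, K = 3.520), the claim we can defend is the lower confidence
# limit: 99% reliability at 95% confidence.
```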

Process Capability Indices

Cpk = ( NSL - ProcessAvg ) / ( 3 x StdDev )

where:
NSL = QC specification closest to the Process Average
ProcessAvg = average of the sample data being analyzed
StdDev = estimate of the standard deviation of the population from which the sample was taken (based on data from all samples being analyzed for Cpk)

When Cpk alone is reported to a regulatory agency, the conclusions based upon the Cpk are suspect if a small sample size was used, because a Cpk is just a statistic: where's the parameter?

Cpk with confidence (applies to 1- and 2-sided QC specs): if you report the LCL (lower confidence limit) for Cpk, then any sample size is valid. For a lower 1-sided 95% limit:

LCL = Cpk - NORMSINV(0.95) x sqrt( 1/(9n) + Cpk^2 / (2(n - 1)) )

where NORMSINV is an Excel function, and "n" refers to the total # of raw data points combined from all samples in the data set being evaluated.

If Cpk = 1.33, then the lower 1-sided 95% confidence limit is 1.17 for n = 100, and 1.00 for n = 25.
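The worked numbers above are consistent with the commonly used approximate lower confidence limit for Cpk (often attributed to Bissell); a sketch, assuming that approximation is the one intended by the slide:

```python
from math import sqrt
from statistics import NormalDist

def cpk_lower_limit(cpk, n, conf=0.95):
    """Approximate lower 1-sided confidence limit for Cpk:
        Cpk - z * sqrt( 1/(9n) + Cpk^2 / (2(n - 1)) )
    where z = NORMSINV(conf) and n is the total number of raw data
    points used to compute the Cpk."""
    z = NormalDist().inv_cdf(conf)   # mirrors Excel's NORMSINV
    return cpk - z * sqrt(1 / (9 * n) + cpk**2 / (2 * (n - 1)))

# The slide's worked numbers:
lcl_100 = cpk_lower_limit(1.33, n=100)   # about 1.17
lcl_25 = cpk_lower_limit(1.33, n=25)     # about 1.00
```

Note how the observed Cpk of 1.33 supports a much weaker claim (1.00) at n = 25 than at n = 100, which is exactly why reporting the LCL rather than the raw Cpk makes any sample size defensible.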

Mean Time Between Failures (e.g., for capital equipment)

MTBF = Mean Time Between Failures. Device failures are indicated by X on the timeline; each device was in service 500 hours:

Device #1: X X (500 hours)
Device #2: X (500 hours)
Device #3: (500 hours)
Device #4: X X (500 hours)

After a failure, the device is quickly repaired and put back into service.

MTBF = (500 + 500 + 500 + 500) / 5 failures = 400 hours

But that's a statistic: where's the parameter? When reporting only a statistic, it is impossible to justify the sample size.

LOWER 1-SIDED CONFIDENCE LIMITS for MTBF, from studies carried out for a pre-determined time period. Any sample size is valid, because we are using confidence limits:

Lower Conf. Limit = ( 2 x T ) / CHIINV( 1 - Conf, ( 2 x F ) + 2 )

where:
T = total in-service test time, all devices combined ( = LengthOfStudy x NumberOfDevicesInStudy ). For this calculation, these are identical: T = 500 hours x 4 devices; T = 50 hours x 40 devices; T = 5 hours x 400 devices.
CHIINV = the Excel function
Conf = desired confidence (as a decimal fraction)
F = total number of failures, all devices combined

e.g., ( 2 x 2000 ) / CHIINV( 1 - 0.95, ( 2 x 5 ) + 2 ) = 190 hours
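The CHIINV-based limit can be reproduced without Excel. Because 2F + 2 is always even, the chi-square quantile can be computed from the Poisson-sum identity plus bisection (a sketch; function names are mine):

```python
from math import exp

def chi2_cdf_even(x, df):
    """Chi-square CDF for even df, via the Poisson-sum identity:
    F(x; 2m) = 1 - exp(-x/2) * sum_{i=0..m-1} (x/2)^i / i!"""
    m = df // 2
    term, total = 1.0, 1.0              # i = 0 term
    for i in range(1, m):
        term *= (x / 2) / i
        total += term
    return 1 - exp(-x / 2) * total

def mtbf_lower_limit(total_time, failures, conf=0.95):
    """Lower 1-sided confidence limit for MTBF:
    2T / CHIINV(1 - conf, 2F + 2), with the chi-square quantile
    found by bisection on chi2_cdf_even."""
    df = 2 * failures + 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    chi_crit = (lo + hi) / 2            # Excel's CHIINV(1 - conf, df)
    return 2 * total_time / chi_crit

# The slide's example: T = 2000 device-hours, 5 failures, 95% confidence
mtbf_lcl = mtbf_lower_limit(2000, 5)    # about 190 hours
```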

Tests of Statistical Significance

Confidence Interval explanation of t-tests (mathematically identical to classic t-tests):

[Diagram: a 95% confidence interval (from a LARGE sample) around the Sample Value, with the Null Hypothesis Value outside the interval.]

The fact that the Null Hypothesis is outside the confidence interval means the t-test had a significant result. It is much more difficult to obtain a significant result with a small sample (= large conf. interval) than with a large sample (= small conf. interval), and therefore any sample size is valid when claiming significance.

[Diagram: a 95% confidence interval (from a SMALL sample) around the Sample Value, with the Null Hypothesis Value inside the interval.]

The fact that the Null Hypothesis is NOT outside the confidence interval means the t-test had a non-significant result. It is much easier to obtain a non-significant result with a small sample (= large conf. interval) than with a large sample (= small conf. interval), and therefore NOT all sample sizes are valid when claiming non-significance, unless justified based upon the classic method of Power.

Testing for (practical) Non-inferiority (use this instead of non-significance and Power):

[Diagram: a performance scale running from WORSE to BETTER, with C below B (yours), and B below A (other).]

Assume that A is the other company's product, that B is your company's product, and that C is the first value lower than A that is considered worse than A in a practical sense. If B is statistically significantly larger than C, then you can claim that your product B is non-inferior ("substantially equivalent"?) to product A in a practical sense, irrespective of whether B is statistically different from A. Therefore any sample size is valid for claiming non-inferiority, IF it shows a significant difference between B and C.

QC Sampling Plans

The %AQL of an AQL sampling plan is the product quality (= lots having that % defective) that the sampling plan will accept (= approve) almost all of the time. The %LQL of an LQL (a.k.a. LTPD, RQL, or UQL) sampling plan is the product quality (= lots having that % defective) that the sampling plan will reject almost all of the time.

Predicting pass rates:

[Chart: OC curve for a 4% AQL sampling plan (ANSI Z1.4); x-axis = Lot % Defective (0% to 20%), y-axis = probability of acceptance (0% to 100%). Plan: Lot size N = 1000, sample size n = 80, accept number c = 7. The sample size is mandated by the Sampling Plan booklet, based upon the Lot Size.]

[Chart: OC curve for a 4% AQL, C=0 sampling plan (Squeglia's 4th & 5th ed.); x-axis = Lot % Defective (0% to 20%), y-axis = probability of acceptance. Plan: Lot size N = 1000, sample size n = 15, accept number c = 0. The sample size is mandated by the Sampling Plan booklet, based upon the Lot Size.]

[Chart: the previous two OC curves combined (ASQC-Z1.4 4% AQL vs. C=0 4% AQL), with 14% LQL marked.]

Choose sampling plans that have an LQL (= LTPD) that supports your Risk Management statements.
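OC curves like the ones described above come straight from the binomial distribution; a sketch (function name is mine) comparing the two 4% AQL plans, n = 80 / c = 7 versus n = 15 / c = 0:

```python
from math import comb

def p_accept(n, c, p):
    """One point on the OC curve: the probability that a lot with true
    fraction defective p is accepted by a single-sampling plan that
    allows at most c defectives in a random sample of n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# Two "4% AQL" plans, evaluated on a lot that is actually 4% defective:
z14_plan = p_accept(80, 7, 0.04)   # Z1.4-style plan (n=80, c=7)
c0_plan = p_accept(15, 0, 0.04)    # C=0 plan (n=15, c=0): (1 - p)^15
```

Note how differently the two "4% AQL" plans treat a 4%-defective lot: the n = 80, c = 7 plan accepts it well over 95% of the time, while the C=0 plan accepts it only about half the time, which is why the full OC curve (not the AQL label) should drive the choice of plan.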

Are the LQL levels of AQL plans consistent?

[Chart: OC curves for the ASQC-Z1.4 plan (general inspection level II, single sampling, normal severity, 4% AQL) for Lot Sizes of 1000, 500, and 100; x-axis = Lot % Defective (0% to 20%).]

After making the choice on the previous slide, be sure to determine the actual LQL for the planned Lot Size; adjust the AQL as needed to achieve the desired LQL.

Examples of Valid Statistical Rationale statements

CONFIDENCE INTERVALS: "We are 99% confident that the Population average is not smaller than 2.35 and not larger than 3.16 (where those 2 values are the lower and upper limits of the 2-sided 99% confidence interval calculated based upon the Sample data). We used a sample size of 20, which is valid because this result was based upon a calculation of confidence interval limits (any sample size is valid when using confidence limits)."

[Diagram: a number line showing 2.35, the Sample Avg, and 3.16.]

CONFIDENCE / RELIABILITY STATEMENTS: "We are 95% confident that the Population from which the Sample was taken has an in-specification % (i.e., a %Reliability) that is not less than 99.9%. We used a sample size of 8, which is valid because we used K-tables for our calculation, which are based on confidence limits, and therefore any sample size is valid."

[Diagram: 5.7 Standard Deviations between the QC Spec and the Sample Avg of 42.]

QC SAMPLING PLANS: "We are using a formal published Sampling Plan booklet purchased from the American Society for Quality (ASQ). Sample size is a function of lot size, as explicitly required by the instructions in the Sampling Plan booklet. The particular sampling plan we chose from that booklet is one that will control the Incoming QC rejection rate, so that the rejection rate is at least 90% for lots that have defective rates of 1.5% or greater (i.e., LQL = 1.5%), which is the rejection rate required by our Risk Management Plan documents. We have chosen to limit the range of allowed lot sizes that we purchase to 500-1000, in order to ensure that the rejection rate is stable no matter what lot size is received."