PTA Proficiency Testing for Metal Testing Laboratories
BRIGGS Philip (Proficiency Testing Australia, PO Box 7507, Silverwater, Rhodes NSW 2128, Australia)

Abstract: Proficiency testing is used internationally to provide objective evidence of testing laboratories' competence. This paper details the types of proficiency testing techniques used by Proficiency Testing Australia (PTA) for metal testing programs and explores the basic principles behind sample selection and program design needed to achieve an effective outcome. It also covers the analysis of results and the remedial and corrective actions necessary where problems are highlighted by a proficiency test. Proficiency testing provides objective evidence of testing laboratories' competence and complements the on-site assessment process. The effectiveness of such proficiency tests requires careful sample selection and preparation, program design, evaluation of results and, where necessary, appropriate follow-up action.

Key words: metal testing, proficiency testing, interlaboratory comparisons

1. INTRODUCTION
The competence of accredited laboratories is assessed by two complementary techniques. One is on-site evaluation by an assessment team, which examines the technical competence of the laboratories and their compliance with the requirements of ISO/IEC 17025 [1]. The other is proficiency testing, which determines laboratory testing performance by means of interlaboratory comparisons. The two techniques have their own advantages and, when combined, give a high degree of confidence in the overall competence of an accredited laboratory. PTA operates its proficiency testing programs in accordance with ISO Guide 43 [2] and is also accredited to the requirements of ILAC Guide 13 [3]. National PTA programs are conducted according to the procedures outlined in the Guide to Proficiency Testing Australia [4].
The international Asia Pacific Laboratory Accreditation Cooperation (APLAC) programs are conducted according to the procedures outlined in APLAC PT002 [5]. The effectiveness of PTA's proficiency testing activities is also regularly evaluated by its international counterparts as part of the mutual recognition process.
2. ADVANTAGES OF PROFICIENCY TESTING
For the accreditation body, it gives a transparent demonstration of its laboratories' technical competence; for the assessors, it assists in making decisions on a laboratory's proposed capabilities; for the laboratory, it can highlight otherwise undiscovered problems and can be used as a development and improvement tool.

3. AIMS OF PROFICIENCY TESTING
For testing laboratories the main aim of proficiency testing is to evaluate the participating laboratories' ability to competently perform the tests examined in the program. Secondary aims are to ensure that laboratories are: correctly interpreting documentary standards; correctly calculating results; and correctly reporting results.

4. TYPES OF PROFICIENCY TESTING
The main type of proficiency test used by PTA for chemical and mechanical testing of metal is the interlaboratory comparison, in which subdivided samples taken from a bulk sample are distributed to the participating laboratories, which test them concurrently. The consensus values are calculated from the results returned by the participating laboratories.

[Diagram: a bulk sample is subdivided and distributed to Lab 1 through Lab N; the returned results yield the consensus values.]
Figure I: Testing Interlaboratory Comparison

The main type of proficiency test used by PTA for non-destructive testing of metal is the reference comparison, which involves sequential distribution of reference test plates to the participating laboratories.

[Diagram: the PTA NDT reference test item circulates from PTA to Lab 1, back to PTA, to Lab 2, and so on through Lab N.]

Figure II: Testing Reference Comparison

Laboratories in the Asia-Pacific region also participate in international comparisons coordinated by regional bodies such as the Asia Pacific Laboratory Accreditation Cooperation (APLAC) [5] and the European co-operation for Accreditation (EA). Laboratories throughout the Asia-Pacific region are also invited to participate in International Measurement Evaluation Programs (IMEP), which are sponsored by the European Commission in partnership with the Institute for Reference Materials and Measurements (IRMM). These intercomparisons provide objective evidence to APLAC mutual recognition partners that collectively the member laboratories can achieve satisfactory results, thus giving confidence in the accreditation process.

5. EXAMPLES OF PTA PROGRAMS

5.1 PTA Metal Alloys Proficiency Testing Program
This program is designed for laboratories accredited for chemical compositional analysis of metals. The Round 16 program, testing bronze, was completed in August 2007 (PTA Report No. 549; refer www.pta.asn.au). This chemical testing program involved the distribution to each participating laboratory of one bronze alloy disc sample, 50 mm diameter x 10 mm thick. The following elements were to be tested in duplicate: Copper, Aluminium, Nickel, Iron, Tin, Lead, Zinc, Manganese, Silicon, Arsenic, Phosphorus, Cadmium and Chromium.
The bronze alloy samples were supplied by Hayes Metal Refineries Ltd, Auckland, New Zealand, and are described as a nickel-aluminium bronze alloy. The samples were cut into discs of 10 mm thickness from one stock bar 50 mm in diameter. Fifteen samples were selected at random and analysed four times each for Aluminium, Iron, Nickel, Tin, Lead, Zinc, Manganese, Silicon, Arsenic, Phosphorus, Cadmium, Chromium and Magnesium. For each set of analyses the samples were taken in random order. A spark optical emission spectrometer was used for the analyses. The homogeneity of the original sample was determined from the results of this data analysis; the results comply with the requirements of ASTM E826-90, Standard Practice for Testing Homogeneity of Materials for Development of Reference Materials. A total of 140 results were analysed in this program, of which 15 (10.7%) were outliers. It is planned that further rounds covering different alloy types will follow on a six-monthly basis, with the aim of regularly assessing laboratories' performance for the specified tests.

5.2 PTA Tensile Testing of Metals Proficiency Testing Program
This program was designed for laboratories accredited for mechanical testing of the tensile strength of metal. The program was completed in November 2007 (PTA Report No. 558; refer www.pta.asn.au). Laboratories were provided with four steel strip samples and were instructed to perform tests for thickness, 0.2% proof stress (non-proportional elongation) (Rp0.2), upper yield (ReH), lower yield (ReL), tensile strength (Rm) and percentage elongation after fracture (A%). The testing, recording and reporting were to be performed in accordance with AS 1391-2005, Metallic materials - Tensile testing at ambient temperature. Before the test pieces were distributed to participants, ten specimens from each sample were selected at random and tested by BlueScope Steel Limited, Australia.
This was done to assess the variability of the four samples to be used in the program. Results for thickness, 0.2% proof stress, tensile strength and percentage elongation after fracture were obtained for samples 1 and 3. Results for thickness, upper yield, tensile strength and percentage elongation after fracture were obtained for samples 2 and 4. A total of 213 results were analysed in this program, of which 26 (12.2%) were outliers.
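A variability assessment of this kind can be sketched as a one-way analysis of variance that compares between-specimen and within-specimen scatter. The sketch below is illustrative only (the function name and figures are hypothetical; it is not the ASTM E826 or BlueScope procedure):

```python
import statistics

def homogeneity_ratio(groups):
    """Ratio of between-specimen to within-specimen mean squares
    (one-way ANOVA). A ratio near 1 suggests the specimens are
    homogeneous; a large ratio flags real differences between them."""
    k = len(groups)            # number of specimens
    n = len(groups[0])         # replicate tests per specimen (balanced design)
    grand_mean = statistics.mean(x for g in groups for x in g)
    ms_between = n * sum((statistics.mean(g) - grand_mean) ** 2
                         for g in groups) / (k - 1)
    ms_within = sum(statistics.variance(g) for g in groups) / k
    return ms_between / ms_within

# Hypothetical duplicate tensile-strength results (MPa) for three specimens:
uniform = [[410.0, 410.5], [410.4, 410.1], [410.2, 410.6]]  # consistent
shifted = [[410.0, 410.5], [410.4, 410.1], [418.2, 418.6]]  # one specimen differs
```

A large ratio for a candidate batch would indicate the test items differ too much between themselves to attribute participants' scatter to the laboratories alone.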
It is planned that further rounds covering different mechanical tests on metals, including hardness testing, will follow on a yearly basis, with the aim of regularly assessing laboratories' performance for the specified tests.

5.3 PTA Non-Destructive Testing Proficiency Testing Program
This program is designed for laboratories accredited for non-destructive testing of metals. The program on magnetic particle inspection was completed in May 2007 (PTA Report No. 542; refer www.pta.asn.au). A total of six test pieces were available for distribution to laboratories participating in this program; each laboratory was supplied one test piece. The test piece material was carbon steel. The test piece configurations included plate butts, T butts, Y butts and pipes, which contained weld and plate discontinuities. Prior to the commencement of the proficiency testing program, each test piece was sent to BlueScope Steel for preliminary testing. This preliminary testing indicated that the test pieces were suitable for use in the program: the defects were determined to be appropriate, the surface condition was satisfactory and the test pieces were readily demagnetised. The participating laboratories were requested to test by the AC yoke magnetic flow technique. Inspection of the test piece was to be conducted in accordance with AS 1171:1998. A drawing recording the non-compliant discontinuities, giving their type, dimensions and location (referenced to the specimen datum), was submitted with the endorsed report to PTA for evaluation. The reported test results were assessed using the following criteria: the tolerance for sizing discontinuities was ± 2 mm, with half marks awarded for sizing to ± 5 mm; the tolerance for positioning discontinuities was ± 2 mm, with half marks awarded for positioning to ± 5 mm; certified discontinuities not found resulted in an automatic fail; and any non-existent discontinuities reported resulted in a loss of marks.
Any defects not correctly identified (in accordance with AS 4749:2001 or AS 2812:1985) also resulted in a loss of marks, and documentation was to conform to AS 1171:1998 and accreditation requirements. These criteria were used to assess the laboratory reports with a pass/fail grading. Of the 81 participants, 64 were graded as a pass and 17 as a fail. The information was presented in the form of an interim report.
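Read literally, the sizing and positioning tolerances translate into a simple marking rule. The function below is an illustrative sketch of that reading (the names are hypothetical), not PTA's actual scoring scheme, which also handles missed, non-existent and misidentified discontinuities:

```python
def discontinuity_marks(reported_mm, certified_mm):
    """Marks for one measurement (size or position) against the
    certified value: full marks within +/- 2 mm, half marks within
    +/- 5 mm, none beyond that."""
    error = abs(reported_mm - certified_mm)
    if error <= 2.0:
        return 1.0
    if error <= 5.0:
        return 0.5
    return 0.0

# Hypothetical reported vs certified values (mm):
size_marks = discontinuity_marks(14.0, 15.0)       # within 2 mm: full marks
position_marks = discontinuity_marks(96.0, 100.0)  # within 5 mm: half marks
```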
It is planned that further programs covering different non-destructive testing techniques, including ultrasonics and radiography, will follow on a yearly basis, with the aim of regularly assessing laboratories' performance for the specified tests.

6. DESIGN STAGE
Once a program has been selected, a small working group is formed to design it. This group usually comprises one or more technical advisers, the PTA staff program coordinator and the sample supplier. The following areas are usually considered: nomination of the tests to be conducted, the range of values to be included, the test methods to be used and the number/design of samples required; preparation of paperwork (instructions and results sheets), particularly with reference to reporting formats, the number of significant figures/decimal places to which results should be reported and the correct units for reporting; and technical commentary in the final report, including in some cases evaluation of replies submitted by laboratories that were requested to investigate extreme results.

6.1 Sample Supply and Preparation
For interlaboratory programs, sample preparation procedures are designed to ensure that the samples used are as homogeneous and stable as possible, while still being similar to samples routinely tested by laboratories. A number of each type of sample are selected at random and tested to ensure that they are sufficiently homogeneous for use in the proficiency program. Whenever possible this is done before samples are distributed to participants. The results of this homogeneity testing are analysed statistically and may be included in the final report. Proficiency tests may also use test items with established reference values. These test items and reference certificates may be supplied by a reference material provider; with such test items, no further characterisation is required by the proficiency testing provider.

7. EVALUATION OF RESULTS

7.1 Statistical Design
In testing programs the evaluation of results is based on comparison with assigned values, which are usually obtained from all participants' results (i.e. consensus values). Given that any differences between the samples have been minimised, variability in the results for a program usually has two main sources: variation between laboratories (which may include variation between methods) and variation within a laboratory.
The other main statistical consideration during the planning of a program is that the analysis used is based on the assumption that the results will be approximately normally distributed. The normal distribution is a bell-shaped curve, continuous and symmetric, defined such that about 68% of values lie within one standard deviation of the mean, about 95% within two standard deviations and about 99% within three.

Figure 2: Normal Distribution

7.2 Summary Statistics
Once the data preparation is complete, summary statistics are calculated to describe the data. PTA uses seven summary statistics: number of results - the total number of results received for a particular test/sample; median - the middle value or centre of the data set; normalised interquartile range (IQR) - the difference between the upper and lower quartiles, multiplied by a factor (0.7413) that relates it back to the normal distribution; robust CV - the coefficient of variation (normalised IQR/median) expressed as a percentage; minimum - the lowest value of the data set; maximum - the highest value of the data set; and range - the difference (maximum - minimum).

7.3 Z-scores
To statistically evaluate the participants' results, PTA uses z-scores based on robust summary statistics (the median and normalised IQR). A simple robust z-score (denoted Z) for a laboratory's single-sample result A would be:

Z = (A - median(A)) / normIQR(A)
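A minimal sketch of this calculation, assuming quartiles computed with Python's inclusive method (other quartile conventions shift the normalised IQR slightly):

```python
import statistics

def robust_z_scores(results):
    """Robust z-scores from the median and normalised IQR."""
    med = statistics.median(results)
    # Quartiles by the inclusive method; the 0.7413 factor relates the
    # IQR back to a normal-distribution standard deviation.
    q1, _, q3 = statistics.quantiles(results, n=4, method="inclusive")
    norm_iqr = (q3 - q1) * 0.7413
    return [(x - med) / norm_iqr for x in results]

# Hypothetical results for one test/sample; the last laboratory is extreme.
zs = robust_z_scores([10.1, 10.2, 10.0, 10.3, 9.9, 15.0])
outliers = [i for i, z in enumerate(zs) if abs(z) > 3]  # flags index 5 only
```

Where pairs of results are reported, the same function can be applied to the standardised sums and differences described below, giving the between-laboratories and within-laboratory z-scores.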
Where pairs of results have been obtained (i.e. in most cases), two z-scores are calculated: a between-laboratories z-score and a within-laboratory z-score, based on the sum and the difference of the pair of results respectively. Let Si be the standardised sum of laboratory i's results and Di the standardised difference between its two results:

Si = (Ai + Bi) / 2

Di = (Ai - Bi) / 2 if median(A) > median(B), or (Bi - Ai) / 2 if median(A) < median(B)

The between-laboratories z-score for laboratory i (denoted ZBi) is the robust z-score of its Si, and its within-laboratory z-score (ZWi) is the robust z-score of its Di:

ZBi = (Si - median(S)) / (IQR(S) x 0.7413)

ZWi = (Di - median(D)) / (IQR(D) x 0.7413)

An outlier is defined as any result or pair of results with an absolute z-score greater than three, i.e. Z > 3 or Z < -3. Such results have been identified as significantly different from the others in the data set.

7.4 Graphical Displays

7.4.1 Ordered Z-score Charts
On these charts each laboratory's z-score is shown in order of magnitude and marked with its code number, so each laboratory can readily compare its performance with the other laboratories. For a sample pair, the first chart in each section shows the between-laboratories z-scores and the second the within-laboratory z-scores (if applicable); for a single sample, a robust z-score chart is generated. These charts contain solid lines at +3 and -3, so outliers are clearly identifiable as the laboratories whose bars extend beyond these cut-off lines. The y-axis is limited to the range -5 to +5, so in some cases very large or very small (negative) z-scores appear to extend beyond the limit of the chart.

[Chart: ordered robust z-scores for tensile strength (Rm), Sample 1, by laboratory code]
Figure III: Bar Chart

7.4.2 Youden Diagrams
Youden two-sample diagrams are presented to highlight systematic differences between laboratories. They are based on a plot of each laboratory's pair of results, sample two versus sample one, each represented by a black spot. The diagrams also feature an approximate 95% confidence ellipse from a bivariate analysis of the results, and dashed lines marking the median value for each sample. All points lying outside the ellipse are labelled with the laboratory's code number. Note, however, that these points may not correspond with those identified as outliers, because the outlier criterion (|z-score| > 3) has a confidence level of approximately 99%, whereas the ellipse is an approximate 95% confidence region. The points outside the ellipse on the Youden diagram will therefore roughly correspond to those with z-scores greater than 2 or less than -2. Laboratories which are outside the ellipse but have not been identified as extreme (those with 2 < |z-score| < 3) are encouraged to investigate their results. As a guide to the interpretation of the Youden diagrams: (i) laboratories with significant systematic error components (i.e. between-laboratories variation) will be outside the ellipse in either the upper right-hand quadrant (as formed by the median lines) or the lower left-hand quadrant, i.e. inordinately high or low results for both samples; (ii) laboratories with random error components (i.e. within-laboratory variation) significantly greater than other participants will be outside the ellipse and (usually) in either the upper left or lower right quadrant, i.e. an inordinately high result for one sample and a low result for the other. It is important to note, however, that Youden diagrams are an illustration of the data only and are not used to assess the results (this is done by the z-scores).
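The quadrant reading in (i) and (ii) can be sketched as a simple classifier. This illustration (names and data are hypothetical) only labels which quadrant a pair falls in relative to the medians; it does not compute the confidence ellipse, and only points outside the ellipse would actually warrant interpretation:

```python
import statistics

def youden_quadrants(pairs):
    """Label each laboratory's (sample 1, sample 2) pair: 'systematic'
    when both results sit on the same side of their medians (upper-right
    or lower-left quadrant), 'random' otherwise."""
    med1 = statistics.median(a for a, _ in pairs)
    med2 = statistics.median(b for _, b in pairs)
    return ["systematic" if (a - med1) * (b - med2) > 0 else "random"
            for a, b in pairs]

# Hypothetical duplicate results: the fifth laboratory is high/low
# (random error), the sixth is high/high (systematic error).
labels = youden_quadrants([(10.0, 10.1), (10.2, 10.3), (9.8, 9.9),
                           (10.4, 10.5), (10.5, 9.7), (11.5, 11.6)])
```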
[Youden diagram: Lead (%) in bronze alloy, Sample 2 versus Sample 1, with 95% confidence ellipse and median lines]

Figure IV: Youden Diagram

7.5 Reference Comparison Evaluation
For the non-destructive testing programs, the laboratory's performance is assessed by comparing the laboratory's results with the reference data. A scoring scheme was developed with a pass/fail grading. The criteria were: the tolerance for sizing discontinuities was ± 2 mm; the tolerance for positioning discontinuities was ± 2 mm; certified discontinuities not found resulted in an automatic fail; any non-existent discontinuities reported resulted in a loss of marks; any defects not correctly identified resulted in a loss of marks; and documentation was to conform to AS 1171:1998.

8. FOLLOW-UP ACTION
Proficiency testing provides objective evidence of a laboratory's capability. Under their due diligence obligations, accreditation bodies must ensure that where a laboratory has not performed satisfactorily, appropriate action is taken to: correct the problem; review other work that could have been affected by the same problem and, where necessary, recall reports;
and take action to prevent recurrence of the problem. Accreditation bodies may work with the laboratory to identify and overcome the problem: where there are results with |Z| > 3, the laboratory may be asked to investigate (if accredited for the specific test) and report details of its findings and corrective action; advice may be sought where necessary from the appointed program technical adviser for a PTA response; and the matter may be discussed further at the laboratory's next scheduled accreditation reassessment.

9. CONCLUSION
Proficiency testing provides objective evidence of testing laboratories' competence and, for accredited laboratories, complements the on-site assessment process. The effectiveness of such proficiency tests requires careful sample selection and preparation, program design, evaluation of results and, where necessary, appropriate follow-up action.

10. REFERENCES
[1] ISO/IEC 17025:2005, General requirements for the competence of testing and calibration laboratories
[2] ISO/IEC Guide 43:1997, Part 1: Development and operation of proficiency testing schemes; Part 2: Selection and use of proficiency testing schemes by laboratory accreditation bodies
[3] ILAC Guide 13:2007, Guidelines for the Requirements for the Competence of Providers of Proficiency Testing Schemes
[4] PTA:2008, Guide to Proficiency Testing Australia
[5] APLAC PT002:2008, Testing Interlaboratory Comparisons