Unit cell refinement from powder diffraction data: the use of regression diagnostics




T. J. B. HOLLAND AND S. A. T. REDFERN
Department of Earth Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EQ, UK

Abstract
We discuss the use of regression diagnostics combined with nonlinear least-squares to refine cell parameters from powder diffraction data. We present a method which minimizes residuals in the experimentally determined quantity (usually $2\theta_{hkl}$ or energy, $E_{hkl}$). Regression diagnostics, particularly deletion diagnostics, are invaluable for detecting outliers and influential data which could be deleterious to the regressed results. The usual practice of simply inspecting the calculated residuals often fails to detect the seriously deleterious outliers in a dataset, because bare residuals provide no information on the leverage (sensitivity) of the datum concerned. The regression diagnostics which predict the change expected in each cell constant upon deletion of each observation (hkl reflection) are particularly valuable in assessing the sensitivity of the calculated results to individual reflections. A new computer program, which implements nonlinear regression methods and provides the diagnostic output, is described.

KEYWORDS: powder diffraction, regression diagnostics, lattice parameters, computer program.

Introduction
The determination of the lattice (or cell) parameters of crystalline materials from powder diffraction data is a very common task in mineralogical and petrological research. Given how prevalent this task is, it is somewhat surprising to discover that it is very often carried out using a method that could easily be improved upon. The approach that is commonly employed follows that first adopted by Cohen (1935). He refined cell parameters from diffraction data by iterative least-squares refinement of trial cell parameters, minimizing the sums of squares of residuals in $Q = 1/d_{hkl}^2$.
Mineralogical Magazine, Vol. 61, February 1997, pp. 65-77. © Copyright the Mineralogical Society

This is largely a matter of convenience. The most compact and elegant expression of the dependence of the spacing of the (hkl) lattice planes, $d_{hkl}$, on the unknown cell parameters is

$$Q_{hkl} = \frac{1}{d_{hkl}^2} = h^2a^{*2} + k^2b^{*2} + l^2c^{*2} + 2klb^*c^*\cos\alpha^* + 2lhc^*a^*\cos\beta^* + 2hka^*b^*\cos\gamma^* \tag{1}$$

The values of the reciprocal constants (a*, b*, c*, α*, β* and γ*) are usually found by fitting the expression above to values of $Q_{hkl}$ (found from measurements of $2\theta_{hkl}$) by a non-linear least-squares procedure. The real-space unit cell parameters are then determined from these reciprocal constants, and their uncertainties are calculated by error propagation.

It is surprising that iterative non-linear refinement is the most common method used for cell parameter determination from powder diffraction data. The equation above is actually linear in six parameters, which may readily be determined by the much simpler method of linear least-squares. This fact was noted and discussed by Kelsey (1964), who outlined the method of error propagation for the expression of $Q_{hkl}$ recast as

$$Q_{hkl} = h^2x_1 + k^2x_2 + l^2x_3 + klx_4 + lhx_5 + hkx_6 \tag{2}$$

This approach has two advantages: it is direct and fast, using standard least-squares procedures, and no initial guesses are required for the cell parameters. Its disadvantage is that the last three unknowns, $x_4$, $x_5$ and $x_6$, are made up from various combinations of the cell parameters and are not independent of the first three parameters. Large correlations among the various parameters might cause rounding error, reducing the accuracy with which α, β and γ can be determined. Furthermore, equation (2) is only linear in the parameters $x_1 \ldots x_6$ when written in terms of $Q_{hkl}$. If we wish to minimize residuals in another dependent variable, such as $2\theta_{hkl}$ (the quantity most usually measured) or $d_{hkl}$, the expression becomes non-linear in the cell parameters and simple linear least-squares cannot be used.

Rather than minimizing residuals in Q, it is usually more appropriate to use the experimentally measured quantity (such as $2\theta_{hkl}$ or $E_{hkl}$) as the dependent variable for minimization. Minimizing in Q would allow direct linear methods such as those of Kelsey (1964) to be used. Below, we discuss the advantages of the alternative approach. We also draw attention to the advantages of using regression diagnostics as a tool for detecting outliers in measurements of diffraction data. The same diagnostics also identify the diffraction peaks which are most influential in determining the fitted cell parameters.

Choice of dependent variable

In many regression problems there is a choice of which variable to use as the dependent variable. This often turns out to be an important choice, since it usually affects the magnitudes of the determined parameters. The most familiar case is a simple straight-line relationship between two variables (e.g. y = a + bx), where one must decide whether to regress y on x or x on y. All error is usually placed on the dependent variable (say y), and it is assumed that it is y which we wish to estimate from known values of x using the parameters of the regression equation. In the present situation the choice would appear clear. The values of h, k, l are known (if the indexing has been done correctly), and so Q must be the dependent variable to use.

Uncertainties in each $Q_{hkl}$ value are not generally known, however, and each is generally assigned its own weight. This is because it is not usually Q which has been measured. The measured quantity is some other experimentally determined value, such as the angle ($2\theta_{hkl}$) or energy ($E_{hkl}$) of a Bragg reflection, depending on the nature of the diffraction experiment. Clearly it would be more satisfactory to minimize the residuals in the experimental observables during the regression. Because $Q_{hkl}$, $E_{hkl}$ and $d_{hkl}$ do not vary linearly with 2θ (see Fig. 1), the regression results will depend on which one we choose as the dependent variable. This is, however, a consequence of using unweighted least-squares. With non-linear least-squares methods, any of the possibilities ($2\theta_{hkl}$, $Q_{hkl}$, $E_{hkl}$ and $d_{hkl}$) can easily be used as the variable whose residuals are minimized. The most reasonable choice must be the quantity which was measured in the particular diffraction experiment, unless particular care is taken over weighting the data points to compensate. The advantages of reformulating refinement as a non-linear rather than a linear least-squares procedure have been recognized previously (Hart et al., 1990; Toraya, 1993).

FIG. 1. A plot of d-spacing, energy and Q against 2θ (degrees) for a typical set of measured X-ray reflections of chlorite (clinochlore diffraction data taken from Roots, 1994). The values of d and Q have been multiplied by constant factors to scale them to those of E in keV. The nonlinearity between 2θ and d becomes particularly important for materials with large d-spacings, such as the chlorite represented here.

Figure 1 shows a typical dataset with reflections in the range 6-80° 2θ. It suggests that regressing with d-spacing as the dependent variable places increasingly excessive weight on low-angle reflections. This seriously biases the results towards what are arguably the lowest-resolution reflections. The effect becomes particularly significant in materials with large d-spacings, such as the chlorite from which the data of Fig. 1 were obtained. Likewise, using Q as the dependent variable would place too low a weight on low-angle reflections, compared with the experimentally determined variables E and 2θ. It would also begin to place too large a weight on the very high-angle reflections. One strategy used to overcome this functional bias is to weight the data in Q to compensate. This approach does provide an adequate, if piecemeal, solution to the problem of having chosen the wrong dependent variable. Weights may, however, also be needed to account for the variation in quality of each peak position measurement.
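The functional bias described above can be made concrete with first-order error propagation. The sketch below maps the same uncertainty in 2θ into d and Q at several angles. It assumes the Cu Kα1 wavelength (1.5406 Å) and a 0.01° uncertainty in 2θ purely for illustration; neither value is taken from this paper.

```python
import math

def bragg_d(two_theta_deg, wavelength):
    """d-spacing from Bragg's law, wavelength = 2 d sin(theta)."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta_deg) / 2.0))

def propagate_two_theta_error(two_theta_deg, wavelength, sigma_two_theta_deg):
    """First-order propagation of a fixed 2-theta uncertainty into d and Q = 1/d^2.

    Returns (d, sigma_d, Q, sigma_Q).
    """
    theta = math.radians(two_theta_deg) / 2.0
    sigma_theta = math.radians(sigma_two_theta_deg) / 2.0  # theta error in radians
    d = bragg_d(two_theta_deg, wavelength)
    sigma_d = d * sigma_theta / math.tan(theta)  # |dd/dtheta| = d cot(theta)
    q = 1.0 / d**2
    sigma_q = 2.0 * sigma_d / d**3               # |dQ/dd| = 2 / d^3
    return d, sigma_d, q, sigma_q

if __name__ == "__main__":
    wavelength = 1.5406  # Cu K-alpha1 in angstroms, assumed for illustration
    for two_theta in (10.0, 40.0, 80.0):
        d, sd, q, sq = propagate_two_theta_error(two_theta, wavelength, 0.01)
        print(f"2theta={two_theta:5.1f}  d={d:7.4f}  sigma_d={sd:.1e}  "
              f"Q={q:.4f}  sigma_Q={sq:.1e}")
```

Between 10° and 80° 2θ the propagated σ(d) falls by a factor of about 70, while σ(Q) rises. Unweighted residuals in d are therefore dominated by low-angle reflections, and those in Q by high-angle ones.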
It is known, for example, that the standard deviation of the measured position (in, say, 2θ) is inversely proportional to the square root of the peak intensity (Wilson, 1967). We may wish to weight the observations to take account of this, or of some other judgement of individual datum quality. In that case, further adjustments must be made to the weights already applied to correct for the functional bias of Q. The preferred approach is to carry out the initial nonlinear least-squares by regressing the measured quantity (2θ or E) rather than Q. Weights can then be applied as necessary to reflect experimental judgements of each datum. Indeed, this approach has been adopted by previous workers who modified existing methods (Hart et al., 1990).

As an illustration of the potential weakness of unweighted regression on Q, we compare regressions of the Monte Somma anorthite data of Redfern and Salje (1987), details given below. The dependent variable was taken in turn as $2\theta_{hkl}$, $Q_{hkl}$, $d_{hkl}$ and $E_{hkl}$. To simulate an energy-dispersive synchrotron experiment, we assumed a fixed 2θ of 10° and calculated an energy spectrum from the original data. The differences in cell parameters, although small, can be as large as the individual estimated uncertainties. Figure 2 shows the differences in the volume and in the lengths of the cell edges obtained with these four dependent variables. It shows clearly that Q and d yield the most extreme values. Although not shown in Fig. 2, the cell angles α, β and γ all show a similarly strong dependence on the regression variable.

In carrying out these regressions we employed a two-step approach. First we used linear least-squares of $Q_{hkl}$ to obtain starting guesses of the cell parameters. We then used nonlinear least-squares of the chosen measured variable to obtain the refined parameters. This approach allows the correct regression variable to be selected. It also means that initial guesses of the cell parameters are not required (only indexed peaks and a specification of the crystal system).

FIG. 2. The effect on the cell dimensions (a, b and c in Å, and V in Å³) of changing the dependent variable (Q, 2θ, E or d) in the refinement of the anorthite data (see Tables 1 and 2). Note that Q and d typically provide extreme values of the cell constants.

Regression diagnostics

As an aid in fitting cell parameters to diffraction data, it is extremely useful to calculate several so-called regression diagnostics during the regression. These are computed alongside the other parameters and help identify possible outliers in the data. Regression diagnostics are discussed in some detail by Belsley et al. (1980) and Powell (1985) in the context of linear regression analysis. There their value is demonstrated in helping to identify which data points in a set are outliers. They also show which data are potentially dangerous because they have very high influence on the calculated results (leverage). Strictly, these diagnostics apply only to linear problems. By linearizing the function at the solution, however, we may use all the machinery of the linear case. The assumption is that, for small errors, the function we are fitting is reasonably linear. We have to make this assumption anyway when determining the magnitudes of the uncertainties on fitted parameters. We now introduce five important diagnostic parameters and explain their use.

Typically, the only diagnostic used during refinement of cell parameters is the difference between the observed and calculated values of the data (the residuals). We shall see that this $2\theta_{obs} - 2\theta_{calc}$ value can be misleading. Regression diagnostics provide a far superior method of identifying poor data points resulting from measurement or indexing errors. They are also a useful way of confirming the correct indexing of peaks. It should be noted at the outset, however, that these are single-observation diagnostics, based on the influence a single data point may have. The method therefore cannot detect deleterious effects arising from several observations acting together, since these may mask one another.

(1) Hat. One of the most important diagnostics for detecting influential data is the Hat matrix H. It is so called because it puts the hat on y: it is a projection matrix relating the calculated and observed values of the vector of y values, $\hat{y} = Hy$.
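The first step of the two-step approach is a linear least-squares fit of Eq. (2) to $Q_{hkl}$ values. It can be sketched with NumPy as below. The helper names, the use of `numpy.linalg.lstsq`, and the metric-tensor route from $x_1 \ldots x_6$ back to real-space constants are assumptions made for illustration, not the authors' implementation. The route relies on $x_1, x_2, x_3$ being the diagonal and $x_6/2, x_5/2, x_4/2$ the off-diagonal elements of the reciprocal metric tensor.

```python
import itertools
import numpy as np

def eq2_design(hkl):
    """Rows of the Eq. (2) design matrix: [h^2, k^2, l^2, kl, lh, hk]."""
    h, k, l = np.asarray(hkl, dtype=float).T
    return np.column_stack([h * h, k * k, l * l, k * l, l * h, h * k])

def fit_eq2(hkl, q_obs):
    """Unweighted linear least squares for x1..x6 in Eq. (2)."""
    x, *_ = np.linalg.lstsq(eq2_design(hkl), np.asarray(q_obs, dtype=float), rcond=None)
    return x

def reciprocal_metric(x):
    """Reciprocal metric tensor G*, so that Q = [h k l] G* [h k l]^T."""
    x1, x2, x3, x4, x5, x6 = x
    return np.array([[x1, x6 / 2, x5 / 2],
                     [x6 / 2, x2, x4 / 2],
                     [x5 / 2, x4 / 2, x3]])

def direct_cell(x):
    """Real-space a, b, c and alpha, beta, gamma (degrees) from x1..x6."""
    g = np.linalg.inv(reciprocal_metric(x))  # direct metric tensor G = (G*)^-1
    a, b, c = np.sqrt(np.diag(g))
    alpha = np.degrees(np.arccos(g[1, 2] / (b * c)))
    beta = np.degrees(np.arccos(g[0, 2] / (a * c)))
    gamma = np.degrees(np.arccos(g[0, 1] / (a * b)))
    return a, b, c, alpha, beta, gamma

def direct_metric(a, b, c, alpha, beta, gamma):
    """Direct metric tensor G from cell constants (angles in degrees), used to make test data."""
    ca, cb, cg = np.cos(np.radians([alpha, beta, gamma]))
    return np.array([[a * a, a * b * cg, a * c * cb],
                     [a * b * cg, b * b, b * c * ca],
                     [a * c * cb, b * c * ca, c * c]])

if __name__ == "__main__":
    cell = (8.18, 12.88, 14.17, 93.1, 115.9, 91.2)  # arbitrary illustrative triclinic cell
    g_star = np.linalg.inv(direct_metric(*cell))
    hkl = [r for r in itertools.product(range(-2, 3), repeat=3) if any(r)]
    q = [np.array(r) @ g_star @ np.array(r) for r in hkl]
    print(direct_cell(fit_eq2(hkl, q)))  # recovers the input cell
```

With noise-free synthetic Q values the fit returns the input cell to rounding precision. With real data it only supplies starting values for the refinement in the measured variable.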
The diagnostics of value are the diagonal elements $h_i$, one for each observation i. These can take values from $h_i = 0$, indicating that observation i has no influence on the fit, to $h_i = 1$, indicating extreme influence such that observation i is fixing one of the parameters. The Hat values are related to the distance of any point from the centre of the data spread. Points lying at the extremities of data space are therefore very influential in determining the values of one or more parameters. Data lying in the middle of the spread exert little influence on the calculated parameters. The Hat values sum to the number of parameters in the regression, $\sum_{i=1}^{n} h_i = p$, so their average value is $\bar{h} = p/n$. Here p is the number of parameters and n is the number of observations. Observations with high leverage are influential, and are flagged by Hat values in excess of a cut-off of $2p/n$ (Belsley et al., 1980). High leverage simply flags the very influential data and does not in itself imply that such data are harmful. Other diagnostics must be used in conjunction with the Hat values to assess the data.

In linear least-squares problems, a solution b which minimizes the residuals in y of the equations y = Xb is found by solving the normal equations. In matrix form these are $(X^TX)b = X^Ty$, where $X^T$ is the transpose of X. The Hat matrix is then defined as $H = X(X^TX)^{-1}X^T$. A good non-linear least-squares method of optimizing cell parameters is that of Marquardt (as detailed, for example, by Bevington, 1969). Its final stage is a Gauss-Newton step that finds solutions b to the equations $(J^TJ)b = J^Te$. Here J is the Jacobian of partial derivatives of the fitting function with respect to the cell parameters a. The vector b holds the increments to the cell parameter estimates, and e is the vector of residuals ($y_i - y_i^{calc}$). Linearizing the fitting function at the solution allows the Hat values to be estimated from

$$h_i = H_{ii} = J_i(J^TJ)^{-1}J_i^T$$

where $J_i$ is the ith row of J, with elements $J_{ij} = \partial y_i/\partial a_j$.

(2) Sigma(i).
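The sketch below covers the second step of the two-step approach and the Hat values $h_i = J_i(J^TJ)^{-1}J_i^T$ for a refinement in 2θ. It rests on several assumptions. It uses plain Gauss-Newton iterations rather than a full Marquardt scheme, on the premise that the starting values from the linear fit in Q are already close. It parameterizes the cell by $x_1 \ldots x_6$ of Eq. (2) rather than by the cell constants themselves. Residuals, σ and Hat values do not depend on that choice of parameterization at the solution, but DfBetas do. The function names and the synthetic orthorhombic cell are hypothetical.

```python
import itertools
import numpy as np

def eq2_design(hkl):
    """Rows [h^2, k^2, l^2, kl, lh, hk] so that Q = X @ x, as in Eq. (2)."""
    h, k, l = np.asarray(hkl, dtype=float).T
    return np.column_stack([h * h, k * k, l * l, k * l, l * h, h * k])

def two_theta_and_jacobian(x, X, wavelength):
    """Calculated 2-theta (degrees) and the Jacobian d(2theta)/dx_j."""
    q = X @ x
    sin_theta = wavelength * np.sqrt(q) / 2.0
    two_theta = np.degrees(2.0 * np.arcsin(sin_theta))
    # chain rule: d(2theta)/dQ = lambda / (2 sqrt(Q) cos(theta)), converted to degrees
    dtt_dq = np.degrees(wavelength / (2.0 * np.sqrt(q) * np.sqrt(1.0 - sin_theta**2)))
    return two_theta, X * dtt_dq[:, None]

def refine_in_two_theta(hkl, two_theta_obs, wavelength, max_iter=50):
    """Linear LSQ in Q for starting values, then Gauss-Newton on residuals in 2-theta."""
    X = eq2_design(hkl)
    tt_obs = np.asarray(two_theta_obs, dtype=float)
    q_obs = (2.0 * np.sin(np.radians(tt_obs) / 2.0) / wavelength) ** 2
    x, *_ = np.linalg.lstsq(X, q_obs, rcond=None)
    for _ in range(max_iter):
        tt_calc, J = two_theta_and_jacobian(x, X, wavelength)
        step = np.linalg.solve(J.T @ J, J.T @ (tt_obs - tt_calc))  # (J^T J) b = J^T e
        x = x + step
        if np.linalg.norm(step) <= 1e-13 * np.linalg.norm(x):
            break
    tt_calc, J = two_theta_and_jacobian(x, X, wavelength)
    e = tt_obs - tt_calc
    hat = np.einsum("ij,jk,ik->i", J, np.linalg.inv(J.T @ J), J)  # h_i = J_i (J^T J)^-1 J_i^T
    return x, e, J, hat

if __name__ == "__main__":
    # Hypothetical orthorhombic cell (a, b, c = 8, 9, 10 Å) and Cu K-alpha1 wavelength
    hkl = [r for r in itertools.product(range(3), range(-2, 3), range(-2, 3)) if any(r)]
    x_true = np.array([1 / 8.0**2, 1 / 9.0**2, 1 / 10.0**2, 0.0, 0.0, 0.0])
    tt, _ = two_theta_and_jacobian(x_true, eq2_design(hkl), 1.5406)
    tt_obs = tt + 0.002 * np.sin(np.arange(len(hkl)))  # small synthetic errors
    x, e, J, hat = refine_in_two_theta(hkl, tt_obs, 1.5406)
    print("sum of Hat values:", hat.sum())  # should be p = 6 up to rounding
    print("largest Hat:", hat.max(), "cut-off 2p/n:", 2 * 6 / len(hkl))
```

The returned J and e are the linearized quantities that the deletion diagnostics described in the text also require.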
The standard error of the residuals, $\sigma_y$, is a useful measure of the spread of the calculated y values. A drop in this diagnostic signals a better overall fit to the data. It is defined as

$$\sigma_y = \sqrt{\frac{e^Te}{n-p}}$$

where e is the vector of residuals. If this value falls significantly upon deletion of an observation i, that observation is potentially deleterious to the fit. The deletion diagnostic $\sigma_{y(i)}$, calculated for each observation, is the value of $\sigma_y$ that would result if observation i were deleted from the dataset. Scanning the list of calculated $\sigma_{y(i)}$ for values significantly smaller than the overall $\sigma_y$ highlights observations which might be harmful to the fit.

(3) Rstudent. Simple residuals, $e_i = y_i - y_i^{calc}$, are of relatively little diagnostic value where some observations are very much more influential than others. This is because very influential data are generally associated with small residuals. Rstudent is an adapted form of residual that allows for the effects of leverage: the residual is normalized by division by $\sqrt{1-h_i}$. It is defined as (Belsley et al., 1980)

$$\mathrm{Rstudent}_i = \frac{e_i}{\sigma_{y(i)}\sqrt{1-h_i}} \tag{3}$$

Rstudent may be used as a diagnostic parameter since its magnitude is expected to be less than 2.0 at the 5% confidence level. Values of Rstudent in excess of 2.0 for an observation i therefore suggest that the data point should be treated with suspicion. In this case the data point is the observed diffraction vector of the ith hkl reflection.

(4) DfFits. DfFits is another important deletion diagnostic. It gives the change in the predicted value $\hat{y}_i$ upon deletion of the ith observation, as a multiple of the standard deviation of $\hat{y}_i$. When this diagnostic is large, the predicted value of y for observation i would change substantially if the regression were rerun without that observation. It is therefore a measure of the influence of each observation i on its own calculated position:

$$\mathrm{DfFits}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sigma_{y(i)}\sqrt{h_i}} = \frac{h_i e_i}{(1-h_i)\,\sigma_{y(i)}\sqrt{h_i}} \tag{4}$$

where $\hat{y}_{i(i)}$ is the value predicted for observation i by a fit made without it. Observations with absolute values of DfFits greater than a cutoff of $2\sqrt{p/n}$ should be considered potential outliers in a dataset.

(5) DfBetas. The final diagnostic which may be used to assess and improve cell parameter refinement is called DfBetas, defined as

$$\mathrm{DfBetas}_{ij} = \frac{\beta_j - \beta_{j(i)}}{\sigma_{\beta_j}} = \frac{\left[(J^TJ)^{-1}J_i^T\right]_j e_i}{\sigma_{\beta_j}(1-h_i)} \tag{5}$$

This measures how much the calculated value of each refined parameter $\beta_j$ would change if the regression were rerun without observation i. Here $\beta_{j(i)}$ denotes the value refined without that observation. DfBetas may conveniently be expressed as a percentage of the standard deviation of that parameter. Belsley et al. (1980) suggest several heuristic cutoff values. Instead, we suggest flagging as potentially deleterious any observation that would change a parameter by more than 33% of its standard deviation. This diagnostic is particularly valuable in assessing which of the observed reflections has most influence on the calculated cell parameters. Used together with the other diagnostics, it helps weed out possibly deleterious data and assess the individual sources of error in a cell parameter refinement.

Examples

Refinement of data collected as a function of 2θ

We have incorporated these regression diagnostics into a non-linear least-squares refinement procedure to determine unit cell parameters. Their usefulness is best illustrated by taking a close look at some specific examples.
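All five diagnostics defined above can be computed from the linearized fit without rerunning the regression n times. The sketch below does this from a Jacobian J and residual vector e. It uses the standard closed-form deletion identities for linear least squares, which for a non-linear fit hold only to first order, as the text assumes. The function name and dictionary keys are our own. Following the text, DfBetas are scaled by the full-fit parameter standard deviation and reported as percentages.

```python
import numpy as np

def deletion_diagnostics(J, e):
    """Single-observation deletion diagnostics for a (linearized) least-squares fit.

    J : (n, p) Jacobian (design matrix) evaluated at the solution
    e : (n,) residuals, observed minus calculated
    """
    J = np.asarray(J, dtype=float)
    e = np.asarray(e, dtype=float)
    n, p = J.shape
    A = np.linalg.inv(J.T @ J)
    hat = np.einsum("ij,jk,ik->i", J, A, J)           # Hat values h_i
    s2 = e @ e / (n - p)                               # sigma_y squared
    # sigma_y(i) squared without refitting: ((n-p) s^2 - e_i^2/(1-h_i)) / (n-p-1)
    s2_del = ((n - p) * s2 - e**2 / (1.0 - hat)) / (n - p - 1)
    sigma_del = np.sqrt(s2_del)
    rstudent = e / (sigma_del * np.sqrt(1.0 - hat))    # Eq. (3)
    dffits = rstudent * np.sqrt(hat / (1.0 - hat))     # algebraically equal to Eq. (4)
    # beta - beta(i) = (J^T J)^-1 J_i^T e_i / (1 - h_i), one row per observation
    dbeta = (J @ A) * (e / (1.0 - hat))[:, None]
    sigma_beta = np.sqrt(s2 * np.diag(A))              # s.d. of each refined parameter
    dfbetas_pct = 100.0 * dbeta / sigma_beta           # Eq. (5) as % of s.d.
    return {"hat": hat, "sigma": np.sqrt(s2), "sigma_del": sigma_del,
            "rstudent": rstudent, "dffits": dffits, "dfbetas_pct": dfbetas_pct}
```

For a linear model these closed forms agree exactly with brute-force refits that omit one observation at a time. That agreement is a convenient check on any implementation.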
We take first the typical case of cell parameter refinement from diffraction data collected as a function of scattering angle, 2θ. We explain the use of the associated regression diagnostics with the dataset of an anorthite measured at high temperature. The experimental arrangements for the data collection and the scientific significance of the data are described by Redfern and Salje (1987) and Redfern et al. (1988). We have refined this example dataset by minimizing the squares of the residuals in 2θ and, separately, the residuals in Q. This lets us compare the two methods directly on a real set of data. Computed results from these two refinements of the anorthite data are shown in Tables 1 and 2.

(a) Minimizing residuals in 2θ. Many commonly used cell parameter refinement programs list observed and calculated 2θ or d-spacings for each observed reflection. The difference between these observed and calculated values is usually taken as the measure of the quality of each data point. It is also the usual test of whether a reflection is correctly indexed. For the Monte Somma anorthite dataset, therefore, reflections whose observed and calculated values differ by more than twice the average absolute deviation are flagged by a bullet in the Tables. This would be the usual limit of diagnostic information available to the experimentalist. Tables 1 and 2, however, also include the regression diagnostics described above. These provide an additional, invaluable aid to the critical analysis and evaluation of each measured data point, as well as a check of the accuracy of peak indexing. For example, for the anorthite data refined on the residuals in 2θ (Table 1), both the 224 and 228 reflections have values of Hat greater than the 2p/n cut-off explained above. These reflections are therefore particularly influential. Their values of DfFits and Rstudent, however, show that although they are influential, they are not as detrimental to the overall fit as some of the other reflections.
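The cutoffs used in this section can be collected into one screening step. The function below is a hypothetical helper, not part of UnitCell. It flags each reflection against the conventional residual test (more than twice the average absolute deviation, the "bullet" in the Tables) and against the diagnostic cutoffs given earlier. These are Hat > 2p/n, |Rstudent| > 2.0, |DfFits| > 2√(p/n), and any DfBetas above 33% of a parameter's standard deviation.

```python
import math

def flag_reflections(labels, resid, hat, rstudent, dffits, dfbetas_pct, p):
    """Return {label: [reasons]} for observations exceeding the cutoffs in the text.

    dfbetas_pct[i] holds the DfBetas of observation i for every parameter, in percent.
    """
    n = len(labels)
    mean_abs = sum(abs(r) for r in resid) / n
    hat_cut = 2.0 * p / n
    dffits_cut = 2.0 * math.sqrt(p / n)
    report = {}
    for i, label in enumerate(labels):
        reasons = []
        if abs(resid[i]) > 2.0 * mean_abs:      # conventional obs - calc check
            reasons.append("resid")
        if hat[i] > hat_cut:                    # high leverage
            reasons.append("hat")
        if abs(rstudent[i]) > 2.0:              # about the 5% level
            reasons.append("rstudent")
        if abs(dffits[i]) > dffits_cut:         # influence on its own fitted value
            reasons.append("dffits")
        if any(abs(v) > 33.0 for v in dfbetas_pct[i]):
            reasons.append("dfbetas")           # shifts a parameter by > 33% of its s.d.
        if reasons:
            report[label] = reasons
    return report
```

A reflection flagged only for Hat, like 224 and 228 above, is influential but not necessarily harmful. Flags on DfFits, Rstudent or DfBetas are the ones that call for re-examining the measurement or indexing.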
The 64 reflection, on the other hand, is not quite as influential (its value of Hat is just below the cutoff). It does, however, appear deleterious to the fit, since both DfFits and Rstudent lie well above their limits. Indeed, removing this reflection from the dataset would lower the overall value of sigmafit from its current value of .17 to .1, a percentage change in sigmafit of -6.%. The new value is given by the parameter sigma(i), in the fourth column of the regression diagnostics, and the percentage change in the final column. In such a case the experimenter might wish to look again at the 2θ measurements of the 64 and 224 reflections. Possible causes include an error of measurement, indexing or calibration, or overlap with a strong reflection of another phase if the data come from a mixed-phase sample. If these data points are outliers, as the regression diagnostics imply, the only option may be to remove one or both from the refinement.

The DfBetas parameters for these reflections provide an a priori indication of the effect of such a course of action, since they show how removing a datapoint affects the refined cell parameters. Thus, removing the 64 reflection would change the a cell edge by +.2σa (.3 Å) and b by -.6σb (.1 Å), and would hardly change c. Most significantly, it would increase the α angle by +.87σα (.16°). The overall effect is to increase the cell volume by some .81 Å³, which is only 38% of its standard deviation and thus of arguable significance. This limited effect on the refined parameters might have been expected. Both DfFits and Rstudent lie above their limits for this reflection, and its removal can improve the overall quality of the fit. Yet its Hat is lower than that of 224, so it does not influence the derived cell parameters as strongly as, for example, the 224 reflection. Indeed, removing 224 would reduce the cell volume by 71% of its standard deviation (.151 Å³) and increase the β angle by a large 123% of its standard deviation.

It is interesting to compare the regression diagnostics explained above with the simple selection of potential outliers from the differences between observed and calculated 2θ positions. That simple selection is what is usually done during a cell parameter refinement. Four reflections show deviations between calculated and observed values greater than 2σ. Of these, the greatest are for the 64 and 152 reflections. The values of Hat, however, show that 152 does not have a strong influence on the refined parameters. Nor does its removal lower the overall standard deviation as much as, say, the removal of 224, even though 224 has a smaller deviation between observed and calculated 2θ. Furthermore, removing the 64 reflection gives the greatest reduction in sigmafit. Yet this peak does not influence the refined cell parameters as much as the 224 reflection, which fits almost as poorly.
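The volume changes quoted above follow from the refined cell constants through the standard triclinic volume formula. The helpers below are our own illustration. They evaluate that formula and the volume change produced by a given set of parameter shifts. When the shifts come from DfBetas, note that the shift on deleting observation i is $\beta_{j(i)} - \beta_j$. This is minus the numerator of Eq. (5), that is, $-\mathrm{DfBetas}_{ij}\,\sigma_{\beta_j}$.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Triclinic unit-cell volume; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def volume_change(cell, shifts):
    """Change in V when each of (a, b, c, alpha, beta, gamma) is shifted by the given amount."""
    shifted = [value + shift for value, shift in zip(cell, shifts)]
    return cell_volume(*shifted) - cell_volume(*cell)
```

For small shifts this exact difference agrees with a first-order propagation through ∂V/∂βj. Computing it exactly avoids having to derive those partial derivatives for each crystal system.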
The differences between observed and calculated 2θ positions might therefore suggest removing 152. The deletion diagnostics, provided as a computational by-product of the regression, indicate instead that the 152 reflection is not a priority. Attention should first be paid to the 224 and 64 reflections.

(b) Minimizing residuals in Q. The same anorthite dataset has been refined by minimizing the residuals in Q rather than in 2θ. This simulates a standard linear least-squares refinement of the data, as performed by any of the many public domain programs available for the purpose. The output is shown in Table 2. As expected, the refined cell parameters differ from those obtained by minimizing residuals in 2θ, although exactly the same data are used. In particular, the β cell angle is more than one standard deviation smaller. Furthermore, the residuals on individual observations are now quite misleading if used to detect outliers. The 152 reflection shows the greatest difference between observed and calculated 2θ. If this were all that was known, the first attempt to improve the refinement might be to eliminate this reflection, or at least to measure it again. We showed above, however, that this reflection is less detrimental to the non-linear least-squares refinement than either 64 or 224. We also showed that 224 is the peak most influential on the refined parameters when the data are handled correctly, that is, when refining on 2θ, the measured observation. Disturbingly, when the data are fitted by refining residuals in Q, the 224 reflection shows what would probably be regarded as a perfectly acceptable value of $2\theta_{obs} - 2\theta_{calc}$. Nothing indicates that it is the most deleterious outlier. The regression diagnostics obtained by refining residuals in Q also highlight a number of peaks (such as 222, 7424 and 26). Our refinement minimizing residuals in 2θ shows that these are not significant outliers.
Comparing the results for this dataset from the two methods of refinement highlights two points. The first is the importance of refining the data on the basis of the observed quantities rather than a derived function such as Q. The second is the utility of deletion diagnostics for identifying outliers, compared with simple but less robust criteria such as the values of $2\theta_{obs} - 2\theta_{calc}$ for individual reflections.

Refinement of high-pressure energy-dispersive data

Regression diagnostics become particularly useful when dealing with data that are inherently of lower quality. One field where this applies is the analysis of high-pressure powder diffraction data collected in an energy-dispersive experiment. In a typical experiment a beam of white (usually synchrotron) radiation strikes an extremely small amount of sample, often held static in a diamond anvil cell. Several factors conspire against the experimenter here:
- poor sample statistics;
- collection of only a limited part of the diffraction cone;
- interference from diffraction by internal pressure standards;
- the relatively low resolution of solid-state energy-dispersive detectors.
These data must therefore be handled and interpreted with care to obtain the best results. We illustrate the use of regression diagnostics in refining energy-dispersive data with the example of epidote in Table 3. Note first that the Bragg maxima were measured as a function of energy, and that the cell parameters were determined by refining that same measured quantity.

In the list of observed minus calculated peak positions, the measured position of the 223 reflection shows the greatest deviation from its best-fit calculated value. A typical strategy might be to assume that this position was spurious and to recalculate the cell parameters without this data point. The regression diagnostics, however, indicate that the 414 and i6 reflections, rather than 223, require close examination. For the 414 peak, the values of DfFits and Rstudent are substantially greater than their cutoffs. The i6 reflection has a large Hat, but this need not worry us, since its values of DfFits and Rstudent are within their limits. Its value of sigma(i) also shows that the regression would become worse, not better, if this reflection were omitted. Sigma(i) for the 414 reflection, by contrast, is substantially (around 17%) lower than the sigmafit of the whole refinement, so omitting this peak would improve the whole fit. The DfBetas in part (b) of Table 3 show that omitting the 414 reflection would increase a, c and the β angle by more than one standard deviation each.

Simply considering $E_{obs} - E_{calc}$ might initially suggest removing the 223 reflection. Yet doing so would not alter any of the cell parameters by more than half their individual standard deviations. And although the residual of this point is relatively large, the statistical tests show that it is not significant. Inspecting the diagnostics for every observation, not just those above the critical cutoffs, makes this clear. The 223 reflection gives DfFits of -.78 and Rstudent of -1.72, both well below the cutoffs. It also has a small value of Hat (.145), so it is in any case insignificant.
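In the energy-dispersive case the measured quantity is E. At a fixed detector angle, Bragg's law gives $E = hc/(2d\sin\theta)$, so E is proportional to $1/d = \sqrt{Q}$ rather than to Q. The sketch below converts between d, Q and peak energy so that a refinement can minimize residuals in E directly. It takes $hc \approx 12.3984$ keV·Å. The 10° detector angle in the example is an arbitrary illustrative choice, and the function names are our own.

```python
import math

HC_KEV_ANGSTROM = 12.3984198  # Planck constant times speed of light, in keV·Å

def energy_kev(d, two_theta_deg):
    """Energy (keV) of the Bragg reflection with spacing d (Å) at a fixed detector angle."""
    return HC_KEV_ANGSTROM / (2.0 * d * math.sin(math.radians(two_theta_deg) / 2.0))

def d_from_energy(energy, two_theta_deg):
    """Inverse of energy_kev: d-spacing (Å) from a measured peak energy (keV)."""
    return HC_KEV_ANGSTROM / (2.0 * energy * math.sin(math.radians(two_theta_deg) / 2.0))

def energy_from_q(q, two_theta_deg):
    """E as a function of Q = 1/d^2: linear in sqrt(Q), not in Q."""
    return energy_kev(1.0 / math.sqrt(q), two_theta_deg)

if __name__ == "__main__":
    two_theta = 10.0  # illustrative fixed detector angle, degrees
    for d in (4.0, 2.0, 1.0):
        print(f"d = {d:.1f} Å  ->  E = {energy_kev(d, two_theta):.3f} keV")
```

Because E is not linear in Q, an unweighted fit in Q weights the reflections differently from a fit in the measured energies. This is the same functional bias discussed for angle-dispersive data.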
On the other hand, Table 3 shows that the 414 reflection is a true outlier in the dataset and identifies this peak as the reflection which should be checked if the refinement is to be improved. Once again, we see that the values of Eobs − Ecalc are not always good indicators of the statistical quality of individual reflections.

Implications of the use of regression diagnostics in cell refinement

We have shown the efficacy of computing the essential diagnostic information required for careful cell parameter refinement, and that such diagnostics represent a considerable improvement on procedures that weed out reflections based purely on the individual deviations between observed and calculated data. There is no reason why the identification of outliers in datasets, and the subsequent improvement of refinements, should not become a routine precursor to the publication and use of powder diffraction data. Indeed, Smith (1989) has already pointed out that any laboratory planning to prepare data for publication or for inclusion in databases such as the Powder Diffraction File or the NIST Crystal Data File should screen its data for errors and poorly-fitting values. The statistical tests described here provide a simple mechanism for carrying out such screening, using regression diagnostics for the first time. It is also very important that the refinement is based on minimization of the differences between the true measured quantity and its calculated value (rather than a linearized derived function). R.C. Jenkins (1995, pers. comm.) recently pointed out that of the 5 powder diffraction datasets culled from the published literature each year, the ICDD find that only 1 s are acceptable for inclusion in the Powder Diffraction File. With the early use of the regression diagnostics provided here, this 'hit-rate' could be significantly improved.

The program UnitCell, which implements the regressions (with regression diagnostics) discussed above, is available free to users from non-profitmaking institutions. The executable code (for Macintosh or Windows) may be obtained by anonymous ftp from rock.esc.cam.ac.uk, where it resides in directory pub/minp/unitcell/. Download the file README for further instructions.
The programs and further details may be obtained from the appropriate part of the World Wide Web server at the Department of Earth Sciences, Cambridge University (http://www.esc.cam.ac.uk).

Acknowledgements

We wish to express our thanks to Anne Graeme-Barber for her patience in testing the program UnitCell and for her comments and advice on improvements to it, and to Roger Powell, who originally introduced TJBH to the subject of regression diagnostics.

References

Belsley, D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley, New York.

Bevington, P.R. (1969) Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York, 336 pp.

Cohen, M.U. (1935) Precision lattice constants from X-ray powder photographs. Review of Scientific Instruments, 6, 68-74.

Hart, M., Cernik, R.J., Parrish, W. and Toraya, H. (1990) Lattice parameter determination for powders using synchrotron radiation. J. Appl. Crystallogr., 23, 286-91.

Kelsey, C.H. (1964) The calculation of errors in a least squares estimate of unit-cell dimensions. Mineral. Mag., 33, 809-12.

Powell, R. (1985) Regression diagnostics and robust regression in geothermometer/geobarometer calibration: the garnet-clinopyroxene geothermometer revisited. J. Met. Geol., 3, 231-43.

Redfern, S.A.T. and Salje, E. (1987) Thermodynamics of plagioclase II: Temperature evolution of the spontaneous strain at the I1̄-P1̄ phase transition in anorthite. Phys. Chem. Minerals, 14, 189-95.

Redfern, S.A.T., Graeme-Barber, A. and Salje, E. (1988) Thermodynamics of plagioclase III: Spontaneous strain at the I1̄-P1̄ phase transition in Ca-rich plagioclase. Phys. Chem. Minerals, 16, 157-63.

Roots, M. (1994) Molar volumes on the clinochlore-amesite binary: some new data. Eur. J. Mineral., 6, 279-83.

Smith, D.K. (1989) Computer analysis of diffraction data. In Modern Powder Diffraction (D.L. Bish and J.E. Post, eds) Mineralogical Society of America, Reviews in Mineralogy, 20, 183-216.

Toraya, H. (1993) The determination of unit-cell parameters from Bragg reflection data using a standard reference material but without a calibration curve. J. Appl. Crystallogr., 26, 583-90.

Wilson, A.J.C. (1967) Statistical variance of line-profile parameters. Measures of intensity, location and dispersion. Acta Crystallogr., 23, 888-98.

[Manuscript received 25 March 1996: revised 1 July 1996]