Maximising the value of pXRF data

Michael Gazley, Senior Research Scientist
13 November 2015
With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk
MINERAL RESOURCES

Overview
1. How good is pXRF data?
2. How do you make sure your data are good?
3. Multivariate data
4. Issues with compositional data
5. Principal component analysis (PCA)
6. The Teapot
7. Case studies 1 & 2
8. Cluster analysis
9. Case studies 3 & 4
10. Concluding remarks

How good is pXRF data? (Fisher et al., 2014; Gazley et al., in prep.)
[Figure: panels for Rb, Sr, K and Zn]

How do you make sure your data are good?

Instrumentation (Goodale et al., 2014)

Nature of the material to be analysed (Gazley & Fisher, 2014; Parsons et al., 2014)

Presentation of the sample to the unit (Parsons et al., 2014)

Calibration and reference materials (Fisher et al., 2014)

Validation and presentation of data (Gazley & Fisher, 2014)

Top tips for ensuring good data
1. Ensure the sample is dry.
2. Present the sample as well as you possibly can (e.g. in a sample cup with Mylar film). Reducing the particle size usually gives the best results.
3. Ensure the standards are appropriately matrix-matched and that there are enough of them.
4. Send a subset of samples (5%?) for laboratory analysis.
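Tip 4 can be made routine. The sketch below is a minimal illustration only, assuming Python and two hypothetical helpers (select_lab_subset and compare_to_lab): it draws a reproducible 5% subset for laboratory analysis and, once results are back, summarises pXRF-versus-laboratory agreement for one element with an ordinary least-squares fit. The 5% default, the fixed seed and the choice of OLS are assumptions, not a prescribed QA/QC protocol.

```python
import random
from statistics import mean


def select_lab_subset(sample_ids, fraction=0.05, seed=42):
    """Pick a reproducible random subset (default 5%) of samples for lab analysis.

    Hypothetical helper: the fraction and seed are illustrative assumptions.
    """
    n = max(1, round(len(sample_ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(list(sample_ids), n))


def compare_to_lab(pxrf, lab):
    """Ordinary least-squares fit of lab values on pXRF values for one element.

    Returns (slope, intercept, r_squared) for the paired samples.
    """
    if len(pxrf) != len(lab) or len(pxrf) < 2:
        raise ValueError("need at least two paired values")
    mx, my = mean(pxrf), mean(lab)
    sxx = sum((x - mx) ** 2 for x in pxrf)
    syy = sum((y - my) ** 2 for y in lab)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pxrf, lab))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A slope close to 1, an intercept close to 0 and a high R² for an element suggest its pXRF values can be relied on for that sample type; a poor fit is a prompt to revisit sample preparation or calibration.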

Reporting pXRF data (JORC or otherwise)

The multivariate problem
Datasets in geology tend to be high-dimensional. Whatever we do, we do it through space, through time, or both. Humans are very good at seeing patterns, but sometimes the sheer size of a dataset is overwhelming.

Disclaimer
I am not a statistician. I am not a mathematician. I am a geologist who has found a need for multivariate methods to help navigate n-dimensional space. Multivariate ordinations are not new; they have been around for a long time, but geologists seem to be slow adopters of them.

Missing data
You cannot have missing data in a multivariate analysis; missing values must be substituted or imputed. For <10% missing, substitute 66% of the limit of detection (LOD); for 10-30% missing, impute the missing values; for >30% missing, discard the element.
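The rule of thumb above can be applied element by element. This is a minimal sketch, assuming Python, data held as a dict of element -> list of values with None marking a missing value, and a hypothetical function handle_missing. The slide does not say which imputation method to use for the 10-30% case, so the median of the observed values stands in as a placeholder only; a dedicated compositional imputation method would be a better choice.

```python
from statistics import median


def handle_missing(data, lod):
    """Apply the missing-data rule of thumb to {element: [values]}.

    None marks a missing value; lod maps element -> limit of detection
    (same units as the data). Hypothetical helper for illustration.
    """
    cleaned = {}
    for element, values in data.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > 0.30:
            continue  # >30% missing: discard the element entirely
        if missing < 0.10:
            fill = 0.66 * lod[element]  # <10% missing: substitute 66% of the LOD
        else:
            # 10-30% missing: impute. The median of the observed values is a
            # placeholder only, not a recommended imputation method.
            fill = median([v for v in values if v is not None])
        cleaned[element] = [fill if v is None else v for v in values]
    return cleaned
```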

Closure and log-ratio transforms
Geochemical data are typically reported as compositions: they must total 100% (or 1,000,000 ppm), so the data are closed. For a composition of n components, only n-1 components are required (Buccianti & Grunsky, 2014). You can't do standard statistics on closed data because closure produces spurious correlations. The log-ratio transform of Aitchison (1982, 1986) converts the data into real-number space, and log-ratio transformations allow us to make meaningful statements about compositional data. There are a number of log-ratio transforms, each suited to different purposes.
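As an illustration, here is a minimal Python sketch of closure and of the centred log-ratio (clr) transform, one of Aitchison's log-ratio transforms; the function names close and clr are hypothetical. Because clr works on ratios, it gives the same result whether a composition is expressed in wt% or in ppm, which is exactly the dependence on the closed total that the transform removes.

```python
import math


def close(composition, total=1_000_000.0):
    """Rescale a composition so its parts sum to `total` (default 1,000,000 ppm)."""
    s = sum(composition)
    return [total * x / s for x in composition]


def clr(composition):
    """Centred log-ratio transform: log of each part over the geometric mean.

    All parts must be strictly positive, so substitute or impute zeros and
    below-LOD values first.
    """
    if any(x <= 0 for x in composition):
        raise ValueError("clr needs strictly positive parts")
    logs = [math.log(x) for x in composition]
    g = sum(logs) / len(logs)  # log of the geometric mean
    return [v - g for v in logs]
```

The clr values of each sample sum to zero, so the clr covariance matrix is singular; this is harmless for PCA but matters for some other methods.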

Principal component analysis (PCA)
PCA is an ordination. All it does is reorient and rescale your data: point-point relationships are preserved, and PCA just makes it easier to see structure. PCA does a couple of really useful things:
1. It quantifies how much of the variance in the dataset is summarised by each PC axis.
2. It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset; it is human-readable.
[Figure: PCA plot (axis: PC2)]
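To make the two outputs above concrete, this is a minimal, dependency-free Python sketch of PCA via the eigen-decomposition of the covariance matrix, computed here with cyclic Jacobi rotations. The function names are hypothetical, and the input is assumed to be rows of already log-ratio-transformed data (e.g. clr values) with no missing values. In practice a tested routine such as numpy.linalg.eigh, or R's prcomp, would be the sensible choice; the point here is only to show where the explained-variance ratios and the loadings come from.

```python
import math


def covariance(rows):
    """Sample covariance matrix of rows (samples) x columns (variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    found by cyclic Jacobi rotations."""
    p = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    norm = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= tol * norm:
            break  # off-diagonal part is negligible: a is (numerically) diagonal
        for k in range(p - 1):
            for l in range(k + 1, p):
                if abs(a[k][l]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (k, l) element becomes zero.
                theta = (a[l][l] - a[k][k]) / (2.0 * a[k][l])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for i in range(p):  # a <- a @ J (columns k and l)
                    aik, ail = a[i][k], a[i][l]
                    a[i][k], a[i][l] = c * aik - s * ail, s * aik + c * ail
                for i in range(p):  # a <- J^T @ a (rows k and l)
                    aki, ali = a[k][i], a[l][i]
                    a[k][i], a[l][i] = c * aki - s * ali, s * aki + c * ali
                for i in range(p):  # v <- v @ J accumulates the eigenvectors
                    vik, vil = v[i][k], v[i][l]
                    v[i][k], v[i][l] = c * vik - s * vil, s * vik + c * vil
    return [a[i][i] for i in range(p)], v


def pca(rows):
    """Explained-variance ratio and loadings for each PC, largest variance first.

    Loadings are reported as unit eigenvectors (one list per PC, one entry per
    variable); some software scales them by the square root of the eigenvalue.
    """
    values, vectors = jacobi_eigen(covariance(rows))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total = sum(values)
    ratios = [values[i] / total for i in order]
    loadings = [[vectors[r][i] for r in range(len(values))] for i in order]
    return ratios, loadings
```

Each entry of ratios is the share of total variance carried by that PC, and the matching list in loadings shows how strongly each original variable contributes to it. The sign of an eigenvector is arbitrary, so a loadings plot may come out mirrored relative to other software.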

Imagine your dataset as a teapot... What's the best way to look at a teapot so that you can best understand its shape?

Orientating the teapot

Other ordinations
PCA is to ordinations as vanilla is to ice-cream flavours: it works with most things, but there are plenty of other ordinations to choose from, and some of those might suit you better or be useful in combination with PCA.
A priori groupings? Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA).
Both categorical and continuous data? Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA).
Variables not normally distributed? Independent Components Analysis (ICA).

Implementation
PCA (and, in some cases, other ordinations) can be run very easily in a number of programs: various statistics packages, MATLAB, ioGAS, PAST and R. R can do PCA in a multitude of ways: the base package (stats) has prcomp and princomp, and PCA is also found in additional packages (FactoMineR, ade4, amap, pcaPP), probably among others. Robust PCA, sparse PCA and robust sparse PCA are also available.

Case study 1: Agnew gold mine (Barnes et al., 2014; Fisher et al., 2014)
Au is associated with Ca (calcic amphibole) and not with biotite.

Case study 2: Dolerites (Gazley et al., 2014)

Cluster analysis
What if PCA has done a good job, but you've still got too much overlap to be able to draw your own lines between groups of data? This is where cluster analysis comes in. Cluster analysis finds groups by looking at distances between points. It doesn't know what your data are, and it doesn't care; it is interested only in point-point relationships. So yes, different clustering methods will find different groups!
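k-means is one common example of a distance-based clustering method; the slides do not say which method was used in the case studies, so it appears here purely as an illustration. The sketch is a minimal Python version of Lloyd's algorithm with a hypothetical kmeans function, assuming points are tuples of coordinates (for example PC scores or clr values) and that the number of clusters k is chosen by the user.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: returns (centroids, labels) for equal-length points."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (reproducible via seed).
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels
```

Changing k, the seed or the method (hierarchical clustering, for instance) can change the groups found, which is exactly the caveat above.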

Clustering the teapot
There are going to be points that could belong to more than one group. How you deal with those depends on the methods you choose and on your own judgement. Cluster analysis cannot and will not solve this problem for you!
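One way to make these borderline points visible, so that judgement can be applied to them, is to compare each point's distance to its nearest and second-nearest cluster centre. This is a minimal Python sketch with a hypothetical function ambiguous_points and an arbitrary default threshold of 0.8; it only lists candidates for review and does not assign them to a group.

```python
import math


def ambiguous_points(points, centroids, ratio=0.8):
    """Indices of points that sit almost as close to a second centroid.

    A point is flagged when d_nearest / d_second_nearest exceeds `ratio`
    (1.0 means exactly halfway between two centroids). The 0.8 default is an
    illustrative assumption.
    """
    flagged = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, c) for c in centroids)
        if d[1] == 0 or d[0] / d[1] > ratio:
            flagged.append(i)
    return flagged
```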

The data analysis workflow (Gazley et al., 2015)

Case study 3: East Coast Basin, NZ (Hines et al., 2015; in prep.)
Whangai, Waipawa and Wanstead Formations, East Coast of the North Island. Homogeneous, brown and boring, except for the Waipawa Formation, which is a potential hydrocarbon source. The provenance of the sediment is of interest for palaeoenvironmental reasons.

A pXRF dataset was collected from six measured sections along the East Coast.


Case study 4: Mozambique soil samples (Sterk et al., in review)

Data were collected on a nominal 40 m x 80 m grid using a Niton XL3t GOLDD pXRF unit, which was used in the field after digging a ~20 cm pit. Ta and Sn data are not good by pXRF because of spectral overlaps with Cu/Zn and K/Ca, respectively. Following the detection of anomalism in this survey, a 100 m x 300 m grid was run, with samples sent for laboratory analysis. Both sample sets were estimated into 100 m x 100 m cells in 3DS Surpac.
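The estimation method used in Surpac is not stated on the slide. As a much simpler stand-in, the minimal Python sketch below (hypothetical function block_average) averages point samples into square cells so that two surveys on different grids can be compared on the same 100 m x 100 m support. It assumes projected coordinates in metres; cells with no samples are simply absent, which is one reason a proper estimator (e.g. inverse-distance weighting or kriging) is normally used instead.

```python
from collections import defaultdict


def block_average(samples, cell=100.0):
    """Average point samples into square cells of side `cell` metres.

    samples: iterable of (easting_m, northing_m, value).
    Returns {(col, row): mean value}; col/row index the cell containing each point.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in samples:
        key = (int(x // cell), int(y // cell))  # floor division handles negatives
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```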

[Figures: PCA plots of the Mozambique soil samples (PC1 vs PC2)]

Conditional probability (Hill et al., 2014)

If Sn in the pXRF dataset is >150 ppm, then in the laboratory dataset it is >90 ppm, i.e. truly anomalous. Fe, Ti, Zr and Mn concentrations, together with the samples in which Sn was >150 ppm (8% of the samples), were used to predict the probability of anomalous Sn in all samples. Rb, Ca and Sr were left out in case they were mobile during weathering.
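The slides do not describe how the conditional probability of Hill et al. (2014) was calculated, so the following is only a stand-in to show the idea, not their method. This minimal Python sketch (hypothetical functions standardise and knn_probability) takes flags marking samples with pXRF Sn >150 ppm as anomalous and estimates, for every sample, the fraction of its k nearest neighbours in standardised Fe-Ti-Zr-Mn space that are anomalous. Rb, Ca and Sr are simply not included as predictors. The choice of k and of Euclidean distance are assumptions.

```python
import math
from statistics import mean, stdev


def standardise(rows):
    """Scale each column (e.g. Fe, Ti, Zr, Mn) to zero mean and unit standard
    deviation. Needs at least two samples and no constant columns."""
    col_stats = [(mean(c), stdev(c)) for c in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(r, col_stats)] for r in rows]


def knn_probability(predictors, anomalous, k=5):
    """Estimate P(anomalous | predictors) for each sample as the fraction of its
    k nearest neighbours (excluding itself) that are flagged anomalous."""
    z = standardise(predictors)
    probs = []
    for i, zi in enumerate(z):
        dists = sorted((math.dist(zi, zj), j) for j, zj in enumerate(z) if j != i)
        nearest = dists[:k]
        probs.append(sum(anomalous[j] for _, j in nearest) / len(nearest))
    return probs
```

For example, anomalous = [sn > 150 for sn in sn_ppm] builds the flags from a hypothetical list of pXRF Sn values; samples with a high estimated probability but low measured Sn would then be worth a closer look.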

[Figure: Conditional probability based on Fe, Ti, Zr and Mn; annotations: "Exploration targets", "Ignore anomaly here"]

Concluding remarks
pXRF data are fit for many purposes. You can collect datasets that may contain elements you otherwise would not have paid for, but you must stay on top of recording all of the metadata that tell you (and others) how good (or not) the data really are. Multivariate methods can reveal underlying structure and provide ways to visualise big data. You can formulate hypotheses using PCA and cluster analysis, which are then testable using standard statistics. pXRF technology allows the collection of large datasets; ensure that you extract all of the value that you possibly can.

Questions?

Thank you
Michael Gazley, Senior Research Scientist
t +61 8 6436 8501
e michael.gazley@csiro.au
w www.csiro.au/
MINERAL RESOURCES