Maximising the value of pXRF data


Michael Gazley, Senior Research Scientist
13 November 2015
With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk
MINERAL RESOURCES

Overview
How good is pXRF data?
How do you make sure your data are good?
Multivariate data
Issues with compositional data
Principal component analysis (PCA)
The Teapot
Case studies 1 & 2
Cluster analysis
Case studies 3 & 4
Concluding remarks

How good is pXRF data?
[Figure: panels for Rb, Sr, K and Zn. Sources: Fisher et al. (2014); Gazley et al. (in prep.)]

How do you make sure your data are good?

Instrumentation (Goodale et al., 2014)

Nature of the material to be analysed (Gazley & Fisher, 2014)

Nature of the material to be analysed (Parsons et al., 2014)

Presentation of the sample to the unit (Parsons et al., 2014)

Calibration and reference materials (Fisher et al., 2014)

Validation and presentation of data (Gazley & Fisher, 2014)

Top tips for ensuring good data
1. Ensure the sample is dry.
2. Present the sample as well as you possibly can (i.e. in a sample cup with mylar film). Reducing the particle size usually gives the best results.
3. Ensure the standards are appropriately matrix-matched and that there are enough of them.
4. Send a subset of samples (5%?) for laboratory analysis.

Reporting pXRF data (JORC or otherwise)

The multivariate problem
Datasets in geology tend to be high-dimensional.
Whatever it is we do, we do it either through space or through time, or both.
Humans are very good at seeing patterns, but sometimes the sheer size of a dataset is overwhelming.

Disclaimer
I am not a statistician. I am not a mathematician. I am a geologist who has found a need for multivariate methods to help us navigate n-dimensional space.
Multivariate ordinations are not new. They have been around for a long time; geologists just seem to be slow adopters of them.

Missing data
You cannot have missing data. You need to substitute or impute missing values.
For <10% missing: substitute 66% of the limit of detection (LOD).
For 10-30% missing: impute the missing data.
For >30% missing: discard the element.
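The thresholds above can be written as a small decision function. This is a minimal sketch in Python with NumPy, not code from the talk: the function name `handle_missing`, the use of NaN to mark missing values, and the choice to leave the 10-30% case to a separate imputation step are all assumptions made for illustration.

```python
import numpy as np

def handle_missing(values, lod, low=0.10, high=0.30, lod_fraction=0.66):
    """Apply the rule of thumb to one element's column.

    values: 1-D concentrations, with NaN marking missing or below-LOD entries
            (an assumed convention).
    lod:    limit of detection for this element, in the same units.
    Returns (action, cleaned_values).
    """
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    fraction = missing.mean()
    if fraction == 0:
        return "complete", values
    if fraction > high:
        # More than 30% missing: drop the element altogether.
        return "discard", None
    if fraction >= low:
        # 10-30% missing: hand over to an imputation method of your choice.
        return "impute", values
    # Less than 10% missing: substitute 66% of the LOD.
    cleaned = values.copy()
    cleaned[missing] = lod_fraction * lod
    return "substitute", cleaned
```

Run once per element, the returned action tells you which columns are ready to use, which still need imputing, and which to drop before any multivariate analysis.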

Closure and log-ratio transforms
Geochemical data are typically reported as compositions. They must total 100% or 1,000,000 ppm. These data are closed.
For a composition of n components, only n-1 components are required (Buccianti & Grunsky, 2014).
You can't do statistics on closed data because you find spurious correlations.
The log-ratio transform of Aitchison (1982, 1986) converts data into real number space.
Log-ratio transformations allow us to make meaningful statements on compositional data.
There are a number of log-ratio transforms that have different purposes.
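As an illustration, the centred log-ratio (clr) is one member of the log-ratio family. The slides do not say which transform was used in the case studies, so treat this as a sketch under that assumption; the function names `closure` and `clr` are chosen here for illustration.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1 (a closed composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part relative to the row's geometric mean.

    All parts must be strictly positive, so missing values and zeros need to
    be dealt with first.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)
```

Each clr row sums to zero, and the result is the same whether the input is in percent, ppm or fractions, because a constant multiplier cancels when the geometric mean is subtracted in log space.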

Principal component analysis (PCA)
PCA is an ordination. All it does is reorient and rescale your data. Point-point relationships are preserved; PCA just makes it easier to see structure.
PCA does a couple of really useful things:
1. It quantifies how much of the variance in the dataset is summarised by each PC axis.
2. It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset. It is human readable.
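Both outputs can be seen in a minimal PCA built on a singular value decomposition. This is a sketch in Python with NumPy rather than the software used in the talk; the function name `pca` is an assumption, and the data are only centred, not scaled to unit variance.

```python
import numpy as np

def pca(X):
    """PCA of a samples-by-variables matrix via SVD of the centred data.

    Returns (scores, loadings, explained):
      scores    - sample coordinates on the PC axes
      loadings  - one column per PC, one row per original variable
      explained - fraction of total variance summarised by each PC
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # same as Xc @ Vt.T
    explained = s**2 / np.sum(s**2)
    loadings = Vt.T
    return scores, loadings, explained
```

When every component is kept, the scores are just a rotation of the centred data, so the distance between any two samples is unchanged. The `explained` vector gives the variance carried by each axis, and the rows of `loadings` show which variables drive each PC.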

Imagine your dataset as a teapot...
What's the best way to look at a teapot so that you can best understand what shape it is?

Orientating the teapot

Other ordinations
PCA is to ordinations as vanilla is to ice cream flavours. It works with most things, but there are plenty of other ordinations to choose from, and some of those might suit you better or be useful in combination with PCA.
A priori groupings? Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA).
Both categorical and continuous data? Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA).
Variables not normally distributed? Independent Components Analysis (ICA).

Implementation
A number of different PCAs (and other ordinations, in some cases) can be run very easily in different programs: various stats software, MATLAB, ioGAS, PAST and R.
R can do PCA in a multitude of ways.
The base package [stats] has prcomp and princomp.
PCA is also found in additional packages [FactoMineR, ade4, amap, pcaPP], and probably more!
There are also robust PCA, sparse PCA and robust sparse PCA.

Case study 1: Agnew gold mine
Au is associated with Ca (calcic amphibole) and not biotite (Barnes et al., 2014; Fisher et al., 2014).

Case study 2: Dolerites (Gazley et al., 2014)

Cluster analysis
What if PCA has done a good job but you've still got too much overlap to be able to draw your own lines between groups of data? This is where cluster analysis comes in.
Cluster analysis finds groups by looking at distances between points.
It doesn't know what your data are and it doesn't care. It is interested in point-point relationships.
So yes, different clustering methods will find different groups!
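To show what "looking at distances between points" means in practice, here is a minimal k-means sketch in Python with NumPy. The slides do not name a clustering algorithm, so k-means is an assumption picked for brevity; the function name `kmeans` and the farthest-first starting centres are likewise illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Group the rows of X into k clusters using point-to-centre distances."""
    X = np.asarray(X, dtype=float)
    # Farthest-first start: take the first point, then repeatedly add the
    # point that is farthest from every centre chosen so far.
    centres = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Distance from every point to every centre; assign to the nearest.
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if empty).
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

The function only ever sees coordinates, for example PC scores, and never sample names or lithologies. A different algorithm, distance measure or value of k can return a different grouping of the same points.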

Clustering the teapot
There are going to be points that could belong to more than one group.
How you deal with those depends on the methods you choose and your own judgement.
Cluster analysis cannot and will not solve this problem for you!

The data analysis workflow (Gazley et al., 2015)

Case study 3: East Coast Basin, NZ (Hines et al., 2015; in prep)
Whangai/Waipawa/Wanstead Formations, East Coast of the North Island.
Homogeneous, brown and boring, except that the Waipawa Fm is a potential hydrocarbon source.
The provenance of the sediment is of interest for palaeoenvironmental reasons.
The pXRF dataset comes from six measured sections along the East Coast.

Case study 4: Mozambique soil samples (Sterk et al., in review)
Data were collected with a Niton XL3t GOLDD pXRF unit on a nominal 40 m x 80 m grid.
The pXRF unit was used in the field, in a ~20 cm pit dug at each site.
Ta and Sn are not measured well by pXRF because of spectral overlaps with Cu/Zn and K/Ca, respectively.
After anomalism was detected in this survey, a 100 m x 300 m grid was run, with samples sent for laboratory analysis.
Both sample sets were estimated into 100 m x 100 m cells in 3DS Surpac.

[Figure: PC1 versus PC2 plots of the Mozambique soil samples (Sterk et al., in review)]

Conditional probability (Hill et al., 2014)

If Sn in the pXRF dataset is >150 ppm, then in the lab dataset it is >90 ppm, i.e. truly anomalous.
Fe, Ti, Zr and Mn concentrations, together with the Sn concentrations that were >150 ppm (8% of the samples), were used to predict the probability of anomalous Sn in all samples.
Rb, Ca and Sr were left out in case they were mobile during weathering.
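The slides do not describe how the conditional probability of Hill et al. (2014) is calculated, so the following is not that method. It is only a minimal counting sketch of the underlying idea in Python with NumPy: estimate the probability that a sample is anomalous given the class of one predictor element. The function name `conditional_probability` and the use of quantile bins are assumptions.

```python
import numpy as np

def conditional_probability(predictor, anomalous, n_bins=4):
    """Estimate P(anomalous | predictor bin) by counting within quantile bins.

    predictor: 1-D concentrations of one element (e.g. Fe) per sample.
    anomalous: 1-D booleans, True where the target (e.g. Sn) exceeds a threshold.
    Returns (bin_edges, probabilities), one probability per bin.
    """
    predictor = np.asarray(predictor, dtype=float)
    anomalous = np.asarray(anomalous, dtype=bool)
    edges = np.quantile(predictor, np.linspace(0.0, 1.0, n_bins + 1))
    # Digitising against the interior edges gives bin indices 0..n_bins-1.
    bins = np.digitize(predictor, edges[1:-1])
    probs = np.array([
        anomalous[bins == b].mean() if np.any(bins == b) else np.nan
        for b in range(n_bins)
    ])
    return edges, probs
```

In this simplified setting, a sample whose own target reading is unreliable could be given the probability of the bin its predictor value falls in. Combining several predictors, as in the case study, would need a proper multivariate treatment.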

Conditional probability based on Fe, Ti, Zr and Mn
[Figure annotations: "Exploration targets"; "Ignore anomaly here"]

Concluding remarks
pXRF data are fit for many purposes.
You can collect datasets that may contain elements you otherwise would not have paid for. But you must stay on top of recording all of the metadata that tell you (and others) how good (or not) the data really are.
Multivariate methods can reveal underlying structure and provide ways to visualise big data.
You can formulate hypotheses using PCA and cluster analysis, which are then testable using standard statistics.
pXRF technology allows for the collection of large datasets; ensure that you extract all of the value that you possibly can.

Questions?

Thank you
Michael Gazley, Senior Research Scientist
t: +61 8 6436 8501
e: michael.gazley@csiro.au
w: www.csiro.au/
MINERAL RESOURCES