Maximising the value of pXRF data


1 Maximising the value of pXRF data
Michael Gazley, Senior Research Scientist
13 November 2015
With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk
MINERAL RESOURCES

2 Overview
How good is pXRF data?
How do you make sure your data are good?
Multivariate data
Issues with compositional data
Principal component analysis (PCA)
The Teapot
Case studies 1 & 2
Cluster analysis
Case studies 3 & 4
Concluding remarks

3 How good is pXRF data? [Figure panels: Rb, Sr, K, Zn] (Fisher et al., 2014; Gazley et al., in prep.)

4 How do you make sure your data are good?

5 Instrumentation (Goodale et al., 2014)

6 Nature of the material to be analysed (Gazley & Fisher, 2014)

7 Nature of the material to be analysed (Parsons et al., 2014)

8 Nature of the material to be analysed (Gazley & Fisher, 2014)

9 Presentation of the sample to the unit (Parsons et al., 2014)

10 Calibration and reference materials (Fisher et al., 2014)

11 Validation and presentation of data (Gazley & Fisher, 2014)

12 Top tips for ensuring good data
1. Ensure the sample is dry.
2. Present the sample as well as you possibly can (i.e. sample cup with mylar film). Reducing the particle size usually gives the best results.
3. Ensure the standards are appropriately matrix-matched and that there are enough of them.
4. Send a subset of samples (5%?) for laboratory analysis.

13 Reporting pXRF data (JORC or otherwise)

14 The multivariate problem
Datasets in geology tend to be high-dimensional.
Whatever it is we do, we do it either through space or through time, or both.
Humans are very good at seeing patterns. But sometimes the sheer size of a dataset is overwhelming.

15 Disclaimer
I am not a statistician. I am not a mathematician.
I am a geologist who has found a need for multivariate methods to help us navigate n-dimensional space.
Multivariate ordinations are not new; they have been around for a long time. Geologists just seem to be slow adopters of them.

16 Missing data
You cannot have missing data. You need to substitute or impute missing values.
For <10% missing: substitute 66% of the limit of detection (LOD)
For 10-30% missing: impute the missing data
For >30% missing: discard the element
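The thresholds on this slide can be written down as a small decision rule. The sketch below is only an illustration: it assumes missing readings are below-detection values stored as `None`, and the function names `missing_data_action` and `substitute_lod` are hypothetical, not taken from the talk. Imputation itself (the 10-30% case) is left to a dedicated method.

```python
def missing_data_action(values):
    """Pick the treatment for one element's column; None marks a missing value."""
    frac_missing = sum(v is None for v in values) / len(values)
    if frac_missing < 0.10:
        return "substitute"   # replace with 66% of the LOD
    if frac_missing <= 0.30:
        return "impute"       # estimate from the rest of the dataset
    return "discard"          # too sparse to use this element


def substitute_lod(values, lod, factor=0.66):
    """Replace missing values with a fixed fraction of the limit of detection."""
    return [factor * lod if v is None else v for v in values]


# Hypothetical column: 20 readings in ppm, one below detection (assumed LOD = 3 ppm).
column = [10.0] * 19 + [None]
if missing_data_action(column) == "substitute":
    column = substitute_lod(column, lod=3.0)
```

Keeping the decision and the substitution separate makes it easy to swap in a proper imputation routine for the middle band.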

17 Closure and log-ratio transforms
Geochemical data are typically reported as compositions.
They must total 100% or 1,000,000 ppm.
These data are closed.
For a composition of n components, only n-1 components are required (Buccianti & Grunsky, 2014).
You can't do standard statistics on closed data because you find spurious correlations.
The log-ratio transform of Aitchison (1982, 1986) converts data into real number space.
Log-ratio transformations allow us to make meaningful statements on compositional data.
There are a number of log-ratio transforms that have different purposes.
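As one concrete instance, here is a minimal sketch of the centred log-ratio (clr), one of Aitchison's transforms. The slides do not say which transform was used in the case studies, so clr is chosen here purely for illustration, and the three-part composition is made up.

```python
import math


def clr(composition):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    if any(x <= 0 for x in composition):
        raise ValueError("clr needs strictly positive parts; treat zeros first")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-part composition, reported in wt% and again in ppm.
sample_pct = [60.0, 30.0, 10.0]
sample_ppm = [x * 10_000 for x in sample_pct]

clr_pct = clr(sample_pct)
clr_ppm = clr(sample_ppm)  # same values: the transform ignores the closure constant
```

The transformed values sum to zero and are the same whether the sample was closed to 100% or to 1,000,000 ppm, which is why the choice of reporting units no longer matters after the transform.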

18 Principal component analysis (PCA)
PCA is an ordination.
All it does is reorient and rescale your data.
Point-point relationships are preserved; PCA just makes it easier to see structure.
PCA does a couple of really useful things.
It quantifies how much of the variance in the dataset is summarised by each PC axis.
It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset - it is human readable.
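To make the two outputs above concrete (variance explained per axis, and loadings), here is a small standard-library sketch. It is an assumption-laden illustration, not the workflow used in the talk: the data are invented, the eigenvectors are found by simple power iteration with deflation, and in practice you would use an established routine and log-ratio transform compositional data first.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    value = sum(v * m for v, m in zip(vec, product))  # Rayleigh quotient
    return value, vec


def pca(rows, n_components=2):
    """Return [(fraction of variance, loadings), ...] for the leading PC axes."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    components = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append((value / total_variance, vec))
        # Deflate: remove the axis just found so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return components


# Hypothetical data: three variables per sample, the first two strongly correlated.
rows = [(1.0, 2.1, 0.5), (2.0, 3.9, 0.4), (3.0, 6.2, 0.6),
        (4.0, 8.1, 0.5), (5.0, 9.8, 0.5)]
for fraction, loadings in pca(rows):
    print(round(fraction, 4), [round(x, 3) for x in loadings])
```

Each printed line pairs the share of total variance on one PC axis with the loadings that say which original variables drive it.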

19 Imagine your dataset as a teapot...
What's the best way to look at a teapot so that you can best understand what shape it is?

20 Orientating the teapot

21 Other ordinations
PCA is to ordinations as vanilla is to ice cream flavours.
It works with most things, but there are plenty of other ordinations to choose from and some of those might suit you better, or be useful in combination with PCA.
A priori groupings? Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA)
Both categorical and continuous data? Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA)
Variables not normally distributed? Independent Components Analysis (ICA)

22 Implementation
A number of different PCAs (and other ordinations, in some cases) can be run very easily in different programs: various stats software, MATLAB, ioGAS, PAST and R.
R can do PCA in a multitude of ways.
Base package [stats] has prcomp and princomp.
Also found in additional packages [FactoMineR, ade4, amap, pcaPP], probably more!
Also robust PCA, sparse PCA and robust sparse PCA.

23 Case study 1 - Agnew gold mine
Au is associated with Ca (calcic amphibole) and not biotite.
Barnes et al. (2014); Fisher et al. (2014)

24-28 Case study 2 - Dolerites (Gazley et al., 2014)

29 Cluster analysis
What if PCA has done a good job but you've still got too much overlap to be able to draw your own lines between groups of data?
This is where cluster analysis comes in.
Cluster analysis finds groups by looking at distances between points.
It doesn't know what your data are and it doesn't care. It is interested in point-point relationships.
So yes, different clustering methods will find different groups!
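A minimal sketch of grouping by distance is k-means, shown here with the standard library only. The slides do not name the clustering method used in the case studies, so k-means is an illustrative choice; the points are invented scores on two axes, and the starting centroids are fixed by hand so the run is repeatable.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid, then move centroids; repeat."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scores: two tight groups plus one point sitting between them.
points = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.4),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.3),
          (2.4, 2.3)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

The in-between point is forced into one group or the other purely on distance; a different method, or different starting centroids, could place it differently, which is the point the slide makes.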

30 Clustering the teapot

31 Clustering the teapot
There are going to be points that could belong to more than one group.
How you deal with those is dependent on the methods you choose and your own judgement.
Cluster analysis cannot and will not solve this problem for you!

32 The data analysis workflow (Gazley et al., 2015)

33 Case study 3 - East Coast Basin, NZ (Hines et al., 2015; in prep)
Whangai/Waipawa/Wanstead Formations, East Coast of North Island.
Homogeneous, brown and boring, except that the Waipawa Fm is a potential hydrocarbon source.
Sediment provenance is of interest for palaeoenvironmental reasons.

34 Case study 3 - East Coast Basin, NZ (Hines et al., 2015; in prep)
pXRF dataset from six measured sections along the East Coast.

35 Hines et al. (2015; in prep)

36 Case study 4 - Mozambique soil samples (Sterk et al., in review)

37 Case study 4 - Mozambique soil samples (Sterk et al., in review)
Data were collected with a Niton XL3t GOLDD pXRF unit on a nominal 40 m x 80 m grid.
The pXRF unit was used in the field, with a ~20 cm pit dug at each site.
Ta and Sn are not well determined by pXRF because of overlaps with Cu/Zn and K/Ca, respectively.
After anomalism was detected in this survey, a 100 m x 300 m grid was run with samples sent for lab analysis.
Both sample sets were estimated into 100 m x 100 m cells in 3DS Surpac.

38-39 Case study 4 - Mozambique soil samples (Sterk et al., in review)

40 Case study 4 - Mozambique soil samples (Sterk et al., in review) [Figure axes: PC1, PC2]

41 Case study 4 - Mozambique soil samples (Sterk et al., in review)

42 Case study 4 - Mozambique soil samples (Sterk et al., in review) [Figure axes: PC1, PC2]

43 Case study 4 - Mozambique soil samples (Sterk et al., in review)

44 Conditional probability (Hill et al., 2014)

45 Case study 4 - Mozambique soil samples (Sterk et al., in review)
If Sn in the pXRF dataset is >150 ppm, it is >90 ppm in the lab dataset, i.e. truly anomalous.
Used Fe, Ti, Zr and Mn concentrations and a dataset of Sn concentrations that were >150 ppm (8% of the samples) to predict the probability of anomalous Sn concentration in all samples.
Left out Rb, Ca and Sr in case they were mobile during weathering.
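The slides do not spell out how the conditional probability was calculated, so the sketch below is not the method of Hill et al. (2014) or Sterk et al. It only illustrates the underlying idea, estimating P(event | condition) by counting, using a single predictor, an assumed Fe cut-off of 5%, and invented sample values.

```python
def conditional_probability(samples, condition, event):
    """Estimate P(event | condition) as a simple proportion of matching samples."""
    matching = [s for s in samples if condition(s)]
    if not matching:
        return None  # the condition never occurs, so nothing can be estimated
    return sum(1 for s in matching if event(s)) / len(matching)


# Hypothetical soil samples: Fe in wt%, Sn in ppm (not data from the survey).
samples = [
    {"Fe": 6.1, "Sn": 210}, {"Fe": 5.8, "Sn": 180},
    {"Fe": 5.5, "Sn": 95}, {"Fe": 6.4, "Sn": 260},
    {"Fe": 2.1, "Sn": 40}, {"Fe": 1.8, "Sn": 60},
    {"Fe": 2.5, "Sn": 155}, {"Fe": 3.0, "Sn": 30},
]


def is_anomalous(sample):
    return sample["Sn"] > 150   # pXRF threshold quoted on the slide


def is_high_fe(sample):
    return sample["Fe"] >= 5.0  # assumed cut-off, for illustration only


p_overall = conditional_probability(samples, lambda s: True, is_anomalous)
p_given_fe = conditional_probability(samples, is_high_fe, is_anomalous)
```

Comparing `p_given_fe` with `p_overall` shows whether knowing the predictor changes the odds of anomalous Sn; the case study combined four predictors (Fe, Ti, Zr and Mn) rather than one.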

46 Case study 4 - Mozambique soil samples (Sterk et al., in review)
[Figure: conditional probability based on Fe, Ti, Zr and Mn. Annotations: "Exploration targets"; "Ignore anomaly here"]

47 Case study 4 - Mozambique soil samples (Sterk et al., in review)

48 Concluding remarks
pXRF data are fit for many purposes.
You can collect datasets that may contain elements you otherwise would not have paid for.
But you must stay on top of recording all of the metadata that tell you (and others) how good (or not) the data really are.
Multivariate methods can reveal underlying structure and provide ways to visualise big data.
You can formulate hypotheses using PCA and cluster analysis, which are then testable using standard statistics.
pXRF technology allows for the collection of large datasets; ensure that you extract all of the value that you possibly can.

49 Questions?

50 Thank you
Michael Gazley, Senior Research Scientist
e michael.gazley@csiro.au
MINERAL RESOURCES

Portable X-ray fluorescence Spectroscopy. Michael A. Wilson Research Soil Scientist USDA-NRCS National Soil Survey Center Lincoln, NE

Portable X-ray fluorescence Spectroscopy. Michael A. Wilson Research Soil Scientist USDA-NRCS National Soil Survey Center Lincoln, NE Portable X-ray fluorescence Spectroscopy Michael A. Wilson Research Soil Scientist USDA-NRCS National Soil Survey Center Lincoln, NE OBJECTIVES Background of the method Features of the instrument Applications

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3 COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Design & Analysis of Ecological Data. Landscape of Statistical Methods...

Design & Analysis of Ecological Data. Landscape of Statistical Methods... Design & Analysis of Ecological Data Landscape of Statistical Methods: Part 3 Topics: 1. Multivariate statistics 2. Finding groups - cluster analysis 3. Testing/describing group differences 4. Unconstratined

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary

More information

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501 PRINCIPAL COMPONENTS ANALYSIS (PCA) Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 May 2008 Introduction Suppose we had measured two variables, length and width, and

More information

Tutorial on Exploratory Data Analysis

Tutorial on Exploratory Data Analysis Tutorial on Exploratory Data Analysis Julie Josse, François Husson, Sébastien Lê julie.josse at agrocampus-ouest.fr francois.husson at agrocampus-ouest.fr Applied Mathematics Department, Agrocampus Ouest

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d. EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Instrumentation. (Figure 2)

Instrumentation. (Figure 2) X-Ray Fluorescence Lab Report Nydia Esparza Victoria Rangel Physics of XRF XRF is a non destructive analytical technique that is used for elemental and chemical analysis. X-Ray Fluorescence Spectroscopy

More information

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Visualization of textual data: unfolding the Kohonen maps.

Visualization of textual data: unfolding the Kohonen maps. Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing

More information

Didacticiel - Études de cas

Didacticiel - Études de cas 1 Topic Linear Discriminant Analysis Data Mining Tools Comparison (Tanagra, R, SAS and SPSS). Linear discriminant analysis is a popular method in domains of statistics, machine learning and pattern recognition.

More information

Chalcophile and Key Element Distribution in the Eastern Goldfields: seismic traverse EGF01. Aleks Kalinowski Geoscience Australia, pmdcrc Y2 project

Chalcophile and Key Element Distribution in the Eastern Goldfields: seismic traverse EGF01. Aleks Kalinowski Geoscience Australia, pmdcrc Y2 project pmd CR C Chalcophile and Key Element Distribution in the Eastern Goldfields: seismic traverse EGF01 predictive mineral discovery Aleks Kalinowski Geoscience Australia, pmdcrc Y2 project Aleks.Kalinowski@ga.gov.au

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Distribution of Chemical Elements In Urban Sediments in Slovenia (Extended Abstract)

Distribution of Chemical Elements In Urban Sediments in Slovenia (Extended Abstract) Robert SAJN and Simon PIRC Distribution of Chemical Elements In Urban Sediments in Slovenia (Extended Abstract) The goal of the study work was to assess the distribution of chemical elements in anthropogenic

More information

Spatial sampling effect of laboratory practices in a porphyry copper deposit

Spatial sampling effect of laboratory practices in a porphyry copper deposit Spatial sampling effect of laboratory practices in a porphyry copper deposit Serge Antoine Séguret Centre of Geosciences and Geoengineering/ Geostatistics, MINES ParisTech, Fontainebleau, France ABSTRACT

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

X Ray Flourescence (XRF)

X Ray Flourescence (XRF) X Ray Flourescence (XRF) Aspiring Geologist XRF Technique XRF is a rapid, relatively non destructive process that produces chemical analysis of rocks, minerals, sediments, fluids, and soils It s purpose

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Structural Analysis of Network Traffic Flows Eric Kolaczyk

Structural Analysis of Network Traffic Flows Eric Kolaczyk Structural Analysis of Network Traffic Flows Eric Kolaczyk Anukool Lakhina, Dina Papagiannaki, Mark Crovella, Christophe Diot, and Nina Taft Traditional Network Traffic Analysis Focus on Short stationary

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Multivariate Analysis. Overview

Multivariate Analysis. Overview Multivariate Analysis Overview Introduction Multivariate thinking Body of thought processes that illuminate the interrelatedness between and within sets of variables. The essence of multivariate thinking

More information

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

More information

EXPLORATORY FACTOR ANALYSIS IN MPLUS, R AND SPSS. sigbert@wiwi.hu-berlin.de

EXPLORATORY FACTOR ANALYSIS IN MPLUS, R AND SPSS. sigbert@wiwi.hu-berlin.de EXPLORATORY FACTOR ANALYSIS IN MPLUS, R AND SPSS Sigbert Klinke 1,2 Andrija Mihoci 1,3 and Wolfgang Härdle 1,3 1 School of Business and Economics, Humboldt-Universität zu Berlin, Germany 2 Department of

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

A Demonstration of Hierarchical Clustering

A Demonstration of Hierarchical Clustering Recitation Supplement: Hierarchical Clustering and Principal Component Analysis in SAS November 18, 2002 The Methods In addition to K-means clustering, SAS provides several other types of unsupervised

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

More information

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue.

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue. More Principal Components Summary Principal Components (PCs) are associated with the eigenvectors of either the covariance or correlation matrix of the data. The ith principal component (PC) is the line

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Floyd Ray Martin, FSA, MAAA Thomas A. McInteer, FSA, MAAA Jonathan P. Polon, FSA Dental Insurance Fraud Detection

More information

Image Database System based on Readers Kansei Character

Image Database System based on Readers Kansei Character Image Database System based on Readers Kansei Character Yamanaka, Toshimasa / Institute of Art and Design, University of Tsukuba Uchiyama, Toshiaki / Institute of Art and Design, University of Tsukuba

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

EXTENSIVE GOLD IN SOILS TARGET IDENTIFIED AT MOMBUCA GOLD PROJECT, SE BRAZIL

EXTENSIVE GOLD IN SOILS TARGET IDENTIFIED AT MOMBUCA GOLD PROJECT, SE BRAZIL 9 July 2015 EXTENSIVE GOLD IN SOILS TARGET IDENTIFIED AT MOMBUCA GOLD PROJECT, SE BRAZIL Open ended target zone up to 1.5km long identified Key Points Prospectivity of Centaurus recently secured Mombuca

More information

Cluster this! June 2011

Cluster this! June 2011 Cluster this! June 2011 Agenda On the agenda today: SAS Enterprise Miner (some of the pros and cons of using) How multivariate statistics can be applied to a business problem using clustering Some cool

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

Time series clustering and the analysis of film style

Time series clustering and the analysis of film style Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such

More information

OptiDAT. - database reference document -

OptiDAT. - database reference document - ` OptiDAT - database reference document - OB_TC_R018 rev. 005 document number: 10224 June 29 th, 2006 Public version OPTIMAT BLADES TC Rogier Nijssen OPTIMAT BLADES Page 2 of 16 Change record Issue/revision

More information

COC131 Data Mining - Clustering

COC131 Data Mining - Clustering COC131 Data Mining - Clustering Martin D. Sykora m.d.sykora@lboro.ac.uk Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window

More information

Anomaly detection. Problem motivation. Machine Learning

Anomaly detection. Problem motivation. Machine Learning Anomaly detection Problem motivation Machine Learning Anomaly detection example Aircraft engine features: = heat generated = vibration intensity Dataset: New engine: (vibration) (heat) Density estimation

More information

There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:

There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows: Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Overview of Factor Analysis

Overview of Factor Analysis Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Clustering through Decision Tree Construction in Geology

Clustering through Decision Tree Construction in Geology Nonlinear Analysis: Modelling and Control, 2001, v. 6, No. 2, 29-41 Clustering through Decision Tree Construction in Geology Received: 22.10.2001 Accepted: 31.10.2001 A. Juozapavičius, V. Rapševičius Faculty

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

Regression Modeling Strategies
