STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed., by John C. Davis
Clarification of zonation procedure described on pp. 238-239




Because the notation used in this section (Eqs. 4.80 through 4.84) is inconsistent with that used elsewhere in the book, it may be confusing. The confusion can be resolved by some redefinition of the equations. Let x_ij represent the standardized value of log trace j measured at depth i, where there are n depths and m log traces. Each log trace is standardized by subtracting the mean of the trace from every value and dividing by the standard deviation of the trace. This transforms the log traces so that each trace has a mean of zero and a variance of one. The vector x_i consists of the standardized values for all log traces at depth i. For the purposes of demonstrating the clustering process, we will regard x_i as a column vector and use inner product notation (x′x) to represent sums of squares, even though x_i is actually a row in an n × m matrix. Using this convention, and keeping in mind that the log traces are standardized, the total sum of squares is given by

    SS_T = Σ_{i=1}^{n} x_i′ x_i = m(n − 1)                          (4.80)

Because the log traces are standardized, the total sum of squares is simply the number of log traces times the number of degrees of freedom for the variance of each trace, which is the number of depth intervals minus one. Consequently, we do not have to calculate the sums of cross-products to find SS_T.

The zonation process consists of iteratively collecting adjacent depths into zones composed of intervals that have similar log characteristics. The measure of similarity (or rather, of dissimilarity) is the squared distance, E. (A greater squared distance between two adjacent zones indicates a greater dissimilarity between them; if two zones are identical, the squared distance between them will be zero.) We will refer to zones by the subscript k, and to the number of points within a zone as n_k.
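As a quick numerical check of this identity, the short Python sketch below (not from the book; the data are randomly generated stand-ins for a well log) standardizes each trace and confirms that the total sum of squares equals m(n − 1):

```python
import random
from statistics import mean, stdev

# Sketch of the standardization step and the identity SS_T = m(n - 1),
# using hypothetical data in place of the well log of Table 1.
random.seed(0)
n, m = 20, 4                                     # n depths, m log traces
logs = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]

# Standardize each trace (column): subtract its mean and divide by its
# sample standard deviation, so every trace has mean 0 and variance 1.
cols = list(zip(*logs))
mu = [mean(c) for c in cols]
sd = [stdev(c) for c in cols]
x = [[(logs[i][j] - mu[j]) / sd[j] for j in range(m)] for i in range(n)]

# SS_T is the sum of x_i' x_i over all depths; for standardized traces it
# must equal m(n - 1) = 4 * 19 = 76, as stated in the text.
ss_t = sum(v * v for row in x for v in row)
print(round(ss_t, 6))   # -> 76.0
```

Any raw readings give the same result, since standardization forces each trace's sum of squared deviations to n − 1.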
Because on the first iteration each zone includes only a single depth (that is, n_k = 1 for all k), the squared distance between every observation at depth i and its neighbor at depth i + 1 is given by

    E_k = E_i = (1/2)(x_i − x_{i+1})′ (x_i − x_{i+1})               (4.81)

After the first iteration, some of the zones will include more than a single depth; these zones are henceforth treated as a single object defined by their vector mean,

    X̄_k = (1/n_k) Σ_{i=1}^{n_k} x_i                                 (4.82)

again treating these as column vectors. We must also calculate a measure of the variation within a zone in the form of a within-zone sum of squares, W_k:

    W_k = Σ_{i=1}^{n_k} (x_i − X̄_k)′ (x_i − X̄_k)                    (4.83)

Note that the summation extends over the set of n_k depths included in zone k. If we sum this measure over all g zones, we obtain the total sum of squares within zones, SS_W:

    SS_W = Σ_{k=1}^{g} W_k = Σ_{k=1}^{g} Σ_{i=1}^{n_k} (x_i − X̄_k)′ (x_i − X̄_k)    (4.84)
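Equation (4.81) can be sketched in a few lines of Python. The three-depth, two-trace data here are hypothetical, chosen only to make the arithmetic easy to follow by hand:

```python
# E_i = (1/2)(x_i - x_{i+1})'(x_i - x_{i+1}), Eq. (4.81), computed for
# every pair of adjacent depths in a (hypothetical) standardized log.
def adjacent_distances(x):
    """Squared distance between each depth and the one below it."""
    return [0.5 * sum((u - v) ** 2 for u, v in zip(x[i], x[i + 1]))
            for i in range(len(x) - 1)]

x = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # three depths, two traces
print(adjacent_distances(x))   # -> [1.0, 0.5]
```

For n depths this yields the n − 1 candidate merge costs evaluated on the first pass of the algorithm.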

Analysis of Sequences of Data: Zonation Procedures

[Figure 1. Hypothetical well log with four traces.]

As iteration proceeds and depths become grouped into zones, the dissimilarity between adjacent zones is determined as the squared distance between the zone means, weighted by the number of depths within each zone. That is, the squared distance between zone k and zone k + 1 is

    E_k = (X̄_k − X̄_{k+1})′ (X̄_k − X̄_{k+1}) / (1/n_k + 1/n_{k+1})

This is the amount by which the merger of zones k and k + 1 will increase the total sum of squares within zones. As noted previously, on the initial pass each zone consists of a single depth, so n_k = 1 for all k and the squared distances E_k are given by Equation (4.81).

A general summary of the steps in the zonation algorithm, after standardizing the log traces, is:

(1) Calculate the squared distances between the vectors of every interval (or zone) and the underlying interval (or zone).
(2) Combine the two adjacent intervals whose squared distance is smallest to form a zone.
(3) Replace the vectors of intervals in a zone with their mean vector.
(4) Return to (1). Repeat until all intervals are combined into a single zone or some stopping criterion is satisfied.

Operation of the zonation algorithm can be demonstrated using the hypothetical well log shown in Figure 1, which consists of four well log variables measured at twenty successive depths; the data are listed in Table 1. The means and standard

Table 1. Hypothetical example of a well log consisting of four traces (variables) measured at equally spaced depths.

    depth  var. 1  var. 2  var. 3  var. 4
    1 1 3 34 1 1 18 31 1 1 4 3 3 16 1 6 4 18 1 6 3 17 11 6 3 1 31 7 4 8 9 33 8 3 8 33 9 6 34 33 1 6 3 34 11 6 3 7 33 1 4 3 3 34 13 4 4 13 33 14 4 3 1 31 1 3 16 1 16 1 6 36 17 3 1 7 36 18 3 14 3 34 19 3 1 34

deviations of the four traces are

              var. 1   var. 2   var. 3   var. 4
    mean       3.3      1.1     14.1     31.8
    std. dev.  1.31     8.4161   7.73     3.433

which can be used to standardize the readings listed in Table 1, giving the transformed values in Table 2. The first step is to calculate the total variance in the sequence, using Equation (4.80) as modified above. For depth 1, the multiplication is

    x_1′ x_1 = (1.346, 1.813, 1.17, .668)′ (1.346, 1.813, 1.17, .668) = .73

The twenty values for the log are given below; the sum of these values is SS_T = 76, which is also the product 4 × 19 = 76.

    1.63  .1981  7.347   4.336   6.34  1.411  .666    .787  3.9471  3.33
    .44   .4943  3.9318  6.893   .44   .131   4.1944  .73   .366    .338
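The four-step summary of the zonation algorithm given above translates directly into code. The sketch below is an illustrative implementation, not the program used in the book; the function names and the eight-depth example log are hypothetical, and the traces are assumed to be already standardized:

```python
from statistics import mean

def zone_mean(zone, x):
    """Mean vector of the depths in a zone (Eq. 4.82)."""
    return [mean(x[i][j] for i in zone) for j in range(len(x[0]))]

def merge_cost(za, zb, x):
    """Squared distance between adjacent zones, weighted by zone sizes:
    E_k = (Xk - Xk+1)'(Xk - Xk+1) / (1/n_k + 1/n_k+1)."""
    a, b = zone_mean(za, x), zone_mean(zb, x)
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return d2 / (1.0 / len(za) + 1.0 / len(zb))

def zonate(x, stop_at=1):
    """Merge adjacent depths into zones until `stop_at` zones remain."""
    zones = [[i] for i in range(len(x))]          # one zone per depth
    while len(zones) > stop_at:
        costs = [merge_cost(zones[k], zones[k + 1], x)
                 for k in range(len(zones) - 1)]
        k = costs.index(min(costs))               # most similar neighbours
        zones[k:k + 2] = [zones[k] + zones[k + 1]]
    return zones

# Hypothetical standardized log with an obvious break between depths 0-3
# and 4-7 on both traces; the algorithm should recover the two zones.
x = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9], [-1.0, -1.0],
     [1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0]]
print(zonate(x, stop_at=2))   # -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Only adjacent zones are ever compared, which is what distinguishes zonation from ordinary (unconstrained) cluster analysis of the same data.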

Analysis of Sequences of Data Zonation Procedures Table. Hypothetical well log with variables standardized by subtracting column means and dividing by column standard deviations. var. 1 var. var. 3 var. 4 depth 1.347 1.813 1.17.668 1.8816.748.1.34 1.347 1.813 1.37.8889 3.8816.66.773 1.61 4.8816.3683.414 1.61.86.487.493 1.933 6.86.169.773.399 7.44.8199.6734.346 8 1.77 1.7.84.346 9 1.736 1.38 1.16.346 1 1.736 1.616 1.16.668 11 1.736 1.91.937.346 1.44 1.616 1.466.668 13.44.3446.14.346 14.44.8.773.399 1.86.66.773 1.933 16.8816 1.813 1.713 1.171 17.86 1.813 1.733 1.171 18.86.8436 1.17.668 19.86.748 1.431.668 During this first iteration, the distance between every log measurement and the immediately underlying measurement is calculated. For example, we first find the vector difference between the measurements at depth 1 and depth : ( x 1 x ) =.8816.748.14.399 1.347 1.813 1.17.668 =.631.36.66.8467 The distance between the log readings at these two depths is found by postmultiplying the vector of differences by its transpose, as indicated in modified Equation (4.81). That is, E 1 = 1.631.36.66.8467 = 1.76 =.831.631.36.66.8467 After repeating this calculation for every depth interval in the log we obtain the 19 distances listed below. 41

    depth interval    distance
    1 and 2           .831
    2 and 3           1.78
    3 and 4           4.87
    4 and 5           .631
    5 and 6           .689
    6 and 7           1.6189
    7 and 8           .71
    8 and 9           .
    9 and 10          .447
    10 and 11         .469
    11 and 12         .138
    12 and 13         1.98
    13 and 14         1.767
    14 and 15         .171
    15 and 16         1.998
    16 and 17         6.83
    17 and 18         .
    18 and 19         .37
    19 and 20         .18

The smallest squared distance (greatest similarity) is between depths 19 and 20, the last two values in the sequence. Since these two are most similar, we combine them to form a new zone defined by the vector mean of the two original intervals. The new, combined zone is

    X̄_k = (1/2)(x_19 + x_20) = (1/2)[(.86, .8436, 1.17, .668) + (.86, .748, 1.431, .668)] = (.86, .784, 1.19, .668)

Finally, we calculate a measure of variation within the new zone using modified Equation (4.83), subtracting the zone mean, X̄_k, from x_19 and from x_20, postmultiplying each difference vector by its transpose, and summing the products. The difference vectors are:

    x_19 − X̄_k = (.86, .8436, 1.17, .668) − (.86, .784, 1.19, .668) = (., .94, .66, .)

    x_20 − X̄_k = (.86, .748, 1.431, .668) − (.86, .784, 1.19, .668) = −(., .94, .66, .)

(Note that the deviations from the zone mean are symmetrical only when a zone consists of just two intervals.) For combined depths 19 and 20, W_k is

    W_k = (x_19 − X̄_k)′ (x_19 − X̄_k) + (x_20 − X̄_k)′ (x_20 − X̄_k) = .18

SS_W is found simply by summing the W_k for all zones that contain more than one original depth. During the initial iteration there is only one such zone, so SS_W = W_k = .18.

In each successive iteration, one depth interval or zone is combined with an adjacent depth interval or zone, so the number of zones decreases by one with each iteration. Table 3 shows the successive combination of zones during the 18 iterations needed to combine all of the depth intervals into two zones.

Table 3. Order of combination of depth intervals into zones in successive iterations of the zonation algorithm.

    Iteration  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 3 1 1 4 4 4 4 4 4 4 3 1 1 6 6 6 4 4 3 1 1 7 7 7 6 6 6 6 6 4 3 3 3 1 1 8 8 8 7 7 7 7 7 6 6 4 4 4 3 3 3 1 9 9 9 8 8 8 8 7 6 6 4 4 4 3 3 3 1 1 1 1 9 9 9 9 8 7 7 6 4 3 3 3 1 11 11 1 9 9 9 9 8 7 7 6 4 3 3 3 1 1 1 11 1 9 9 9 8 7 7 6 4 3 3 3 1 13 13 1 11 1 1 1 9 8 8 7 6 4 3 3 3 1 14 14 13 1 11 11 11 1 9 9 8 7 6 4 4 3 1 1 1 14 13 1 11 11 1 9 9 8 7 6 4 4 3 1 16 16 1 14 13 1 1 11 1 1 9 8 7 6 4 3 1 17 17 16 1 14 13 13 1 11 11 1 9 8 7 6 4 3 18 18 17 16 1 14 13 1 11 11 1 9 8 7 6 4 3 19 19 18 17 16 1 14 13 1 11 1 9 8 7 6 4 3 19 18 17 16 1 14 13 1 11 1 9 8 7 6 4 3

As an
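The bookkeeping in this worked example can be verified numerically: when two single-depth zones merge, the within-zone sum of squares of the new zone (Eq. 4.83) equals the squared distance between its members, so SS_W grows by exactly E_k, as the text states. The sketch below uses made-up readings, not the actual values for depths 19 and 20:

```python
# Check, with hypothetical numbers, that merging two single-depth zones
# increases SS_W by E = (1/2)(x_a - x_b)'(x_a - x_b), i.e. that W_k of
# the merged zone equals the squared distance between the two depths.
x_a = [0.25, -0.84, 1.17, 0.07]    # hypothetical standardized readings
x_b = [0.25, -0.75, 1.43, 0.07]

e = 0.5 * sum((u - v) ** 2 for u, v in zip(x_a, x_b))

xbar = [(u + v) / 2 for u, v in zip(x_a, x_b)]   # zone mean (Eq. 4.82)
w = sum((u - c) ** 2 for u, c in zip(x_a, xbar)) + \
    sum((v - c) ** 2 for v, c in zip(x_b, xbar))

print(abs(w - e) < 1e-12)   # -> True
```

This holds because each member of a two-depth zone sits at half the difference vector from the zone mean, so the two squared deviations sum to half the squared distance.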

example, Figure 2 shows the log traces after 15 iterations, where individual values within zones have been replaced by the mean values of the zones.

[Figure 2. Well log from Figure 1 after replacement of values within each zone with the mean values for the zone. The log is shown after 15 iterations, when it has been grouped into five zones.]