THE USE OF THE VARIOGRAM CLOUD IN GEOSTATISTICAL MODELLING
Environmetrics, 10, 413–437 (1999)

ALEXANDER PLONER*
Institute of Mathematics and Applied Statistics, University of Agricultural Sciences, Gregor Mendel Straße 33, A-1180 Wien, Austria

SUMMARY

This paper gives an overview of some of the possible applications of the variogram cloud in geostatistical modelling, mainly in exploratory data analysis (EDA), but also in preliminary parameter quantification and model validation. Copyright © 1999 John Wiley & Sons, Ltd.

KEY WORDS: spatial data analysis; geostatistics; variogram cloud; exploratory data analysis

1. INTRODUCTION

The variogram cloud in its traditional sense, i.e. as a scatter plot of the variogram estimates versus the distances, is well established in geostatistical practice (e.g. Isaaks and Srivastava 1989, p. 181). In our work with geochemical datasets we have found a number of generalizations of this concept which are useful but less well known, and which we will present in some detail in the following sections.

1.1. Definitions

The basic tool for describing the autocorrelation structure of a spatial random process $Z(x)$ ranging over some domain $D \subset \mathbb{R}^n$ is the variogram, given by

$$2\gamma(x, \tilde{x}) = \mathrm{Var}[Z(x) - Z(\tilde{x})], \tag{1}$$

where $x$, $\tilde{x}$ are locations in $D$. As we will generally have only one observation $z_i$ per sampling location $x_i$, $i = 1, \ldots, n$, we need the intrinsic hypothesis as an additional assumption, which basically amounts to

$$2\gamma(x, \tilde{x}) = E[(Z(x) - Z(\tilde{x}))^2] = 2\gamma(\tilde{x} - x), \tag{2}$$

implying constant mean of the process $Z(x)$ and invariance to translation of its covariance structure.

* Correspondence to: A. Ploner, Institute of Mathematics and Applied Statistics, University of Agricultural Sciences, Gregor Mendel Straße 33, A-1180 Wien, Austria.
CCC 1180–4009/99/040413–25$17.50. Received 11 May 1998; accepted 10 December 1998. Copyright © 1999 John Wiley & Sons, Ltd.
The variogram is typically estimated by a method-of-moments estimator,

$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{N(h)} (Z(x) - Z(\tilde{x}))^2, \tag{3}$$

where summation is over the index set

$$N(h) = \{(x, \tilde{x}) \mid x, \tilde{x} \in D;\ \tilde{x} - x = h\}. \tag{4}$$

In practice, when dealing with non-gridded data, we are hardly ever able to use (4), because there will be few, if any, pairs of observations whose sampling locations differ exactly by some vector $h$. Therefore we have to define classes of distance vectors in order to get tolerably stable estimates, using the index set

$$N(h) = \{(x, \tilde{x}) \mid x, \tilde{x} \in D;\ \tilde{x} - x \simeq h\}, \tag{5}$$

where the meaning of $\simeq$ depends on the definition of the tolerance regions. Typically, the estimates $2\hat{\gamma}(l_1 h_j), \ldots, 2\hat{\gamma}(l_k h_j)$ calculated for some distances $l_1, \ldots, l_k$ in direction $h_j$ are called the empirical variogram function in this direction. If the empirical variogram functions are the same for all directions $h_1, \ldots, h_l$, then an isotropic variogram model is fitted, where the value of $2\gamma(h)$ depends only on the distance $\|h\|$. If the empirical variograms vary strongly in different directions, we will assume that we can find a linear transformation $A$ of $D$ so that the appropriate anisotropic model $2\gamma_a$ can be expressed as some isotropic model $2\gamma_i$ applied to the transformed distance vector:

$$2\gamma_a(h) = 2\gamma_i(\|Ah\|).$$

If the model is not continuous at the origin, the height of the jump is called the nugget constant. If the model takes on a constant value for distances beyond a certain limit, this value is called the sill and the limit the range of the model. The existence of a sill signifies that observations are not correlated for distances larger than the range, and equivalently that the variance of the process $Z(x)$ exists.
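As an illustration of the method-of-moments estimator (3) with tolerance classes in the sense of (5), the following is a minimal sketch for the isotropic case only, where the classes are plain distance intervals. The function name `empirical_variogram` and the argument `bin_edges` are assumptions made for this example; they are not taken from the S functions used for the paper.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Isotropic method-of-moments estimate of 2*gamma per distance class.

    coords    : (n, d) array of sampling locations
    values    : (n,) array of observations z(x_i)
    bin_edges : increasing distance limits defining the classes (assumed)
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Use every unordered pair (i < j) exactly once.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[j] - coords[i], axis=1)
    sqdiff = (values[j] - values[i]) ** 2
    # Class index of each pair; pairs outside the edges get -1 or n_bins.
    which = np.digitize(dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    two_gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        mask = which == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            # Mean squared difference over the pairs in this class.
            two_gamma[b] = sqdiff[mask].mean()
    return two_gamma, counts
```

Returning the pair counts alongside the estimates makes it easy to see which distance classes rest on only a handful of pairs.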
If we assume that $Z(x_i)$ follows a normal distribution, we can make use of the fact that the squared differences divided by the variogram are $\chi^2_1$-distributed:

$$\frac{(Z(x_i) - Z(x_j))^2}{2\gamma(x_j - x_i)} \sim \chi^2_1. \tag{6}$$

1.2. Data

The examples presented in the following sections are based on an anonymized subset of 145 observations from the large geochemical dataset that was obtained on the Kola Peninsula in northern Europe, and which is described in Reimann et al. (1996): the soil content of more than 30 elements at different depths was measured at more than 600 locations in an effort to assess the impact of major industrial activity in this region and to map the degradation of the particularly vulnerable arctic ecosystem. Background information and a very detailed list of references
are available on an excellent webpage that the Norwegian Geological Survey has created. The geostatistical analysis in Reimann et al. (1996) is focused on the efficient and parsimonious modelling of the huge amount of data, with the foremost intention of obtaining correct graphical representations that can be easily interpreted. The ideas and tools presented in this paper are aimed at facilitating an intuitive approach to further in-depth analysis based on more sophisticated models.

2. DEFINITION OF THE VARIOGRAM CLOUD

As mentioned above, the term `variogram cloud' usually refers to a special kind of plot (Cressie 1991, p. 41). In this article, we will call the set of all pair-wise distance vectors combined with the squared differences of the observations $z(x_1), \ldots, z(x_n)$ a variogram cloud:

$$\{(x_j - x_i,\ (z(x_j) - z(x_i))^2) \mid i \neq j;\ i, j = 1, \ldots, n\}.$$

In many cases this basic definition is too limited. We will therefore also consider variations of this set where either the distances or the squared differences or both are suitably transformed by applying some useful function $t_l$ and/or $t_m$:

$$\{(t_l(x_j - x_i),\ t_m((z(x_j) - z(x_i))^2)) \mid i \neq j;\ i, j = 1, \ldots, n\}. \tag{7}$$

This set can then be plotted in numerous ways, depending on the focus of the analysis. Our motivation for working with this rather large dataset is mainly that we feel we sacrifice a lot of the detail visible in the set of pairs by using empirical variogram functions right from the start.

3. APPLICATIONS OF THE VARIOGRAM CLOUD

3.1. Exploratory data analysis (EDA)

3.1.1. The standard plot. What is usually referred to as the variogram cloud is just a plot of the squared differences versus the distances, i.e. a scatterplot of the set of pairs

$$\{(\|x_j - x_i\|,\ (z(x_j) - z(x_i))^2) \mid i \neq j;\ i, j = 1, \ldots, n\}.$$

This graphical representation of (7) is most useful when linked interactively to a map of the sampling locations as described in Haslett et al.
(1991), so that pairs of observations that are highlighted in the cloud are labelled and connected in the map, and vice versa (1:1-link); we also used 1:n-links, where clicking on a point of the map highlights all pairs of observations that include this point. These techniques worked well for the detection of the following phenomena:

(i) Global outliers are measurements that are distinctly separate from the main part of the data, and which can also be spotted by classical univariate EDA, e.g. a boxplot; in a standard plot of the variogram cloud they stand out because, for every distance, the squared differences of pairs formed with such an outlier are markedly larger than the rest of the cloud.
(ii) Local outliers are hidden in the main bulk of the observed data, but differ markedly from the neighbouring values. Local outliers are harder to detect than global ones, as they result in high squared differences only for small distances close to the origin (thereby contributing to a high estimate of the nugget constant), but behave normally for medium to large distances.

(iii) Pockets of non-stationarity are small areas within $D$ where one of the two assumptions of stationarity (2) appears to be incorrect. For non-stationarity in the mean, the same reasoning as for local outliers holds. Non-stationarity of the dependency structure may be detected if a cluster of points exhibits a larger variability than the surrounding points, but areas where the variation is markedly smaller than usual will be lost in the crowded lower part of this plot.

Another common phenomenon in geostatistical data, which may indicate the absence of a normal distribution, is a relative paucity of distinct values, especially if the observed variable is discrete (ppm, percentages of grade, etc.); this results in characteristic horizontal stripes in the cloud, where there are not enough distinct values to cover the whole range of possible differences.

Examples: Figure 1 shows the variogram cloud for lead (Pb), where all pairs including observation 139 are marked with a filled circle; the impression that this is a global outlier could be confirmed by a conventional stem-and-leaf display. Figure 2 shows a variogram cloud for nickel (Ni), where pairs including observation 118 are highlighted, hinting at another global outlier; univariate EDA is inconclusive, as observation 118 is the maximum of the dataset and lies almost exactly at the usual rule-of-thumb limit for outliers (i.e.
quartiles + 1.5 × (interquartile range)), therefore we decided to have a closer look at the neighbourhood via a linked map, as shown in Figure 3. It appears that the only remarkable difference is between observations 118 and 136, marked with an empty circle in both subplots, whereas there is a striking discrepancy between 136 and the surrounding observations, as can be seen in Figure 4. We therefore concluded that 118 is really an acceptable value at the high end of the range, whereas the value of Ni at location 136 is questionable in this context, although this value is well within the quartiles and not remarkable when compared with observations farther away, as illustrated by Figure 5. Figure 6 shows an example of a `striped' variogram cloud for thorium (Th), which is due to there being only 15 distinct values for this variable.

3.1.2. The square-root-differences cloud. This is almost the same as above, only with the square root of the absolute differences instead of the squared differences, i.e. we consider a plot of the set

$$\{(\|x_j - x_i\|,\ |z(x_j) - z(x_i)|^{1/2}) \mid i \neq j;\ i, j = 1, \ldots, n\}.$$

The motivation for taking the root is to stabilize the spread of the squared differences and to make them less skewed. The choice of the fourth root of the squared differences (equivalently, the square root of the absolute differences) is prompted by the assumption of normality of the underlying process, because a power transformation with exponent 0.25 will make a $\chi^2$-distributed variable reasonably symmetric and platykurtic (Cressie 1991, p. 40).

The square-root-differences cloud is useful in much the same way as the standard cloud plot. It has the advantage of not only pulling in large values, but also of thinning out the clutter of points for small values. A comparison between the standard plot and the square-root-differences plot may be informative for cross-checking potential outliers, but beyond that the square-root-differences
Figure 1. Variogram cloud for variable Pb, observation 139 marked
Figure 2. Variogram cloud for variable Ni, observation 118 marked
Figure 3. Linked variogram cloud and map for variable Ni, neighbourhood of observation 118 marked
Figure 4. Linked variogram cloud and map for variable Ni, neighbourhood of observation 136 marked
Figure 5. Linked variogram cloud and map for variable Ni, two transects through observation 136 marked
Figure 6. Variogram cloud for variable Th
cloud may help in getting a visual impression of the appropriateness of the assumption of normality.

Examples: In Figure 7 we see the square-root cloud for the variable Pb, where all pairs with observation 139 are marked. The transformation of the differences has pulled this outlying value closer in, so that it appears less extreme than in Figure 1. The differences are distributed more evenly, so that we can now see that there seems to be a problem with small differences: there are disproportionately many equal values throughout, and the corresponding zero differences are distinctly apart from the main bulk of the cloud. It appears that there are too many equal values to uphold the assumption of normality.

3.2. Exploratory parameter assessment (EPA)

We use the term `exploratory parameter assessment' in analogy to `exploratory data analysis' as a euphemism for initial visual parameter estimation.

3.2.1. Assessing anisotropy. In the presence of anisotropy, outlier detection or model fitting require the identification of the anisotropy parameters beforehand. In the case of two-dimensional locations, these are the directions and lengths of the principal axes of the ellipsoid characterizing the linear transformation $A$. (Actually, the ratio between the lengths of the axes is sufficient.) We have found different representations of the untransformed variogram cloud to be of varying interest:

(i) Maybe the most obvious idea is to consider directional variogram clouds, in direct analogy to modelling anisotropy with empirical variogram functions, i.e.
to consider the distances between points along a transect through $D$, and to compare these sub-clouds for different directions. This turns out to be not very satisfying for two reasons: on the one hand, we have to define some kind of tolerance region around any given angle in order to get a reasonable number of observations per plot, which is basically what we wanted to avoid; on the other hand, it is very difficult to get a visual range estimate from a variogram cloud, which would be necessary in order to find the long and the short axis among the given directional plots.

(ii) The next obvious idea is probably a conventional 2D symbol-plot, with the coordinates of the separation vectors on the axes, and different levels of squared differences coded with different symbols. If the underlying process $Z(x)$ is isotropic, then the values should increase at least approximately in concentric circles centred on the origin; if the rate of increase is markedly different in some directions, we might be dealing with realizations from an anisotropic process. In practice, such a plot is not easy to read: even for a modest number of observations, and using a set of symbols designed to minimize overlap (e.g. Cleveland 1994, p. 146), the plot is very crowded, and even if there are clear axes of anisotropy, it is hard to read off their directions from this kind of plot.

(iii) The shortcomings described above can be overcome by using polar coordinates for the symbol-plot. We have done so by marking the actual distances between points on the horizontal axis of the plot, and the angle on the vertical axis. A circle in the plane is then a vertical line, so the squared differences should be approximately vertically constant if $Z(x)$ is isotropic; a horizontal line with markedly lower values corresponds to a direction
Figure 7. Square-root-differences cloud for variable Pb, observation 139 marked
in which the squared differences grow more slowly, which may indicate that this is the direction of the shorter axis of the anisotropy ellipsoid. It is now easy to read off the actual angle of the axis, and to verify that the direction orthogonal to a potential minor axis shows fast-growing squared differences and is therefore the corresponding major axis. Besides, we can reduce the clutter by only considering angles in the range $[0, \pi[$, eliminating the redundant symmetry induced by considering both $x_i - x_j$ and $x_j - x_i$, which takes up half the plotting area in a simple symbol-plot. Another advantage of this kind of plot is that it shows clearly beyond which distance there are pairs of observations for only a subset of all possible directions. This can lead to spurious correlations in variogram estimation due to odd border and corner effects, which is why usually only pairs of observations up to half the maximum distance are included during estimation; with a polar symbol-plot of the kind described, this limit can be easily verified and possibly also increased.

(iv) Once the orientation of the anisotropy ellipsoid has been assessed, we can project the cloud onto the vertical planes that pass through the axes of the ellipsoid, perpendicular to the plane of the separation vectors, in order to verify the correctness of our initial estimate: a projection on the plane defined by the short axis should show a pronounced central valley in the cloud of projected differences, whereas a projection on the plane of the long axis should show a uniform wall of values forming this valley. After estimating the orientation of the ellipsoid, these plots can be used to assess the anisotropy factor by experimenting with different values until the transformed data appear to be sufficiently isotropic.
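The coordinates for the polar symbol-plot of item (iii) can be computed as in the following minimal sketch for two-dimensional locations. The function name `polar_variogram_cloud` is an assumption made for this illustration; the angles are folded into $[0, \pi[$ so that each pair of observations appears only once.

```python
import numpy as np

def polar_variogram_cloud(coords, values):
    """Distance, direction in [0, pi) and squared difference for every pair.

    coords : (n, 2) array of two-dimensional sampling locations
    values : (n,) array of observations z(x_i)
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Each unordered pair once; the reversed pair only differs by pi.
    i, j = np.triu_indices(len(values), k=1)
    sep = coords[j] - coords[i]
    dist = np.hypot(sep[:, 0], sep[:, 1])
    # arctan2 returns angles in (-pi, pi]; fold them into [0, pi).
    angle = np.mod(np.arctan2(sep[:, 1], sep[:, 0]), np.pi)
    # Guard against rounding that could return exactly pi.
    angle = np.where(angle >= np.pi, 0.0, angle)
    sqdiff = (values[j] - values[i]) ** 2
    return dist, angle, sqdiff
```

Plotting `dist` on the horizontal axis and `angle` on the vertical axis, with symbols or colours coding the level of `sqdiff`, gives a display of the kind described; how the levels are chosen is left open here.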
Examples: Figure 8 shows a standard symbol-plot for variable Na. Even though it gives the impression that differences increase more slowly along a transect angled at approximately $0.75\pi$, the situation is not very clear when compared with Figure 9, which is the same plot in polar coordinates: we can see that the angle for the short axis of the anisotropy ellipsoid is actually closer to $0.8\pi$, and that the increase of the differences is markedly stronger in the orthogonal direction of $0.3\pi$. Besides, we can see that it will be a good idea to consider only distances up to approximately 175,000 m for fitting a variogram model. Figures 10 and 11 show confirmatory projection plots for $0.3\pi$/$0.8\pi$ as angles of the anisotropy axes.

3.2.2. Assessing structure. Obviously, scaling the squared differences by some power $\lambda$ of the distance, i.e. plotting

$$\left\{\left(\|x_j - x_i\|,\ \frac{(z(x_j) - z(x_i))^2}{\|x_j - x_i\|^{\lambda}}\right) \mid i \neq j;\ i, j = 1, \ldots, n\right\}, \tag{8}$$

will be useful when fitting a power model of the form $\gamma(h) = c\|h\|^{\lambda}$, but it can be useful in exploring and summarizing the dependency structure even in situations where a global plot of (8) does not make sense, because the scale is determined by only a few observations lying closely together, so that the rest of the plot is compressed into too little space to show any detail. This can be resolved by defining some kind of cut-off limit for the scaled distances, but it is more rewarding to divide the whole range of distances into subsets that can then be plotted and
Figure 8. Symbol-plot of the variogram cloud for variable Na
Figure 9. Symbol-plot of the variogram cloud for variable Na, polar coordinates
Figure 10. Projection of the variogram cloud for variable Na on a vertical plane passing through the origin at 0.3π
Figure 11. Projection of the variogram cloud for variable Na on a vertical plane passing through the origin at 0.8π
scaled individually. For a simple dependency structure, we may typically find the following subdivisions:

(i) A small set of pairs markedly closer to each other than the rest of the data, i.e. those that compress the scale on a global plot into uselessness. These pairs are primarily responsible for the presence and size of a nugget constant in our model, so while finding a proper power relation between squared differences and distances via $\lambda$ may not be our main concern here, the identification of this set may be helpful in assessing the nugget effect.

(ii) The main part of the data, where the actual structure, i.e. the decrease in correlation with increasing distance, is displayed. Experimentation with different values may yield a $\lambda$ that describes this relationship adequately.

(iii) The set of pairs with the largest distances (in practice usually the pairs with separation larger than half the maximum distance) may show the same pattern as the main part of the data for similar $\lambda$, which generally indicates that a power model can be fitted. More often, the scaled differences will decrease with distances increasing beyond a certain limit, even for a $\lambda$ that produced a stable pattern with the rest of the data. A possible explanation for this is the presence of a sill, so that the squared differences do not increase beyond the corresponding range; another possibility is a border effect, e.g. when for a rectangular domain $D$ the observations that are separated by larger distances tend to cluster in the corners of the rectangle, thereby giving the impression of higher correlations, as mentioned in the discussion of the polar symbol-plot. In the latter case, we are more interested in the rough range estimate we get by this subdivision than in finding a reasonable $\lambda$.
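The scaling in (8) and the subdivision of the distance range can be sketched as follows. This is only an illustration under stated assumptions: the function names `scaled_cloud` and `split_by_distance`, the argument `lam` for the exponent, and the break points passed in `breaks` are hypothetical, and all distances are assumed to be strictly positive (no duplicate locations).

```python
import numpy as np

def scaled_cloud(dist, sqdiff, lam):
    """Squared differences divided by distance**lam, as in (8).

    Assumes all distances are strictly positive.
    """
    dist = np.asarray(dist, dtype=float)
    sqdiff = np.asarray(sqdiff, dtype=float)
    return sqdiff / dist ** lam

def split_by_distance(dist, breaks):
    """Boolean masks for the subsets [0, b1), [b1, b2), ..., [bk, inf)."""
    dist = np.asarray(dist, dtype=float)
    edges = np.concatenate(([0.0], np.asarray(breaks, dtype=float), [np.inf]))
    return [(dist >= lo) & (dist < hi)
            for lo, hi in zip(edges[:-1], edges[1:])]
```

Each mask selects one subset of pairs, which can then be plotted with its own vertical scale; the break points themselves have to be chosen by inspecting the data.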
Examples: Figure 12 shows the standard plot of the variogram cloud for chromium (Cr); Figure 13 shows that scaling the cloud by the distances to the power of 0.4 stabilizes the spread uniformly over all distances, suggesting the fit of such a model. Figure 14 shows the standard plot of the variogram cloud for lanthanum (La); overall scaling does not work here, the reason being the single pair of observations in the upper left corner of Figure 15, where we can see the scaled differences in the neighbourhood of the origin. Figure 16 shows that the scaled differences exhibit a constant spread for the power of 0.6 over a fairly wide range; note the difference in the vertical scale of Figures 15 and 16! Figure 17 finally shows that a sill seems to be reached at a distance of about 140,000 m, as the scaled differences decrease strongly for the exponent 0.6, which produced a stable spread in Figure 16.

3.3. Model validation

Once a model $\gamma(h)$ has been fitted to the data, the quality of the fit should be judged, comparing it for different areas in the domain $D$ of interest. Barry (1996) uses the assumption of normality to highlight pairs of observations for which (6) is below or above specified critical quantiles of the $\chi^2_1$-distribution, in order to spot places where the fit of $\gamma(h)$ is poor. Similarly, we can plot

$$\left\{\left(\|x_j - x_i\|,\ \frac{(z(x_j) - z(x_i))^2}{2\gamma(x_j - x_i)}\right) \mid i \neq j;\ i, j = 1, \ldots, n\right\}$$
Figure 12. Variogram cloud for variable Cr, unscaled
Figure 13. Variogram cloud for variable Cr, scaled by the distances to the power of 0.4
Figure 14. Variogram cloud for variable La, unscaled
Figure 15. Variogram cloud for variable La, scaled by the distances to the power of 0.6, small distances
Figure 16. Variogram cloud for variable La, scaled by the distances to the power of 0.6, medium distances
Figure 17. Variogram cloud for variable La, scaled by the distances to the power of 0.6, large distances
linked to a map of observations, as described in 3.1.1; extreme pairs of observations will stand out more clearly and individually.

4. SUMMARY

We have found the generalized concept of the variogram cloud as the set of all pairwise differences in location and observation, and its graphical representations, to be a promising tool in the initial stages of geostatistical modelling. As a technical note we would like to add that we were pleasantly surprised by the speed of computation and display of the plots: even on our elderly workstation, the response time was quite good for datasets up to approximately 400 observations. For larger datasets, a faster machine, a high-end monitor, and the use of colour in our routines seem to be advisable. The S functions used to create the figures in this article will be made available via StatLib.

REFERENCES

Barry, R. P. (1996). `A diagnostic to assess the fit of a variogram model to spatial data'. Journal of Statistical Software 1.
Cleveland, W. S. (1994). The Elements of Graphing Data. Summit, New York: Hobart Press.
Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: Wiley & Sons.
Haslett, J., Bradley, R., Craig, P. S., Wills, G. and Unwin, A. R. (1991). `Dynamic graphics for exploring spatial data, with application to locating global and local anomalies'. The American Statistician 45, 234–242.
Isaaks, E. H. and Srivastava, R. M. (1989). An Introduction to Applied Geostatistics. New York: Oxford University Press.
Reimann, C., Äyräs, M., Chekushin, V., Bogatyrev, I., Boyd, R., de Caritat, P., Dutter, R., Finne, T. E., Halleraker, J. H., Jæger, Ø., Kashulina, G., Niskavaara, H., Pavlov, V., Räisänen, M. L., Strand, T., Volden, T. (1996). `A geochemical atlas of the central parts of the Barents region'. In The 6th Seminar on Hydrogeology and Environmental Geochemistry 1996, no , 46–47.
We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
Manhattan Center for Science and Math High School Mathematics Department Curriculum
Content/Discipline Algebra 1 Semester 2: Marking Period 1 - Unit 8 Polynomials and Factoring Topic and Essential Question How do perform operations on polynomial functions How to factor different types
NEW MEXICO Grade 6 MATHEMATICS STANDARDS
PROCESS STANDARDS To help New Mexico students achieve the Content Standards enumerated below, teachers are encouraged to base instruction on the following Process Standards: Problem Solving Build new mathematical
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Topic 9 ~ Measures of Spread
AP Statistics Topic 9 ~ Measures of Spread Activity 9 : Baseball Lineups The table to the right contains data on the ages of the two teams involved in game of the 200 National League Division Series. Is
For example, estimate the population of the United States as 3 times 10⁸ and the
CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number
Lecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel [email protected] Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
Introduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Week 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
Module 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
Big Ideas in Mathematics
Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
Exploratory Data Analysis
Exploratory Data Analysis Paul Cohen ISTA 370 Spring, 2012 Paul Cohen ISTA 370 () Exploratory Data Analysis Spring, 2012 1 / 46 Outline Data, revisited The purpose of exploratory data analysis Learning
Unit 7 Quadratic Relations of the Form y = ax 2 + bx + c
Unit 7 Quadratic Relations of the Form y = ax 2 + bx + c Lesson Outline BIG PICTURE Students will: manipulate algebraic expressions, as needed to understand quadratic relations; identify characteristics
Performance Level Descriptors Grade 6 Mathematics
Performance Level Descriptors Grade 6 Mathematics Multiplying and Dividing with Fractions 6.NS.1-2 Grade 6 Math : Sub-Claim A The student solves problems involving the Major Content for grade/course with
11.1. Objectives. Component Form of a Vector. Component Form of a Vector. Component Form of a Vector. Vectors and the Geometry of Space
11 Vectors and the Geometry of Space 11.1 Vectors in the Plane Copyright Cengage Learning. All rights reserved. Copyright Cengage Learning. All rights reserved. 2 Objectives! Write the component form of
determining relationships among the explanatory variables, and
Chapter 4 Exploratory Data Analysis A first look at the data. As mentioned in Chapter 1, exploratory data analysis or EDA is a critical first step in analyzing the data from an experiment. Here are the
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Dimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
Spatial Data Analysis
14 Spatial Data Analysis OVERVIEW This chapter is the first in a set of three dealing with geographic analysis and modeling methods. The chapter begins with a review of the relevant terms, and an outlines
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501
PRINCIPAL COMPONENTS ANALYSIS (PCA) Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 May 2008 Introduction Suppose we had measured two variables, length and width, and
Grade 6 Mathematics Performance Level Descriptors
Limited Grade 6 Mathematics Performance Level Descriptors A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Grade 6 Mathematics. A student at this
Testosterone levels as modi ers of psychometric g
Personality and Individual Di erences 28 (2000) 601±607 www.elsevier.com/locate/paid Testosterone levels as modi ers of psychometric g Helmuth Nyborg a, *, Arthur R. Jensen b a Institute of Psychology,
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
Section 2.4: Equations of Lines and Planes
Section.4: Equations of Lines and Planes An equation of three variable F (x, y, z) 0 is called an equation of a surface S if For instance, (x 1, y 1, z 1 ) S if and only if F (x 1, y 1, z 1 ) 0. x + y
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
ATTACHMENT 8: Quality Assurance Hydrogeologic Characterization of the Eastern Turlock Subbasin
ATTACHMENT 8: Quality Assurance Hydrogeologic Characterization of the Eastern Turlock Subbasin Quality assurance and quality control (QA/QC) policies and procedures will ensure that the technical services
Section 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
Measurement with Ratios
Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical
Lecture Notes: Basic Concepts in Option Pricing - The Black and Scholes Model
Brunel University Msc., EC5504, Financial Engineering Prof Menelaos Karanasos Lecture Notes: Basic Concepts in Option Pricing - The Black and Scholes Model Recall that the price of an option is equal to
Tennessee Mathematics Standards 2009-2010 Implementation. Grade Six Mathematics. Standard 1 Mathematical Processes
Tennessee Mathematics Standards 2009-2010 Implementation Grade Six Mathematics Standard 1 Mathematical Processes GLE 0606.1.1 Use mathematical language, symbols, and definitions while developing mathematical
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?
Getting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Time series Forecasting using Holt-Winters Exponential Smoothing
Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract
Common Tools for Displaying and Communicating Data for Process Improvement
Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots Describing Quantitative Data with Numbers 1.3 I can n Calculate and interpret measures of center (mean, median) in context. n Calculate
Part II Chapter 9 Chapter 10 Chapter 11 Chapter 12 Chapter 13 Chapter 14 Chapter 15 Part II
Part II covers diagnostic evaluations of historical facility data for checking key assumptions implicit in the recommended statistical tests and for making appropriate adjustments to the data (e.g., consideration
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Chapter 3: The Multiple Linear Regression Model
Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics
1 Another method of estimation: least squares
1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i
The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median
CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box
Optical Illusions Essay Angela Wall EMAT 6690
Optical Illusions Essay Angela Wall EMAT 6690! Optical illusions are images that are visually perceived differently than how they actually appear in reality. These images can be very entertaining, but
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
Geostatistical Analyst Tutorial
Copyright 1995-2012 Esri All rights reserved. Table of Contents Introduction to the ArcGIS Geostatistical Analyst Tutorial................... 0 Exercise 1: Creating a surface using default parameters...................
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
APPENDIX E THE ASSESSMENT PHASE OF THE DATA LIFE CYCLE
APPENDIX E THE ASSESSMENT PHASE OF THE DATA LIFE CYCLE The assessment phase of the Data Life Cycle includes verification and validation of the survey data and assessment of quality of the data. Data verification
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
Introduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3
COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping
Mathematics Curriculum Guide Precalculus 2015-16. Page 1 of 12
Mathematics Curriculum Guide Precalculus 2015-16 Page 1 of 12 Paramount Unified School District High School Math Curriculum Guides 2015 16 In 2015 16, PUSD will continue to implement the Standards by providing
Such As Statements, Kindergarten Grade 8
Such As Statements, Kindergarten Grade 8 This document contains the such as statements that were included in the review committees final recommendations for revisions to the mathematics Texas Essential
Normalization and Mixed Degrees of Integration in Cointegrated Time Series Systems
Normalization and Mixed Degrees of Integration in Cointegrated Time Series Systems Robert J. Rossana Department of Economics, 04 F/AB, Wayne State University, Detroit MI 480 E-Mail: [email protected]
Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.
Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary
Statistics Chapter 2
Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students
Polarization of Light
Polarization of Light References Halliday/Resnick/Walker Fundamentals of Physics, Chapter 33, 7 th ed. Wiley 005 PASCO EX997A and EX999 guide sheets (written by Ann Hanks) weight Exercises and weights
