Mantel Permutation Tests
Basic Idea: In some experiments we want to test for treatment effects under the null hypothesis that the different populations are actually one and the same population. In other tests, the null hypothesis is one of complete randomness.

Example 1: ANOVA, where H0 is that the treatment means are all equal. The test assumes that every treatment has the same variance and the same distributional shape. If the null hypothesis is in fact true, then the observations are not distinguishable by treatment. They all come from the same distribution (one shape, mean, and variance) and just happen to be randomly associated with a treatment.

[Table: the original dataset as collected, shown beside a permuted version of the same data. Column headings as transcribed: Sample ID, Pop 1, Pop, Mean. The values were not preserved in this transcription.]

ALS5932/FOR6934 Fall Mary C. Christman
Permutation tests are based on this idea. If H0 is true, then any set of values is just a random assignment among treatments.

Method, under the assumptions that the distributions are identical under H0, that sampling is random and with replacement, and that treatment assignment is random:

1) Calculate the test statistic for the hypotheses for the original observed arrangement of the data. This could be a sample correlation, an F statistic, a mean square, or some other statistic. Call it $\kappa_0$.
2) Randomly rearrange the data among the treatments (shuffle or permute the data according to the experimental design; see below for the case of matrices) and calculate the test statistic for the new arrangement. Call it $\kappa^*_p$.
3) Store the permutation estimate $\kappa^*_p$.
4) Repeat steps 2-3 many times. Call the total number of permutations P, so that p = 1, 2, ..., P.
5) Compare $\kappa_0$ to the distribution of the permutation estimates $\kappa^*_p$. The p-value for the test is

$$ p\text{-value} = \frac{\#(\kappa^*_p > \kappa_0)}{P}. $$

Example: The most famous use of permutation tests for ecological problems is Mantel's test of similarity of two symmetric matrices. Mantel's test was extended to allow more than 2 matrices by Smouse et al. We'll look at the simple case (2 matrices).

Mantel's test is a test of the correlation between the elements of one matrix and the elements of the other matrix, where the elements within the matrices are organized in a very specific way (symmetric, with zeroes on the diagonal). The original use was to compare two distance matrices, and that is still the most common use today.

STA 6934 Spring Mary C. Christman
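The five-step procedure above, before it is specialized to matrices, can be sketched in R as follows. This is only an illustration under stated assumptions: the data values, the treatment labels, the helper `stat`, and the choice of a difference in means as the statistic are all hypothetical and do not come from the notes. The observed arrangement is counted as one of the P arrangements and the comparison uses `>=`, which is the convention the copepod code later in these notes follows.

```r
# Hypothetical two-treatment data; the values are illustrative only.
set.seed(1)
y   <- c(4.1, 5.3, 6.0, 5.8, 7.2, 3.9, 4.4, 5.1, 4.0, 4.7)
trt <- rep(c("A", "B"), each = 5)

# Test statistic (an assumption): difference in treatment means, A minus B.
stat <- function(y, trt) mean(y[trt == "A"]) - mean(y[trt == "B"])

# Step 1: statistic for the observed arrangement.
kappa0 <- stat(y, trt)

# Steps 2-4: shuffle the treatment labels and recompute, many times.
# The observed value is stored as the first of the P arrangements.
P <- 1000
kappa.star <- c(kappa0, replicate(P - 1, stat(y, sample(trt))))

# Step 5: one-sided p-value, the share of arrangements at least as large
# as the observed statistic.
pvalue <- sum(kappa.star >= kappa0) / P
```

Because the observed arrangement is included, the p-value can never be smaller than 1/P.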
Matrix Y            Matrix X
  a  b  c             α  β  χ
  b  d  e             β  δ  ε
  c  e  f             χ  ε  φ

Question: Are the element-wise pairs (a, α), (b, β), (c, χ), (d, δ), (e, ε), (f, φ) correlated? Can we use Pearson's correlation coefficient to test that?

Recall that Pearson's correlation assumes that 1) the variables are quantitative, and 2) if there is a relationship between the 2 variables, that relationship is linear.

Now, most matrices are not exactly as shown above. More specifically, the matrices are usually distance measures, where distance is some metric between the replicates involved in the study. For example, matrix Y could be the number of genes not in common between sampled animals in a study, and matrix X could be the Euclidean distance between the locations at which the animals were found. The distance between a replicate and itself is 0, and the distances are symmetric in the sense that the distance between F and H is the same as the distance between H and F. So commonly we have matrices with the structure

         Y                        X
animal   1  2  3         animal   1  2  3
  1      0  b  c           1      0  β  χ
  2      b  0  e           2      β  0  ε
  3      c  e  0           3      χ  ε  0

where b = # genes not in common between animals 1 and 2, β = geographic distance between animals 1 and 2, etc. We only need to test the pairs (b, β), (c, χ), (e, ε) for correlation.
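As a small illustration of why only (b, β), (c, χ), (e, ε) are needed, the unique off-diagonal pairs can be pulled out of two symmetric matrices with base R's `lower.tri`. The 3 x 3 matrices below are hypothetical stand-ins for the gene-count and geographic-distance matrices. Their values are made up for this sketch.

```r
# Hypothetical distance matrices for animals 1-3 (illustrative values only).
Y <- matrix(c(0, 4, 7,
              4, 0, 2,
              7, 2, 0), nrow = 3, byrow = TRUE)        # plays the role of b, c, e
X <- matrix(c(0.0, 1.5, 3.0,
              1.5, 0.0, 0.8,
              3.0, 0.8, 0.0), nrow = 3, byrow = TRUE)  # plays the role of beta, chi, epsilon

# lower.tri() is TRUE strictly below the diagonal, so each unordered pair of
# animals appears exactly once and the zero diagonal is dropped.
keep    <- lower.tri(Y)
y.pairs <- Y[keep]   # (2,1), (3,1), (3,2), i.e. b, c, e
x.pairs <- X[keep]   # the matching beta, chi, epsilon

r <- cor(y.pairs, x.pairs)   # Pearson correlation of the n(n-1)/2 pairs
```

For n replicates this leaves n(n-1)/2 pairs, which is 3 here and 78 for a 13 x 13 matrix.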
Because the same individuals are used repeatedly in generating the distances in the matrices, the values within each matrix are also correlated among themselves. As a consequence, the usual method for testing Pearson's correlation coefficient would involve an estimated standard error that is biased low for the true standard deviation of the estimator of correlation. This means we shouldn't use the usual large-sample test based on Normality. Use a permutation test!

Example: Copepods in Ceiling Drips in Organ Cave, West Virginia
# Title = "Organ Cave Ceiling Drips"

Partial Code for Testing Correlation

    matrixsize <- 13

    # Y matrix: matrix of dissimilarities (1 - Jaccard index)
    # The closing nrow/byrow arguments are missing in the source and are assumed here.
    Jaccard <- matrix(c(
      0.00, 0.83, 0.80, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 1.00,
      0.83, 0.00, 0.43, 0.43, 0.44, 1.00, 0.67, 0.86, 0.62, 1.00, 0.67, 0.55, 0.08,
      0.80, 0.43, 0.00, 0.33, 0.37, 1.00, 0.60, 0.83, 0.57, 1.00, 0.43, 0.50, 0.40,
      1.00, 0.43, 0.33, 0.00, 0.56, 1.00, 0.60, 0.83, 0.33, 1.00, 0.43, 0.66, 0.40,
      0.87, 0.44, 0.37, 0.56, 0.00, 0.87, 0.75, 0.89, 0.70, 0.87, 0.44, 0.20, 0.62,
      1.00, 1.00, 1.00, 1.00, 0.87, 0.00, 1.00, 1.00, 1.00, 1.00, 0.83, 0.90, 1.00,
      1.00, 0.67, 0.60, 0.60, 0.75, 1.00, 0.00, 0.67, 0.60, 1.00, 0.67, 0.80, 0.75,
      1.00, 0.86, 0.83, 0.83, 0.89, 1.00, 0.67, 0.00, 0.83, 1.00, 0.86, 0.91, 1.00,
      1.00, 0.62, 0.57, 0.33, 0.70, 1.00, 0.60, 0.83, 0.00, 1.00, 0.62, 0.64, 0.67,
      1.00, 1.00, 1.00, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 0.00, 1.00, 0.00, 1.00,
      1.00, 0.67, 0.43, 0.43, 0.44, 0.83, 0.67, 0.86, 0.62, 1.00, 0.00, 0.55, 0.50,
      0.90, 0.55, 0.50, 0.66, 0.20, 0.90, 0.80, 0.91, 0.64, 0.00, 0.55, 0.00, 0.70,
      1.00, 0.08, 0.40, 0.40, 0.62, 1.00, 0.75, 1.00, 0.67, 1.00, 0.50, 0.70, 0.00),
      nrow = matrixsize, byrow = TRUE)

    # X1 matrix: log(distance) between drip locations
    # The closing nrow/byrow arguments are again assumed.
    logdist <- matrix(c(
      0.00,  0.556, 0.607, 0.653, 0.708, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.556, 0.00,  0.161, 0.279, 0.398, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.607, 0.161, 0.00,  0.161, 0.312, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.653, 0.279, 0.161, 0.000, 0.204, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.708, 0.398, 0.312, 0.204, 0.000, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      3.097, 3.097, 3.097, 3.097, 3.097, 0.000, 1.959, 1.959, 1.959, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.000, 0.886, 0.896, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.886, 0.000, 0.072, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.896, 0.072, 0.000, 1.820, 1.820, 1.820, 1.820,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 0.000, 1.390, 1.405, 1.412,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.390, 0.000, 0.270, 0.356,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.405, 0.270, 0.000, 0.149,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.412, 0.356, 0.149, 0.000),
      nrow = matrixsize, byrow = TRUE)

To test whether matrices Y and X1 are correlated, I need to permute one of the matrices repeatedly and then test the original correlation estimate against the distribution of correlations for the permuted matrices. So, permute Y by randomly rearranging the columns and then arranging the rows to match the random rearrangement of the columns:

    A <- matrix(c(11,12,13,21,22,23,31,32,33), byrow=TRUE, nrow=3)
    > A
         [,1] [,2] [,3]
    [1,]   11   12   13
    [2,]   21   22   23
    [3,]   31   32   33
    temp <- sample(3)         # a random ordering of 1, 2, 3
    Aperm <- A[, temp]        # rearrange the columns
    Aperm <- Aperm[temp, ]    # then rearrange the rows to match
    Aperm <- A[temp, temp]    # the same result in one step; preserves the symmetry of the matrix

[The printed values of temp and Aperm after each step were not preserved in this transcription.]

Then do the permutations and get the resulting set of correlations. Compare the original correlation against the permuted pairs.

H0: the two variables are not correlated
HA: the two variables are positively correlated

The p-value of the one-sided test is the proportion of permutation correlation estimates > original correlation estimate.

    # simple Mantel test of Jaccard and log(distance) ignoring system effects
    # observed correlation between X and Y
    Jvector  <- as.vector(Jaccard)
    X1vector <- as.vector(logdist)
    obs.corr <- cor(Jvector, X1vector)

    numPermutes <-      # value missing in this transcription; 13! = 6,227,020,800 possible arrangements
    permuted.corr <- rep(0, numPermutes)
    permuted.corr[1] <- obs.corr       # the observed arrangement counts as one permutation
    for (i in 2:numPermutes) {
      temp <- sample(matrixsize)                 # random reordering of the 13 sites
      permuted.jaccard <- Jaccard[temp, temp]    # permute rows and columns together
      Jvector <- as.vector(permuted.jaccard)
      permuted.corr[i] <- cor(Jvector, X1vector)
    }
    pvalue <- sum(permuted.corr >= obs.corr) / numPermutes

[Figure: frequency distribution of Pearson's r from the permutations (x-axis: permuted.corr), annotated with the original data correlation r and the permutation p-value. The numeric values were not preserved in this transcription.]

Pearson's correlation assumes that the relationship, if it exists, is linear. Is that the case here?
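The loop above can be wrapped into a small reusable function. The sketch below is an assumption-laden variant rather than the notes' own code: the name `mantel.sketch`, its arguments, and the toy data are hypothetical, and it correlates only the lower-triangle pairs instead of the full `as.vector()` of each matrix. That gives a somewhat different correlation value, because the zero diagonal and the duplicated upper triangle are left out.

```r
# Hypothetical helper: one-sided (upper-tail) simple Mantel test.
mantel.sketch <- function(Y, X, n.perm = 999, method = "pearson") {
  stopifnot(nrow(Y) == ncol(Y), all(dim(Y) == dim(X)))
  n    <- nrow(Y)
  keep <- lower.tri(Y)                       # unique off-diagonal pairs only
  obs  <- cor(Y[keep], X[keep], method = method)
  perm <- numeric(n.perm)
  for (i in seq_len(n.perm)) {
    idx     <- sample(n)                     # same reordering for rows and columns
    perm[i] <- cor(Y[idx, idx][keep], X[keep], method = method)
  }
  all.corr <- c(obs, perm)                   # observed arrangement counted once
  list(obs.corr = obs,
       pvalue = mean(all.corr >= obs),
       permuted.corr = all.corr)
}

# Toy data (illustrative only): 8 random locations and a noisy copy of them.
set.seed(42)
pts <- matrix(runif(16), ncol = 2)
X   <- as.matrix(dist(pts))
Y   <- as.matrix(dist(pts + matrix(rnorm(16, sd = 0.05), ncol = 2)))

res <- mantel.sketch(Y, X)
res$obs.corr
res$pvalue
```

Passing `method = "spearman"` makes `cor` work on ranks, which is one way to relax the linearity assumption questioned above.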
I reran the test using Spearman's correlation coefficient instead: change cor(Jvector, X1vector) to cor(rank(Jvector), rank(X1vector)) and rerun the above code.

[Output and histogram of permuted.corr: the values of obs.corr and pvalue were not preserved in this transcription.]

The best method (not shown) is to incorporate a second variable that distinguishes the two regions from each other, following the method outlined in Smouse et al. These are sometimes called partial Mantel tests or multiple regression Mantel tests.

Mantel Correlogram

To study the structure of the Y matrix (usually the one of interest) with respect to distances in the other matrix, it is of interest to look at the correlation among values of Y for specific sets of distances in X. This is a case of looking at AUTOcorrelation among subsets of values within a matrix rather than correlation between two different variables. The correlogram is a graphic displaying the autocorrelation for those different subsets.

For example, suppose I am interested in the autocorrelation among the dissimilarities of the copepods as a function of log(distance). The way to do that is to create a set of non-overlapping distance classes (called lag distances) and do the autocorrelation of observations that fall within each distance class.

First, I need to create the set of lag distances: (>0 to 1), (1 to 2), and (>2).

    # Lag distance matrix (nrow/byrow arguments are missing in the source and assumed here)
    lagdistmatrix <- matrix(c(
      0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 0, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 1, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3, 3,
      3, 3, 3, 3, 3, 0, 2, 2, 2, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 0, 1, 1, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 1, 0, 1, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 1, 1, 0, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 0, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0, 1, 1,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 0, 1,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 0),
      nrow = 13, byrow = TRUE)

Then, for each lag distance, I need to create another matrix of 0s and 1s, where a 0 indicates that the distance is within the lag class and a 1 indicates that it is not. Now perform Mantel's test on these two matrices (the indicator matrix and Y). Repeat until all lag classes have been done.

    # For example: Lag 1 matrix
    lagdistmatrix1 <- matrix(c(
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
      nrow = 13, byrow = TRUE)

    # Lag distance 2 matrix
    lagdistmatrix2 <- matrix(c(
      0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0),
      nrow = 13, byrow = TRUE)

    # Lag distance 3 matrix
    lagdistmatrix3 <- matrix(c(
      0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0),
      nrow = 13, byrow = TRUE)

Run Mantel's test on each lag distance matrix and Y. We obtain the following results:

[Table with columns Lag, Observed Correlation, 2-sided p-value. The values were not preserved in this transcription.]

Very positive and very negative values indicate that the further away the locations are from one another, the more dissimilar the species composition (as measured by 1-J).
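Typing each 0/1 lag matrix by hand is error-prone, so the indicator matrices can instead be derived from the distances. The following sketch does that and runs a two-sided Mantel test per lag class. Everything in it is hypothetical: the simulated locations, the class boundaries 2 and 7, the noise level, and the variable names are assumptions for illustration and are not the Organ Cave data.

```r
set.seed(7)
n   <- 9
# Hypothetical locations: three clusters of three sites spread along the x-axis.
pts <- cbind(rep(c(0, 5, 10), each = 3) + runif(n), runif(n))
D   <- as.matrix(dist(pts))                           # distances between sites
Y   <- as.matrix(dist(pts + rnorm(2 * n, sd = 0.3)))  # made-up dissimilarities

# Assign every pair to a lag class: (0, 2], (2, 7], (7, Inf).
# The zero diagonal falls outside all classes and becomes NA.
lag.class <- matrix(cut(D, breaks = c(0, 2, 7, Inf), labels = FALSE), n, n)

keep    <- lower.tri(Y)
n.perm  <- 499
results <- data.frame(lag = 1:3, obs.corr = NA_real_, pvalue = NA_real_)

for (k in 1:3) {
  ind <- ifelse(lag.class == k, 0, 1)   # 0 = pair is in lag class k, 1 = otherwise
  diag(ind) <- 0                        # zero diagonal, as in the matrices above
  obs  <- cor(Y[keep], ind[keep])
  perm <- replicate(n.perm, {
    idx <- sample(n)                    # permute rows and columns of Y together
    cor(Y[idx, idx][keep], ind[keep])
  })
  results$obs.corr[k] <- obs
  # Two-sided p-value: share of |r| at least as large as the observed |r|.
  results$pvalue[k] <- mean(abs(c(obs, perm)) >= abs(obs))
}
results
```

With the 0/1 coding used in these notes, a positive correlation for a lag class means that pairs inside the class are less dissimilar than pairs outside it, and a negative correlation means the opposite.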
MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationPoint Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
More informationThe Kendall Rank Correlation Coefficient
The Kendall Rank Correlation Coefficient Hervé Abdi Overview The Kendall (955) rank correlation coefficient evaluates the degree of similarity between two sets of ranks given to a same set of objects.
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationQuestion: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?
ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the
More informationIntroduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
More informationUNDERSTANDING THE TWO-WAY ANOVA
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationPearson s Correlation
Pearson s Correlation Correlation the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationMORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.
MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationIntroduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
More informationStatistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
More informationCrosstabulation & Chi Square
Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among
More informationSAS/STAT. 9.2 User s Guide. Introduction to. Nonparametric Analysis. (Book Excerpt) SAS Documentation
SAS/STAT Introduction to 9.2 User s Guide Nonparametric Analysis (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation
More informationABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationSPSS Explore procedure
SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationAnalyzing Research Data Using Excel
Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial
More informationUNIVERSITY OF NAIROBI
UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More information