# Canonical Correlation


Chapter 400

## Introduction

Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we present a brief introduction to the subject here, you will probably need a text that covers the subject in depth, such as Tabachnick (1989).

Suppose you have given a group of students two tests of ten questions each and wish to determine the overall correlation between the two tests. Canonical correlation finds a weighted average of the questions from the first test and correlates it with a weighted average of the questions from the second test. The weights are constructed to maximize the correlation between these two averages; this correlation is called the first canonical correlation coefficient. You can then create another pair of weighted averages, unrelated to the first, and calculate their correlation: the second canonical correlation coefficient. The process continues until the number of canonical correlations equals the number of variables in the smaller set.

Discriminant analysis, MANOVA, and multiple regression are all special cases of canonical correlation, which provides the most general multivariate framework. Because of this generality, it is probably the least used of the multivariate procedures: researchers usually prefer the specific procedure designed for their data. There are, however, instances in which canonical correlation techniques are useful.

## Variates and Variables

Canonical correlation terminology makes an important distinction between the words *variables* and *variates*. The term *variables* is reserved for the original variables being analyzed. The term *variates* refers to variables constructed as weighted averages of the original variables. Thus a set of Y variates is constructed from the original Y variables, and likewise a set of X variates from the original X variables.
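The idea of "weights that maximize the correlation" can be illustrated with a minimal numpy sketch (not NCSS code; the data are simulated and the matrix C is the one defined later under Technical Details). The first canonical correlation is at least as large as the correlation of any other pair of weighted averages, including plain unweighted means:

```python
import numpy as np

# Hypothetical data: 200 students, two tests of three questions each.
rng = np.random.default_rng(0)
n, kx, ky = 200, 3, 3
X = rng.normal(size=(n, kx))                                   # test 1 questions
Y = X @ rng.normal(size=(kx, ky)) + rng.normal(size=(n, ky))   # test 2, related to test 1

R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
Rxx, Ryy = R[:kx, :kx], R[kx:, kx:]
Rxy, Ryx = R[:kx, kx:], R[kx:, :kx]

# Largest eigenvalue of C is the squared first canonical correlation.
C = np.linalg.inv(Ryy) @ Ryx @ np.linalg.inv(Rxx) @ Rxy
r1 = np.sqrt(np.max(np.linalg.eigvals(C).real))

# Correlation of plain (unweighted) question averages, for comparison.
r_plain = abs(np.corrcoef(X.mean(axis=1), Y.mean(axis=1))[0, 1])
print(bool(r1 >= r_plain))   # True: the canonical weights maximize the correlation
```

Because the canonical weights are chosen to maximize the correlation, `r1` bounds the correlation of every other weighting of the same questions.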
## Basic Issues

Some of the issues that must be dealt with during a canonical correlation analysis are:

1. Determining the number of canonical variate pairs to use. The number of pairs possible is equal to the smaller of the number of variables in the two sets.
2. Interpreting the canonical variates themselves. As in factor analysis, you are dealing with mathematically constructed variates that are usually difficult to interpret, and here you must relate two constructed variates to each other.
3. Evaluating the importance of each variate from two points of view: the strength of the relationship between the variate and the variables from which it was created, and the strength of the relationship between the corresponding X and Y variates.
4. Ensuring a large enough sample size. In social science work you will often need a minimum of ten cases per variable; in fields with more reliable data, you can get by with somewhat less.

## Checklist

Tabachnick (1989) provides the following checklist for conducting a canonical correlation analysis. We suggest that you consider these issues and guidelines carefully.

### Missing Data

You should begin by screening your data for outliers. Pay particular attention to patterns of missing values. The program ignores rows with missing values. If most of the missing values occur in one or two variables, you might want to leave those variables out of the analysis in order to retain more data on the remaining variables.

### Multivariate Normality and Outliers

Canonical correlation analysis does not make strong normality assumptions. However, as with all least squares procedures, outliers can cause severe problems. You should screen your data carefully for outliers using the various univariate normality tests and plots.

### Linearity

Canonical correlation analysis assumes linear relations among the variables. You should study scatter plots of each pair of variables, watching carefully for curvilinear patterns and for outliers. Curvilinear relationships will reduce the effectiveness of the analysis.

### Multicollinearity and Singularity

Multicollinearity occurs when one variable is almost a weighted average of the others; singularity occurs when this relationship is exact. Since inverse matrices are needed during the analysis, you must check for this. Try running a principal components analysis on each set of variables separately. If you have eigenvalues at or near zero, you have multicollinearity problems and must omit the offending variables.

## Technical Details

As the name suggests, canonical correlation analysis is based on the correlations between two sets of variables, which we call Y and X. The correlation matrix of all the variables is divided into four parts:

1. $R_{xx}$, the correlations among the X variables.
2. $R_{yy}$, the correlations among the Y variables.
3. $R_{xy}$, the correlations between the X and Y variables.
4. $R_{yx}$, the correlations between the Y and X variables.

Canonical correlation analysis may be defined using the singular value decomposition of a matrix C, where

$$C = R_{yy}^{-1} R_{yx} R_{xx}^{-1} R_{xy}$$

Define the singular value decomposition of C as

$$C = U \Lambda B'$$
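Because these formulas require inverting $R_{xx}$ and $R_{yy}$, the multicollinearity screen recommended in the checklist matters in practice. A minimal sketch (not part of NCSS) of that screen, using the eigenvalue step of a principal components analysis on each set's correlation matrix:

```python
import numpy as np

def multicollinearity_check(data, tol=1e-8):
    """Flag near-singular correlation structure in one variable set.

    Computes the eigenvalues of the correlation matrix (the eigenvalue
    step of a principal components analysis); eigenvalues at or near
    zero signal that one variable is (almost) a weighted average of
    the others, so the matrix cannot be safely inverted.
    """
    R = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)
    return eigvals, bool(np.any(eigvals < tol))

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
B = np.hstack([A, A[:, :1] + A[:, 1:2]])   # 4th column is an exact weighted average
_, ok = multicollinearity_check(A)
_, bad = multicollinearity_check(B)
print(ok, bad)   # False True
```

The offending column in `B` would have to be dropped before running the analysis, exactly as the checklist advises.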

The diagonal matrix $\Lambda$ of the singular values of C is made up of the eigenvalues of C. The i-th eigenvalue $\lambda_i$ of the matrix C is equal to the square of the i-th canonical correlation, which is called $r_{ci}$. Hence, the i-th canonical correlation is the square root of the i-th eigenvalue of C.

Two sets of canonical coefficients (like regression coefficients) are used for each canonical correlation: one for the X variables and another for the Y variables. These coefficients are defined as follows:

$$B_y = R_{yy}^{-1/2} \Lambda B$$

$$B_x = R_{xx}^{-1} R_{xy} B_y$$

The canonical scores for X and Y (denoted $\tilde{X}$ and $\tilde{Y}$) are calculated by multiplying the standardized data (subtract the mean and divide by the standard deviation) by these coefficient matrices. Thus we have

$$\tilde{X} = Z_x B_x, \qquad \tilde{Y} = Z_y B_y$$

where $Z_x$ and $Z_y$ represent the standardized versions of X and Y.

To aid in the interpretation of the canonical variates, loading matrices are computed. These are the correlations between the original variables and the constructed variates. They are computed as follows:

$$A_x = R_{xx} B_x, \qquad A_y = R_{yy} B_y$$

The average squared loadings are given by

$$pv_{xc} = \frac{100}{k_x} \sum_{i=1}^{k_x} a_{ixc}^2, \qquad pv_{yc} = \frac{100}{k_y} \sum_{i=1}^{k_y} a_{iyc}^2$$

The redundancy indices are given by

$$rd = (pv)\left(r_c^2\right)$$

## Data Structure

The data are entered in the standard columnar format in which each column represents a single variable.

## Missing Values

Rows with missing values in any of the variables used in the analysis are ignored.
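The whole pipeline can be sketched in numpy on simulated data. This is an illustrative sketch, not NCSS's implementation: rather than the $R_{yy}^{-1/2}$ scaling above, it takes the y coefficients as eigenvectors of C rescaled so each variate has unit variance (the variates are identical up to scale, so the correlations are unchanged), then derives the x coefficients, scores, loadings, and redundancy:

```python
import numpy as np

# Hypothetical data standing in for the X and Y variables.
rng = np.random.default_rng(0)
n, kx, ky = 200, 3, 3
X = rng.normal(size=(n, kx))
Y = X @ rng.normal(size=(kx, ky)) + rng.normal(size=(n, ky))

Zx = (X - X.mean(0)) / X.std(0)                 # standardized data
Zy = (Y - Y.mean(0)) / Y.std(0)
R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
Rxx, Ryy = R[:kx, :kx], R[kx:, kx:]
Rxy, Ryx = R[:kx, kx:], R[kx:, :kx]

# Eigen-decomposition of C; eigenvalues are the squared canonical correlations.
C = np.linalg.inv(Ryy) @ Ryx @ np.linalg.inv(Rxx) @ Rxy
lam, By = np.linalg.eig(C)
order = np.argsort(lam.real)[::-1]
lam, By = lam.real[order], By.real[:, order]

# Rescale each coefficient vector so its variate has unit variance,
# then derive the x coefficients from the y coefficients.
By = By / np.sqrt(np.sum(By * (Ryy @ By), axis=0))
Bx = np.linalg.inv(Rxx) @ Rxy @ By
Bx = Bx / np.sqrt(np.sum(Bx * (Rxx @ Bx), axis=0))

Yscores, Xscores = Zy @ By, Zx @ Bx             # canonical scores
r_c1 = abs(np.corrcoef(Yscores[:, 0], Xscores[:, 0])[0, 1])
print(np.isclose(r_c1, np.sqrt(lam[0])))        # True: score correlation = r_c1

# Loadings, average squared loadings, and redundancy for the first Y variate.
Ay = Ryy @ By                                   # variable-variate correlations
pv_y1 = 100 * np.mean(Ay[:, 0] ** 2)            # percent of Y variance explained
rd_y1 = pv_y1 * lam[0] / 100                    # redundancy: rd = (pv)(r_c^2)
```

The check at the end verifies the key fact from the text: the correlation between the first pair of canonical scores equals the square root of the first eigenvalue of C.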

## Procedure Options

This section describes the options available in this procedure.

### Variables Tab

This panel specifies the variables used in the analysis.

**Y Variables**
Specify the first set of one or more variables to be correlated with the second set. Although we call these the Y variables, they are not dependent variables: canonical correlation does not assume a dependent-versus-independent relationship between the two sets of variables, but rather analyzes their association. The results would be the same if the X and Y variables were reversed.

**X Variables**
Specify the second set of one or more variables to be correlated with the first set.

**Partial Variables**
An optional set of variables whose linear influence on the X and Y variables is removed using a statistical adjustment mechanism called partial correlation. This operation involves running a multiple regression with each of the X and Y variables in turn as the dependent variable and the partial variables as the independent variables. The residuals from these regressions are used to calculate a partial correlation matrix.

Partial correlation analysis has some serious limitations. First, partial correlation techniques remove only linear (straight-line) patterns; curvilinear patterns are ignored. Second, like all algorithms based on least squares, the results may be severely distorted by outliers in the data.

#### Labels

**Y Variate Label**
A label that will be associated with the Y variates (constructed as weighted averages of the Y variables).

**X Variate Label**
A label that will be associated with the X variates (constructed as weighted averages of the X variables).

#### Options

**Zero Exponent**
This is the exponent of the value used as zero by the least squares algorithm. To remove the effects of rounding error, values lower than this value are reset to zero. If unexpected results are obtained, try using a smaller value, such as 1E-16. Note that 1E-5 is an abbreviation for the number 0.00001. You enter the exponent only; for example, to use 1E-16, enter 16 here.

### Reports Tab

The following options control which reports and plots are displayed.

**Descriptive Statistics ... Scores Reports**
Specify whether to display the indicated reports.

**Number of Correlations**
This option specifies the number of canonical correlations that are reported on. One of the major attractions of canonical correlation analysis is the reduction in variable count, so this value is usually set to two or three. You would approach the selection of this number in much the same way as selecting the number of factors in factor analysis.

**Precision**
Specify the precision of numbers in the report. Single precision displays seven-place accuracy; double precision displays thirteen-place accuracy.

**Variable Names**
This option lets you select whether to display variable names, variable labels, or both.

### Scores Plot Tab

These options control the attributes of the scores plots.

**Scores Plot**
Specify whether to display the scores plot. Click the plot format button to change the plot settings.

**Plot Size**
This option controls the size of the displayed plots. You can select small, medium, or large. Medium and large plots are displayed one per line; small plots are displayed two per line.

### Storage Tab

The constructed variates may be stored on the current database for further analysis. This group of options lets you designate which variates (if any) should be stored and which variables should receive them. The data are stored automatically while the program is executing. Note that existing data are replaced, so be careful not to specify variables that contain important data.

#### Data Storage Variables

**Y Variates**
Store the values of the Y variates in these variables.

**X Variates**
Store the values of the X variates in these variables.

## Example 1: Analysis

This section presents an example of how to run a canonical correlation analysis using the data contained in the Tests dataset. As an example, we will correlate variables Test1, Test2, and Test3 with variables Test4, Test5, and IQ. You may follow along by making the appropriate entries, or load the completed template Example 1 by clicking Open Example Template from the File menu of the procedure window.

1. **Open the Tests dataset.** From the File menu of the NCSS Data window, select Open Example Data. Click on the file Tests.NCSS, then click Open.
2. **Open the Canonical Correlation window.** Using the Analysis menu or the Procedure Navigator, find and select the Canonical Correlation procedure. On the menus, select File, then New Template. This fills the procedure with the default template.
3. **Specify the variables.** On the procedure window, select the Variables tab. Double-click in the Y Variables text box to bring up the variable selection window. Select Test4, Test5, and IQ from the list of variables and click Ok; Test4-IQ will appear in the Y Variables box. Double-click in the X Variables text box, select Test1, Test2, and Test3, and click Ok; Test1-Test3 will appear in the X Variables box.
4. **Specify the reports.** Select the Reports tab. Enter 3 in the Number of Correlations box. Check all reports and plots. Normally you would view only a few of these reports, but we select them all here so that we can document them.
5. **Run the procedure.** From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

## Descriptive Statistics Section

[Table of descriptive statistics: type (X or Y), variable, mean, standard deviation, and non-missing row count for each Y variable (Test4, Test5, IQ) and X variable (Test1, Test2, Test3); numeric values not recoverable from the source.]

This report displays the descriptive statistics for each variable. You should check that the mean is reasonable and that the number of non-missing rows is accurate.

## Correlation Section

[Correlation matrix of Test4, Test5, IQ, Test1, Test2, and Test3; numeric values not recoverable from the source.]

This report presents the simple correlations among all variables specified.

## Canonical Correlations Section

[Table with columns: Variate Number, Canonical Correlation, R-Squared, F-Value, Num DF, Den DF, Prob Level, and Wilks' Lambda. The F-value tests whether this canonical correlation and those following it are zero. Numeric values not recoverable from the source.]

This report presents the canonical correlations plus supporting material to aid in their interpretation.

**Variate Number**
The sequence number of the canonical correlation. Remember that the first correlation will be the largest, the second the next largest, and so on.

**Canonical Correlation**
The value of the canonical correlation coefficient. This coefficient has the same properties as any other correlation: it ranges between minus one and one, a value near zero indicates low correlation, and an absolute value near one indicates near-perfect correlation.

**R-Squared**
The square of the canonical correlation coefficient. This gives the R-squared value of fitting the Y canonical variate to the corresponding X canonical variate.

**F-Value**
The value of the F approximation for testing the significance of the Wilks' lambda corresponding to this row and those below it. In this example, the first F-value tests the significance of the first, second, and third canonical correlations, while the second F-value tests the significance of only the second and third.

**Num DF**
The numerator degrees of freedom of the above F-ratio.

**Den DF**
The denominator degrees of freedom of the above F-ratio.

**Prob Level**
The probability value for the above F statistic. A value near zero indicates a significant canonical correlation. A cutoff of 0.05 or 0.01 is often used to determine significance.

**Wilks' Lambda**
The Wilks' lambda value for the canonical correlation on this report row. Wilks' lambda is the multivariate generalization of R-squared, and it is interpreted in just the opposite way: a value near zero indicates high correlation, while a value near one indicates low correlation.

## Variance Explained Section

[Table with columns: Canonical Variate Number, Variation in these Variables, Explained by these Variates, Individual Percent Explained, Cumulative Percent Explained, and Canonical Correlation Squared, with one row for each combination of variable set (X or Y) and canonical variate; numeric values not recoverable from the source.]

This report displays the percent of the variation in each set of variables explained by the canonical variates.

**Canonical Variate Number**
The sequence number of the canonical variate being reported on. Remember that the maximum number of variates is the minimum of the number of variables in the two sets.

**Variation in these Variables**
Each row of the report presents how well a set of variables is explained by a particular canonical variate. This column designates which set of variables is being reported on.

**Explained by these Variates**
This column designates which set of canonical variates is being reported on.
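The Wilks' lambda column described above can be reproduced directly from the canonical correlations. A small sketch with hypothetical correlation values (not actual NCSS output):

```python
import numpy as np

def wilks_lambda(canonical_r):
    """Wilks' lambda for each row of the canonical correlations report.

    Row k tests whether correlation k and all smaller ones are zero:
    Lambda_k = prod_{i >= k} (1 - r_ci^2). A value near zero means high
    correlation; a value near one means low correlation.
    """
    r2 = np.asarray(canonical_r, dtype=float) ** 2
    return [float(np.prod(1.0 - r2[k:])) for k in range(len(r2))]

# Hypothetical canonical correlations, largest first.
print(wilks_lambda([0.9, 0.5, 0.1]))   # ~[0.141, 0.743, 0.990]
```

Note how the first row's lambda is small (the large first correlation dominates), while the last row's lambda is close to one, matching the interpretation given in the text.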

**Individual Percent Explained**
The percentage of the variation in the designated set of variables that is explained by this canonical variate.

**Cumulative Percent Explained**
The cumulative percentage of the variation in the designated set of variables that is explained by this canonical variate and those listed above it.

**Squared**
The square of the canonical correlation coefficient, repeated from an earlier report.

## Standardized Canonical Coefficients Section

[Standardized Y canonical coefficients: rows Test4, Test5, IQ; columns Y1, Y2, Y3. Standardized X canonical coefficients: rows Test1, Test2, Test3; columns X1, X2, X3. Numeric values not recoverable from the source.]

These coefficients are used to estimate the standardized scores for the X and Y variates. They aid the interpretation of the variates by showing the weight given to each variable in the construction of the variate. They are analogous to standardized beta coefficients in multiple regression.

## Variable - Variate Correlations Section

[Table of correlations between each variable (Test4, Test5, IQ, Test1, Test2, Test3) and each variate (Y1, Y2, Y3, X1, X2, X3); numeric values not recoverable from the source.]

This report shows the correlations between the variables and the variates. By determining which variables are highly correlated with a particular variate, it is hoped that you can determine its interpretation. For example, you can see that variate Y1 is highly correlated with Test4; hence, we assume that Y1 has the same interpretation as Test4.

## Scores Section

[Table of canonical scores Y1, Y2, Y3, X1, X2, X3 for each row of non-missing data; numeric values not recoverable from the source.]

This report provides the canonical scores of each set of variates for each row of non-missing data. These are the values plotted in the score plots shown next.

## Scores Plots

(Seven more plots are displayed.)

These plots show the relationship between each pair of canonical variates. The correlation coefficient of the data in the first plot (Y1 versus X1) is the first canonical correlation coefficient.


### Advanced High School Statistics. Preliminary Edition

Chapter 2 Summarizing Data After collecting data, the next stage in the investigative process is to summarize the data. Graphical displays allow us to visualize and better understand the important features

### Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

### Probability Calculator

Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### Two-Sample T-Test from Means and SD s

Chapter 07 Two-Sample T-Test from Means and SD s Introduction This procedure computes the two-sample t-test and several other two-sample tests directly from the mean, standard deviation, and sample size.

### Chapter 5. Regression

Topics covered in this chapter: Chapter 5. Regression Adding a Regression Line to a Scatterplot Regression Lines and Influential Observations Finding the Least Squares Regression Model Adding a Regression

### Multiple Regression Analysis in Minitab 1

Multiple Regression Analysis in Minitab 1 Suppose we are interested in how the exercise and body mass index affect the blood pressure. A random sample of 10 males 50 years of age is selected and their

### Distribution (Weibull) Fitting

Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### Confidence Intervals for Cp

Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Eviews Tutorial. File New Workfile. Start observation End observation Annual

APS 425 Professor G. William Schwert Advanced Managerial Data Analysis CS3-110L, 585-275-2470 Fax: 585-461-5475 email: schwert@schwert.ssb.rochester.edu Eviews Tutorial 1. Creating a Workfile: First you

### Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

### 1. Microsoft Excel Basics:

1. Microsoft Excel Basics: This section introduces data manipulation using Microsoft Excel, including importing, copying and pasting data and entering equations. A basic understanding of computer operating

### Multiple regression - Matrices

Multiple regression - Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,

### Minitab Guide. This packet contains: A Friendly Guide to Minitab. Minitab Step-By-Step

Minitab Guide This packet contains: A Friendly Guide to Minitab An introduction to Minitab; including basic Minitab functions, how to create sets of data, and how to create and edit graphs of different

### DEPARTMENT OF HEALTH AND HUMAN SCIENCES HS900 RESEARCH METHODS

DEPARTMENT OF HEALTH AND HUMAN SCIENCES HS900 RESEARCH METHODS Using SPSS Session 2 Topics addressed today: 1. Recoding data missing values, collapsing categories 2. Making a simple scale 3. Standardisation

### Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated.

Collinearity of independent variables Collinearity is a condition in which some of the independent variables are highly correlated. Why is this a problem? Collinearity tends to inflate the variance of

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### Factor Analysis. Sample StatFolio: factor analysis.sgp

STATGRAPHICS Rev. 1/10/005 Factor Analysis Summary The Factor Analysis procedure is designed to extract m common factors from a set of p quantitative variables X. In many situations, a small number of

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Canonical Correlation Analysis

Canonical Correlation Analysis LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the similarities and differences between multiple regression, factor analysis,

### Didacticiel - Études de cas

1 Topic Linear Discriminant Analysis Data Mining Tools Comparison (Tanagra, R, SAS and SPSS). Linear discriminant analysis is a popular method in domains of statistics, machine learning and pattern recognition.

### Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

### Section 1.5 Exponents, Square Roots, and the Order of Operations

Section 1.5 Exponents, Square Roots, and the Order of Operations Objectives In this section, you will learn to: To successfully complete this section, you need to understand: Identify perfect squares.

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### Sect Properties of Real Numbers and Simplifying Expressions

Sect 1.6 - Properties of Real Numbers and Simplifying Expressions Concept #1 Commutative Properties of Real Numbers Ex. 1a.34 + 2.5 Ex. 1b 2.5 + (.34) Ex. 1c 6.3(4.2) Ex. 1d 4.2( 6.3) a).34 + 2.5 = 6.84

### Multicollinearity in Regression Models

00 Jeeshim and KUCC65. (003-05-09) Multicollinearity.doc Introduction Multicollinearity in Regression Models Multicollinearity is a high degree of correlation (linear dependency) among several independent

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### Basic Math Refresher A tutorial and assessment of basic math skills for students in PUBP704.

Basic Math Refresher A tutorial and assessment of basic math skills for students in PUBP704. The purpose of this Basic Math Refresher is to review basic math concepts so that students enrolled in PUBP704:

### SAS / INSIGHT. ShortCourse Handout

SAS / INSIGHT ShortCourse Handout February 2005 Copyright 2005 Heide Mansouri, Technology Support, Texas Tech University. ALL RIGHTS RESERVED. Members of Texas Tech University or Texas Tech Health Sciences

### Newton s First Law of Migration: The Gravity Model

ch04.qxd 6/1/06 3:24 PM Page 101 Activity 1: Predicting Migration with the Gravity Model 101 Name: Newton s First Law of Migration: The Gravity Model Instructor: ACTIVITY 1: PREDICTING MIGRATION WITH THE