# INTRODUCTION DATA SCREENING


## EXPLORATORY FACTOR ANALYSIS

ORIGINALLY PRESENTED BY: DAWN HUBER FOR THE COE FACULTY RESEARCH CENTER
MODIFIED AND UPDATED FOR EPS 624/725 BY: ROBERT A. HORN & WILLIAM MARTIN (SP. 08)

The purpose of this lesson is to understand and apply statistical techniques to a single set of variables when the researcher is interested in discovering which variables in the set form coherent subsets that are relatively independent of one another. Variables that are correlated with one another but largely independent of other subsets of variables are combined into factors. Factors are thought to reflect underlying processes that have created the correlations among variables.

## INTRODUCTION

The dataset (FACTOR.sav) that we will be using is part of a larger data set from Tabachnick and Fidell (2007). The study involved 369 middle-class, English-speaking women between the ages of 21 and 60 who completed the Bem Sex Role Inventory (BSRI). Respondents attribute traits to themselves by assigning numbers between 1 (never or almost never true of me) and 7 (always or almost always true of me) to each of the items. Forty-four items from the BSRI were selected for this research example.

## DATA SCREENING

### SAMPLE SIZE

A general rule of thumb is to have at least 300 cases for factor analysis. Solutions that have several high-loading marker variables (> .80) do not require sample sizes as large (about 150 cases should be sufficient) as solutions with lower loadings (Tabachnick & Fidell, 2007, p. 613). Our data set has an adequate sample size of 369 cases.

Bryant and Yarnold (1995) state that one's sample should be at least five times the number of variables. The subjects-to-variables ratio should be 5 or greater. Furthermore, every analysis should be based on a minimum of 100 observations regardless of the subjects-to-variables ratio (p. 100). With 369 cases and 44 variables, the ratio here is about 8.4, and the sample comfortably exceeds 100 observations.
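The sample-size rules of thumb above reduce to simple arithmetic. The sketch below (standard-library Python; the function name `sample_size_checks` is hypothetical) applies the thresholds quoted in this section to the full sample and to the 344 cases that remain after the outlier deletion described later:

```python
def sample_size_checks(n_cases: int, n_variables: int) -> dict:
    """Apply the sample-size rules of thumb quoted in this handout."""
    ratio = n_cases / n_variables  # subjects-to-variables ratio
    return {
        "ratio": ratio,
        "at_least_300_cases": n_cases >= 300,  # general rule of thumb
        "ratio_at_least_5": ratio >= 5,  # Bryant and Yarnold (1995)
        "at_least_100_cases": n_cases >= 100,  # minimum regardless of the ratio
    }


if __name__ == "__main__":
    for label, n in [("full sample", 369), ("after deleting 25 outliers", 344)]:
        checks = sample_size_checks(n, 44)
        print(f"{label}: {checks['ratio']:.2f} cases per variable", checks)
```

For 369 cases the ratio is about 8.39 and for 344 cases about 7.82, so both versions of the data set pass all three rules.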
### MISSING DATA

To check for missing data:

- Click Analyze > Descriptive Statistics > Frequencies
- Click over all 44 items to the Variable(s): box (except Subno)
- De-select [ ] Display frequency tables (this will produce a warning message; simply click OK)
- Click OK

The first table of the output identifies missing values for each item. Scrolling across the output, you will notice that there are no missing values in this data set. If there were missing data, one option would be chosen: estimate the missing values, delete cases (or variables) with missing values, or analyze a correlation matrix computed with pairwise deletion. If the pattern of missing data is nonrandom or the sample size is small, consider estimation, but be aware that estimation can overfit the data and produce correlations that are too high. Please refer to Tabachnick and Fidell (2007) for more information about deleting and otherwise dealing with missing data.

### DETECTING MULTIVARIATE OUTLIERS

For the sake of this training, we will start with an assessment of multivariate outliers. However, we would usually begin by screening for univariate outliers and checking assumptions. Many statistical methods are sensitive to outliers, so it is important to identify outliers and decide what to do with them. Recall that a multivariate outlier is a case with an unusual combination of scores on two or more variables.

Reasons for outliers (Tabachnick & Fidell, 2007):

1. Incorrect data entry.
2. Failure to specify missing-value codes in the computer syntax, so that missing values are read as real data.
3. The outlier is not a member of the population that you intended to sample.
4. The outlier is representative of the population you intended to sample, but the population has more extreme scores than a normal distribution.

To check for multivariate outliers:

- Click Analyze > Regression > Linear
- Dependent: subno
- Independent(s): all remaining 44 items
- Click Save and, under Distances, check [ ] Mahalanobis
- Click Continue, then click OK

Subno serves only as a placeholder dependent variable; the Mahalanobis distances are computed from the 44 items entered as independent variables.
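Outside SPSS, the distance that the Save > Mahalanobis option writes to MAH_1 can be sketched as follows. This is a minimal standard-library illustration, assuming the usual sample covariance matrix (n − 1 denominator); SPSS's own computation may differ in small details, and the function names and toy data are hypothetical. The default cutoff of 78.75 is the critical χ² with 44 df at α = .001 used in this handout.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_d2(rows):
    """Squared Mahalanobis distance of each case (row) from the centroid.

    Assumes the sample covariance matrix with an n - 1 denominator.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    inv = invert(cov)
    return [sum(c[a] * inv[a][b] * c[b] for a in range(p) for b in range(p))
            for c in centered]


def flag_outliers(d2, critical=78.75):
    """Indices of cases whose distance exceeds the critical chi-square value.

    78.75 is the handout's critical value for 44 df at alpha = .001.
    """
    return [i for i, d in enumerate(d2) if d > critical]


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 6]]  # hypothetical 2-variable data
    d2 = mahalanobis_d2(toy)
    # Descending order, like Data > Sort Cases on MAH_1.
    for i in sorted(range(len(d2)), key=lambda k: d2[k], reverse=True):
        print(f"case {i}: D^2 = {d2[i]:.3f}")
```

Sorting the distances in descending order mirrors the Data > Sort Cases step: the cases at the top of the list are the first candidates for deletion. A two-variable toy problem like this one would need its own critical value (χ² with 2 df), not the 44-df cutoff.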

After you click OK, an output page will be produced. Minimize the output page and go to the Data View page. Once there, scroll over to the last column (MAH_1) to see the Mahalanobis distance for each case, computed across all 44 variables.

To detect whether a case is a multivariate outlier, one must know the critical value that its Mahalanobis distance must exceed. Using the criterion of α = .001 with 44 df (the number of variables), the critical χ² = 78.75. According to Tabachnick and Fidell (2007), we do not use N − 1 for df because Mahalanobis distance is evaluated as χ² with degrees of freedom equal to the number of variables (p. 99). Thus, every case's Mahalanobis distance must be examined to see whether its value exceeds the critical value of χ² = 78.75.

Because there are so many cases to examine, an easy way to inspect all the Mahalanobis distance values is to sort them:

- Click Data > Sort Cases
- Scroll down the variable list to the last variable, highlight the Mahalanobis distance variable (MAH_1), and click it over to the Sort by: box
- Under Sort Order, select Descending
- Click OK

We can also sort by moving the cursor over the variable of interest (e.g., MAH_1), right-clicking, and choosing Sort Descending. The values in the Mahalanobis (MAH_1) column will then be arranged in descending order, from highest to lowest.

On the Data View page, examine the top values and determine how many cases meet the criterion for a multivariate outlier (i.e., > 78.75). For this data set there should be 25 cases that are multivariate outliers, leaving 344 non-outlying cases, which is still an acceptable number. We are opting to delete the 25 outlying cases. To delete them, highlight the gray row numbers 1 through 25 (on the left of the screen), then press the Delete key. Use Save As to save the modified data set as FACTORMINUSMVOUTLIERS.

Options for dealing with outliers (Tabachnick & Fidell, 2007):

1. Delete a variable that may be responsible for many outliers, especially if it is highly correlated with other variables in the analysis.
2. If you decide that cases with extreme scores are not part of the population you sampled, then delete them.
3. If cases with extreme scores are considered part of the population you sampled, then one way to reduce the influence of a univariate outlier is to transform the variable so that its distribution is closer to normal. Tukey said that you are merely reexpressing what the data have to say in other terms (Howell, 2007).
4. Another strategy for dealing with a univariate outlier is to assign the outlying case(s) a raw score on the offending variable that is one unit larger (or smaller) than the next most extreme score in the distribution (Tabachnick & Fidell, 2007, p. 77).
5. Univariate transformations and score alterations often help reduce the impact of multivariate outliers, but such outliers can still be a problem. These cases are usually deleted (Tabachnick & Fidell, 2007).

All transformations, changes to scores, and deletions are reported in the results section, with the rationale and with citations.

### MULTICOLLINEARITY AND SINGULARITY

Multicollinearity occurs when the IVs are highly correlated. Singularity occurs when variables are redundant (one variable is a combination of others). To test for multicollinearity and singularity, use the following SPSS commands:

- Click Analyze > Regression > Linear
- Click Reset
- Dependent: subno
- Independent(s): all 44 items (be sure not to include MAH_1)
- Click Statistics and check [ ] Collinearity diagnostics
- Click Continue, then click OK

This will produce an output page. If the determinant of R and the eigenvalues associated with some factors approach 0, multicollinearity or singularity may be present. To investigate further, look at the SMCs for each variable where it serves as DV with all other variables as IVs (Tabachnick & Fidell, 2007, p. 614).
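The determinant check and the SMCs described above can both be computed directly from R. A standard identity is that the SMC of variable i equals 1 − 1/r^ii, where r^ii is the i-th diagonal element of R⁻¹. The sketch below (standard-library Python; the function names and the small correlation matrix are hypothetical) obtains R⁻¹ and |R| together by Gauss-Jordan elimination:

```python
def invert_with_det(matrix):
    """Return (inverse, determinant) via Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R is singular (determinant is 0): a variable is redundant")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def smcs_from_r(R):
    """SMC of each variable regressed on all the others: 1 - 1 / (R^-1)_ii."""
    inv, det = invert_with_det(R)
    smcs = [1.0 - 1.0 / inv[i][i] for i in range(len(R))]
    return smcs, det


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]  # hypothetical R
    smcs, det = smcs_from_r(R)
    print("determinant of R:", round(det, 4))
    print("SMCs:", [round(s, 3) for s in smcs])
```

With only two variables the SMC reduces to r², which is a quick way to sanity-check the function. A determinant near 0 together with SMCs near 1 is the pattern the text warns about.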

In the Coefficients table of the output, under Collinearity Statistics, look at the Tolerance value for each item on the test. We want the Tolerance values to be high, closer to 1.0. Next, we want to explore the SMCs (squared multiple correlations) of each variable where it serves as DV with the rest as IVs in multiple correlation (Tabachnick & Fidell, 2007). Many programs, including SPSS, convert the SMC of each variable to tolerance (1 − SMC) and report tolerance instead of SMC. Thus, we have to calculate the SMCs ourselves (SMC = 1 − Tolerance). Next to the tolerance values, calculate the SMCs for the first ten items. We want the SMCs to be low, closer to .00. If any of the SMCs equal one (1), then singularity is present. If any of the SMCs are very large (i.e., near one), then multicollinearity is present (Tabachnick & Fidell, 2007). The tolerance and SMC values were fine for this set of data.

However, if the tolerance values were too low, we would scroll down to the next table (Collinearity Diagnostics) and examine the Condition Index for each dimension. According to Tabachnick and Fidell (2007), we do not want Condition Index values greater than 30. Examine the Condition Index for all 45 dimensions (one per item plus the constant). As you can see, the last 25 dimensions have Condition Indexes greater than 30. Because of these high Condition Indexes, you would next need to examine the Variance Proportions for those dimensions, which are located next to the Condition Index. According to Tabachnick and Fidell (2007), we do not want two or more Variance Proportions greater than .50 on any one of these dimensions. To explain further, look at the Variance Proportions for Dimension 45: scroll across the page and see whether two items have Variance Proportions greater than .50 on Dimension 45.

Next, you have to make some decisions about multicollinearity. Because we did not find evidence of any Variance Proportions greater than .50, we may decide that we do not have evidence of multicollinearity. However, one can also combine evidence (the SMCs, Tolerance values, Condition Indexes, and Variance Proportions) and decide whether there is combined evidence of multicollinearity. Generally, if both the Condition Index and the Variance Proportions are high, there is evidence of multicollinearity. For this set of data we have no evidence that multicollinearity or singularity exists. Save the output as MULTICOLLINEARITY.
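The arithmetic in the multicollinearity checks above (SMC = 1 − Tolerance, VIF = 1/Tolerance, and the condition-index plus variance-proportion rule) can be sketched as small helpers. This is an illustration under stated assumptions: the 0.90 cutoff for a "very large" SMC is a hypothetical choice, not a value from the handout, and the inputs are values you would copy from the SPSS Coefficients and Collinearity Diagnostics tables. Function names are hypothetical; dimensions are numbered from 1, as in SPSS.

```python
def smc_and_vif(tolerances):
    """Map item -> (SMC, VIF). SPSS reports tolerance = 1 - SMC, and VIF = 1 / tolerance."""
    return {item: (1.0 - tol, 1.0 / tol) for item, tol in tolerances.items()}


def flag_smcs(smcs, near_one=0.90):
    """Return (singular items, multicollinear items).

    SMC == 1 signals singularity. The 0.90 cutoff for "very large" is a
    hypothetical choice; the handout only says "near one".
    """
    singular = [item for item, s in smcs.items() if abs(s - 1.0) < 1e-9]
    high = [item for item, s in smcs.items() if s >= near_one and item not in singular]
    return singular, high


def collinear_dimensions(condition_indices, variance_proportions,
                         ci_cutoff=30.0, vp_cutoff=0.50):
    """Dimensions with condition index > 30 AND two or more variance
    proportions > .50 on that same dimension.

    variance_proportions[d] lists the proportions of the variables on dimension d + 1.
    """
    flagged = []
    for dim, (ci, props) in enumerate(zip(condition_indices, variance_proportions), start=1):
        high = [k for k, vp in enumerate(props) if vp > vp_cutoff]
        if ci > ci_cutoff and len(high) >= 2:
            flagged.append((dim, high))
    return flagged


if __name__ == "__main__":
    # Hypothetical tolerances; copy the real ones from the Coefficients table.
    print(smc_and_vif({"helpful": 0.62, "self reliant": 0.55}))
```

A high condition index on its own is not flagged; the rule only fires when at least two variables also load heavily on the same dimension, which is how the text combines the two pieces of evidence.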

[Table: Coefficients (a), Model 1. Columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., Collinearity Statistics (Tolerance, VIF). Rows: (Constant), helpful, self reliant, defend beliefs, yielding, cheerful, independent, athletic, shy, assertive, strong personality, forceful, affectionate, flatter, loyal, analyt, feminine, sympathy, moody, sensitiv, undstand, compassionate, leadership ability, eager to soothe hurt feelings, willing to take risks, makes decisions easily, self sufficient, conscientious, dominant, masculin, willing to take a stand, happy, soft spoken, warm, truthful, tender, gullible, act as a leader, childlik, individualistic, use foul language, love children, competitive, ambitious, gentle. a. Dependent Variable: Subject identification.]

### NORMALITY

If principal factor analysis is used descriptively, then assumptions about distributions are not essential. However, normality of the variables enhances the solution (Tabachnick & Fidell, 2007). When the number of factors is determined using statistical inference, multivariate normality is assumed. Normality among single variables is assessed by skewness and kurtosis (Tabachnick & Fidell, 2007, p. 613); as such, the distributions of the 44 variables need to be examined for skewness and kurtosis. To obtain the skewness and kurtosis of the 44 variables:

- Click Analyze > Descriptive Statistics > Frequencies
- Click Reset
- Click over all 44 items to the Variable(s): box (be sure not to include Subno and MAH_1)
- Click Statistics; under Dispersion, Central Tendency, and Distribution, check [ ] all options; click Continue
- Click Charts; select Histograms and check [ ] With normal curve; click Continue
- De-select [ ] Display frequency tables
- Click OK

An output will be produced. Scroll to the top of the output to Frequencies, where you will see the skewness values and their standard errors for all 44 items.

Skewness: A distribution that is not symmetric but has more cases (more of a "tail") toward one end of the distribution than the other is said to be skewed (Norusis, 1994).

- Value of 0 = normal (symmetric)
- Positive value = positive skew (tail going out to the right)
- Negative value = negative skew (tail going out to the left)

Divide the skewness statistic by its standard error to obtain a standard score (z). We want to know whether this standard score departs significantly from normality. Concern arises when the absolute value of the skewness statistic divided by its standard error is greater than z = 3.29 (p < .001, two-tailed test) (Tabachnick & Fidell, 2007). To illustrate, calculate the standardized skewness of the item labeled helpful and provide the information asked for below. Keep in mind that you would do this for each of the 44 items.

helpful

- Skewness Value: ____
- Std. Error Skewness: ____
- Standard Score: ____
- Direction of the Skewness: ____
- Significant Departure? (yes, no): ____

In the same Frequencies output you will also see the kurtosis values and their standard errors for all 44 items.

Kurtosis: The relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders (between the center and the tails) of a distribution (Norusis, 1994).

- Value of 0 = mesokurtic (normal)
- Positive value = leptokurtic (shape is narrower, more peaked)
- Negative value = platykurtic (shape is broader, more widely dispersed, flatter)

Divide the kurtosis statistic by its standard error. We want to know whether this standard score departs significantly from normality. Concern arises when the absolute value of the kurtosis statistic divided by its standard error is greater than z = 3.29 (p < .001, two-tailed test) (Tabachnick & Fidell, 2007). To illustrate, calculate the standardized kurtosis of the item labeled helpful and provide the information asked for below. Keep in mind that you would do this for each of the 44 items.

helpful

- Kurtosis Value: ____
- Std. Error Kurtosis: ____
- Standard Score: ____
- Direction of the Kurtosis: ____
- Significant Departure? (yes, no): ____
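The standardized skewness and kurtosis calculation can be sketched in Python. The sketch assumes the bias-adjusted skewness and kurtosis estimators and their usual standard errors, which we believe are the formulas behind the Frequencies output, but treat that correspondence as an assumption. The 3.29 cutoff is the two-tailed z for p < .001; function names and example scores are hypothetical.

```python
import math


def skew_kurt_with_se(x):
    """Bias-adjusted skewness and (excess) kurtosis with their standard errors."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample SD
    z = [(v - mean) / sd for v in x]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt


def standardized(statistic, std_error, critical_z=3.29):
    """Return (z, direction, significant departure?) for a skewness or kurtosis value."""
    z = statistic / std_error
    direction = "positive" if z > 0 else "negative" if z < 0 else "none"
    return z, direction, abs(z) > critical_z


if __name__ == "__main__":
    scores = [1, 1, 1, 2, 2, 3, 7]  # hypothetical scores, not the BSRI data
    skew, se_skew, kurt, se_kurt = skew_kurt_with_se(scores)
    print("skewness:", standardized(skew, se_skew))
    print("kurtosis:", standardized(kurt, se_kurt))
```

For perfectly symmetric, evenly spaced scores such as 1, 2, 3, 4, 5 these formulas give a skewness of 0 and a kurtosis of −1.2 (platykurtic), with standard errors of about 0.913 and 2.000.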

Overall, many of the variables are negatively skewed and a few are positively skewed. However, because the BSRI is already published and in use, no deletion of variables or transformations of them is performed (Tabachnick & Fidell, 2007, p. 652). Save the output as NORMALITY.

### LINEARITY

Multivariate normality implies linearity, so linearity among pairs of variables is assessed through inspection of scatterplots (Tabachnick & Fidell, 2007, p. 613). With 44 variables, however, examining all 946 pairwise scatterplots (44 × 43 / 2) is impractical. Therefore, to spot-check for linearity, we will examine Loyal (with strong negative skewness) and Masculin (with strong positive skewness). To create a scatterplot:

- Click Graphs > Legacy Dialogs > Scatter/Dot
- Click Simple Scatter (this should be the default), then click Define
- Y-Axis: Masculin
- X-Axis: Loyal
- Click OK

An output (graph) will then be produced. Save the output as LINEARITY.

[Figure: scatterplot of Masculin (Y-axis) against Loyal (X-axis)]

The scatterplot should show a balanced spread of scores. According to Tabachnick and Fidell (2007), when bivariate scatterplots are oval-shaped, the variables are normally distributed and linearly related. Although the plot is far from pleasing, and shows departure from linearity as well as the possibility of outliers, there is no evidence of true curvilinearity. And again, transformations are viewed with disfavor considering the variable set and the goals of the analysis (Tabachnick & Fidell, 2007, p. 652).

## CONDUCTING A PRINCIPAL FACTOR ANALYSIS

- Click Analyze > Data Reduction > Factor
- Highlight all 44 items and click them over to the Variable(s): box (be sure not to include Subno and MAH_1)
- Click Descriptives; under Statistics check [ ] Univariate descriptives and [ ] Initial solution (default); under Correlation Matrix check [ ] Coefficients, [ ] Determinant, and [ ] KMO and Bartlett's test of sphericity; click Continue
- Click Extraction; change Method to Principal axis factoring; under Display check [ ] Unrotated factor solution (default) and [ ] Scree plot; click Continue
- Click OK. An output will then be produced.

## INTERPRETATION OF THE EXPLORATORY FACTOR ANALYSIS

To review the study, a sample of 369 middle-class, English-speaking women between the ages of 21 and 60 completed the Bem Sex Role Inventory (BSRI), and 44 items (variables) were used in the analysis. The research question is: Will the factor structure of the BSRI for this sample of women be similar to previous research indicating the presence of between three and five factors underlying the items of the BSRI?

The purpose of factor analysis is to study a set of variables and discover subsets of variables that are relatively independent from one another. The subsets of variables that correlate with each other are combined as factors (linear combinations of observed variables) and are thought to reflect underlying processes (latent variables) that have created the correlations among the observed variables.

Principal components analysis (PCA) uses the total variance (common variance + unique variance + error variance) to derive components (Hair et al., 2006). PCA is an empirical summary of the data set: it aggregates the correlated variables, so the variables produce the components. Common variance is variance in a variable that is shared with all other variables in the analysis. A variable's communality is the estimate of such shared variance. Unique variance is variance associated only with a specific variable, which is not explained by correlations with other variables. Error variance cannot be explained by correlations with other variables either, but it is due to unreliability in data gathering, measurement error, or random selection.

Factor Analysis (FA) focuses only on the common variance (covariance, communality) that each observed variable shares with other observed variables. FA excludes unique and error variance, which would confuse the understanding of underlying processes (latent variables). FA is the choice if a theoretical solution of factors is thought to cause or produce scores on variables. The steps of interpretation are (1) selecting and measuring variables, (2) preparing the correlation matrix, (3) determining the factorability of R, (4) assessing the adequacy of extraction and determining the number of factors, (5) extracting and rotating the factors to increase interpretability, and (6) interpreting the results. Once a final solution is selected, validation continues using cross-validation, confirmatory factor analysis, and criterion validation methods (Tabachnick & Fidell, 2007).

### FACTORABILITY OF R

There are several sources of information for determining whether the R matrix is likely to produce linear combinations of variables as factors. Look at the Correlation Matrix (R) produced on the output page. A matrix that is factorable should include several sizable correlations. The expected size depends, to some extent, on N (larger sample sizes tend to produce smaller correlations), but if no correlation exceeds .30, use of FA is questionable because there is probably nothing to factor analyze (Tabachnick & Fidell, 2007, p. 614). We want the correlations between items to be greater than .30.

Interpret the correlation matrix: ____

High bivariate correlations, however, are not ironclad proof that the correlation matrix contains factors. It is possible that the correlations are between only two variables and do not reflect underlying processes that are simultaneously affecting several variables. For this reason, it is helpful to examine matrices of partial correlations, where pairwise correlations are adjusted for the effects of all other variables (Tabachnick & Fidell, 2007, p. 614).
To examine partial correlations, look at the KMO on the output page. The Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy is a ratio whose numerator is the sum of all the squared correlation coefficients and whose denominator is the sum of all the squared correlation coefficients plus the sum of all the squared partial correlation coefficients (Norusis, 2003). A partial correlation is a value that measures the strength of the relationship between a dependent variable and a single independent variable when the effects of other independent variables are held constant (Hair et al., 2006). If the variables share common factors, the partial correlations should be small relative to the ordinary correlations, which pushes the KMO toward 1.
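The factorability checks in this section (sizable correlations, the KMO, and Bartlett's test, which is discussed next) can be sketched from the correlation matrix alone. The sketch assumes the standard textbook forms, which we take to correspond to what SPSS prints: partial correlations a_ij = −r^ij / √(r^ii r^jj) computed from R⁻¹; KMO = Σ r_ij² / (Σ r_ij² + Σ a_ij²) over all i ≠ j; and Bartlett's χ² = −[(N − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 df. The labels follow Kaiser's (1974) criteria as listed in this handout, reading each value as a lower bound (an assumption). Function names and the example matrix are hypothetical.

```python
import math


def invert_with_det(matrix):
    """Return (inverse, determinant) via Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R is singular; KMO and Bartlett's test are undefined")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def count_sizable(R, cutoff=0.30):
    """Number of distinct item pairs whose |r| exceeds the cutoff."""
    p = len(R)
    return sum(1 for i in range(p) for j in range(i + 1, p) if abs(R[i][j]) > cutoff)


def kmo(R):
    """Overall KMO: squared correlations vs. squared partial correlations (i != j)."""
    inv, _ = invert_with_det(R)
    p = len(R)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += R[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


def kaiser_label(value):
    """Kaiser's (1974) labels as listed in the handout, each value read as a lower bound."""
    for cutoff, label in [(0.90, "Marvelous"), (0.80, "Meritorious"), (0.70, "Middling"),
                          (0.60, "Mediocre"), (0.50, "Miserable")]:
        if value >= cutoff:
            return label
    return "Unacceptable"


def bartlett_sphericity(R, n_cases):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = len(R)
    _, det = invert_with_det(R)
    chi2 = -((n_cases - 1) - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]  # hypothetical R
    value = kmo(R)
    print("pairs with |r| > .30:", count_sizable(R))
    print(f"KMO = {value:.3f} ({kaiser_label(value)})")
    print("Bartlett chi-square, df:", bartlett_sphericity(R, 344))
```

For the 44 BSRI items, Bartlett's test has 44 × 43 / 2 = 946 degrees of freedom. Turning the statistic into a p-value needs a chi-square distribution function, which the Python standard library does not provide, so the sketch stops at the statistic and its df.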

The following criteria are used to assess and describe the sampling adequacy (Kaiser, 1974):

- .90 = Marvelous
- .80 = Meritorious
- .70 = Middling
- .60 = Mediocre
- .50 = Miserable
- Below .50 = Unacceptable

If the KMO is small, it is a good idea not to do factor analysis. Please interpret the KMO below:

- KMO Value: ____
- Sampling Adequacy Criteria Rating: ____

Next, look at Bartlett's Test of Sphericity on the output page. Bartlett's (1954) Test of Sphericity is a notoriously sensitive test of the hypothesis that the correlations in a correlation matrix are zero. According to Tabachnick and Fidell (2007), the test is likely to be significant with samples of substantial size even if correlations are very low. Therefore, use of the test is recommended only if there are fewer than, say, five cases per variable (p. 614). Overall, we want Bartlett's Test of Sphericity to be significant so that we can reject the null hypothesis. Interpret Bartlett's Test of Sphericity by providing the information asked for below:

- Approx. Chi-Square: ____
- Significance: ____
- What was your decision about the null hypothesis? ____

### ADEQUACY OF EXTRACTION AND NUMBER OF FACTORS

An initial factor analysis is run using principal axis factoring with an unrotated factor solution, with the purpose of determining the adequacy of extraction and identifying the likely number of factors in the solution. PCA is often used for the same initial purpose.

Look at the communalities in the output. A variable's communality is the proportion of its variance explained by the common factors. The initial communalities are the SMC of each variable as DV with the other variables in the sample as IVs. Extraction communalities are SMCs between each variable as DV and the factors as IVs. Communalities range from 0 to 1, where 0 means that the factors don't explain any of the variance and 1 means that all of the variance is explained by the factors. Variables with small extraction communalities cannot be predicted by the factors, and you should consider eliminating them if they are too small (< .20). How many extraction communalities are below .20? ____

A first check of the number of factors is obtained from the sizes of the eigenvalues reported as part of an initial run with principal axis factoring extraction. An eigenvalue (latent root) represents the amount of variance accounted for by a factor. Because the variance that each standardized variable contributes to a principal factor extraction is 1, a factor with an eigenvalue less than 1 is not as important, from a variance perspective, as an observed variable. In the output, look under the heading Total Variance Explained and then under the heading Initial Eigenvalues. Under Total, count how many factors have an eigenvalue above one (1). How many factors are above an initial eigenvalue of 1.0? ____ There should be 11 factors above one. However, having 11 factors is not parsimonious. Thus, you may use eigenvalues over two (2) as the criterion for specifying which factors are most worthy of further exploration. Tabachnick and Fidell (2007) state that eigenvalues for the first four factors are all larger than two and that, after the sixth factor, changes in successive eigenvalues are small. This is taken as evidence that there are probably between 4 and 6 factors (p. 657).

A second criterion is the scree test of eigenvalues plotted against factors. Factors, in descending order, are arranged along the abscissa (x-axis), with eigenvalues as the ordinate (y-axis). Usually the scree plot slopes downward: the eigenvalue is highest for the first factor and moderate but decreasing for the next few factors before reaching small values for the last several factors. Examine the Scree Plot on your output page. According to Norusis (2003), the plot most often will show a distinct break between the steep slope of the large factors and the gradual trailing off of the rest of the factors, the scree that forms at the foot of a mountain. One should use only the factors before the scree begins. According to Hair et al. (2006), starting with the first factor, the plot slopes steeply downward initially and then slowly becomes an approximately horizontal line. The point at which the curve first begins to straighten out is considered to indicate the maximum number of factors to extract (p. 120). You look for the point where the line drawn through the points changes slope.
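The counting rules for communalities and eigenvalues described above can be sketched as a few helpers. The eigenvalues in the example are hypothetical placeholders, not the values in the SPSS output; in practice you would copy them from the Initial Eigenvalues column of the Total Variance Explained table. Because each standardized variable contributes a variance of 1, an eigenvalue divided by the number of variables gives the proportion of total variance that factor accounts for. Function names are hypothetical.

```python
def low_communalities(communalities, cutoff=0.20):
    """Items whose extraction communality is below the cutoff (.20 in this handout)."""
    return [item for item, h2 in communalities.items() if h2 < cutoff]


def count_above(eigenvalues, threshold):
    """Number of factors whose eigenvalue exceeds the threshold (1.0 or 2.0 here)."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def successive_drops(eigenvalues):
    """Differences between consecutive eigenvalues; small drops mark the scree."""
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


def percent_of_variance(eigenvalues, n_variables):
    """Each standardized variable contributes variance 1, so total variance = n_variables."""
    return [100.0 * ev / n_variables for ev in eigenvalues]


if __name__ == "__main__":
    eigenvalues = [8.2, 5.1, 2.4, 2.1, 1.6, 1.3, 1.0, 0.9]  # hypothetical values
    print("above 1:", count_above(eigenvalues, 1.0))
    print("above 2:", count_above(eigenvalues, 2.0))
    print("drops:", [round(d, 2) for d in successive_drops(eigenvalues)])
    print("% variance:", [round(p, 1) for p in percent_of_variance(eigenvalues, 44)])
```

The successive drops are only a numeric aid: the scree judgment itself still comes from looking at where the slope changes.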

Unfortunately, the scree test is not exact; it involves judgment of where the discontinuity in eigenvalues occurs, and researchers are not perfectly reliable judges (Tabachnick & Fidell, 2007). In the example, a single straight line can comfortably fit the first four eigenvalues. After that, another line, with a noticeably different slope, best fits the remaining eight points. Therefore, there appear to be about four (4) factors in the data. Once you have determined the number of factors by these criteria, it is important to look at the rotated loading matrix to determine the number of variables that load on each factor.

### CREATING 4 FACTORS

- Click Analyze > Data Reduction > Factor
- Click Reset
- Highlight all 44 items and click them over to the Variable(s): box (be sure not to include Subno and MAH_1)
- Click Extraction; change Method to Principal axis factoring; under Extract, select Number of factors: and type in the number 4 (four); click Continue
- Click Rotation; under Method, select Varimax; click Continue
- Click OK. An output should be produced.

### EXTRACTION AND ROTATING THE FACTORS TO INCREASE INTERPRETABILITY

We are now looking for the most parsimonious final solution of factors representing the R matrix and the theory of the problem, which relates to the presence of between three and five factors underlying the items of the BSRI. We specified 4 factors for the run. Again, we will use principal axis factoring which maximizes variance extracted by orthogonal

[Table: Rotated Factor Matrix (a) for the 44 items (rows, from helpful through gentle) on Factors 1 to 4 (columns). Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 9 iterations.]

## INTERNAL CONSISTENCY OF FACTORS

- Click Analyze > Scale > Reliability Analysis
- Click over the 44 items to the Items: box (be sure not to include Subno and MAH_1)
- For the Model: box, be sure that Alpha is selected
- Click OK

Cronbach's coefficient alpha is a measure of the internal consistency of the items of a total test, or of the scales of a test, based on the scores of the particular sample. Values usually range from 0 to 1. Values in the higher range (> .70) suggest that the items of the total test or scale are measuring the same thing. Interpret Cronbach's Alpha by providing the information asked for below:

- Cronbach's Alpha for all 44 items: ____
- N of items: ____
- Interpretation: ____

### FOR EACH FACTOR (SCALE)

Next, examine the internal consistency of the items that have high factor loadings on each of the four factors (i.e., > .45). These are the item loadings you circled for each of the four factors in the Rotated Factor Matrix.

- Click Analyze > Scale > Reliability Analysis
- Click Reset
- Click over the items for that factor to the Items: box
- For the Model: box, be sure that Alpha is selected
- Click OK

Factor 1:

- Cronbach's Alpha: ____
- N of items: ____
- Interpretation: ____

Do the same procedure for the next three factors and interpret Cronbach's Alpha by providing the information asked for below:

Factor 2:

- Cronbach's Alpha: ____
- N of items: ____
- Interpretation: ____

Factor 3:

- Cronbach's Alpha: ____
- N of items: ____
- Interpretation: ____

Factor 4:

- Cronbach's Alpha: ____
- N of items: ____
- Interpretation: ____
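Selecting a factor's items and computing coefficient alpha can be sketched in Python with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The loadings and scores below are hypothetical. The item selection follows the handout's "> .45" cutoff literally (a signed comparison), so an item with a strong negative loading is not picked up; deciding what to do with such items is left to the analyst. Function names are hypothetical.

```python
def items_for_factor(loadings, factor_index, cutoff=0.45):
    """Items whose loading on the given factor (0-based column) exceeds the cutoff."""
    return [item for item, row in loadings.items() if row[factor_index] > cutoff]


def sample_variance(xs):
    """Variance with an n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(item_columns):
    """Coefficient alpha; item_columns is a list of items, each a list of respondent scores."""
    k = len(item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]  # each respondent's scale score
    item_var = sum(sample_variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_var / sample_variance(totals))


if __name__ == "__main__":
    # Hypothetical loadings on two factors and hypothetical 1-7 ratings.
    loadings = {"helpful": [0.62, 0.10], "warm": [0.20, 0.71], "tender": [0.15, 0.66]}
    print("Factor 2 items:", items_for_factor(loadings, 1))
    ratings = {"warm": [6, 5, 7, 4, 6], "tender": [5, 5, 7, 3, 6]}
    print("alpha:", round(cronbach_alpha(list(ratings.values())), 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why items that load together on one factor tend to form an internally consistent scale.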

## References

Bartlett, M. S. (1954). A note on the multiplying factors for various chi square approximations. Journal of the Royal Statistical Society, 16(Series B),

Bryant, F. B., & Yarnold, P. R. (1995). Principal-components analysis and exploratory and confirmatory factor analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics (pp. ). Washington, DC: American Psychological Association.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hair, J. F., Jr., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis. Upper Saddle River, NJ: Pearson Prentice Hall.

Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA: Thomson Wadsworth.

Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39,

Norusis, M. J. (1994). SPSS advanced statistics 6.1. Chicago, IL: SPSS Inc.

Norusis, M. J. (2003). SPSS 12.0 statistical procedures companion. Upper Saddle River, NJ: Prentice Hall.

Rummel, R. J. (1970). Applied multivariate statistics for the social sciences. Mahwah, NJ: Lawrence Erlbaum Associates.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn and Bacon.


Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### FACTOR ANALYSIS NASC

FACTOR ANALYSIS NASC Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. Aim is to identify groups of variables which are relatively

### Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

### UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

### PRINCIPAL COMPONENTS AND THE MAXIMUM LIKELIHOOD METHODS AS TOOLS TO ANALYZE LARGE DATA WITH A PSYCHOLOGICAL TESTING EXAMPLE

PRINCIPAL COMPONENTS AND THE MAXIMUM LIKELIHOOD METHODS AS TOOLS TO ANALYZE LARGE DATA WITH A PSYCHOLOGICAL TESTING EXAMPLE Markela Muca Llukan Puka Klodiana Bani Department of Mathematics, Faculty of

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### A Brief Introduction to SPSS Factor Analysis

A Brief Introduction to SPSS Factor Analysis SPSS has a procedure that conducts exploratory factor analysis. Before launching into a step by step example of how to use this procedure, it is recommended

### Chapter 7 Factor Analysis SPSS

Chapter 7 Factor Analysis SPSS Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often

### Correlation and Regression Analysis: SPSS

Correlation and Regression Analysis: SPSS Bivariate Analysis: Cyberloafing Predicted from Personality and Age These days many employees, during work hours, spend time on the Internet doing personal things,

### EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions

### Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

### 2. Linearity (in relationships among the variables--factors are linear constructions of the set of variables) F 2 X 4 U 4

1 Neuendorf Factor Analysis Assumptions: 1. Metric (interval/ratio) data. Linearity (in relationships among the variables--factors are linear constructions of the set of variables) 3. Univariate and multivariate

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings

### Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### INTERPRETING THE REPEATED-MEASURES ANOVA

INTERPRETING THE REPEATED-MEASURES ANOVA USING THE SPSS GENERAL LINEAR MODEL PROGRAM RM ANOVA In this scenario (based on a RM ANOVA example from Leech, Barrett, and Morgan, 2005) each of 12 participants

### Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003

Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003 FA is not worth the time necessary to understand it and carry it out. -Hills, 1977 Factor analysis should not

### Running head: ASSUMPTIONS IN MULTIPLE REGRESSION 1. Assumptions in Multiple Regression: A Tutorial. Dianne L. Ballance ID#

Running head: ASSUMPTIONS IN MULTIPLE REGRESSION 1 Assumptions in Multiple Regression: A Tutorial Dianne L. Ballance ID#00939966 University of Calgary APSY 607 ASSUMPTIONS IN MULTIPLE REGRESSION 2 Assumptions

### 12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

### 11/20/2014. Correlational research is used to describe the relationship between two or more naturally occurring variables.

Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

### FACTOR ANALYSIS EXPLORATORY APPROACHES. Kristofer Årestedt

FACTOR ANALYSIS EXPLORATORY APPROACHES Kristofer Årestedt 2013-04-28 UNIDIMENSIONALITY Unidimensionality imply that a set of items forming an instrument measure one thing in common Unidimensionality is

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Admin. Assignment 2: Final Exam. Small Group Presentations. - Due now.

Admin Assignment 2: - Due now. Final Exam - June 2:30pm, Room HEATH, UnionComplex. - An exam guide and practice questions will be provided next week. Small Group Presentations Kaleidoscope eyes: Anomalous

### This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

### To do a factor analysis, we need to select an extraction method and a rotation method. Hit the Extraction button to specify your extraction method.

Factor Analysis in SPSS To conduct a Factor Analysis, start from the Analyze menu. This procedure is intended to reduce the complexity in a set of data, so we choose Data Reduction from the menu. And the

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### 5.2 Customers Types for Grocery Shopping Scenario

------------------------------------------------------------------------------------------------------- CHAPTER 5: RESULTS AND ANALYSIS -------------------------------------------------------------------------------------------------------

### Introduction to Principal Component Analysis: Stock Market Values

Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

### We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

### Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Understanding and Using Factor Scores: Considerations for the Applied Researcher

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### The Effectiveness of Ethics Program among Malaysian Companies

2011 2 nd International Conference on Economics, Business and Management IPEDR vol.22 (2011) (2011) IACSIT Press, Singapore The Effectiveness of Ethics Program among Malaysian Companies Rabiatul Alawiyah

### Chapter 14: Analyzing Relationships Between Variables

Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between

### Factor Analysis Example: SAS program (in blue) and output (in black) interleaved with comments (in red)

Factor Analysis Example: SAS program (in blue) and output (in black) interleaved with comments (in red) The following DATA procedure is to read input data. This will create a SAS dataset named CORRMATR

### psyc3010 lecture 8 standard and hierarchical multiple regression last week: correlation and regression Next week: moderated regression

psyc3010 lecture 8 standard and hierarchical multiple regression last week: correlation and regression Next week: moderated regression 1 last week this week last week we revised correlation & regression

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 2307 Old Cafeteria Complex 2 When want to predict one variable from a combination of several variables. When want

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Multivariate Analysis of Variance (MANOVA)

Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

### The scatterplot indicates a positive linear relationship between waist size and body fat percentage:

STAT E-150 Statistical Methods Multiple Regression Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the

### FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables.

FACTOR ANALYSIS Introduction Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables Both methods differ from regression in that they don t have

### UNDERSTANDING THE ONE-WAY ANOVA

UNDERSTANDING The One-way Analysis of Variance (ANOVA) is a procedure for testing the hypothesis that K population means are equal, where K >. The One-way ANOVA compares the means of the samples or groups

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### One-Way Repeated Measures Analysis of Variance (Within-Subjects ANOVA)

One-Way Repeated Measures Analysis of Variance (Within-Subjects ANOVA) 1 SPSS for Windows Intermediate & Advanced Applied Statistics Zayed University Office of Research SPSS for Windows Workshop Series

### Four Assumptions Of Multiple Regression That Researchers Should Always Test

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

### Research Methodology: Tools

MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 02: Item Analysis / Scale Analysis / Factor Analysis February 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi

### Multiple Linear Regression

Multiple Linear Regression Simple Linear Regression Regression equation for a line (population): y = β 0 + β 1 x + β 0 : point where the line intercepts y-axis β 1 : slope of the line : error in estimating

### Content DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS

Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction

### 7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Choosing the Right Type of Rotation in PCA and EFA James Dean Brown (University of Hawai i at Manoa)

Shiken: JALT Testing & Evaluation SIG Newsletter. 13 (3) November 2009 (p. 20-25) Statistics Corner Questions and answers about language testing statistics: Choosing the Right Type of Rotation in PCA and

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

### Instructions for SPSS 21

1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

### Exploratory Factor Analysis

Exploratory Factor Analysis ( 探 索 的 因 子 分 析 ) Yasuyo Sawaki Waseda University JLTA2011 Workshop Momoyama Gakuin University October 28, 2011 1 Today s schedule Part 1: EFA basics Introduction to factor

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### Extended control charts

Extended control charts The control chart types listed below are recommended as alternative and additional tools to the Shewhart control charts. When compared with classical charts, they have some advantages

### Practical Considerations for Using Exploratory Factor Analysis in Educational Research

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

### Data Analysis: Describing Data - Descriptive Statistics

WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

### CRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm

CRJ Doctoral Comprehensive Exam Statistics Friday August 23, 23 2:pm 5:3pm Instructions: (Answer all questions below) Question I: Data Collection and Bivariate Hypothesis Testing. Answer the following

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Using Principal Components Analysis in Program Evaluation: Some Practical Considerations

http://evaluation.wmich.edu/jmde/ Articles Using Principal Components Analysis in Program Evaluation: Some Practical Considerations J. Thomas Kellow Assistant Professor of Research and Statistics Mercer

### A correlation exists between two variables when one of them is related to the other in some way.

Lecture #10 Chapter 10 Correlation and Regression The main focus of this chapter is to form inferences based on sample data that come in pairs. Given such paired sample data, we want to determine whether

### Factor Analysis Using SPSS

Psychology 305 p. 1 Factor Analysis Using SPSS Overview For this computer assignment, you will conduct a series of principal factor analyses to examine the factor structure of a new instrument developed

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Factor Analysis Using SPSS

Factor Analysis Using SPSS The theory of factor analysis was described in your lecture, or read Field (2005) Chapter 15. Example Factor analysis is frequently used to develop questionnaires: after all

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### Discriminant Function Analysis in SPSS To do DFA in SPSS, start from Classify in the Analyze menu (because we re trying to classify participants into

Discriminant Function Analysis in SPSS To do DFA in SPSS, start from Classify in the Analyze menu (because we re trying to classify participants into different groups). In this case we re looking at a