Data analysis process



Similar documents
SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

Instructions for SPSS 21

SPSS Tests for Versions 9 to 13

January 26, 2009 The Faculty Center for Teaching and Learning

Additional sources Compilation of sources:

Directions for using SPSS

4. Descriptive Statistics: Measures of Variability and Central Tendency

SPSS Explore procedure

Projects Involving Statistics (& SPSS)

SPSS Introduction. Yi Li

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Data exploration with Microsoft Excel: analysing more than one variable

Mathematics within the Psychology Curriculum

Analyzing Research Data Using Excel

The Dummy s Guide to Data Analysis Using SPSS

Data Analysis in SPSS. February 21, If you wish to cite the contents of this document, the APA reference for them would be

T-test & factor analysis

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

An introduction to using Microsoft Excel for quantitative data analysis

Introduction Course in SPSS - Evening 1

A Brief Introduction to SPSS Factor Analysis

Chapter 7 Section 7.1: Inference for the Mean of a Population

The Statistics Tutor s Quick Guide to

SPSS (Statistical Package for the Social Sciences)

Using SPSS, Chapter 2: Descriptive Statistics

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

(and sex and drugs and rock 'n' roll) ANDY FIELD

An introduction to IBM SPSS Statistics

Table of Contents. Preface


Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

8. Comparing Means Using One Way ANOVA

4. There are no dependent variables specified... Instead, the model is: VAR 1. Or, in terms of basic measurement theory, we could model it as:

Simple Predictive Analytics Curtis Seare

7. Comparing Means Using t-tests.

STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

SPSS Notes (SPSS version 15.0)

Descriptive Statistics

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

SPSS TUTORIAL & EXERCISE BOOK

2. Linearity (in relationships among the variables--factors are linear constructions of the set of variables) F 2 X 4 U 4

Research Methods & Experimental Design

SPSS Guide How-to, Tips, Tricks & Statistical Techniques

Analysing Questionnaires using Minitab (for SPSS queries contact -)

UNIVERSITY OF NAIROBI

When to use Excel. When NOT to use Excel 9/24/2014

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

Psyc 250 Statistics & Experimental Design. Correlation Exercise

An Introduction to SPSS. Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman

SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS: AN OVERVIEW. Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi

Using Excel in Research. Hui Bian Office for Faculty Excellence

GETTING YOUR DATA INTO SPSS

Chapter 7 Factor Analysis SPSS

HLM software has been one of the leading statistical packages for hierarchical

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

SPSS Step-by-Step Tutorial: Part 2

The Chi-Square Test. STAT E-50 Introduction to Statistics

EXCEL Analysis TookPak [Statistical Analysis] 1. First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it:

IBM SPSS Missing Values 22

MASTER COURSE SYLLABUS-PROTOTYPE PSYCHOLOGY 2317 STATISTICAL METHODS FOR THE BEHAVIORAL SCIENCES

Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

SPSS-Applications (Data Analysis)

Lecture 1: Review and Exploratory Data Analysis (EDA)

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Data Cleaning and Missing Data Analysis

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

Introduction to StatsDirect, 11/05/2012 1

II. DISTRIBUTIONS distribution normal distribution. standard scores

To do a factor analysis, we need to select an extraction method and a rotation method. Hit the Extraction button to specify your extraction method.

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Main Effects and Interactions

Simple Linear Regression, Scatterplots, and Bivariate Correlation

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

SPSS Modules Features Statistics Premium

Student Guide to SPSS Barnard College Department of Biological Sciences

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents

Foundation of Quantitative Data Analysis

An SPSS companion book. Basic Practice of Statistics

How to Make APA Format Tables Using Microsoft Word

Statistics Review PSY379

Why Is EngineRoom the Right Choice? 1. Cuts the Cost of Calculation

Elements of statistics (MATH0487-1)

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

SPSS Basic Skills Test

Chapter 5 Analysis of variance SPSS Analysis of variance

IBM SPSS Direct Marketing 23

TABLE OF CONTENTS. About Chi Squares What is a CHI SQUARE? Chi Squares Hypothesis Testing with Chi Squares... 2

Data Analysis Tools. Tools for Summarizing Data

R with Rcmdr: BASIC INSTRUCTIONS

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Transcription:

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis Explore relationship between variables Compare groups

Collecting data Survey Using existing data

Three steps to PREPARING A DATA FILE

Step 1: Setting your options: Edit>Options General tab Variable lists Output notification Raise viewer window Scroll to new output Output No scientific notation Session journal Append Browse set folder Data tab Calculate values immediately Display format for new numeric variables (make 0) Output labels Values and labels Charts tab Formatting charts Pivot Tables tab academic

Step 2: setting up the structure of the data file Variable tab SPSS codebook Label versus name Types of variables Values of a variable Missing values Type of measurement

Step 3: enter the data Data view Entering data Opening an existing data set Using Excel Variable names in the first row Converting to SPSS format

Other useful things to know: Merging files Adding cases Adding variables Sorting the data file Splitting the data file Selecting cases Using sets (useful for large data files with scales) Data file information

Two step to SCREENING AND CLEANING THE DATA

Two steps: Step 1: checking for errors Categorical variables Analyze>descriptive statistics>frequencies Statistics: min/max Numerical variables analyze>descriptive statistics>descriptives Options: mean, std dev, min, max Step 2: finding and correcting the errors Method 1: sort cases Method 2: edit>find

Other useful thing for screening Case summaries 1. Analyze>reports>case summaries 2. Choose variables, limit cases to first 100 3. Statistics: number of cases 4. Options: uncheck subheadings for totals

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis Explore relationship between variables Compare groups

Descriptive statistics and graphs EXPLORATION OF DATA

Descriptive statistics Categorical: Frequencies Numerical: Descriptives: mean standard deviation minimum maximum skewness (symmetry) kurtosis (peakness)

Missing Data Two potential problems with missing data: 1. Large amount of missing data number of valid cases decreases drops the statistical power 2. Nonrandom missing data either related to respondent characteristics and/or to respondent attitudes may create a bias

Missing Data Analysis Examine missing data By variable By respondent By analysis If no problem found, go directly to your analysis If a problem is found: Delete the cases with missing data Try to estimate the value of the missing data

Amount of missing data by variable Use Analyze > Descriptive Statistics > Frequencies Look at the frequency tables to see how much missing If the amount is more than 5%, there is too much. Need analyze further.

Missing data by respondent 1. Use transform>count 2. Create NMISS in the target variable 3. Pick a set of variables that have more missing data 4. Click on define values 5. Click on system- or user-missing 6. Click add 7. Click continue and then ok 8. Use the frequency table to show you the results of NMISS

Missing data patterns Use Analyze>descriptive statistics>crosstabs Look to see if there is a correlation between NMISS (row) and another variable (column) Use column percents to view the % of missing for the value of the variable

What to do about the missing data? Proceed anyway In SPSS Options: Exclude case listwise include only cases that have all of the variables Exclude cases pairwise excludes only if the variables in the analysis is missing Replace with mean use at your own risk

Assessing Normality Skewness and kurtosis Using Explore: Dependent variable Label cases by id variable Display both Statistics descriptives and outliers Plots descriptive: histogram, normality plots with test options exclude cases pairwise

Outliers 1. histogram 2. boxplot

Other graphs Bar charts Scatterplots Line graphs

Manipulating data Recoding Calculating When to create a new variable versus creating a new one.

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis Explore relationship between variables Compare groups

Analysis Explore relationships among variables Crosstabulation/Chi Square Correlation Regression/Multiple regression Logistic regression Factor analysis Compare groups Non-parametric statistics T-tests One-way analysis of variance ANOVA Two-way between groups ANOVA Multivariate analysis of variance MANOVA

Crosstabulation Aim: for categorical data to see the relationship between two or more variables Procedure: Analyze>Descriptive statistics>crosstab Statistics: correlation, Chi Square, association Cells: Percentages row or column Cluster bar charts

Correlation Aim: find out whether a relationship exists and determine its magnitude and direction Correlation coefficients: Pearson product moment correlation coefficient Spearman rank order correlation coefficient Assumptions: relationship is linear Homoscedasticity: variability of DV should remain constant at all values of IV

Partial correlation Aim: to explore the relationship between two variables while statistically controlling for the effect of another variable that may be influencing the relationship Assumptions: same as correlation c a b

Regression Aim: use after there is a significant correlation to find the appropriate linear model to predict DV (scale or ordinal) from one or more IV (scale or ordinal) Assumptions: sample size needs to be large enough multicollinearity and singularity outliers normality IV2 linearity homoscedasticity IV1 Types: standard hierarchical stepwise DV IV3

Logistic regression Aim: create a model to predict DV (categorical 2 or more categories) given one or more IV (categorical or numerical/scale) Assumptions: sample size large enough multicollinearity outliers Procedure note: use Binary Logistic for DV of 2 categories (coding 0/1) use Multinomial Logistic for DV for more then 2 categories

Factor analysis Aim: to find what items (variables) clump together. Usually used to create subscales. Data reduction. Factor analysis: exploratory confirmatory SPSS -> Principal component analysis

Three steps of factor analysis 1. Assessment of the suitability of the data for factor analysis 2. Factor extraction 3. Factor rotation and interpretation

1. Assessment of the suitability 1. Sample size: 10 to 1 ratio 2. Strength of the relationship among variables (items) Test of Sphericity 3. Linearity 4. Outliers among cases

Step 2. Factor extraction 1. Commonly used technique principal components analysis 2. Kaiser s criterion: only factors with eigenvalues of 1.0 or more are retained may give too many factors 3. Scree test: plot of the eigenvalues, retain all the factors above the elbow 4. Parallel analysis: compares the size of the eigenvalues with those obtained from randomly generated data set of the same size

Step 3: factor rotation and interpretation 1. Orthogonal rotation 1. uncorrelated 2. Easier to interpret 3. Varimax 2. Oblique rotation 1. Correlated 2. Harder to interpret 3. Direct Oblimin

Checking the reliability of a scale Analyze>Scale>Reliability Items Model: Alpha Scale label: name new subscale to be created Statistics: descriptives: item, scale, scale if item deleted Inter-item: correlations Summaries: correlations

Analysis Explore relationships among variables Crosstabulation/Chi Square Correlation Regression/Multiple regression Logistic regression Factor analysis Compare groups Non-parametric statistics T-tests One-way analysis of variance ANOVA Two-way between groups ANOVA Multivariate analysis of variance MANOVA

Nonparametric tests Non-parametric techniques Chi-square test for goodness of fit Chi-square test for independence Kappa measure of agreement Mann-Whitney U Test Wilcoxon Signed Rank Test Kruskal-Wallis Test Friedman Test Parametric techniques None None None Independent samples t-test Paired samples t-test One-way between groups ANOVA One-way repeated measures ANOVA

T-test for independent groups Aim:Testing the differences between the means of two independent samples or groups Requirements: Only one independent (grouping) variable IV (ex. Gender) Only two levels for that IV (ex. Male or Female) Only one dependent variable (DV - numerical) Assumptions: Sampling distribution of the difference between the means is normally distributed Homogeneity of variances Tested by Levene s Test for Equality of Variances Procedure: ANALYZE>COMPARE MEANS>INDEPENDENT SAMPLES T-TEST Test variable DV Grouping variable IV DEFINE GROUPS (need to remember your coding of the IV) Can also divide a range by using a cut point

Paired Samples T-test Aim:used in repeated measures or correlated groups designs, each subject is tested twice on the same variable, also matched pairs Requirements: Looking at two sets of data (ex. pre-test vs. post-test) Two sets of data must be obtained from the same subjects or from two matched groups of subjects Assumptions: Sampling distribution of the means is normally distributed Sampling distribution of the difference scores should be normally distributed Procedure: ANALYZE>COMPARE MEANS>PAIRED SAMPLES T-TEST

One-way Analysis of Variance Aim: looks at the means from several independent groups, extension of the independent sample t-test Requirements: Only one IV (categorical) More than two levels for that IV Only one DV (numerical) Assumptions: The populations that the sample are drawn are normally distributed Homogeneity of variances Observations are all independent of one another Procedure: ANALYZE>COMPARE MEANS>One-Way ANOVA Dependent List DV Factor IV

Two-way Analysis of Variance Aim: test for main effect and interaction effects on the DV Requirements: Two IV (categorical variables) Only one DV (continuous variable) Procedure: ANALYZE>General Linear Model>Univariate Dependent List DV Fixed Factor IVs

MANOVA Aim: extension of ANOVA when there is more than one DV (should be related) Assumptions: sample size normality outliers linearity homogeneity of regression multicollinearity and singularity homogeneity of variance-covariance matrices

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis Explore relationship between variables Compare groups

THANK YOU!