
Mgt 540 Research Methods: Data Analysis

Additional sources
Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
http://web.utk.edu/~dap/random/order/start.htm
Data Analysis Brief Book (glossary): http://rkb.home.cern.ch/rkb/titlea.html
Exploratory Data Analysis:
- http://www.drtomoconnor.com/3760/3760lect07.htm
- http://www.itl.nist.gov/div898/handbook/eda/eda.htm
Statistical Data Analysis:
- http://itl.nist.gov/div898/handbook/eda/eda.htm
- http://home.ubalt.edu/ntsbarsh/stat-data/topics.htm
Using Excel for Data Analysis:
- http://office.microsoft.com/en-us/excel-help/about-statisticalanalysis-tools-hp005203873.aspx
- http://people.umass.edu/evagold/excel.html

Copyright 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E, FIGURE 12.1

Data Analysis: Getting a Feel for the Data
- Get the mean, variance, and standard deviation of each variable.
- Check that, for all items, responses range over the whole scale and are not restricted to one end of the scale alone.
- Obtain Pearson correlations among the variables under study.
- Get frequency distributions for all the variables.
- Tabulate your data.
- Describe your sample's key characteristics (demographic details: sex composition, education, age, length of service, etc.).
- Examine histograms, frequency polygons, etc.

Quantitative Data: Descriptive Statistics
Describing key features of data. Each type of data requires different analysis method(s):
- Nominal: labeling; no inherent value basis; categorization purposes only
- Ordinal: ranking, sequence
- Interval: relationship basis (e.g., age)
Key descriptive measures:
- Central tendency: mean, median, mode
- Spread: variance, standard deviation, range
- Distribution (shape): skewness, kurtosis
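These first-pass checks (central tendency, spread, and a frequency distribution to see whether responses span the scale) can be sketched with nothing but the Python standard library; the response data below are invented for illustration.

```python
import statistics
from collections import Counter

# Hypothetical responses to one questionnaire item on a 1-5 scale
responses = [2, 4, 3, 5, 4, 3, 2, 4, 5, 3]

# Central tendency
mean = statistics.mean(responses)
median = statistics.median(responses)

# Spread (sample variance and standard deviation, n - 1 denominator)
variance = statistics.variance(responses)
stdev = statistics.stdev(responses)

# Frequency distribution: confirm responses span the scale
# rather than clustering at one end
freq = Counter(responses)

print(f"mean={mean}, median={median}, var={variance:.3f}, sd={stdev:.3f}")
print("frequencies:", sorted(freq.items()))
```

Seeing every scale point represented in the frequency table, as here, is the quick "range all over the scale" check the slide describes.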

Descriptive Statistics: Testing Goodness of Fit
Describing key features of data:
- Nominal: identification/categorization only
- Ordinal (example on pg. 139): non-parametric statistics; do not assume equal intervals; frequency counts; averages (median and mode)
- Interval: parametric; mean, standard deviation, variance

Reliability and Validity
Involves factor analysis.
- Reliability: split-half, internal consistency
- Validity: convergent, discriminant, factorial

Testing Hypotheses
Use the appropriate statistical analysis:
- t-test (one- or two-tailed): test the significance of the difference between the means of two groups.
- ANOVA: test the significance of differences among the means of more than two groups, using the F test.
- Regression (simple or multiple): establish the variance explained in the DV by the variance in the IVs.
http://itl.nist.gov/div898/handbook/eda/section3/scatterp.htm

Statistical Power: Claiming a Significant Difference
Errors in methodology:
- Type 1 error: rejecting the null hypothesis when you should not (an alpha error).
- Type 2 error: failing to reject the null hypothesis when you should (a beta error).
Statistical power refers to the ability to detect true differences, i.e., to avoid Type 2 errors.

Statistical Power (see discussion at http://my.execpc.com/4a/b7/helberg/pitfalls/)
Power depends on four issues:
- Sample size
- The effect size you want to detect
- The alpha (Type 1 error rate) you specify
- The variability of the sample
Too little power and you overlook a real effect; too much power and any difference, however trivial, is significant.

Parametric vs. Nonparametric
Parametric statistics refer to specific population parameters. Parametric assumptions:
- Independent samples
- Homogeneity of variance
- Normally distributed data
- Interval or better scale
Nonparametric assumptions: sometimes independence of samples.
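How power depends on sample size, effect size, and alpha can be illustrated for the simplest case, a two-sided one-sample z test. This is an illustrative sketch, not part of the original slides; it uses only the standard library, and sample variability enters through the standardized effect size.

```python
import math
from statistics import NormalDist

def power_one_sample_z(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z test.

    effect_size is the standardized difference (mu1 - mu0) / sigma,
    so the variability of the data enters through the denominator.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value, e.g. about 1.96
    shift = effect_size * math.sqrt(n)     # mean of the test statistic under H1
    # P(|Z| > z_crit) when Z ~ N(shift, 1): sum of the two rejection tails
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Larger n -> more power to detect the same effect
print(round(power_one_sample_z(0.5, 20), 3))
print(round(power_one_sample_z(0.5, 50), 3))
# With a zero effect, "power" collapses to the Type 1 error rate alpha
print(round(power_one_sample_z(0.0, 50), 3))
```

The last line shows the trade-off the slide warns about: with no true effect, the rejection rate is exactly alpha, and with enough cases even a trivial effect pushes power toward 1.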

t-tests (look at the t tables, p. 435)
- Used to compare two means, or one observed mean against a hypothesized mean.
- For large samples, t and z can be considered equivalent.
- Calculate t = (x̄ - µ) / s_x̄, where s_x̄ is the standard error of the mean, s/√n, and df = n - 1.

t-tests (continued)
Statistical programs will give you a choice between a matched-pair and an independent t-test. Your sample and research design determine which you will use.

z-test for Proportions
When data are nominal:
- Describe by counting occurrences of each value.
- From counts, calculate proportions.
- Compare the proportion of occurrence in the sample to the proportion of occurrence in the population.
- Hypothesis testing allows only one of two outcomes: success or failure.

z-test for Proportions (continued)
Comparing the sample proportion to the population proportion:
- H0: π = k, where k is a value between 0 and 1
- H1: π ≠ k
- z = (p - π) / σ_p = (p - π) / √(π(1 - π)/n)
- Equivalent to χ² for df = 1

Chi-Square Test (sampling distribution)
- One sample: measures sample variance; squared deviations from the mean, based on the normal distribution.
- Nonparametric: compare expected with observed proportions.
- H0: observed proportion = expected proportion
- χ² = Σ (O - E)² / E
- df = number of categories (cells), k, minus 1

Univariate z Test
Test a guess about a proportion against an observed sample; e.g., MBAs constitute 35% of the managerial population:
- H0: π = .35
- H1: π ≠ .35 (two-tailed test suggested)
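The t, z-for-proportions, and chi-square formulas above translate directly into code. The sketch below (standard library only, with made-up numbers based on the MBA example) also checks the stated equivalence between z² and χ² when df = 1.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return (xbar - mu0) / se, n - 1

def z_proportion(p_hat, pi0, n):
    """z = (p - pi) / sqrt(pi(1 - pi) / n)."""
    return (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

def chi_square(observed, expected):
    """Goodness of fit: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 84 MBAs in a sample of 200 managers; H0: pi = .35
z = z_proportion(84 / 200, 0.35, 200)
chi2 = chi_square([84, 200 - 84], [0.35 * 200, 0.65 * 200])
print(round(z, 3), round(chi2, 3), round(z ** 2, 3))  # z^2 equals chi^2 at df = 1
```

The two-cell chi-square (MBA vs. non-MBA) reproduces z² exactly, which is why the slides can treat the two tests as interchangeable for a single proportion.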

Univariate Tests
Some univariate tests are different in that they are among the statistical procedures where you, the researcher, set the null hypothesis. In many other statistical tests the null hypothesis is implied by the test itself.

Contingency Tables
- Relationship between nominal variables: http://www.psychstat.smsu.edu/introbook/sbk28m.htm
- Relationship between subjects' scores on two qualitative or categorical variables (e.g., early childhood intervention).
- If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is called the chi-square test of independence. The null hypothesis is that there is no relationship between row and column frequencies.

Correlation
- A statistical summary of the degree and direction of association between two variables.
- Correlation itself does not distinguish between independent and dependent variables.
- Most common: Pearson's r, used when you believe a linear relationship exists between two variables.
- The range is from -1 to +1.
- R², the coefficient of determination, is the percentage of variance in each variable explained by the other.

Calculating r
r = S_xy / (S_x S_y), i.e., the covariance between x and y divided by the product of their standard deviations.
Calculations needed:
- The means, x̄ and ȳ
- The deviations from the means, (x - x̄) and (y - ȳ), for each case
- The squares of the deviations from the means for each case, (x - x̄)² and (y - ȳ)², to ensure positive distance measures when added
- The cross product for each case, (x - x̄)(y - ȳ)

Hypotheses for r
The null hypothesis for correlations is H0: ρ = 0 and the alternative is usually H1: ρ ≠ 0. However, if you can justify it prior to analyzing the data, you might also use H1: ρ > 0 or H1: ρ < 0, a one-tailed test.
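The ingredient list for r (means, deviations, squared deviations, cross products) maps one-to-one onto a small function; the x and y values below are invented for illustration.

```python
import math

def pearson_r(x, y):
    """r = S_xy / (S_x * S_y): covariance over the product of standard deviations."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Squared deviations from the means and the cross products
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    # The (n - 1) factors in covariance and both SDs cancel, leaving:
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))  # r and the coefficient of determination
```

Squaring r gives R², the proportion of variance in each variable explained by the other, as defined above.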

Alternative Measures
- Spearman rank correlation, r_ranks: r_ranks and r are nearly always equivalent measures for the same data (and even when they are not, the differences are trivial).
- Phi coefficient, r_Φ: used when both variables are dichotomous; again, it is equivalent to Pearson's r.

Alternative Measures (continued)
- Point-biserial, r_pb: used when correlating a dichotomous variable with a continuous one.
- If a scatterplot shows a curvilinear relationship there are two options: a data transformation, or the correlation ratio, η² (eta-squared) = 1 - SS_within / SS_total.

ANOVA
- For two groups only, the t-test and ANOVA yield the same results.
- You must do paired comparisons when working with three or more groups to know where the mean differences lie.

Multivariate Techniques
- With a dependent variable: regression in its various forms, discriminant analysis, MANOVA.
- Classificatory or data reduction: cluster analysis, factor analysis, multidimensional scaling.

Linear Regression
We would like to be able to predict y from x. Simple linear regression with raw scores:
- y = dependent variable, with standard deviation s_y
- x = independent variable, with standard deviation s_x
- b = regression coefficient = r_xy(s_y/s_x)
- c = a constant term
The general model is y = bx + c (+ e).

Linear Regression (continued)
The statistic for assessing the overall fit of a regression model is R², the overall percentage of variance explained by the model:
R² = 1 - (unpredictable variance / total variance) = predictable variance / total variance = 1 - (s²_e / s²_y),
where s²_e is the variance of the error, or residual.
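A minimal implementation of the model y = bx + c and its fit statistic R² = 1 - SS_res/SS_tot might look like this; the data and the "years of service" framing are made up for illustration.

```python
import statistics

def simple_regression(x, y):
    """Fit y = bx + c by least squares and report R^2 = 1 - SS_res / SS_tot."""
    xbar = statistics.mean(x)
    ybar = statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                     # regression coefficient
    c = ybar - b * xbar               # constant term
    # Unpredictable (residual) and total sums of squares
    ss_res = sum((yi - (b * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return b, c, r2

# Hypothetical data: predict a rating (y) from years of service (x)
b, c, r2 = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {b:.2f}x + {c:.2f}, R^2 = {r2:.2f}")
```

For these numbers the fit is y = 0.60x + 2.20 with R² = 0.60, i.e., 60% of the variance in y is explained by x.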

Multiple Regression
- Multiple regression: more than one predictor, y = b₁x₁ + b₂x₂ + c.
- Each regression coefficient b is assessed independently for its statistical significance; H0: b = 0.
- So, in a statistical program's output, a statistically significant b rejects the notion that the variable associated with b contributes nothing to predicting y.

Multiple Regression (continued)
- R² still tells us the amount of variation in y explained by all of the predictors (x) together.
- The F-statistic tells us whether the model as a whole is statistically significant.
- Several other types of regression models are available for data that do not meet the assumptions needed for least-squares models (such as logistic regression for dichotomous dependent variables).

Regression by SPSS and Other Programs
Methods for developing the model:
- Stepwise: lets the computer try to fit all chosen variables, leaving out those not significant and re-examining the variables in the model at each step.
- Enter: the researcher specifies that all variables will be used in the model.
- Forward, backward: begin with all (backward) or none (forward) of the variables and automatically remove or add variables without reconsidering variables already in the model.

Multicollinearity
- The best regression model has uncorrelated IVs.
- Model stability is low with excessively correlated IVs.
- Collinearity diagnostics identify problems, suggesting variables to be dropped.
- High tolerance and a low variance inflation factor are desirable.

Discriminant Analysis
- Regression requires the DV to be interval or ratio scaled.
- If the DV is categorical (nominal), you can use discriminant analysis.
- IVs should be interval or ratio scaled.
- The key result is the number of cases classified correctly.

MANOVA
- Compares means on two or more DVs (ANOVA is limited to one DV).
- Pure MANOVA is available in SPSS only from command syntax.
- You can use the general linear model, though.
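The tolerance/VIF diagnostic is easy to illustrate for the special case of two predictors, where the variance inflation factor reduces to 1/(1 - r₁₂²), i.e., one over the tolerance. This simplified two-IV form and the data below are illustrative assumptions, fabricated so that one pair of IVs is nearly collinear.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two variables."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / tolerance, where tolerance = 1 - r12^2 for a two-IV model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1: nearly collinear
x3 = [5, 1, 4, 2, 3]              # only weakly related to x1

print(round(vif_two_predictors(x1, x2), 1))  # very large VIF -> unstable model
print(round(vif_two_predictors(x1, x3), 2))  # near 1 -> IVs nearly uncorrelated
```

A VIF in the hundreds, as for x1 and x2 here, is exactly the "low tolerance" situation the slide says should prompt dropping one of the offending variables.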

Factor Analysis
- A data reduction technique: a large set of variables can be reduced to a smaller set while retaining the information from the original data set.
- Data must be on an interval or ratio scale.
- E.g., a variable called socioeconomic status might be constructed from variables such as household income, educational attainment of the head of household, and average per capita income of the census block in which the person resides.

Cluster Analysis
- Cluster analysis seeks to group cases rather than variables; it too is a data reduction technique.
- Data must be on an interval or ratio scale.
- E.g., a marketing group might want to classify people into psychographic profiles regarding their tendencies to try or adopt new products: pioneers or early adopters, early majority, late majority, laggards.

Factor vs. Cluster Analysis
- Factor analysis focuses on creating linear composites of variables, reducing the number of variables with which we must work; the technique begins with a correlation matrix to seed the process.
- Cluster analysis focuses on cases.

Potential Biases
- Asking inappropriate or wrong research questions.
- Insufficient literature survey and hence an inadequate theoretical model.
- Measurement problems.
- Samples not being representative.
- Problems with data collection: researcher biases, respondent biases, instrument biases.
- Data analysis biases: coding errors, data punching and input errors, inappropriate statistical analysis.
- Biases (subjectivity) in interpretation of results.

Questions to Ask (adapted from Robert Niles)
- Where did the data come from?
- How, and by whom, were the data reviewed, verified, or substantiated?
- How were the data collected?
- How are the data presented? What is the context? Is there cherry-picking?
- Be skeptical when dealing with comparisons.
- Watch for spurious correlations.

Copyright 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E, FIGURE 11.2