Lecture 6. Data preparation methods: data coding and entering data into SPSS


Empirical Research Methods (IN4304), Spring 2010, Lecture 6: Data preparation methods

Previous lecture
  Participant observation and non-participant observation: does the observer act as a member of the group?
  Sampling strategies: event sampling, state sampling, interval sampling, time sampling
  Coding scheme: focused, objective, not context-dependent, explicitly defined, exhaustive, mutually exclusive, easy to record
  Inter-observer agreement (Cohen's Kappa): $K = (P_0 - P_c) / (1 - P_c)$

Today
  Data coding
  Measures of central tendency
  Measures of variability
  Explorative Data Analysis
  Data cleaning
  Normal distribution
  Standardizing data
  Confidence intervals

Learning outcomes of this lecture
After today's lecture you should be able to:
  enter your data into SPSS
  examine the central tendency of your data set
  understand the output of some explorative data analysis techniques
  reflect on reasons for data cleaning
  explain the importance of the normal distribution

Data coding
Coding is needed to process the data, i.e. the answers to a questionnaire or the observations.
  How much do you like this lecture? (tick one answer)
    Very much       O  (code = 1)
    Quite a lot     O  (code = 2)
    Moderately      O  (code = 3)
    Not very much   O  (code = 4)
    Not at all      O  (code = 5)
    Not applicable  O  (code = 900)
    No answer (missing value)  (code = 999)

Entering data into SPSS
  Give the variable a clear name and label
  Specify the data type (string, number, date, etc.)
  Specify the value codes
  Specify the missing value code
  Specify the level of the variable (nominal, ordinal, interval)
  SPSS demo
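Why the missing value codes matter can be sketched in a few lines of Python. This is only an illustration outside SPSS: the codes 900 and 999 come from the questionnaire above, while the list of answers is made up for the example.

```python
from statistics import mean

# Codes taken from the slide: 900 = not applicable, 999 = no answer.
MISSING_CODES = {900, 999}

# Hypothetical coded answers of eight respondents (not from the lecture).
answers = [1, 2, 999, 3, 5, 900, 2, 4]

# Keep only the substantive answers (codes 1 to 5).
valid = [a for a in answers if a not in MISSING_CODES]

naive_mean = mean(answers)  # wrongly treats 900 and 999 as real scores
clean_mean = mean(valid)    # mean of the substantive answers only

print(f"naive mean: {naive_mean:.2f}, mean without missing codes: {clean_mean:.2f}")
```

Declaring the missing value code in SPSS serves the same purpose: the coded placeholders are kept out of the statistics.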

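The Cohen's Kappa formula recalled from the previous lecture can be tried out on a small example. The sketch below assumes that P_0 is the observed proportion of agreement and P_c the agreement expected by chance from each observer's category frequencies; the two observers' codes are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa, K = (P0 - Pc) / (1 - Pc), for two equally long code lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of events")
    n = len(rater_a)
    # P0: proportion of events on which both observers chose the same code.
    p0 = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Pc: chance agreement, from the product of the marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pc = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Note: undefined (division by zero) when Pc equals 1.
    return (p0 - pc) / (1 - pc)

# Hypothetical codes ("t" = talking, "i" = idle) for ten observed events.
observer_1 = ["t", "t", "t", "t", "t", "i", "i", "i", "i", "i"]
observer_2 = ["t", "t", "t", "t", "i", "i", "i", "i", "i", "t"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 8 of 10 events (P_0 = 0.8) while chance alone would give P_c = 0.5, so Kappa is about 0.6.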
Changing data from one level to another
  Changing data from interval to ordinal level
  SPSS recode function
  Example: Extraversion, scored from 1 to 100, recoded into low, medium and high (cut points marked at 45 and 56)

Direction of scale
  Sophistication
    Childish  O O O O O O O  Sophisticated
    Classy    O O O O O O O  Silly
    Novelty   O O O O O O O  Business
    Code1:    1  2  3  4  5  6  7
    Code2:   -3 -2 -1  0  1  2  3
  The Classy-Silly item needs to be reversed, for example:
    Code1: 8 - value
    Code2: -1 * value
  SPSS command: Compute

Measures of central tendency
  Interval level: Mean
  Ordinal level: Median
  Nominal level: Mode
  What is the best measure to represent:
    1 3 3 3 4 7 8 8 9 10 11 11 12 13 13 15 16 17 17 19 10000
  Mean: 485.71
  Median: 11
  Mode: 3
  5% trimmed mean: 10.47

Measures of variability
  To express the distribution of the data set.
  [Figure: two distributions with the same mean (100) but a different spread; horizontal axis: value of X (80, 100, 120); vertical axis: relative frequency]

Range
  The range of a distribution is the difference between the highest and the lowest value in a set of numbers
  Range = Max - Min
  For example: 8 9 10 2 4 3 100 12 80
  Range = 100 - 2 = 98

Inter Quartile Range
  A quartile is any of the three values which divide the sorted data set into four equal parts
  First quartile (Q1) = cuts off lowest 25% of data = 25th percentile
  Second quartile (Q2) = median = cuts data set in half = 50th percentile
  Third quartile (Q3) = cuts off highest 25% of data, or lowest 75% = 75th percentile
  Inter Quartile Range (IQR) = Q3 - Q1
  The IQR is a more stable statistic than the range, as it is less affected by extreme values.
  [Figure: distribution with Q1, Q2, Q3 and the IQR marked]
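The central tendency figures for the example data set can be reproduced with Python's standard library. This is a sketch, not the SPSS procedure: the trimmed mean below simply drops the lowest and highest 5% of cases, rounded down to whole cases, which is an assumption and may differ in detail from how SPSS computes it.

```python
from statistics import mean, median, mode

# The example data set from the slide, including the extreme value 10000.
data = [1, 3, 3, 3, 4, 7, 8, 8, 9, 10, 11, 11, 12, 13, 13,
        15, 16, 17, 17, 19, 10000]

def trimmed_mean(values, proportion):
    """Mean after dropping int(n * proportion) cases from each end (simplified)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])

data_mean = mean(data)
data_median = median(data)
data_mode = mode(data)
data_trimmed = trimmed_mean(data, 0.05)

print(f"mean={data_mean:.2f} median={data_median} "
      f"mode={data_mode} 5% trimmed mean={data_trimmed:.2f}")
```

The single value 10000 pulls the mean far away from the bulk of the data, while the median, the mode and the trimmed mean barely notice it.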

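The recoding and scale reversal described earlier in this lecture are done in SPSS with Recode and Compute; the same arithmetic can be sketched in Python. The function names are hypothetical, and the exact extraversion bands (1 to 45 low, 46 to 55 medium, 56 to 100 high) are an assumption read off the cut points on the slide.

```python
def reverse_code1(value):
    """Reverse an item coded 1..7 (Code1): 1 <-> 7, 2 <-> 6, and so on."""
    return 8 - value

def reverse_code2(value):
    """Reverse an item coded -3..3 (Code2) by flipping its sign."""
    return -1 * value

def recode_extraversion(score):
    """Interval score (1..100) to ordinal level; the band limits are assumed."""
    if score <= 45:
        return "low"
    if score <= 55:
        return "medium"
    return "high"

# The Classy-Silly item runs in the opposite direction of the other two items.
classy_silly_code1 = [1, 2, 3, 4, 5, 6, 7]
reversed_code1 = [reverse_code1(v) for v in classy_silly_code1]
reversed_code2 = [reverse_code2(v) for v in [-3, -2, -1, 0, 1, 2, 3]]

print(reversed_code1, reversed_code2, recode_extraversion(50))
```

After reversal a high code means "sophisticated" on every item, so the items can be combined into one sophistication score.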
Standard Deviation (SD)
  Standard deviation measures the spread around the mean. You can think of it as the mean deviation from the mean.
  Population mean: $\mu$    Sample mean: $\bar{x}$
  Population SD: $\sigma$   Sample SD: $s$
  $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Explorative Data Analysis (EDA)
  Examine your data before you apply statistical analysis to it. Use techniques such as:
    Box plot
    Stem-and-leaf plot
    Histogram

Stemplot (stem-and-leaf plot)
  Number of times a week that a website was visited
    Weekno  Hits
    43      3
    41      5
    44      9
    42      12
    37      25
    36      28
    38      34
    39      35
    35      44
    40      46
    14      72
    22      121
  Stem-and-Leaf Plot
    Frequency   Stem & Leaf
    11.00       0 . 00012233447
     2.00       1 . 22
     3.00       2 . 057
     7.00       3 . 1357888
     6.00       4 . 234569
     1.00       5 . 0
     5.00       6 . 25689
     4.00       7 . 2359
     2.00       8 . 69
    Stem width: 100
    Each leaf: 1 case(s)
  SPSS explorative command

Histogram
  Gives an overview of the shape of the distribution
  SPSS graph - histogram

Box plot
  [Figure: box plot on a scale from 0 to 100, marking an outlier, the minimum, Q1, the median, Q3 and the maximum]
  Minimum = lowest point within Q1 - 1.5*IQR
  Maximum = highest point within Q3 + 1.5*IQR

Outliers
  A case whose value is very different from most others
  Can bias statistics such as the mean
  What should you do with outliers? Exclude them? Investigate them? Include them in the analysis?
  SPSS Explorative analysis or graph command
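The difference between the population SD (divide by N) and the sample SD (divide by n - 1) is easy to see numerically. The sketch below computes both by hand and compares them with Python's `statistics.pstdev` and `statistics.stdev`; the eight values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical measurements (not from the lecture); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(values)
# Sum of squared deviations from the mean.
squared_deviations = sum((x - m) ** 2 for x in values)

population_sd = sqrt(squared_deviations / len(values))    # divide by N
sample_sd = sqrt(squared_deviations / (len(values) - 1))  # divide by n - 1

print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
print(f"library check: {pstdev(values):.3f}, {stdev(values):.3f}")
```

The sample SD is always slightly larger than the population SD computed on the same numbers, and the gap shrinks as the number of cases grows.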

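The range, the IQR and the 1.5*IQR rule used by the box plot can be sketched as follows, using the data set from the central tendency slide. Quartiles are computed with `statistics.quantiles` and its default method; several quartile conventions exist, so SPSS may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Data set from the central tendency slide, including the extreme value 10000.
data = [1, 3, 3, 3, 4, 7, 8, 8, 9, 10, 11, 11, 12, 13, 13,
        15, 16, 17, 17, 19, 10000]

value_range = max(data) - min(data)

# Three cut points dividing the sorted data into four equal parts.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Box plot rule: points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"range={value_range} Q1={q1} median={q2} Q3={q3} IQR={iqr}")
print(f"whiskers: {whisker_low}..{whisker_high}, outliers: {outliers}")
```

The range is dominated by the single extreme value, whereas the IQR describes the spread of the middle half of the data and flags 10000 as an outlier.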
Data cleaning
  Reasons to remove cases:
    The observation was distorted, e.g. the participant did not understand the task or questionnaire, was purposely messing up your experiment, or there was an error in the measuring instrument
    Outliers, but removal for a purely statistical reason is questionable
  Check your data for unlikely data points

Normal distribution
  68% within 1 SD of the mean
  95% within 2 SD of the mean
  99.7% within 3 SD of the mean
  Why is the normal distribution important?
    1. Gives a good model of data for some real world data sets (e.g. large populations)
    2. Good approximation of results of random outcomes (e.g. throwing a die many times)
    3. A large number of inferential statistics are based on normal distributions
  (Moore and McCabe, 2006)

Standardizing data (1)
  Transforming data into z-scores makes it possible to compare data from different units of measurement
  Population: $z = \frac{x - \mu}{\sigma}$
  Sample: $z = \frac{x - \bar{x}}{s}$

Standardizing data (2)
  Repeated measure design: focus on relative difference
  z-scores to overcome individual use of the questionnaire scale, e.g.
    People that only use the extremes
    People that never use the extremes
    People that only give high marks
  For each individual, calculate the mean and SD across his/her questions
  Calculate the z-score for the value an individual gave to a question, based on this person's mean and SD across his/her questions
  Scale: Unlikely | 1 extremely | 2 quite | 3 slightly | 4 neither | 5 slightly | 6 quite | 7 extremely | Likely

Standardizing data (3)
  Example
    Participant 1 answered:  Q1 = 1      Q2 = 3      Q3 = 7     Q4 = 7
    z-score would be:        Q1 = -1.17  Q2 = -0.50  Q3 = 0.83  Q4 = 0.83
    Participant 2 answered:  Q1 = 4      Q2 = 5      Q3 = 6     Q4 = 6
    z-score would be:        Q1 = -1.31  Q2 = -0.26  Q3 = 0.78  Q4 = 0.78

Median, mean and distribution shape
  Symmetrical (median = mean)
  Positively skewed or skewed to the right (median < mean)
  Negatively skewed or skewed to the left (median > mean)
  Notice that for the sample 2, 8, 110, 2 the median = 5 and the mean = 30.5
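The within-person standardization from the "Standardizing data (3)" example can be checked with a short Python sketch. It assumes the sample SD (dividing by n - 1) is used, which reproduces the z-scores shown on the slide; the function name is hypothetical.

```python
from statistics import mean, stdev

def within_person_z(answers):
    """z-score of each answer relative to this person's own mean and sample SD."""
    m = mean(answers)
    s = stdev(answers)  # sample SD; zero if a person gives identical answers
    if s == 0:
        raise ValueError("z-scores are undefined when all answers are identical")
    return [(a - m) / s for a in answers]

# Answers to Q1..Q4 on the 7-point Unlikely-Likely scale (from the slide).
participant_1 = [1, 3, 7, 7]
participant_2 = [4, 5, 6, 6]

z1 = [round(z, 2) for z in within_person_z(participant_1)]
z2 = [round(z, 2) for z in within_person_z(participant_2)]

print(z1)
print(z2)
```

Although participant 2 never uses the low end of the scale, the z-scores of both participants show a similar pattern: Q1 is rated clearly below, and Q3 and Q4 above, that person's own average.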

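The 68%, 95% and 99.7% figures quoted for the normal distribution are rounded values. As a quick check, the share of a normal distribution that lies within k standard deviations of the mean can be computed with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within k SD of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

Because z-scores express any normally distributed variable in SD units, the same proportions hold whatever the original mean and SD are.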
Skewness
  Measure
    Skewness = 0 --> symmetrical distribution
    Skewness > 0 --> positively skewed
    Skewness < 0 --> negatively skewed
  Floor effect
  Ceiling effect

Check for normality (1)
  Look at the histogram and the projected normal curve
  Rules of thumb for serious deviation from normality:
    skewness (or kurtosis) statistic > 2*Std Error (i.e. sd/sqrt(n))
    difference between mean and median > 0.5*SD
  (Coolican, 2004)

Check for normality (2)
  Kolmogorov-Smirnov normality test
  Tests of Normality
                Kolmogorov-Smirnov(a)
                Statistic   df   Sig.
    VAR00004    .178        44   .001
    a. Lilliefors Significance Correction
  Sig.: probability of finding a deviation at least this large if the sample was obtained from a normally distributed population
  Remember:
    it is easier to show deviation from normality with large samples
    the test does not tell you about the size of the deviation

Confidence intervals
  If we assume the sample is taken from a normal distribution, it is possible to make a prediction of the interval in which the population mean might be with e.g. 95% certainty.

Class Questions
  What is the difference between a box plot and a confidence interval?

Summary
  Data entry: code data, direction of scales, clear name, missing data, level
  Central tendency and variability: mean, median, mode, IQR, SD
  Explorative data analysis: box plot, histogram, stem-and-leaf plot
  Normal distribution
    1. Gives a good model of data for some real world data sets (e.g. large populations)
    2. Good approximation of results of random outcomes (e.g. throwing a die many times)
    3. A large number of inferential statistics are based on normal distributions
  Check for normality if you want to use parametric statistics
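A 95% confidence interval for the population mean can be sketched as mean plus or minus a critical value times the standard error. The example below is an approximation under stated assumptions: it uses the normal critical value (about 1.96) from `statistics.NormalDist`, whereas for small samples a t-based critical value is normally used, so SPSS will report a somewhat wider interval. The scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean using a normal critical value."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error

# Hypothetical test scores of ten participants (not from the lecture).
scores = [98, 102, 95, 101, 104, 99, 97, 103, 100, 101]

lower, upper = mean_confidence_interval(scores)
print(f"mean = {mean(scores)}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Unlike a box plot, which describes how the individual cases are spread, this interval describes the uncertainty about the mean and becomes narrower as the sample grows.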

This week in practicum
  Explorative Data Analysis
    Box plot
    Scatter plot
    Stem-and-leaf plot

Next time
  Quantitative Data Analysis I: Differences
    Hypothesis testing
    Chi-square test, t-test
    (M)ANOVA
    Mann-Whitney U-test
  (Robson ch. 13)

References
  Coolican, H. (2004). Research methods and statistics in psychology (4th ed.). London, UK: Hodder Arnold.
  Moore, D.S., and McCabe, G.P. (2006). Statistiek in de praktijk; theorieboek (5th ed.). Den Haag, The Netherlands: Academic Service.