Basic Statistical Concepts, Research Design, & Notation


Variables, Scores, & Data
A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship between two variables for a specific group of individuals (example from the book: time spent playing video games and time spent exercising).
A score is the value of a variable measured for a particular individual.
Data are collections of scores measured for multiple individuals.

Populations
A population is the set of all individuals or events of interest in a particular study.
Populations:
- Are generally very large
- Can consist of arbitrary categories of people, objects, and events
- Can include hypothetical or counterfactual events

Samples
It is usually impractical for a researcher to examine every individual in the population. Instead, researchers typically select a small representative group (a sample) from the population and limit their studies to individuals in the sample. The goal is to use the results obtained from the sample to help answer questions about the population.

Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.
- Tables and graphs organize data.
- Descriptive values (averages, frequencies, proportions) summarize data.
A descriptive value for a population is called a parameter, and a descriptive value for a sample is called a statistic. Parameters are generally represented as Greek letters (e.g., µ, σ), while statistics are represented as Roman letters (e.g., m, s).
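As a small illustration of descriptive values, the sketch below summarizes a made-up sample using Python's standard library. The scores and variable names are hypothetical and chosen only for demonstration; they do not come from the lecture.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sample of scores (illustrative values only).
scores = [2, 3, 3, 4, 3, 5, 2, 3]

n = len(scores)
m = mean(scores)    # sample mean: a statistic, written m
s = stdev(scores)   # sample standard deviation: a statistic, written s

# Frequencies and proportions are also descriptive values.
frequencies = Counter(scores)
proportions = {score: count / n for score, count in frequencies.items()}

print(f"n = {n}, m = {m:.2f}, s = {s:.2f}")
for score in sorted(frequencies):
    print(score, frequencies[score], f"{proportions[score]:.2f}")
```

Because these values describe a sample, they are statistics; the same calculations applied to an entire population would yield parameters.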

Inferential Statistics
Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations. A sample typically contains only a small part of the whole population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.


Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error. Sampling error depends critically on:
1. The amount of variability in the population (e.g., number of legs on a cow versus volume of milk produced)
2. The number of individuals in the sample (e.g., how many cows have we measured?)
Defining and measuring sampling error is a large part of inferential statistics. We'll look more closely at sampling error in later lectures.
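The two influences on sampling error can be explored with a rough simulation. The sketch below assumes a normally distributed population with a known mean; the function name `average_sampling_error` and all parameter values are hypothetical choices made for illustration.

```python
import random
from statistics import mean

def average_sampling_error(mu, sigma, n, replications=500, seed=0):
    """Average absolute gap between the sample mean and mu across many samples."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    errors = []
    for _ in range(replications):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(abs(mean(sample) - mu))
    return mean(errors)

# Same sample size, different amounts of population variability.
low_var = average_sampling_error(mu=50, sigma=1, n=10)
high_var = average_sampling_error(mu=50, sigma=20, n=10)

# Same population variability, different sample sizes.
small_n = average_sampling_error(mu=50, sigma=20, n=5)
large_n = average_sampling_error(mu=50, sigma=20, n=500)

print(f"sigma=1,  n=10:  {low_var:.2f}")
print(f"sigma=20, n=10:  {high_var:.2f}")
print(f"sigma=20, n=5:   {small_n:.2f}")
print(f"sigma=20, n=500: {large_n:.2f}")
```

Under these assumptions, the average error should grow with the population's variability and shrink as the sample gets larger.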

Measuring Variables
To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured. The process of measuring a variable requires a set of categories, called a scale of measurement, and a process that classifies each individual into one category.

Four Types of Measurement Scales
1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different.
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals, but contain no information about the magnitude of the difference between neighboring categories.

Four Types of Measurement Scales (continued)
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. However, the zero point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.

Types of Variables
Variables can be classified as discrete or continuous.
- Discrete variables (such as class size) consist of indivisible categories.
- Continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.

Types of Data
Another useful distinction is that of qualitative versus quantitative data.
- Qualitative/categorical data occur when we assign objects/events into labeled (i.e., nominal or ordinal) groups, representing only frequencies of occurrence. E.g., race, gender, yes/no response.
- Quantitative/measurement data occur when we obtain some number that describes the quantitative trait of interest. These numbers can be either discrete or continuous. E.g., height, weight, income.

Examples of Variables and Their Classifications
Variable | Continuous vs. Discrete | Qualitative vs. Quantitative | Scale of Measurement
Gender (male, female) | Discrete | Qualitative | Nominal
Seasons (spring, summer, fall, winter) | Discrete | Qualitative | Nominal
Number of dreams recalled | Discrete | Quantitative | Ratio
Number of errors | Discrete | Quantitative | Ratio
Duration of drug abuse (in years) | Continuous | Quantitative | Ratio
Ranking of favorite foods | Discrete | Quantitative | Ordinal
Ratings of satisfaction (1 to 7) | Discrete | Quantitative | Interval or Ordinal
Body type (slim, average, heavy) | Discrete | Qualitative | Nominal
Score on a multiple-choice exam | Discrete | Quantitative | Ratio
Number of students in your class | Discrete | Quantitative | Ratio
Temperature (degrees Fahrenheit) | Continuous | Quantitative | Interval
Time (in seconds) to memorize a list | Continuous | Quantitative | Ratio
The size of a reward (in grams) | Continuous | Quantitative | Ratio
Position standing in line | Discrete | Quantitative | Ordinal
Political affiliation (Republican, Democrat) | Discrete | Qualitative | Nominal
Type of distraction (auditory, visual) | Discrete | Qualitative | Nominal
A letter grade (A, B, C, D, F) | Discrete | Qualitative | Ordinal
Weight (in pounds) of a newborn infant | Continuous | Quantitative | Ratio
A college student's SAT score | Discrete | Quantitative | Interval
Number of lever presses per minute | Discrete | Quantitative | Ratio

Basic Research Designs
- Correlational Studies
- Experimental Studies
- Quasi-Experimental Studies
Different research designs:
- produce different forms of data
- answer different types of questions
- require different statistical techniques

Correlational Studies
The goal of a correlational study is to determine whether there is a systematic relationship between two variables and to describe the relationship. A correlational study simply observes the two variables as they exist naturally.

Example Data from a Correlational Study

Experiments
The goal of an experiment is to demonstrate a cause-and-effect relationship between two (or more) variables, i.e., to show that changing the value of one variable causes changes to occur in a second variable.
In a simple experiment:
- One variable (the independent variable) is manipulated to create treatment conditions.
- A second variable (the dependent variable) is observed and measured to obtain scores for a group of individuals in each of the treatment conditions.
The critical elements of an experiment are:
- Manipulation of an independent variable
- Control of all extraneous variables (e.g., using random assignment)
- Measurement and comparison of the dependent variable across conditions

Example Data from an Experiment
Variable 1 (independent): Distraction Condition
Variable 2 (dependent): Exam Score
Low Distraction | High Distraction
92 | 78
77 | 80
75 | 82
82 | 64
84 | 67
93 | 85
96 | 75
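Comparing the dependent variable across conditions can be sketched as follows. This assumes the scores are read in pairs, with the first value of each pair belonging to the low-distraction condition and the second to the high-distraction condition; that pairing is an interpretation of the slide's layout, not something the slide states.

```python
from statistics import mean

# Exam scores in the order listed on the slide.
# ASSUMPTION: values alternate (low distraction, high distraction).
listed = [92, 78, 77, 80, 75, 82, 82, 64, 84, 67, 93, 85, 96, 75]

low_distraction = listed[0::2]   # 1st, 3rd, 5th, ... values
high_distraction = listed[1::2]  # 2nd, 4th, 6th, ... values

print(f"Low distraction mean:  {mean(low_distraction):.2f}")
print(f"High distraction mean: {mean(high_distraction):.2f}")
```

A difference between the two means is only a description of these samples; deciding whether it reflects a real effect is the job of inferential statistics.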

Quasi-Experimental Studies
Quasi-experimental studies are correlational studies that look similar to experiments because they also compare groups of scores. However:
- These studies do not use a manipulated variable to differentiate the groups.
- The variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).
Because these studies do not use the manipulation and control of true experiments, they cannot demonstrate cause-and-effect relationships.

Example Data from a Quasi-Experiment
Variable 1 (quasi-independent): Gender
Variable 2 (dependent): Number of tasks completed
Male | Female
9 | 10
8 | 7
9 | 8
7 | 9
5 | 11
6 | 9
6 | 11

Random Sampling & Assignment
Random sampling occurs when individuals are selected such that each member of the population has an equal chance of inclusion.
- Failure to sample randomly may result in statistics that don't reflect the whole population.
- E.g., average height computed for a sample consisting only of women is unlikely to reflect the average height of all adults.
Random assignment occurs when individuals are assigned to different groups using a random process.
- Failure to assign randomly confounds the independent variable; any measured difference in a dependent variable could be due solely to the assignment.
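The distinction between the two procedures can be made concrete with a short sketch. The population of ID numbers, the sample size, and the group names below are hypothetical; only Python's standard `random` module is used.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 1,000 individuals identified by ID number.
population = list(range(1000))

# Random sampling: every member has an equal chance of inclusion.
sample = rng.sample(population, k=20)

# Random assignment: shuffle the sampled individuals, then split
# them into two equal-sized groups.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment = shuffled[:10]
control = shuffled[10:]

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Sampling determines who is in the study; assignment determines which condition each of those individuals experiences.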

Statistical Notation
The individual measurements or scores obtained for a research participant will be identified by the letter x (or x and y if there are multiple scores for each individual). The number of scores in a data set will be identified by N.
Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, is used to mean "the sum of." For example, $\sum_{i=1}^{N} x_i$ (or simply Σx) identifies the sum of the N scores.

Order of Operations
PEMDAS ("Please excuse my dear Aunt Sally")
1. All calculations within parentheses are done first.
2. Squaring, or raising to other exponents, is done second.
3. Multiplying and dividing are done third, and should be completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order from left to right.
Note: in the interest of consistency, always report results to two decimal places beyond the precision of the original data (including in intermediate calculations).
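To see why the ordering matters, consider a small worked example with the hypothetical scores x = {1, 2, 3} (values chosen only for illustration). Squaring before summing gives a different result from summing before squaring, and subtracting inside the parentheses differs from subtracting after the summation:

```latex
\begin{align*}
\sum x^2     &= 1^2 + 2^2 + 3^2 = 14 \\
(\sum x)^2   &= (1 + 2 + 3)^2 = 36 \\
\sum (x - 1) &= 0 + 1 + 2 = 3 \\
\sum x - 1   &= (1 + 2 + 3) - 1 = 5
\end{align*}
```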

Useful Summation Identities
1. $\sum_{i=1}^{N} C x_i = C \sum_{i=1}^{N} x_i$
2. $\sum_{i=1}^{N} (x_i + C) = \sum_{i=1}^{N} x_i + NC$
3. $\sum_{i=1}^{N} (x_i y_i) \neq \sum_{i=1}^{N} x_i \sum_{j=1}^{N} y_j$
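These relationships can be checked numerically. The sketch below uses hypothetical scores and a hypothetical constant C; it confirms that a constant factor can be moved outside a sum, that adding a constant to each of N scores adds NC to the total, and that the sum of products generally differs from the product of sums.

```python
# Hypothetical scores and constant (illustrative values only).
x = [1, 2, 3, 4]
y = [2, 0, 5, 1]
C = 3
N = len(x)

# A constant factor can be moved outside the summation.
scaled_sum = sum(C * xi for xi in x)
factored_sum = C * sum(x)

# Adding C to each of N scores adds N * C to the total.
shifted_sum = sum(xi + C for xi in x)
sum_plus_nc = sum(x) + N * C

# The sum of products is generally NOT the product of sums.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum(x) * sum(y)

print(scaled_sum, factored_sum)
print(shifted_sum, sum_plus_nc)
print(sum_of_products, product_of_sums)
```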

Example Problems
Given the following values for x and y:
x = {11, 14, 10, 13, 12}
y = {3, 2, 2, 5, 1}
Compute the following:
- Σ2x
- Σ(x-1)
- Σy
- Σy²
- (Σy)²
- (Σ(xy))²
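One way to check hand calculations for these problems is a short script. The sketch below treats the last expression as the squared sum of the products of paired x and y scores; that reading is an assumption, since the operator between x and y is not explicit on the slide. The variable names are hypothetical.

```python
x = [11, 14, 10, 13, 12]
y = [3, 2, 2, 5, 1]

sum_2x = sum(2 * xi for xi in x)            # multiply each score, then sum
sum_x_minus_1 = sum(xi - 1 for xi in x)     # parentheses first, then sum
sum_y = sum(y)
sum_y_squared = sum(yi ** 2 for yi in y)    # square each score, then sum
squared_sum_y = sum(y) ** 2                 # sum first, then square
# ASSUMPTION: "x y" is read as the product of paired scores.
squared_sum_xy = sum(xi * yi for xi, yi in zip(x, y)) ** 2

for label, value in [
    ("Σ2x", sum_2x),
    ("Σ(x-1)", sum_x_minus_1),
    ("Σy", sum_y),
    ("Σy²", sum_y_squared),
    ("(Σy)²", squared_sum_y),
    ("(Σ(xy))²", squared_sum_xy),
]:
    print(f"{label} = {value}")
```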