Chapter 1: Data Collection

1.1 What is statistics?

A science (bet you thought it was a math) of
- Collecting data (Chap. 1)
- Organizing data (Chap. 2)
- Summarizing data (Chap. 2, 3)
- Analyzing data (Chap. 3-7)
- Drawing conclusions from data with a measure of confidence (Chap. 8-10)

What is data?
According to the American Heritage Dictionary, data is defined as a fact or proposition used to draw a conclusion or make a decision. It can be
- Numerical
- Non-numerical

Important Definitions
- Population: entire group of individuals to be studied
- Individual: person or object that is a member of the population
- Sample: subset of the population being studied
- Statistic: numerical summary of a sample
- Parameter: numerical summary of a population

Descriptive Statistics
- Organizing data
- Summarizing data

Inferential Statistics
- Generalize results from a sample to the population and measure their reliability
- Goal is to use statistics to estimate parameters

Parameter vs Statistic
- Parameter: The percentage of all (population) students on your campus who own a car is 48.2%.
- Statistic: The percentage of students in a sample of 100 who own a car is 46%.

Statistical Problem: Example 2 (pg 6)
A poll was conducted by the Gallup Organization on October 4-7, 2007, to learn how Americans feel about existing gun-control laws. The following statistical process allowed the researchers at Gallup to conduct their study.

1. Identify Research Objective
   - To determine the percentage of Americans aged 18 or older who were in favor of more strict gun-control laws.
   - The population being studied was Americans aged 18 years or older.
2. Collect Information Needed to Answer the Objective
   - Get a sample (1,010 Americans aged 18 years or older).
   - Of those surveyed, 515 stated they were in favor of more strict laws covering the sale of firearms.
3. Describe Data
   - Of the 1,010 individuals in the survey, 51% (= 515/1,010) are in favor of more strict laws covering the sale of firearms. This is a descriptive statistic.
4. Perform Inference
   - Gallup wanted to extend the results to all Americans aged 18 years or older.
   - Remember, when generalizing results from a sample to a population, the results are uncertain.
   - To account for this uncertainty, Gallup reported a 3% margin of error. This means that Gallup feels fairly certain that the percentage of all Americans aged 18 or older in favor of more strict laws covering the sale of firearms is somewhere between 48% and 54%.

See Pg 12: #46, 48
See Pg. 13: #55
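
The arithmetic in the Gallup example can be checked with a short Python sketch. The function names here are illustrative only, and the 3% margin of error is taken as reported in the example rather than computed.

```python
# Minimal sketch of the "Describe Data" and "Perform Inference" steps.
# Function names are hypothetical; the margin of error (0.03) is the
# value reported in the example, not something derived here.

def sample_proportion(in_favor, sample_size):
    """Descriptive statistic: fraction of the sample in favor."""
    return in_favor / sample_size

def interval_from_margin(estimate, margin):
    """Range of plausible parameter values: estimate +/- margin."""
    return estimate - margin, estimate + margin

p_hat = sample_proportion(515, 1010)          # about 0.51
low, high = interval_from_margin(round(p_hat, 2), 0.03)

print(f"Sample proportion: {p_hat:.1%}")
print(f"Plausible range for the population: {low:.0%} to {high:.0%}")
```

The statistic (51%) describes only the 1,010 people surveyed. The interval from 48% to 54% is the statement about the parameter.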

Types of Data: Variables

Qualitative or Categorical Variables
- Words: vanilla, blue, yes, disagree
- Numbers used as identifiers: zip code, SS#, student number

Quantitative Variables
- Numerical measures: arithmetic operations on the values give meaningful results

See pg 12: #22-28 evens

Classification of Quantitative Variables

Discrete Variables
- Finite or countable number of possible values
- Countable

Continuous Variables
- Infinite number of possible values
- Not countable (measured)

See pg 12: #30-44 evens

Variables vs Data
- Data: list of observed values for a variable
- Qualitative data: observations corresponding to a qualitative variable
- Discrete data: observations corresponding to a discrete variable
- Continuous data: observations corresponding to a continuous variable

See Example 5 (pg 9) and #51 (pg 13)

Level of Measurement of a Variable
- Nominal Level (qual): values of the variable name, label, or categorize
- Ordinal Level (qual): has the properties of the nominal level, and the naming scheme allows the values to be ranked or put in a specific order
- Interval Level (quan): has the properties of the ordinal level, and differences in values have meaning
  - Zero does not mean the absence of the quantity
- Ratio Level (quan): has the properties of the interval level, and ratios of values have meaning
  - Zero means the absence of the quantity

See Example 6 (pg. 10) and #38-44 evens (pg 14)

1.2 Statistical Study Goals

The goal of any study is to determine how varying amounts of an explanatory variable affect the value of a response variable.

Statistical Study Types
See Example 1 (pg 15)

Observational Study
- Characteristics of individuals are studied, but the data are not manipulated or influenced
- Ex post facto (after the fact) because the data have already been gathered
- Does not allow a researcher to claim causation, only association

Reasons for Observational Studies
Don't collect data that has already been collected!
- Reason 1: To learn characteristics of a population
- Reason 2: To determine whether there is an association between two or more variables where the values of the variables have already been determined

Types of Observational Studies
- Cross-sectional studies: collect information about individuals at a specific point in time or over a very short period of time
- Case-control studies: retrospective; require individuals to look back in time or researchers to look at existing records
- Cohort studies: a group of individuals (a cohort) is observed over a period of time, which can be long; characteristics of the individuals are recorded and some individuals are studied further

See pg 21: #18

Designed Experiment
See Example 2 (pg. 16)
- Individuals in the study are assigned to certain groups
- Groups are given varying degrees of the explanatory variable
- Values of the response variable are recorded for each group

Reasons for Designed Experiments
- Use when control of certain variables is desired
- Use when cause-and-effect relationships among variables are to be established

Design vs Observational
See Example 3 (pg 17)
- Confounding: when the effects of two or more explanatory variables are not separated
- Lurking variable: an explanatory variable that was not considered in a study but that affects the value of the response variable in the study

See Pg 20: #10-16 evens

1.3 Simple Random Sampling
- Like selecting names from a hat
- Every possible sample of a given size is equally likely to be chosen
- Need to have a list of the population, called a frame
- Sampling without replacement keeps the same individual from being used more than once
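
As a rough illustration of drawing names from a hat, the sketch below takes a simple random sample without replacement from a frame using Python's standard `random` module. The frame of 30 student labels and the function name `simple_random_sample` are made up for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n individuals from the frame without replacement.

    random.Random.sample never picks the same position twice, so no
    individual can appear in the sample more than once.
    """
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(frame, n)

# Hypothetical frame: a list of every individual in the population.
frame = [f"student_{i:02d}" for i in range(1, 31)]

sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Changing or omitting the seed gives a different sample, which is why conclusions based on different random samples can differ.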

Random Number Table
- Used to randomly select individuals for a sample from a frame (pg. 25)
- Calculators and computers are used more today (pg. 26)
- Samples created randomly can result in different conclusions regarding the population
- Inferences based on samples will vary because the individuals in different samples vary

See Pg 27: #5, 8

1.4 Other Types of Sampling

Stratified Sampling
- Separate the population into non-overlapping groups called strata
- Obtain a simple random sample from each stratum
- Each stratum should be homogeneous (or similar) in some way

Advantages of stratified sampling
- Allows fewer individuals to be surveyed while obtaining the same or more information
- Allows analysis to determine significant differences between the strata or groups

Systematic Sampling
- Obtained by selecting every kth individual from the population
- The first individual selected corresponds to a random number between 1 and k
- No frame (list of the population) is needed
- When the size of the population, N, is known, k is determined by dividing N by the sample size and rounding down

Advantages of systematic sampling
- Population size does not have to be known
- Provides more information for a given cost than other sampling types
- Easier to do, with less chance of interviewer error in getting the sample

Cluster Sampling
- Obtained by selecting all individuals within a randomly selected collection or group of individuals

Questions in cluster sampling
- How do I cluster the population?
- How many clusters?
- How many individuals should be in each cluster?
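
The three sampling methods described in this section can be sketched in Python as follows. This is only an illustration: the function names and toy populations are invented, and the systematic version assumes the population is available as a list so that N is known, even though the method itself does not require a frame.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Simple random sample of n_per_stratum individuals from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, n, seed=None):
    """Select every kth individual, starting at a random position in 1..k."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)
    k = N // n                    # N divided by sample size, rounded down
    start = rng.randint(1, k)     # random number between 1 and k inclusive
    # Positions start, start + k, start + 2k, ... (1-based), n in total.
    return [population[start - 1 + i * k] for i in range(n)]

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

# Toy data, purely hypothetical.
strata = {"freshman": list(range(0, 40)), "senior": list(range(40, 60))}
population = list(range(1, 101))
clusters = {"dorm_A": ["a1", "a2"], "dorm_B": ["b1", "b2"], "dorm_C": ["c1", "c2"]}

by_stratum = stratified_sample(strata, 3, seed=0)
every_kth = systematic_sample(population, 10, seed=0)
whole_clusters = cluster_sample(clusters, 2, seed=0)
```

Note the difference between the first and last functions: stratified sampling takes some individuals from every group, while cluster sampling takes every individual from some groups.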
- Clusters homogeneous: use more clusters with fewer individuals per cluster
- Clusters heterogeneous: use fewer clusters with more individuals per cluster

Convenience Sampling
- Sample in which the individuals are easily obtained
- Self-selected samples are the most common: individuals voluntarily decide to be in the sample
- Examples: magazine or Internet surveys
- Not good for making inferences about the population

Multistage Sampling
- Combination of sampling techniques
- Example: Nielsen ratings

Sample Size Considerations
- How many individuals must I survey in order to draw conclusions about the population within some predetermined margin of error?
- Balance the cost against the results needed
- The method of determining sample size mathematically comes later

See chart on page 35 of text
See pg 36: #12-22 evens

1.5 Bias in Sampling

Sampling Bias (pg. 38)
- The technique used to obtain individuals for the sample tends to favor one part of the population
- Undercoverage means the proportion of one segment of the population is lower in the sample than it is in the population
- Incorrect predictions due to sampling errors (incorrect or incomplete frame)

Nonresponse Bias (pg. 39)
- Individuals selected for the sample who do not respond to the survey have different opinions from those who do respond
- Can be controlled using callbacks
- Can be lessened by offering rewards and incentives

Response Bias (pg. 40-41)
Exists when answers on a survey do not reflect the true feelings of the respondent. Sources include:
- Interviewer error
- Misrepresented answers
- Wording of questions
- Ordering of questions or words
- Type of questions
- Data-entry error

Sampling Errors vs Nonsampling Errors
- Nonsampling errors result from undercoverage, nonresponse bias, response bias, or data-entry error. This kind of error can be present even in a census.
- Sampling error results from using a sample to estimate information about a population. It occurs because a sample gives incomplete information about a population.

1.6 Design of Experiments

Experiment
- Controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable
- A treatment is a combination of the values of the factors
- Key ingredients of a well-designed study: control, manipulation, randomization, and replication

Steps in Designing an Experiment (pg 47)
1. Identify the problem to be solved
2. Determine the factors that affect the response variable
3. Determine the number of experimental units
4. Determine the level of each factor
   - Control
   - Randomize
5. Conduct the experiment
   - Replication
   - Collect and process data
6. Test the claim

Example 3 (pg. 48)
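
The randomization and replication ingredients can be sketched in Python. This is a minimal illustration, assuming a single factor with two levels and twelve experimental units; the unit labels, treatment names, and the function `randomize_units` are hypothetical and are not taken from the textbook example.

```python
import random

def randomize_units(units, treatments, seed=None):
    """Randomly assign experimental units to treatments in equal-sized groups.

    Shuffling first means chance, not the researcher, decides which unit
    gets which treatment. Dealing the shuffled units out in turn gives
    each treatment several units, which provides replication.
    """
    rng = random.Random(seed)
    shuffled = list(units)        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical setup: 12 plants, one factor (fertilizer) at two levels.
units = [f"plant_{i}" for i in range(1, 13)]
treatments = ["no_fertilizer", "fertilizer"]

groups = randomize_units(units, treatments, seed=42)
for treatment, members in groups.items():
    print(treatment, members)
```

Control is not shown in the code: it corresponds to holding other variables (for example, water and light in this made-up setup) the same for every unit.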