Chapter 1: The Nature of Probability and Statistics

Learning Objectives
Upon successful completion of Chapter 1, you will have applicable knowledge of the following concepts:
  - Statistics: An Overview and Description
  - Types of Variables
  - Measurement Scales
  - Sampling Methods or Methods to Select Subjects for Samples
  - Experimental Studies

I. An Overview and Description

A. Statistics is the science of conducting studies to:
  - Collect
  - Organize
  - Summarize
  - Analyze
  - Draw conclusions from data

B. Where do you use statistics?
  - Sports
  - Business
  - Research
  - Public Health

C. Why do we use statistics?
  - To be able to read and understand statistical studies.
  - To conduct research.
  - To become a better consumer and citizen.

D. Two branches of statistics:
  - Descriptive statistics (Unit 1) is the collection, organization, summarization, and presentation of data.
  - Inferential statistics (Units 3 & 4) uses data to generalize from a sample to its population, conduct hypothesis tests to determine relationships among variables, and estimate parameters.

Dr. Janet Winter, jmw11@psu.edu Stat 200
E. Basic Vocabulary Review
  - A variable is a characteristic or attribute of interest that can assume different values. It is the question asked.
  - Data are the values that the variables have assumed (the answers to the question).
  - A data set is a collection of data values; each value is called a data value or datum.
  - The population is all subjects of interest to the study.
  - The sample is a part of the population, that is, a subgroup (subset) of the subjects from the population.
  - A parameter is a numerical summary of all data from the population (e.g., the mean of the population, μ).
  - A statistic is a numerical summary of the data from the sample (e.g., the mean of the sample, x̄).

Symbols for Specific Parameters
  Type                 Parameter   Statistic
  Mean                 μ           x̄
  Size                 N           n
  Variance             σ²          s²
  Standard Deviation   σ           s
  Proportion           P           p

II. Variables

A. Types of Variables

I. Qualitative: Data can be placed in distinct categories, according to some characteristic or attribute.
Note: The data are used to find the count and proportion for each category.
a) Examples:
  - Hair color
  - Gender
  - TV ownership
  - Do you own a car?
  - Employment status (full time, part time, not employed)
  - Social security number
  - License plate number
  - Computer account number
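The note on qualitative data (counts and proportions for each category) can be illustrated with a short Python sketch. The hair-color responses below are made up for illustration, and `category_summary` is a hypothetical helper name, not something from the course materials.

```python
from collections import Counter

# Hypothetical answers to the question "What is your hair color?"
hair_colors = ["brown", "black", "brown", "blond", "red", "brown", "black", "blond"]


def category_summary(values):
    """Return {category: (count, proportion)} for a qualitative variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


summary = category_summary(hair_colors)
for category, (count, proportion) in sorted(summary.items()):
    print(f"{category}: count={count}, proportion={proportion:.3f}")
```

Averaging these values would make no sense, which is why a qualitative variable is summarized by counts and proportions only.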
II. Quantitative: Data are numerical values representing counts or measurements that can be ordered and ranked.
Note: The data values can be used to find the average, standard deviation, and variance.
a) Types of Quantitative Variables:
  1) Discrete: counts, or data with space between its possible values. Example: shoe size.
  2) Continuous: measures, or data that can assume an infinite number of values between two endpoints. Example: foot length.
b) Examples of Quantitative Variables:
  - How many TVs do you own? (discrete)
  - How many cars does your family own? (discrete)
  - What is your salary for the year? (discrete)
  - What is your weight? (continuous)
  - How far do you live from campus? (continuous)
  - What is your blood pressure? (continuous)

B. Exercise with Types of Variables
Directions: Identify each of the following variables as Qualitative; Quantitative, discrete; or Quantitative, continuous. Use the chart provided below to write in your answers.
  - Hair color
  - Computer password
  - Shoe size
  - Shirt size (S, M, L)
  - Shirt size (10, 12, 14, etc.)
  - License plate number
  - Zip code
  - Foot length
  - Height
  - Time to drive to campus

  Qualitative          Quantitative, Discrete          Quantitative, Continuous

*Answer key is located at the end of the document.
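Returning to the note on quantitative data, the average, variance, and standard deviation can be computed with Python's standard `statistics` module. This is only a sketch: the distances are made-up answers to "How far do you live from campus?", and treating them first as a sample and then as a whole population is an assumption made to show the s² versus σ² distinction from the symbol table.

```python
import statistics

# Hypothetical answers, in miles, to "How far do you live from campus?"
distances = [2.0, 4.0, 4.0, 5.0, 10.0]

mean = statistics.mean(distances)              # average
variance = statistics.variance(distances)      # sample variance s^2 (divides by n - 1)
std_dev = statistics.stdev(distances)          # sample standard deviation s

# If these five values were the entire population, the variance would
# divide by N instead of n - 1.
pop_variance = statistics.pvariance(distances)  # population variance sigma^2

print(mean, variance, std_dev, pop_variance)
```

The same calls would raise no error on zip codes or license plate numbers, but the results would be meaningless, because those variables are qualitative even though they are written with digits.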
C. Measurement Scales for Variables

I. Types of Measurement Scales
a) Nominal: only classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data. Example: Social Security number (SS#)
b) Ordinal: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. Example: Shirt size (S, M, L)
c) Interval: ranks data with precise differences between the data values; however, there is no meaningful zero. Example: Shoe size
d) Ratio: possesses all the characteristics of interval measurement and has a true zero. Example: Foot length
Note: This level of measurement is called the ratio level because the zero starting point makes ratios meaningful.

II. Levels of Measurement Data (Triola & Triola, 2006)

  Nominal
    Summary: Categories only. Data cannot be arranged in an ordering scheme.
    Example: Bear encounter states: 5 New York, 20 Idaho, 40 Wyoming
    Note: Categories or names only.

  Ordinal
    Summary: Categories are ordered, but differences can't be found or are meaningless.
    Example: Bears according to aggressiveness: 5 not aggressive, 20 somewhat aggressive, 40 highly aggressive
    Note: An order is determined by "not, somewhat, highly."

  Interval
    Summary: Differences are meaningful, but there is no natural starting point, and ratios are meaningless.
    Example: Bear den temperatures: 5°F, 20°F, 40°F
    Note: 0°F doesn't mean "no heat." 40°F is not twice as hot as 20°F.

  Ratio
    Summary: There is a natural zero starting point and ratios are meaningful.
    Example: Bear migration distances: 5 miles, 20 miles, 40 miles
    Note: 40 miles is twice as far as 20 miles.
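One way to see why ratios are meaningless at the interval level but meaningful at the ratio level is to change units and check whether the ratio survives. The sketch below uses the 20°F and 40°F den temperatures and the 20-mile and 40-mile distances from the table. The conversion to Celsius and kilometers is an illustration added here, not part of the cited table.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius (both are interval scales)."""
    return (deg_f - 32.0) * 5.0 / 9.0


MILES_TO_KM = 1.609344  # exact definition of the international mile

# Interval level: the "ratio" depends on where the arbitrary zero sits.
ratio_f = 40.0 / 20.0
ratio_c = fahrenheit_to_celsius(40.0) / fahrenheit_to_celsius(20.0)

# Ratio level: zero miles and zero kilometers both mean "no distance",
# so the ratio is the same in either unit.
ratio_miles = 40.0 / 20.0
ratio_km = (40.0 * MILES_TO_KM) / (20.0 * MILES_TO_KM)

print(f"temperature ratio: {ratio_f:.2f} in F, {ratio_c:.2f} in C")
print(f"distance ratio:    {ratio_miles:.2f} in miles, {ratio_km:.2f} in km")
```

The temperature ratio even changes sign between the two scales, because 20°F is below freezing, while the distance ratio stays at 2 in both units.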
III. Exercise 01 with Measurement Scales
Directions: Identify each of the following variables as Nominal, Ordinal, Interval or Ratio. Use the chart provided below to write in your answers.
  - Zip code
  - Grade
  - Eye color
  - Rating
  - Gender
  - IQ
  - SAT score
  - Height
  - Ranking
  - Temperature (F, C)
  - Weight
  - Time

  Nominal          Ordinal          Interval          Ratio

*Answer key is located at the end of the document.

D. Exercise 02 with Measurement Scales: Transportation Table
Directions: The chart shows the number of job-related injuries for each of the transportation industries for 1998. Refer to this chart to answer the following 5 questions.

  Industry     Number of injuries
  Railroad     4520
  Intercity    5100
  Subway       6850
  Trucking     7144
  Airline      9950

  1. What are the variables under study?
  2. Classify each of the variables as Quantitative, continuous; Quantitative, discrete; or Qualitative.
  3. Identify the level of measurement for each variable.
  4. The railroad is shown as the safest transportation industry. Does that mean railroads have fewer accidents than the other industries?
  5. What factors other than safety influence a person's choice of transportation?

***Answers are located at the end of this document.
III. Samples (part of the population; a subgroup or subset of the population)

A. Sampling Methods (or ways to select subjects or participants)
I. Random samples: number each subject in the population; select the subjects whose numbers match numbers from a random number table.
II. Systematic samples: number each subject in the population; select every kth subject.
III. Stratified samples: divide the population into subgroups according to some characteristic that is important to the study, then sample randomly from each subgroup.
IV. Cluster samples: randomly select entire intact groups, called clusters, that represent the population.

B. Why Use Samples Instead of Populations?
  - Saves time and money
  - Experiment can include more detail
  - It is effective

C. SRS (Simple Random Sample)
  - Required for most statistical procedures.
  - If data are not collected correctly, the study is useless.
  - Every possible sample of size n has the same chance of being selected.

D. Ways to Collect Data from Participants
I. Surveys: the researcher asks questions using a personal interview, telephone interview, or written questions.
II. Observational study: the researcher observes and draws conclusions based on the observations.
III. Experimental study: the researcher manipulates one of the variables and determines how the manipulation influences other variables.

E. Kinds of Variables in Experimental Studies:
  - A variable is the characteristic of interest or the question asked.
  - The independent or explanatory variable is manipulated by the researcher.
  - The dependent, outcome, or response variable changes because of the manipulation of the independent variable.
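The four sampling methods in section A can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure. Python's pseudo-random generator stands in for the random number table, the systematic sample uses a random starting point (the notes only say "every kth subject"), and the function names, the odd/even strata, and the five clusters of 20 are all hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Pick n subjects so every sample of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick every kth subject, starting from a random position below k."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split subjects into strata, then sample randomly within each one."""
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every subject in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [subject for cluster in chosen for subject in cluster]


rng = random.Random(200)          # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # subjects numbered 1 through 100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(
    population, lambda s: "odd" if s % 2 else "even", 5, rng
)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```

Note the difference between the last two: a stratified sample takes some subjects from every subgroup, while a cluster sample takes every subject from some groups.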
F. Some Problems with Experimental Studies
  - Hawthorne effect: the subjects know they are participating in an experiment and change their behavior in ways that affect the results of the study.
  - Confounding variable: a variable that influences the dependent or outcome variable but cannot be separated from the independent variables (e.g., IQ, previous knowledge or experience with the dependent variable).
  - Experimenter effect: the experimenter unintentionally influences the dependent variable or outcome of the experiment.

G. Controlling Effects
  - Single blind: subjects do not know whether they are in the experimental or control group.
  - Double blind: neither the participant nor the experimenter knows who is in the experimental or control group.

H. Errors
  - Sampling error: caused by chance fluctuations; it is the difference between the sample statistic and the population parameter.
  - Non-sampling error: caused when sample data are incorrectly collected.

IV. Some Misuses of Statistics
(Read the sections in the textbook which describe the misuses of statistics.)

A. Suspect Samples
  - Very small samples
  - Biased samples
  - Volunteer samples

B. Other Issues
  - Ambiguous averages
  - Detached statistics
  - Implied connections
  - Misleading graphics
  - Faulty survey questions
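The definition of sampling error in section III.H (the difference between the sample statistic and the population parameter) can be made concrete with a small simulation. The population of 1,000 weights below is randomly generated and entirely hypothetical, as are the sample size of 30 and the seed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 1000 body weights in pounds.
population = [rng.gauss(150, 25) for _ in range(1000)]
mu = statistics.mean(population)   # parameter: population mean

# One simple random sample of size n = 30.
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)    # statistic: sample mean

sampling_error = x_bar - mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")

# Repeating the sampling shows the error fluctuating by chance.
errors = [statistics.mean(rng.sample(population, 30)) - mu for _ in range(5)]
print([round(e, 2) for e in errors])
```

Every value is recorded correctly here, so the errors come only from chance in which subjects were selected. A non-sampling error would instead come from collecting the data incorrectly.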
V. Computers and Calculators
Calculators and computers simplify statistical computations and save time. A calculator with statistical functions is required for this class; you will not be able to do the work in this course without it. The TI 83 is strongly recommended and will be the only calculator demonstrated in class. If you choose to use a different calculator, you will be responsible for learning how to use it on your own.

VI. Conclusion
The applications of statistics are many and varied. You encounter statistics when reading newspapers or magazines, listening to the radio, or watching television. Statistics have improved health care, business, social science, and every aspect of life.

ANSWER KEYS TO EXERCISES

Types of Variables (or Types of Data)
*Exercise: Identify each variable as Qualitative; Quantitative, discrete; or Quantitative, continuous.

  Qualitative:
    - Hair color
    - Computer password
    - License plate number
    - Shirt size (S, M, L)
    - Zip code
  Quantitative, Discrete:
    - Shoe size
    - Shirt size (10, 12, 14, etc.)
  Quantitative, Continuous:
    - Foot length
    - Height
    - Time to drive to campus

Exercise: Measurement Scales
**Exercise: Identify each of the following variables as Nominal, Ordinal, Interval or Ratio.

  Nominal:
    - Zip code
    - Gender
    - Eye color
  Ordinal:
    - Grade
    - Rating
    - Ranking
  Interval:
    - IQ
    - SAT score
    - Temperature (F, C)
  Ratio:
    - Height
    - Time
    - Weight
Exercise: Measurement Scales - Transportation Table
***Exercise: The chart shows the number of job-related injuries for each of the transportation industries for 1998. Refer to this chart to answer the following 5 questions.

  Industry     Number of injuries
  Railroad     4520
  Intercity    5100
  Subway       6850
  Trucking     7144
  Airline      9950

  1. What are the variables under study?
     Answer: Industry and Number of injuries.
  2. Classify each of the variables as Quantitative, continuous; Quantitative, discrete; or Qualitative.
     Answer: Industry is Qualitative. Number of injuries is Quantitative, discrete.
  3. Identify the level of measurement for each variable.
     Answer: Industry is Nominal. Number of injuries is Ratio.

Additional questions to consider (these were reflective questions):
  4. The railroad is shown as the safest transportation industry. Does that mean railroads have fewer accidents than the other industries?
  5. What factors other than safety influence a person's choice of transportation?

Works Cited
Triola, Marc M., M.D., and Mario F. Triola. Biostatistics for the Biological and Health Sciences. New York: Pearson Education, Inc., 2006.