Carolyn Anderson & Youngshil Paek (Slides created by Shuai Sam Wang) Department of Educational Psychology University of Illinois at Urbana-Champaign
Key Points: 1. Data 2. Variable 3. Types of data 4. Define Statistics 5. Three main aspects of Statistics 6. Descriptive Statistics vs. Inferential Statistics 7. Subjects 8. Population vs. samples 9. Randomness and variability
Data: Data are information we gather with experiments and with surveys. Example: an experiment on a low-carbohydrate diet; the data could be the weights of subjects before and after the experiment. Example: a survey on the effectiveness of a TV ad; the data could be the percentage of people who went to Starbucks since the ad aired.
Variable: Data contain one or more variables. A variable describes any characteristic that is recorded for the subjects in a study. The term "variable" highlights that data values vary. Examples: marital status, height, weight, IQ. A variable can be classified as either categorical or quantitative (discrete or continuous).
Categorical Variable: A variable is categorical if each observation belongs to one of a set of categories. Examples: gender (male or female); religious affiliation (Catholic, Jewish, ...); type of residence (apartment, condo, ...); belief in life after death (yes or no).
Quantitative Variable: A variable is called quantitative if observations on it take numerical values that represent different magnitudes of the variable. Examples: age, number of siblings, annual income.
Discrete Quantitative Variable: A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3, and so on. Do discrete variables have a finite number of possible values? Examples: number of pets in a household, number of children in a family, number of foreign languages spoken by an individual.
Continuous Quantitative Variable: A quantitative variable is continuous if its possible values form an interval. Continuous variables have an infinite number of possible values. Examples: height, weight, age, blood pressure. Note: many continuous variables are treated as discrete because of limits on how precisely we can measure them.
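The three variable types above can be illustrated with a minimal Python sketch. It is a rough heuristic and not a rule from the slides: it assumes that text values indicate a categorical variable, whole numbers a discrete one, and decimal numbers a continuous one. The function name `classify_variable` and all data values are hypothetical.

```python
def classify_variable(values):
    """Guess a variable's type from its recorded values (heuristic only)."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(is_number(v) for v in values):
        return "quantitative (continuous)"
    return "unknown"


# Hypothetical recorded values for three variables
marital_status = ["single", "married", "divorced"]  # one of a set of categories
number_of_pets = [0, 2, 1, 3]                        # separate counts
height_cm = [162.5, 171.0, 180.3]                    # values on an interval

for name, values in [
    ("marital_status", marital_status),
    ("number_of_pets", number_of_pets),
    ("height_cm", height_cm),
]:
    print(name, "->", classify_variable(values))
```

The heuristic fails for numeric codes such as ZIP codes, which are categorical even though they are stored as numbers, so judgment about what the values mean is still required.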
Class Problem: Identify each variable as either categorical or quantitative. 1. Number of siblings in a family 2. County of residence 3. Distance (in miles) of commute to school 4. Marital status
Class Problem: Identify each of the following variables as continuous or discrete. 1. Length of time to take a test 2. Number of people waiting in line 3. Number of speeding tickets received last year 4. Your dog's weight
Classroom Project: Within this classroom, find three examples of each category of variable. Remember: a variable can be classified as either categorical or quantitative (discrete or continuous).
Define Statistics: Statistics is the art and science of designing studies, analyzing the data that those studies produce, and translating data into knowledge and understanding of the world around us.
Who Uses Statistics? Marketing, medical studies, government, sports, social scientists. How can statistics help your field of study or interest?
Three Aspects of Statistics: Design (planning how to obtain data), Description (summarizing the data), and Inference (making decisions and predictions).
Examples of Design: Design questions ask how to conduct the experiment, or how to select people for the survey, to ensure trustworthy results. Examples: planning the methods of data collection to study the effects of Vitamin E on athletic strength; for a marketing study, deciding how to select people for your survey to provide proper coverage.
Descriptive Statistics: Methods for summarizing data. Summaries usually consist of graphs and numerical summaries of the data (e.g., average, percentage). Examples: A meteorologist constructs a graph showing the total precipitation in Champaign, IL for each month of 2014. The average age of the students in a statistics class is 25 years.
Descriptive Statistics: The main purpose of descriptive statistics is to reduce the data to simple summaries without distorting or losing much information.
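As a small illustration of reducing data to simple summaries, the Python sketch below computes an average and a percentage. The ages and pet-ownership answers are hypothetical values made up for the example, not data from the course.

```python
from statistics import mean

# Hypothetical data for five students in a statistics class
ages = [22, 24, 25, 27, 27]
owns_pet = [True, False, True, True, False]

# Numerical summaries: an average and a percentage
average_age = mean(ages)
percent_with_pet = 100 * sum(owns_pet) / len(owns_pet)

print(f"Average age: {average_age} years")
print(f"Students with a pet: {percent_with_pet:.0f}%")
```

Two numbers now stand in for ten recorded values, which is the kind of reduction the slide describes.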
Example of Descriptive Statistics: Types of U.S. Households [figure not reproduced]
Inferential Statistics: Methods of making decisions or predictions about a population based on information obtained from a sample. Example: a medical study finds that a new treatment is significantly more effective than the old treatment. An important aspect of statistical inference involves reporting the likely precision of a prediction.
Inferential Statistics Example: "In Florida and Wisconsin, where Mr. Obama had led Mr. Romney by six percentage points in polls conducted before the selection of Mr. Ryan, the race is essentially tied. Mr. Obama is ahead in Florida by 49 percent to 46 percent and in Wisconsin by 49 percent to 47 percent, differences within the polls' margin of sampling error of plus or minus three percentage points." (The New York Times, August 23, 2012) http://www.nytimes.com/2012/08/23/us/politics/polls-say-medicare-iskey-issue-in-3-swing-states.html?_r=1&ref=newyorktimespollwatch
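The "plus or minus three percentage points" in the excerpt is a margin of sampling error. A common approximation for a sample proportion at the 95% level is 1.96 times the square root of p(1 - p)/n. The slides give neither this formula nor the polls' sample sizes, so the Python sketch below is only an illustration: the function name `margin_of_error` is hypothetical, and n = 1,067 is an assumed sample size chosen because it yields a margin of about three points.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n = 1067        # ASSUMED sample size; the article excerpt does not state it
p_hat = 0.49    # Mr. Obama's reported share in Florida

moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"Margin of error: {100 * moe:.1f} percentage points")
print(f"Plausible range: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the plausible range for 49 percent reaches down to about 46 percent, which is why the excerpt calls the race essentially tied.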
A social scientist is interested in studying the drinking habits of college students. She randomly picks 1,000 students from the college directory using an automated computer system. What aspect of statistics has she just completed? a) Design b) Description c) Inference
A social scientist is interested in studying the drinking habits of college students. She asks each student how many drinks they had last Saturday night. She finds the average for 1,000 randomly selected students and creates a graph to display the results. What aspect of statistics has she just completed? a) Design b) Description c) Inference
A social scientist is interested in studying the drinking habits of college students. She asks each student how many drinks they had last Saturday night. From a sample of 1,000 students she is able to conclude that the average amount of alcoholic beverages consumed by all students in the university last Saturday night is most likely between 0.3 and 2.3 drinks. What aspect of statistics has she just completed? a) Design b) Description c) Inference
Subjects: The entities that we measure in a study. Examples: individuals, schools, rats, counties. What can be the subjects in your field?
Population and Samples: The population is all subjects of interest; the sample is the subset of the population for whom we have data. Researchers often want to answer questions about some large group of individuals (this group is called the population). Often researchers can't measure all individuals in the population, so they measure a subset of individuals that is chosen to represent the entire population (this subset is called a sample). The researchers then use statistical techniques to draw conclusions about the population based on the sample.
Example: The Sample and the Population for an Exit Poll. In California in 2003, a special election was held to consider whether Governor Gray Davis should be recalled from office. An exit poll sampled 3,160 of the 8 million people who voted. Define the sample and the population for this exit poll. The population was the 8 million people who voted in the election. The sample was the 3,160 voters who were interviewed in the exit poll.
A social scientist is interested in studying the drinking habits of college students. She randomly picks 1,000 students from the University of Florida phone directory using an automated computer system. What is the population? a) the 1,000 students sampled b) all students at the University of Florida c) all students in universities across the U.S.
Sample Statistics and Population Parameters: A parameter is a numerical summary of the population. Examples: the mean number of cigarettes smoked per week by all teenagers (µ); the proportion of all teenagers who smoked in the last month (p).
Sample Statistics and Population Parameters: A statistic is a numerical summary of a sample taken from the population. Examples: the mean number of cigarettes smoked per day by a sample of teenagers (x̄); the proportion of a sample of teenagers who smoked in the last month (p̂).
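The distinction between a parameter and a statistic can be sketched with a simulation in Python. Everything here is hypothetical: the simulated population of 10,000 teenagers, the values drawn for it, and the sample size of 200. In practice the population values are not available, which is why the parameter is usually unknown.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: cigarettes smoked per week by 10,000 teenagers
population = [random.choice([0, 0, 0, 0, 5, 10, 20]) for _ in range(10_000)]

# Parameters: numerical summaries of the whole population
mu = mean(population)
p = sum(x > 0 for x in population) / len(population)

# Statistics: the same summaries computed from a sample of 200
sample = random.sample(population, 200)
x_bar = mean(sample)
p_hat = sum(x > 0 for x in sample) / len(sample)

print(f"Parameter mu  = {mu:.2f}   statistic x_bar = {x_bar:.2f}")
print(f"Parameter p   = {p:.3f}  statistic p_hat = {p_hat:.3f}")
```

The statistics land near the parameters but generally do not equal them, because they are based on only part of the population.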
In 2006 the GSS asked 2,815 participants whether they favored or opposed the death penalty for those convicted of murder, and 67% of those surveyed stated that they were in favor of the death penalty. What are the parameter and the statistic? a) parameter = 67% statistic = 67% b) parameter = 67% statistic = unknown c) parameter = unknown statistic = 67% d) parameter = unknown statistic = unknown
Variability: Measurements may vary from subject to subject, and measurements may vary from sample to sample. Because sample-to-sample variation tends to shrink as the sample size grows, predictions tend to be more accurate for larger samples.
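The link between sample size and sample-to-sample variation can be checked with a short simulation. This Python sketch rests on assumptions: the population of heights is simulated, and the helper name `spread_of_sample_means`, the sample sizes 25 and 400, and the 500 repetitions are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(2)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: heights in cm of 50,000 people
population = [random.gauss(170, 10) for _ in range(50_000)]


def spread_of_sample_means(n, repeats=500):
    """Draw `repeats` random samples of size n and measure how much
    the sample means differ from one another."""
    means = [mean(random.sample(population, n)) for _ in range(repeats)]
    return stdev(means)


small = spread_of_sample_means(25)
large = spread_of_sample_means(400)

print(f"Spread of sample means, n = 25:  {small:.2f} cm")
print(f"Spread of sample means, n = 400: {large:.2f} cm")
```

The means of the larger samples cluster more tightly around the population mean, so a prediction based on one large sample tends to be closer to the truth.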
Randomness: In simple random sampling, each subject in the population has the same chance of being included in the sample. Randomness is crucial to ensuring that the sample is representative of the population so that powerful inferences can be made.
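Simple random sampling can be sketched with Python's standard `random` module, whose `sample` method draws without replacement and gives every item the same chance of selection. The directory of 5,000 student IDs and the seed value are hypothetical.

```python
import random

# HYPOTHETICAL student directory with 5,000 entries
directory = [f"student_{i:04d}" for i in range(1, 5001)]

rng = random.Random(2024)  # seeded generator so the draw is repeatable

# Draw 1,000 students; no student can be picked twice
sample = rng.sample(directory, k=1000)

print(len(sample), "students selected")
print("First five:", sample[:5])
```

Letting the generator choose, rather than a person, removes the chance that the researcher's own preferences shape who ends up in the sample.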
Suppose that two researchers both randomly sample 400 different students at the University of Miami and ask the students if they have consumed any alcoholic beverages in the past week. From their sample, the researchers each compute the proportion of students that consumed an alcoholic beverage in the past week. Are the two proportions from the two samples the same? a) Yes, both researchers drew from the same population. b) Probably not, samples vary.
What Role Do Computers Play in Statistics? Using technology: you, not technology, must select valid analyses. Data files: large sets of data are typically organized in a spreadsheet format known as a data file. Each row contains measurements for a particular subject, and each column contains measurements for a particular characteristic. Databases: an existing archived collection of data files. Sources should always be checked for reliability.
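The row-and-column layout of a data file can be illustrated by reading a small one in Python with the standard `csv` module. The file contents, column names, and values below are hypothetical; the data are held in a string so the sketch runs without a file on disk.

```python
import csv
import io

# HYPOTHETICAL data file: one row per subject, one column per variable
data_file = """subject,height_cm,marital_status
1,162.5,single
2,171.0,married
3,180.3,single
"""

rows = list(csv.DictReader(io.StringIO(data_file)))

# Reading across a row gives every measurement for one subject
first_subject = rows[0]

# Reading down a column gives one variable for every subject
heights = [float(row["height_cm"]) for row in rows]

print("First subject:", first_subject)
print("Heights:", heights)
```

Note that `csv` reads every value as text, so a quantitative column such as `height_cm` has to be converted to numbers before it can be analyzed.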
Suppose that you took a sample of three people and asked them how much they spent on lunch and whether they had a vegetarian lunch or not. Which of the following is the correct way to create a data file?
a) Jane Jo Juan 10 8 8 y n n
b) Jane Jo Juan 10 8 8 y n n
c) Name   Amount Spent on Lunch   Vegetarian or not
   Jane   10                      y
   Jo     8                       n
   Juan   8                       n
Key Points Revisited: 1. Data 2. Variable 3. Types of data 4. Define Statistics 5. Three main aspects of Statistics 6. Descriptive Statistics vs. Inferential Statistics 7. Subjects 8. Population vs. samples 9. Randomness and variability