
1 Introduction

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
H.G. Wells

Learning Objectives

At the end of this chapter, students should:
- Have an understanding of the basic terminology used in statistics.
- Realize the important role statistics plays in our lives.
- Establish a working terminology that will allow for technical discussion.
- Understand the different scales used in the measurement of data.
- Understand the different types of variables used in statistical analysis.

A fair question to ask is: why should we study statistics? Should we study it simply because it is a required course for many academic programs, or is there more to the story? Statistical techniques make up the backbone of data analysis, and the results of data analysis surround us in daily life. They are literally everywhere we look, yet we seldom recognize them.

When was the last time you took medication for a headache? Before the medication was placed on the shelves, tests were performed in which some persons received the medication and others received a placebo. Results regarding the level of relief were recorded and compared, along with any side effects.

Political polls are commonly conducted to give candidates valuable information about how their campaigns are going. Registered voters are contacted and asked whom they intend to vote for in an upcoming election, along with their views on other political issues of interest. The results are tallied, and winners are often predicted.

Various automobile components are tested for durability, comparing strength levels between different manufacturing processes in an attempt to maximize durability while minimizing costs.

New California Lotto games are developed and placed in circulation, giving us all the hope of winning millions. A tremendous amount of calculation, simulation and marketing research is completed long before any of these new games are released.
Did you hear the newly released unemployment rates on the morning news today? Residences are contacted and queried regarding the number of working adults. The results are compiled, and the current unemployment rate is estimated and reported.

An article in The Bakersfield Californian (May 23, 1998) reported that current tests for antibiotics in milk from cows treated for illness are inaccurate 20% of the time. This conclusion was reached only after data were collected and analyzed. A study of two populations of killer whales (Orcinus orca) was conducted to determine whether a live harvest of young whales from one of the populations had a significant effect on that population's birth rate. The time interval between successive births was measured for each population. The average interval was 4.9 years for one population and 7.2 years for the other. All of these processes depend heavily on proper statistical procedures to help ensure reliable results.

We will be taking a close look at many of the statistical procedures that bring researchers to the conclusions you read every day in the newspaper or hear on television or radio. The techniques we will be discussing are not watered down just because this is an introductory course. Rather, they are real: they are the actual techniques used by professional statisticians and data analysts at all levels of government and private research.

In addition to the direct use of data analysis techniques, there are many indirect uses. For example, the study of statistics will teach you general problem-solving strategies while sharpening your critical thinking skills, which will be of tremendous benefit to your career advancement. Working your way up the corporate ladder is nearly impossible without demonstrating your ability to solve problems. The problems we will be working in this class are not the typical repetitive "solve for x" problems. Rather, statistical problems are, more often than not, multiple-step problems, many of which are essentially mini-projects in and of themselves.
An important goal of this course is to help you develop a project-oriented, critical-thinking mindset, not to magically transform you into a statistician capable of solving the world's problems. More realistically, the purpose of this course is to familiarize you with both the logic and the mechanics of data analysis. This will increase your critical thinking skills and allow you to better understand what you read in the newspaper and hear on the radio or television, making you an informed consumer. For those of you who will be completing research as part of your degree requirements, this course will provide the foundation you need to complete a technical report summarizing your research. Should your research require statistical methods not covered in this course, you will understand the basic principles of statistics well enough to work with a statistical consultant if necessary.

1.1 Basic Terms and Definitions

As with any specialized field, statistics has its own language. Before we can begin our study of statistical techniques, we must first learn some of that language and the basic principles underlying it.

Population: the collection of all objects of interest in a statistical study.

Typically, when conducting research, we have a particular population that we are interested in, and usually that population is much too large to examine each and every object. Consider the milk study that found current testing was inaccurate 20% of the time. This suggests that, of all cows treated for an illness with an antibiotic such as penicillin or amoxicillin, current early testing practices miss the residual amounts of antibiotic 20 out of 100 times. The population of interest is all cows (possibly just those in California or the United States) that have been treated for an illness with an antibiotic. Is it reasonable to assume the researchers actually used some form of advanced testing procedure on every cow being treated with an antibiotic? Of course not. The population is enormous, which by itself makes testing every cow impossible.

Census: a complete enumeration of the entire population; the information of interest is recorded from every object in the population.

To take a census in the cow antibiotic study would be to test each and every cow that has been given an antibiotic. The sheer size of the population makes a census impossible, not to mention the cost: every test costs someone money. Financial considerations are very important when taking either a sample or a census. A census is always desirable when possible, but it is seldom realistic because of financial, time and/or physical constraints. If it were possible to take a true census, we would be able to remove all the guesswork from data analysis.

Sample: a subset of the population, selected in such a way that it is believed to be representative of the population.

Even though it is impossible to test every cow, it is possible to obtain a sample from that population. It is of utmost importance that the sample be a random representation of the population.
Randomization techniques that help ensure the population is appropriately represented will be discussed later. We typically take a sample because the population is much too large to obtain data from every one of its objects. A large population is not, by itself, a reason to restrict (reduce) the population of interest. Suppose we were interested in all the voters in California. We could restrict the population to only those voters in a particular county. That would make the process more manageable, but could the results be generalized to all of California? No, they could not. We will discuss why such a generalization is not appropriate in a later chapter.

Variable: a characteristic of interest about each individual element of a population or sample.

In the milk example, the variable is simply whether or not trace amounts of the antibiotic were found. A variable of interest could be a person's weight or age, the length of time it takes to complete a specified task, the presence or absence of a particular condition or disease, the score on an exam, or any other quantifiable measure of interest. A variable may also be qualitative, in that it describes a quality such as sex, race, military rank or color.
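Randomization techniques are deferred to a later chapter, but the basic idea of a simple random sample can be previewed in a few lines. The following is a minimal Python sketch. It assumes the population can be listed in full. The cow ID numbers, the population size of 10,000 and the sample size of 95 are hypothetical values chosen only for illustration. The standard library's `random.Random.sample` draws elements without replacement, so no cow is selected twice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct elements drawn at random, without replacement.

    Passing a seed makes the draw reproducible, which is handy for checking work.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: ID numbers for 10,000 cows treated with an antibiotic.
# A census would test all 10,000; a sample tests only the selected subset.
population = list(range(1, 10_001))
sample = simple_random_sample(population, 95, seed=1)

print(len(population), len(sample))  # 10000 95
print(sorted(sample)[:5])            # the five smallest selected IDs
```

Restricting `population` to one convenient group before sampling (the "one county" shortcut above) would still run, but the sample could then only represent that group.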

Data (singular): the value of the variable associated with one element of a population or sample. The value can be numeric, a word, or a symbol.
Data (plural): the entire set of values collected from each of the elements belonging to the sample.

Based on the data collected from the sample, we then infer what is happening within the population. The term data can be either singular or plural. As an example, researchers may have tested the milk and simply recorded whether or not traces of an antibiotic were observed. For a particular test, from a particular cow, the result would have been either positive or negative. That individual test result is an example of the singular use of the term data, and it is often referred to as a data point. If ninety-five cows were tested, then the complete list of results is an example of the plural use of the word data.

Parameter: a numerical value summarizing the entire population.

The characteristics of a population are individually referred to as parameters. The value of a parameter is typically unknown to the data analyst. The true value does exist; however, it is known only to Mother Nature. It is the value of the parameter that interests us most, but we usually cannot obtain it. Thus, we take a sample, collect data from that sample and calculate a sample statistic.

Sample statistic: a numerical value summarizing the sample data.
In fact, if you can calculate a value from the sample data, then it is a statistic. It may or may not be very meaningful; nonetheless, it is a statistic. To keep the concepts of parameters and statistics clear, remember that a parameter is a fixed value that only Mother Nature knows, whereas the value of a statistic varies from sample to sample. Statistics are used to estimate parameters and to test hypotheses about a believed parameter value.

Consider once again the cow antibiotic study. The population parameter of interest is the true proportion of inaccurate antibiotic tests in milk cows. The sample statistic is the 20% calculated by the research team from a sample of milk cows that were treated with antibiotics, so 20% is their estimate of the true proportion of inaccurate tests. The research team is almost certainly wrong, but hopefully close.

As another example, suppose we were interested in the average time it takes for a migraine headache to subside after a new experimental treatment is administered. The population of interest is all persons who suffer from migraine headaches, and the variable of interest is the time it takes for a migraine to subside. The creators of the new treatment would like to get it into the hands of everyone currently suffering from migraines, but that is simply not possible. The parameter of interest is the true average time it takes a migraine to subside. The true value exists, but it is a number you could never truly discover. However, if you properly take a sample, perhaps 50 people suffering from migraines who try the new treatment, you should be able to get a reasonable estimate of the true average. The average time calculated from the sample is a sample statistic, and it is the statistic you will use to make inferences regarding the true average.
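To see why a parameter is fixed while a statistic varies, the following Python sketch builds a hypothetical population of 100,000 antibiotic tests. Exactly 20% are coded 1 (inaccurate) and the rest are coded 0 (accurate). Knowing the parameter in advance is never possible in a real study. It is assumed here only so the sample statistics can be compared against it. The sample size of 95 and the seed are arbitrary choices.

```python
import random

def proportion(outcomes):
    """Proportion of 1s in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)

# Hypothetical population: 1 = inaccurate test, 0 = accurate test.
N = 100_000
population = [1] * (N // 5) + [0] * (N - N // 5)

# The parameter: computed from every element, so it never changes.
parameter = proportion(population)

# Statistics: each new random sample of 95 tests gives its own estimate.
rng = random.Random(42)
sample_proportions = [proportion(rng.sample(population, 95)) for _ in range(5)]

print(parameter)  # 0.2
for p in sample_proportions:
    print(round(p, 3))  # typically near 0.2, but varying from sample to sample
```

Rerunning with a different seed changes the five sample proportions but never the parameter. That is the distinction between a parameter and a statistic.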

1.2 What is the Study of Statistics?

The field of study known as statistics is typically described as the collection, analysis and summary of data. It is divided into two main areas: descriptive statistics and inferential statistics. We will be working a great deal with both. Most, but by no means all, data analysis projects include both a descriptive section and an inferential section.

Statistics (the study of): the collection, analysis, summary and presentation of data. The word derives from the Latin status, meaning "state" or "condition."

1.2.1 Descriptive Statistics

Descriptive statistics are just what the name implies: they describe the state of the data that has been collected. The description may include such things as the sample size, mean, median and standard deviation. If the data is based on information collected from people, the descriptive portion of a report may outline the number (or percent) of people examined who were male or female, along with the ages and ethnicity of the persons tested, if such information is pertinent to the topic being discussed. Much of what we see reported in the news media is descriptive statistics; graphs in newspapers and magazines typically describe the data that was collected. Those values are then often used to make inferences about the population from which they were obtained.

1.2.2 Inferential Statistics

Inferential statistics take the descriptive statistics and use them to make conjectures (inferences) about the population that the sample represents. There are many ways to make inferences about a population. As an example, the cow study claimed 20% of the milk cow antibiotic tests were inaccurate.
While it may be true that 20% of the tests sampled were inaccurate, that does not necessarily mean 20% of all milk cow antibiotic tests are inaccurate. However, that is the inference being made, and it is typical of how sample statistics are used to infer back to the population of interest. Later, we will take a close look at assessing how accurate the 20% estimate probably is and at how to put error bounds on it, such as 20% plus or minus 3%. In reality, inferential statistics is no more than educated guessing. We will be looking at many techniques we can use to ensure our educated guess is in fact a highly educated guess, not just a shot in the dark, with mathematics to back up both our best guess and the decisions based on it.

1.3 Level of Measurement: Nominal, Ordinal, Interval and Ratio Scales

Understanding the various levels of measurement is important in data analysis because many of the techniques we will be discussing are only appropriate for specific measurement scales. It is important to be able to assess your needs and apply the correct tool. You cannot simply grab an analysis technique from your toolbox, blindly apply it to a given situation and expect the result to be accurate. You must ensure the technique you choose is appropriate for the situation you are working with.

Assessing the appropriateness of a particular technique for a given situation begins with an understanding of the level of measurement used in the data collection.

1.3.1 Nominal Scale

The most basic and simple form of measurement is classification. Variables that categorize without any form of natural ordering are known as nominal scale variables. Examples of nominal scale variables are religious preference, political party, hair color, ethnicity and gender. Typical mathematical operations have no meaning with nominal scale variables. What does it mean to add, subtract, multiply or divide a brunette and a blonde? The answer is simple: it has no meaning.

Quite often we code data with numerical representations. As an example, if I were recording hair color, I might code blonde hair as 1, brown as 2, red as 3 and other as 4. Even though the data consists of numbers, the numerical representations are still nothing more than labels. Consider telephone numbers. The data collected would be the actual telephone number, or perhaps only the area code. The seven-digit telephone number, although a number, is still nothing more than a label. There is no natural ordering, and basic arithmetic operations have no meaning. One could try to argue that the ten-digit telephone number (area code + phone number) has a natural ordering based on geography, for instance if smaller area codes were concentrated on the East Coast and larger ones on the West Coast. Although such an argument can be made, it would not be a very strong one. The type of physical impairment a patient has is also a nominal scale variable, as is the political party to which a person belongs.

1.3.2 Ordinal Scale

It is often possible to order categories with respect to the degree to which they possess a certain characteristic. Such categorical variables are referred to as ordinal scale variables.
Ordinal scale variables possess a natural ordering. Examples include Likert scales (commonly used in surveys, where 1 may represent strongly disagree, 2 disagree, 3 no opinion, 4 agree and 5 strongly agree), the days of the week, the months of the year, exam grades (A, B, C, D, F), socioeconomic status (upper, middle, lower) and military rank (private, private first class, lance corporal, corporal, sergeant, ...). Each of these examples possesses a natural ordering. A grade of A on an exam ranks above a grade of B, which in turn ranks above a grade of C. Yet, although this natural ordering exists, a B plus a C is certainly not equal to an A: basic mathematical operations still have no meaning. Likewise, Monday + Tuesday is not equal to Thursday. Street addresses are also ordinal: smaller street numbers are at one end of the street and larger numbers at the other. Again, what meaning does it have to add two street numbers together?

1.3.3 Interval Scale

Ordinal scales allow ranking with respect to the degree to which categories share a characteristic, but distances between them do not have meaning. Interval scale variables, in contrast, require a physical unit of measure, so distances do have meaning. The most obvious interval scale variable is temperature (measured in degrees Fahrenheit or Celsius). Notice that temperature does not have a natural zero, in that it does not have a natural starting point: the placement of 0 degrees is quite arbitrary. Even so, distances between two measurements have meaning. The difference between 25 degrees and 30 degrees is 5 degrees, and that 5 degrees has a meaning that is easily understood; the difference is an exact measurement. Addition and subtraction make perfect sense with interval scale variables, since 30 degrees plus 5 degrees really is 35 degrees. Likewise, 55 degrees minus 50 degrees really is 5 degrees. On the other hand, division does not have meaning, since 60 degrees is not twice as hot as 30 degrees. We therefore say the ratio 60/30 is not preserved: although 60/30 = 2, 60 degrees divided by 30 degrees lacks meaning.

Continuing with the idea of a natural starting point, historical time is also an interval scale variable. The distance between any two dates can be measured exactly, so subtracting dates seems reasonable; however, the ratio is not preserved. If we contrast an event that occurred in the year 1800 with one that occurred in the year 600, then 1800 - 600 = 1200 years, which is very reasonable. But what does the ratio 1800/600 = 3 mean? It only has meaning if we can identify the year 0 as an absolute starting point for time. In some contexts this may be reasonable, but generally speaking the year 0 is not a natural starting point, and identifying it as such is rather arbitrary.

Suppose a survey designed to measure parenting skills produces a score from 0 to 10. The scores are interval in that, when comparing two parents, say one with a score of 8 and one with a score of 2, 8 - 2 really is 6: there really was a difference of 6 between the two scores. However, although 8/2 = 4, that does not mean the parent who scored 8 is four times as good a parent as someone who scored 2.

1.3.4 Ratio Scale

Ratio scale variables are very similar to interval scale variables. The key differences are that (a) the ratio is preserved when working with a ratio scale variable and (b) a ratio scale variable has a meaningful zero, that is, a natural and absolute starting point. Consider the time it takes for two people to complete a task.
If it takes the first person 8 minutes and the second person 4 minutes, the ratio of their times, 8/4 = 2, has meaning: it really did take the first person twice as long to complete the task as the second person. Likewise, there is an absolute starting point of zero minutes, the moment at which timing began. You may want to think of this situation as containing a "ready, set, go!" component.

Rates are commonly ratio scale variables. Crime rate is one example. The crime rate for a city is reported as the number of crimes committed in the city divided by the number of people in the city, and the ratio of crimes committed to the size of the city's population has meaning. Unemployment rates, like all other rates, are calculated in a similar way. Measurements of length are also ratio scale variables, as are measurements of weight, where a value of zero really means there is nothing there. When talking about weight, 10 pounds really is twice as heavy as 5 pounds, so the ratio 10/5 = 2 has meaning.

1.4 Quantitative vs. Qualitative Variables

A quantitative variable (or numerical variable) is a variable that quantifies an element of a population. When we think of a quantitative variable, we are thinking in terms of an interval or ratio scale of measurement, on which arithmetic operations have meaning. As an example, suppose we recorded the weight of everyone in the room (156 pounds, 205 pounds, 90 pounds, ...). We could then obtain an average weight, which would be a meaningful measure. Weight is a quantitative variable. If, on the other hand, we recorded everyone's weight as being either heavy, medium or underweight, then we have categorized weight, making it a qualitative variable.
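The weight example can be sketched in Python. Averaging the recorded weights is meaningful, but once the weights are grouped into labels, only counting makes sense. The sketch uses the three weights listed in the text. The cutoffs for underweight, medium and heavy are hypothetical, since the text does not give any.

```python
from collections import Counter
from statistics import mean

# The three weights (in pounds) listed in the text.
weights_lb = [156, 205, 90]

# Quantitative: arithmetic, including averaging, is meaningful.
print(round(mean(weights_lb), 1))  # 150.3

def categorize(weight_lb):
    """Group a weight into a label; the cutoffs are hypothetical."""
    if weight_lb < 120:
        return "underweight"
    if weight_lb < 180:
        return "medium"
    return "heavy"

# Qualitative: after categorizing, we can count labels but not average them.
labels = [categorize(w) for w in weights_lb]
print(labels)           # ['medium', 'heavy', 'underweight']
print(Counter(labels))  # how many people fall in each category
```

Note that the categorized version discards information. Two people labeled "medium" may differ by almost 60 pounds under these cutoffs.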
Quantitative variable (or numerical variable): a variable that quantifies (measures how much of) an element of a population.
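A quantitative variable is measured on an interval or ratio scale. One way to see the difference between those two scales is to change units and check which comparisons survive. The sketch below assumes the temperatures in Section 1.3.3 are in degrees Celsius; the text does not say. It converts them to Fahrenheit, which both rescales and moves the zero point. It also converts the weights from Section 1.3.4 from pounds to kilograms, which only rescales. Differences carry over consistently in both cases, but only the weight ratio stays the same.

```python
def celsius_to_fahrenheit(c):
    """Interval-scale change of units: rescale AND shift the zero point."""
    return c * 9 / 5 + 32

def pounds_to_kilograms(lb):
    """Ratio-scale change of units: rescale only; zero stays zero."""
    return lb * 0.45359237

# Temperature (interval): the 60/30 "ratio" depends on the units chosen.
ratio_c = 60 / 30
ratio_f = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)
print(ratio_c, round(ratio_f, 3))  # 2.0 1.628

# ...but differences are meaningful: a 5-degree C rise is a 9-degree F rise.
print(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(25))  # 9.0

# Weight (ratio): 10 lb is twice 5 lb in any unit.
print(pounds_to_kilograms(10) / pounds_to_kilograms(5))  # 2.0
```

Because the temperature ratio changes when only the units change, it cannot describe anything real about the temperatures themselves. The weight ratio does not have that problem.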

Qualitative variable (or attribute variable): a variable that categorizes or describes an element of a population.

When we think of a qualitative variable, we are talking about a nominal or ordinal scale variable. Arithmetic operations such as addition, subtraction, multiplication, division and averaging have no meaning. As an example, suppose we recorded the eye color of everyone in a room. Does the idea of an average eye color have any meaning? Clearly not. Eye color is an example of a qualitative variable. Typically, if the variable has a word as its value, it will be a qualitative variable.

It is very common to code qualitative data values with numerical values. As an example, suppose we were looking at hair color and decided to code black hair as 1, red as 2, blond as 3, gray as 4 and all others as 5. Because we are using numbers, we could compute an average, but would that average have any meaning? It would, by definition, be a statistic, since it was calculated from the sample data, but its value would be completely useless. Using numbers to represent categories may seem to transform a nominal scale variable into an ordinal one, but it does not. The numeric codes are simply for ease of use: computers and calculators typically handle numbers more readily than letters or words, which is the primary reason for using numbers to represent categories. Don't allow yourself to believe you are working with ordinal, interval or ratio scale data just because you see numbers. Always ask yourself what the numbers represent, and then decide which scale you are working with. Another good example of a nominal scale (qualitative) variable that is identified by numbers is a Social Security number.
Although a social security number consists of digits, it is used only to identify a person. Adding, subtracting, multiplying, or dividing social security numbers has no meaning. The same can be said about telephone numbers and zip codes.

1.5 Discrete and Continuous Variables

Beyond the measurement scales of nominal, ordinal, interval, and ratio, we can also classify variables as discrete or continuous. Qualitative variables are discrete, whereas quantitative variables may be discrete or continuous.

1.5.1 Discrete Variables

A discrete variable is a variable that can assume a countable number of values. We must be careful with our terminology here, because the term countable has a very specific meaning in mathematics. We won't go into great depth about its exact mathematical meaning. Suffice it to say that if we can actually make a list of all possible outcomes, then we are dealing with a discrete variable. For example, the number of students graduating from the local high school district each year is a discrete variable: there is an actual number of graduates that you can identify each year. Likewise, the number of new laws passed each year in the state of Michigan is a discrete variable, since there is a countable number of new laws that can be identified each year.

An effective rule for determining whether a particular variable is discrete is to pick two possible values as close together as possible. Consider once again the number of laws passed in Michigan last year. The number of laws may have been 97 or 98. Now choose a value between 97 and 98, such as 97.6, and ask yourself whether that is a possible value for this variable. It is not: either a law passed or it did not, so 97.6 is not a possible value, and the variable is discrete. If 97.6 were a possible value, you would then choose another value between 97.6 and 98, perhaps 97.9. If that value were also possible, you would choose another between 97.9 and 98. If you can continue this little game forever, then you are not dealing with a discrete variable.
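The "little game" described above can be sketched in Python. This is an illustration under simplifying assumptions: we always choose the midpoint of the current interval as the value in between, we stop after a fixed number of rounds (a program cannot literally play forever), and the helper names is_possible_count, is_possible_length, and between_game are hypothetical. Python's Fraction type keeps the arithmetic exact so that rounding error cannot blur the comparison.

```python
from fractions import Fraction

def is_possible_count(value):
    """A count (laws passed, graduates) must be a non-negative whole number."""
    return value >= 0 and value == int(value)

def is_possible_length(value):
    """A length can be any positive number (hypothetical check for contrast)."""
    return value > 0

def between_game(is_possible, low, high, rounds=5):
    """Play the 'value in between' game for a fixed number of rounds.

    Each round tries the midpoint of (low, high). If the midpoint is not a
    possible value, the game ends; otherwise it becomes the new lower end.
    Returns the list of midpoints that were possible values.
    """
    found = []
    for _ in range(rounds):
        mid = (low + high) / 2
        if not is_possible(mid):
            break
        found.append(mid)
        low = mid  # keep narrowing toward `high`
    return found

laws = between_game(is_possible_count, Fraction(97), Fraction(98))
lengths = between_game(is_possible_length, Fraction(97), Fraction(98))
print("laws:", laws)  # laws: []
print("lengths:", [float(v) for v in lengths])  # [97.5, 97.75, 97.875, 97.9375, 97.96875]
```

For the number of laws passed, the very first midpoint (97.5) is not a possible value, so the game ends at once and the variable is discrete. For a length, every midpoint is possible for as many rounds as we care to run.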

A word of caution is in order here. There is a tremendous difference between a variable and our ability to measure it. In reality, we live in a discrete world: humans do not have the ability to measure continuously. Do not confuse your ability to measure with the values that are actually possible. Consider a person's height. We may have the ability to measure to the nearest one-millionth of an inch, but (a) it is not practical and (b) measuring to the nearest one-millionth of an inch is still not a continuous measurement.

1.5.2 Continuous Variables

A continuous variable is a variable that can assume an uncountable number of values. Again, the term uncountable has a specific meaning in mathematics. In this context, we are considering a variable that can take an infinite number of values between any two values. Consider the length of a rope. You may measure the length with a tape measure and state that the rope is five feet long. I can guarantee that the statement will be wrong. The rope might be five feet long to the best of your ability to measure it, but it is essentially impossible for the length to be exactly 5 feet. If we had the ability to measure more accurately, we might find the length to be 5.0017 feet. With an even more accurate measuring device, we might find the length to be 5.001789300152 feet. With a still more accurate device, we might find the length to be... and this can go on forever. Unlike with a discrete variable, if we pick two possible values very close together, we can always find another possible value between them. A reasonable way to decide whether a particular variable is discrete or continuous is to ask yourself: given two values as close together as seemingly possible, can another value fall in between? If yes, you have a continuous variable. Otherwise, you have a discrete variable.
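The difference between a continuous variable and our ability to measure it can be simulated. In this Python sketch we pretend, purely for illustration, that the rope's true length is the 5.001789300152 feet mentioned above, and we model a measuring device as one that truncates the true length to its resolution; the function name measure and the truncation model are assumptions, not a description of any real instrument. Each device can only report values on its own finite grid, so the recorded data are discrete even though length itself is continuous; a finer device can reveal more digits.

```python
from decimal import Decimal, ROUND_DOWN

# Assumed "true" length of the rope in feet (the illustrative value from the text).
TRUE_LENGTH = Decimal("5.001789300152")

def measure(length, resolution):
    """Simulate a device that reports a length truncated to its resolution.

    The truncation model is an assumption made for this illustration.
    """
    return length.quantize(resolution, rounding=ROUND_DOWN)

# Coarse to fine devices: whole feet, ten-thousandths, trillionths of a foot.
resolutions = [Decimal("1"), Decimal("0.0001"), Decimal("0.000000000001")]
for resolution in resolutions:
    reading = measure(TRUE_LENGTH, resolution)
    print(f"resolution {resolution:f} ft -> reading {reading} ft")
```

The tape measure reports 5 feet, the finer device reports 5.0017 feet, and only the finest simulated device recovers every digit of the assumed true length.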
Consider a variable that records the number of cracked eggs in a carton containing a dozen eggs. We can have 1 cracked egg or 2 cracked eggs, but we cannot have 1.5 cracked eggs: either an egg is cracked or it is not. The variable used to record the number of cracked eggs is discrete. However, suppose we also included the variable Weight in our study. If we recorded the weight of an egg as 9 grams, is it possible that the true weight of the egg lies between 9.0 and 9.1 grams? Yes. Between 9.0 and 9.01 grams? Yes. Between 9.0 and 9.001 grams? Yes. Could we play this game forever? Yes. Hence Weight is a continuous variable.

1.6 Bringing It All Together

Understanding the concepts of variables and measurement scales is important. Later in the course, this basic understanding will help you decide which statistical techniques are appropriate for sample data. The following table summarizes how these classifications of a variable relate to one another.

Data Type                  Qualitative          Quantitative
Measurement Scale          Nominal, Ordinal     Interval, Ratio
Discrete or Continuous?    Discrete             Can be either

Example: The director of medical research at a local hospital believes that a beneficial side effect of a particular blood pressure drug is a lowering of the level of renal failure (kidney failure) among patients with Type-II Diabetes. To explore his hypothesis, he enrolled 39 patients, all suffering from Type-II Diabetes and advanced renal failure, in a small clinical trial. Each patient was started on a dose of the drug to be taken daily for six months. The level of renal failure was measured by the urinary Albumin/Creatinine Ratio (ACR), where urinary albumin was measured in milligrams (mg) and urinary creatinine was measured in milligrams per deciliter (mg/dl). The ACR was measured before the first dose was administered and upon completion of the six-month trial period. The levels were then compared to determine whether there is evidence of a reduction in renal failure.
a) What is the population? The population of interest is all persons suffering from Type-II Diabetes with advanced renal failure.
b) What is the sample? The sample is the 39 patients who enrolled in the study.
c) What is the parameter of interest? The parameter of interest is the true proportion of patients who would display a reduction in renal failure if given the blood pressure medicine.
d) What is the statistic? The statistic is the observed proportion of patients who displayed a reduction in renal failure. Note that the actual value of the statistic (the observed proportion) was not reported here; however, once the population parameter is identified, the corresponding sample statistic is easily identified.
e) What data was collected? There are many possibilities here. Clearly, the level of renal failure was recorded; however, it is very reasonable to assume that additional information, such as age, gender, and ethnicity, was also recorded.
f) Identify each of the variables listed above as qualitative or quantitative, discrete or continuous, along with the appropriate measurement scale.
Level of renal failure: quantitative, continuous, ratio.
Age: quantitative, continuous, ratio.
Gender: qualitative, discrete, nominal.
Ethnicity: qualitative, discrete, nominal.
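The summary table can also be read as a small decision rule. The Python sketch below encodes it and applies it to the variables from part (f) of the example; the names Scale and classify are our own illustration, not part of any library. Because the table says quantitative variables "can be either" discrete or continuous, the sketch cannot decide that on its own: the analyst must supply the continuous flag, just as part (f) does by reasoning about each variable.

```python
from enum import Enum

class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"

def classify(scale, continuous=False):
    """Apply the summary table: return (data type, discrete or continuous).

    For interval and ratio scales the table says "can be either", so the
    caller must decide whether the variable is continuous.
    """
    if scale in (Scale.NOMINAL, Scale.ORDINAL):
        if continuous:
            raise ValueError("qualitative variables are always discrete")
        return ("qualitative", "discrete")
    return ("quantitative", "continuous" if continuous else "discrete")

# Variables from the renal failure example, using the scales given in part (f).
variables = {
    "Level of renal failure": (Scale.RATIO, True),
    "Age": (Scale.RATIO, True),
    "Gender": (Scale.NOMINAL, False),
    "Ethnicity": (Scale.NOMINAL, False),
}
results = {}
for name, (scale, continuous) in variables.items():
    data_type, kind = classify(scale, continuous)
    results[name] = (data_type, kind, scale.value)
    print(f"{name}: {data_type}, {kind}, {scale.value}")
```

Raising an error for a "continuous" nominal or ordinal variable mirrors the table's rule that qualitative variables are discrete.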

1.7 From a Consultant's Perspective

AIFD Inc., an international food distribution company, is considering branching out into the food manufacturing business. Specifically, the company is interested in the breakfast cereal market. AIFD has hired your statistical consulting firm, TSG Inc., to examine current in-store marketing strategies, such as the shelf arrangement of various types of breakfast cereal, along with nutritional value and the interrelationships among these factors. Your analysis will be used to help AIFD identify how best to enter the market and to formulate appropriate marketing strategies.

As a consultant, you must either help the client produce a list of specific questions to be answered or design these questions yourself. These questions will help identify the population of interest, along with the appropriate variables. Commonly, consultants work closely with clients in the early planning stages to identify these aspects, which helps ensure that data are collected in a meaningful way that allows the questions to be addressed. A logical approach is to list every question you can come up with, along with the variables needed to answer each one. A master list is then produced and used to guide the research planning and execution. It is not at all uncommon for a project to grow far out of hand in size, and hence in the resources needed, when the master plan is created. The master plan is then simply reduced to a manageable size. As a consultant, it is your responsibility to give your client a clear picture of what will be required to complete the research successfully. The client can then decide whether to dedicate the needed resources or to modify the scope of the project.
Regardless of whether you are acting as a consultant for a client or designing research that you will carry out yourself as part of a senior project for your bachelor's degree, a master's thesis, etc., the planning and execution of the plan are the same as discussed above. The only difference is that you will make the decisions regarding the dedication of appropriate resources and the subsequent scope of the project.

This project will be followed at the end of each chapter of this text, when appropriate. As a student, you will be at a disadvantage compared to an actual consultant, because a consultant already knows the techniques you will be learning and will therefore have better foresight. The goal of From a Consultant's Perspective is simply to guide you through a logical thought process for the planning, execution, and analysis of a research project. Your professor will provide you with information regarding the data you will be using for the analysis. One option is to use the cereal data set provided in the Data sets section of this text. Another option is for students to form research teams, or work individually, and collect their own data at local supermarkets. AIFD did provide information collected by Consumer Reports (located in the Data sets section of this text) on various variables of interest relating to cereal. One of these variables is Rating. AIFD believes the rating value pertains to how healthy a particular cereal is believed to be, and it asks TSG to confirm this suspicion.

1.8 Review Exercises

1.1 Identify each of the following as a nominal, ordinal, interval, or ratio scale variable.
a) Number of courses each student in a survey is enrolled in.
b) Whether or not an electrical switch is defective.
c) A patient's temperature in a hospital.
d) The breaking strength of a particular type of cable.
e) The number of phone calls received by a switchboard operator in an 8-hour shift.
f) The hair color of each person surveyed.
g) A person's gender.
h) A person's rank in the military.
i) Zip codes.
j) The years in which the average rainfall in Bakersfield, CA was less than 6 inches.

1.2 State the level of measurement (nominal, ordinal, interval, or ratio) for each of the variables below:
a) A street address.
b) The percent of alcohol in the blood 30 minutes after consuming two alcoholic beverages.
c) The wattage listed on a light bulb.
d) The class level of a randomly selected college student (freshman, sophomore, junior, senior).
e) A phone number.
f) The gender of medical school applicants.
g) The temperature, in degrees Celsius.
h) The number of pink bollworms found in a 10 by 10 area of a cotton field.
i) Flavor of a new beverage as measured on a scale of 1 to 5, where 1 is poor and 5 is excellent.
j) Every year during which a presidential assassination took place.

1.3 Explain the difference between a sample and a population using non-technical language.

1.4 Briefly discuss the differences between the four measurement scales (nominal, ordinal, interval, and ratio).

1.5 Consider the following scenario: In an attempt to obtain information on how the weights of students on campus (in terms of skinny, slender, appropriate, chunky, and obese) are distributed, a student sits in front of the library from 10:00 to 1:00 every day for a week and categorizes the weights of those who enter (as skinny, slender, appropriate, chunky, or obese). After a week, the student has recorded 825 weights. Identify the following:
a) population of interest
b) sample size
c) variable of interest
d) population parameter(s) of interest
e) variable level of measurement

1.6 The blunt-nosed leopard lizard was recently added to the endangered species list. To determine eligibility for endangered species status, samples were taken in different locations of its native habitat. Both the actual count (or population estimate) and the health of the lizards were of concern here.
a) Name 4 variables that could be measured by a biologist studying these lizards.
b) State whether each variable is qualitative or quantitative.
c) State whether each variable is discrete or continuous.
d) State the level of measurement of each variable: nominal, ordinal, interval, or ratio.

1.7 For each of the following polling situations, describe the population of interest and the variable or variables of interest.
a) Polling is a common use of statistics. In government, polling is done constantly. When an important bill is due to be voted on by the legislature, aides will conduct polling among the representatives to determine whether there are enough votes to pass the bill.
b) During campaigns, the public is often polled for opinions on candidates and important issues to help the campaign manager stay in touch with the voting public.
c) One of the previous situations involves a census and the other, a sample. Identify each and explain why the method used was the correct one for the situation.

1.8 In large corporations, the ability to collect accounts receivable in a timely fashion is important. In fact, when determining the net worth of a corporation, the average amount of time to collect a debt is often measured. Since auditing thousands of accounts is unreasonable, a sample of several hundred accounts is taken, and the number of days from the billing date to the date payment was received is measured. State the population of interest, the parameter of interest, and the sample statistic.
1.9 Look at a container of milk and determine the following:
a) Name at least three (3) qualitative variables that could be measured for this container of milk.
b) Name at least three (3) quantitative variables that could be measured for this container of milk.
c) For each of the six variables, state whether it is discrete or continuous and give its level of measurement.

1.10 Returning to the study of blunt-nosed leopard lizards: these lizards were captured, measured, banded for identification, and released. Along with many other variables, the total count was reported in an effort to estimate the actual population and determine eligibility for endangered species status. Both descriptive and inferential statistics were used in the study. Differentiate between the descriptive statistics and the inferential statistics used in this study.

1.11 Look at each of the following situations and state whether the statistics used were descriptive or inferential.
a) A student samples shoppers in the local mall on a Sunday afternoon and reports the proportion of men who were carrying the shopping bags.
b) Due to a recent rash of bad weather, farmers were concerned about the loss of profits from crop damage. A study was done, and the estimated crop loss was reported in thousands of dollars.

c) Patients in a diabetes clinic were given either a placebo (a pill with no medication) or a drug, and their kidney function was monitored for improvement. After the results were gathered, it was concluded that the drug improves kidney function in diabetes patients.
d) By examining police reports statewide, it was determined that 23% of the departments surveyed are using racial profiling as a criterion for a traffic stop.

1.12 Explain the difference between a statistic and a parameter, using non-technical language.

1.13 A study on two populations of Killer Whales (Orcinus orca) was conducted to determine whether a live harvest of young whales in one of the populations had a significant effect on the birth rate of that population. The time interval between successive births was measured for each population. The average time interval between successive births was 4.9 years for one population and 7.2 years for the other.
a) Identify the variable of interest in the study.
b) Identify the measurement scale for the variable of interest.
c) Is the variable discrete or continuous?
d) Do the average time intervals of 4.9 years and 7.2 years represent population parameters or sample statistics?

1.14 In a study to determine whether the scar tissue from an episiotomy is stronger than the scar tissue from a natural laceration (tear) at birth, a doctor recorded whether a mother had an episiotomy or was allowed to tear (and did tear) during the delivery of her first child. The doctor then recorded the tearing observed, without an episiotomy, during the delivery of that mother's second child.
a) What is the population of interest?
b) What is/are the variable(s) of interest?
c) Is/are the variable(s) quantitative or qualitative?
d) Identify the measurement scale for the variable(s).
e) Does the study constitute a census or a sample? Why?
1.15 Based on a report by USA Today (www.usatoday.com, 1/6/2004), 80% of cell phone users have experienced service problems. The information was obtained from a survey of 12 metro areas in the United States. Even though a large proportion of those surveyed had problems with their service, only 40% of those who reported their problems found their carrier's response helpful. Of the respondents, 26% reported that they had received an overcharge of $10 or more.
a) What is the population of interest in this survey?
b) What is the sample?
c) Name one or more parameters of interest.
d) For the above parameter(s), what is/are the statistic(s)?
e) What variables were collected?
f) For each of these variables, identify them as qualitative or quantitative, discrete or continuous, and state their level of measurement.

1.16 Jerry Sloan, the coach of the NBA Utah Jazz, is the longest-tenured head coach with a single team in all professional sports (www.usatoday, 1/8/04). He has been with the Jazz for 16 years. His winning percentage of 62.7% is reported as the 6th best in NBA history.
a) What is the population of interest in this report?
b) How was the information gathered? (sample or census)
c) Name one or more parameters of interest.
d) What variables were collected?
e) For each of these variables, identify them as qualitative or quantitative, discrete or continuous, and state their level of measurement.

1.17 As a consultant, you have been hired to help the owners of Plants R Us, a new nursery and landscaping company, determine what services they should offer.
a) Determine a population of interest for this company.
b) Make a list of at least 10 questions that might be used in a survey of this population to provide the owners with useful information.
c) Identify the variable for each question.
d) For each of these variables, identify them as qualitative or quantitative, discrete or continuous, and state their level of measurement.

1.18 As an investigator looking into a cluster of pediatric cancer cases in a small valley community, you will be working with statistical consultants to gather information from the community.
a) Determine the population of interest for this study.
b) Make a list of at least 10 questions you will include in your study. These may be simple demographic questions or as complex as you like.
c) Identify the variable for each question.
d) For each of these variables, identify them as qualitative or quantitative, discrete or continuous, and state their level of measurement.

1.19 Based on a report by USA Today (www.usatoday.com, 1/8/04), the federal government has reported that the rate of growth of prescription drug costs has slowed. The report states that growth in drug spending slowed to 15.3% in 2002 but is expected to continue to rise for the next 10 years.
a) What is the population of interest in this survey?
b) What is the sample?
c) Name the parameter of interest.
d) For the above parameter, what is the statistic?
e) Is this study inferential or descriptive?
f) Name at least 4 variables that may have been collected.
g) For each of these variables, identify them as qualitative or quantitative, discrete or continuous, and state their level of measurement.

1.20 Mini-Project: Go to the library or the Internet and find a journal article in your major field (or field of interest if you don't have a major yet) that uses statistics. Make a copy of the article for future use.
a) Read the article and determine the population of interest and the variables measured.
b) Do you believe the sample is representative of the population of interest? Explain.
c) Are the statistics used descriptive, inferential, or both?