2. Data and Measurement

2.1. Basic Concepts

Statistical research is always concerned with a group of research objects, called the population or universe (populaatio/perusjoukko). Determining the bounds of the relevant population is the first step in every statistical research project.

Example (continued). A relevant population would have been the complete set of candidates willing to audition for the Big Brother TV show (also in past and coming years, not just the ones whose personality we tested).

Note: The results of some statistical research are only valid for the population under study!

Example (continued). Our hypothetical example pertained to the Big Brother TV show in Great Britain. Cultural differences may lead to completely different patterns in Finland or Spain.

The elements of a population are the so-called (sampling) units (tilastoyksiköt). In the examples above, the individual applicants for the TV show are sampling units.

The characteristic under investigation (e.g. narcissistic personality disorder) is called a statistical variable (tilastollinen muuttuja). The values obtained or measured for a statistical variable are called observation values (havaintoarvot) or simply observations (havainnot). The collection of all observations is called data (havaintoaineisto).

We may be interested in more than one characteristic of the same population at the same time. In such cases it is customary to arrange the observations in a table called a data matrix (havaintomatriisi), where the sampling units are arranged in rows and the statistical variables in columns. The observations obtained for a particular sampling unit (the rows) are then called data vectors (havaintovektorit) or profiles (profiilit), and the observations for each characteristic (the columns) are called distribution vectors (jakaumavektorit).

Example. Consider the relationship between the latest advertising campaign for a certain soft drink and the number of bottles sold. Eight persons have been asked how often they saw the advertising campaign and how many bottles they purchased. The following data were obtained:

            Number of       Number of
            adverts seen    bottles bought
Person 1         5               10
Person 2        10               12
Person 3         4                5
Person 4         0                4
Person 5         2                1
Person 6         7                3
Person 7         3                4
Person 8         6                8

Here the population consists of eight persons, the sampling units are persons no. 1 to 8, and the statistical variables are the numbers of adverts seen and bottles purchased. The data matrix contains 16 data points, which may be split up either according to person into 8 data vectors or profiles of 2 data points each (rows 1 to 8), or into 2 distribution vectors of 8 data points each (columns 1 to 2).
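The row and column views of the data matrix can be sketched in a few lines of Python. This is only an illustration of the terminology. The column names `adverts_seen` and `bottles_bought` are assumed labels, not part of the original example.

```python
# Data matrix from the soft drink example: one row per sampling unit (person),
# one column per statistical variable. Column names are assumed labels.
variables = ("adverts_seen", "bottles_bought")
data_matrix = [
    (5, 10),   # Person 1
    (10, 12),  # Person 2
    (4, 5),    # Person 3
    (0, 4),    # Person 4
    (2, 1),    # Person 5
    (7, 3),    # Person 6
    (3, 4),    # Person 7
    (6, 8),    # Person 8
]

# Data vectors (profiles): the rows of the matrix, one per person.
profiles = {f"Person {i}": row for i, row in enumerate(data_matrix, start=1)}

# Distribution vectors: the columns of the matrix, one per variable.
distribution_vectors = {
    name: column for name, column in zip(variables, zip(*data_matrix))
}

# Total number of data points: 8 rows times 2 columns.
n_points = sum(len(row) for row in data_matrix)

print(n_points)                              # 16
print(profiles["Person 3"])                  # (4, 5)
print(distribution_vectors["adverts_seen"])  # (5, 10, 4, 0, 2, 7, 3, 6)
```

Transposing with `zip(*data_matrix)` turns the 8 profiles of 2 values each into the 2 distribution vectors of 8 values each.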

2.2. Measurement

The term measurement (mittaaminen) denotes the process of assigning concrete values to statistical variables for the different sampling units, either by direct observation or by indirect means such as questionnaires. The concrete numbers obtained are then used as observation values (havaintoarvot) for the statistical variables.

Not everything can be measured in numbers. Factors that are hard or impossible to measure quantitatively, such as intelligence, may still be approximated by using so-called indicators (indikaattorit), such as the sum of points attained in a battery of intelligence tests.

Care has to be taken that the statistical variable chosen is representative of the characteristic to be studied. This is called the variable's validity (validiteetti). A variable's reliability (reliabiliteetti) tells whether its values are stable with respect to random influences (e.g. repeated measurements should yield identical observations).

Scales of Measurement

1. Nominal Scale (nominaali- eli luokittelu- eli laatueroasteikko): Numbers are used simply to label groups or classes. Which number is used for labeling any specific class is completely arbitrary. Therefore, the only thing that can be inferred from statistical variables measured on a nominal scale is to which class the statistical unit in question belongs. In particular, we may not use them to rank the statistical units or to do arithmetic calculations with their values.

Example:
gender: male = 1, female = 2
profession: priest = 1, parish clerk = 2, cantor = 3

Liisa is a priest and Leena is a cantor.
Liisa and Leena have different professions.
Liisa and Leena have the same gender.
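The arbitrariness of nominal labels can be checked with a short Python sketch. The first coding is taken from the example. The second coding, `relabeled_code`, is a hypothetical alternative introduced only to show that any one-to-one relabeling is equally valid.

```python
# Codings for the profession variable. The first one is from the example;
# the second is a hypothetical, equally valid one-to-one relabeling.
profession_code = {"priest": 1, "parish clerk": 2, "cantor": 3}
relabeled_code = {"priest": 30, "parish clerk": 10, "cantor": 20}

professions = {"Liisa": "priest", "Leena": "cantor"}


def same_class(person_a, person_b, coding):
    """Nominal-scale statement: do two units belong to the same class?"""
    return coding[professions[person_a]] == coding[professions[person_b]]


def code_is_smaller(person_a, person_b, coding):
    """Order comparison of the codes; not meaningful on a nominal scale."""
    return coding[professions[person_a]] < coding[professions[person_b]]


# Class membership does not depend on the labeling ...
print(same_class("Liisa", "Leena", profession_code))  # False
print(same_class("Liisa", "Leena", relabeled_code))   # False

# ... but an order comparison of the labels does, so it carries no information.
print(code_is_smaller("Liisa", "Leena", profession_code))  # True
print(code_is_smaller("Liisa", "Leena", relabeled_code))   # False
```

Only the same/different statement survives the relabeling, which is why it is the only statement a nominal scale supports.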

2. Ordinal Scale (ordinaali-/järjestysasteikko): In addition to classification, the statistical units may be ordered or ranked by their index values. There is, however, not enough information to perform arithmetic calculations with them.

Example:
grade: satisfactory = 1, good = 2, excellent = 3
opinion poll: agree fully = 1, agree partially = 2, neither agree nor disagree = 3, disagree partially = 4, disagree fully = 5

Matti's grade is good and Liisa's grade is excellent.
Matti and Liisa got different grades.
Liisa's grade is better than Matti's.
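A small Python sketch can show why ranking is meaningful on an ordinal scale while arithmetic is not. The coding `grade_code` is from the example. The coding `stretched_code` is a hypothetical recoding that preserves the order but changes the spacing.

```python
# Grade codings. The first is from the example; the second is a hypothetical
# order-preserving recoding with different spacing between the codes.
grade_code = {"satisfactory": 1, "good": 2, "excellent": 3}
stretched_code = {"satisfactory": 1, "good": 2, "excellent": 10}

grades = {"Matti": "good", "Liisa": "excellent"}


def is_better(person_a, person_b, coding):
    """Ordinal-scale statement: does person_a have the higher grade?"""
    return coding[grades[person_a]] > coding[grades[person_b]]


def step_sizes(coding):
    """Differences between neighbouring codes; not meaningful on an ordinal scale."""
    ordered = sorted(coding.values())
    return [high - low for low, high in zip(ordered, ordered[1:])]


# The ranking is the same under both codings ...
print(is_better("Liisa", "Matti", grade_code))      # True
print(is_better("Liisa", "Matti", stretched_code))  # True

# ... but the distances between grades depend on the arbitrary coding.
print(step_sizes(grade_code))      # [1, 1]
print(step_sizes(stretched_code))  # [1, 8]
```

Because the step sizes change with the coding, sums, differences and means of ordinal codes depend on an arbitrary choice.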

3. Interval Scale (intervalli- eli välimatka-asteikko): The distance between two measurements has a meaning, but the value of zero is assigned arbitrarily. Therefore we cannot meaningfully take the ratio of two measurements, but we can take the ratio of intervals.

Example: Clock time. 10:00h is not twice as late as 5:00h, because 0:00h has been set to an arbitrary point in time (midnight). But the interval between 0:00h and 10:00h is twice as large as the interval between 5:00h and 10:00h.

Example: Temperature. 0 °C does not mean zero heat, as it has been arbitrarily set to the freezing point of water (in Fahrenheit the same temperature would read 32 °F), and 100 °C (212 °F) is not twice as hot as 50 °C (122 °F). But the difference between 0 °C and 100 °C is twice as large as the difference between 50 °C and 100 °C, no matter whether measured in °C or °F.
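The temperature example can be verified numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion. The helper name `celsius_to_fahrenheit` is chosen here for illustration.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


temps_c = {"freezing": 0.0, "warm": 50.0, "boiling": 100.0}
temps_f = {name: celsius_to_fahrenheit(c) for name, c in temps_c.items()}

# Ratio of two measurements: depends on the unit, hence not meaningful.
ratio_c = temps_c["boiling"] / temps_c["warm"]  # 100 / 50
ratio_f = temps_f["boiling"] / temps_f["warm"]  # 212 / 122

# Ratio of two intervals: the same in both units.
interval_ratio_c = (temps_c["boiling"] - temps_c["freezing"]) / (
    temps_c["boiling"] - temps_c["warm"]
)
interval_ratio_f = (temps_f["boiling"] - temps_f["freezing"]) / (
    temps_f["boiling"] - temps_f["warm"]
)

print(ratio_c, round(ratio_f, 2))                      # 2.0 1.74
print(math.isclose(interval_ratio_c, interval_ratio_f))  # True
```

The shift by 32 cancels when differences are taken and the factor 9/5 cancels in the ratio of differences. This is why interval ratios are unit independent while ratios of raw values are not.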

4. Ratio Scale (suhdeasteikko): The distance between two measurements has a meaning and the zero in this scale is an absolute zero, where the characteristic in question "disappears". Therefore, we may take the ratio of two measurements.

Example: Money. 0 € is no money and 100 € is twice as much money as 50 €.

Example: Weight. Matti weighs 90 kg and Liisa 45 kg.
Matti and Liisa have different weights.
Matti is heavier than Liisa.
Matti weighs 45 kg more than Liisa.
Matti's weight is twice as large as Liisa's.

Note: The interval between two interval scale measurements will always be on ratio scale.

Example: Clock and calendar time are only on interval scale, but time intervals (measured e.g. in seconds, hours, days, years) are on ratio scale.
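Python's standard `datetime` module happens to mirror the distinction drawn in the note: points in time (`datetime`) cannot be divided, while time intervals (`timedelta`) can. The sketch below reuses the clock times of the earlier example. The calendar date is an arbitrary assumption needed only to construct the objects.

```python
from datetime import datetime

day = (2024, 1, 1)  # arbitrary date, assumed for illustration only

midnight = datetime(*day, 0, 0)
five = datetime(*day, 5, 0)
ten = datetime(*day, 10, 0)

# Time intervals are on ratio scale: their ratio is well defined.
interval_ratio = (ten - midnight) / (ten - five)
print(interval_ratio)  # 2.0

# Clock times are only on interval scale: dividing them is not supported.
try:
    ten / five
    clock_ratio_defined = True
except TypeError:
    clock_ratio_defined = False
print(clock_ratio_defined)  # False
```

Subtracting two `datetime` values yields a `timedelta`, which has a natural zero (no elapsed time) and therefore supports ratios.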

Quantitative vs. Qualitative Variables

Statistical variables may be classified according to their measurement scale. Variables measured on nominal and ordinal scale are called qualitative (or categorical) (kvalitatiivinen eli laadullinen), and variables measured on interval and ratio scale are called quantitative (or numerical) (kvantitatiivinen eli määrällinen).

Quantitative variables may be either discrete (diskreetti) or continuous (jatkuva), depending on whether they may attain only a countable number of values with some (not necessarily integer or equidistant) intervals between them, or any real value within a relevant range.

Note. In reality, all statistical variables are discrete due to finite measurement precision. But we call them continuous if the idealization of perfect measurement precision would in principle lead to an uncountably infinite number of possible values.
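The four scales and the classification into qualitative and quantitative variables can be summarized in a small lookup sketch. The function names and the wording of the statement labels are chosen here for illustration. The content follows the definitions above: each scale permits the statements of all weaker scales plus one more.

```python
# Scales ordered from weakest to strongest, with the kind of statement
# each one adds to those of the weaker scales.
SCALES = ("nominal", "ordinal", "interval", "ratio")
STATEMENTS = ("same/different", "greater/less", "difference", "ratio")


def permitted_statements(scale):
    """Statements that are meaningful on the given scale (cumulative)."""
    return STATEMENTS[: SCALES.index(scale) + 1]


def variable_type(scale):
    """Qualitative for nominal and ordinal, quantitative for interval and ratio."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale}")
    return "qualitative" if scale in ("nominal", "ordinal") else "quantitative"


for scale in SCALES:
    print(scale, variable_type(scale), permitted_statements(scale))
```

For example, `permitted_statements("ordinal")` returns `("same/different", "greater/less")`, matching the grade example, where grades can be compared but not subtracted.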