CTRC Core Curriculum Seminar Series

Size: px
Start display at page:

Download "CTRC Core Curriculum Seminar Series"

Transcription

1 CTRC Core Curriculum Seminar Series Descriptive Statistics: Data Types and Measures, Central Tendency, Variability Chang-Xing Ma, PhD Associate Professor Department of Biostatistics, UB January 4, 2012

2 Disclosure Statement Chang-Xing Ma, PhD Nothing to disclose

3 Goals and Objectives Goals: Gain the knowledge of basic statistics and how to describe the data Objectives: Describe the data type Summarize data Understand Measure of Central Tendency Understand Measure of Dispersion

4 Outline Basic concepts of biostatistics Data type Summarize data Measure of Central Tendency Measure of Dispersion

5 Some terminology Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data Biostatistics the theory and techniques for collecting, describing, analyzing, and interpreting health data.

6 Some terminology Population refer to all measurements or observations of interest Sample is simply a part of the population. But the sample MUST represent the population. A random sample is such a representative sample The sample must be large enough The sample should be selected randomly

7 Some terminology Parameter is some numerical or nominal characteristic of a population A parameter is constant, e.g. mean of a population Usually unknown Statistic is some numerical or nominal characteristic of a sample. We use statistic as an estimate of a parameter of the population It tends to differ from one sample to another We also use statistic to test hypothesis

8 Population: all U.S. persons ~ Normal (µ h,σ h2 ), Parameters (µ w,σ w2 ), A random sample: sample size = 100 Generate Gender Height Weight A sample mean height: std height: mean weight std weight % of male (=1) statistics True Paramete

9 Sources of data Records Surveys Experiments Comprehensive Sample

10 Types of variables Quantitative variables Qualitative variables Quantitative continuous Qualitative nominal Quantitative discrete Qualitative ordinal

11 Data Types Numerical (Quantitative) numerical measurement Height Weight Categorical (Qualitative) with no natural sense of ordering Gender Hair color Blood type

12 Numerical Variable Continuous Range of values Discrete Height in inch Limited possible values Age - # of smoking per day # of children in a family

13 Determining Data Types Ordinal (Categorical) vs. Discrete (Numerical) Ordinal Cancer Stage I, II, III, IV Stage II 2 times Stage I Categories could also be A, B, C, D Discrete # of children: 0, 1, 2, 4 children = 2 times 2 children

14 Descriptive Statistics reducing a complex mass of data to a manageable set of information Descriptive Statistics: the summary and presentation of data to: simplify the data enable meaning full interpretation support decision making Numerical descriptive measures (few numbers) Graphical presentations

15 Inferential statistics From a sample to estimate population parameters to test hypothesis to build the model to reflect the population

16 The student test score (FCAT) Student ID Race Sex Reading Math Poverty W F W F W F W F B F B F W F W M W M W M W M W M B M B M H F B F W M W M Code: Race: W White B Black H Hispanic A Asian Sex: F Female M Male Poverty: 0 not poor 1 poor Problem 1 1.Among the 6 variables, which ones are qualitative and which ones are quantitative? 2.Is Race nominal or ordinal?

17 Descriptive Statistics Categorical variables: Frequency distribution Bar chart, pie chart Contingency tables Continuous variables: Grouped frequency table Central Tendency Variability

18 Simple Frequency Distribution An ordered arrangement that shows the frequency of each level of a variable. race Frequency Percent A B H W sex Frequency Percent F M

19 Simple Frequency Distribution It is useful for categorical variable For continuous variable, it allows you to pick up at a glance some valuable information, such as highest, lowest value. ascertain the general shape or form of the distribution make an informed guess about central tendency values

20 Bar Chart Race BY None summarizing a set of categorical data - nominal or ordinal data It displays the data using a number of rectangles, each of which represents a particular category. The length of each rectangle is proportional to the number of cases in the category it represents can be displayed horizontally or vertically they are usually drawn with a gap between the bars Bars for multiple (usually two) variables can be drawn together to see the relationship A B H W Race Horizontally

21 Pie Chart summarizing a set of categorical data - nominal or ordinal data It is a circle which is divided into segments. Each segment represents a particular category. The area of each segment is proportional to the number of cases in that category. Female Male Race PieExploded

22 Complex frequency distribution Table Distribution of 20 lung cancer patients at the chest department of Alexandria hospital and 40 controls in May 2008 according to smoking Smoking Cases Lung cancer Control Total No. % No. % No. % Smoker 15 75% 8 20% Non smoker 5 25% 32 80% Total

23 How about continuous variables? How data is distributed? Measure of Central Tendency Measure of Variability

24 DATA: Grouped Frequency Distribution for continuous variable Frequency Table Example Data [142.5, 157.5) 1.6% [157.5, 172.5) % [172.5, 187.5) % [187.5, 202.5) % [202.5, 217.5) % [217.5, 232.5) % Interval Size: New Data N: µ: σ:

25 Grouped Frequency Distribution BUT the problem is that so much information is presented that it is difficult to discern what the data is really like, or to "cognitively digest" the data. the simple frequency distribution usually need to condense even more. It is possible to lose information (precision) about the data to gain understanding about distributions. This is the function of grouping data into equal-sized intervals called class intervals. The grouped frequency distribution is further presented as Frequency Polygons, Histograms, Bar Charts, Pie Charts.

26 Describing Distributions Bell-Shaped Distribution 0.40 Normal distribution N (µ=0, σ 2 =1) t-distribution

27 Describing Distributions Skewed Distribution positively skewed distribution

28 Describing Distributions Skewed Distribution negatively skewed distribution

29 Describing Distributions Other Shapes Rectangular Bimodal

30 Other Shapes J-curve Describing Distributions

31 Probability density function - Normal z-transform green curve is standard normal distribution

32 Measure of Central Tendency Mean, Median, Mode The Mean average value not robust to outlying value X N i 1 N X i Length of hospital stays: 6, 4, 5, 9, 10, 7, 1, 4, 3, 4 Mean=( )/10=5.3

33 Measure of Central Tendency Mean, Median, Mode The Median is the point that divides a distribution of data into two equal parts robust to outlying value Length of hospital stays: sort data median=4.5 Split Data

34 Measure of Central Tendency Mean, Median, Mode The Mode is the midpoint of the interval that has highest frequency robust to outlying value, but sometimes misleading Length of hospital stays: sort data Mode=4, which occurred 3 times. Most frequently

35 Comparison between mean and median Mean Median

36 Comparison between mean and median Median Mean

37 Comparison between mean and median Mean Median

38 Summary Frequency distribution Histogram, Polygon graph Bar Chart, Pie Chart Describing Distributions Mean, Median, Mode DATASET:

39 Problem 2 In a study, we collected a medical measurements X for 4 patients Data of X: 2, 3, 5, 6 Mean of X? Median of X? Mode of?

40 The sample range Interquartile range Descriptive Statistics Variability The sample standard deviation (SD), variance Standard error of mean (SEM)

41 Measures of Dispersion - Range Range the difference between the lowest and highest For example, Age of Patients (years): lowest 2, highest 17 Range=2-17 years When sample size increases, the range tends to increase as well. (not robust)

42 Measures of Dispersion - Range All of curves have the same range Mean? Median?

43 Measures of Dispersion Percentiles, Deciles, Quartiles Percentiles: based on dividing a sample or population into 100 equal parts. Deciles divide the distribution into 10 parts Quartiles divide the distribution into 4 equal parts. 1 st quartile includes the lowest 25% of the values (Q1) 2 st quartile includes the values from 26 percentile through 50 percentile (Q2) - median 3 st quartile includes the values from 51 percentile through 75 percentile (Q3)

44 Measures of Dispersion Interquarile Range Interquarile Range the 25 percentile (1 st quartile) to 75 percentile (3 rd quartile) Age of Patients (years): st quartile 6, 2 nd quartile 8.5, 3 rd 13 Interquarile Range = 6-13 years Interquarile Range is a robust estimate of data variability

45 Measures of Dispersion Interquarile Range Robust estimate, less efficient

46 Deviations from the mean Variance and Standard Deviation deviation: observation - mean sum of deviation BUT ( x x) i ( x) x i 0

47 Deviations from the mean Variance and Standard Deviation Measure of how different the values in a set of numbers are from each other Variance: Standard Deviation: 2 2 ) ( 1 1 x x n s i 2 ) ( 1 1 x x n s i

48 Deviations from the mean Variance and Standard Deviation Data set: 2,3,5,6 2 1 Calculation: s ( xi n 1 x) 2 x x / n ( ) / 4 i 4.0 x Value of X (X- ) (X- ) =0 =10 Variance Standard Deviation s ( x ) n 1 x i x 10 /(4 1) 1 2 s ( x ) n 1 x i

49 0.60 Three normal distributions: mean=0 s 2 =1 s 2 =2 s 2 =0.5 Leptokurtic Homogenous Narrow scatter Mesokurtic Platykurtic Heterogeneous wide scatter Central Tendency mean=0 0,1 0,2 0,0.5

50 Example 2: FEV1 (litres) of 57 male medical students Table: FEV1 (litres) of 57 male medical students

51 Frequency Example 2: FEV1 (litres) of 57 male medical students Mean: 4.06 Variance: 0.45 SD: 0.67 Q1: 3.54 Q2 (Median): 4.10 Q3: Percentile 5.16 Range: 2.85 to FEV1 (litre)

52 The Meaning of Standard Deviation How the data are dispersed around mean Mean ± 1 SD represent 68.3% of the population Mean ± 2 SD represent 96% of the population Mean ± 3 SD represent 99.7% of the population

53 The Meaning of Standard Deviation ±SD % of Pop % 34% 1SD 1SD 2SD 48% 2SD 48%

54 Standard Error of Mean (SEM) How confident can we be that the sample mean represents the population mean µ? SEM=SD/ n SEM must be much smaller than the SD mean ± 1.96*SD cover 95% of the data mean ± 1.96*SEM cover 95% of the population mean SEM and SD are different!

55 Standard Error of Mean (SEM) Describing the scatter or spread of data, use SD Estimate population parameters, use SEM Epidemiologic study, SEM Clinical or laboratory research, SD

56 Put DATA below: None RUN Summarizing Data - Calculator Interval Size: Mean: 4.06 Variance: 0.45 SD: 0.67 Q1: 3.54 Q2 (Median): 4.10 Q3: Percentile 5.16 Range: 2.85 to New Data N: µ: σ: 100 Race 10 Ylim: 2,6 ReDraw

57 Box-Plot The box itself contains the middle 50% of the data. The upper edge (hinge) of the box indicates the 75th percentile of the data set, and the lower hinge indicates the 25th percentile. The range of the middle two quartiles is known as the inter-quartile range. The line in the box indicates the median value of the data. The + indicate mean value The ends of the vertical lines or "whiskers" indicate the minimum and maximum data values, unless outliers are present in which case the whiskers extend to a maximum of 1.5 times the inter-quartile range. The points outside the ends of the whiskers are outliers or suspected outliers

58 Box Plot Example 2 FEV1 of 57 students Serum triglyceride measurements in cord blood from 282 babies

59 What you can get from a box-plot? Graphically display a variable's location and spread at a glance. [Q1, Q2 (median), Q3, interquartile range] Provide some indication of the data's symmetry and skewness. Unlike many other methods of data display, boxplots show outliers. By using a boxplot for each categorical variable side-by-side on the same graph, one quickly can compare data sets. One drawback of boxplots is that they tend to emphasize the tails of a distribution, which are the least certain points in the data set. They also hide many of the details of the distribution. Displaying histogram in conjunction with the boxplot helps

60 frequency frequency Transformations triglyceride triglyceride log(triglyceride) LOG (triglyceride)

61 Summarizing data Univariate categorical variable Frequency distributions Bar Chart, Pie Chart

62 Summarizing data Univariate continuous variable Grouped frequency distributions Polygon or histogram Mean, Median, Mode, Percentile, Q1, Q2, Q3, extreme values Standard deviation, variance, range, interquartile range Box-Plot Normality test statistics

63 Next lecture ( Lecture 2) Bivariate one is categorical and the other is continuous variable t-test ANOVA

64 Lecture 3 categorical data analysis Bivariate both are categorical Contingency tables Chi-square test Response is categorical, predictors could be both types. Logistical regression

65 Lecture 4 Continuous response Correlation Multiple linear regression

66 Thanks. Question?

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: What do the data look like? Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

List of Examples. Examples 319

List of Examples. Examples 319 Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Descriptive Statistics and Exploratory Data Analysis

Descriptive Statistics and Exploratory Data Analysis Descriptive Statistics and Exploratory Data Analysis Dean s s Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux,

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1 DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1 OVERVIEW STATISTICS PANIK...THE THEORY AND METHODS OF COLLECTING, ORGANIZING, PRESENTING, ANALYZING, AND INTERPRETING DATA SETS SO AS TO DETERMINE THEIR ESSENTIAL

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error.. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Exploratory Data Analysis. Psychology 3256

Exploratory Data Analysis. Psychology 3256 Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13 COMMON DESCRIPTIVE STATISTICS / 13 CHAPTER THREE COMMON DESCRIPTIVE STATISTICS The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance,

More information

EXPLORING SPATIAL PATTERNS IN YOUR DATA

EXPLORING SPATIAL PATTERNS IN YOUR DATA EXPLORING SPATIAL PATTERNS IN YOUR DATA OBJECTIVES Learn how to examine your data using the Geostatistical Analysis tools in ArcMap. Learn how to use descriptive statistics in ArcMap and Geoda to analyze

More information

Module 2: Introduction to Quantitative Data Analysis

Module 2: Introduction to Quantitative Data Analysis Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...

More information

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.) Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize

More information

Scatter Plots with Error Bars

Scatter Plots with Error Bars Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

A Correlation of. to the. South Carolina Data Analysis and Probability Standards A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

More information

Chapter 2 Data Exploration

Chapter 2 Data Exploration Chapter 2 Data Exploration 2.1 Data Visualization and Summary Statistics After clearly defining the scientific question we try to answer, selecting a set of representative members from the population of

More information

Intro to Statistics 8 Curriculum

Intro to Statistics 8 Curriculum Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs

More information

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers 1.3 Measuring Center & Spread, The Five Number Summary & Boxplots Describing Quantitative Data with Numbers 1.3 I can n Calculate and interpret measures of center (mean, median) in context. n Calculate

More information

430 Statistics and Financial Mathematics for Business

430 Statistics and Financial Mathematics for Business Prescription: 430 Statistics and Financial Mathematics for Business Elective prescription Level 4 Credit 20 Version 2 Aim Students will be able to summarise, analyse, interpret and present data, make predictions

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

Descriptive statistics parameters: Measures of centrality

Descriptive statistics parameters: Measures of centrality Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Dongfeng Li. Autumn 2010

Dongfeng Li. Autumn 2010 Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous Chapter 2 Overview Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classify as categorical or qualitative data. 1) A survey of autos parked in

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Statistics Revision Sheet Question 6 of Paper 2

Statistics Revision Sheet Question 6 of Paper 2 Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Unit Support Notes Statistics (SCQF level 6)

Unit Support Notes Statistics (SCQF level 6) Unit Support Notes Statistics (SCQF level 6) This document may be reproduced in whole or in part for educational purposes provided that no profit is derived from reproduction and that, if reproduced in

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Basics of Statistics

Basics of Statistics Basics of Statistics Jarkko Isotalo 30 20 10 Std. Dev = 486.32 Mean = 3553.8 0 N = 120.00 2400.0 2800.0 3200.0 3600.0 4000.0 4400.0 4800.0 2600.0 3000.0 3400.0 3800.0 4200.0 4600.0 5000.0 Birthweights

More information

Means, standard deviations and. and standard errors

Means, standard deviations and. and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

More information

determining relationships among the explanatory variables, and

determining relationships among the explanatory variables, and Chapter 4 Exploratory Data Analysis A first look at the data. As mentioned in Chapter 1, exploratory data analysis or EDA is a critical first step in analyzing the data from an experiment. Here are the

More information

3: Summary Statistics

3: Summary Statistics 3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

More information

CHAPTER THREE. Key Concepts

CHAPTER THREE. Key Concepts CHAPTER THREE Key Concepts interval, ordinal, and nominal scale quantitative, qualitative continuous data, categorical or discrete data table, frequency distribution histogram, bar graph, frequency polygon,

More information

Midterm Review Problems

Midterm Review Problems Midterm Review Problems October 19, 2013 1. Consider the following research title: Cooperation among nursery school children under two types of instruction. In this study, what is the independent variable?

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

Descriptive Analysis

Descriptive Analysis Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,

More information