# Paper DV-06-2015. KEYWORDS: SAS, R, Statistics, Data visualization, Monte Carlo simulation, Pseudo-random numbers

## Transcription

bivariate pairs of (x,y) values. These data are then correlated, with the Pearson's r value saved in variable a. The x and y values are then redrawn and the process is repeated for i = 100 iterations. As a result, there are now 100 Pearson's r values. The result is depicted in Figure 1.

DEMONSTRATION 1: CORRELATION R CODE

    a <- NULL
    for (i in 1:100) {
      x <- round(runif(50, 1, 100))   # 50 random integers between 1 and 100
      y <- round(runif(50, 1, 100))
      a[i] <- cor(x, y)               # Pearson's r for this iteration
    }
    plot(a, col = "blue", main = "Figure 1: Correlations",
         xlab = "Iteration", ylab = "Pearson's r")
    abline(h = 0)
    print(sort(a)) #Step 1. Stop here.
    b <- a[a < .01 & a > -.01]
    plot(b, ylim = c(-.01, .01), col = "blue",
         main = "Figure 2: Correlations near Zero", xlab = "Count", ylab = "Pearson's r",
         xaxp = c(x1 = 1, x2 = length(b), n = length(b) - 1))
    abline(h = 0) #Step 2. Zoomed in.

Figure 1. Pearson's r Values in R.

The X axis of the plot represents the repetition, ordered from 1 through 100. The Y axis shows the resulting Pearson's r value for that particular iteration. No seed is set, so the code can be rerun multiple times to explore various random outcomes. Each time the code is executed, the dots move around. The plot has a horizontal line at r = 0.

Ignoring direction (i.e., positive and negative), the plot shows that with purely random numbers the magnitude of the relationship between x and y can be as high as roughly 0.3 in this case. In some runs of the experiment it gets as high as 0.4 or 0.5 in magnitude. The message to take away is to be suspicious of low correlation values: even with totally random numbers, pure noise and no signal, low to moderate correlations might be found due to chance alone. At the very least, repetition of experiments is advised. This demonstration shows how a correlation that low in magnitude could easily be, and probably is, nothing meaningful.
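For a handout in which the same picture must appear every time, the loop above can be condensed and made repeatable. The following is a minimal sketch rather than the paper's code: the helper name `simulate_r` and the seed value 2015 are assumptions introduced for illustration, and the paper itself deliberately sets no seed.

```r
# Hypothetical helper (not from the paper): draw `reps` Pearson's r values,
# each computed from n independent (x, y) pairs of rounded uniform integers.
simulate_r <- function(reps = 100, n = 50) {
  replicate(reps, {
    x <- round(runif(n, 1, 100))
    y <- round(runif(n, 1, 100))
    cor(x, y)
  })
}

set.seed(2015)   # assumed seed, used only so the run can be repeated exactly
r_values <- simulate_r()

# Largest correlation in magnitude produced by pure noise in this run
max_abs_r <- max(abs(r_values))

# Under independence the standard deviation of r is about 1 / sqrt(n - 1),
# so values a couple of standard deviations from zero are unremarkable.
null_sd <- 1 / sqrt(50 - 1)

print(round(c(max_abs_r = max_abs_r, null_sd = null_sd), 3))
```

With n = 50 the standard deviation of r under independence is about 0.14, so a value of roughly 0.3 lies only about two standard deviations from zero.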

Figure 2 is the R printout of all 100 Pearson's r values, sorted smallest to largest. Note how the code allows the data to be both visualized and printed out so one can cross-reference. Once again, it can be seen that r = 0.3 occurs on both the negative and positive sides. Specifically, by chance, the strongest negative correlation in this dataset was r = , while the strongest positive was

Figure 2. Sorted R Console Printout of Pearson's r Values.

Another relevant insight is that two completely random variables cannot be expected to be perfectly uncorrelated. Indeed, most dots fall nowhere near the line at zero, despite there being 100 of them. So not only can correlations in random data be larger than one might suspect, they are also rarely as close to zero as one might suspect.

The last five lines of code allow the instructor to zoom in to show the class how few very low Pearson's r values there are. Specifically, the b value has been created to visualize all correlations smaller than 0.01 in magnitude. Figure 3 shows the results with this filter included. In this case, only 9 out of 100 correlations are even close to zero, with none of them exactly on the zero line. This helps students understand not to expect even random data to be perfectly uncorrelated at r = 0, as they might otherwise expect. Instead, they are encouraged to adopt the general heuristic that only at roughly r = 0.5 or 0.6 should a correlation start to be considered meaningful. Anything less can easily occur due to chance alone.

Figure 3. Pearson's r Values < 0.01 in R.
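The two observations above, that near-zero correlations are rare and that chance values can be sizeable, can be put into rough numbers. The sketch below is an illustration under stated assumptions, not part of the original paper: `critical_r` and `prob_near_zero` are hypothetical helper names, the first uses the standard t transformation of Pearson's r, and the second relies on a normal approximation with standard deviation 1 / sqrt(n - 1).

```r
# Smallest |r| that reaches two-sided significance at level alpha
# for a sample of n pairs (standard t transformation of Pearson's r).
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

# Normal approximation (an assumption) to the chance that |r| < cutoff
# when x and y are independent, using sd(r) of about 1 / sqrt(n - 1).
prob_near_zero <- function(n, cutoff = 0.01) {
  s <- 1 / sqrt(n - 1)
  pnorm(cutoff, mean = 0, sd = s) - pnorm(-cutoff, mean = 0, sd = s)
}

print(round(critical_r(50), 3))
print(round(prob_near_zero(50), 3))

# The threshold shrinks as the sample grows, so any fixed heuristic for a
# "meaningful" r is tied to the sample size at hand.
print(round(sapply(c(10, 50, 500), critical_r), 3))
```

For n = 50 the approximation puts the chance of landing within 0.01 of zero at only a few percent per iteration, which is the same order of magnitude as the handful of points that survive the filter.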

DEMONSTRATION 1: CORRELATION SAS CODE

SAS was used to replicate the existing R code. The software is similarly capable of creating correlation data from scratch. Once again, n = 50 bivariate (x,y) pairs of data were drawn and correlated for a total of i = 100 iterations. The code can be seen below, followed by the visual output in Figure 4.

    /* Simulate 100 iterations of 50 random (x, y) pairs */
    DATA correlation;
      DO i = 1 to 100;
        DO j = 1 to 50;
          x = ROUND((RAND('UNIFORM')*100), 1);
          y = ROUND((RAND('UNIFORM')*100), 1);
          output;
        END;
      END;
    RUN;

    /* One correlation matrix per iteration, written to data set r */
    PROC CORR DATA = correlation OUTP = r NOPRINT;
      VAR x y;
      BY i;
    RUN;

    /* In the row named 'y', column x holds the x-y correlation */
    PROC SQL;
      CREATE TABLE r2 AS
      SELECT t1.i, t1.x
      FROM WORK.r t1
      WHERE t1._name_ = 'y';
    QUIT;

    PROC SGPLOT data = r2;
      SCATTER x = i y = x;
      XAXIS LABEL="Iteration";
      YAXIS LABEL="Pearson's R";
      REFLINE 0;
    RUN;

Figure 4. Pearson's r Values in SAS.

The code was then run to filter only the values for which |r| < 0.01. The results are visualized below in Figure 5.

Figure 5. Pearson's r Values < 0.01 in SAS.

DEMONSTRATION 2:

One of the important concepts to understand in statistics is Type I error, or false positives. Nominal alpha is set in order to have an objective goalpost. However, if it is set to alpha = 0.05, then in 5%, or 1 in 20, of hypothesis tests in which the null hypothesis is actually true, the null will nevertheless be rejected (on average). This follows from probability alone and occurs regardless of whether real, sensible data or purely pseudo-random data are used. Therefore, a Monte Carlo simulation was coded to demonstrate how Type I errors can appear. The lessons learned are to replicate scientific studies, and to use ANOVA combined with post hoc analyses and Hotelling's T² whenever appropriate.

The X axis in Figure 6 shows the p-values sorted lowest to highest. The Y axis shows what the p-values are. In this particular case, exactly 5 out of 100 p-values are lower than the cut-off point; this count can vary from run to run. No seed is set so the experiment can be repeated to show a variety of chance outcomes. Note that the false positive rate can vary from experiment to experiment, but will be 5% on average.
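The advice to prefer a single omnibus analysis over many separate tests rests on how quickly false positives accumulate. As a small worked example, assuming k independent tests that are each run at alpha = 0.05 with every null hypothesis true, the chance of at least one false positive is 1 - (1 - alpha)^k. The function name `familywise_error` is a label chosen here for illustration, not something taken from the paper.

```r
# Probability of at least one false positive among k independent tests,
# each run at level alpha, when every null hypothesis is true.
familywise_error <- function(k, alpha = 0.05) {
  1 - (1 - alpha)^k
}

k <- c(1, 5, 20, 100)
print(data.frame(tests = k, familywise = round(familywise_error(k), 3)))
```

With 20 independent tests the chance of at least one spurious rejection is already close to two in three, which is why replication and corrected post hoc procedures matter.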

DEMONSTRATION 2: P-VALUES R CODE

    a <- NULL   # p-values
    b <- 0      # count of p-values at or below 0.05
    c <- NULL   # percentage of false positives
    for (i in 1:1000) {   # the figures use i = 100, 20, and 1000 repetitions
      x <- rnorm(50)
      y <- rnorm(50)
      a[i] <- t.test(x, y, var.equal = TRUE)$p.value
      if (a[i] <= 0.05) b <- b + 1
    }
    c <- b / i * 100
    plot(sort(a), col = "blue")
    abline(h = 0.05)
    print(sort(a))
    paste(b, "out of", i, "which is", c, "percent", sep = " ")

Figure 6. Sorted p-values, i = 100 reps. Lower is Stronger; Below Line is Statistical Significance.

The last line of code instructs the console to output how many p-values were statistically significant and what percentage that represents. In this case, Figure 7 shows exactly 5%.

Figure 7. R Console Output showing Type I Error Rate. i = 100 reps

One can also change the number of iterations to great effect. Figure 8 uses i = 20 repetitions: since 5% is 1 in 20, this is the smallest number of tests for which the expected number of false positives is a whole test.

Figure 8. Sorted p-values, i = 20 reps. Lower is Stronger; Below Line is Statistical Significance.

The printout automatically changes to Figure 9.

Figure 9. R Console Output showing Type I Error Rate. i = 20 reps

In this case the result is slightly off from 5%, at 2 out of 20 (10%). This example has been cherry-picked to demonstrate that such an outcome is possible. Finally, one can look at the Type I error rate when conducting i = 1000 tests, as can be seen in Figure 10.
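How unusual an outcome such as 2 out of 20 is can be read from the binomial distribution, since each test run on pure noise is a false positive with probability alpha. The following sketch assumes independent tests at alpha = 0.05. It is an illustration added here, not output from the paper.

```r
reps  <- 20
alpha <- 0.05

# Probability of observing 0, 1, 2, 3, or 4 false positives in 20 tests
counts <- 0:4
probs  <- dbinom(counts, size = reps, prob = alpha)

print(data.frame(false_positives = counts, probability = round(probs, 3)))
```

Under these assumptions, exactly two false positives in 20 tests occurs in a little under one run in five, so it takes only a few reruns to find such an example.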

Figure 10. Sorted p-values, i = 1000 reps. Lower is Stronger; Below Line is Statistical Significance.

The console output corresponding to Figure 10 can be seen in Figure 11.

Figure 11. R Console Output showing Type I Error Rate. i = 1000 reps

Once again, the result is not a perfect 5%. If one runs the code repeatedly, the rate will generally be slightly higher or lower than 5%, and sometimes 5% on the dot. Across runs, the rate is approximately distributed as a bell curve, as the central limit theorem implies for a count of independent outcomes.
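The width of that bell curve depends on the number of repetitions. As a rough sketch, assuming independent tests at alpha = 0.05, the observed false positive rate has standard error sqrt(alpha * (1 - alpha) / reps), and an interval of two standard errors around 5% gives a feel for the run-to-run spread. The normal approximation is crude for 20 repetitions and is shown there only for comparison.

```r
alpha <- 0.05
reps  <- c(20, 100, 1000)

# Standard error of the observed false positive rate for each run length
se <- sqrt(alpha * (1 - alpha) / reps)

spread <- data.frame(
  reps       = reps,
  se_percent = round(100 * se, 2),
  lower      = round(100 * pmax(alpha - 2 * se, 0), 2),  # floor at 0 percent
  upper      = round(100 * (alpha + 2 * se), 2)
)
print(spread)
```

Under these assumptions the standard error at 1000 repetitions is roughly 0.7 percentage points, so most runs should land within about a point and a half of 5%.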

DEMONSTRATION 2: P-VALUES SAS CODE

    /* 100 iterations; each has 100 normal draws split into groups a and b */
    DATA t;
      DO i = 1 to 100;
        DO j = 1 to 100;
          x = RAND('NORMAL');
          if j <= 50 then group = "a";
          if j > 50 then group = "b";
          output;
        END;
      END;
    RUN;

    /* Two-sample t-test per iteration; test results saved to data set p */
    PROC TTEST DATA = t;
      CLASS group;
      VAR x;
      BY i;
      ODS output TTESTS = p;
    RUN;

    /* Keep the equal-variance test and sort by p-value */
    PROC SQL;
      ALTER table p ADD counter num;
      CREATE TABLE p2 AS
      SELECT t1.i, t1.probt, t1.counter
      FROM WORK.P t1
      WHERE t1.variances = 'Equal'
      ORDER BY t1.probt;
    QUIT;

    /* Flag p-values at or below 0.05 */
    DATA p2;
      SET p2;
      id = _N_;
      IF Probt <= .05 then counter = 1;
      ELSE counter = 0;
    RUN;

    PROC SGPLOT data = p2;
      SCATTER x = id y = Probt;
      XAXIS LABEL="Sorted Iteration";
      YAXIS LABEL="P-Value";
      REFLINE 0.05;
    RUN;

    PROC PRINT DATA = p2;
      SUM counter;
    RUN;
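The SAS program arranges the data differently from the earlier R loop: each iteration holds one column of 100 normal draws plus a group label, rather than two separate vectors. For readers cross-checking the two languages, the same layout can be sketched in R as follows. The helper name `one_p_value` is hypothetical and the code is an illustration, not the paper's.

```r
# One iteration in the SAS layout: 100 standard normal draws in a single
# column, with the first 50 labelled group "a" and the last 50 group "b".
one_p_value <- function(n_per_group = 50) {
  d <- data.frame(
    x     = rnorm(2 * n_per_group),
    group = rep(c("a", "b"), each = n_per_group)
  )
  # Pooled-variance t-test, the counterpart of the 'Equal' row kept by PROC SQL
  t.test(x ~ group, data = d, var.equal = TRUE)$p.value
}

# 100 iterations, sorted as in the ORDER BY clause
p_values <- sort(replicate(100, one_p_value()))

# Number of p-values at or below 0.05, the role played by SUM counter
counter <- sum(p_values <= 0.05)
print(counter)
```

Because no seed is set, the count changes from run to run, just as in the original demonstrations.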

Figure 12. Sorted p-values, i = 100 reps. Lower is Stronger; Below Line is Statistical Significance.

The graphical printout in Figure 12 shows 5 false positives out of 100 iterations. Therefore, the R demonstration has again been successfully replicated in SAS.

CONCLUSION

Ultimately, the purpose of these exercises was to visually demonstrate how statistics works under the hood. Data were pseudo-randomly drawn from theoretical distributions in order to gain real-world insights into how a researcher should use the techniques. Specifically, the correlation and the t-test were used to intuitively demonstrate the results one can expect when examining data trends and conducting hypothesis tests.

CONTACT INFORMATION

Name: Jack Sawilowsky, Ph.D.

Enterprise: Union Pacific Railroad

Address: 1400 Douglas STOP 1120

City, State ZIP: Omaha, NE

Trademark Citation

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
