Experimental design and analysis, Jesús López Fidalgo




Experimental design and analysis. Jesus.LopezFidalgo@uclm.es. University of Castilla-La Mancha, Department of Mathematics, Institute of Applied Mathematics to Science and Engineering.

OUTLINE THIS COURSE. 1. MOTIVATING INTRODUCTION TO STATISTICS. 2. IMPORTANCE OF DESIGNING AN EXPERIMENT. 3. ANOVA. 4. REGRESSION AND CORRELATION. 5. EXPERIMENTAL DESIGN: MOTIVATION AND CRITICISMS. 6. OPTIMAL DESIGN THEORY (LINEAR MODELS). 7. OPTIMAL DESIGNS FOR NONLINEAR MODELS. 8. REAL APPLICATIONS.

THIS COURSE. Course: Modelling and statistical analysis of stochastic processes (Design and analysis of experiments). Lecturer: Jesus.LopezFidalgo@uclm.es http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html

Recommended textbooks:
Atkinson A.C. and Donev A.N. (1992). Optimum experimental design. Oxford Science Publications. Oxford.
Fedorov V.V. (1972). Theory of optimal experiments. Academic Press. New York.
Fedorov V.V. and Hackl P. (1997). Model-oriented design of experiments. Springer. New York.
Montgomery D.C. (1991). Diseño y Análisis de Experimentos. Grupo Editorial Iberoamericano. México.
Peña Sánchez de Rivera D. (2002). Regresión y Diseño de Experimentos. Alianza Editorial. Madrid.

Course notes and videos (web). Notes: Optimal design. Videos: Foundations of statistical modelling: Probability (CLT error). Descriptive statistics. Introduction to hypothesis testing. Estimation and tests: Estimation and typical tests. Introduction to linear models: least squares, maximum likelihood... Introduction to one-factor ANOVA. Introduction to the design of experiments: ANOVA (one factor). Analysis of variance: one-factor ANOVA (residual analysis and example). More than one factor and interactions (example).

Assessment and information. Theoretical assessment (attendance, participation): 20%. Short assignments: 40%. Final project: 40%. It is advisable to check the web page http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html and Moodle regularly for announcements or recommended assignments.

1. MOTIVATING INTRODUCTION TO STATISTICS

Misconceptions of Statistics. Bernard Shaw: If a man has his head in an oven and his feet in a freezer, then on average his body is at the ideal temperature. The probability of a car accident increases with driving time, so the probability will drop if you drive faster. 33% of fatal accidents involve a drunk driver, so 67% involve someone who has not been drinking: it is much safer to drive drunk. The Vatican has two Popes per km². A sample tortured enough will confess whatever you want. Manipulating: Modifying the data. Bad sampling plan or design. Wrong model or analysis (e.g. treatment of non-response). Inadequate interpretation.

What does Statistics do? Infers conclusions from experimental data. Discovers relationships: genes related to a disease; influence of a diet in preventing a type of cancer. Measures the goodness of fit of a model to reality. Support and reference tool. Scientific method: deduction and induction (irregular die). Proof: fast and efficient. Not exact, but rigorous and scientific.

Healthy critical spirit towards mass media. "67% of young people drink alcohol during weekends." What is a young person? What does drinking alcohol mean? What is a weekend? Who conducted or wrote the study?

Some things to take into account (for instance): How was the sample taken? Covariation does not imply a cause/effect relationship (e.g. police/delinquents or storks/births). Scale of the graphics. Dealing with non-response.

Modern Statistics Union of two disciplines which were developed independently: Probability. Descriptive Statistics. Result: inference, decision making.

Statistical procedure Choose the model. Experimental design / sampling. Preparing the data (e.g. transformations). Analysis. Interpretation and decision making.

Hypothesis testing. Court trial: Guilty vs. Innocent (new treatment vs. traditional). The system presumes innocence ($H_0$) as long as guilt is not clear; convicting corresponds to rejecting the null hypothesis (a significant result).

Sentence \ Truth   | H0: Innocent                | H1: Guilty
H0: Free           | Innocent goes free          | Guilty goes free: ERROR II
H1: Convict        | Innocent convicted: ERROR I | Guilty convicted

Conditional probability (extra/prior information): $P(B) = \frac{P(B)}{P(E)}$, $P(B \mid A) = \frac{P(A \cap B)}{P(A \cap E)} = \frac{P(A \cap B)}{P(A)}$, where $E$ is the whole sample space.

p-value and test power. Risk $\alpha = P(\text{reject } H_0 \mid H_0) = P(\text{Type I Error})$. Risk $\beta = P(\text{accept } H_0 \mid H_1) = P(\text{Type II Error})$. Test power: $1 - \beta$ (depends on each value of $H_1$ and on $\alpha$). From the sample, the p-value: $p = P(\text{obtaining these observations or any others farther from } H_0 \mid H_0 \text{ true})$.
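The p-value definition above can be illustrated with a minimal sketch for a one-sided z-test with known standard deviation. The function name `z_test_p_value` and the observed sample mean of 1.8 are hypothetical, chosen only for illustration; the one-sided alternative is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """One-sided p-value: P(sample mean at least this large | H0 true).

    Assumes a normal sample with known standard deviation sigma.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical observation: sample mean 1.8 from n = 10 values, sigma = 3.
p_value = z_test_p_value(sample_mean=1.8, mu0=0.0, sigma=3.0, n=10)
print(f"p-value = {p_value:.4f}")
```

A small p-value says the data would be unusual if the null hypothesis were true; it says nothing about the size of the effect.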

Remarks. $p$ does not measure the magnitude of the association between two variables (e.g. the PISA report). It is not the probability of $H_0$. Not rejecting $H_0$ does not mean accepting $H_0$ (test power). Importance of the design and the sample size to succeed in rejecting $H_0$ when it is false.

Hypothesis test. Sample from $N(\mu, \sigma^2 = 3^2)$, $n = 10$; $H_0: \mu = 0$ versus $H_1: \mu = 2$.
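As a rough sketch, the power of this test can be computed directly. The significance level 0.05 and the one-sided rejection rule (reject for large sample means) are assumptions made here for illustration, and `power_one_sided_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # rejection threshold on the z scale
    shift = (mu1 - mu0) * sqrt(n) / sigma      # mean of the z statistic under H1
    return 1.0 - std_normal.cdf(z_crit - shift)


# Values from the slide (sigma = 3, n = 10, mu0 = 0, mu1 = 2); alpha = 0.05 is assumed.
power = power_one_sided_z(mu0=0.0, mu1=2.0, sigma=3.0, n=10, alpha=0.05)
print(f"power = {power:.3f}")
```

Under these assumptions the power is roughly two in three, so with only 10 observations a true mean of 2 would be missed about a third of the time.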

Central limit theorem (the magic). What if the distribution of the sample is unknown? $\bar{X} \approx N(\mu, \sigma^2/n)$. For $n \geq 30$ the approximation usually works well.
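A small simulation can make the theorem tangible. The sketch below draws samples from an exponential distribution with mean 1 and variance 1 (an arbitrary, clearly non-normal choice made for illustration) and checks that the sample means have mean close to $\mu$ and standard deviation close to $\sigma/\sqrt{n}$. The function name and the number of replicates are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n, replicates, seed=0):
    """Return the means of `replicates` samples of size n from an Exp(1) distribution."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(replicates)]


n = 30
sample_means = simulate_sample_means(n=n, replicates=5000)

# Exp(1) has mu = 1 and sigma = 1, so the CLT predicts mean 1 and sd 1/sqrt(n).
print(f"mean of sample means: {mean(sample_means):.3f} (theory: 1.000)")
print(f"sd of sample means:   {stdev(sample_means):.3f} (theory: {1 / sqrt(n):.3f})")
```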

Sampling. How many observations? $\alpha = 0.05$, $\sigma^2 = 3^2$, $H_0: \mu = 0$, $H_1: \mu = 2$. [Figure: test power $1 - \beta$ (vertical axis, 0.2 to 1.0) against sample size $n$ (horizontal axis, 10 to 50).]
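The question "how many observations?" can be answered, under assumptions, with the usual closed-form rule $n \geq \left((z_{1-\alpha} + z_{1-\beta})\,\sigma / (\mu_1 - \mu_0)\right)^2$ for a one-sided z-test. The sketch below applies it to the values on the slide; the target powers of 0.8 and 0.9 and the one-sided test are assumptions, and `sample_size_one_sided_z` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_sided_z(mu0, mu1, sigma, alpha, target_power):
    """Smallest n for which a one-sided z-test reaches the target power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(target_power)
    return ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)


# Slide values: alpha = 0.05, sigma = 3, mu0 = 0, mu1 = 2. Target powers are assumed.
n_for_80 = sample_size_one_sided_z(0.0, 2.0, 3.0, alpha=0.05, target_power=0.80)
n_for_90 = sample_size_one_sided_z(0.0, 2.0, 3.0, alpha=0.05, target_power=0.90)
print(f"n for 80% power: {n_for_80}")
print(f"n for 90% power: {n_for_90}")
```

The required sample size grows with the variance and with the target power, and shrinks as the difference between the two hypothesised means grows.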

Example: Atypical cases of leukemia in a school. National proportion: 0.0001 (1 in 10,000). Proportion of 0.0017 in a particular school (17 times the national reference). School A: 3000 students and 5 cases (p = 0.035). School B: 1200 students and 2 cases (p = 0.184).

Frequent statistical analyses

Y (Response) \ X (Explanatory variables) | Quantitative                                                | Qualitative
Quantitative                             | Regression; Correlation                                     | t-test / ANOVA; Mann-Whitney / Kruskal-Wallis; Wilcoxon / Friedman
Qualitative                              | Discriminant analysis; Logit, Probit...; neural networks    | Fisher exact test; chi-squared / log-linear

Interpretation. "90% of lung cancer patients have been smokers" is not the same as "90% of smokers die of lung cancer".

Reliability of a particular cancer test: 90% reliable. Your test is positive! But in what sense is it 90% reliable? If you really have the cancer, the test is positive with 90% probability (sensitivity). If you do not have this cancer, the test is negative in 90% of cases (specificity). How many people currently have this particular cancer? 1 in 10,000 (prevalence). Actual probability that you really have this cancer: about 1 in 1000.
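The final figure follows from Bayes' rule: $P(\text{cancer} \mid +) = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$. A minimal sketch of the arithmetic, using the three numbers quoted above (the function name is hypothetical):

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) computed with Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Numbers from the slide: 90% sensitivity, 90% specificity, prevalence 1 in 10,000.
ppv = positive_predictive_value(sensitivity=0.9, specificity=0.9, prevalence=1 / 10_000)
print(f"P(cancer | positive) = {ppv:.6f}, about 1 in {round(1 / ppv)}")
```

Almost all positives are false positives because healthy people vastly outnumber sick ones, which is why the answer is on the order of 1 in 1000 rather than 90%.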

Interpretation and use of graphics

The same, but well done

Rigorous proportion

Histograms

2. IMPORTANCE OF DESIGNING AN EXPERIMENT

Why? Think before acting (especially in the middle of a crisis). Saves time, money and risk. Allows a correct analysis.

Basic principles. Randomization. Replication (repeated measurements, helicopter example). Blocking (e.g. to eliminate variability due to nuisance factors).
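Randomization and blocking can be combined in a randomized complete block design, where every treatment appears once in each block and the run order is shuffled independently within each block. The sketch below is one simple way to generate such a plan; the treatment and block labels are hypothetical and the seed is fixed only to make the example reproducible.

```python
import random


def randomized_complete_block_design(treatments, blocks, seed=None):
    """Assign every treatment once per block, in an independent random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(treatments)   # copy so the caller's list is not modified
        rng.shuffle(order)         # randomization within the block
        design[block] = order
    return design


# Hypothetical example: three treatments compared across four days (the blocks).
treatments = ["A", "B", "C"]
blocks = ["day 1", "day 2", "day 3", "day 4"]
design = randomized_complete_block_design(treatments, blocks, seed=42)
for block, order in design.items():
    print(block, order)
```

Replication would correspond to repeating each treatment more than once within a block, which this sketch does not do.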

Guidelines for Designing an Experiment (Montgomery) Pre-experimental planning: Recognition and statement of the problem. Model: Choice of factors (Controllable, uncontrollable and noise), levels, and ranges. Selection of the response variable. Choice of experimental design. Performing the experiment (monitor the process, wine...). Statistical analysis of the data. Conclusions and recommendations.

Examples. Factorial (fractional). Screening: select the important factors from a large number of candidates. Nested or hierarchical. Split-plot designs: whole plot (main treatments): temperatures and times; split plot: remaining variables; add as a block. Sequential and adaptive designs. Mixture experiments. Designs with proper names. Response surface.
