Experimental design and analysis Jesus.LopezFidalgo@uclm.es University of Castilla-La Mancha Department of Mathematics Institute of Applied Mathematics to Science and Engineering
OUTLINE THIS COURSE. 1. MOTIVATING INTRODUCTION TO STATISTICS. 2. IMPORTANCE OF DESIGNING AN EXPERIMENT. 3. ANOVA. 4. REGRESSION AND CORRELATION. 5. EXPERIMENTAL DESIGN: MOTIVATION AND CRITICISMS. 6. OPTIMAL DESIGN THEORY (LINEAR MODELS). 7. OPTIMAL DESIGNS FOR NONLINEAR MODELS. 8. REAL APPLICATIONS.
THIS COURSE. Course: Statistical modelling and analysis of stochastic processes (Design and analysis of experiments). Lecturer: Jesus.LopezFidalgo@uclm.es http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html
Recommended textbooks. Atkinson A.C. and Donev A.N. (1992). Optimum experimental design. Oxford Science Publications. Oxford. Fedorov V.V. (1972). Theory of optimal experiments. Academic Press. New York. Fedorov V.V. and Hackl P. (1997). Model-oriented design of experiments. Springer. New York. Montgomery D.C. (1991). Diseño y Análisis de Experimentos. Grupo Editorial Iberoamericano. México. Peña Sánchez de Rivera D. (2002). Regresión y Diseño de Experimentos. Alianza Editorial. Madrid.
Course notes and videos (web). Notes: Optimal design. Videos: Fundamentals of statistical modelling: Probability (CLT error). Descriptive statistics. Introduction to hypothesis testing. Estimation and tests: Typical estimation and tests. Introduction to linear models: least squares, maximum likelihood... Introduction to one-factor ANOVA. Introduction to experimental design: ANOVA (one factor). Analysis of variance: One-factor ANOVA (residual analysis and example). More than one factor and interactions (example).
Assessment and information. Theoretical assessment (attendance, participation): 20%. Short assignments: 40%. Final project: 40%. Students are advised to check the web page http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html and Moodle regularly for announcements and recommended assignments.
1. MOTIVATING INTRODUCTION TO STATISTICS
Misconceptions of Statistics. Bernard Shaw: If a man has his head in an oven and his feet in a freezer, then his body is, on average, at the ideal temperature. The probability of a car accident increases with driving time, thus this probability will drop by increasing the speed. 33% of fatal accidents involve a drunk driver and 67% involve someone who has not drunk, so it is supposedly much safer to drive drunk. The Vatican has two Popes per km². A sample tortured enough will confess whatever you want. Manipulating: Modifying the data. Bad sampling plan or design. Wrong model or analysis (e.g. treatment of non-response). Inadequate interpretation.
What does Statistics do? Infer conclusions from experimental data. Discover relationships: Genes related to a disease. Influence of a diet in preventing a type of cancer. Measure the goodness of fit of a model to reality. Support and reference tool. Scientific method: Deduction and induction (irregular die). Proof: Fast and efficient. Not exact, but rigorous and scientific.
A healthy critical attitude towards the mass media. "67% of young people drink alcohol at weekends." What is a young person? What counts as drinking alcohol? What is a weekend? Who conducted the study, and who wrote it up?
Some things to take into account (for instance). How was the sample taken? Covariation does not imply a cause/effect relationship (e.g. police/delinquents or storks/births). Scale of the graphics. Dealing with non-response.
Modern Statistics Union of two disciplines which were developed independently: Probability. Descriptive Statistics. Result: inference, decision making.
Statistical procedure Choose the model. Experimental design / sampling. Preparing the data (e.g. transformations). Analysis. Interpretation and decision making.
Hypothesis testing. Court trial: guilty vs. innocent (new treatment vs. traditional). The system presumes innocence (H0) unless guilt is clearly shown; convicting corresponds to rejecting the null hypothesis (a significant result).

Sentence \ Truth      H0: innocent      H1: guilty
H0: free              correct           Type II error
H1: convict           Type I error      correct
Conditional probability (extra/prior information). With E the whole sample space: P(B) = P(B) / P(E). Once A is known to have occurred, A plays the role of the sample space: P(B | A) = P(A ∩ B) / P(A ∩ E) = P(A ∩ B) / P(A).
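The restriction of the sample space can be checked by direct counting. The following is a minimal sketch; the two-dice experiment, the events A and B and the helper `prob` are assumptions chosen for illustration and do not come from the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space E: all ordered outcomes of two fair dice (assumed example).
E = list(product(range(1, 7), repeat=2))

def prob(event, space):
    """Probability of `event` (a predicate) when outcomes in `space` are equally likely."""
    favourable = [w for w in space if event(w)]
    return Fraction(len(favourable), len(space))

def A(w):
    """Extra information: the sum of the dice is at least 10."""
    return w[0] + w[1] >= 10

def B(w):
    """Event of interest: the first die shows a 6."""
    return w[0] == 6

p_B = prob(B, E)                                   # unconditional P(B)
p_A = prob(A, E)
p_A_and_B = prob(lambda w: A(w) and B(w), E)
p_B_given_A = p_A_and_B / p_A                      # P(B | A) = P(A ∩ B) / P(A)

# Equivalent view: A becomes the new sample space.
space_A = [w for w in E if A(w)]
p_B_in_A = prob(B, space_A)

print(p_B, p_B_given_A, p_B_in_A)
```

Both routes give the same conditional probability, which is the point of the formula: knowing A simply shrinks the set of possible outcomes.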
p-value and test power. Risk α = P(reject H0 | H0) = P(Type I error). Risk β = P(do not reject H0 | H1) = P(Type II error). Test power: 1 − β (depends on each value under H1 and on α). From the sample, the p-value: p = P(obtaining these observations or any others farther from H0 | H0 true).
Remarks. The p-value does not measure the magnitude of the association between two variables (e.g. the PISA report). It is not the probability of H0. Not rejecting H0 does not mean accepting H0 (test power). The design and the sample size are important for succeeding in rejecting H0 when it is false.
Hypothesis test. Sample from N(µ, σ² = 3²), n = 10. H0: µ = 0 versus H1: µ = 2.
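For this setting the rejection region, β and the power can be worked out numerically. The sketch below assumes a one-sided z-test with known σ and a significance level α = 0.05; neither the one-sided choice nor the observed mean 1.8 used at the end is stated on the slide.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 3.0, 10        # from the slide: sigma^2 = 3^2, n = 10
mu0, mu1 = 0.0, 2.0       # H0: mu = 0, H1: mu = 2
alpha = 0.05              # assumed significance level

se = sigma / sqrt(n)      # standard error of the sample mean

# One-sided test (assumption): reject H0 when the sample mean exceeds `critical`.
critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta = P(do not reject H0 | H1); power = 1 - beta.
beta = NormalDist(mu1, se).cdf(critical)
power = 1 - beta

def p_value(xbar):
    """P(sample mean at least as far from H0 as xbar | H0 true), one-sided."""
    return 1 - NormalDist(mu0, se).cdf(xbar)

print(f"critical value: {critical:.3f}")
print(f"beta: {beta:.3f}, power: {power:.3f}")
print(f"p-value for a hypothetical observed mean of 1.8: {p_value(1.8):.3f}")
```

By construction, a sample mean exactly at the critical value has a p-value equal to α.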
Central limit theorem (the magic). What if the distribution of the data is unknown? The sample mean is approximately normal: X̄ ≈ N(µ, σ²/n). For n ≥ 30 the approximation usually works well.
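A quick simulation makes the approximation tangible. This is a sketch under assumed choices: the data come from an exponential distribution with rate 1 (mean 1, variance 1, clearly non-normal), with n = 30 and 5000 simulated samples; none of these values come from the slides.

```python
import random
from statistics import fmean, pvariance

random.seed(0)            # fixed seed so the run is reproducible

n = 30                    # size of each sample
reps = 5000               # number of simulated samples
mu, sigma2 = 1.0, 1.0     # mean and variance of the exponential(1) distribution

# One sample mean per simulated sample.
means = [fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

mean_of_means = fmean(means)        # should be close to mu
var_of_means = pvariance(means)     # should be close to sigma2 / n

# Fraction of sample means within 1.96 standard errors of mu
# (about 0.95 if the normal approximation is good).
se = (sigma2 / n) ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(mean_of_means, var_of_means, sigma2 / n, coverage)
```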
Sampling. How many observations? α = 0.05, σ² = 3², H0: µ = 0, H1: µ = 2. [Figure: test power 1 − β (vertical axis, 0.2 to 1.0) against sample size n (horizontal axis, 10 to 50).]
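The power curve for these values can be recomputed directly. The sketch below assumes a one-sided z-test with known σ (the slide does not say whether the test is one-sided or two-sided), and the target power of 0.80 is a conventional value chosen here, not one given in the course.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(n, delta=2.0, sigma=3.0, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = 0 when the true mean is delta."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - delta * sqrt(n) / sigma)

def min_sample_size(target, delta=2.0, sigma=3.0, alpha=0.05):
    """Smallest n whose power reaches `target`: n >= ((z_alpha + z_target) * sigma / delta)^2."""
    n = ((Z.inv_cdf(1 - alpha) + Z.inv_cdf(target)) * sigma / delta) ** 2
    return ceil(n)

# Power at the sample sizes shown on the horizontal axis of the figure.
for n in (10, 20, 30, 40, 50):
    print(n, round(power(n), 3))

print("smallest n for power 0.80:", min_sample_size(0.80))
```

Solving for n in closed form is possible here because σ is treated as known; with an estimated variance the calculation would use the noncentral t distribution instead.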
Example: atypical cases of leukemia in a school. National proportion: 0.0001 (1 in 10,000). Proportion of 0.0017 in a particular school (17 times the national reference). School A: 3000 students and 5 cases (p = 0.035). School B: 1200 students and 2 cases (p = 0.184).
Frequent statistical analyses, by type of explanatory variable X and response Y.

Y (response)      X quantitative                      X qualitative
Quantitative      Regression                          t-test / ANOVA
                  Correlation                         Mann-Whitney / Kruskal-Wallis
                                                      Wilcoxon / Friedman
Qualitative       Discriminant analysis               Fisher exact test
                  Logit, Probit...                    chi-squared / log-linear
                  Neural networks
Interpretation. "90% of lung cancer patients have been smokers" is not the same as "90% of smokers die of lung cancer".
Reliability of a particular cancer test. The test is 90% reliable. Your test comes back positive, but... in what sense is it 90% reliable? If you really have the cancer, the test is positive with 90% probability (sensitivity). If you do not have this cancer, the test is negative in 90% of cases (specificity). How many people currently have this particular cancer? 1 in 10,000 (prevalence). Actual probability that you really have this cancer given a positive test: approximately 1 in 1000.
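The final figure follows from Bayes' theorem applied to the sensitivity, specificity and prevalence quoted above. A minimal sketch (the function name `posterior_given_positive` is an illustrative choice):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_positive = sensitivity * prevalence                 # P(positive and disease)
    false_positive = (1 - specificity) * (1 - prevalence)    # P(positive and no disease)
    return true_positive / (true_positive + false_positive)

# Values from the slide: prevalence 1 in 10,000, sensitivity 90%, specificity 90%.
p = posterior_given_positive(prevalence=1 / 10_000, sensitivity=0.90, specificity=0.90)
print(f"P(cancer | positive) = {p:.5f}, about 1 in {round(1 / p)}")
```

The result is roughly 0.0009: almost all positives are false positives, because healthy people vastly outnumber the sick and 10% of them test positive.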
Interpretation and use of graphics
The same, but well done
Rigorous proportion
Histograms
2. IMPORTANCE OF DESIGNING AN EXPERIMENT
Why? Think before acting (especially in the middle of a crisis). Save time and money, and reduce risk. Make a correct analysis possible.
Basic principles. Randomization. Replication (repeated measurements, helicopter example). Blocking (e.g. to eliminate the variability due to nuisance factors).
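The three principles can be seen together in a randomized complete block layout: every treatment appears once in each block (replication across blocks), and the run order is shuffled independently within each block (randomization). The sketch below is illustrative only; the treatment labels, the use of days as blocks and the function name are assumptions.

```python
import random

def randomized_complete_block(treatments, blocks, seed=None):
    """Return {block: treatments in randomized run order}, shuffled independently per block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)        # randomization within the block
        plan[block] = order
    return plan

# Hypothetical experiment: 3 treatments, each run once per day (day = block).
plan = randomized_complete_block(
    treatments=["A", "B", "C"],
    blocks=["day 1", "day 2", "day 3", "day 4"],
    seed=1,
)
for block, order in plan.items():
    print(block, order)
```

Because each day contains all treatments, day-to-day variability affects every treatment equally and can be separated from the treatment effect in the analysis.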
Guidelines for Designing an Experiment (Montgomery) Pre-experimental planning: Recognition and statement of the problem. Model: Choice of factors (Controllable, uncontrollable and noise), levels, and ranges. Selection of the response variable. Choice of experimental design. Performing the experiment (monitor the process, wine...). Statistical analysis of the data. Conclusions and recommendations.
Examples. Factorial (and fractional factorial). Screening: select the important factors from a large number of candidates. Nested or hierarchical. Split-plot designs: Whole plot (main treatments): temperatures and times. Split-plot: remaining variables. Add as a block. Sequential and adaptive designs. Mixture experiments. Designs with proper names. Response surface.
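As an illustration of the first item, a two-level full factorial and a half fraction can be generated in a few lines. This is a sketch assuming the usual -1/+1 coding of factor levels and the standard construction in which the last factor is set to the product of the others (defining relation I = ABC for three factors); the function names are illustrative.

```python
from itertools import product

def full_factorial(k):
    """All 2^k runs of k two-level factors, levels coded as -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def half_fraction(k):
    """2^(k-1) runs: full factorial in the first k-1 factors, last factor = product of the others."""
    runs = []
    for run in full_factorial(k - 1):
        last = 1
        for level in run:
            last *= level
        runs.append(run + [last])
    return runs

full = full_factorial(3)    # 8 runs for factors A, B, C
half = half_fraction(3)     # 4 runs with C = A * B

for run in half:
    print(run)
```

The half fraction uses half the runs at the price of aliasing: with C = AB, the main effect of C cannot be distinguished from the AB interaction.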