Experimental data analysis Lecture 3: Confidence intervals. Dodo Das


Review of lecture 2: Nonlinear regression - iterative likelihood maximization; the Levenberg-Marquardt algorithm (a hybrid of steepest descent and Gauss-Newton); stochastic optimization - MCMC, simulated annealing.

Demo 3: Nonlinear regression in MATLAB Objective: Using a Hill function to model dose-response data.

Demo 3: Nonlinear regression in MATLAB Needs the Statistics Toolbox, which contains the function nlinfit. i) Write a MATLAB function to compute the Hill function for a given set of parameter values.
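A minimal sketch of such a function, written as an anonymous function. The parameter names Emax, EC50, and n, and their ordering in beta, are assumptions for illustration; the slide's own code is not reproduced in the transcription.

% Hill function: beta = [Emax, EC50, n], x = dose (independent variable)
hill = @(beta, x) beta(1) .* x.^beta(3) ./ (beta(2).^beta(3) + x.^beta(3));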

Demo 3: Nonlinear regression in MATLAB ii) Choose some model parameters to simulate data [or collect data from experiments]
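For example, data could be simulated along these lines (the "true" parameter values, dose range, and noise level are arbitrary choices, not taken from the slides):

betaTrue = [2.0, 5.0, 1.5];                    % assumed "true" [Emax, EC50, n]
x = logspace(-1, 2, 12)';                      % doses spanning three decades
y = hill(betaTrue, x) + 0.05*randn(size(x));   % Hill response plus Gaussian noise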

Demo 3: Nonlinear regression in MATLAB iii) Pick an initial guess and call nlinfit
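A sketch of the fit itself, continuing from the snippets above (variable names are assumptions):

beta0 = [1, 1, 1];                               % initial guess for [Emax, EC50, n]
[betaHat, resid, J, CovB, mse] = nlinfit(x, y, hill, beta0);
betaHat                                          % best-fit parameter estimates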


Demo 3: Nonlinear regression in MATLAB Convergence can be sensitive to the initial guess.

Demo 3: Nonlinear regression in MATLAB Compute and plot residuals
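A sketch of this step, reusing betaHat from the fit above (note that nlinfit also returns the residuals directly as its second output):

yPred = hill(betaHat, x);              % model prediction at the best fit
resid = y - yPred;                     % residuals = observed - predicted
semilogx(x, resid, 'o'); hold on;
semilogx(x([1 end]), [0 0], 'k--');    % zero line for reference
xlabel('dose'); ylabel('residual');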

Parameter confidence intervals Question: What does a 95% confidence interval mean? E.g., say a best-fit parameter estimate is â = 1, and we have estimated the 95% CI to be [0.5, 1.5]. How can we interpret this result? If we were to repeat the experiment and the fitting procedure many times, about 95% of the confidence intervals constructed this way would contain the true (but unknown) parameter value.

Computing parameter confidence intervals Two approaches: 1. Asymptotic confidence intervals: Based on an analytical approximation. 2. Bootstrap confidence intervals: Computational technique based on resampling the errors.

Asymptotic confidence intervals Use a local (quadratic) approximation of the SSR landscape to estimate its curvature, from which the parameter covariance matrix is obtained. Assuming that the errors are normally distributed, this covariance matrix can be used to compute a confidence interval around each parameter.
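A sketch of the calculation, using the covariance matrix CovB returned by nlinfit above; nlparci in Demo 4 performs essentially the same computation.

dof   = numel(y) - numel(betaHat);               % residual degrees of freedom
se    = sqrt(diag(CovB));                        % asymptotic standard errors
tcrit = tinv(0.975, dof);                        % two-sided 95% t critical value
ci    = [betaHat(:) - tcrit*se, betaHat(:) + tcrit*se]   % 95% CI per parameter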

Demo 4: Computing asymptotic CIs in MATLAB Use the nlparci command in the Statistics Toolbox
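A minimal usage sketch, assuming the nlinfit outputs from Demo 3 (betaHat, resid, J, CovB) are in the workspace:

ci = nlparci(betaHat, resid, 'jacobian', J)      % 95% CIs, one row per parameter
% equivalently, from the covariance matrix:
% ci = nlparci(betaHat, resid, 'covar', CovB);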

Bootstrapping: The principle A computational approach that addresses the following question: Given a limited number of observations, how can we estimate some quantity, eg: mean, median etc. for the population from which the observations are drawn? If we resample from the observations, we can, in some sense, simulate the population distribution.
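A toy illustration of the resampling idea (not from the slides): estimating a 95% CI for the median of a small, made-up sample by resampling the observations with replacement.

obs   = [2.1 3.4 2.9 5.0 3.7 4.2];   % a small, made-up sample
nBoot = 1000;
med   = zeros(nBoot, 1);
for b = 1:nBoot
    idx    = randi(numel(obs), 1, numel(obs));  % indices drawn with replacement
    med(b) = median(obs(idx));                  % median of the resampled data
end
ciMedian = prctile(med, [2.5 97.5])             % percentile bootstrap 95% CI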

Bootstrapping in practice for nonlinear regression.
1. Use nonlinear least squares regression to determine the best-fit estimates of the model parameters and the predicted model response (y_i)_predicted at each value of the independent variable x_i.
2. Calculate the residuals ε_i = (y_i)_observed - (y_i)_predicted at each of the N data points.
3. Resample the residuals with replacement to generate a new set of residuals {ε*_i}. What this means is that we generate a new set of N residuals, where each of the N values is one of the original residuals chosen with equal probability. Typically, some of the original residuals will be chosen more than once, while some will not be chosen at all. For example, say we have three data points and we calculate the residuals to be 0.1, -0.2 and 0.3. Then some possible sets of resampled residuals are: {-0.2, 0.1, -0.2}, {0.1, 0.3, -0.2}, {0.3, -0.2, -0.2}, and so on.
4. Add the resampled residuals to the predicted response to generate a bootstrap data set, {x_i, y*_i} = {x_i, (y_i)_predicted + ε*_i}.
5. Treat the bootstrap data set as an independent replicate experiment, and fit it to the model to calculate new estimates of the model parameters.
6. Repeat steps 3-5 many times - typically 500 to 1000 times - each time generating a new bootstrap data set and fitting it to the model. Store the resulting best-fit parameter estimates. These independent estimates constitute a sample from the bootstrap distribution of the model parameters.
7. For each parameter, calculate the standard deviation of the bootstrap sample. This standard deviation is the estimated bootstrap standard error for that parameter.
8. To calculate the 95% bootstrap CIs, compute the 97.5th and 2.5th percentile values of each parameter from the bootstrap distributions. (There are other prescriptions for calculating bootstrap CIs, but this one is the simplest.)

Demo 5: Estimating bootstrap CIs in MATLAB
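A sketch of steps 3-8 above, reusing hill, x, betaHat, yPred, and resid from the earlier snippets (these names are assumptions; the slides' own code is not reproduced in the transcription):

nBoot    = 1000;                                 % number of bootstrap replicates
bootBeta = zeros(nBoot, numel(betaHat));
for b = 1:nBoot
    residStar = resid(randi(numel(resid), numel(resid), 1));  % resample residuals
    yStar     = yPred + residStar;                            % bootstrap data set
    bootBeta(b, :) = nlinfit(x, yStar, hill, betaHat);        % refit the model
end
bootSE = std(bootBeta);                          % bootstrap standard errors
bootCI = prctile(bootBeta, [2.5 97.5])           % percentile 95% CIs per parameter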

Comparing fits from two different experiments Which parameters are different between the two datasets?

Bootstrap-based hypothesis testing Emax is not different between the two datasets (the 95% CI of the difference distribution crosses 0), but EC50 and n are.
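A hypothetical sketch of this test, assuming bootA and bootB hold bootstrap parameter samples (rows = replicates, columns = [Emax, EC50, n]) obtained as above for the two experiments:

% bootA, bootB: assumed bootstrap parameter samples from the two experiments
nB    = min(size(bootA, 1), size(bootB, 1));
diffs = bootA(1:nB, :) - bootB(1:nB, :);     % bootstrap difference distribution
ciD   = prctile(diffs, [2.5 97.5]);          % 95% CI of each parameter difference
% a parameter differs between experiments when its difference CI excludes zero
isDifferent = ciD(1, :) > 0 | ciD(2, :) < 0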

Friday: How to pick the best model from a set of proposed models? Bias-variance tradeoff, F-test, Akaike's information criterion (AIC).