The Role of Statistical Principles in Quantitative Biomedical Modeling

Similar documents
Mathematical-Statistical Modeling to Inform the Design of HIV Treatment Strategies and Clinical Trials

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Basics of Statistical Machine Learning

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

1 Prior Probability and Posterior Probability

Markov Chain Monte Carlo Simulation Made Simple

Least Squares Estimation

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

1.5 Oneway Analysis of Variance

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March Due:-March 25, 2015.

Introduction to mixed model and missing data issues in longitudinal studies

Deterministic and Stochastic Modeling of Insulin Sensitivity

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Fitting Subject-specific Curves to Grouped Longitudinal Data

Maximum Likelihood Estimation

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Credit Risk Models: An Overview

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Introduction to Fixed Effects Methods

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

Multivariate Normal Distribution

CHAPTER 2 Estimating Probabilities

Lecture 3: Linear methods for classification

Fairfield Public Schools

3. Regression & Exponential Smoothing

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Likelihood Approaches for Trial Designs in Early Phase Oncology

Bayesian Statistics in One Hour. Patrick Lam

Switch to Dolutegravir plus Rilpivirine dual therapy in cart-experienced Subjects: an Italian cohort

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Least-Squares Intersection of Lines

Financial Mathematics and Simulation MATH Spring 2011 Homework 2

Module 3: Correlation and Covariance

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Statistical Machine Learning

Valuation of the Surrender Option Embedded in Equity-Linked Life Insurance. Brennan Schwartz (1976,1979) Brennan Schwartz

A Basic Introduction to Missing Data

II. DISTRIBUTIONS distribution normal distribution. standard scores

Principle of Data Reduction

Statistical Models in R

Imputation of missing data under missing not at random assumption & sensitivity analysis

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Tutorial on Markov Chain Monte Carlo

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

Identification of Demand through Statistical Distribution Modeling for Improved Demand Forecasting

Time Series Analysis

STRATEGIC CAPACITY PLANNING USING STOCK CONTROL MODEL

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Understanding and Applying Kalman Filtering

FRC Risk Reporting Requirements Working Party Case Study (Pharmaceutical Industry)

Models for Longitudinal and Clustered Data

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared.

A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

Linear Classification. Volker Tresp Summer 2015

WHERE DOES THE 10% CONDITION COME FROM?

Sample Size Calculation for Longitudinal Studies

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Monte Carlo-based statistical methods (MASM11/FMS091)

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Linear Threshold Units

A Programme Implementation of Several Inventory Control Algorithms

UNDERSTANDING THE TWO-WAY ANOVA

Gaussian Conjugate Prior Cheat Sheet

Evaluating the Lead Time Demand Distribution for (r, Q) Policies Under Intermittent Demand

Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003

Sample Size and Power in Clinical Trials

Numerical Methods for Option Pricing

Introduction to General and Generalized Linear Models

Forecasting in supply chains

Imputing Missing Data using SAS

A Log-Robust Optimization Approach to Portfolio Management

Exploratory Factor Analysis of Demographic Characteristics of Antenatal Clinic Attendees and their Association with HIV Risk

R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

FACTOR ANALYSIS NASC

EE 570: Location and Navigation

Probability and Random Variables. Generation of random variables (r.v.)

The Probit Link Function in Generalized Linear Models for Data Mining Applications

SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one?

PS 271B: Quantitative Methods II. Lecture Notes

Nonparametric adaptive age replacement with a one-cycle criterion

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

Department of Economics

Chapter 1. Longitudinal Data Analysis. 1.1 Introduction

Bayesian Statistical Analysis in Medical Research

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Multivariate Analysis (Slides 13)

Non-Stationary Time Series andunitroottests

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005

Binomial lattice model for stock prices

CALCULATIONS & STATISTICS

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Understanding the Impact of Weights Constraints in Portfolio Theory

Section 5. Stan for Big Data. Bob Carpenter. Columbia University

Transcription:

The Role of Statistical Principles in Quantitative Biomedical Modeling Atlantic Coast Conference on Mathematics in the Life and Biological Sciences Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/ davidian Statistical Principles in Biomedical Modeling 1

Outline 1. Modeling in biomedical research 2. Sources of variation in data 3. Statistical models 4. Hierarchical statistical models for longitudinal biomedical data 5. Fitting statistical models and statistical inference 6. Using models in biomedical research 7. Closing remarks Statistical Principles in Biomedical Modeling 2

1. Modeling in biomedical research Increasingly: Recognition of the role of mathematical models of biological systems in biomedical research Mechanisms underlying disease and its progression Processes involved in the disposition of drugs Processes governing the action of treatments In general, physiological mechanisms taking place within a subject over time Usefulness: Evaluation of effects of new treatments Identification of appropriate doses of drugs for further study Development of strategies for administration of treatments over time Design of clinical studies Statistical Principles in Biomedical Modeling 3

1. Modeling in biomedical research For example: http://www.fda.gov/oc/initiatives/criticalpath/ Statistical Principles in Biomedical Modeling 4

1. Modeling in biomedical research Most popular models: (Deterministic) systems of ODEs with s states ẋ(t) = g{t, x(t), θ}, initial conditions x(0) = x 0, solution x(t, θ) (s 1) Forward simulation of x(t) given θ, x 0 Inverse problem to determine θ given x(t) Information available to facilitate modeling: Established knowledge and hypotheses about physiology at various levels (e.g., molecular, cellular, organs/tissues, system,... ) Longitudinal data on (parts of ) x(t, θ) arising from human subjects observed for each subject y 1,..., y n at times 0 t 1 < < t n Statistical Principles in Biomedical Modeling 5

1. Modeling in biomedical research Usual features of data: y 1,..., y n for any subject do not track exactly along x(t, θ) Different observed trajectories for different subjects Variation Statistics: The study of variation Not a branch of mathematics; more a philosophy for characterizing how data arise...... and for drawing inferences from data in the face of variation This talk: Describe how statistical principles must form the basis for Application of such models to data The broader goals on Slide 3 Statistical Principles in Biomedical Modeling 6

2. Sources of variation in data Simple example: Pharmacokinetics (PK ) of theophylline Understanding of PK (absorption, distribution, elimination) needed for dose recommendations One compartment model (s = 2) oral dose D A(t) k a k e Assume constant relationship between concentration C(t) and amount A(t) C(t) = A(t)/V C(t) = k af D V (k a k e ) {exp( k et) exp( k a t)} Statistical Principles in Biomedical Modeling 7

2. Sources of variation in data Thus: Closed form solution for observable state x 1 (t, θ) = Typical intensive PK study: k af D V (k a k e ) {exp( k et) exp( k a t)} θ = (k a, V, k e ) T N subjects given same oral dose D (mg/kg) at t = 0 Blood samples at subsequent times assayed for concentration y 1,..., y n at t 1,..., t n for each subject Main objective : Learn extent to which absorption, distribution, elimination, i.e., θ = (k a, V, k a ) T vary from subject to subject...... and use this knowledge to develop dosing recommendations for the population Statistical Principles in Biomedical Modeling 8

2. Sources of variation in data All subjects Theophylline Conc. (mg/l) 0 2 4 6 8 10 12 0 5 10 15 20 25 Time (hr) Statistical Principles in Biomedical Modeling 9

2. Sources of variation in data Single subject: With fitted model Subject 12 Theophylline Conc. (mg/l) 0 2 4 6 8 10 12 0 5 10 15 20 25 Time (hr) Statistical Principles in Biomedical Modeling 10

2. Sources of variation in data Why don t the observed concentrations like exactly on the fitted trajectory? Observation error One obvious reason Assay (measurement ) error Model misspecification true PK more complex Times/dose recorded incorrectly Etc... Statistical Principles in Biomedical Modeling 11

2. Sources of variation in data Hypothetically: PSfrag replacements C(t) 0 2 4 6 8 10 12 Statistical Principles in Biomedical Modeling 12 0 5 10 15 20 t PSfrag replacements C(t) t

2. Sources of variation in data Sources of variation: The (deterministic ) model is a good representation of the long-term trajectory, but observed concentrations are subject to Intra-subject fluctuations about it Measurement error E.g., measurement error: A particular sample has a true concentration Ideally all determinations of this true concentration should be the same, but measuring devices commit errors Measure over and over a different error committed each time All possible concentrations we might observe vary due the effect of measurement error Statistical Principles in Biomedical Modeling 13

2. Sources of variation in data Result: If we wish to learn about θ for a particular subject based on data, we must take appropriate account of these sources of within-subject variation Variation begets: Uncertainty in what we observe Because observations could have turned out differently due to variation in their possible values...... Any determination of θ from data is subject to uncertainty Uncertainty must be characterized and quantified to gauge how much faith to place in results More variation: Variation due to within-subject phenomena is not the only issue... Statistical Principles in Biomedical Modeling 14

2. Sources of variation in data All subjects Theophylline Conc. (mg/l) 0 2 4 6 8 10 12 0 5 10 15 20 25 Time (hr) Statistical Principles in Biomedical Modeling 15

2. Sources of variation in data Interpretation: Although the observed pattern is consistent with the model for all subjects...... there is among-subject variation Each subject has his/her own θ dictating his/her PK Result: Objective restated characterize how θ values are distributed (and hence vary ) in the population of subjects like these And use this characterization to develop dosing strategies In doing this based on data, we must take appropriate account of both within- and among-subject variation Have seen only a sample from this population any attempt to do this is subject to uncertainty Statistical Principles in Biomedical Modeling 16

2. Sources of variation in data To address the objectives: Need a framework in which to State and address the objectives in a formal way This will require characterizing variation (within and among subjects)...... and will provide a basis for assessing the uncertainty involved in using the data Statistical Principles in Biomedical Modeling 17

2. Sources of variation in data A fancier example: HIV dynamics under antiretroviral therapy CD4 + T cells / ul 1500 1000 500 Patient #14 data fit w/half fit w/all 0 200 400 600 800 1000 1200 1400 1600 virus copies/ml 10 5 10 0 0 200 400 600 800 1000 1200 1400 1600 time (days) CD4+ T-cell count immunologic status Viral load virologic status Statistical Principles in Biomedical Modeling 18

2. Sources of variation in data Model for within-subject dynamics: Statistical Principles in Biomedical Modeling 19

2. Sources of variation in data Model for within-subject dynamics: s = 7 T 1 = λ 1 d 1 T 1 {1 ɛ 1 u(t)}k 1 V I T 1 T 2 = λ 2 d 2 T 2 {1 fɛ 1 u(t)}k 2 V I T 2 T 1 = {1 ɛ 1 u(t)}k 1 V I T 1 δt1 m 2 ET1 T 2 = {1 fɛ 1 u(t)}k 2 V I T 2 δt2 m 2 ET2 V I = {1 ɛ 2 u(t)}10 3 N T δ(t1 + T2 ) cv I {1 ɛ 1 u(t)}ρ 1 10 3 k 1 T 1 V I {1 fɛ 1 u(t)}ρ 2 10 3 k 2 T 2 V I V NI = ɛ 2 u(t)10 3 N T δ(t1 + T2 ) cv NI Ė = λ E + b E(T1 + T2 ) (T1 + T 2 ) + K E d E(T1 + T2 ) b (T1 + T 2 ) + K E δ E E d Plus initial conditions, θ = (λ 1, d 1, ɛ 1, k 1,...) T Observable: CD4 count = T 1 + T 1, viral load = V I + V NI u(t) = treatment input at t Statistical Principles in Biomedical Modeling 20

2. Sources of variation in data Objectives: What would CD4 and viral load progression look like in the population of HIV-infected subjects if all subjects followed a particular treatment pattern u(t)? Can we design good, realistic u(t) that work well for the population (at least on average )? Observations: y 1,..., y n at t 1,..., t n for each subject, y j is (2 1) vector of observed CD4 and viral load Within-subject variation : Fluctuations from smooth trajectories for T 1 (t) + T 1 (t) and V I (t) + V NI (t) and assay error Among-subject variation : Each subject has his/her own subject-specific dynamic parameters θ Distribution of θ in population dictates progression in the population Statistical Principles in Biomedical Modeling 21

2. Sources of variation in data Objectives: Learn about distribution of of θ in population Dictates distribution of CD4 and viral load progression in the population under different u(t) Further complication: The viral load assay has a lower limit of quantification L Viral load measurements known only to lie below L censored data Disregarding censored data or setting equal to L, L/2, etc, compromises determination (estimation ) of θ Statistical Principles in Biomedical Modeling 22

2. Sources of variation in data Again: Need a formal framework to characterize the sources of variation and the uncertainty involved Need to embed the mathematical model in a statistical model Formal statement of objectives Will dictate how to fit the mathematical model to the data (and how to assess uncertainty in doing this) in pursuit of the objectives Also provides a basis for correct handling of censored observations Statistical Principles in Biomedical Modeling 23

3. Statistical models Perspective: Data that are observed are envisioned as realizations of random variables (vectors ), e.g., PK study : For subject i, i = 1,..., N, observed at n i times t i1,..., t ini, Y ij = drug concentration at time t ij Y i = (Y i1,..., Y ini ) T HIV study : For subject i, i = 1,..., N, observed at n i times t i1,..., t ini Y ij = (Yij CD4, Yij V L ) T at time t ij Y i = (Y i1,..., Y ini ) T Actual data on i are realizations y i1,..., y ini summarized by y i For all subjects i = 1,..., N, we have Y 1,..., Y N Statistical Principles in Biomedical Modeling 24

3. Statistical models Additional data collected: Conditions under which i is observed, e.g.., single dose, on/off treatment pattern over time U i Subject characteristics, e.g., demographic, physiologic information (age, weight, renal function, prior treatment history, IV drug use, etc) A i Full set of observed data: Realizations of random vector triplets Z i = (Y i, U i, A i ), i = 1,..., N Z 1,..., Z N, ordinarily assumed to be independent random vectors (from unrelated subjects) Statistical Principles in Biomedical Modeling 25

3. Statistical models Statistical model: Aka probability model The class of probability distributions for the Z i, i = 1,..., N, that we believe could have generated the data (i.e., realizations of Z 1,..., Z N ) we saw I.e., such a probability distribution describes the way in which Z 1,..., Z N corresponding to N subjects drawn from a population of interest would take on their values Should embody realistic assumptions about the sources of variation For simplicity: Assume for now no A i collected and U i are fixed and focus on a probability model for the Y i Statistical Principles in Biomedical Modeling 26

3. Statistical models For longitudinal data: It is standard to build up a statistical model in a hierarchy of stages 1. The probability mechanism by which Y i would take on its values for a single subject i involves within-subject variation 2. The probability mechanism that describes variation among subjects in how Y 1,..., Y N take on their values Mathematical model: s states ẋ(t) = g{t, x(t), θ}, solution x(t, θ) (s 1) Observations not available on all s states x = Ox for observation operator O Statistical Principles in Biomedical Modeling 27

4. Hierarchical statistical models First stage: Consider a single subject i θ i denotes the parameters dictating i s PK/HIV dynamics/etc Approach: Embed x in a subject-specific stochastic process Y i (t, U i ) = x(t, U i, θ i ) + e i (t, U i ) e i (t, U i ) is the deviation process describing how realizations of Y i (t, U i ) would deviate from the deterministic trajectory x(t, U i, θ i ) due to fluctuations and measurement error E{e i (t, U i ) U i, θ i } = 0 over all possible realizations, the deviations average out to zero E{Y i (t, U i ) U i, θ i } = x(t, U i, θ i ), i.e., interpret x(t, U i, θ i ) as the average trajectory over all possible realizations we could see on subject i under conditions U i Statistical Principles in Biomedical Modeling 28

4. Hierarchical statistical models Conceptually: PSfrag replacements 0 2 4 6 8 10 12 0 5 10 15 20 t x(t, PSfrag U i ) replacements Statistical Principles in Biomedical Modeling 29 t

4. Hierarchical statistical models Y i (t, U i ) = x(t, U i, θ i ) + e i (t, U i ) x(t, U i, θ i ) is inherent tendency for i s system to evolve over time Depends on θ i, viewed as an inherent characteristic of i dictating this tendency Can think of e i (t, U i ) = e i,f (t, U i ) + e i,m (t, U i ) Fluctuations and measurement error Intermittent observations on Y i (t, U i ): At times t i1,..., t ini Y ij = Y i (t ij, U i ), e ij = e i (t ij, U i ) Y ij = x(t ij, θ i ) + e ij, j = 1,..., n i Statistical Principles in Biomedical Modeling 30

4. Hierarchical statistical models Conceptually: PSfrag replacements 0 2 4 6 8 10 12 0 5 10 15 20 t x(t, U i ), x(t, U i ) + PSfrag e i,f (t, replacements U i ), Y ij = Y (t ij, U i ) Statistical Principles in Biomedical Modeling 31 t

4. Hierarchical statistical models To complete the first stage model: Must make assumptions on e i (t, U i ) = e i,f (t, U i ) + e i,m (t, U i ) and hence on e ij = e ij,f + e ij,m Measurement error: Typical assumptions on e i,m (t, U i ) Uncorrelated or independent across time devices commit haphazard errors At particular time t, elements of e i,m (t, U i ) may or may not be correlated separate assays for CD4, viral load but common sample preparation Normal probability distribution for each element Variances of each component are different, may or may not be constant Statistical Principles in Biomedical Modeling 32

4. Hierarchical statistical models To complete the first stage model: Must make assumptions on e i (t, U i ) = e i,f (t, U i ) + e i,m (t, U i ) and hence on e ij = e ij,f + e ij,m Fluctuation process: Typical assumptions on e i,f (t, U i ) Fluctuations close together in time tend to be similar e i,f (t, U i ) and e i,f (s, U i ) (auto )correlated depending on t s If times t i1,..., t ini as negligible sufficiently intermittent, treat autocorrelation Elements of e i,f (t, U i ) at a particular time t may or may not be correlated Normal probability distribution for each element Variances of each element different, may or may not be constant Statistical Principles in Biomedical Modeling 33

4. Hierarchical statistical models Transformation: Assumptions like normal distribution may make more sense on a transformed scale, e.g. log{y i (t, U i )} = log{x(t, θ i )} + e i (t, U i ) Result: Probability model for Y i, e.g., for HIV dynamic example At each time t ij, j = 1,..., n i ( log(y ij ) U i, θ i N log{x(t, θ i )}, Σ ), Σ = σ2 CD4 0 0 σ 2 V L. Y ij uncorrelated (independent ) across j = 1,..., n i Whatever assumptions are made determine a probability density for Y i (given U i, θ i ) p(y i u i, θ i ; Σ) Statistical Principles in Biomedical Modeling 34

4. Hierarchical statistical models Interest in subject i: If we wish to learn only about subject i s PK or dynamics Want to estimate θ i based on observing a realization of Y i How this is accomplished should be based on p(y i u i, θ i ; Σ) More later... Statistical Principles in Biomedical Modeling 35

4. Hierarchical statistical models Second stage: Consider the population of subjects θ i dictates PK/HIV dynamics/etc specific to subject i Different subjects in the population have different θ values leading to variation in inherent trajectories among subjects Conceptualize the population as all possible θ values a probability distribution θ 1,..., θ N for a sample of N subjects are independent random vectors taking their values according to this distribution E.g., θ i N (θ, D) or log(θ i ) N (θ, D), i = 1,..., N θ is the average value of θ in the population D is a covariance matrix whose diagonal elements are the variances of elements of θ across the population Statistical Principles in Biomedical Modeling 36

4. Hierarchical statistical models Result: Assumed probability model for θ i, i =, 1,..., N, determines a probability density N p(θ 1,..., θ N ; θ, D) = p(θ i ; θ, D) i=1 Objectives: Distribution of possible θ values in population Estimate θ and D (more in a moment ) Refinement: If A i, i = 1,..., N, also available, can develop a probability model for θ i and A i jointly Values of A i determine subpopulations Allows different probability distributions for θ depending on values of A i (different subpopulations) Statistical Principles in Biomedical Modeling 37

4. Hierarchical statistical models Putting together: Full statistical model for Y 1,..., Y N Probability density for Y 1,..., Y N Y i and θ i assumed independent across i p(y 1,..., y N ; ψ) = p(y 1,..., y N, θ 1,..., θ N ) dθ 1 dθ N = N i=1 N = i=1 N = i=1 Depends on ψ = (Σ, θ, D) p(y i, θ i ) dθ i by independence p(y i θ i )p(θ i ) dθ i p(y i u i, θ i ; Σ)p(θ i ; θ, D) dθ i Statistical Principles in Biomedical Modeling 38

5. Fitting and inference p(y 1,..., y N ; ψ) = N i=1 p(y i u i, θ i ; Σ)p(θ i ; θ, D) dθ i, ψ = (Σ, θ, D) How is all of this used? The statistical model provides a formal framework for fitting the mathematical model to data From statistical point of view: Under our assumptions Probability distributions that could have led to the realization y 1,..., y N (data ) we saw are specified by different values of ψ Which value of ψ truly governs the data generating mechanism? How do we estimate ψ from a potential set of data, i.e., Y 1,..., Y N? Given we can t see the whole population, how uncertain will we be? Statistical Principles in Biomedical Modeling 39

5. Fitting and inference Estimator: A function of Y 1,..., Y N that, if evaluated at a particular realization y 1,..., y N yields a numerical value (the estimate ) that gives information on the true value of ψ Write ψ = ψ(y 1,..., Y N ) Sampling distribution of an estimator: An estimator has a probability distribution characterizing how it takes on its its possible values (depending on Y 1,..., Y N and hence ψ) All possible realizations y 1,..., y N all possible values ψ can take on (we observe only one realization) These values vary across realizations Large variance another realization might give a very different estimate lots of uncertainty Small variance similar estimate from another realization mild uncertainty Statistical Principles in Biomedical Modeling 40

5. Fitting and inference Result: If we estimate ψ based on data, we must also report some estimate of the variance of the sampling distribution of ψ Sampling distribution often approximated based on large sample theory Standard error an estimate of variance based on this Quantifies uncertainty How do we get estimators and approximations to their sampling distributions? Least squares often DOES NOT yield an appropriate estimator... Even if we are only interested in part of ψ (e.g., θ and D), we often must estimate all of ψ Statistical Principles in Biomedical Modeling 41

5. Fitting and inference Standard approach: Maximum likelihood estimation Likelihood function L(ψ y 1,..., y N ) = p(y 1,..., y N ; ψ) For each possible realization y 1,..., y N there is a corresponding value ψ(y 1,..., y n ) maximizing L(ψ y 1,..., y N ) define the maximum likelihood estimator (MLE ) to be the corresponding function ψ(y 1,..., Y N ) MLE is the value of ψ most likely to have led to the observed data Optimality properties most precise ( smallest sampling variance in large samples ) General large sample theory yields approximate sampling distribution standard errors In general for complex statistical models, L(ψ y 1,..., y N ) is a complex function of ψ Statistical Principles in Biomedical Modeling 42

5. Fitting and inference Individual inference: Interested only in subject i s HIV dynamics MLE for θ i based on the statistical model p(y i u i, θ i ; Σ) such that at each time t ij, j = 1,..., n i ( ) Y ij U i, θ i N x(t, θ i ), Σ, Σ = σ2 CD4 0. 0 σv 2 L θ i = arg min θ { σ 2 CD4 σ 2 CD4 = N 1 N i=1 N i=1 y CD4 ij x 1 (t ij, θ) 2 + σ 2 V L N i=1 y CD4 ij x 1 (t ij, θ i ) 2, σ 2 V L = N 1 If censoring is much more complicated... Σ must be estimated, too y V L ij x 2 (t ij, θ) 2 } N i=1 y V L ij x 2 (t ij, θ i ) 2 See, for example, Adams et al. (2007) Statistical Principles in Biomedical Modeling 43

5. Fitting and inference Inference on the population: MLE for ψ = (Σ, θ, D) maximizes L(ψ y 1,..., y N ) = N i=1 p(y i u i, θ i ; Σ)p(θ i ; θ, D) dθ i Computational complications complex likelihood surface, intractable integration, etc. Focus is on θ and D, but must estimate Σ, too Alternative: Bayesian inference Analogous modeling, computational issues Statistical Principles in Biomedical Modeling 44

6. Using models Recall: Evaluation of effects of new treatments Identification of appropriate doses of drugs for further study Development of strategies for administration of treatments over time Design of clinical studies Great current interest: Use mathematical models embedded in an appropriate statistical model as the basis for simulation to address question like these E.g., Modeling and Simulation initiatives at most large pharmaceutical companies and FDA Basis for simulation the statistical framework Statistical Principles in Biomedical Modeling 45

6. Using models Approach: Simulation of the population based on a fitted hierarchical statistical model Generate N sim virtual subjects by generating θ i, i = 1,..., N sim, from p(θ i ; θ, D) Generate inherent trajectories x(t, θi U i, e.g. input u(t) ) under different conditions Can add within-subject deviations to obtain virtual data Can run virtual clinical studies with different sample sizes, subject populations, doses, etc, to predict outcome Statistical Principles in Biomedical Modeling 46

6. Using models HIV dynamics: 100 virtual inherent viral load trajectories with antiretroviral therapy terminated at 6 months, i.e., u(t) = 1, 0 t 6, u(t) = 0, t > 6 10 6 10 5 10 4 virus copies/ml 10 3 10 2 10 1 10 0 0 5 10 15 20 time (months) Statistical Principles in Biomedical Modeling 47

6. Using models HIV dynamics: Means of 15,000 virtual CD4 and viral load data profiles with u(t) = 1, 0 t τ, u(t) = 0, t > τ, τ = 0, 3, 4,..., 12 months Median CD4 0 200 400 600 800 1000 1200 Mean log10 VL 0 1 2 3 4 5 6 0 5 10 15 20 months 0 5 10 15 20 months Statistical Principles in Biomedical Modeling 48

7. Closing remarks For application to data mathematical models must be embedded in a statistical model that represents sources of variation in data Inverse problem should be regarded as an statistical estimation/inference problem Estimation should be accompanied by assessment of uncertainty Such mathematical-statistical models will be an increasingly important tool in biomedical research Statistical Principles in Biomedical Modeling 49

References Adams, B.M., Banks, H.T., Davidian, M., and Rosenberg, E.S. (2007) Model fitting and prediction with HIV treatment interruption data. Bulletin of Mathematical Biology 69, 563 584. Davidian, M. and Giltinan, D.M. (1995) Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall. Davidian, M. and Giltinan, D.M. (2003) Nonlinear models for repeated measures data: An overview and update. Editor s invited paper, Journal of Agricultural, Biological, and Environmental Statistics 8, 387 419. Rosenberg, E.S., Davidian, M., and Banks, H.T. (2007) Using mathematical modeling and control to develop structured treatment interruption strategies for HIV infection. Drug and Alcohol Dependence special supplement issue on Customizing Treatment to the Patient: Adaptive Treatment Strategies 88S, S41-S51. Statistical Principles in Biomedical Modeling 50