Design of Foow-Up Experiments for Improving Mode Discrimination and Parameter Estimation Szu Hui Ng 1 Stephen E. Chick 2 Nationa University of Singapore, 10 Kent Ridge Crescent, Singapore 119260. Technoogy Management Area, INSEAD, 77305 Fontainebeau CEDEX, France. isensh@nus.edu.sg stephen.chick@insead.edu One goa of experimentation is to identify which design parameters most significanty infuence the mean performance of a system. Another goa is to obtain good parameter estimates for a response mode that quantifies how the mean performance depends on infuentia parameters. Most experimenta design techniques focus on one goa at a time. This paper proposes a new entropybased design criterion for foow-up experiments that jointy identifies the important parameters and reduces the variance of parameter estimates. We simpify computations for the norma inear mode by identifying an approximation that eads to a cosed form soution. The criterion is appied to an exampe from the experimenta design iterature, to a known mode and to a critica care faciity simuation experiment. (Design of Experiments; Mode discrimination; Parameter estimation; Entropy; Simuation) 1. Introduction A common purpose of many experiments is to obtain an adequate mathematica mode of the underying system, incuding the functiona form, and precise estimates of the mode s parameters. Response modes that describe the reationship between inputs to a system and the output can be usefu for design decisions, and much focus has gone into seecting inputs in a way that improves the estimate of the response mode [9, 29, 30, 40]. Response modes can be used in iterative processes to identify design parameters (e.g., number of servers, production ine speeds) that optimize some expected reward criterion (e.g., mean monthy revenue, average output), or to provide intuition about how input factors infuence aggregate system behavior. In simuation, response modes can reate the parameters of stochastic modes (e.g., demand arriva rates, infection transmission parameters) to system performance [2, 13, 24, 25, 32, 35]. 1 Corresponding author. 2 The authors acknowedge the financia support of the Nationa Institutes of Heath (grant R01 AI45168-01A1). 1
A number of design criteria are avaiabe to seect design factors (or inputs) for experiments. Severa authors [6, 16, 26] use the expected gain in Shannon information (or decrease in entropy) as an optima design criterion to seect vaues for the experiment s design factors. Bernardo [6] and Smith and Verdinei [38] adopted this approach and ooked at how to pan experiments to ensure precise estimates of the mode s parameters. However, many experiments aim to identify which factors most infuence the system response. Identifying the subset of most important parameters can be phrased as a mode seection probem [34]. Box and Hi [10] used Shannon information to deveop the MD design criterion for discriminating among mutipe candidate modes. Hi [21] stressed the importance of experimenta design for the joint objective of mode discrimination and parameter inference in his review of design procedures. But most design criteria focus either on identifying important parameters or improving estimates of response parameters, but not both. Exceptions are Hi et a. [22], whose joint criterion requires certain mode parameters to be estimated or known, and Borth [8], whose entropy-based criterion can be chaenging to compute. This paper describes a new joint criterion for experimenta design that seects designs to simutaneousy identify important factors and to reduce the variance of the response mode parameter estimates. The new criterion is shown to simpify to a cosed form for the standard inear regression mode with norma observation errors, and is computationay more efficient than Borth s criterion. Our criterion does not require initia estimates of the mode parameters and incorporates prior information and data from preiminary experiments. It is fexibe for use in either starting or foow-up experiments, particuary if resuts remain inconcusive about which factors most infuence the system response, and when the parameters are sti poory understood after an initia response surface experiment has been competed. We consider designs with a given number n of observations, and do not describe how to baance initia runs with foow-up runs. Section 2 describes the mathematica formuation for the design space and response modes. It aso describes a Bayesian formuation to quantify input mode and parameter uncertainty, as we as the new entropy based design criterion. Three numerica experiments in Section 3 show that optima designs depend heaviy on the criterion seected, and highight the benefits and tradeoffs of the new joint criterion over individua mode discrimination and parameter estimation criteria, as we as existing joint criteria. The exampes stress the need for baancing the two types of entropy measures of the joint criterion. For the exampes considered, we aso find that the weights in our criterion are robust to misspecification. The new criterion does we at both identifying important factors and reducing parameter uncertainty, and is computationay more efficient than 2
Borth s joint criterion. 2. Formaism The design criterion is appied to a finite, but perhaps arge, set of potentia factors in a finite number n of runs, where n is seected by the experimenter. The design space and cass of regression modes is described before the new entropy-based design criterion and computationa issues. 2.1 Design Space and Regression Modes Experiments often invove severa factors. Here we consider representing the performance of the systems by the usua inear mode. We consider a finite number q of rea-vaued factor inputs, x 1,..., x q, each of which may be chosen to take on a finite set of different vaues. These factors can be combined agebraicay to generate a finite number, p, of predictors, y 1,..., y p, each of which is some function of the input factors. We foow the formuation of Raftery et a. [34] to identify the most important of the p predictors. That is, we presume the existence of s = 2 p candidate response modes in a mode space M that are inear in some subset of the predictors. Assuming the -th candidate response mode, the output z i of the i-th run is presumed to be of the form z i = β 0 + β 1 y i,(1) + β 2 y i,(2) + + β t y i,(t) + ζ i, (1) where y i,(1),..., y i,(t) are the t predictors present in the -th mode, the vaues of the β j may depend upon, and ζ i is a zero mean noise term. The seection of a candidate response mode identifies the important predictors, reative to the size of the noise in the response. See aso George [20]. Let D be the (finite) design space of a possibe ega combinations of the inputs for each of the n runs. A design x D can be represented as an n q matrix whose i-th row contains the vaues of the factors for the i-th run. If mode M has t predictors, the design matrix can be converted to an n (t + 1) predictor matrix y = y (x) whose rows contain the vaues of predictors for each run, and the first coumn corresponds to the intercept. Let z be the coumn vector of n outputs. 2.2 Entropy-Based Formuation The probem is to choose a design x that in some sense is effective at identifying the most important predictors (i.e., seects the most appropriate mode in M), and estimate regression parameters. 3
We assess uncertainty about response mode seection and parameter estimation with probabiity distributions. The design that most improves an entropy-based criterion is then seected. 2.2.1 Uncertainty Assessment One Bayesian approach to quantify the joint uncertainty about mode form and parameter vaues is to assign a prior distribution to each of the modes M M, then assign a conditiona probabiity distribution for the parameter vector β, given M. The identity of the best response mode and parameter is then inferred by Bayes rue, using the prior distributions and the probabiity distribution of the output, given the mode and input parameters. This is the approach taken by [14, 28, 34]. We make a standard assumption of jointy independent, normay distributed errors, ζ Norma (0, σ 2 ), so if mode M is the mode, β is the parameter, x is the design with predictor matrix y = y (x), then the output Z has an mutivariate norma distribution, p(z M, β, σ 2, x) Norma ( y β, σ 2 I n ), where I n is the identity matrix. For prior distributions, we presume a conjugate prior distribution [5] for the unknown θ = (β, σ 2 ), conditiona on the -th mode M, π(β M, σ 2 ) Norma ( ) β µ, σ 2 V ( π(σ 2 M ) InvertedGamma σ 2 ν 2, νλ 2 where the conditiona prior mean vector µ and covariance matrix σ 2 V for β may depend on the mode M. The parameters ν and λ are seected by the modeer. The InvertedGamma (x α, β) distribution has pdf x (α+1) e β/x β α /Γ(α) and mean β/(α 1). Raftery et a. [34] suggest vaues of µ, V, ν and λ that minimize the infuence of the priors in numerica experiments. The distributions in Eq. (2) can either be based on prior information aone, or can incude information gained during initia stages of experimentation. Data z 0 from an initia stage of n 0 observations with predictor matrix y 0 is straightforward to incorporate because of the conjugate ( ) 1 ( form [5]. Repace the mean µ with µ = V 1 + y T 0 y 0 V 1 µ + y T 0 z 0 ); repace V with ( ) 1; V 1 + y T 0 y 0 repace ν/2 with (ν + n0 )/2; and repace νλ/2 with (νλ + (z 0 y 0 µ )T z 0 + (µ µ )T V 1 µ )/2. Choices used here for the prior distribution of M incude the discrete uniform (p(m ) = 1/s) and the independence prior p(m i ) = ω t i (1 ω) p t i, where t i is the number of predictors in mode i, (i = 1,..., s), and ω is the prior probabiity that a predictor is active. Raftery et a. [34] provide cosed form formuas to update the probabiities p(m z 0, y 0 ). 4 ), (2)
In the rest of the paper, the prior distribution for the foow up stage is based on a prior distribution from Eq. (2) in combination with data from an initia stage. The optima baance of the amount of initia stage data versus the amount of foow-up data is beyond the scope of this paper. 2.2.2 Modeing Remarks Considering posterior probabiities is a usefu way to assess the reative merits of the modes [20]. Seecting modes according to p(m Z) is consistent in that if one of the entertained modes is actuay the true mode, then it wi seect the true mode if enough data is observed. When the true mode is not among those being considered, Bayesian mode seection chooses the candidate that is cosest to the true mode in terms of Kuback-Leiber divergence [4, 5, 18]. In practice, the true mode is typicay not known and is potentiay not in M. Despite this, carefu seection of a cass of approximating modes is important in the understanding of many probems. Here we seek a mode within the cass that is approximatey correct (containing ony significant predictors) and that approximates the parameters of the true underying response mode with ow variance [9]. Atkinson [1] raises severa concerns about inference for regression modes. Our prior probabiity framework avoids by design his concern about improper prior distributions for modes. A concern about nesting, so that two modes may be true, is resoved by noting that the simper mode wi be identified as more data is coected, and that simper modes are more desirabe expanations [19, 28]. Atkinson [1] aso indicates that if two response modes are compared, the true mode and an incorrect mode with fewer parameters, then asymptoticay the correct mode wi be seected, but that for finite numbers of sampes the posterior probabiities may support the incorrect mode in the absence of strong evidence from the data. This is a cause for care, but is not a vioation of the ikeihood principa, and negative consequences for seecting a mode when the data do not provide enough evidence is a probem for any seection criterion. A goodness-of-fit test may be usefu to provide further post hoc vaidation. 2.2.3 Entropy-Based Criteria Severa authors [6, 16, 26] proposed the use of the expected gain in Shannon information (or decrease in entropy) given by an experiment as an optima design criterion. This expected gain is a natura measure of the utiity of an experiment. The choice of design infuences the expected gain in information as the predictive distribution of future output Z is determined by the design x, mode M, and the prior distribution in Eq. (2) p(z M, σ 2, x) Norma 5 (yµ, σ 2 [yv y T + I n ]). (3)
The margina distribution of Z given M, y, obtained by integrating out σ 2, is a mutivariate t distribution. Entropy is different for discrete (mode seection) and continuous (parameter estimation) random variabes, so each is discussed in turn. For mode seection, Box and Hi [10] use the expected increase in Shannon information J as a design criterion. The criterion was derived from information theory where the information (entropy) was used as a measure of uncertainty for distinguishing the s candidate modes. ( ) J = p(m ) og p(m ) + p(m Z, y ) og p(m Z, y ) p(z y )dz (4) = =1 p(m ) =1 og =1 p(z M, y ) s =1 p(z M, y )p(m ) p(z M, y )dz An expicit soution is unknown in genera, so J may be evauated numericay or approximated. Aternatey, Box and Hi [10] gave an upper bound approximation, the expected gain in Shannon information between the predictive distributions of each pair of candidate modes M i and M. This approximation was originay named the D-criterion, but we use the notation MD, as in [28]. MD = ( p(m i )p(m ) p(z M i, y i ) og p(z M ) i, y i ) p(z M, y ) dz 0 i s The MD criterion is effective in practice and popuar with research workers [21]. We use MD for the mode discrimination portion of our joint criterion. For the norma inear mode, Meyer et a. [28] show that MD reduces to a cosed form if a noninformative prior 1/σ on σ and a conditionay norma prior for β given σ are assumed. A cosed form aso resuts if the conjugate prior is assumed. Proposition 1. Assume the conjugate norma gamma prior in Eq. (2). Let ẑ = y µ, and V = [y V y T + I ]. Then for the inear mode, MD simpifies to MD = 0 i s Proof. See Appendix A.1 [ 1 2 p(m i)p(m ) n + tr(v 1 V i ) + 1 ] λ (ẑ i ẑ ) T V 1 (ẑ i ẑ ) For parameter estimation, Bernardo [6] and Smith and Verdinei [38] adopted an entropy based method to ensure precise estimates for parameters that have aready been identified as important. They choose the design that maximizes the expected gain in Shannon information (or equivaenty, (5) (6) 6
maximizes the expected Kuback-Leiber distance) between the posterior and prior distributions of the parameters θ = (β, σ 2 ). BD = [ ] p(θ Z) p(z)p(θ Z) og dθdz (7) p(θ) Eq. (7) simpifies consideraby for the norma inear mode into a form known as the Bayesian D-optima criterion (hence the choice of name BD). Proposition 2. For a inear mode M of the form Eq. (1), the prior probabiity mode Eq. (2), and a given design y, Proof. See Appendix A.2. BD = 1 y 2 og T y + V 1 1 2 og V 1 Foowing Borth [8], the entropy criterion S P for parameter uncertainty generaizes when there are mutipe candidate modes. S P = + =1 p(m ) =1 p(m Z) p(θ M ) og p(θ M )dθ (8) p(θ Z, M ) og p(θ Z, M )dθ Proposition 3. For the norma inear mode, S P simpifies to S P = =1 p(m ) 2 for some K that does not depend on the design. Proof. See Appendix A.3. 2.2.4 Joint Criterion p(m )p(z M )dz =1 og y T y + V 1 + K (9) In order to account for both mode discrimination and parameter estimation simutaneousy, Hi et a. [22] proposed a joint criterion that adds a weighted measure of discrimination and precision, C = w 1 D 0 + w 2 E 0, (10) where D 0 is some measure of discrimination and E 0 is some measure of precision in parameter estimation. A nonunique choice of D 0 and E 0 they suggest is the mode discrimination criterion 7
proposed by Box and Hi [10], and the determinant of the regression matrix for estimating the parameters for mode i. where MD and E i C = w MD MD + (1 w) i=1 p(m i ) E i, (11) Ei are the maximum vaues of MD and E i over the design region, and w is a nonnegative weight paced on mode discrimination. They assumed that σ 2 is known or can be estimated when computing C. As the two criteria are summed together and weighted by w, the maxima MD and E i may be ess reevant than the range of the criteria over the design space. Borth [8] treated the two objectives using the idea of the change in tota entropy. He showed that it decomposed into the mode discrimination term J and parameter estimation term S P. We denote Borth s criterion as B hereafter. The scae for entropy for continuous random variabes (parameters) may not be we-caibrated with entropy for a discrete random variabe (mode seection): their range may differ when evauated throughout the design space. Borth s method aso requires computationay expensive numerica integration. Here we aso use the idea of the expected gain in entropy of an experiment, but normaize J and S P over their range of vaues, and simpify the weight factor. Instead of numericay evauating the J criterion, we approximate it with the MD criterion. So an upper bound approximation of the joint criterion for mode discrimination and parameter estimation is where MD min, MD max, S Pmin, S Pmax S Q = w MD MD min + (1 w) S P S Pmin, (12) MD max MD min S Pmax S Pmin are the smaest and argest MD vaues and the smaest and argest S P vaues respectivey over a designs in D, and w [0, 1] is a weight factor. This is simiar in form to criterion C, but reduces to a cosed form if the prior setup in Eq. (2) is used, as a resut of Eq. (6) and Eq. (9). Eq. (12) does not require σ 2 to be known (see the propositions above for inear modes, and comments beow for noninear modes), and incorporates prior information and data from initia experiments. The weight w shoud be seected based on the resuts of the initia experiments and the focus of the foow-up experiment. If the initia experiment was insufficient to identify the important parameters, then more weight shoud be paced on mode discrimination. If the mode is reasonaby determined, then more focus can be paced on parameter estimation. Hi et a. [22] suggested w = [ s s 1 (1 p(m max)) ] ξ where Mmax is the a priori most probabe mode. Another choice is w = [(1 (p(m max ) p(m max2 ))] ξ, where M max2 is the second most probabe mode. Sma vaues of ξ paces more weight on mode discrimination. To equay baance the two caibrated 8
entropy measures, w can aso be set at 1/2. The numerica exampes in Section 3 use both the weighting function of Hi et a. [22] and w = 1/2. Exampes 1 and 2 assess the dependence upon the optima design on the weight. The exampes show that rescaing the entropies can be important, but that the fina design may be somewhat insensitive to a misspecification in w. To achieve the joint objectives of mode discrimination and parameter estimation, we seek a design x D that maximizes S Q in Eq. (12). For norma inear modes, S Q simpifies to a cosed form through Eq. (6) and Eq. (9). The criterion is aso appicabe to noninear modes. When a noninear mode can be approximated by a inear mode in the neighborhood of θ 0, S Q can be appied by substituting the initia estimates of the parameters [12]. For non-norma modes, S Q requires numericay integrating Eq. (5) and Eq. (7). For generaized inear and noninear modes, Bayesian methods [3, 17] can be used to approximate the terms in Eq. (5) and Eq. (7). Shannon information is not the ony possibe approach to deveop a joint mode seection and parameter estimation design criterion. Bingham and Chipman [7] propose a weighted average of Heinger distances between predictive densities of a possibe pairs of competing modes as a criterion for mode discrimination. A inear combination of Bingham and Chipman [7] s criterion and a weighted average of the Heinger distances between the prior and posterior distributions of each mode s parameters can aso be used as a joint criterion. For the prior setup in Eq. (2), this reduces to a cosed form. The weighting functions described above can be used to weight the importance of each objective. The upper bound on the Heinger distances for each individua term can be usefu for rescaing, but the maximum vaues of each term for a particuar finite n, can be quite far from the upper bound and rescaing each term by its upper bound may not be appropriate. We do not consider that combined criterion further here. 2.3 Some Computationa Issues Athough S Q simpifies to a cosed form for the norma inear mode, there are computationa chaenges. We consider three here. First, the number of modes grows exponentiay in the number of predictors. Second, the min and max vaues of the two entropy measures that comprise S Q are required. Third, the number of designs grows combinatoriay in the number of candidate runs. To address the first issue, the summands for MD and S P are computed by using ony the most ikey modes. There are typicay far fewer than s = 2 p different modes whose probabiity p(m ) ead it to be a competitor for the best after the initia stage of experimentation. By considering ony the most ikey modes, Eq. (12) becomes tractabe. There are severa ways one can chose a subset of probabe modes: (i) Pick a modes h so that p(m h ) E, (ii) Pick the h most ikey 9
modes, where h is the smaest integer so that p(m (1) ) + p(m (2) ) + + p(m (h) ) F. Raftery et a. [34] take a simiar approach to mode averaging. Exampes 2 and 3 of Section 3 use (i) with E = 0.02. The top modes have higher posterior probabiities, so we set E = 0.02. In Section 3.1, no mode ceary stands out after the initia runs, so we use E = 0.008. When direct enumeration is not computationay feasibe, these more important modes can be identified heuristicay by using Markov Chain Monte Caro methods ike MC 3 (Markov Chain Monte Caro Mode Composition) [27] to estimate the p(m ). The state space for MC 3 is the set of s modes, and a sampe path visits a sequence of different modes, M. Candidate states for transitions are chosen from the set of modes with one more or one fewer active predictors. The reative probabiities for the current and candidate states, needed to impement the Metropois- Hastings step of MC 3, can be computed from cosed-form formuas in Raftery et a. [34]. The number of times a mode is visited during MC 3 divided by the number of iterations of MC 3 is a consistent estimate of the mode s posterior probabiity. Chipman et a. [15] and Ng [31] discuss some practicaities of Markov chain Monte Caro methods for mode seection. Second, we use an optimization heuristic to estimate MD min, MD max, S Pmin, S Pmax. We use the k-exchange agorithm of Johnson and Nachtsheim [23] to search for the maximum and minimum vaues. The k-exchange agorithm was first proposed to construct D-optima designs, but because it is a genera agorithm, it can be used to seect from a finite set of designs as ong as an optimaity criterion is given. Numerica resuts [23, 33] show that it is efficient and effective in constructing optima designs, and the agorithm has been widey used. In the numerica exampes we considered, the k-exchange agorithm was very efficient in identifying the optima designs. In addition to increasing k as suggested in [23], we aso found that for the k-exchange agorithm, increasing the number of starting designs from scattered points in the design space improves the search for the optima. Aternativey, the branch and bound agorithm in [39], or nested partitions [37] can be used to find the goba optima. Third, we generaize the k-exchange agorithm (Appendix A.4) to identify a design with a high vaue of S Q to improve the scaing of the entropy measures. The agorithm is a greedy agorithm that swaps in and out design points one at a time. More work on computationa issues is an avenue for future research. 10
3. Numerica Resuts Three numerica experiments compare the new criterion, S Q, with the two other joint criteria in the iterature, as we as the MD and S P criteria. The optima S Q foow-up design x (w, z 0 ) depends upon the weight w and previous observations z 0. Let x (z 0 ) = {x (w, z 0 ) : w [0, 1]} be the set of designs that the S Q criterion identifies, given z 0. 3.1 Chemica Reactor Experiment Box et a. [11, p. 377] gave data for a chemica-reactor experiment that used a 2 5 fu factoria design. From this data, we extracted runs that correspond to five coumns of a Packett-Burman 12 run (PB12) design. We treated those runs (see Tabe 1) as an initia experiment. The foow-up design was simuated by extracting the remaining runs from the compete experiment. Tabe 1: PB12 design and data extracted from the fu 2 5 reactor experiment Run i A B C D E z i 1 1 1-1 1 1 77 2 1-1 1 1 1 42 3-1 1 1 1-1 95 4 1 1 1-1 -1 61 5 1 1-1 -1-1 61 6 1-1 -1-1 1 63 7-1 -1-1 1-1 69 8-1 -1 1-1 1 59 9-1 1-1 1 1 78 10 1-1 1 1-1 60 11-1 1 1-1 1 67 12-1 -1-1 -1-1 61 We considered fifteen predictors (five factors and their two factor interactions) to get 2 15 distinct inear modes in the mode space M, each differing by the absence or presence of each predictor. We used the equa probabiity prior for mode uncertainty, p(m ) = 2 15, and the prior for parameters suggested by Raftery et a. [34]. Tabe 2 shows the probabiities for the top 8 modes, given that prior distribution and the PB12 data. No mode ceary stands out, but the mode identified in the origina anaysis of a 32 runs [11], with factors (B, D, E, BD, DE), is ranked best. To distinguish between the top eight modes, n = 3 additiona runs were seected from the remaining 20 runs. The best designs for each joint criterion (S Q with w = 0.5; B; and C with ξ = 2 as in [22]) were computed by evauating the criteria over each possibe design. The joint 11
Tabe 2: Probabiity of the eight most probabe modes after 12 runs Mode Posterior Probabiity B, D, E, BD, DE 0.0483 B, C, D, E, BD, DE 0.0206 B, D, E, BC, BD, DE 0.0195 B, D, E, BD, BE, DE 0.0106 A, B, D, E, BD, DE 0.0105 B, D, E, AE, BD, DE 0.0101 B, D, E, AC, BD, DE 0.0085 B, D, E, BD, CD, DE 0.0083 Tabe 3: Posterior probabiity (Post.) of the three most probabe modes with PB12, + 3 runs determined by the best design obtained from fu enumeration of the S Q, B, and C criteria. New S Q Criterion Borth s B Criterion Hi s C Criterion Mode Post. Mode Post. Mode Post. B, D, E, BD, DE 0.189 B, D, E, BD, DE 0.132 B, D, E, BD, DE 0.166 B, D, E, BC, BD, DE 0.053 B, C, D, E, BD, DE 0.082 B, D, E, BC, BD, DE 0.065 A, B, D, E, BD, DE 0.045 A, B, D, E, BD, DE 0.051 B, D, E, BD, BE, DE 0.04 criterion S Q resuts in different designs than the B and C criteria. The posterior probabiities of a modes were then recomputed using a 15 runs, and the top 3 modes are shown in the eft portion of Tabe 3. A three designs identified the same top mode identified in the origina anaysis of a 32 runs. S Q discriminated in favor of the top mode more than criteria B and C. Tabe 4 indicates that S Q reduced the parameter generaized variance (the determinant of the posterior covariance matrix of the parameter estimates, V (β) ) of the top mode more than B and C. To compare the computationa burden, each criterion was evauated for a possibe designs for one, two, three and four additiona runs using Mape8 (sow, because it is interpreted, but reative CPU times are iustrative). Tabe 5 shows the computation times for the S Q and B criterion. The computation times for S Q and C were simiar. The curse of dimensionaity made quadrature an inefficient approach for the numerica integrations required by B. Tabe 6 shows the posterior probabiities of the top three modes with the mode discrimina- Tabe 4: Parameter generaized variance V (β) for the a posteriori top mode (B, D, E, BD, DE), given PB12 + 3 runs, based on the S Q, B and C criteria. Criterion V (β) S Q 3.62 10 12 B 4.81 10 12 C 4.74 10 12 12
Tabe 5: CPU time for computing S Q, C and B (hours). Additiona runs S Q and C B 1 0.005 0.007 2 0.04 0.86 3 0.33 55.4 4 1.87 453 Tabe 6: Posterior probabiity (Post.) of the three most probabe modes with PB12 + 3 runs determined by the best design obtained by fu enumeration for S Q, MD, and S P. S Q Criterion MD Criterion S P Criterion Mode Post. Mode Post. Mode Post. B, D, E, BD, DE 0.189 B, D, E, BD, DE 0.156 B, D, E, BD, DE 0.124 B, D, E, BC, BD, DE 0.053 B, D, E, BD, BE, DE 0.048 B, C, D, E, BD, DE 0.072 A, B, D, E, BD, DE 0.045 B, C, D, E, BD, DE 0.038 A, B, D, E, BD, DE 0.032 tion MD and parameter estimation S P criteria. As expected, MD did a better job than S P at distinguishing the top mode from the others, and Tabe 7 indicates that S P outperformed MD at reducing the parameter generaized variance of the top mode. S Q on the other hand outperformed MD in favoring the top mode, and was ony sighty poorer than S P in parameter estimation. This exampe aso iustrates the importance of normaizing that we suggest, as one of the subcriteria woud be ignored without recaibration. With the equa probabiity prior for each mode, the range of uncaibrated MD scores over the design space ranged from 0.034 to 0.062, whie the uncaibrated S P scores range from 0.88 to 0.90. Without recaibration, the joint criterion woud have seected the best S P design, and ignored the mode discrimination objective. We tested the sensitivity to the prior distribution by rerunning the experiment with the independence prior p(m ) = ω t (1 ω) p t, where t is the number of predictors in mode, ( = 1,..., 2 15 ), with ω = 0.25. The mode with factors (B, D, E, BD, DE) is ranked third when ony 12 runs are used, but the S Q criterion again identified (B, D, E, BD, DE) as the most probabe mode after the n = 3 run foow up was competed. To test the sensitivity of the designs to the weights, w was varied from 0 to 1. In this exampe, Tabe 7: Parameter generaized variance V (β) for the a posteriori top mode (B, D, E, BD, DE), given PB12 + 3 runs, based on the S Q, MD and S P criteria. Criterion V (β) S Q 3.62 10 12 MD 4.74 10 12 2.13 10 12 S P 13
the top S Q design is robust over a range of weights. There were ony three different top designs obtained as w was varied from 0 to 1. When 0 w <.27, the top S P design is obtained. The same design obtained for S Q when w = 0.5 is obtained when 0.27 w < 0.78. The top MD design is obtained when w.78. For the weighting function suggested in [22], the best MD design is seected when ξ gets sma, ξ 4, and the best S P design is seected when ξ gets arge, ξ > 27, and the same design seected for any ξ in between. In this exampe, sma changes in w or ξ do not significanty change the optima design. The Entropy Baancing k-exchange Agorithm in Appendix A.4 was impemented to evauate its effectiveness to find good designs in this probem. The best design is known for this probem because an exhaustive evauation of a 1140 designs is possibe. The Entropy Baancing agorithm consists of two steps. In the initia step of the agorithm, the maximum and minimum vaues of MD and S p are estimated using the k-exchange max and k-exchange min agorithms. In the second step, the k-exchange max agorithm is used to search for a good S Q design. We first evauated how we the initia step performs in determining the maximum and minimum vaues. We varied the number r of initia random designs to start the agorithm, r = 100 and r = 200, and conducted 10 independent repications of the agorithm for each r (each repication sampes an independent set of r design points). When r = 100, the estimates of (MD max, MD min, S Pmin, S Pmax ) were a equa to the actua vaue 70% of the time. When r = 200, a four estimates were correct 80% of the time. In the remaining cases, ony one of the four actua vaues was not obtained, but the estimate was cose to the actua vaue. We next tested the how we the second step of the agorithm identifies the known best optima S Q design. The known best optima S Q design was identified 70% of the time with r = 100, and 90% of the time with r = 200. The remaining nonoptima designs seected were among the top 8 designs. 3.2 Finding a Known Mode To determine how we the criteria performs in detecting a known mode, and how the best S Q mode depends upon w in repeated sampes, we ran 50 repications of an experiment using the S Q criterion on a known mode with 4 potentia factors (A, B, C, D), namey Z = 10A + 15B + 6AB +7AC +ζ i, where ζ i Norma(0, 5). For each repication i, we ran an initia 2 4 1 fractiona factoria design to generate preiminary data z 0,i, which was then used to create a prior distribution for a foow-up design with n = 3 runs as described in Fig. 1. In each repication, the true (known) mode was among the top eight candidate modes after the initia 2 4 1 = 8 runs but was competey confounded with three other modes. The three additiona runs seected by S Q deaiased 14
Figure 1: Agorithm to assess the identification of a known mode in Section 3.2 For i = 1, 2,..., 50: 1. Generate independent preiminary data z 0,i with a 2 4 1 factoria design. 2. Update the distributions for the unknown modes and parameters as in Section 2.2.1. 3. Determine the best S Q design x (w, z 0,i ) for n = 3 additiona runs, as a function of w. 4. Run the best foow-up design, x (w, z 0,i ), for each w. 5. Compute the posterior probabiity and ordina rank of each mode, as a function of w. End for oop the confounded effects and distinguished between the top competing modes. In each repication, three different designs were obtained ( x (z 0,i ) = 3) as w varied between 0 and 1. The best S Q design x (w, z 0,i ) was the top S P design (x (0, z 0,i )) for sma w. For a certain range between 0 and 1, a unique top S Q design was obtained that baances mode discrimination and parameter estimation. For arger w, the best S Q design was the best MD design (x (1, z 0,i )). In 70% of the repications, the same three designs x (z 0,i ) were seected and the same S Q was seected for w in approximatey the range (0.1, 0.55). The other 30% of the repications resuted in 3 other sets of S Q designs with the same quaitative features: the designs with w in the range of about 0.1 up to 0.4-0.8 baanced mode discrimination and parameter estimation. In 49 out of 50 repications, the true mode was identified as the best mode when the S Q design with intermediate vaues of w was used to baance discrimination and estimation. In the remaining repication, the true mode had the third highest posterior probabiity. Averaging over 50 repications, the probabiity that the true mode was best improved from 0.04 after the initia stage (8 runs), to 0.21 (after the 3 foow-up runs). The average range of MD was 7.42, the average range of S P was 0.59, and MD max > S Pmax for a repications. If the individua measures were not recaibrated by their ranges, the S Q criterion woud have seected the best MD design and ignored parameter estimation uness a very sma weight were paced on mode discrimination. This experiment gave the same quaitative concusions as Section 3.1. Borth s criterion took orders of magnitude more time to compute due to numerica integration issues (curse of dimensionaity). Rebaancing the entropy measures was important for assuring a baance between discrimination and estimation. The optima design was not highy sensitive to the choice of w. 3.3 Critica Care Faciity The critica care faciity iustrated in Fig. 2 was originay studied by Schruben and Margoin [36]. Patients arrive according to a Poisson process and are routed through the system depending 15
Figure 2: Estimated fraction of patients routed through the units of a critica care faciity. upon their specific heath condition. Stays in the intensive care (ICU), coronary care (CCU), and intermediate care faciities are presumed to be ognormay distributed. This section compares S Q with C and the individua criteria, MD and S P. Borth s criterion B took too much time to evauate and was therefore not compared. We initiay ran a 64 run design using the Bayesian mode average to sampe uncertain input parameters [32]. We considered tweve input parameters, resuting in 2 12 distinct inear modes in the mode space M, each differing by the absence and presence of each predictor. Tabe 8 shows the posterior probabiities for the top 5 modes. The mode identified in a 128 run study in [32] is ranked fifth here. Schruben and Margoin [36] studied how to aocate random number streams to reduce variabiity in response surface parameter estimates. Their response mode predicts the expected number of patients per month E[Z] that are denied entry to the faciity as a function of the number of beds in the ICU, CCU, and intermediate care faciities. They presume fixed point estimates for k = 6 input parameters, one per source of randomness, to describe the patient arriva process (Poisson arrivas, mean ˆλ = 3.3/day), ICU stay duration (ognorma, mean 3.4 and standard deviation 3.5 days), intermediate ICU stay duration (ognorma, mean 15.0, standard deviation 7.0), intermediate CCU stay duration (ognorma, mean 17.0, standard deviation 3.0), CCU stay duration (ognorma, mean 3.8, standard deviation 1.6), and routing probabiities (mutinomia, ˆp 1 = 0.2, ˆp 3 = 0.2, ˆp 4 = 0.05). Some parameters are mutivariate, and there are a tota of 1 + 4 2 + 3 = 12 dimensions of parameters. For the ognorma service times, the og of the service times has mean µ and precision λ = 1/σ 2. Subscripts distinguish the parameters of each service type (e.g., µ icu, µ iicu, µ iccu, µccu, λ icu ). The anaysis here presumes a inear response mode in these 12 parameters. The actua system parameters are not known with certainty, and the estimated system performance wi be in error if the actua parameter vaues differ from their point estimates. As in Ng and 16
Tabe 8: The five most probabe modes after 64 runs. Mode Post. Prob. λsys, µ iicu, λ iicu, p 1, p 4 0.43298 λsys, µ iicu, λ iicu, p 1, p 3, p 4 0.20019 λsys, µ iicu, λ iicu, λ iccu, p 1, p 4 0.0682 λsys, µ iicu, λ iicu, µ iccu, p 1, p 4 0.0516 λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4 0.0296 Tabe 9: The most probabe modes with 64+32 runs determined by S Q with w = 0.55. Mode Post. Prob. λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4 0.407 λsys, µ iicu, λ iicu, p 1, p 3, p 4 0.215 λsys, µ iicu, λ iicu, p 1, p 4 0.110 Chick [32], who used naive Monte Caro samping for unknown inputs to do an uncertainty anaysis, we fix the number of beds in each of the three units (14 in ICU, 5 in CCU, 16 in intermediate care), and study how the expected number of patients per month that are denied entry depends on the unknown parameters. Design points for the unknown parameter vaues coud take on vaues of the MLE ± one standard error. The approach of Raftery et a. [34] was used to obtain prior distributions for the unknown response parameters. We used the S Q criterion with w = 0.55 (or ξ = 1) to avoid focusing on parameter estimation too eary. The design points of a fu factoria for the 12 parameters were candidates for the 32 run foow-up design. The number of possibe 32 run designs from the 2 12 candidate runs is arge, we used the k-exchange agorithm to search for the best S Q design (r = 50, k = 5), then ran the critica care simuations again with that design. The posterior probabiities for the top three modes, given the data from the combined design (64+32), are shown in Tabe 9. The top mode is the same mode identified in the 128 runs anaysis in [32], but the S Q criterion identified this mode with fewer runs. We aso used the k-exchange agorithm (r = 50 and k = 5) to determine a good C design. Tabe 10 shows the posterior probabiities after running the simuations with the C design. The C design identified the same mode as S Q, but S Q did sighty better in discriminating the top two modes. The best designs for MD and S P are different than the best S Q design with w = 0.55. The MD design identified the same top mode as the S Q design, and discriminated between the top two modes sighty better than the S Q design (Tabe 9 and Tabe 11). Tabe 12 indicates that design S Q did a better job than C and MD at reducing the parameter generaized variance of the top mode. The S Q criterion with w = 0.75 (or ξ = 0.5) resuted in better mode discrimination than with 17
Tabe 10: Most probabe modes with 64+32 runs with the C criterion. Mode Post. Prob. λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4 0.359 λsys, µ iicu, λ iicu, p 1, p 3, p 4 0.232 λsys, µ iicu, λ iicu, λ iccu, p 1, p 3, p 4 0.077 Tabe 11: Most probabe modes with 64+32 runs with the MD criterion. Mode Post. Prob. λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4 0.438 λsys, µ iicu, λ iicu, µ iccu, λ iccu, p 1, p 3, p 4 0.125 λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4 0.109 w = 0.55 at the cost of sighty ess effective parameter estimation (in this case, x (z 0,i ) > 3). The top two modes identified in the S P design were the same modes identified in the origina 64 run anaysis, and the top mode identified by the S Q and MD design is ony ranked fourth when the S P design is used. The S P criterion focused on designs that had good parameter estimation primariy for modes with higher posterior probabiity. Using the S P criterion too eary in the experimentation process can prematurey focus the design and experimentation on a few modes that may or may not be good approximations to the system (because of the sma number of runs), an issue raised by Atkinson [1]. An eary focus on M D can better distinguish competitors for the best mode, but at the expense of poorer parameter estimates. S Q baanced both of those needs. 4. Discussion and Concusions The purpose of many experiments is to distinguish between ikey mathematica modes and obtain precise estimates for the mode parameters. The three joint design criteria examined here each use an additive measure for entropy measures or bounds for mode and parameter uncertainty. Our Tabe 12: Parameter generaized variance V (β) after 64+32 runs for the mode with λsys, µ iicu, λ iicu, µ iccu, p 1, p 3, p 4. Runs Criterion V (β) 96 S P 4.63 10 28 96 S Qw=0.55 4.69 10 28 96 S Qw=0.75 5.33 10 28 96 C 5.70 10 28 96 MD 5.71 10 28 64 9.46 10 26 18
proposa for the new S Q criterion to normaize each entropy measure by the amount each varies over the design space provides an insight that the other joint criteria do not: It indicates how rich the design space is for improving each entropy measure. If the range for one of the component criteria is much smaer than for the other (e.g. MD max MD min >> S Pmin S Pmax ), or if the number of potentiay optima designs, x (z 0 ), is sma, then a richer design space might be considered. S Q is computationay more efficient than Borth s joint criterion especiay when the panned foow-up designs get arger. In the first two exampes, the S Q design performs as we as Borth s criterion, but it is computationay more efficient and practica. S Q extends the C criterion as it considers the reevant range of the individua criteria, does not require initia estimates of the variance, and accounts for avaiabe prior information. These two exampes aso show that there are 3 different designs for S Q as w varies from 0 to 1. The optimum MD design is seected when the mode discrimination term is heaviy weighted and the optimum S p design is seected when the parameter estimation term is heaviy weighted. For each experiment, the S Q design that baances both objectives is shown to be insensitive to a range of w, and this best design seected performs more efficienty than the other criteria. Three numerica experiments iustrated the compromise between mode discrimination and parameter estimation obtained when using the joint criterion S Q. Compared with the individua criteria, the baanced S Q design was about as good as the MD design for mode discrimination, and was amost as good as the S P design for parameter estimation. The MD design fared ess we for parameter estimation, and the S P design was east effective for mode discrimination. Athough S Q is easier to compute for the inear mode than Borth s criterion, the arge number of matrix cacuations required to compute the S Q criterion may need to be baanced against the cost of actuay running the experiments. In a simuation context, CPU cyces might be better spent running repications rather than computing S Q if the simuations run quicky. For expensive industria experiments or compex simuations with ong run times, the S Q criterion may be an effective mechanism to baance the needs of factor identification and parameter estimation. Sequentia designs and criteria based on the Heinger distance are avenues for further research. 19
A. Mathematica Detais A.1 Proof of Prop. 1 Conditioning on σ 2, the MD criterion can be rewritten MD = p(m i )p(m ) p(σ 2 ) p(z M i, y i, σ 2 ) og p(z M i, y i, σ 2 ) 0 p(z M, y, σ 2 ) dzdσ2 (13) 0 i s Meyer et a. [28] substituted the predictive distribution of the norma form in Eq. (3) into Eq. (13) and integrated with respect to p(z M i, σ 2 ) to obtain: MD = [ [ 1 p(m i )p(m ) π(σ 2 ) 2 og 0 i s ( 0 i s nσ 2 σ 2 tr(v 1 0 ( V ) V 1 i 2σ 2 V i ) (ẑ i ẑ ) T V 1 (ẑ i ẑ ) We now isoate the dependence on the noninformative prior. MD = [ ( 1 V ) p(m i )p(m ) 2 og V i ( nσ 2 σ 2 tr ( V 1 = 0 i s V i 1 2 p(m i)p(m ) + (ẑ i ẑ ) T V 1 (ẑ i ẑ ) 0 1 2σ 2 ) ] dσ 2 ] ) ) ] (ẑi ẑ ) T V 1 (ẑ i ẑ ) π(σ 2 )dσ 2 [ ( V ) og n + tr(v 1 V i ) 0 V i ] 1 σ 2 π(σ2 )dσ 2 The doube sum means that pairs i, can be matched to make the og terms cance out. And 1 π(σ 2 )dσ 2 = [(ν/2)/(νλ/2)] InvertedGamma ( σ 2 ( ν + 1) ), νλ 0 σ 2 0 2 2 dσ 2 = 1/λ. Substitute this into Eq. (14) to justify the caim in Eq. (6). A.2 Proof of Prop. 2 (14) Condition on mode M and et θ = (β, σ 2 ). [ ] p(θ Z) BD = p(z)p(θ Z) og dθdz p(θ) [ ] = p(z)p(θ Z) og p(θ Z) dθdz [ ] p(z)p(θ Z) og p(θ) dθdz 20
The second term on the right hand side of this ast equation simpifies using Fubini s theorem and Bayes theorem, assuming the integras exist, and is independent of the design. [ ] [ ] p(z)p(θ Z) og p(θ) dθdz = p(z)p(θ Z) og p(θ) dzdθ [ ] = og p(θ) p(θ) p(z θ)dzdθ [ ] = og p(θ) p(θ)dθ So BD = [ ] p(z)p(θ Z) og p(θ Z) dθdz [ ] og p(θ) p(θ)dθ. (15) Condition now on both M and σ 2 and focus on the inner integra in the eft hand term of the ast equation. It is we known that the conditiona distribution of β given z, σ 2 is a norma distribution with variance σ 2 (y T i y i + V 1 i ) (e.g. see [5]). Ca the posterior mean µ. First integrate out β, with σ 2 handed after. p(β Z, M i, σ 2 ) og p(β Z, M i, σ 2 )dβ [ 1 σ = 2 (y T i y i + V 1 2 i ) (2π) t i +1 2 exp 1 ( ) ] 2 (β µ ) T σ 2 (y T i y i + V 1 i ) (β µ ) ( ) σ og 2 (y T i y i + V 1 i ) (t i + 1) og 2π (β µ ) T σ 2 y T i y i + V 1 i (β µ ) dβ 2 2 2 og σ 2 (y T i y i + V 1 i ) = (t i + 1) og 2π σ 2 (y T i y i + V 1 i ) σ 2 (y T i y i + V 1 i ) 1 2 2 2 = (t i + 1) og σ + 1 y 2 og T i y i + V 1 1 2 (t i + 1) og 2π 1 2. i Simiary, p(β M i, σ 2 ) og p(β M i, σ 2 )dβ = (t i + 1) og σ + 1 2 og V 1 1 2 (t i + 1) og 2π 1 2. The difference in BD therefore has severa terms that cance out, justifying the caimed reationship. 1 y BD = 2 og T i y i + V 1 i 1 2 og V 1 i dzdσ 2 = 1 ( og y T i y i + V 1 i og V 1 ) i 2 i 21
A.3 Proof of Prop. 3 Substitute p(m i Z) = p(z M i )p(m i )/ s j=1 p(z M j)p(m j ) into Eq. (8), to obtain S P = + = = i=1 + p(m i ) s i=1 p(m i)p(z M i ) s j=1 p(m j)p(z M j ) i=1 i=1 p(θ i M i ) og p(θ i M i )dθ i p(θ i Z, M i ) og p(θ i Z, M i )dθ i p(m i ) p(m i ) [ p(m i ) i=1 + p(z M i ) p(m j )p(z M j )dz j=1 p(θ i M i ) og p(θ i M i )dθ i p(z M i ) p(θ i Z, M i ) og p(θ i Z, M i )dθ i dz p(θ i M i ) og p(θ i M i )dθ i ] p(θ i Z, M i ) og p(θ i Z, M i )dθ i dz By Prop. 2, S P = = [ 1 y p(m i ) 2 og T i y i + V 1 i 1 2 og ] V 1 i p(m i ) og y T i y i + V 1 i + K 2 i=1 i=1 where K is independent of x, as caimed. A.4 Variant of k-exchange Agorithm In the foowing, et x α be the α th row of the design matrix x, and et a prime ( ) denote an estimate of the primed variabe. Define S Q = MD MD min MD max MD min + S P S Pmin. S Pmax S Pmin Let S Q (x α ) be S Q evauated with the design matrix x(α), with row x α removed from x. A generic maximum k-exchange agorithm for an arbitrary criterion C r foows. An agorithm for the minimum is simiar. 22
k-exchange max Agorithm 1. Order the n design points according to their C r (x) vaues, x (1), x (2),..., x (n), so that α < β impies C r (x α ) C r (x β ). 2. For α = 1 to k: deete x (α) from design matrix x, add x from the ist of a avaiabe design points where x maximizes C r when added to design x(α). 3. Repeat (reorder and repace points) unti no improvement in C r can be found. Entropy Baancing k Exchange Agorithm 1. Estimate the range of entropy scores in order to caibrate them. (a) Seect r random n run designs. (b) Estimate MD max i. For each of the r designs, impement a k-exchange max for criterion MD. ii. Let R D be the set of designs obtained from 1(b)i. Set MD max = max d R (MD) (c) Simiary estimate S Pmax, MD min, S Pmin. 2. For each of the r designs, impement the k-exchange max agorithm for criterion S Q The agorithm does not guarantee optimaity, but appying Steps 1 and 2 to each of the r randomy seected designs provides some protection against getting stuck in a oca extrema. For faster execution of the agorithm, k shoud be chosen sma. However, Johnson and Nachtsheim [23] noted that a arger vaue of k eads to greater D-efficiency. A good choice of k to search for good S Q designs depends on the number of predictors, the number of starting random designs r, and the number of additiona runs n. References [1] Atkinson, A. C. (1978). Posterior probabiities for choosing a regression mode. Biometrika 65(1), 39 48. [2] Barton, R. R. (1998). Simuation metamodes. In D. J. Madeiros, E. F. Watson, J. S. Carson, and M. S. Manivannan (Eds.), Proceedings of the Winter Simuation Conference, Piscataway, New Jersey, pp. 167 174. Institute of Eectrica and Eectronics Engineers, Inc. 23
[3] Bennett, J.E., A. Racine. and J.C. Wakefied (1996). MCMC for noninear hierarchica modes. In W.R. Giks, S. Richardson, and D.J. Spiegehater (Eds.), Markov Chain Monte Caro in Practice, pp. 339-358, Chapman and Ha. [4] Berk, R. (1966). Limiting Behaviour of Posterior Distributions when the Mode is Incorrect. Annas of Mathematica Statistics 37(1), 51 58. [5] Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. Chichester, UK: Wiey. [6] Bernardo, J. (1979). Expected information as expected utiity. Annas of Statistics 7, 686 690. [7] Bingham, D. R. and H.A. Chipman (2003). Optima designs for mode seection. Technica Report, University of Michigan [8] Borth, D. (1975). A tota entropy criterion for the dua probem of mode discrimination and parameter estimation. Journa of the Roya Statistica Society, Series B 37(1), 77 87. [9] Box, G., and N. Draper (1987). Empirica Mode-Buiding and Response Surfaces. New York: Wiey. [10] Box, G. and W. Hi (1967). Discrimination among mechanistic modes. Technometrics 9(1), 57 71. [11] Box, G., W. Hunter, and J. Hunter (1978). Statistics for Experimenters. An Introduction to Design, Data Anaysis and Mode Buiding. New York: Wiey. [12] Box, G. and H.L Lucas (1959). Design of Experiments in Non-Linear Situations. Biometrika 46(1), 77 90. [13] Cheng, R. C. H. and W. Hoand (1997). Sensitivity of computer simuation experiments to errors in input data. Journa on Statistica Compututing and Simuation 57, 219 241. [14] Chipman, H., M. Hamada. and C.F.J. Wu (1997). Bayesian Variabe Seection for Designed Experiments with Compex Aiasing. Technometrics 39, 372 381. [15] Chipman, H., E.I. Goerge. and R.E. McCuoch (2001). The Practica Impementation of Bayesian Mode Seection. IMS Lecture Notes - Monograph Series 38, 67 116. [16] DeGroot, M. (1962). Uncertainty, information and sequentia experiments. Annas of Statistics 33, 404 419. 24
[17] Deaportas, P. and A.F.M. Smith. (1993). Bayesian Inference for Generaized Linear and Proportiona Hazards Modes via Gibbs Samping. Appied Statistics 3, 443 459. [18] Dmochowski, J. (1996). Intrinsic Priors via Kuback-Leiber Geometry. Bayesian Statistics 5, 543 549. [19] Fedorov, V. (1996). Discussion: Foow-up designs to resove confounding in mutifactor experiments (with discussion). Technometrics 38(4), 303 332. [20] George, E. I. (1999). Bayesian mode seection. Encycopedia of Statistica Sciences 3, 39 46. [21] Hi, P. (1978). A review of experimenta design procedures for regression mode discrimination. Technometrics 20(1), 15 21. [22] Hi, W., W. Hunter, and D. Wichern (1968). A joint design criterion for the dua probem of mode discrimination and parameter estimation. Technometrics 10(1), 145 160. [23] Johnson, M. and C. Nachtsheim (1983). Some guideines for constructing exact d-optima designs on convex design spaces. Technometrics 25(3), 271 277. [24] Keijnen, J. (1996). Experimenta design for sensitivity anaysis, optimization, and vaidation of simuation modes. In J. Banks (Ed.), Handbook of Simuation. New York: John Wiey & Sons, Inc. [25] Law, A. M. and W. D. Keton (2000). Simuation Modeing & Anaysis (3d ed.). New York: McGraw-Hi, Inc. [26] Lindey, D. V. (1956). On a measure of the information provided by an experiment. Annas of Mathematica Statistics 27, 986 1005. [27] Madigan, D. and J. York (1995). Bayesian graphica modes for discrete data. Internationa Statistica Review 63(2), 215 232. [28] Meyer, R. D., D. M. Steinberg, and G. E. P. Box (1996). Foow-up designs to resove confounding in mutifactor experiments (with discussion). Technometrics 38(4), 303 332. [29] Myers, R. H., R. H. Khuri, and W. H. C. Carter (1989). Response Surface Methodoogy: 1966-1988. Technometrics 31(2), 137 157. 25
[30] Myers, R. H., and D.C. Montgomery (1995). Response Surface Methodoogy. New York: Wiey. [31] Ng, S.-H. (2001). Sensitivity and Uncertainty Anaysis in Compex Simuation Modes. Ph.D. Thesis. Department of Industria and Operations Engineering, University of Michigan, Ann Arbor. [32] Ng, S.-H. and S. E. Chick (2001). Reducing input distribution uncertainty for simuations. In B. Peters, J. Smith, M. Rohrer, and D. Madeiros (Eds.), Proceedings of the Winter Simuation Conference, Piscataway, NJ, pp. in press. Institute of Eectrica and Eectronics Engineers, Inc. [33] Nguyen, N. K., and A. J. Mier (1992). A Review of some exhange agorithms for constructing discrete D-optima designs. Computationa Statistics and Data Anaysis 14, 489 498. [34] Raftery, A. E., D. Madigan, and J. A. Hoeting (1997). Bayesian mode averaging for inear regression modes. Journa of the American Statistica Association 92(437), 179 191. [35] Sanchez, S. M. (1994). A robust design tutoria. In J. D. Tew, S. Manivannan, D. A. Sadowski, and A. F. Seia (Eds.), Proceedings of the Winter Simuation Conference, Piscataway, New Jersey, pp. 106 113. Institute of Eectrica and Eectronics Engineers, Inc. [36] Schruben, L. W. and B. H. Margoin (1978). Pseudorandom number assignment in statisticay designed simuation and distribution samping experiments. Journa of the American Statistica Association 73(363), 504 525. [37] Shi, L., and S. Óafsson. 2000. Nested partitions method for goba optimization. Operations Research 48 (3): 424 435. [38] Smith, A. F. M. and I. Verdinei (1980). A note on Bayesian design for inference using hierarchica inear mode. Biometrika 67(3), 613 619. [39] Wech, W. J. (1982). Branch and bound search for experimenta designs based on D- optimaity and other criteria. Technometrics 24(1), 41 48. [40] Wu, C.F. J., and M. Hamada (2000). Experiments: Panning, Anaysis, and Parameter Design Optimization. New York: Wiey. 26