Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation


Szu Hui Ng (1), Stephen E. Chick (2)

National University of Singapore, 10 Kent Ridge Crescent, Singapore
Technology Management Area, INSEAD, Fontainebleau CEDEX, France

One goal of experimentation is to identify which design parameters most significantly influence the mean performance of a system. Another goal is to obtain good parameter estimates for a response model that quantifies how the mean performance depends on influential parameters. Most experimental design techniques focus on one goal at a time. This paper proposes a new entropy-based design criterion for follow-up experiments that jointly identifies the important parameters and reduces the variance of parameter estimates. We simplify computations for the normal linear model by identifying an approximation that leads to a closed form solution. The criterion is applied to an example from the experimental design literature, to a known model, and to a critical care facility simulation experiment.

(Design of Experiments; Model discrimination; Parameter estimation; Entropy; Simulation)

1. Introduction

A common purpose of many experiments is to obtain an adequate mathematical model of the underlying system, including the functional form, and precise estimates of the model's parameters. Response models that describe the relationship between inputs to a system and the output can be useful for design decisions, and much focus has gone into selecting inputs in a way that improves the estimate of the response model [9, 29, 30, 40]. Response models can be used in iterative processes to identify design parameters (e.g., number of servers, production line speeds) that optimize some expected reward criterion (e.g., mean monthly revenue, average output), or to provide intuition about how input factors influence aggregate system behavior.
In simulation, response models can relate the parameters of stochastic models (e.g., demand arrival rates, infection transmission parameters) to system performance [2, 13, 24, 25, 32, 35].

(1) Corresponding author.
(2) The authors acknowledge the financial support of the National Institutes of Health (grant R01 AI A1).
A number of design criteria are available to select design factors (or inputs) for experiments. Several authors [6, 16, 26] use the expected gain in Shannon information (or decrease in entropy) as an optimal design criterion to select values for the experiment's design factors. Bernardo [6] and Smith and Verdinelli [38] adopted this approach and looked at how to plan experiments to ensure precise estimates of the model's parameters. However, many experiments aim to identify which factors most influence the system response. Identifying the subset of most important parameters can be phrased as a model selection problem [34]. Box and Hill [10] used Shannon information to develop the MD design criterion for discriminating among multiple candidate models. Hill [21] stressed the importance of experimental design for the joint objective of model discrimination and parameter inference in his review of design procedures. Most design criteria, however, focus either on identifying important parameters or on improving estimates of response parameters, but not both. Exceptions are Hill et al. [22], whose joint criterion requires certain model parameters to be estimated or known, and Borth [8], whose entropy-based criterion can be challenging to compute.

This paper describes a new joint criterion for experimental design that selects designs to simultaneously identify important factors and reduce the variance of the response model parameter estimates. The new criterion is shown to simplify to a closed form for the standard linear regression model with normal observation errors, and is computationally more efficient than Borth's criterion. Our criterion does not require initial estimates of the model parameters and incorporates prior information and data from preliminary experiments.
It is flexible for use in either starting or follow-up experiments, particularly if results remain inconclusive about which factors most influence the system response, and when the parameters are still poorly understood after an initial response surface experiment has been completed. We consider designs with a given number $n$ of observations, and do not describe how to balance initial runs with follow-up runs.

Section 2 describes the mathematical formulation for the design space and response models. It also describes a Bayesian formulation to quantify input model and parameter uncertainty, as well as the new entropy-based design criterion. Three numerical experiments in Section 3 show that optimal designs depend heavily on the criterion selected, and highlight the benefits and tradeoffs of the new joint criterion over individual model discrimination and parameter estimation criteria, as well as existing joint criteria. The examples stress the need for balancing the two types of entropy measures of the joint criterion. For the examples considered, we also find that the weights in our criterion are robust to misspecification. The new criterion does well at both identifying important factors and reducing parameter uncertainty, and is computationally more efficient than
Borth's joint criterion.

2. Formalism

The design criterion is applied to a finite, but perhaps large, set of potential factors in a finite number $n$ of runs, where $n$ is selected by the experimenter. The design space and class of regression models are described before the new entropy-based design criterion and computational issues.

2.1 Design Space and Regression Models

Experiments often involve several factors. Here we consider representing the performance of the system by the usual linear model. We consider a finite number $q$ of real-valued factor inputs, $x_1, \ldots, x_q$, each of which may be chosen to take on a finite set of different values. These factors can be combined algebraically to generate a finite number, $p$, of predictors, $y_1, \ldots, y_p$, each of which is some function of the input factors. We follow the formulation of Raftery et al. [34] to identify the most important of the $p$ predictors. That is, we presume the existence of $s = 2^p$ candidate response models in a model space $\mathcal{M}$ that are linear in some subset of the predictors. Assuming the $\ell$th candidate response model, the output $z_i$ of the $i$th run is presumed to be of the form

$$z_i = \beta_0 + \beta_1 y_{i,(1)} + \beta_2 y_{i,(2)} + \cdots + \beta_{t_\ell} y_{i,(t_\ell)} + \zeta_i, \qquad (1)$$

where $y_{i,(1)}, \ldots, y_{i,(t_\ell)}$ are the $t_\ell$ predictors present in the $\ell$th model, the values of the $\beta_j$ may depend upon $\ell$, and $\zeta_i$ is a zero mean noise term. The selection of a candidate response model identifies the important predictors, relative to the size of the noise in the response. See also George [20].

Let $D$ be the (finite) design space of all possible legal combinations of the inputs for each of the $n$ runs. A design $x \in D$ can be represented as an $n \times q$ matrix whose $i$th row contains the values of the factors for the $i$th run. If model $M_\ell$ has $t_\ell$ predictors, the design matrix can be converted to an $n \times (t_\ell + 1)$ predictor matrix $y_\ell = y_\ell(x)$ whose rows contain the values of the predictors for each run, and whose first column corresponds to the intercept. Let $z$ be the column vector of $n$ outputs.
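As a concrete illustration of the mapping from a design $x$ to a predictor matrix $y_\ell(x)$, the following minimal Python sketch builds the candidate predictors (main effects plus two-factor interactions) and the predictor matrix for one candidate model. The function names, the single-letter factor naming, and the three-run design are assumptions made only for this illustration; they are not taken from the paper.

```python
from itertools import combinations


def candidate_predictors(factors):
    """Main effects plus all two-factor interactions (an illustrative choice)."""
    return list(factors) + [a + b for a, b in combinations(factors, 2)]


def predictor_matrix(design, terms):
    """Build the n x (t + 1) predictor matrix for one candidate model.

    design: list of runs, each a dict mapping a factor name to its coded level.
    terms:  predictor names; "AB" denotes the product of factors A and B
            (single-letter factor names are an assumption of this sketch).
    """
    rows = []
    for run in design:
        row = [1.0]  # first column corresponds to the intercept
        for term in terms:
            value = 1.0
            for factor in term:
                value *= run[factor]
            row.append(value)
        rows.append(row)
    return rows


factors = ["A", "B", "C"]
predictors = candidate_predictors(factors)  # p = 6 predictors
num_models = 2 ** len(predictors)           # s = 2^p candidate models

# A hypothetical design with n = 3 runs and q = 3 factors at coded levels -1/+1.
design = [
    {"A": -1, "B": -1, "C": 1},
    {"A": 1, "B": -1, "C": -1},
    {"A": 1, "B": 1, "C": 1},
]
y = predictor_matrix(design, ["A", "B", "AB"])  # a model with t = 3 predictors
```

With three factors this gives $p = 6$ predictors and $s = 2^6 = 64$ candidate models, which shows how quickly the model space grows.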
2.2 Entropy-Based Formulation

The problem is to choose a design $x$ that is effective both at identifying the most important predictors (i.e., selecting the most appropriate model in $\mathcal{M}$) and at estimating the regression parameters.
We assess uncertainty about response model selection and parameter estimation with probability distributions. The design that most improves an entropy-based criterion is then selected.

2.2.1 Uncertainty Assessment

One Bayesian approach to quantify the joint uncertainty about model form and parameter values is to assign a prior distribution to each of the models $M_\ell \in \mathcal{M}$, then assign a conditional probability distribution for the parameter vector $\beta$, given $M_\ell$. The identity of the best response model and parameter is then inferred by Bayes' rule, using the prior distributions and the probability distribution of the output, given the model and input parameters. This is the approach taken by [14, 28, 34].

We make a standard assumption of jointly independent, normally distributed errors, $\zeta \sim \text{Normal}(0, \sigma^2)$, so if $M_\ell$ is the model, $\beta$ is the parameter, and $x$ is the design with predictor matrix $y_\ell = y_\ell(x)$, then the output $Z$ has a multivariate normal distribution, $p(Z \mid M_\ell, \beta, \sigma^2, x) \sim \text{Normal}(y_\ell \beta, \sigma^2 I_n)$, where $I_n$ is the identity matrix.

For prior distributions, we presume a conjugate prior distribution [5] for the unknown $\theta = (\beta, \sigma^2)$, conditional on the $\ell$th model $M_\ell$,

$$\pi(\beta \mid M_\ell, \sigma^2) \sim \text{Normal}\left(\beta \mid \mu_\ell, \sigma^2 V_\ell\right)$$
$$\pi(\sigma^2 \mid M_\ell) \sim \text{InvertedGamma}\left(\sigma^2 \,\Big|\, \frac{\nu}{2}, \frac{\nu\lambda}{2}\right), \qquad (2)$$

where the conditional prior mean vector $\mu_\ell$ and covariance matrix $\sigma^2 V_\ell$ for $\beta$ may depend on the model $M_\ell$. The parameters $\nu$ and $\lambda$ are selected by the modeler. The $\text{InvertedGamma}(x \mid \alpha, \beta)$ distribution has pdf $x^{-(\alpha+1)} e^{-\beta/x} \beta^{\alpha}/\Gamma(\alpha)$ and mean $\beta/(\alpha - 1)$. Raftery et al. [34] suggest values of $\mu_\ell$, $V_\ell$, $\nu$ and $\lambda$ that minimize the influence of the priors in numerical experiments.

The distributions in Eq. (2) can either be based on prior information alone, or can include information gained during initial stages of experimentation. Data $z_0$ from an initial stage of $n_0$ observations with predictor matrix $y_0$ is straightforward to incorporate because of the conjugate form [5]. Replace the mean $\mu_\ell$ with $\mu_\ell' = \left(V_\ell^{-1} + y_0^T y_0\right)^{-1}\left(V_\ell^{-1}\mu_\ell + y_0^T z_0\right)$; replace $V_\ell$ with $\left(V_\ell^{-1} + y_0^T y_0\right)^{-1}$; replace $\nu/2$ with $(\nu + n_0)/2$; and replace $\nu\lambda/2$ with $\left(\nu\lambda + (z_0 - y_0\mu_\ell')^T z_0 + (\mu_\ell - \mu_\ell')^T V_\ell^{-1}\mu_\ell\right)/2$.

Choices used here for the prior distribution of $M_\ell$ include the discrete uniform prior ($p(M_\ell) = 1/s$) and the independence prior $p(M_i) = \omega^{t_i}(1 - \omega)^{p - t_i}$, where $t_i$ is the number of predictors in model $i$ ($i = 1, \ldots, s$), and $\omega$ is the prior probability that a predictor is active. Raftery et al. [34] provide closed form formulas to update the probabilities $p(M_\ell \mid z_0, y_0)$.
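The two model priors mentioned above can be sketched in a few lines of Python. The representation of a model as a tuple of 0/1 activity indicators, the function names, and the values $p = 4$ and $\omega = 0.25$ are assumptions chosen only for illustration.

```python
from itertools import product


def uniform_prior(p):
    """Discrete uniform prior: p(M) = 1/s over all s = 2^p models."""
    s = 2 ** p
    return {model: 1.0 / s for model in product((0, 1), repeat=p)}


def independence_prior(p, omega):
    """Independence prior: p(M_i) = omega^t_i * (1 - omega)^(p - t_i).

    Each model is a tuple of 0/1 flags, so t_i is simply the sum of the flags.
    """
    prior = {}
    for model in product((0, 1), repeat=p):
        t = sum(model)
        prior[model] = omega ** t * (1 - omega) ** (p - t)
    return prior


p = 4
omega = 0.25  # illustrative value, not taken from the paper
uni = uniform_prior(p)
ind = independence_prior(p, omega)
```

Both priors sum to one over the $2^p$ models. With $\omega < 1/2$ the independence prior favors models with fewer active predictors, whereas the uniform prior treats all models alike.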
In the rest of the paper, the prior distribution for the follow-up stage is based on a prior distribution from Eq. (2) in combination with data from an initial stage. The optimal balance of the amount of initial stage data versus the amount of follow-up data is beyond the scope of this paper.

2.2.2 Modeling Remarks

Considering posterior probabilities is a useful way to assess the relative merits of the models [20]. Selecting models according to $p(M_\ell \mid Z)$ is consistent in that if one of the entertained models is actually the true model, then it will select the true model if enough data is observed. When the true model is not among those being considered, Bayesian model selection chooses the candidate that is closest to the true model in terms of Kullback-Leibler divergence [4, 5, 18].

In practice, the true model is typically not known and is potentially not in $\mathcal{M}$. Despite this, careful selection of a class of approximating models is important in the understanding of many problems. Here we seek a model within the class that is approximately correct (containing only significant predictors) and that approximates the parameters of the true underlying response model with low variance [9].

Atkinson [1] raises several concerns about inference for regression models. Our prior probability framework avoids by design his concern about improper prior distributions for models. A concern about nesting, so that two models may be true, is resolved by noting that the simpler model will be identified as more data is collected, and that simpler models are more desirable explanations [19, 28]. Atkinson [1] also indicates that if two response models are compared, the true model and an incorrect model with fewer parameters, then asymptotically the correct model will be selected, but that for finite numbers of samples the posterior probabilities may support the incorrect model in the absence of strong evidence from the data.
This is a cause for care, but is not a violation of the likelihood principle, and negative consequences for selecting a model when the data do not provide enough evidence is a problem for any selection criterion. A goodness-of-fit test may be useful to provide further post hoc validation.

2.2.3 Entropy-Based Criteria

Several authors [6, 16, 26] proposed the use of the expected gain in Shannon information (or decrease in entropy) given by an experiment as an optimal design criterion. This expected gain is a natural measure of the utility of an experiment. The choice of design influences the expected gain in information, as the predictive distribution of future output $Z$ is determined by the design $x$, model $M_\ell$, and the prior distribution in Eq. (2),

$$p(Z \mid M_\ell, \sigma^2, x) \sim \text{Normal}\left(y_\ell\mu_\ell, \sigma^2\left[y_\ell V_\ell y_\ell^T + I_n\right]\right). \qquad (3)$$
The marginal distribution of $Z$ given $M_\ell$, $y_\ell$, obtained by integrating out $\sigma^2$, is a multivariate t distribution. Entropy is different for discrete (model selection) and continuous (parameter estimation) random variables, so each is discussed in turn.

For model selection, Box and Hill [10] use the expected increase in Shannon information $J$ as a design criterion. The criterion was derived from information theory, where the information (entropy) was used as a measure of uncertainty for distinguishing the $s$ candidate models.

$$J = -\sum_{\ell=1}^{s} p(M_\ell)\log p(M_\ell) + \int \left(\sum_{\ell=1}^{s} p(M_\ell \mid Z, y_\ell)\log p(M_\ell \mid Z, y_\ell)\right) p(Z \mid y)\,dZ \qquad (4)$$

$$\phantom{J} = \sum_{\ell=1}^{s} p(M_\ell)\int \log\left[\frac{p(Z \mid M_\ell, y_\ell)}{\sum_{k=1}^{s} p(Z \mid M_k, y_k)\,p(M_k)}\right] p(Z \mid M_\ell, y_\ell)\,dZ$$

An explicit solution is unknown in general, so $J$ may be evaluated numerically or approximated. Alternately, Box and Hill [10] gave an upper bound approximation, the expected gain in Shannon information between the predictive distributions of each pair of candidate models $M_i$ and $M_\ell$. This approximation was originally named the D-criterion, but we use the notation MD, as in [28].

$$MD = \sum_{0 \le i \ne \ell \le s} p(M_i)\,p(M_\ell)\int p(Z \mid M_i, y_i)\log\left(\frac{p(Z \mid M_i, y_i)}{p(Z \mid M_\ell, y_\ell)}\right) dZ \qquad (5)$$

The MD criterion is effective in practice and popular with research workers [21]. We use MD for the model discrimination portion of our joint criterion. For the normal linear model, Meyer et al. [28] show that MD reduces to a closed form if a noninformative prior $1/\sigma$ on $\sigma$ and a conditionally normal prior for $\beta$ given $\sigma$ are assumed. A closed form also results if the conjugate prior is assumed.

Proposition 1. Assume the conjugate normal gamma prior in Eq. (2). Let $\hat{z}_\ell = y_\ell\mu_\ell$ and $V_\ell^{*} = [y_\ell V_\ell y_\ell^T + I]$. Then for the linear model, MD simplifies to

$$MD = \sum_{0 \le i \ne \ell \le s} \frac{1}{2}\,p(M_i)\,p(M_\ell)\left[-n + \text{tr}\left(V_\ell^{*-1}V_i^{*}\right) + \frac{1}{\lambda}\left(\hat{z}_i - \hat{z}_\ell\right)^T V_\ell^{*-1}\left(\hat{z}_i - \hat{z}_\ell\right)\right] \qquad (6)$$

Proof. See Appendix A.1.

For parameter estimation, Bernardo [6] and Smith and Verdinelli [38] adopted an entropy-based method to ensure precise estimates for parameters that have already been identified as important. They choose the design that maximizes the expected gain in Shannon information (or equivalently, maximizes the expected Kullback-Leibler distance) between the posterior and prior distributions of the parameters $\theta = (\beta, \sigma^2)$.

$$BD = \int\!\!\int p(Z)\,p(\theta \mid Z)\log\left[\frac{p(\theta \mid Z)}{p(\theta)}\right] d\theta\,dZ \qquad (7)$$

Eq. (7) simplifies considerably for the normal linear model into a form known as the Bayesian D-optimal criterion (hence the choice of name BD).

Proposition 2. For a linear model $M$ of the form Eq. (1), the prior probability model Eq. (2), and a given design $y$,

$$BD = \frac{1}{2}\log\left|y^T y + V^{-1}\right| - \frac{1}{2}\log\left|V^{-1}\right|$$

Proof. See Appendix A.2.

Following Borth [8], the entropy criterion $S_P$ for parameter uncertainty generalizes when there are multiple candidate models.

$$S_P = -\sum_{\ell=1}^{s} p(M_\ell)\int p(\theta \mid M_\ell)\log p(\theta \mid M_\ell)\,d\theta + \int \left[\sum_{\ell=1}^{s} p(M_\ell \mid Z)\int p(\theta \mid Z, M_\ell)\log p(\theta \mid Z, M_\ell)\,d\theta\right]\left[\sum_{k=1}^{s} p(M_k)\,p(Z \mid M_k)\right] dZ \qquad (8)$$

Proposition 3. For the normal linear model, $S_P$ simplifies to

$$S_P = \sum_{\ell=1}^{s} \frac{p(M_\ell)}{2}\log\left|y_\ell^T y_\ell + V_\ell^{-1}\right| + K \qquad (9)$$

for some $K$ that does not depend on the design.

Proof. See Appendix A.3.

2.2.4 Joint Criterion

In order to account for both model discrimination and parameter estimation simultaneously, Hill et al. [22] proposed a joint criterion that adds a weighted measure of discrimination and precision,

$$C = w_1 D_0 + w_2 E_0, \qquad (10)$$

where $D_0$ is some measure of discrimination and $E_0$ is some measure of precision in parameter estimation. A nonunique choice of $D_0$ and $E_0$ they suggest is the model discrimination criterion proposed by Box and Hill [10], and the determinant of the regression matrix for estimating the parameters for model $i$,

$$C = w\,\frac{MD}{MD^{*}} + (1 - w)\sum_{i=1}^{s} p(M_i)\,\frac{E_i}{E_i^{*}}, \qquad (11)$$

where $MD^{*}$ and $E_i^{*}$ are the maximum values of $MD$ and $E_i$ over the design region, and $w$ is a nonnegative weight placed on model discrimination. They assumed that $\sigma^2$ is known or can be estimated when computing $C$. As the two criteria are summed together and weighted by $w$, the maxima $MD^{*}$ and $E_i^{*}$ may be less relevant than the range of the criteria over the design space.

Borth [8] treated the two objectives using the idea of the change in total entropy. He showed that it decomposed into the model discrimination term $J$ and the parameter estimation term $S_P$. We denote Borth's criterion as $B$ hereafter. The scale for entropy for continuous random variables (parameters) may not be well calibrated with entropy for a discrete random variable (model selection): their range may differ when evaluated throughout the design space. Borth's method also requires computationally expensive numerical integration.

Here we also use the idea of the expected gain in entropy of an experiment, but normalize $J$ and $S_P$ over their range of values, and simplify the weight factor. Instead of numerically evaluating the $J$ criterion, we approximate it with the MD criterion. So an upper bound approximation of the joint criterion for model discrimination and parameter estimation is

$$S_Q = w\,\frac{MD - MD_{\min}}{MD_{\max} - MD_{\min}} + (1 - w)\,\frac{S_P - S_{P\min}}{S_{P\max} - S_{P\min}}, \qquad (12)$$

where $MD_{\min}$, $MD_{\max}$, $S_{P\min}$, $S_{P\max}$ are the smallest and largest MD values and the smallest and largest $S_P$ values, respectively, over all designs in $D$, and $w \in [0, 1]$ is a weight factor. This is similar in form to criterion $C$, but reduces to a closed form if the prior setup in Eq. (2) is used, as a result of Eq. (6) and Eq. (9). Eq. (12) does not require $\sigma^2$ to be known (see the propositions above for linear models, and comments below for nonlinear models), and incorporates prior information and data from initial experiments.
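The effect of the rescaling in Eq. (12) can be seen in a small Python sketch. The MD and $S_P$ values below are made-up numbers for four hypothetical designs, and the function names are assumptions of this illustration. The sketch compares the design chosen by the rescaled criterion with the design chosen by a plain weighted sum of the raw values.

```python
def rescale(values):
    """Map values linearly onto [0, 1] using their range over all designs."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def joint_criterion(md, sp, w):
    """S_Q of Eq. (12) for every design, given raw MD and S_P values."""
    md_scaled = rescale(md)
    sp_scaled = rescale(sp)
    return [w * m + (1 - w) * s for m, s in zip(md_scaled, sp_scaled)]


# Made-up criterion values for four hypothetical designs (illustration only).
md = [0.010, 0.060, 0.050, 0.020]
sp = [0.90, 1.00, 1.70, 1.90]

w = 0.5
s_q = joint_criterion(md, sp, w)
best_scaled = max(range(len(md)), key=lambda d: s_q[d])

# The same weighted sum without recalibration is dominated by the S_P scale.
raw = [w * m + (1 - w) * s for m, s in zip(md, sp)]
best_raw = max(range(len(md)), key=lambda d: raw[d])
```

In this toy case the rescaled criterion picks the third design, which scores well on both measures, while the unscaled sum picks the fourth design, which is simply the best $S_P$ design. This mirrors the calibration concern raised above, under the stated made-up numbers.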
The weight $w$ should be selected based on the results of the initial experiments and the focus of the follow-up experiment. If the initial experiment was insufficient to identify the important parameters, then more weight should be placed on model discrimination. If the model is reasonably determined, then more focus can be placed on parameter estimation. Hill et al. [22] suggested $w = \left[\frac{s}{s-1}\left(1 - p(M_{\max})\right)\right]^{\xi}$, where $M_{\max}$ is the a priori most probable model. Another choice is $w = \left[1 - \left(p(M_{\max}) - p(M_{\max 2})\right)\right]^{\xi}$, where $M_{\max 2}$ is the second most probable model. Small values of $\xi$ place more weight on model discrimination. To equally balance the two calibrated
entropy measures, $w$ can also be set at 1/2. The numerical examples in Section 3 use both the weighting function of Hill et al. [22] and $w = 1/2$. Examples 1 and 2 assess the dependence of the optimal design on the weight. The examples show that rescaling the entropies can be important, but that the final design may be somewhat insensitive to a misspecification in $w$.

To achieve the joint objectives of model discrimination and parameter estimation, we seek a design $x \in D$ that maximizes $S_Q$ in Eq. (12). For normal linear models, $S_Q$ simplifies to a closed form through Eq. (6) and Eq. (9). The criterion is also applicable to nonlinear models. When a nonlinear model can be approximated by a linear model in the neighborhood of $\theta_0$, $S_Q$ can be applied by substituting the initial estimates of the parameters [12]. For nonnormal models, $S_Q$ requires numerically integrating Eq. (5) and Eq. (7). For generalized linear and nonlinear models, Bayesian methods [3, 17] can be used to approximate the terms in Eq. (5) and Eq. (7).

Shannon information is not the only possible approach to develop a joint model selection and parameter estimation design criterion. Bingham and Chipman [7] propose a weighted average of Hellinger distances between predictive densities of all possible pairs of competing models as a criterion for model discrimination. A linear combination of Bingham and Chipman's [7] criterion and a weighted average of the Hellinger distances between the prior and posterior distributions of each model's parameters can also be used as a joint criterion. For the prior setup in Eq. (2), this reduces to a closed form. The weighting functions described above can be used to weight the importance of each objective. The upper bound on the Hellinger distances for each individual term can be useful for rescaling, but the maximum values of each term for a particular finite $n$ can be quite far from the upper bound, and rescaling each term by its upper bound may not be appropriate. We do not consider that combined criterion further here.
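The two weighting functions for $w$ described earlier in this section can be sketched as follows. The function names and the model probabilities are assumptions made for illustration only; the formulas are the ones attributed above to Hill et al. [22] and the alternative based on the gap between the two most probable models.

```python
def hill_weight(model_probs, xi):
    """w = [ s/(s-1) * (1 - p(M_max)) ]^xi, with s the number of models."""
    s = len(model_probs)
    p_max = max(model_probs)
    return (s / (s - 1) * (1 - p_max)) ** xi


def gap_weight(model_probs, xi):
    """w = [ 1 - (p(M_max) - p(M_max2)) ]^xi, using the top two models."""
    top, second = sorted(model_probs, reverse=True)[:2]
    return (1 - (top - second)) ** xi


# Hypothetical model probabilities after an initial experiment.
probs = [0.4, 0.3, 0.2, 0.1]

w_hill = hill_weight(probs, xi=2)   # (4/3 * 0.6)^2 = 0.64
w_gap = gap_weight(probs, xi=2)     # (1 - 0.1)^2 = 0.81

# When no model stands out, all weight goes to model discrimination.
w_flat = hill_weight([0.25, 0.25, 0.25, 0.25], xi=2)  # approximately 1
```

Because the bracketed quantity lies in $[0, 1]$, decreasing $\xi$ increases $w$, which is why small values of $\xi$ place more weight on model discrimination.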
2.3 Some Computational Issues

Although $S_Q$ simplifies to a closed form for the normal linear model, there are computational challenges. We consider three here. First, the number of models grows exponentially in the number of predictors. Second, the min and max values of the two entropy measures that comprise $S_Q$ are required. Third, the number of designs grows combinatorially in the number of candidate runs.

To address the first issue, the summands for MD and $S_P$ are computed by using only the most likely models. There are typically far fewer than $s = 2^p$ different models whose probability $p(M_\ell)$ leads it to be a competitor for the best after the initial stage of experimentation. By considering only the most likely models, Eq. (12) becomes tractable. There are several ways one can choose a subset of probable models: (i) Pick all models $h$ so that $p(M_h) \ge E$, (ii) Pick the $h$ most likely
models, where $h$ is the smallest integer so that $p(M_{(1)}) + p(M_{(2)}) + \cdots + p(M_{(h)}) \ge F$. Raftery et al. [34] take a similar approach to model averaging. Examples 2 and 3 of Section 3 use (i) with $E = $ [...]. The top models have higher posterior probabilities, so we set $E = $ [...]. In Section 3.1, no model clearly stands out after the initial runs, so we use $E = $ [...].

When direct enumeration is not computationally feasible, these more important models can be identified heuristically by using Markov Chain Monte Carlo methods like MC$^3$ (Markov Chain Monte Carlo Model Composition) [27] to estimate the $p(M_\ell)$. The state space for MC$^3$ is the set of $s$ models, and a sample path visits a sequence of different models, $M_\ell$. Candidate states for transitions are chosen from the set of models with one more or one fewer active predictors. The relative probabilities for the current and candidate states, needed to implement the Metropolis-Hastings step of MC$^3$, can be computed from closed-form formulas in Raftery et al. [34]. The number of times a model is visited during MC$^3$ divided by the number of iterations of MC$^3$ is a consistent estimate of the model's posterior probability. Chipman et al. [15] and Ng [31] discuss some practicalities of Markov chain Monte Carlo methods for model selection.

Second, we use an optimization heuristic to estimate $MD_{\min}$, $MD_{\max}$, $S_{P\min}$, $S_{P\max}$. We use the k-exchange algorithm of Johnson and Nachtsheim [23] to search for the maximum and minimum values. The k-exchange algorithm was first proposed to construct D-optimal designs, but because it is a general algorithm, it can be used to select from a finite set of designs as long as an optimality criterion is given. Numerical results [23, 33] show that it is efficient and effective in constructing optimal designs, and the algorithm has been widely used. In the numerical examples we considered, the k-exchange algorithm was very efficient in identifying the optimal designs.
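The MC$^3$ visit-frequency idea and the threshold rule (i) can be sketched together in Python. This is a minimal sketch under stated assumptions: `toy_score` is a hypothetical stand-in for the closed-form relative posterior probabilities of Raftery et al. [34], a model is represented as a tuple of 0/1 activity flags, and the threshold 0.05 is an arbitrary illustrative value for $E$.

```python
import random


def mc3_visit_frequencies(p, score, iterations, seed=0):
    """Estimate model probabilities by visit frequency, in the spirit of MC^3.

    score(model) must return an unnormalized posterior probability (> 0).
    A candidate differs from the current model by exactly one predictor,
    so the proposal is symmetric and a Metropolis acceptance step suffices.
    """
    rng = random.Random(seed)
    current = tuple(0 for _ in range(p))
    visits = {}
    for _ in range(iterations):
        j = rng.randrange(p)  # predictor to add or drop
        candidate = current[:j] + (1 - current[j],) + current[j + 1:]
        ratio = score(candidate) / score(current)
        if rng.random() < min(1.0, ratio):
            current = candidate
        visits[current] = visits.get(current, 0) + 1
    return {model: count / iterations for model, count in visits.items()}


def toy_score(model):
    """Hypothetical unnormalized posterior: one model is ten times as likely."""
    return 10.0 if model == (1, 1, 0) else 1.0


freqs = mc3_visit_frequencies(p=3, score=toy_score, iterations=20000)

# Rule (i): keep models whose estimated probability is at least E.
threshold = 0.05  # illustrative value of E, not taken from the paper
likely = [model for model, f in freqs.items() if f >= threshold]
most_visited = max(freqs, key=freqs.get)
```

In this toy setting the chain spends most of its time in the model with the highest score, so the visit frequencies single it out without enumerating and normalizing over all $2^p$ models.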
In addition to increasing $k$ as suggested in [23], we also found that for the k-exchange algorithm, increasing the number of starting designs from scattered points in the design space improves the search for the optima. Alternatively, the branch and bound algorithm in [39], or nested partitions [37], can be used to find the global optima.

Third, we generalize the k-exchange algorithm (Appendix A.4) to identify a design with a high value of $S_Q$ to improve the scaling of the entropy measures. The algorithm is a greedy algorithm that swaps design points in and out one at a time. More work on computational issues is an avenue for future research.
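The flavor of a greedy, one-point-at-a-time exchange search with several random starting designs can be sketched as below. This is a generic simplification written for illustration; it is neither the k-exchange algorithm of Johnson and Nachtsheim [23] nor the algorithm of Appendix A.4, and the function name, the toy criterion, and the candidate set are all assumptions.

```python
import random


def exchange_search(candidates, n, criterion, starts=5, seed=0):
    """Greedy exchange: swap one design point at a time while the criterion improves.

    candidates: finite list of candidate runs (each used at most once).
    n:          number of runs in the design.
    criterion:  function mapping a list of n runs to a number to maximize.
    starts:     number of random starting designs.
    """
    rng = random.Random(seed)
    best_design, best_value = None, float("-inf")
    for _ in range(starts):
        design = rng.sample(candidates, n)
        value = criterion(design)
        improved = True
        while improved:
            improved = False
            for pos in range(n):
                for point in candidates:
                    if point in design:
                        continue
                    trial = design[:pos] + [point] + design[pos + 1:]
                    trial_value = criterion(trial)
                    if trial_value > value:
                        design, value = trial, trial_value
                        improved = True
        if value > best_value:
            best_design, best_value = sorted(design), value
    return best_design, best_value


def toy_criterion(design):
    """Toy stand-in for a design criterion: sum of pairwise distances."""
    return sum(abs(a - b) for i, a in enumerate(design) for b in design[i + 1:])


candidates = list(range(10))
best_design, best_value = exchange_search(candidates, n=3, criterion=toy_criterion)
```

To search for a minimum instead, as needed for the smallest MD and $S_P$ values, the same routine can be called with the negated criterion. In practice `toy_criterion` would be replaced by MD, $S_P$, or $S_Q$ evaluated on the design.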
3. Numerical Results

Three numerical experiments compare the new criterion, $S_Q$, with the two other joint criteria in the literature, as well as the MD and $S_P$ criteria. The optimal $S_Q$ follow-up design $x^*(w, z_0)$ depends upon the weight $w$ and previous observations $z_0$. Let $X^*(z_0) = \{x^*(w, z_0) : w \in [0, 1]\}$ be the set of designs that the $S_Q$ criterion identifies, given $z_0$.

3.1 Chemical Reactor Experiment

Box et al. [11, p. 377] gave data for a chemical-reactor experiment that used a $2^5$ full factorial design. From this data, we extracted runs that correspond to five columns of a Plackett-Burman 12 run (PB12) design. We treated those runs (see Table 1) as an initial experiment. The follow-up design was simulated by extracting the remaining runs from the complete experiment.

Table 1: PB12 design and data extracted from the full $2^5$ reactor experiment (columns: Run $i$, A, B, C, D, E, $z_i$).

We considered fifteen predictors (five factors and their two-factor interactions) to get $2^{15}$ distinct linear models in the model space $\mathcal{M}$, each differing by the absence or presence of each predictor. We used the equal probability prior for model uncertainty, $p(M_\ell) = 2^{-15}$, and the prior for parameters suggested by Raftery et al. [34]. Table 2 shows the probabilities for the top 8 models, given that prior distribution and the PB12 data. No model clearly stands out, but the model identified in the original analysis of all 32 runs [11], with factors (B, D, E, BD, DE), is ranked best.

To distinguish between the top eight models, $n = 3$ additional runs were selected from the remaining 20 runs. The best designs for each joint criterion ($S_Q$ with $w = 0.5$; $B$; and $C$ with $\xi = 2$ as in [22]) were computed by evaluating the criteria over each possible design. The joint
criterion $S_Q$ results in different designs than the $B$ and $C$ criteria. The posterior probabilities of all models were then recomputed using all 15 runs, and the top 3 models are shown in Table 3. All three designs identified the same top model identified in the original analysis of all 32 runs. $S_Q$ discriminated in favor of the top model more than criteria $B$ and $C$. Table 4 indicates that $S_Q$ reduced the parameter generalized variance (the determinant of the posterior covariance matrix of the parameter estimates, $|V(\beta)|$) of the top model more than $B$ and $C$.

Table 2: Probability of the eight most probable models after 12 runs (columns: Model, Posterior Probability). Models, in the order listed:
  B, D, E, BD, DE
  B, C, D, E, BD, DE
  B, D, E, BC, BD, DE
  B, D, E, BD, BE, DE
  A, B, D, E, BD, DE
  B, D, E, AE, BD, DE
  B, D, E, AC, BD, DE
  B, D, E, BD, CD, DE

Table 3: Posterior probability (Post.) of the three most probable models with PB12 + 3 runs determined by the best design obtained from full enumeration of the $S_Q$, $B$, and $C$ criteria.
  New S_Q Criterion      | Borth's B Criterion    | Hill's C Criterion
  B, D, E, BD, DE        | B, D, E, BD, DE        | B, D, E, BD, DE
  B, D, E, BC, BD, DE    | B, C, D, E, BD, DE     | B, D, E, BC, BD, DE
  A, B, D, E, BD, DE     | A, B, D, E, BD, DE     | B, D, E, BD, BE, DE (Post. 0.04)

Table 4: Parameter generalized variance $|V(\beta)|$ for the a posteriori top model (B, D, E, BD, DE), given PB12 + 3 runs, based on the $S_Q$, $B$ and $C$ criteria (columns: Criterion, $|V(\beta)|$; rows: $S_Q$, $B$, $C$).

To compare the computational burden, each criterion was evaluated for all possible designs for one, two, three and four additional runs using Maple 8 (slow, because it is interpreted, but relative CPU times are illustrative). Table 5 shows the computation times for the $S_Q$ and $B$ criteria. The computation times for $S_Q$ and $C$ were similar. The curse of dimensionality made quadrature an inefficient approach for the numerical integrations required by $B$.

Table 5: CPU time for computing $S_Q$, $C$ and $B$ (hours) (columns: Additional runs, $S_Q$ and $C$, $B$).

Table 6 shows the posterior probabilities of the top three models with the model discrimination MD and parameter estimation $S_P$ criteria. As expected, MD did a better job than $S_P$ at distinguishing the top model from the others, and Table 7 indicates that $S_P$ outperformed MD at reducing the parameter generalized variance of the top model. $S_Q$, on the other hand, outperformed MD in favoring the top model, and was only slightly poorer than $S_P$ in parameter estimation.

Table 6: Posterior probability (Post.) of the three most probable models with PB12 + 3 runs determined by the best design obtained by full enumeration for $S_Q$, MD, and $S_P$.
  S_Q Criterion          | MD Criterion           | S_P Criterion
  B, D, E, BD, DE        | B, D, E, BD, DE        | B, D, E, BD, DE
  B, D, E, BC, BD, DE    | B, D, E, BD, BE, DE    | B, C, D, E, BD, DE
  A, B, D, E, BD, DE     | B, C, D, E, BD, DE     | A, B, D, E, BD, DE

Table 7: Parameter generalized variance $|V(\beta)|$ for the a posteriori top model (B, D, E, BD, DE), given PB12 + 3 runs, based on the $S_Q$, MD and $S_P$ criteria (columns: Criterion, $|V(\beta)|$; rows: $S_Q$, MD, $S_P$).

This example also illustrates the importance of the normalizing that we suggest, as one of the subcriteria would be ignored without recalibration. With the equal probability prior for each model, the uncalibrated MD scores over the design space ranged from [...] to 0.062, while the uncalibrated $S_P$ scores ranged from 0.88 to [...]. Without recalibration, the joint criterion would have selected the best $S_P$ design, and ignored the model discrimination objective.

We tested the sensitivity to the prior distribution by rerunning the experiment with the independence prior $p(M_\ell) = \omega^{t_\ell}(1 - \omega)^{p - t_\ell}$, where $t_\ell$ is the number of predictors in model $\ell$ ($\ell = 1, \ldots, 2^{15}$), with $\omega = $ [...]. The model with factors (B, D, E, BD, DE) is ranked third when only 12 runs are used, but the $S_Q$ criterion again identified (B, D, E, BD, DE) as the most probable model after the $n = 3$ run follow-up was completed.

To test the sensitivity of the designs to the weights, $w$ was varied from 0 to 1. In this example,
the top $S_Q$ design is robust over a range of weights. There were only three different top designs obtained as $w$ was varied from 0 to 1. When $0 \le w < 0.27$, the top $S_P$ design is obtained. The same design obtained for $S_Q$ when $w = 0.5$ is obtained when $0.27 \le w < 0.78$. The top MD design is obtained when $w \ge 0.78$. For the weighting function suggested in [22], the best MD design is selected when $\xi$ gets small, $\xi \le 4$, the best $S_P$ design is selected when $\xi$ gets large, $\xi > 27$, and the same design is selected for any $\xi$ in between. In this example, small changes in $w$ or $\xi$ do not significantly change the optimal design.

The Entropy Balancing k-exchange Algorithm in Appendix A.4 was implemented to evaluate its effectiveness at finding good designs in this problem. The best design is known for this problem because an exhaustive evaluation of all 1140 designs is possible. The Entropy Balancing algorithm consists of two steps. In the initial step of the algorithm, the maximum and minimum values of MD and $S_P$ are estimated using the k-exchange max and k-exchange min algorithms. In the second step, the k-exchange max algorithm is used to search for a good $S_Q$ design.

We first evaluated how well the initial step performs in determining the maximum and minimum values. We varied the number $r$ of initial random designs to start the algorithm, $r = 100$ and $r = 200$, and conducted 10 independent replications of the algorithm for each $r$ (each replication samples an independent set of $r$ design points). When $r = 100$, the estimates of ($MD_{\max}$, $MD_{\min}$, $S_{P\min}$, $S_{P\max}$) were all equal to the actual values 70% of the time. When $r = 200$, all four estimates were correct 80% of the time. In the remaining cases, only one of the four actual values was not obtained, but the estimate was close to the actual value.

We next tested how well the second step of the algorithm identifies the known optimal $S_Q$ design. The known optimal $S_Q$ design was identified 70% of the time with $r = 100$, and 90% of the time with $r = 200$.
The remaining nonoptimal designs selected were among the top 8 designs.

3.2 Finding a Known Model

To determine how well the criterion performs in detecting a known model, and how the best $S_Q$ model depends upon $w$ in repeated samples, we ran 50 replications of an experiment using the $S_Q$ criterion on a known model with 4 potential factors (A, B, C, D), namely $Z = 10A + 15B + 6AB + 7AC + \zeta_i$, where $\zeta_i \sim \text{Normal}(0, 5)$. For each replication $i$, we ran an initial fractional factorial design to generate preliminary data $z_{0,i}$, which was then used to create a prior distribution for a follow-up design with $n = 3$ runs as described in Fig. 1. In each replication, the true (known) model was among the top eight candidate models after the initial 8 runs but was completely confounded with three other models. The three additional runs selected by $S_Q$ dealiased
15 Figure 1: Agorithm to assess the identification of a known mode in Section 3.2 For i = 1, 2,..., 50: 1. Generate independent preiminary data z 0,i with a factoria design. 2. Update the distributions for the unknown modes and parameters as in Section Determine the best S Q design x (w, z 0,i ) for n = 3 additiona runs, as a function of w. 4. Run the best foowup design, x (w, z 0,i ), for each w. 5. Compute the posterior probabiity and ordina rank of each mode, as a function of w. End for oop the confounded effects and distinguished between the top competing modes. In each repication, three different designs were obtained ( x (z 0,i ) = 3) as w varied between 0 and 1. The best S Q design x (w, z 0,i ) was the top S P design (x (0, z 0,i )) for sma w. For a certain range between 0 and 1, a unique top S Q design was obtained that baances mode discrimination and parameter estimation. For arger w, the best S Q design was the best MD design (x (1, z 0,i )). In 70% of the repications, the same three designs x (z 0,i ) were seected and the same S Q was seected for w in approximatey the range (0.1, 0.55). The other 30% of the repications resuted in 3 other sets of S Q designs with the same quaitative features: the designs with w in the range of about 0.1 up to baanced mode discrimination and parameter estimation. In 49 out of 50 repications, the true mode was identified as the best mode when the S Q design with intermediate vaues of w was used to baance discrimination and estimation. In the remaining repication, the true mode had the third highest posterior probabiity. Averaging over 50 repications, the probabiity that the true mode was best improved from 0.04 after the initia stage (8 runs), to 0.21 (after the 3 foowup runs). The average range of MD was 7.42, the average range of S P was 0.59, and MD max > S Pmax for a repications. 
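The confounding in the initial stage of this experiment can be made concrete with a small simulation. The sketch below generates data from the known model of Section 3.2 on an 8-run half fraction of the 2^4 design and lists which two-factor interactions share a column. The generator D = ABC, the reading of Normal(0, 5) as a variance of 5, the seed, and all function names are assumptions made for illustration; the text does not say which fraction was run.

```python
import itertools
import random


def half_fraction():
    """8-run 2^(4-1) design in coded units; D = ABC is an assumed generator."""
    return [(a, b, c, a * b * c)
            for a, b, c in itertools.product((-1, 1), repeat=3)]


def simulate_response(design, rng, noise_var=5.0):
    """Z = 10A + 15B + 6AB + 7AC + noise, the known model of Section 3.2."""
    sd = noise_var ** 0.5  # assumes the 5 in Normal(0, 5) is a variance
    return [10 * a + 15 * b + 6 * a * b + 7 * a * c + rng.gauss(0.0, sd)
            for a, b, c, _d in design]


def aliased_pairs(design):
    """Pairs of two-factor interactions whose columns coincide on the design."""
    names = "ABCD"
    cols = {names[i] + names[j]: tuple(run[i] * run[j] for run in design)
            for i, j in itertools.combinations(range(4), 2)}
    return [(p, q) for p, q in itertools.combinations(cols, 2)
            if cols[p] == cols[q]]


design = half_fraction()
z0 = simulate_response(design, random.Random(0))  # preliminary data z_{0,i}
pairs = aliased_pairs(design)
print(pairs)  # → [('AB', 'CD'), ('AC', 'BD'), ('AD', 'BC')]
```

Under this assumed generator, replacing AB by CD, AC by BD, or both gives three alternative models that fit the 8 runs exactly as well as the true one, which is consistent with the three confounded competitors reported above. Follow-up runs outside the half fraction are what break these ties.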
If the individual measures were not recalibrated by their ranges, the S_Q criterion would have selected the best MD design and ignored parameter estimation unless a very small weight were placed on model discrimination.

This experiment gave the same qualitative conclusions as Section 3.1. Borth's criterion took orders of magnitude more time to compute due to numerical integration issues (curse of dimensionality). Rebalancing the entropy measures was important for assuring a balance between discrimination and estimation. The optimal design was not highly sensitive to the choice of w.

3.3 Critical Care Facility

The critical care facility illustrated in Fig. 2 was originally studied by Schruben and Margolin [36]. Patients arrive according to a Poisson process and are routed through the system depending upon their specific health condition. Stays in the intensive care (ICU), coronary care (CCU), and intermediate care facilities are presumed to be lognormally distributed. This section compares S_Q with C and the individual criteria, MD and S_P. Borth's criterion B took too much time to evaluate and was therefore not compared.

Figure 2: Estimated fraction of patients routed through the units of a critical care facility.

We initially ran a 64-run design using the Bayesian model average to sample uncertain input parameters [32]. We considered twelve input parameters, resulting in 2^12 distinct linear models in the model space M, each differing by the absence or presence of each predictor. Table 8 shows the posterior probabilities for the top 5 models. The model identified in a 128-run study in [32] is ranked fifth here.

Schruben and Margolin [36] studied how to allocate random number streams to reduce variability in response surface parameter estimates. Their response model predicts the expected number of patients per month E[Z] that are denied entry to the facility as a function of the number of beds in the ICU, CCU, and intermediate care facilities. They presume fixed point estimates for k = 6 input parameters, one per source of randomness, to describe the patient arrival process (Poisson arrivals, mean λ̂ = 3.3/day), ICU stay duration (lognormal, mean 3.4 and standard deviation 3.5 days), intermediate ICU stay duration (lognormal, mean 15.0, standard deviation 7.0), intermediate CCU stay duration (lognormal, mean 17.0, standard deviation 3.0), CCU stay duration (lognormal, mean 3.8, standard deviation 1.6), and routing probabilities (multinomial, p̂_1 = 0.2, p̂_3 = 0.2, p̂_4 = 0.05). Some parameters are multivariate, and there are a total of 12 dimensions of parameters. For the lognormal service times, the log of the service times has mean µ and precision λ = 1/σ^2. Subscripts distinguish the parameters of each service type (e.g., µ_icu, µ_iicu, µ_iccu, µ_ccu, λ_icu).
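The (µ, λ) parameterization of the lognormal stays can be related to the quoted means and standard deviations with the standard lognormal moment identities. The sketch below is illustrative only: it assumes the quoted means and standard deviations are on the original (days) scale, and the function name and dictionary keys are hypothetical labels, not names from the source.

```python
import math


def lognormal_log_params(mean, sd):
    """Return (mu, precision) of log(X) for a lognormal X with the given
    mean and standard deviation on the original scale."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)  # variance of log(X)
    mu = math.log(mean) - sigma2 / 2.0         # mean of log(X)
    return mu, 1.0 / sigma2                    # precision = 1 / sigma^2


# Point estimates quoted in the text: (mean, standard deviation) in days.
stays = {
    "icu": (3.4, 3.5),
    "iicu": (15.0, 7.0),
    "iccu": (17.0, 3.0),
    "ccu": (3.8, 1.6),
}
log_params = {unit: lognormal_log_params(m, s) for unit, (m, s) in stays.items()}

# Twelve candidate predictors, each absent or present, give the model space size.
n_models = 2 ** 12

for unit, (mu, precision) in log_params.items():
    print(f"{unit}: mu = {mu:.3f}, precision = {precision:.3f}")
print("number of candidate linear models:", n_models)
```

Working with the mean and precision of the log stay, rather than the moments in days, is what makes each lognormal source contribute two of the 12 parameter dimensions.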
The analysis here presumes a linear response model in these 12 parameters. The actual system parameters are not known with certainty, and the estimated system performance will be in error if the actual parameter values differ from their point estimates. As in Ng and Chick [32], who used naive Monte Carlo sampling for unknown inputs to do an uncertainty analysis, we fix the number of beds in each of the three units (14 in ICU, 5 in CCU, 16 in intermediate care), and study how the expected number of patients per month that are denied entry depends on the unknown parameters. Design points for the unknown parameter values could take on values of the MLE ± one standard error. The approach of Raftery et al. [34] was used to obtain prior distributions for the unknown response parameters.

Table 8: The five most probable models after 64 runs.
Model | Post. Prob.
λ_sys, µ_iicu, λ_iicu, p_1, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, λ_iccu, p_1, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_[…] | […]

Table 9: The most probable models with […] runs determined by S_Q with w = […].
Model | Post. Prob.
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, p_1, p_[…] | […]

We used the S_Q criterion with w = 0.55 (or ξ = 1) to avoid focusing on parameter estimation too early. The design points of a full factorial for the 12 parameters were candidates for the 32-run follow-up design. Because the number of possible 32-run designs from the 2^12 candidate runs is large, we used the k-exchange algorithm to search for the best S_Q design (r = 50, k = 5), then ran the critical care simulations again with that design. The posterior probabilities for the top three models, given the data from the combined design (64 + 32 runs), are shown in Table 9. The top model is the same model identified in the 128-run analysis in [32], but the S_Q criterion identified this model with fewer runs.

We also used the k-exchange algorithm (r = 50 and k = 5) to determine a good C design. Table 10 shows the posterior probabilities after running the simulations with the C design. The C design identified the same model as S_Q, but S_Q did slightly better in discriminating the top two models.
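To give a sense of why an exchange search was used instead of exhaustive evaluation for the 32-run follow-up design, the short sketch below counts the candidate designs. It assumes a design is an unordered set of 32 distinct runs drawn from the 2^12 full-factorial candidates; allowing repeated runs would make the count larger still. The variable names are illustrative.

```python
import math

n_candidates = 2 ** 12  # full-factorial candidate runs for 12 two-level inputs
n_followup = 32         # size of the follow-up design

# Number of unordered 32-run designs without repeated runs (an assumption).
n_designs = math.comb(n_candidates, n_followup)

print("candidate runs:", n_candidates)
print("decimal digits in the number of designs:", len(str(n_designs)))
```

By comparison, the earlier example with 1140 candidate designs could be enumerated completely, which is how the best design there was known.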
The best designs for MD and S_P are different from the best S_Q design with w = […]. The MD design identified the same top model as the S_Q design, and discriminated between the top two models slightly better than the S_Q design (Table 9 and Table 11). Table 12 indicates that design S_Q did a better job than C and MD at reducing the parameter generalized variance of the top model. The S_Q criterion with w = 0.75 (or ξ = 0.5) resulted in better model discrimination than with w = 0.55 at the cost of slightly less effective parameter estimation (in this case, |x*(z_{0,i})| > 3).

Table 10: Most probable models with […] runs with the C criterion.
Model | Post. Prob.
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, λ_iccu, p_1, p_3, p_[…] | […]

Table 11: Most probable models with […] runs with the MD criterion.
Model | Post. Prob.
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, µ_iccu, λ_iccu, p_1, p_3, p_[…] | […]
λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_[…] | […]

Table 12: Parameter generalized variance V(β) after […] runs for the model with λ_sys, µ_iicu, λ_iicu, µ_iccu, p_1, p_3, p_4.
Runs | Criterion | V(β)
96 | S_P | […]
 | S_Q, w = […] | […]
 | S_Q, w = […] | […]
 | C | […]
 | MD | […]

The top two models identified in the S_P design were the same models identified in the original 64-run analysis, and the top model identified by the S_Q and MD designs is only ranked fourth when the S_P design is used. The S_P criterion focused on designs that had good parameter estimation primarily for models with higher posterior probability. Using the S_P criterion too early in the experimentation process can prematurely focus the design and experimentation on a few models that may or may not be good approximations to the system (because of the small number of runs), an issue raised by Atkinson [1]. An early focus on MD can better distinguish competitors for the best model, but at the expense of poorer parameter estimates. S_Q balanced both of those needs.

4. Discussion and Conclusions

The purpose of many experiments is to distinguish between likely mathematical models and obtain precise estimates for the model parameters. The three joint design criteria examined here each use an additive measure for entropy measures or bounds for model and parameter uncertainty. Our proposal for the new S_Q criterion to normalize each entropy measure by the amount each varies over the design space provides an insight that the other joint criteria do not: it indicates how rich the design space is for improving each entropy measure. If the range for one of the component criteria is much smaller than for the other (e.g., MD_max - MD_min >> S_Pmin - S_Pmax), or if the number of potentially optimal designs, |x*(z_0)|, is small, then a richer design space might be considered.

S_Q is computationally more efficient than Borth's joint criterion, especially when the planned follow-up designs get larger. In the first two examples, the S_Q design performs as well as Borth's criterion, but it is computationally more efficient and practical. S_Q extends the C criterion as it considers the relevant range of the individual criteria, does not require initial estimates of the variance, and accounts for available prior information. These two examples also show that there are 3 different designs for S_Q as w varies from 0 to 1. The optimum MD design is selected when the model discrimination term is heavily weighted and the optimum S_P design is selected when the parameter estimation term is heavily weighted. For each experiment, the S_Q design that balances both objectives is shown to be insensitive to a range of w, and this best design performs more efficiently than the other criteria.

Three numerical experiments illustrated the compromise between model discrimination and parameter estimation obtained when using the joint criterion S_Q. Compared with the individual criteria, the balanced S_Q design was about as good as the MD design for model discrimination, and was almost as good as the S_P design for parameter estimation. The MD design fared less well for parameter estimation, and the S_P design was least effective for model discrimination.
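The effect of normalizing each measure by its range can be illustrated with a toy calculation. The sketch below is not the paper's definition of S_Q, which appears in an earlier section not reproduced here; it assumes a weighted sum of range-rescaled components, assumes larger MD and smaller S_P are preferred, and uses four hypothetical candidate designs whose scores are invented so that the ranges roughly match the 7.42 and 0.59 reported in Section 3.2.

```python
def rescale(values, higher_is_better=True):
    """Map scores to [0, 1] by their range; 1 marks the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat criterion carries no information
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def best_design(md, sp, w, normalize=True):
    """Index of the candidate maximizing w * MD term + (1 - w) * S_P term."""
    if normalize:
        md_term, sp_term = rescale(md), rescale(sp, higher_is_better=False)
    else:
        md_term, sp_term = md, [-v for v in sp]  # raw values, S_P sign flipped
    scores = [w * m + (1 - w) * s for m, s in zip(md_term, sp_term)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores for four candidate designs (not taken from the paper).
md = [1.0, 5.0, 8.4, 6.5]      # model discrimination, larger is better
sp = [0.10, 0.60, 0.69, 0.25]  # parameter entropy, smaller assumed better

weights = [i / 20 for i in range(21)]
winners = [best_design(md, sp, w) for w in weights]
raw_winners = [best_design(md, sp, w, normalize=False) for w in weights]

print("normalized:", sorted(set(winners)), "distinct winning designs")
print("weights at which the best-MD design wins, normalized vs raw:",
      winners.count(2), raw_winners.count(2))
```

In this toy case the rescaled criterion yields three distinct winners as w moves from 0 to 1, with a compromise design over a wide middle band, while the raw sum hands most of the weight range to the best-MD design because MD varies over a much larger range.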
Although S_Q is easier to compute for the linear model than Borth's criterion, the large number of matrix calculations required to compute the S_Q criterion may need to be balanced against the cost of actually running the experiments. In a simulation context, CPU cycles might be better spent running replications rather than computing S_Q if the simulations run quickly. For expensive industrial experiments or complex simulations with long run times, the S_Q criterion may be an effective mechanism to balance the needs of factor identification and parameter estimation. Sequential designs and criteria based on the Hellinger distance are avenues for further research.
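A. Mathematical Details

A.1 Proof of Prop. 1

Conditioning on σ^2, the MD criterion can be rewritten

$$
MD = \sum_{0 \le i \ne l \le s} p(M_i)\,p(M_l) \int_0^\infty p(\sigma^2) \int p(z \mid M_i, y_i, \sigma^2)\, \log \frac{p(z \mid M_i, y_i, \sigma^2)}{p(z \mid M_l, y_l, \sigma^2)} \, dz \, d\sigma^2 \tag{13}
$$

Meyer et al. [28] substituted the predictive distribution of the normal form in Eq. (3) into Eq. (13) and integrated with respect to p(z | M_i, σ^2) to obtain:

$$
MD = \sum_{0 \le i \ne l \le s} p(M_i)\,p(M_l) \int_0^\infty \pi(\sigma^2) \left[ \frac{1}{2} \log \frac{|V_l|}{|V_i|} - \frac{1}{2\sigma^2} \left( n\sigma^2 - \sigma^2 \operatorname{tr}(V_l^{-1} V_i) - (\hat{z}_i - \hat{z}_l)^T V_l^{-1} (\hat{z}_i - \hat{z}_l) \right) \right] d\sigma^2
$$

We now isolate the dependence on the noninformative prior.

$$
\begin{aligned}
MD &= \sum_{0 \le i \ne l \le s} p(M_i)\,p(M_l) \left[ \frac{1}{2} \log \frac{|V_l|}{|V_i|} - \int_0^\infty \frac{1}{2\sigma^2} \left( n\sigma^2 - \sigma^2 \operatorname{tr}(V_l^{-1} V_i) - (\hat{z}_i - \hat{z}_l)^T V_l^{-1} (\hat{z}_i - \hat{z}_l) \right) \pi(\sigma^2)\, d\sigma^2 \right] \\
&= \sum_{0 \le i \ne l \le s} \frac{1}{2}\, p(M_i)\,p(M_l) \left[ \log \frac{|V_l|}{|V_i|} - n + \operatorname{tr}(V_l^{-1} V_i) + (\hat{z}_i - \hat{z}_l)^T V_l^{-1} (\hat{z}_i - \hat{z}_l) \int_0^\infty \frac{1}{\sigma^2}\, \pi(\sigma^2)\, d\sigma^2 \right]
\end{aligned}
\tag{14}
$$

The double sum means that pairs i, l can be matched to make the log terms cancel out. And

$$
\int_0^\infty \frac{1}{\sigma^2}\, \pi(\sigma^2)\, d\sigma^2 = \frac{\nu/2}{\nu\lambda/2} \int_0^\infty \mathrm{InvertedGamma}\!\left( \sigma^2 \,\Big|\, \frac{\nu}{2} + 1, \frac{\nu\lambda}{2} \right) d\sigma^2 = 1/\lambda.
$$

Substitute this into Eq. (14) to justify the claim in Eq. (6).

A.2 Proof of Prop. 2

Condition on model M and let θ = (β, σ^2).

$$
\begin{aligned}
BD &= \iint p(z)\, p(\theta \mid Z) \log \left[ \frac{p(\theta \mid Z)}{p(\theta)} \right] d\theta \, dz \\
&= \iint p(z)\, p(\theta \mid Z) \log p(\theta \mid Z) \, d\theta \, dz - \iint p(z)\, p(\theta \mid Z) \log p(\theta) \, d\theta \, dz
\end{aligned}
$$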