Gaussian Process Optimization with Mutual Information

Emile Contal, CMLA, UMR CNRS 8536, ENS Cachan, France. Vianney Perchet, LPMA, Université Paris Diderot, France. Nicolas Vayatis, CMLA, UMR CNRS 8536, ENS Cachan, France.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which significantly improves further these upper bounds for the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also the Expected Improvement heuristic.

1. Introduction

Stochastic optimization problems are encountered in numerous real world domains including engineering design (Wang & Shan, 2007), finance (Ziemba & Vickson, 2006), natural sciences (Floudas & Pardalos, 2000), or in machine learning for selecting models by tuning the parameters of learning algorithms (Snoek et al., 2012). We aim at finding the input of a given system which optimizes the output (or reward). In this view, an iterative procedure uses the previously acquired measures to choose the next query predicted to be the most useful. The goal is to maximize the sum of the rewards received at each iteration, that is to minimize the cumulative regret, by balancing exploration (gathering information by favoring locations with high uncertainty) and exploitation (focusing on the optimum by favoring locations with high predicted reward). This optimization task becomes challenging when the dimension of the search space is high and the evaluations are noisy and expensive. Efficient algorithms have been studied to tackle this challenge, such as multiarmed bandits (Auer et al., 2002; Kleinberg, 2004; Bubeck et al., 2011; Audibert et al., 2011), active learning (Carpentier et al., 2011; Chen & Krause, 2013) or Bayesian optimization (Mockus, 1989; Grunewalder et al., 2010; Srinivas et al., 2010; de Freitas et al., 2012). The theoretical analysis of such optimization procedures requires some prior assumptions on the underlying function $f$. Modeling $f$ as a function distributed from a Gaussian process (GP) enforces near-by locations to have close associated values, and allows to control the general smoothness of $f$ with high probability according to the kernel of the GP (Rasmussen & Williams, 2006). Our main contribution is twofold: we propose a generic algorithm scheme for Gaussian process optimization and we prove sharp upper bounds on its cumulative regret. The theoretical analysis has a direct impact on strategies built with the GP-UCB algorithm (Srinivas et al., 2010) such as (Krause & Ong, 2011; Desautels et al., 2012; Contal et al., 2013). We suggest an alternative policy which achieves an exponential speed up with respect to the cumulative regret. We also introduce a novel algorithm, the Gaussian Process Mutual Information algorithm (GP-MI), which improves furthermore the upper bounds for the cumulative regret from $O(\sqrt{T(\log T)^{d+1}})$ for GP-UCB, the current state of the art, to the spectacular $O((\log T)^{d+1})$, where $T$ is the number of iterations, $d$ is the dimension of the input space and the kernel function is Gaussian. The remainder of this article is organized as follows. We first introduce the setup and notations. We define the GP-MI algorithm in Section 2.
Main results on the cumulative regret bounds are presented in Section 3. We then provide technical details in Section 4. We finally confirm the performance of GP-MI on real and synthetic tasks compared to the state of the art of GP optimization and some heuristics used in practice, in Section 5.

2. Gaussian process optimization and the GP-MI algorithm

2.1. Sequential optimization and cumulative regret

Let $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^d$ is a compact and convex set, be the unknown function modeling the system we want to optimize. We consider the problem of finding the maximum of $f$, denoted by $f(x^\star) = \max_{x \in \mathcal{X}} f(x)$, via successive queries $x_1, x_2, \ldots \in \mathcal{X}$. At iteration $T+1$, the choice of the next query $x_{T+1}$ depends on the previous noisy observations $Y_T = \{y_1, \ldots, y_T\}$ at locations $X_T = \{x_1, \ldots, x_T\}$, where $y_t = f(x_t) + \epsilon_t$ for all $t \le T$, and the noise variables $\epsilon_1, \ldots, \epsilon_T$ are independently distributed as $\mathcal{N}(0, \sigma^2)$, a Gaussian random variable with zero mean and variance $\sigma^2$. The efficiency of a policy and its ability to address the exploration/exploitation trade-off is measured via the cumulative regret $R_T$, defined as the sum of the instantaneous regrets $r_t$, the gaps between the value of the maximum and the values at the sample locations: $r_t = f(x^\star) - f(x_t)$ for $t \le T$, and

$$R_T = \sum_{t=1}^T \big(f(x^\star) - f(x_t)\big).$$

Our aim is to obtain upper bounds on the cumulative regret $R_T$ with high probability.

2.2. The Gaussian process framework

Prior assumption on f. In order to control the smoothness of the underlying function, we assume that $f$ is sampled from a Gaussian process $GP(m, k)$ with mean function $m : \mathcal{X} \to \mathbb{R}$ and kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$. We formalize in this manner the prior assumption that high local variations of $f$ have low probability. The prior mean function is considered, without loss of generality, to be zero, as the kernel $k$ can completely define the GP (Rasmussen & Williams, 2006). We consider the normalized and dimensionless framework introduced by (Srinivas et al., 2012) where the variance is assumed to be bounded, that is $k(x, x) \le 1$ for all $x \in \mathcal{X}$.

Bayesian inference. At iteration $T+1$, given the previously observed noisy values $Y_T$ at locations $X_T$, we use Bayesian inference to compute the current posterior distribution (Rasmussen & Williams, 2006), which is a GP of mean $\mu_{T+1}$ and variance $\sigma_{T+1}^2$ given at any $x \in \mathcal{X}$ by:

$$\mu_{T+1}(x) = k_T(x)^\top C_T^{-1} Y_T \quad (1)$$

$$\sigma_{T+1}^2(x) = k(x, x) - k_T(x)^\top C_T^{-1} k_T(x), \quad (2)$$

where $k_T(x) = [k(x_t, x)]_{x_t \in X_T}$ is the vector of covariances between $x$ and the query points at time $T$, and $C_T = K_T + \sigma^2 I$ with $K_T = [k(x_t, x_{t'})]_{x_t, x_{t'} \in X_T}$ the kernel matrix; $\sigma^2$ is the variance of the noise and $I$ stands for the identity matrix. The Bayesian inference is illustrated in Figure 1 on a sample problem in dimension one, where the posteriors are based on four observations of a Gaussian process with squared exponential kernel. The height of the gray area represents two posterior standard deviations at each point.

Figure 1. One dimensional Gaussian process inference of the posterior mean $\mu$ (blue line) and posterior deviation $\sigma$ (half of the height of the gray envelope) with squared exponential kernel, based on four observations (blue crosses).
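As a concrete illustration of Eqs. (1) and (2), here is a minimal NumPy sketch of the posterior update. The helper names (rbf_kernel, gp_posterior) and the default hyperparameters are ours, introduced for illustration only, not the paper's.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_posterior(X, Y, Xstar, noise_var=0.01, length_scale=1.0):
    """Posterior mean (Eq. 1) and variance (Eq. 2) at the test points Xstar."""
    K = rbf_kernel(X, X, length_scale)           # kernel matrix K_T
    C = K + noise_var * np.eye(len(X))           # C_T = K_T + sigma^2 I
    kstar = rbf_kernel(X, Xstar, length_scale)   # k_T(x) for each test point
    alpha = np.linalg.solve(C, Y)                # C_T^{-1} Y_T
    mu = kstar.T @ alpha                         # Eq. (1): k_T(x)^T C_T^{-1} Y_T
    v = np.linalg.solve(C, kstar)                # C_T^{-1} k_T(x)
    prior_var = rbf_kernel(Xstar, Xstar, length_scale).diagonal()  # k(x, x)
    var = prior_var - np.sum(kstar * v, axis=0)  # Eq. (2)
    return mu, var
```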
2.3. The Gaussian Process Mutual Information algorithm

The Gaussian Process Mutual Information algorithm (GP-MI) is presented as Algorithm 1 below. The key statement is the choice of the query, $x_t = \operatorname{argmax}_{x \in \mathcal{X}} \mu_t(x) + \phi_t(x)$. The exploitation ability of the procedure is driven by $\mu_t$, while the exploration is governed by $\phi_t : \mathcal{X} \to \mathbb{R}$, an increasing function of $\sigma_t^2(x)$. The novelty in the GP-MI algorithm is that $\phi_t$ is empirically controlled by the amount of exploration that has already been done; that is, the more the algorithm has gathered information on $f$, the more it will focus on the optimum. In the GP-UCB algorithm from (Srinivas et al., 2012), the exploration coefficient is $O(\log t)$ and therefore tends to infinity.

The parameter $\alpha$ in Algorithm 1 governs the trade-off between precision and confidence, as shown in Theorem 2. The efficiency of the algorithm is robust to the choice of its value. We confirm this property empirically and provide further discussion on the calibration of $\alpha$ in Section 5.

Algorithm 1 GP-MI
  $\widehat\gamma_0 \leftarrow 0$
  for $t = 1, 2, \ldots$ do
    // Bayesian inference (Eq. 1, 2)
    Compute $\mu_t$ and $\sigma_t^2$
    // Definition of $\phi_t(x)$ for all $x \in \mathcal{X}$
    $\phi_t(x) \leftarrow \sqrt{\alpha}\Big(\sqrt{\sigma_t^2(x) + \widehat\gamma_{t-1}} - \sqrt{\widehat\gamma_{t-1}}\Big)$
    // Selection of the next query location
    $x_t \leftarrow \operatorname{argmax}_{x \in \mathcal{X}} \mu_t(x) + \phi_t(x)$
    // Update $\widehat\gamma_t$
    $\widehat\gamma_t \leftarrow \widehat\gamma_{t-1} + \sigma_t^2(x_t)$
    // Query
    Sample $x_t$ and observe $y_t$
  end for

Mutual information. The quantity $\widehat\gamma_T$ controlling the exploration in our algorithm forms a lower bound on the information acquired on $f$ by the query points $X_T$. The information on $f$ is formally defined by $I_T(X_T)$, the mutual information between $f$ and the noisy observations $Y_T$ at $X_T$, hence the name of the GP-MI algorithm. For a Gaussian process distribution, $I_T(X_T) = \frac{1}{2}\log\det(I + \sigma^{-2} K_T)$, where $K_T$ is the kernel matrix $[k(x_i, x_j)]_{x_i, x_j \in X_T}$. We refer to (Cover & Thomas, 1991) for further reading on mutual information. We denote by $\gamma_T = \max_{X \subset \mathcal{X} : |X| = T} I_T(X)$ the maximum mutual information obtainable by a sequence of $T$ query points. In the case of Gaussian processes with bounded variance, the following inequality is satisfied (Lemma 5.4 in (Srinivas et al., 2012)):

$$\widehat\gamma_T = \sum_{t=1}^T \sigma_t^2(x_t) \le C_1 \gamma_T, \quad (3)$$

where $C_1 = \frac{2}{\log(1 + \sigma^{-2})}$ and $\sigma^2$ is the noise variance. The upper bounds on the cumulative regret we derive in the next section depend mainly on this key quantity.

3. Main Results

3.1. Generic Optimization Scheme

We first consider the generic optimization scheme defined in Algorithm 2, where we let $\phi_t$ be a generic function viewed as a parameter of the algorithm. We only require $\phi_t$ to be measurable with respect to $Y_{t-1}$, the observations available at iteration $t$. The theoretical analysis of Algorithm 2 can be used as a plug-in theorem for existing algorithms. For example, the GP-UCB algorithm with parameter $\beta_t = O(\log t)$ is obtained with $\phi_t(x) = \sqrt{\beta_t}\,\sigma_t(x)$. A generic analysis of Algorithm 2 leads to the following upper bounds on the cumulative regret with high probability.

Algorithm 2 Generic Optimization Scheme ($\phi_t$)
  for $t = 1, 2, \ldots$ do
    Compute $\mu_t$ and $\phi_t$
    $x_t \leftarrow \operatorname{argmax}_{x \in \mathcal{X}} \mu_t(x) + \phi_t(x)$
    Sample $x_t$ and observe $y_t$
  end for

Theorem 1 (Regret bounds for the generic algorithm). For all $\delta > 0$ and $T \ge 1$, the regret $R_T$ incurred by Algorithm 2 on $f$ distributed as a GP perturbed by independent Gaussian noise with variance $\sigma^2$ satisfies the following bound with high probability, with $C_1 = \frac{2}{\log(1+\sigma^{-2})}$ and $\alpha = \log\frac{2}{\delta}$:

$$\Pr\Bigg[R_T \le \sum_{t=1}^T \big(\phi_t(x_t) - \phi_t(x^\star)\big) + 4\sqrt{\alpha (C_1 \gamma_T + 1)} + \sqrt{\frac{\alpha}{C_1 \gamma_T + 1}} \sum_{t=1}^T \sigma_t^2(x^\star)\Bigg] \ge 1 - \delta.$$

The proof of Theorem 1 relies on concentration guarantees for Gaussian processes (see Section 4.1). Theorem 1 provides an intermediate result used for the calibration of $\phi_t$ to face the exploration/exploitation trade-off. For example, by choosing $\phi_t(x) = \sqrt{\alpha}\,\sigma_t(x)$ (where the dimensional constant is hidden), Algorithm 2 becomes a variant of the GP-UCB algorithm where, in particular, the exploration parameter $\sqrt{\beta_t}$ is fixed to $\sqrt{\alpha}$ instead of being an increasing function of $t$. The upper bounds on the cumulative regret with this definition of $\phi_t$ are of the form $R_T = O(\sqrt{T\gamma_T})$, as stated in Corollary 1. We then also consider the case where the kernel $k$ of the Gaussian process is a squared exponential (RBF) kernel, $k(x_1, x_2) = \exp\big(-\frac{\|x_1 - x_2\|^2}{2l^2}\big)$ for all $x_1, x_2 \in \mathcal{X}$ and length scale $l \in \mathbb{R}$. In this setting, the maximum mutual information $\gamma_T$ satisfies the upper bound $\gamma_T = O\big((\log T)^{d+1}\big)$, where $d$ is the dimension of the input space (Srinivas et al., 2012).
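To make the two instantiations above concrete, the following sketch runs the generic scheme (Algorithm 2) over a finite grid of candidates, with either the fixed-coefficient rule $\phi_t = \sqrt{\alpha}\,\sigma_t$ of Corollary 1 or the GP-MI rule of Algorithm 1. It reuses the hypothetical gp_posterior helper from Section 2 and simplifies freely (discrete $\mathcal{X}$, fixed hyperparameters, random first query); it is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np

# assumes the gp_posterior sketch from Section 2 is in scope

def run_generic_scheme(f, grid, T, alpha, noise_var=0.01, rule="gp-mi"):
    """Generic scheme (Algorithm 2): query x_t = argmax mu_t(x) + phi_t(x)."""
    rng = np.random.default_rng(0)
    X, Y = [], []
    gamma_hat = 0.0                      # hat{gamma}_{t-1} = sum_s sigma_s^2(x_s)
    x0 = grid[rng.integers(len(grid))]   # arbitrary first query (initialization
    X.append(x0)                         # is not specified by the scheme itself)
    Y.append(f(x0) + rng.normal(0.0, np.sqrt(noise_var)))
    for t in range(1, T):
        mu, var = gp_posterior(np.array(X), np.array(Y), grid, noise_var)
        var = np.maximum(var, 1e-12)     # guard against numerical round-off
        if rule == "gp-mi":              # phi_t of Algorithm 1 (GP-MI)
            phi = np.sqrt(alpha) * (np.sqrt(var + gamma_hat) - np.sqrt(gamma_hat))
        else:                            # fixed-coefficient variant of Corollary 1
            phi = np.sqrt(alpha) * np.sqrt(var)
        i = int(np.argmax(mu + phi))     # selection of the next query location
        gamma_hat += var[i]              # update hat{gamma}_t (the quantity of Eq. 3)
        X.append(grid[i])
        Y.append(f(grid[i]) + rng.normal(0.0, np.sqrt(noise_var)))
    return np.array(X), np.array(Y)
```

Per Theorem 2, one would set alpha = np.log(2 / delta), for instance with $\delta = 10^{-6}$ as in the experiments of Section 5.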
Corollary 1. Consider Algorithm 2 where we set $\phi_t(x) = \sqrt{\alpha}\,\sigma_t(x)$. Under the assumptions of Theorem 1, the cumulative regret for Algorithm 2 satisfies the following upper bounds with high probability:

For $f$ sampled from a GP with general kernel: $R_T = O(\sqrt{T\gamma_T})$.

For $f$ sampled from a GP with RBF kernel: $R_T = O\big(\sqrt{T(\log T)^{d+1}}\big)$.

To prove Corollary 1, we apply Theorem 1 with the given definition of $\phi_t$ and then Equation 3, which leads to:

$$\Pr\Big[R_T \le \sqrt{\alpha T C_1 \gamma_T} + 4\sqrt{\alpha (C_1 \gamma_T + 1)}\Big] \ge 1 - \delta.$$
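The two steps compressed in that last sentence are, spelled out (our reconstruction of the routine algebra, using $\sigma_t(x) \le 1$):

$$\sum_{t=1}^T \phi_t(x_t) = \sqrt{\alpha} \sum_{t=1}^T \sigma_t(x_t) \le \sqrt{\alpha\, T \sum_{t=1}^T \sigma_t^2(x_t)} \le \sqrt{\alpha T C_1 \gamma_T}$$

by the Cauchy-Schwarz inequality and Equation 3, while the terms involving $x^\star$ in Theorem 1 cancel, since $\sigma_t^2(x^\star) \le \sigma_t(x^\star)$ gives

$$\sqrt{\frac{\alpha}{C_1 \gamma_T + 1}} \sum_{t=1}^T \sigma_t^2(x^\star) \le \sqrt{\alpha} \sum_{t=1}^T \sigma_t(x^\star) = \sum_{t=1}^T \phi_t(x^\star).$$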

The previously known upper bounds on the cumulative regret for the GP-UCB algorithm are of the form $R_T = O(\sqrt{T \beta_T \gamma_T})$ where $\beta_T = O(\log(T/\delta))$. The improvement of the generic Algorithm 2 with $\phi_t(x) = \sqrt{\alpha}\,\sigma_t(x)$ over the GP-UCB algorithm with respect to the cumulative regret is then exponential in the case of Gaussian processes with RBF kernel. For $f$ sampled from a GP with linear kernel, corresponding to $f(x) = w^\top x$ with $w \sim \mathcal{N}(0, I)$, we obtain $R_T = O(\sqrt{d T \log T})$. We remark that the GP assumption with linear kernel is more restrictive than the linear bandit framework, as it implies a Gaussian prior over the linear coefficients $w$. Hence there is no contradiction with the lower bounds stated for linear bandits like those of (Dani et al., 2008). We refer to (Srinivas et al., 2012) for the analysis of $\gamma_T$ with other kernels widely used in practice.

3.2. Regret bounds for the GP-MI algorithm

We present here the main result of the paper, the upper bound on the cumulative regret for the GP-MI algorithm.

Theorem 2 (Regret bounds for the GP-MI algorithm). For all $\delta > 0$ and $T \ge 1$, the regret $R_T$ incurred by Algorithm 1 on $f$ distributed as a GP perturbed by independent Gaussian noise with variance $\sigma^2$ satisfies the following bound with high probability, with $C_1 = \frac{2}{\log(1+\sigma^{-2})}$ and $\alpha = \log\frac{2}{\delta}$:

$$\Pr\Big[R_T \le 5\sqrt{\alpha C_1 \gamma_T} + 4\sqrt{\alpha}\Big] \ge 1 - \delta.$$

The proof of Theorem 2 is provided in Section 4.2, where we analyze the properties of the exploration functions $\phi_t$. Corollary 2 describes the case with RBF kernel for the GP-MI algorithm.

Corollary 2 (RBF kernels). The cumulative regret $R_T$ incurred by Algorithm 1 on $f$ sampled from a GP with RBF kernel satisfies, with high probability, $R_T = O\big((\log T)^{d+1}\big)$.

The GP-MI algorithm significantly improves the upper bounds for the cumulative regret over the GP-UCB algorithm and over the alternative policy of Corollary 1.

4. Theoretical analysis

In this section, we provide the proofs for Theorem 1 and Theorem 2. The approach presented here to study the cumulative regret incurred by Gaussian process optimization strategies is general and can be used further for other algorithms.

4.1. Analysis of the general algorithm

The theoretical analysis of Theorem 1 uses an approach similar to the Azuma-Hoeffding inequality, adapted for Gaussian processes. Let $r_t = f(x^\star) - f(x_t)$ for all $t \le T$. We define $M_T$, which is shown later to be a martingale with respect to $Y_T$:

$$M_T = \sum_{t=1}^T \Big(r_t - \big(\mu_t(x^\star) - \mu_t(x_t)\big)\Big) \quad (4)$$

for $T \ge 1$, and $M_0 = 0$. Let $Y_t$ be defined as the martingale difference sequence with respect to $M_t$, that is, the difference between the instantaneous regret and the gap between the posterior mean for the optimum and the one for the point queried:

$$Y_t = M_t - M_{t-1} = r_t - \big(\mu_t(x^\star) - \mu_t(x_t)\big) \quad \text{for } t \le T.$$

Lemma 1. The sequence $(M_T)_{T \ge 0}$ is a martingale with respect to $Y_T$ and, for all $t \le T$, given $Y_{t-1}$, the random variable $Y_t$ is distributed as a Gaussian $\mathcal{N}(0, \ell_t^2)$ with zero mean and variance $\ell_t^2$, where:

$$\ell_t^2 = \sigma_t^2(x^\star) + \sigma_t^2(x_t) - 2k_t(x^\star, x_t), \quad (5)$$

$k_t$ denoting the posterior covariance.

Proof. From the GP assumption, we know that, given $Y_{t-1}$, the distribution of $f(x)$ is Gaussian $\mathcal{N}(\mu_t(x), \sigma_t^2(x))$ for all $x \in \mathcal{X}$, and $r_t$ is a projection of a Gaussian random vector; that is, $r_t$ is distributed as a Gaussian $\mathcal{N}(\mu_t(x^\star) - \mu_t(x_t), \ell_t^2)$ and $Y_t$ is distributed as a Gaussian $\mathcal{N}(0, \ell_t^2)$, with $\ell_t^2 = \sigma_t^2(x^\star) + \sigma_t^2(x_t) - 2k_t(x^\star, x_t)$. Hence $M_T$ is a Gaussian martingale.

We now give a concentration result for $M_T$ using inequalities for self-normalized martingales.

Lemma 2. For all $\delta > 0$ and $T \ge 1$, the martingale $M_T$ normalized by the predictable quadratic variation $\sum_{t \le T}\ell_t^2$ satisfies the following concentration inequality, with $\alpha = \log\frac{2}{\delta}$ and $y_T = 8(C_1 \gamma_T + 1)$:

$$\Pr\Big[M_T \ge \sqrt{2\alpha y_T} + \sqrt{\frac{8\alpha}{y_T}} \sum_{t=1}^T \sigma_t^2(x^\star)\Big] \le \delta.$$
Proof. Let $y_T = 8(C_1 \gamma_T + 1)$. We introduce the notation $\Pr_>[A]$ for $\Pr[A \wedge \sum_{t \le T}\ell_t^2 > y_T]$ and $\Pr_\le[A]$ for $\Pr[A \wedge \sum_{t \le T}\ell_t^2 \le y_T]$. Given that $M_T$ is a Gaussian martingale, using Theorem 4.1 and Remark 4.2 from (Bercu & Touati, 2008) with $\langle M \rangle_T = \sum_{t \le T}\ell_t^2$, $a = 0$ and $b = 1$, we obtain for all $x > 0$:

$$\Pr_>\Bigg[\frac{M_T}{\sum_{t \le T}\ell_t^2} > x\Bigg] < \exp\Big(-\frac{x^2 y_T}{2}\Big).$$

With $x = \sqrt{2\alpha / y_T}$, where $\alpha = \log\frac{2}{\delta}$, we have:

$$\Pr_>\Bigg[\frac{M_T}{\sum_{t \le T}\ell_t^2} > \sqrt{\frac{2\alpha}{y_T}}\Bigg] < \frac{\delta}{2}.$$

By definition of $\ell_t^2$ in Eq. 5 and with $k_t(x^\star, x_t) \ge -\sigma_t(x^\star)\sigma_t(x_t)$, we have for all $t \le T$ that $\ell_t^2 \le 2\sigma_t^2(x_t) + 2\sigma_t^2(x^\star)$. Using Equation 3, we have $\sum_{t \le T} 2\sigma_t^2(x_t) \le 2C_1\gamma_T \le \frac{y_T}{4}$; we finally get:

$$\Pr_>\Bigg[M_T > \sqrt{\frac{\alpha y_T}{8}} + \sqrt{\frac{8\alpha}{y_T}} \sum_{t=1}^T \sigma_t^2(x^\star)\Bigg] < \frac{\delta}{2}. \quad (6)$$

Now, using Theorem 4.2 and Remark 4.2 from (Bercu & Touati, 2008), the following inequality is satisfied for all $x > 0$:

$$\Pr_\le[M_T > x] < \exp\Big(-\frac{x^2}{2 y_T}\Big).$$

With $x = \sqrt{2\alpha y_T}$ we have:

$$\Pr_\le\big[M_T > \sqrt{2\alpha y_T}\big] < \frac{\delta}{2}. \quad (7)$$

Combining Equations 6 and 7 leads to

$$\Pr\Bigg[M_T > \sqrt{2\alpha y_T} + \sqrt{\frac{8\alpha}{y_T}} \sum_{t=1}^T \sigma_t^2(x^\star)\Bigg] < \delta,$$

proving Lemma 2.

The following lemma concludes the proof of Theorem 1, using the previous concentration result and the properties of the generic Algorithm 2.

Lemma 3. The cumulative regret for Algorithm 2 on $f$ sampled from a GP satisfies the following bound for all $\delta > 0$, with $\alpha$ and $y_T$ defined in Lemma 2:

$$\Pr\Bigg[R_T \le \sum_{t=1}^T \big(\phi_t(x_t) - \phi_t(x^\star)\big) + \sqrt{2\alpha y_T} + \sqrt{\frac{8\alpha}{y_T}} \sum_{t=1}^T \sigma_t^2(x^\star)\Bigg] \ge 1 - \delta.$$

Proof. By construction of the generic Algorithm 2, we have $x_t = \operatorname{argmax}_{x \in \mathcal{X}} \mu_t(x) + \phi_t(x)$, which guarantees for all $t \le T$ that $\mu_t(x^\star) - \mu_t(x_t) \le \phi_t(x_t) - \phi_t(x^\star)$. Replacing $M_T$ by its definition in Eq. 4 and using the previous property in Lemma 2 proves Lemma 3. Since $y_T = 8(C_1\gamma_T + 1)$, we have $\sqrt{2\alpha y_T} = 4\sqrt{\alpha(C_1\gamma_T + 1)}$ and $\sqrt{8\alpha / y_T} = \sqrt{\alpha / (C_1\gamma_T + 1)}$, which is exactly the bound of Theorem 1.

4.2. Analysis of the GP-MI algorithm

In order to bound the cumulative regret for the GP-MI algorithm, we focus on an alternative definition of the exploration functions $\phi_t$, where the last term is modified inductively so as to simplify the sum $\sum_{t \le T}\phi_t(x_t)$ for all $T \ge 1$. Being a constant term for a fixed $t$ (it does not depend on $x$), Algorithm 1 remains unchanged. Let $\phi_t$ be defined as:

$$\phi_t(x) = \sqrt{\alpha\big(\sigma_t^2(x) + \widehat\gamma_{t-1}\big)} - \sum_{i=1}^{t-1}\phi_i(x_i),$$

where $x_t$ is the point selected by Algorithm 1 at iteration $t$. We have for all $T \ge 1$:

$$\sum_{t=1}^T \phi_t(x_t) = \sum_{t=1}^{T-1}\phi_t(x_t) + \phi_T(x_T) = \sqrt{\alpha\,\widehat\gamma_T}. \quad (8)$$

We can now derive upper bounds for $\sum_{t \le T}\big(\phi_t(x_t) - \phi_t(x^\star)\big)$, which will be plugged into Theorem 1 in order to cancel out the terms involving $x^\star$. In this manner we can calibrate sharply the exploration/exploitation trade-off by optimizing the remaining terms.

Lemma 4. For the GP-MI algorithm, the exploration term in the equation of Theorem 1 satisfies the following inequality:

$$\sum_{t=1}^T \big(\phi_t(x_t) - \phi_t(x^\star)\big) \le \sqrt{\alpha\widehat\gamma_T} - \frac{\sqrt{\alpha}}{2\sqrt{\widehat\gamma_T + 1}} \sum_{t=1}^T \sigma_t^2(x^\star).$$

Proof. Using our alternative definition of $\phi_t$, which gives the equality stated in Equation 8, we know that:

$$\sum_{t=1}^T \big(\phi_t(x_t) - \phi_t(x^\star)\big) = \sqrt{\alpha\widehat\gamma_T} + \sqrt{\alpha}\sum_{t=1}^T \Big(\sqrt{\widehat\gamma_{t-1}} - \sqrt{\widehat\gamma_{t-1} + \sigma_t^2(x^\star)}\Big).$$

By concavity of the square root, we have for all $a, b \ge 0$ that $\sqrt{a + b} - \sqrt{a} \ge \frac{b}{2\sqrt{a+b}}$. Introducing the notations $a_t = \widehat\gamma_{t-1} + \sigma_t^2(x^\star)$ and $b_t = \sigma_t^2(x^\star)$, we obtain:

$$\sum_{t=1}^T \big(\phi_t(x_t) - \phi_t(x^\star)\big) \le \sqrt{\alpha\widehat\gamma_T} - \sqrt{\alpha}\sum_{t=1}^T \frac{b_t}{2\sqrt{a_t}}.$$

Moreover, with $\sigma_t^2(x) \le 1$ for all $x \in \mathcal{X}$, we have $a_t \le \widehat\gamma_T + 1$ for all $t \le T$, which gives

$$\frac{b_t}{2\sqrt{a_t}} \ge \frac{\sigma_t^2(x^\star)}{2\sqrt{\widehat\gamma_T + 1}},$$

leading to the inequality of Lemma 4.
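For completeness, the telescoping identity (8) used above follows by a short computation (our spelled-out version): for every $t$, the alternative definition gives

$$\sum_{i=1}^{t} \phi_i(x_i) = \sum_{i=1}^{t-1}\phi_i(x_i) + \Bigg(\sqrt{\alpha\big(\sigma_t^2(x_t) + \widehat\gamma_{t-1}\big)} - \sum_{i=1}^{t-1}\phi_i(x_i)\Bigg) = \sqrt{\alpha\big(\widehat\gamma_{t-1} + \sigma_t^2(x_t)\big)} = \sqrt{\alpha\,\widehat\gamma_t},$$

since $\widehat\gamma_t = \widehat\gamma_{t-1} + \sigma_t^2(x_t)$. In particular $\sum_{i=1}^{t-1}\phi_i(x_i) = \sqrt{\alpha\widehat\gamma_{t-1}}$, so the alternative $\phi_t$ coincides with the $\phi_t$ of Algorithm 1, whose subtracted term is exactly $\sqrt{\alpha}\sqrt{\widehat\gamma_{t-1}}$.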

The following lemma combines the results from Theorem 1 and Lemma 4 to derive upper bounds on the cumulative regret for the GP-MI algorithm with high probability.

Lemma 5. The cumulative regret for Algorithm 1 on $f$ sampled from a GP satisfies the following bound for all $\delta > 0$, with $\alpha$ defined in Lemma 2:

$$\Pr\Big[R_T \le 5\sqrt{\alpha C_1 \gamma_T} + 4\sqrt{\alpha}\Big] \ge 1 - \delta.$$

Proof. Considering Theorem 1 in the case of the GP-MI algorithm and bounding $\sum_{t \le T}\big(\phi_t(x_t) - \phi_t(x^\star)\big)$ with Lemma 4, we obtain the following bound on the cumulative regret incurred by GP-MI:

$$\Pr\Bigg[R_T \le \sqrt{\alpha\widehat\gamma_T} + 4\sqrt{\alpha(C_1\gamma_T + 1)} + \Bigg(\sqrt{\frac{\alpha}{C_1\gamma_T + 1}} - \frac{\sqrt{\alpha}}{2\sqrt{\widehat\gamma_T + 1}}\Bigg)\sum_{t=1}^T \sigma_t^2(x^\star)\Bigg] \ge 1 - \delta,$$

which simplifies to the inequality of Lemma 5 using Equation 3 (with $\widehat\gamma_T \le C_1\gamma_T$ and $\sqrt{C_1\gamma_T + 1} \le \sqrt{C_1\gamma_T} + 1$), and thus proves Theorem 2.

5. Practical considerations and experiments

5.1. Numerical experiments

Protocol. We compare the empirical performances of our algorithm against the state of the art of GP optimization, the GP-UCB algorithm (Srinivas et al., 2012), and a commonly used heuristic, the Expected Improvement (EI) algorithm with GP (Jones et al., 1998). The tasks used for assessment come from two real applications and five synthetic problems described here. For all data sets and algorithms, the learners were initialized with a random subset of observations $\{(x_i, y_i)\}_i$. When the prior distribution of the underlying function was not known, the Bayesian inference was made using a squared exponential kernel. We first picked half of the data set to estimate the hyperparameters of the kernel via cross-validation on this subset. In this way, each algorithm was running with the same prior information. The value of the parameter $\delta$ for the GP-UCB and GP-MI algorithms was fixed to $\delta = 10^{-6}$ for all these experimental tasks. Modifying this value by several orders of magnitude is insignificant with respect to the empirical mean cumulative regret incurred by the algorithms, as discussed in Section 5.2. The results are provided in Figure 3. The curves show the evolution of the average regret $R_T / T$ in terms of the iteration $T$. We report the mean value with the confidence interval over a hundred experiments.

Description of the data sets. We describe briefly all the data sets used for assessment.

Generated GP. The generated Gaussian process functions are random GPs drawn from an isotropic Matérn kernel in dimensions 2 and 4, with the kernel bandwidth adapted to each dimension. The Matérn parameter was set to $\nu = 3$ and the noise standard deviation to a small fraction of the signal standard deviation.

Gaussian Mixture. This synthetic function comes from the addition of three 2-D Gaussian functions. We then perturb these Gaussian functions with smooth variations generated from a Gaussian process with isotropic Matérn kernel and a small amount of noise. It is shown in Figure 2(a). The highest peak being thin, the sequential search for the maximum of this function is quite challenging.

Himmelblau. This task is another synthetic function in dimension 2. We compute a slightly tilted version of the Himmelblau function, with the addition of a linear function, and take the opposite to match the challenge of finding its maximum. This function presents four peaks but only one global maximum. It gives a practical way to test the ability of a strategy to manage exploration/exploitation trade-offs. It is represented in Figure 2(b).

Figure 2. Visualization of the synthetic functions used for assessment: (a) Gaussian mixture, (b) Himmelblau.
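For reference, a NumPy sketch of this benchmark is given below. The standard Himmelblau formula is exact; the tilt (the added linear term) used in the paper is not specified in this transcription, so the coefficients a and b are illustrative placeholders.

```python
import numpy as np

def himmelblau(x, y):
    """Standard Himmelblau function: four local minima of value 0."""
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def tilted_himmelblau_reward(x, y, a=0.5, b=0.5):
    """Negated, slightly tilted variant to be *maximized*, in the spirit of
    the paper's task; the tilt coefficients a, b are illustrative only."""
    return -(himmelblau(x, y) + a * x + b * y)
```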

Branin. The Branin, or Branin-Hoo, function is a common benchmark function for global optimization. It presents three global optima in the 2-D rectangle $[-5, 10] \times [0, 15]$. This benchmark is one of the two synthetic functions used by (Srinivas et al., 2010) to evaluate the empirical performances of the GP-UCB algorithm. No noise has been added to the original signal in this experimental task.

Goldstein-Price. The Goldstein & Price function is another benchmark function for global optimization, with a single global optimum but several local optima, on the 2-D square $[-2, 2] \times [-2, 2]$. This is the second synthetic benchmark used by (Srinivas et al., 2010). As in the previous task, no noise has been added to the original signal.

Tsunamis. Recent post-tsunami survey data as well as the numerical simulations of (Hill et al., 2012) have shown that in some cases the run-up, which is the maximum vertical extent of wave climbing on a beach, in areas which were supposed to be protected by small islands in the vicinity of the coast, was significantly higher than in neighboring locations. Motivated by these observations, (Stefanakis et al., 2012) investigated this phenomenon by employing numerical simulations using the VOLNA code (Dutykh et al., 2011) with the simplified geometry of a conical island sitting on a flat surface in front of a sloping beach. In the study of (Stefanakis et al., 2013) the setup was controlled by five physical parameters, and the aim was to find, with confidence and with the least number of simulations, the parameters leading to the maximum run-up amplification.

Mackey-Glass function. The Mackey-Glass delay-differential equation is a chaotic system in dimension 6, but without noise. It models real feedback systems and is used in physiological domains such as hematology, cardiology, neurology, and psychiatry. The highly chaotic behavior of this function makes it an exceptionally difficult optimization problem. It has been used as a benchmark, for example, by (Flake & Lawrence, 2002).

Empirical comparison of the algorithms. Figure 3 compares the empirical mean average regret $R_T / T$ for the three algorithms. On the easy optimization assessments like the Branin data set (Fig. 3(e)) the three strategies behave in a comparable manner, but the GP-UCB algorithm incurs a larger cumulative regret. For more difficult assessments the GP-UCB algorithm performs poorly and our algorithm always surpasses the EI heuristic. The improvement of the GP-MI algorithm against the two competitors is the most significant for exceptionally challenging optimization tasks, as illustrated in Figures 3(a) to 3(d) and 3(h), where the underlying functions present several local optima. The ability of our algorithm to deal with the exploration/exploitation trade-off is emphasized by these experimental results, as its average regret decreases directly after the first iterations, avoiding unwanted exploration like GP-UCB on Figures 3(a) to 3(d), or getting stuck in some local optimum like EI on Figures 3(c), 3(g) and 3(h). We further mention that the GP-MI algorithm is empirically robust against the number of dimensions of the data set (Fig. 3(b), 3(g), 3(h)).

Figure 3. Empirical mean and confidence interval of the average regret $R_T / T$ in terms of the iteration $T$, on real and synthetic tasks, for the GP-UCB and GP-MI algorithms and the EI heuristic (lower is better): (a) Generated GP (d = 2), (b) Generated GP (d = 4), (c) Gaussian mixture, (d) Himmelblau, (e) Branin, (f) Goldstein-Price, (g) Tsunamis, (h) Mackey-Glass.
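For reference, standard forms of the two classical benchmarks used by (Srinivas et al., 2010) are sketched below in NumPy; the paper maximizes their negations, and any rescaling it applies is not reproduced here.

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo on [-5, 10] x [0, 15]; three global minima of value ~0.398."""
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r)**2 + s * (1 - t) * np.cos(x1) + s

def goldstein_price(x1, x2):
    """Goldstein-Price on [-2, 2]^2; global minimum 3 at (0, -1)."""
    u = 1 + (x1 + x2 + 1)**2 * (19 - 14*x1 + 3*x1**2 - 14*x2 + 6*x1*x2 + 3*x2**2)
    v = 30 + (2*x1 - 3*x2)**2 * (18 - 32*x1 + 12*x1**2 + 48*x2 - 36*x1*x2 + 27*x2**2)
    return u * v
```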
5.2. Practical aspects

Calibration of α. The value of the parameter $\alpha$ is chosen following Theorem 2 as $\alpha = \log\frac{2}{\delta}$, with $0 < \delta < 1$ being a confidence parameter. The guarantees we prove in Section 4.2 on the cumulative regret for the GP-MI algorithm hold with probability at least $1 - \delta$. With $\alpha$ increasing only linearly while $\delta$ decreases exponentially toward 0, the algorithm is robust to the choice of $\delta$.
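A quick numerical illustration of this robustness (our own check, using $\alpha = \log\frac{2}{\delta}$ from Theorem 2): dividing $\delta$ by a factor $10^3$ only shifts the exploration scale $\sqrt{\alpha}$ slightly.

```python
import numpy as np

for delta in (1e-3, 1e-6, 1e-9):
    alpha = np.log(2.0 / delta)   # alpha = log(2 / delta)
    print(f"delta={delta:.0e}  alpha={alpha:5.2f}  sqrt(alpha)={np.sqrt(alpha):.2f}")
# delta=1e-03  alpha= 7.60  sqrt(alpha)=2.76
# delta=1e-06  alpha=14.51  sqrt(alpha)=3.81
# delta=1e-09  alpha=21.42  sqrt(alpha)=4.63
```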

We present in Figure 4 the small impact of $\delta$ on the average regret, for four different values of $\delta$ selected on a wide range.

Figure 4. Small impact of the value of $\delta$ (legend: $\delta = 10^{-9}, 10^{-6}, 10^{-3}, \ldots$) on the mean average regret of the GP-MI algorithm running on the Himmelblau data set.

Numerical complexity. Even if the numerical cost of GP-MI is insignificant in practice compared to the cost of the evaluation of $f$, the complexity of the sequential Bayesian update (Osborne, 2010) is $O(T^2)$ and might be prohibitive for large $T$. One can reduce drastically the computational time by means of Lazy Variance Calculation (Desautels et al., 2012), built on the fact that $\sigma_T(x)$ always decreases when $T$ increases, for all $x \in \mathcal{X}$. We further mention that approximate inference algorithms such as the EP approximation and MCMC sampling (Kuss et al., 2005) can be used as an alternative if the computational time is a restrictive factor.

6. Conclusion

We introduced the GP-MI algorithm for GP optimization and proved upper bounds on its cumulative regret which improve exponentially on the state of the art in common settings. The theoretical analysis was presented in a generic framework in order to expand its impact to other similar algorithms. The experiments we performed on real and synthetic assessments confirmed empirically the efficiency of our algorithm against both the theoretical state of the art of GP optimization, the GP-UCB algorithm, and the commonly used EI heuristic.

Acknowledgements

The authors would like to thank David Buffoni and Raphaël Bonaque for fruitful discussions. The authors also thank the anonymous reviewers for their detailed feedback.

References

Audibert, J-Y., Bubeck, S., and Munos, R. Bandit view on noisy optimization. In Optimization for Machine Learning. MIT Press, 2011.

Auer, P., Cesa-Bianchi, N., and Fischer, P. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.

Bercu, B. and Touati, A. Exponential inequalities for self-normalized martingales with applications. The Annals of Applied Probability, 18(5):1848-1869, 2008.

Bubeck, S., Munos, R., Stoltz, G., and Szepesvári, C. X-armed bandits. Journal of Machine Learning Research, 12:1655-1695, 2011.

Carpentier, A., Lazaric, A., Ghavamzadeh, M., Munos, R., and Auer, P. Upper-confidence-bound algorithms for active learning in multi-armed bandits. In Proceedings of the International Conference on Algorithmic Learning Theory. Springer-Verlag, 2011.

Chen, Y. and Krause, A. Near-optimal batch mode active learning and adaptive submodular optimization. In Proceedings of the International Conference on Machine Learning. icml.cc / Omnipress, 2013.

Contal, E., Buffoni, D., Robicquet, A., and Vayatis, N. Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Machine Learning and Knowledge Discovery in Databases, volume 8188. Springer Berlin Heidelberg, 2013.

Cover, T. M. and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, 1991.

Dani, V., Hayes, T. P., and Kakade, S. M. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory, 2008.

de Freitas, N., Smola, A. J., and Zoghi, M. Exponential regret bounds for Gaussian process bandits with deterministic observations. In Proceedings of the 29th International Conference on Machine Learning. icml.cc / Omnipress, 2012.

Desautels, T., Krause, A., and Burdick, J. W. Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization. In Proceedings of the 29th International Conference on Machine Learning. icml.cc / Omnipress, 2012.
Dutykh, D., Poncet, R., and Dias, F. The VOLNA code for the numerical modelling of tsunami waves: generation, propagation and inundation. European Journal of Mechanics B/Fluids, 30(6):598-615, 2011.

Flake, G. W. and Lawrence, S. Efficient SVM regression training with SMO. Machine Learning, 46(1-3):271-290, 2002.

Floudas, C. A. and Pardalos, P. M. Optimization in Computational Chemistry and Molecular Biology: Local and Global Approaches. Nonconvex Optimization and Its Applications. Springer, 2000.

Grunewalder, S., Audibert, J-Y., Opper, M., and Shawe-Taylor, J. Regret bounds for Gaussian process bandit problems. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2010.

Hill, E. M., Borrero, J. C., Huang, Z., Qiu, Q., Banerjee, P., Natawidjaja, D. H., Elosegui, P., Fritz, H. M., Suwargadi, B. W., Prananto, I. R., Li, L., Macpherson, K. A., Skanavis, V., Synolakis, C. E., and Sieh, K. The 2010 Mw 7.8 Mentawai earthquake: Very shallow source of a rare tsunami earthquake determined from tsunami field survey and near-field GPS data. J. Geophys. Res., 117:B06402, 2012.

Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455-492, December 1998.

Kleinberg, R. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems 17. MIT Press, 2004.

Krause, A. and Ong, C. S. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Systems 24, 2011.

Kuss, M., Pfingsten, T., Csató, L., and Rasmussen, C. E. Approximate inference for robust Gaussian process regression. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Tech. Rep. 136, 2005.

Mockus, J. Bayesian approach to global optimization. Mathematics and its Applications. Kluwer Academic, 1989.

Osborne, Michael. Bayesian Gaussian processes for sequential prediction, optimisation and quadrature. PhD thesis, Oxford University New College, 2010.

Rasmussen, C. E. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.

Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, 2012.

Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning. icml.cc / Omnipress, 2010.

Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250-3265, 2012.

Stefanakis, T. S., Dias, F., Vayatis, N., and Guillas, S. Long-wave runup on a plane beach behind a conical island. In Proceedings of the World Conference on Earthquake Engineering, 2012.

Stefanakis, T. S., Contal, E., Vayatis, N., Dias, F., and Synolakis, C. E. Can small islands protect nearby coasts from tsunamis? An active experimental design approach. arXiv preprint, 2013.

Wang, G. and Shan, S. Review of metamodeling techniques in support of engineering design optimization. Journal of Mechanical Design, 129(4):370-380, 2007.

Ziemba, W. T. and Vickson, R. G. Stochastic optimization models in finance. World Scientific Singapore, 2006.


More information

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt

More information

Learning in Abstract Memory Schemes for Dynamic Optimization

Learning in Abstract Memory Schemes for Dynamic Optimization Fourth International Conference on Natural Computation Learning in Abstract Memory Schemes for Dynamic Optimization Hendrik Richter HTWK Leipzig, Fachbereich Elektrotechnik und Informationstechnik, Institut

More information

A Potential-based Framework for Online Multi-class Learning with Partial Feedback

A Potential-based Framework for Online Multi-class Learning with Partial Feedback A Potential-based Framework for Online Multi-class Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science

More information

Primality - Factorization

Primality - Factorization Primality - Factorization Christophe Ritzenthaler November 9, 2009 1 Prime and factorization Definition 1.1. An integer p > 1 is called a prime number (nombre premier) if it has only 1 and p as divisors.

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Ideal Class Group and Units

Ideal Class Group and Units Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals

More information

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

More information

Virtual Power Limiter System which Guarantees Stability of Control Systems

Virtual Power Limiter System which Guarantees Stability of Control Systems Virtual Power Limiter Sstem which Guarantees Stabilit of Control Sstems Katsua KANAOKA Department of Robotics, Ritsumeikan Universit Shiga 525-8577, Japan Email: kanaoka@se.ritsumei.ac.jp Abstract In this

More information

Stochastic Gradient Method: Applications

Stochastic Gradient Method: Applications Stochastic Gradient Method: Applications February 03, 2015 P. Carpentier Master MMMEF Cours MNOS 2014-2015 114 / 267 Lecture Outline 1 Two Elementary Exercices on the Stochastic Gradient Two-Stage Recourse

More information

A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM

A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

COMPLETE MARKETS DO NOT ALLOW FREE CASH FLOW STREAMS

COMPLETE MARKETS DO NOT ALLOW FREE CASH FLOW STREAMS COMPLETE MARKETS DO NOT ALLOW FREE CASH FLOW STREAMS NICOLE BÄUERLE AND STEFANIE GRETHER Abstract. In this short note we prove a conjecture posed in Cui et al. 2012): Dynamic mean-variance problems in

More information

THE SCHEDULING OF MAINTENANCE SERVICE

THE SCHEDULING OF MAINTENANCE SERVICE THE SCHEDULING OF MAINTENANCE SERVICE Shoshana Anily Celia A. Glass Refael Hassin Abstract We study a discrete problem of scheduling activities of several types under the constraint that at most a single

More information

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

More information

Thinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks

Thinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks Thinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks Welcome to Thinkwell s Homeschool Algebra 2! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson

More information

THE SVM APPROACH FOR BOX JENKINS MODELS

THE SVM APPROACH FOR BOX JENKINS MODELS REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 23 36 THE SVM APPROACH FOR BOX JENKINS MODELS Authors: Saeid Amiri Dep. of Energy and Technology, Swedish Univ. of Agriculture Sciences, P.O.Box

More information

Forecasting methods applied to engineering management

Forecasting methods applied to engineering management Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational

More information

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1 Accelerated Mathematics 3 This is a course in precalculus and statistics, designed to prepare students to take AB or BC Advanced Placement Calculus. It includes rational, circular trigonometric, and inverse

More information