Gaussian Process Optimization with Mutual Information


 Emily Summers
 2 years ago
 Views:
Transcription
1 Emile Contal CMLA, UMR CNRS 8536, ENS Cachan, France Vianne Perchet LPMA, Université Paris Diderot, France Nicolas Vaatis CMLA, UMR CNRS 8536, ENS Cachan, France Abstract In this paper, we analze a generic algorithm scheme for sequential global optimization using Gaussian processes. he upper bounds we derive on the cumulative regret for this generic algorithm improve b an exponential factor the previousl known bounds for algorithms like GP. We also introduce the novel Gaussian Process Mutual Information algorithm (GP), which significantl improves further these upper bounds for the cumulative regret. We confirm the efficienc of this algorithm on snthetic and real tasks against the natural competitor, GP, and also the Expected Improvement heuristic.. Introduction Stochastic optimization problems are encountered in numerous real world domains including engineering design (Wang & Shan, 7), finance (Ziemba & Vickson, 6), natural sciences (Floudas & Pardalos, ), or in machine learning for selecting models b tuning the parameters of learning algorithms (Snoek et al., ). We aim at finding the input of a given sstem which optimizes the output (or reward). In this view, an iterative procedure uses the previousl acquired measures to choose the next quer predicted to be the most useful. he goal is to maximize the sum of the rewards received at each iteration, that is to minimize the cumulative regret b balancing exploration (gathering information b favoring locations with high uncertaint) and exploitation (focusing on the optimum b favoring locations with high predicted reward). his optimization task becomes challenging when the dimension Proceedings of the 3 st International Conference on Machine Learning, Beijing, China, 4. JMLR: W&CP volume 3. Copright 4 b the author(s). of the search space is high and the evaluations are nois and expensive. Efficient algorithms have been studied to tackle this challenge such as multiarmed bandit (Auer et al., ; Kleinberg, 4; Bubeck et al., ; Audibert et al., ), active learning (Carpentier et al., ; Chen & Krause, 3) or Baesian optimization (Mockus, 989; Grunewalder et al., ; Srinivas et al., ; de Freitas et al., ). he theoretical analsis of such optimization procedures requires some prior assumptions on the underling function f. Modeling f as a function distributed from a Gaussian process (GP) enforces nearb locations to have close associated values, and allows to control the general smoothness of f with high probabilit according to the kernel of the GP (Rasmussen & Williams, 6). Our main contribution is twofold: we propose a generic algorithm scheme for Gaussian process optimization and we prove sharp upper bounds on its cumulative regret. he theoretical analsis has a direct impact on strategies built with the GP algorithm (Srinivas et al., ) such as (Krause & Ong, ; Desautels et al., ; Contal et al., 3). We suggest an alternative polic which achieves an exponential speed up with respect to the cumulative regret. We also introduce a novel algorithm, the Gaussian Process Mutual Information algorithm (GP), which improves furthermore upper bounds for the cumulative regret from O( a (log ) d+ ) for GP, the current state of the art, to the spectacular O( a (log ) d+ ), where is the number of iterations, d is the dimension of the input space and the kernel function is Gaussian. he remainder of this article is organized as follows. We first introduce the setup and notations. We define the GP algorithm in Section. Main results on the cumulative regret bounds are presented in Section 3. We then provide technical details in Section 4. We finall confirm the performances of GP on real and snthetic tasks compared to the state of the art of GP optimization and some heuristics used in practice.
2 .. he Gaussian process framework Figure. One dimensional Gaussian process inference of the posterior mean µ (blue line) and posterior deviation σ (half of the height of the gra envelope) with squared exponential kernel, based on four observations (blue crosses).. Gaussian process optimization and the GP algorithm.. Sequential optimization and cumulative regret Let f : X R, where X R d is a compact and convex set, be the unknown function modeling the sstem we want to be optimized. We consider the problem of finding the maximum of f denoted b: f(x ) = max x X f(x), via successive queries x, x,... X. At iteration +, the choice of the next quer x + depends on the previous nois observations, Y = {,..., } at locations X = {x,..., x } where t = f(x t ) + ɛ t for all t, and the noise variables ɛ,..., ɛ are independentl distributed as a Gaussian random variable N (, σ ) with zero mean and variance σ. he efficienc of a polic and its abilit to address the exploration/exploitation tradeoff is measured via the cumulative regret R, defined as the sum of the instantaneous regret r t, the gaps between the value of the maximum and the values at the sample locations, r t = f(x ) f(x t ) for t and R = f(x ) f(x t ). Our aim is to obtain upper bounds on the cumulative regret R with high probabilit. Prior assumption on f. In order to control the smoothness of the underling function, we assume that f is sampled from a Gaussian process GP(m, k) with mean function m : X R and kernel function k : X X R +. We formalize in this manner the prior assumption that high local variations of f have low probabilit. he prior mean function is considered without loss of generalit to be zero, as the kernel k can completel define the GP (Rasmussen & Williams, 6). We consider the normalized and dimensionless framework introduced b (Srinivas et al., ) where the variance is assumed to be bounded, that is k(x, x) for all x X. Baesian inference. At iteration +, given the previousl observed nois values Y at locations X, we use Baesian inference to compute the current posterior distribution (Rasmussen & Williams, 6), which is a GP of mean µ + and variance σ + given at an x X b, µ + (x) = k (x) C Y () and σ +(x) = k(x, x) k (x) C k (x), () where k (x) = [k(x t, x)] xt X is the vector of covariances between x and the quer points at time, and C = K +σ I with K = [k(x t, x t )] xt,x t X the kernel matrix, σ is the variance of the noise and I stands for the identit matrix. he Baesian inference is illustrated on Figure in a sample problem in dimension one, where the posteriors are based on four observations of a Gaussian Process with squared exponential kernel. he height of the gra area represents two posterior standard deviations at each point..3. he Gaussian Process Mutual Information algorithm he Gaussian Process Mutual Information algorithm (GP ) is presented as Algorithm. he ke statement is the choice of the quer, x t = argmax x X µ t (x) + φ t (x). he exploitation abilit of the procedure is driven b µ t, while the exploration is governed b φ t : X R, which is an increasing function of σt (x). he novelt in the GP algorithm is that φ t is empiricall controlled b the amount of exploration that has been alread done, that is, the more the algorithm has gathered information on f, the more it will focus on the optimum. In the GP algorithm from (Srinivas et al., ) the exploration coefficient is a O(log t) and therefore tends to infinit. he parameter α in Algorithm governs the tradeoff between precision and confidence, as shown in heorem. he efficienc of the algorithm is robust to the choice of its value. We confirm empiricall this propert and provide further discussion on the calibration of α in Section 5.
3 Algorithm GP pγ for t =,,... do // Baesian inference (Eq., ) Compute µ t and σt // Definition of φ t (x) for all x X φ t (x)? α ˆbσ t (x) + pγ t a pγ t // Selection of the next quer location x t argmax x X µ t (x) + φ t (x) // Update pγ t pγ t pγ t + σ t (x t ) // Quer Sample x t and observe t end for Mutual information. he quantit pγ controlling the exploration in our algorithm forms a lower bound on the information acquired on f b the quer points X. he information on f is formall defined b I (X ), the mutual information between f and the nois observations Y at X, hence the name of the GP algorithm. For a Gaussian process distribution I (X ) = log det(i + σ K ) where K is the kernel matrix [k(x i, x j )] xi,x j X. We refer to (Cover & homas, 99) for further reading on mutual information. We denote b γ = max X X : X = I (X ) the maximum mutual information obtainable b a sequence of quer points. In the case of Gaussian processes with bounded variance, the following inequalit is satisfied (Lemma 5.4 in (Srinivas et al., )): pγ = σt (x t ) C γ (3) where C = log(+σ ) and σ is the noise variance. he upper bounds on the cumulative regret we derive in the next section depend mainl on this ke quantit. 3. Main Results 3.. Generic Optimization Scheme We first consider the generic optimization scheme defined in Algorithm, where we let φ t as a generic function viewed as a parameter of the algorithm. We onl require φ t to be measurable with respect to Y t, the observations available at iteration t. he theoretical analsis of Algorithm can be used as a plugin theorem for existing algorithms. For example the GP algorithm with parameter β t = O(log t) is obtained with φ t (x) = a β t σ t (x). A generic analsis of Algorithm leads to the following upper bounds on the cumulative regret with high probabilit. Algorithm Generic Optimization Scheme (φ t ) for t =,,... do Compute µ t and φ t x t argmax x X µ t (x) + φ t (x) Sample x t and observe t end for heorem (Regret bounds for the generic algorithm). For all δ > and >, the regret R incurred b Algorithm on f distributed as a GP perturbed b independent Gaussian noise with variance σ satisfies the following bound with high probabilit, with C = log(+σ ) and α = log δ : Pr [ R pφ t (x t ) φ t (x )q + 4 a? α α(c γ + ) + ] σ t (x )? δ. C γ + he proof of heorem, relies on concentration guarantees for Gaussian processes (see Section 4.). heorem provides an intermediate result used for the calibration of φ t to face the exploration/exploitation tradeoff. For example b choosing φ t (x) =? α σ t (x) (where the dimensional constant is hidden), Algorithm becomes a variant of the GP algorithm where in particular the exploration parameter? β t is fixed to? α instead of being an increasing function of t. he upper bounds on the cumulative regret with this definition of φ t are of the form R = O(γ ), as stated in Corollar. We then also consider the case where the kernel k of the Gaussian process is under the form of a squared exponential (RBF) kernel, k(x, x ) = exp x x l, for all x, x X and length scale l R. In this setting, the maximum mutual information γ satisfies the upper bound γ = Op(log ) d+ q, where d is the dimension of the input space (Srinivas et al., ). Corollar. Consider the Algorithm where we set φ t (x) =? α σ t (x). Under the assumptions of heorem, we have that the cumulative regret for Algorithm satisfies the following upper bounds with high probabilit: For f sampled from a GP with general kernel: R = O(γ ). For f sampled from a GP with RBF kernel: R = Op(log ) d+ q. o prove Corollar we appl heorem with the given definition of φ t and then Equation 3, which leads to Pr rr? α C γ + 4 a α(c γ + )s δ. he
4 previousl known upper bounds on the cumulative regret for the GP algorithm are of the form R = Op? β γ q where β = Op log δ q. he improvement of the generic Algorithm with φ t (x) =? α σ t (x) over the GP algorithm with respect to the cumulative regret is then exponential in the case of Gaussian processes with RBF kernel. For f sampled from a GP with linear kernel, corresponding to f(x) = w x with w N (, I), we obtain R = Opd log q. We remark that the GP assumption with linear kernel is more restrictive than the linear bandit framework, as it implies a Gaussian prior over the linear coefficients w. Hence there is no contradiction with the lower bounds stated for linear bandit like those of (Dani et al., 8). We refer to (Srinivas et al., ) for the analsis of γ with other kernels widel used in practice. 3.. Regret bounds for the GP algorithm We present here the main result of the paper, the upper bound on the cumulative regret for the GP algorithm. heorem (Regret bounds for the GP algorithm). For all δ > and >, the regret R incurred b Algorithm on f distributed as a GP perturbed b independent Gaussian noise with variance σ satisfies the following bound with high probabilit, with C = log(+σ ) and α=log δ : Pr R 5 a αc γ + 4? ı α δ. he proof for heorem is provided in Section 4., where we analze the properties of the exploration functions φ t. Corollar describes the case with RBF kernel for the GP algorithm. Corollar (RBF kernels). he cumulative regret R incurred b Algorithm on f sampled from a GP with RBF kernel satisfies with high probabilit, R = O (log ) d+. he GP algorithm significantl improves the upper bounds for the cumulative regret over the GP algorithm and the alternative polic of Corollar. 4. heoretical analsis In this section, we provide the proofs for heorem and heorem. he approach presented here to stud the cumulative regret incurred b Gaussian process optimization strategies is general and can be used further for other algorithms. 4.. Analsis of the general algorithm he theoretical analsis of heorem uses a similar approach to the AzumaHoeffding inequalit adapted for Gaussian processes. Let r t = f(x ) f(x t ) for all t. We define M, which is shown later to be a martingale with respect to Y, M = r t pµ t (x ) µ t (x t )q, (4) for and M =. Let Y t be defined as the martingale difference sequence with respect to M, that is the difference between the instantaneous regret and the gap between the posterior mean for the optimum and the one for the point queried, Y t = M t M t = r t pµ t (x ) µ t (x t )q for t. Lemma. he sequence M is a martingale with respect to Y and for all t, given Y t, the random variable Y t is distributed as a Gaussian N (, l t ) with zero mean and variance l t, where: l t = σ t (x ) + σ t (x t ) k(x, x t ). (5) Proof. From the GP assumption, we know that given Y t, the distribution of f(x) is Gaussian N pµ t (x), σ t (x)q for all x X, and r t is a projection of a Gaussian random vector, that is r t is distributed as a Gaussian N pµ t (x ) µ t (x t ), l t q and Y t is distributed as Gaussian N (, l t ), with l t = σ t (x ) + σ t (x t ) k(x, x t ), hence M is a Gaussian martingale. We now give a concentration result for M using inequalities for selfnormalized martingales. Lemma. For all δ > and >, the martingale M normalized b the predictable quadratic variation l t satisfies the following concentration inequalit with α = log δ and = 8(C γ + ): Pr [ M a α + c α ] σt (x ) δ. Proof. Let = 8(C γ + ). We introduce the notation Pr > [A] for Pr[A l t > ] and Pr [A] for Pr[A l t ]. Given that M t is a Gaussian martingale, using heorem 4. and Remark 4. from (Bercu & ouati, 8) with M = l t and a = and b = we obtain for all x > : Pr > With x = b α M l t ı > x < exp x where α = log δ, we have: Pr > M > b α. ı l t < δ.
5 B definition of l t in Eq. 5 and with k(x, x t ), we have for all t that l t σt (x t ) + σt (x ). Using Equation 3 we have pγ 8, we finall get: Pr > M >? b α 8 + α σ t (x )ı < δ. (6) Now, using heorem 4. and Remark 4. from (Bercu & ouati, 8) the following inequalit is satisfied for all x > : Pr rm > xs < exp. With x =? α we have: x Pr rm >? αs < δ. (7) Combining Equations 6 and 7 leads to, Pr «proving Lemma. M > a α + c α ff σt (x ) < δ, he following lemma concludes the proof of heorem using the previous concentration result and the properties of the generic Algorithm. Lemma 3. he cumulative regret for Algorithm on f sampled from a GP satisfies the following bound for all δ > and α and defined in Lemma : Pr [ + R c α pφ t (x t ) φ t (x )q + a α ] σt (x ) δ. Proof. B construction of the generic Algorithm, we have x t = argmax x X µ t (x) + φ t (x), which guarantees for all t that µ t (x ) µ t (x t ) φ t (x t ) φ t (x ). Replacing M b its definition in Eq. 4 and using the previous propert in Lemma proves Lemma Analsis of the GP algorithm In order to bound the cumulative regret for the GP algorithm, we focus on an alternative definition of the exploration functions φ t where the last term is modified inductivel so as to simplif the sum φ t(x t ) for all >. Being a constant term for a fixed t >, Algorithm remains unchanged. Let φ t be defined as, b t φ t (x) = α(σt (x) + pγ t ) φ i (x i ), i= where x t is the point selected b Algorithm at iteration t. We have for all >, φ t (x t ) = a αpγ φ t (x t ) + φ t (x t ) = a αpγ. (8) We can now derive upper bounds for pφ t(x t ) φ t (x )q which will be plugged in heorem in order to cancel out the terms involving x. In this manner we can calibrate sharpl the exploration/exploitation tradeoff b optimizing the remaining terms. Lemma 4. For the GP algorithm, the exploration term in the equation of heorem satisfies the following inequalit: pφ t (x t ) φ t (x )q a? α αpγ σ t (x ) a pγ + Proof. Using our alternative definition of φ t which gives the equalit stated in Equation 8, we know that, pφ t (x t ) φ t (x )q =? α apγ + a b pγt pγ t + σt (x ). B concavit of the square root, we have for all a b that? a + b? a b? a. Introducing the notations a t = pγ t + σt (x ) and b t = σt (x ), we obtain, pφ t (x t ) φ t (x )q a? α αpγ + b t? at. Moreover, with σ t (x) for all x X, we have a t pγ + and b t for all t which gives, b t? σ t (x ) a, at pγ + leading to the inequalit of Lemma 4. he following lemma combines the results from heorem and Lemma 4 to derive upper bounds on the cumulative regret for the GP algorithm with high probabilit. Lemma 5. he cumulative regret for Algorithm on f sampled from a GP satisfies the following bound for all δ > and α defined in Lemma, Pr R 5 a αc γ + 4? ı α δ..
6 Proof. Considering heorem in the case of the GP algorithm and bounding pφ t(x t ) φ t (x )q with Lemma 4, we obtain the following bound on the cumulative regret incurred b GP: [ Pr R a? α αpγ + 4 a? α α(c γ + ) + Gaussian Process Optimization with Mutual Information σ t (x ) a pγ + ] σ t (x )? δ, C γ + which simplifies to the inequalit of Lemma 5 using Equation 3, and thus proves heorem. 5. Practical considerations and experiments 5.. Numerical experiments Protocol. We compare the empirical performances of our algorithm against the stateoftheart of GP optimization, the GP algorithm (Srinivas et al., ), and a commonl used heuristic, the Expected Improvement () algorithm with GP (Jones et al., 998). he tasks used for assessment come from two real applications and five snthetic problems described here. For all data sets and algorithms the learners were initialized with a random subset of observations {(x i, i )} i. When the prior distribution of the underling function was not known, the Baesian inference was made using a squared exponential kernel. We first picked the half of the data set to estimate the hperparameters of the kernel via cross validation in this subset. In this wa, each algorithm was running with the same prior information. he value of the parameter δ for the GP and the GP algorithms was fixed to δ = 6 for all these experimental tasks. Modifing this value b several orders of magnitude is insignificant with respect to the empirical mean cumulative regret incurred b the algorithms, as discussed in Section 5.. he results are provided in Figure 3. he curves show the evolution of the average regret R in term of iteration. We report the mean value with the confidence interval over a hundred experiments. Description of the data sets. data sets used for assessment. We describe briefl all the Generated GP. he generated Gaussian process functions are random GPs drawn from an isotropic Matérn kernel in dimension and 4, with the kernel bandwidth set to for dimension, and 6 for dimension 4. he Matérn parameter was set to ν = 3 and the noise standard deviation to % of the signal standard deviation. Gaussian Mixture. his snthetic function comes from the addition of three D Gaussian functions. (a) Gaussian mixture (b) Himmelblau Figure. Visualization of the snthetic functions used for assessment We then perturb these Gaussian functions with smooth variations generated from a Gaussian Process with isotropic Matérn Kernel and % of noise. It is shown on Figure (a). he highest peak being thin, the sequential search for the maximum of this function is quite challenging. Himmelblau. his task is another snthetic function in dimension. We compute a slightl tilted version of the Himmelblau s function with the addition of a linear function, and take the opposite to match the challenge of finding its maximum. his function presents four peaks but onl one global maximum. It gives a practical wa to test the abilit of a strateg to manage exploration/exploitation tradeoffs. It is represented in Figure (b).
7 Branin. he Branin or BraninHoo function is a common benchmark function for global optimization. It presents three global optimum in the D square [ 5, ] [, 5]. his benchmark is one of the two snthetic functions used b (Srinivas et al., ) to evaluate the empirical performances of the GP algorithm. No noise has been added to the original signal in this experimental task. GoldsteinPrice. he Goldstein & Price function is an other benchmark function for global optimization, with a single global optimum but several local optima in the D square [, ] [, ]. his is the second snthetic benchmark used b (Srinivas et al., ). Like in the previous challenge, no noise has been added to the original signal. sunamis. Recent posttsunami surve data as well as the numerical simulations of (Hill et al., ) have shown that in some cases the runup, which is the maximum vertical extent of wave climbing on a beach, in areas which were supposed to be protected b small islands in the vicinit of coast, was significantl higher than in neighboring locations. Motivated b these observations (Stefanakis et al., ) investigated this phenomenon b emploing numerical simulations using the VOLNA code (Dutkh et al., ) with the simplified geometr of a conical island sitting on a flat surface in front of a sloping beach. In the stud of (Stefanakis et al., 3) the setup was controlled b five phsical parameters and the aim was to find with confidence and with the least number of simulations the parameters leading to the maximum runup amplification. MackeGlass function. he MackeGlass deladifferential equation is a chaotic sstem in dimension 6, but without noise. It models real feedback sstems and is used in phsiological domains such as hematolog, cardiolog, neurolog, and pschiatr. he highl chaotic behavior of this function makes it an exceptionall difficult optimization problem. It has been used as a benchmark for example b (Flake & Lawrence, ). Empirical comparison of the algorithms. Figure 3 compares the empirical mean average regret R for the three algorithms. On the eas optimization assessments like the Branin data set (Fig. 3(e)) the three strategies behave in a comparable manner, but the GP algorithm incurs a larger cumulative regret. For more difficult assessments the GP algorithm performs poorl and our algorithm alwas surpasses the heuristic. he improvement of the GP algorithm against the two competitors is the most significant for exceptionall challenging optimization tasks as illustrated in Figures 3(a) to 3(d) and R / R / R / R / (a) Generated GP(d = ) , (c) Gaussian Mixture (e) Branin (g) sunamis (b) Generated GP(d = 4) (d) Himmelblau (f) Goldstein , (h) MackeGlass Figure 3. Empirical mean and confidence interval of the average regret R in term of iteration on real and snthetic tasks for the GP and GP algorithms and the heuristic (lower is better). 3(h), where the underling functions present several local optima. he abilit of our algorithm to deal with the exploration/exploitation tradeoff is emphasized b these experimental results as its average regret decreases directl after the first iterations, avoiding unwanted exploration like GP on Figures 3(a) to 3(d), or getting stuck in some local optimum like on Figures 3(c), 3(g) and 3(h). We further mention that the GP algorithm is empiricall robust against the number of dimensions of the data set (Fig. 3(b), 3(g), 3(h)). 5.. Practical aspects Calibration of α. he value of the parameter α is chosen following heorem as α = log δ with < δ < being a confidence parameter. he guarantees we prove in Section 4. on the cumulative regret for the GP algorithm holds with probabilit at least δ. With α increasing linearl for δ decreasing exponentiall toward, the algorithm is
8 R / δ = 9 δ = 6 δ = 3 δ = References Audibert, JY., Bubeck, S., and Munos, R. Bandit view on nois optimization. In Optimization for Machine Learning, pp Press,. Auer, P., CesaBianchi, N., and Fischer, P. Finitetime analsis of the multiarmed bandit problem. Machine Learning, 47(3):35 56,. Bercu, B. and ouati, A. Exponential inequalities for selfnormalized martingales with applications. he Annals of Applied Probabilit, 8(5): , Bubeck, S., Munos, R., Stoltz, G., and Szepesvári, C. X armed bandits. Journal of Machine Learning Research, : ,. Figure 4. Small impact of the value of δ on the mean average regret of the GP algorithm running on the Himmelblau data set. robust to the choice of δ. We present on Figure 4 the small impact of δ on the average regret for four different values selected on a wide range. Numerical Complexit. Even if the numerical cost of GP is insignificant in practice compared to the cost of the evaluation of f, the complexit of the sequential Baesian update (Osborne, ) is O( ) and might be prohibitive for large. One can reduce drasticall the computational time b means of Laz Variance Calculation (Desautels et al., ), built on the fact that σ (x) alwas decreases for increasing and for all x X. We further mention that approximated inference algorithms such as the EP approximation and MCMC sampling (Kuss et al., 5) can be used as an alternative if the computational time is a restrictive factor. 6. Conclusion We introduced the GP algorithm for GP optimization and prove upper bounds on its cumulative regret which improve exponentiall the stateoftheart in common settings. he theoretical analsis was presented in a generic framework in order to expand its impact to other similar algorithms. he experiments we performed on real and snthetic assessments confirmed empiricall the efficienc of our algorithm against both the theoretical stateoftheart of GP optimization, the GP algorithm, and the commonl used heuristic. ACKNOWLEDGEMENS he authors would like to thank David Buffoni and Raphaël Bonaque for fruitful discussions. he author also thank the anonmous reviewers for their detailed feedback. Carpentier, A., Lazaric, A., Ghavamzadeh, M., Munos, R., and Auer, P. Upperconfidencebound algorithms for active learning in multiarmed bandits. In Proceedings of the International Conference on Algorithmic Learning heor, pp SpringerVerlag,. Chen, Y. and Krause, A. Nearoptimal batch mode active learning and adaptive submodular optimization. In Proceedings of the International Conference on Machine Learning. icml.cc / Omnipress, 3. Contal, E., Buffoni, D., Robicquet, A., and Vaatis, N. Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Machine Learning and Knowledge Discover in Databases, volume 888, pp Springer Berlin Heidelberg, 3. Cover,. M. and homas, J. A. Elements of Information heor. WileInterscience, 99. Dani, V., Haes,. P., and Kakade, S. M. Stochastic linear optimization under bandit feedback. In Proceedings of the st Annual Conference on Learning heor, pp , 8. de Freitas, N., Smola, A. J., and Zoghi, M. Exponential regret bounds for Gaussian process bandits with deterministic observations. In Proceedings of the 9th International Conference on Machine Learning. icml.cc / Omnipress,. Desautels,., Krause, A., and Burdick, J.W. Parallelizing explorationexploitation tradeoffs with Gaussian process bandit optimization. In Proceedings of the 9th International Conference on Machine Learning, pp icml.cc / Omnipress,. Dutkh, D., Poncet, R, and Dias, F. he VOLNA code for the numerical modelling of tsunami waves: generation, propagation and inundation. European Journal of Mechanics B/Fluids, 3:598 65,.
9 Flake, G. W. and Lawrence, S. Efficient SVM regression training with SMO. Machine Learning, 46(3):7 9,. Floudas, C.A. and Pardalos, P.M. Optimization in Computational Chemistr and Molecular Biolog: Local and Global Approaches. Nonconvex Optimization and Its Applications. Springer,. Grunewalder, S., Audibert, JY., Opper, M., and Shawe alor, J. Regret bounds for Gaussian process bandit problems. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pp Press,. Hill, E. M., Borrero, J. C., Huang, Z., Qiu, Q., Banerjee, P., Natawidjaja, D. H., Elosegui, P., Fritz, H. M., Suwargadi, B. W., Prananto, I. R., Li, L., Macpherson, K. A., Skanavis, V., Snolakis, C. E., and Sieh, K. he Mw 7.8 Mentawai earthquake: Ver shallow source of a rare tsunami earthquake determined from tsunami field surve and nearfield GPS data. J. Geophs. Res., 7: B64,. Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive blackbox functions. Journal of Global Optimization, 3(4):455 49, December 998. Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the International Conference on Machine Learning, pp. 5. icml.cc / Omnipress,. Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Informationtheoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE ransactions on Information heor, 58(5):35 365,. Stefanakis,. S., Dias, F., Vaatis, N., and Guillas, S. Longwave runup on a plane beach behind a conical island. In Proceedings of the World Conference on Earthquake Engineering,. Stefanakis,. S., Contal, E., Vaatis, N., Dias, F., and Snolakis, C. E. Can small islands protect nearb coasts from tsunamis? An active experimental design approach. arxiv preprint arxiv: , 3. Wang, G. and Shan, S. Review of metamodeling techniques in support of engineering design optimization. Journal of Mechanical Design, 9(4):37 38, 7. Ziemba, W.. and Vickson, R. G. Stochastic optimization models in finance. World Scientific Singapore, 6. Kleinberg, R. Nearl tight bounds for the continuumarmed bandit problem. In Advances in Neural Information Processing Sstems 7, pp Press, 4. Krause, A. and Ong, C.S. Contextual Gaussian process bandit optimization. In Advances in Neural Information Processing Sstems 4, pp ,. Kuss, M., Pfingsten,., Csató, L., and Rasmussen, C.E. Approximate inference for robust Gaussian process regression. Max Planck Inst. Biological Cbern., ubingen, Germanech. Rep, 36, 5. Mockus, J. Baesian approach to global optimization. Mathematics and its applications. Kluwer Academic, 989. Osborne, Michael. Baesian Gaussian processes for sequential prediction, optimisation and quadrature. PhD thesis, Oxford Universit New College,. Rasmussen, C. E. and Williams, C. Gaussian Processes for Machine Learning. Press, 6. Snoek, J., Larochelle, H., and Adams, R. P. Practical baesian optimization of machine learning algorithms. In Advances in Neural Information Processing Sstems 5, pp ,.
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
: No Regret and Experimental Design Niranjan Srinivas Andreas Krause California Institute of Technology, Pasadena, CA, USA Sham Kakade University of Pennsylvania, Philadelphia, PA, USA Matthias Seeger
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationTrading regret rate for computational efficiency in online learning with limited feedback
Trading regret rate for computational efficiency in online learning with limited feedback Shai ShalevShwartz TTIC Hebrew University Online Learning with Limited Feedback Workshop, 2009 June 2009 Shai
More informationRegression Using Support Vector Machines: Basic Foundations
Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More information5. Linear regression and correlation
Statistics for Engineers 51 5. Linear regression and correlation If we measure a response variable at various values of a controlled variable, linear regression is the process of fitting a straight line
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRALab,
More informationMonotone multiarmed bandit allocations
JMLR: Workshop and Conference Proceedings 19 (2011) 829 833 24th Annual Conference on Learning Theory Monotone multiarmed bandit allocations Aleksandrs Slivkins Microsoft Research Silicon Valley, Mountain
More informationLocal Gaussian Process Regression for Real Time Online Model Learning and Control
Local Gaussian Process Regression for Real Time Online Model Learning and Control Duy NguyenTuong Jan Peters Matthias Seeger Max Planck Institute for Biological Cybernetics Spemannstraße 38, 776 Tübingen,
More informationIdentify a pattern and find the next three numbers in the pattern. 5. 5(2s 2 1) 2 3(s 1 2); s 5 4
Chapter 1 Test Do ou know HOW? Identif a pattern and find the net three numbers in the pattern. 1. 5, 1, 3, 7, c. 6, 3, 16, 8, c Each term is more than the previous Each term is half of the previous term;
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationCrossvalidation for detecting and preventing overfitting
Crossvalidation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationUniversal Algorithm for Trading in Stock Market Based on the Method of Calibration
Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.
More informationCORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREERREADY FOUNDATIONS IN ALGEBRA
We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREERREADY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical
More informationBig Data  Lecture 1 Optimization reminders
Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationAdaptive Search with Stochastic Acceptance Probabilities for Global Optimization
Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,
More informationPortfolio Allocation for Bayesian Optimization
Portfolio Allocation for Bayesian Optimization Matthew Hoffman, Eric Brochu, Nando de Freitas Department of Computer Science University of British Columbia Vancouver, Canada {hoffmanm,ebrochu,nando}@cs.ubc.ca
More informationOnline Learning with Switching Costs and Other Adaptive Adversaries
Online Learning with Switching Costs and Other Adaptive Adversaries Nicolò CesaBianchi Università degli Studi di Milano Italy Ofer Dekel Microsoft Research USA Ohad Shamir Microsoft Research and the Weizmann
More informationOnline Optimization and Personalization of Teaching Sequences
Online Optimization and Personalization of Teaching Sequences Benjamin Clément 1, Didier Roy 1, Manuel Lopes 1, PierreYves Oudeyer 1 1 Flowers Lab Inria Bordeaux SudOuest, Bordeaux 33400, France, didier.roy@inria.fr
More informationNon Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization Jean Damien Villiers ESSEC Business School Master of Sciences in Management Grande Ecole September 2013 1 Non Linear
More informationNonparametric adaptive age replacement with a onecycle criterion
Nonparametric adaptive age replacement with a onecycle criterion P. CoolenSchrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK email: Pauline.Schrijner@durham.ac.uk
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationCHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER
93 CHAPTER 5 PREDICTIVE MODELING STUDIES TO DETERMINE THE CONVEYING VELOCITY OF PARTS ON VIBRATORY FEEDER 5.1 INTRODUCTION The development of an active trap based feeder for handling brakeliners was discussed
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationEXIT TIME PROBLEMS AND ESCAPE FROM A POTENTIAL WELL
EXIT TIME PROBLEMS AND ESCAPE FROM A POTENTIAL WELL Exit Time problems and Escape from a Potential Well Escape From a Potential Well There are many systems in physics, chemistry and biology that exist
More informationUsing artificial intelligence for data reduction in mechanical engineering
Using artificial intelligence for data reduction in mechanical engineering L. Mdlazi 1, C.J. Stander 1, P.S. Heyns 1, T. Marwala 2 1 Dynamic Systems Group Department of Mechanical and Aeronautical Engineering,
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network?  Perceptron learners  Multilayer networks What is a Support
More informationA Learning Algorithm For Neural Network Ensembles
A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICETUNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República
More informationTetris: Experiments with the LP Approach to Approximate DP
Tetris: Experiments with the LP Approach to Approximate DP Vivek F. Farias Electrical Engineering Stanford University Stanford, CA 94403 vivekf@stanford.edu Benjamin Van Roy Management Science and Engineering
More informationLineTorus Intersection for Ray Tracing: Alternative Formulations
LineTorus Intersection for Ra Tracing: Alternative Formulations VACLAV SKALA Department of Computer Science Engineering Universit of West Bohemia Univerzitni 8, CZ 30614 Plzen CZECH REPUBLIC http://www.vaclavskala.eu
More information1 Introduction. 2 Prediction with Expert Advice. Online Learning 9.520 Lecture 09
1 Introduction Most of the course is concerned with the batch learning problem. In this lecture, however, we look at a different model, called online. Let us first compare and contrast the two. In batch
More informationHandling Advertisements of Unknown Quality in Search Advertising
Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Carnegie Mellon University spandey@cs.cmu.edu Christopher Olston Yahoo! Research olston@yahooinc.com Abstract We consider
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationIndependent component ordering in ICA time series analysis
Neurocomputing 41 (2001) 145}152 Independent component ordering in ICA time series analysis Yiuming Cheung*, Lei Xu Department of Computer Science and Engineering, The Chinese University of Hong Kong,
More informationarxiv:1410.4984v1 [cs.dc] 18 Oct 2014
Gaussian Process Models with Parallelization and GPU acceleration arxiv:1410.4984v1 [cs.dc] 18 Oct 2014 Zhenwen Dai Andreas Damianou James Hensman Neil Lawrence Department of Computer Science University
More informationProbability and Random Variables. Generation of random variables (r.v.)
Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly
More informationA New Quantitative Behavioral Model for Financial Prediction
2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singapore A New Quantitative Behavioral Model for Financial Prediction Thimmaraya Ramesh
More informationL1 vs. L2 Regularization and feature selection.
L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1
More informationProbability and Statistics
CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b  0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute  Systems and Modeling GIGA  Bioinformatics ULg kristel.vansteen@ulg.ac.be
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationItems related to expected use of graphing technology appear in bold italics.
 1  Items related to expected use of graphing technology appear in bold italics. Investigating the Graphs of Polynomial Functions determine, through investigation, using graphing calculators or graphing
More informationLoad Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationMonte Carlo Methods in Finance
Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction
More informationINVESTIGATIONS AND FUNCTIONS 1.1.1 1.1.4. Example 1
Chapter 1 INVESTIGATIONS AND FUNCTIONS 1.1.1 1.1.4 This opening section introduces the students to man of the big ideas of Algebra 2, as well as different was of thinking and various problem solving strategies.
More informationFinitetime Analysis of the Multiarmed Bandit Problem*
Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finitetime Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A8010
More informationMultiple Kernel Learning on the Limit Order Book
JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John ShaweTaylor
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationDiscussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine GenonCatalot To cite this version: Fabienne
More informationPrentice Hall Algebra 2 2011 Correlated to: Colorado P12 Academic Standards for High School Mathematics, Adopted 12/2009
Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level
More informationIntroduction to nonparametric regression: Least squares vs. Nearest neighbors
Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,
More information(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationAutomated Stellar Classification for Large Surveys with EKF and RBF Neural Networks
Chin. J. Astron. Astrophys. Vol. 5 (2005), No. 2, 203 210 (http:/www.chjaa.org) Chinese Journal of Astronomy and Astrophysics Automated Stellar Classification for Large Surveys with EKF and RBF Neural
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationMachine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
More information20142015 The Master s Degree with Thesis Course Descriptions in Industrial Engineering
20142015 The Master s Degree with Thesis Course Descriptions in Industrial Engineering Compulsory Courses IENG540 Optimization Models and Algorithms In the course important deterministic optimization
More informationHigher. Polynomials and Quadratics 64
hsn.uk.net Higher Mathematics UNIT OUTCOME 1 Polnomials and Quadratics Contents Polnomials and Quadratics 64 1 Quadratics 64 The Discriminant 66 3 Completing the Square 67 4 Sketching Parabolas 70 5 Determining
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationGeometrical Approaches for Artificial Neural Networks
Geometrical Approaches for Artificial Neural Networks Centre for Computational Intelligence De Montfort University Leicester, UK email:elizondo@dmu.ac.uk http://www.cci.dmu.ac.uk/ Workshop on Principal
More informationAlgebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More informationFoundations of Machine Learning OnLine Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning OnLine Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More information2.1 Complexity Classes
15859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationFlexible Neural Trees Ensemble for Stock Index Modeling
Flexible Neural Trees Ensemble for Stock Index Modeling Yuehui Chen 1, Ju Yang 1, Bo Yang 1 and Ajith Abraham 2 1 School of Information Science and Engineering Jinan University, Jinan 250022, P.R.China
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationComputing with Finite and Infinite Networks
Computing with Finite and Infinite Networks Ole Winther Theoretical Physics, Lund University Sölvegatan 14 A, S223 62 Lund, Sweden winther@nimis.thep.lu.se Abstract Using statistical mechanics results,
More informationAachen Summer Simulation Seminar 2014
Aachen Summer Simulation Seminar 2014 Lecture 07 Input Modelling + Experimentation + Output Analysis PeerOlaf Siebers pos@cs.nott.ac.uk Motivation 1. Input modelling Improve the understanding about how
More informationConvex Optimization SVM s and Kernel Machines
Convex Optimization SVM s and Kernel Machines S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola and Stéphane Canu S.V.N.
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationDuality of linear conic problems
Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least
More informationMATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!
MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Prealgebra Algebra Precalculus Calculus Statistics
More informationA Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data
A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt
More informationZeros of Polynomial Functions. The Fundamental Theorem of Algebra. The Fundamental Theorem of Algebra. zero in the complex number system.
_.qd /7/ 9:6 AM Page 69 Section. Zeros of Polnomial Functions 69. Zeros of Polnomial Functions What ou should learn Use the Fundamental Theorem of Algebra to determine the number of zeros of polnomial
More informationA Potentialbased Framework for Online Multiclass Learning with Partial Feedback
A Potentialbased Framework for Online Multiclass Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science
More informationPrimality  Factorization
Primality  Factorization Christophe Ritzenthaler November 9, 2009 1 Prime and factorization Definition 1.1. An integer p > 1 is called a prime number (nombre premier) if it has only 1 and p as divisors.
More informationStochastic Gradient Method: Applications
Stochastic Gradient Method: Applications February 03, 2015 P. Carpentier Master MMMEF Cours MNOS 20142015 114 / 267 Lecture Outline 1 Two Elementary Exercices on the Stochastic Gradient TwoStage Recourse
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationLearning in Abstract Memory Schemes for Dynamic Optimization
Fourth International Conference on Natural Computation Learning in Abstract Memory Schemes for Dynamic Optimization Hendrik Richter HTWK Leipzig, Fachbereich Elektrotechnik und Informationstechnik, Institut
More informationSummary of Probability
Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible
More informationIdeal Class Group and Units
Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals
More informationA Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM
Journal of Computational Information Systems 10: 17 (2014) 7629 7635 Available at http://www.jofcis.com A Health Degree Evaluation Algorithm for Equipment Based on Fuzzy Sets and the Improved SVM Tian
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationVirtual Power Limiter System which Guarantees Stability of Control Systems
Virtual Power Limiter Sstem which Guarantees Stabilit of Control Sstems Katsua KANAOKA Department of Robotics, Ritsumeikan Universit Shiga 5258577, Japan Email: kanaoka@se.ritsumei.ac.jp Abstract In this
More informationThinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks
Thinkwell s Homeschool Algebra 2 Course Lesson Plan: 34 weeks Welcome to Thinkwell s Homeschool Algebra 2! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson
More informationCurrent Standard: Mathematical Concepts and Applications Shape, Space, and Measurement Primary
Shape, Space, and Measurement Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two and threedimensional shapes by demonstrating an understanding of:
More informationTHE SCHEDULING OF MAINTENANCE SERVICE
THE SCHEDULING OF MAINTENANCE SERVICE Shoshana Anily Celia A. Glass Refael Hassin Abstract We study a discrete problem of scheduling activities of several types under the constraint that at most a single
More informationCHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
More informationCOMPLETE MARKETS DO NOT ALLOW FREE CASH FLOW STREAMS
COMPLETE MARKETS DO NOT ALLOW FREE CASH FLOW STREAMS NICOLE BÄUERLE AND STEFANIE GRETHER Abstract. In this short note we prove a conjecture posed in Cui et al. 2012): Dynamic meanvariance problems in
More informationGeorgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1
Accelerated Mathematics 3 This is a course in precalculus and statistics, designed to prepare students to take AB or BC Advanced Placement Calculus. It includes rational, circular trigonometric, and inverse
More informationTHE SVM APPROACH FOR BOX JENKINS MODELS
REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 23 36 THE SVM APPROACH FOR BOX JENKINS MODELS Authors: Saeid Amiri Dep. of Energy and Technology, Swedish Univ. of Agriculture Sciences, P.O.Box
More information