Learning using Large Datasets

Léon Bottou (a) and Olivier Bousquet (b)

(a) NEC Laboratories America, Princeton, NJ 08540, USA
(b) Google Zürich, 8002 Zürich, Switzerland

Abstract. This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation-estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways. For instance, a mediocre optimization algorithm, stochastic gradient descent, is shown to perform very well on large-scale learning problems.

Keywords. Large-scale learning. Optimization. Statistics.

Introduction

The computational complexity of learning algorithms has seldom been taken into account by learning theory. Valiant [1] states that a problem is learnable when there exists a probably approximately correct learning algorithm with polynomial complexity. Whereas much progress has been made on the statistical aspect (e.g., [2,3,4]), very little has been said about the complexity side of this proposal (e.g., [5]).

Computational complexity becomes the limiting factor when one envisions large amounts of training data. Two important examples come to mind:

- Data mining exists because competitive advantages can be achieved by analyzing the masses of data that describe the life of our computerized society. Since virtually every computer generates data, the data volume is proportional to the available computing power. Therefore one needs learning algorithms that scale roughly linearly with the total volume of data.

- Artificial intelligence attempts to emulate the cognitive capabilities of human beings. Our biological brains can learn quite efficiently from the continuous streams of perceptual data generated by our six senses, using limited amounts of sugar as a source of power. This observation suggests that there are learning algorithms whose computing time requirements scale roughly linearly with the total volume of data.

This contribution finds its source in the idea that approximate optimization algorithms might be sufficient for learning purposes. The first part proposes a new decomposition of the test error where an additional term represents the impact of approximate optimization. In the case of small-scale learning problems, this decomposition reduces to the well known tradeoff between approximation error and estimation error.

In the case of large-scale learning problems, the tradeoff is more complex because it involves the computational complexity of the learning algorithm. The second part explores the asymptotic properties of the large-scale learning tradeoff for various prototypical learning algorithms under various assumptions regarding the statistical estimation rates associated with the chosen objective functions. This part clearly shows that the best optimization algorithms are not necessarily the best learning algorithms. Maybe more surprisingly, certain algorithms perform well regardless of the assumed rate for the statistical estimation error. The final part presents some experimental results.

1. Approximate Optimization

Following [6,2], we consider a space of input-output pairs (x, y) ∈ X × Y endowed with a probability distribution P(x, y). The conditional distribution P(y|x) represents the unknown relationship between inputs and outputs. The discrepancy between the predicted output ŷ and the real output y is measured with a loss function ℓ(ŷ, y). Our benchmark is the function f* that minimizes the expected risk

    E(f) = \int \ell(f(x), y) \, dP(x, y) = E[\ell(f(x), y)],

that is,

    f^*(x) = \arg\min_{\hat y} E[\ell(\hat y, y) \mid x].

Although the distribution P(x, y) is unknown, we are given a sample S of n independently drawn training examples (x_i, y_i), i = 1 ... n. We define the empirical risk

    E_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i) = E_n[\ell(f(x), y)].

Our first learning principle consists in choosing a family F of candidate prediction functions and finding the function f_n = arg min_{f ∈ F} E_n(f) that minimizes the empirical risk. Well known combinatorial results (e.g., [2]) support this approach provided that the chosen family F is sufficiently restrictive. Since the optimal function f* is unlikely to belong to the family F, we also define f_F = arg min_{f ∈ F} E(f). For simplicity, we assume that f*, f_F and f_n are well defined and unique.

We can then decompose the excess error as

    E[E(f_n) - E(f^*)] = E[E(f_F) - E(f^*)] + E[E(f_n) - E(f_F)] = E_app + E_est,        (1)

where the expectation is taken with respect to the random choice of the training set. The approximation error E_app measures how closely functions in F can approximate the optimal solution f*. The estimation error E_est measures the effect of minimizing the empirical risk E_n(f) instead of the expected risk E(f). The estimation error is determined by the number of training examples and by the capacity of the family of functions [2]. Large families of functions have smaller approximation errors but lead to higher estimation errors. (We often consider nested families of functions of the form F_c = {f ∈ H, Ω(f) ≤ c}. Then, for each value of c, the function f_n is obtained by minimizing the regularized empirical risk E_n(f) + λΩ(f) for a suitable choice of the Lagrange coefficient λ. We can then control the estimation-approximation tradeoff by choosing λ instead of c.)
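To make these definitions concrete, here is a minimal sketch (not from the paper; the data distribution, the linear family F = {f_w(x) = w·x}, and the squared loss are illustrative assumptions) that evaluates the empirical risk E_n(f_w) and picks f_n by empirical risk minimization:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sample S of n i.i.d. pairs (x_i, y_i); the underlying relationship is made up.
    n, d = 1000, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    def empirical_risk(w):
        # E_n(f_w) with squared loss l(yhat, y) = (yhat - y)^2 and f_w(x) = w.x
        return np.mean((X @ w - y) ** 2)

    # f_n = argmin over the linear family of the empirical risk:
    # with squared loss this is ordinary least squares.
    w_n, *_ = np.linalg.lstsq(X, y, rcond=None)

    print("E_n(f_n) =", empirical_risk(w_n))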

This tradeoff has been extensively discussed in the literature [2,3] and leads to excess errors that scale between the inverse and the inverse square root of the number of examples [7,8].

1.1. Optimization Error

Finding f_n by minimizing the empirical risk E_n(f) is often a computationally expensive operation. Since the empirical risk E_n(f) is already an approximation of the expected risk E(f), it should not be necessary to carry out this minimization with great accuracy. For instance, we could stop an iterative optimization algorithm long before its convergence. Let us assume that our minimization algorithm returns an approximate solution f̃_n that minimizes the objective function up to a predefined tolerance ρ ≥ 0:

    E_n(\tilde{f}_n) \le E_n(f_n) + \rho.

We can then decompose the excess error E = E[E(f̃_n) - E(f*)] as

    E = E[E(f_F) - E(f^*)] + E[E(f_n) - E(f_F)] + E[E(\tilde{f}_n) - E(f_n)] = E_app + E_est + E_opt.        (2)

We call the additional term E_opt the optimization error. It reflects the impact of the approximate optimization on the generalization performance. Its magnitude is comparable to ρ (see section 2.1).

1.2. The Approximation-Estimation-Optimization Tradeoff

This decomposition leads to a more complicated compromise. It involves three variables and two constraints. The constraints are the maximal number of available training examples and the maximal computation time. The variables are the size of the family of functions F, the optimization accuracy ρ, and the number of examples n. This is formalized by the following optimization problem:

    \min_{F, \rho, n} \; E = E_app + E_est + E_opt
    \quad \text{subject to} \quad n \le n_max \;\; \text{and} \;\; T(F, \rho, n) \le T_max.        (3)

The number n of training examples is a variable because we could choose to use only a subset of the available training examples in order to complete the optimization within the allotted time. This happens often in practice. Table 1 summarizes the typical evolution of the quantities of interest when the three variables F, n, and ρ increase.

The solution of the optimization program (3) depends critically on which budget constraint is active: the constraint n ≤ n_max on the number of examples, or the constraint T ≤ T_max on the training time.
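As an illustration of the stopping rule E_n(f̃_n) ≤ E_n(f_n) + ρ, the sketch below (again with made-up data; in practice the exact minimizer is unknown and one would monitor a gradient norm or a duality gap instead) runs plain gradient descent on a squared-loss empirical risk and stops as soon as the objective is within a chosen tolerance ρ of the exact minimum:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    C = lambda w: np.mean((X @ w - y) ** 2)          # empirical risk E_n(f_w)
    grad = lambda w: 2.0 * X.T @ (X @ w - y) / n     # its gradient

    w_n, *_ = np.linalg.lstsq(X, y, rcond=None)      # exact minimizer f_n, for reference only
    rho = 1e-3                                       # predefined optimization tolerance

    w, eta, iters = np.zeros(d), 0.1, 0
    while C(w) > C(w_n) + rho:                       # stop as soon as E_n(w) <= E_n(f_n) + rho
        w -= eta * grad(w)
        iters += 1

    print("reached tolerance rho =", rho, "after", iters, "iterations")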

Table 1. Typical variations when F, n, and ρ increase.

                                       F increases    n increases    ρ increases
    E_app (approximation error)        decreases
    E_est (estimation error)           increases      decreases
    E_opt (optimization error)                                       increases
    T     (computation time)           increases      increases      decreases

We speak of a small-scale learning problem when (3) is constrained by the maximal number of examples n_max. Since the computing time is not limited, we can reduce the optimization error E_opt to insignificant levels by choosing ρ arbitrarily small. The excess error is then dominated by the approximation and estimation errors, E_app and E_est. Taking n = n_max, we recover the approximation-estimation tradeoff that is the object of abundant literature.

We speak of a large-scale learning problem when (3) is constrained by the maximal computing time T_max. Approximate optimization, that is choosing ρ > 0, can possibly achieve better generalization because more training examples can be processed during the allowed time. The specifics depend on the computational properties of the chosen optimization algorithm through the expression of the computing time T(F, ρ, n).

2. The Asymptotics of Large-scale Learning

In the previous section, we have extended the classical approximation-estimation tradeoff by taking into account the optimization error. We have given an objective criterion to distinguish small-scale and large-scale learning problems. In the small-scale case, we recover the classical tradeoff between approximation and estimation. The large-scale case is substantially different because it involves the computational complexity of the learning algorithm. In order to clarify the large-scale learning tradeoff with sufficient generality, this section makes several simplifications:

- We are studying upper bounds of the approximation, estimation, and optimization errors (2). It is often accepted that these upper bounds give a realistic idea of the actual convergence rates [9,10,11,12]. Another way to find comfort in this approach is to say that we study guaranteed convergence rates instead of the possibly pathological special cases.

- We are studying the asymptotic properties of the tradeoff when the problem size increases. Instead of carefully balancing the three terms, we write E = O(E_app) + O(E_est) + O(E_opt) and only need to ensure that the three terms decrease at the same asymptotic rate.

- We are considering a fixed family of functions F and therefore avoid taking into account the approximation error E_app. This part of the tradeoff covers a wide spectrum of practical realities, such as choosing models and choosing features. In the context of this work, we do not believe we can meaningfully address this without discussing, for instance, the thorny issue of feature selection. Instead we focus on the choice of optimization algorithm.

- Finally, in order to keep this paper short, we consider that the family of functions F is linearly parameterized by a vector w ∈ R^d.

We also assume that x, y and w are bounded, ensuring that there is a constant B such that 0 ≤ ℓ(f_w(x), y) ≤ B and ℓ(·, y) is Lipschitz.

We first explain how the uniform convergence bounds provide convergence rates that take the optimization error into account. Then we discuss and compare the asymptotic learning properties of several optimization algorithms.

2.1. Convergence of the Estimation and Optimization Errors

The optimization error E_opt depends directly on the optimization accuracy ρ. However, the accuracy ρ involves the empirical quantity E_n(f̃_n) - E_n(f_n), whereas the optimization error E_opt involves its expected counterpart E(f̃_n) - E(f_n). This section discusses how the optimization accuracy ρ enters generalization bounds that leverage the uniform convergence concepts pioneered by Vapnik and Chervonenkis (e.g., [2]). In this discussion, we use the letter c to refer to any positive constant. Multiple occurrences of the letter c do not necessarily imply that the constants have identical values.

2.1.1. Simple Uniform Convergence Bounds

Recall that we assume that F is linearly parameterized by w ∈ R^d. Elementary uniform convergence results then state that

    E\Big[ \sup_{f \in F} |E(f) - E_n(f)| \Big] \le c \sqrt{\frac{d}{n}},

where the expectation is taken with respect to the random choice of the training set. (Although the original Vapnik-Chervonenkis bounds have the form c \sqrt{\frac{d}{n} \log\frac{n}{d}}, the logarithmic term can be eliminated using the chaining technique; e.g., [10].) This result immediately provides a bound on the estimation error:

    E_est = E\big[ (E(f_n) - E_n(f_n)) + (E_n(f_n) - E_n(f_F)) + (E_n(f_F) - E(f_F)) \big]
          \le 2 \, E\Big[ \sup_{f \in F} |E(f) - E_n(f)| \Big] \le c \sqrt{\frac{d}{n}}.

This same result also provides a combined bound for the estimation and optimization errors:

    E_est + E_opt = E[E(\tilde{f}_n) - E_n(\tilde{f}_n)] + E[E_n(\tilde{f}_n) - E_n(f_n)]
                    + E[E_n(f_n) - E_n(f_F)] + E[E_n(f_F) - E(f_F)]
                  \le c \sqrt{\frac{d}{n}} + \rho + 0 + c \sqrt{\frac{d}{n}}
                  = c \left( \rho + \sqrt{\frac{d}{n}} \right).

Unfortunately, this convergence rate is known to be pessimistic in many important cases. More sophisticated bounds are required.

2.1.2. Faster Rates in the Realizable Case

When the loss function ℓ(ŷ, y) is positive, with probability 1 - e^{-τ} for any τ > 0, relative uniform convergence bounds state that

    \sup_{f \in F} \frac{E(f) - E_n(f)}{\sqrt{E(f)}} \le c \sqrt{\frac{d}{n} \log\frac{n}{d} + \frac{\tau}{n}}.

This result is very useful because it provides faster convergence rates, O(log n/n), in the realizable case, that is, when ℓ(f_n(x_i), y_i) = 0 for all training examples (x_i, y_i). We then have E_n(f_n) = 0 and E_n(f̃_n) ≤ ρ, and we can write

    E(\tilde{f}_n) - \rho \le c \sqrt{E(\tilde{f}_n)} \sqrt{\frac{d}{n} \log\frac{n}{d} + \frac{\tau}{n}}.

Viewing this as a second degree polynomial inequality in the variable \sqrt{E(\tilde{f}_n)}, and bounding that variable by the larger root of the polynomial, we obtain

    E(\tilde{f}_n) \le c \left( \rho + \frac{d}{n} \log\frac{n}{d} + \frac{\tau}{n} \right).

Integrating this inequality over τ using a standard technique (see, e.g., [13]), we obtain a better convergence rate for the combined estimation and optimization error:

    E_est + E_opt = E[E(\tilde{f}_n) - E(f_F)] \le E[E(\tilde{f}_n)] = c \left( \rho + \frac{d}{n} \log\frac{n}{d} \right).

2.1.3. Fast Rate Bounds

Many authors (e.g., [10,4,12]) obtain fast statistical estimation rates in more general conditions. These bounds have the general form

    E_app + E_est \le c \left( E_app + \left( \frac{d}{n} \log\frac{n}{d} \right)^{\alpha} \right) \quad \text{for} \quad \frac{1}{2} \le \alpha \le 1.        (4)

This result holds when one can establish the following variance condition:

    \forall f \in F, \quad E\Big[ \big( \ell(f(X), Y) - \ell(f_F(X), Y) \big)^2 \Big] \le c \left( E(f) - E(f_F) \right)^{2 - \frac{1}{\alpha}}.        (5)

The convergence rate of (4) is described by the exponent α, which is determined by the quality of the variance bound (5). Works on fast statistical estimation identify two main ways to establish such a variance condition:

- Exploiting the strict convexity of certain loss functions [12, theorem 12]. For instance, Lee et al. [14] establish a O(log n/n) rate using the square loss ℓ(ŷ, y) = (ŷ - y)².

- Making assumptions on the data distribution. In the case of pattern recognition problems, for instance, the Tsybakov condition indicates how cleanly the posterior distributions P(y|x) cross near the optimal decision boundary [11,12]. The realizable case discussed in section 2.1.2 can be viewed as an extreme case of this.

Despite their much greater complexity, fast rate estimation results can accommodate the optimization accuracy ρ using essentially the methods illustrated in sections 2.1.1 and 2.1.2. We then obtain a bound of the form

    E = E_app + E_est + E_opt = E[E(\tilde{f}_n) - E(f^*)] \le c \left( E_app + \left( \frac{d}{n} \log\frac{n}{d} \right)^{\alpha} + \rho \right).        (6)

For instance, a general result with α = 1 is provided by Massart [13, theorem 4.2]. Combining this result with standard bounds on the complexity of classes of linear functions (e.g., [10]) yields the following result:

    E = E_app + E_est + E_opt = E[E(\tilde{f}_n) - E(f^*)] \le c \left( E_app + \frac{d}{n} \log\frac{n}{d} + \rho \right).        (7)

See also [15,4] for more bounds taking into account the optimization accuracy.

2.2. Gradient Optimization Algorithms

We now discuss and compare the asymptotic learning properties of several gradient optimization algorithms. Recall that the family of functions F is linearly parameterized by w ∈ R^d. Let w_F and w_n correspond to the functions f_F and f_n defined in section 1. In this section, we assume that the functions w ↦ ℓ(f_w(x), y) are convex and twice differentiable with continuous second derivatives. Convexity ensures that the empirical cost function C(w) = E_n(f_w) has a single minimum.

Two matrices play an important role in the analysis: the Hessian matrix H and the gradient covariance matrix G, both measured at the empirical optimum w_n:

    H = \frac{\partial^2 C}{\partial w^2}(w_n) = E_n\left[ \frac{\partial^2 \ell(f_{w_n}(x), y)}{\partial w^2} \right],        (8)

    G = E_n\left[ \left( \frac{\partial \ell(f_{w_n}(x), y)}{\partial w} \right) \left( \frac{\partial \ell(f_{w_n}(x), y)}{\partial w} \right)^{\top} \right].        (9)

The relation between these two matrices depends on the chosen loss function. In order to summarize them, we assume that there are constants λ_max ≥ λ_min > 0 and ν > 0 such that, for any η > 0, we can choose the number of examples n large enough to ensure that the following assertion is true with probability greater than 1 - η:

    tr(G H^{-1}) \le \nu \quad \text{and} \quad \text{EigenSpectrum}(H) \subset [\lambda_min, \lambda_max].        (10)

The condition number κ = λ_max / λ_min is a good indicator of the difficulty of the optimization [16]. The condition λ_min > 0 avoids complications with stochastic gradient algorithms. Note that this condition only implies strict convexity around the optimum. For instance, consider the loss function ℓ obtained by smoothing the well known hinge loss ℓ(z, y) = max{0, 1 - yz} in a small neighborhood of its non-differentiable points. The function C(w) is then piecewise linear with smoothed edges and vertices. It is not strictly convex. However its minimum is likely to be on a smoothed vertex with a non-singular Hessian. When we have strict convexity, the argument of [12, theorem 12] yields fast estimation rates α ≈ 1 in (4) and (6). This is not necessarily the case here.

The algorithms considered in this paper use information about the gradient of the cost function to iteratively update their current estimate w(t) of the parameter vector.
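Given a concrete loss and model, both matrices are straightforward to estimate. The sketch below uses an illustrative setup (logistic loss, synthetic data, a small ridge term so that H is non-singular, and scipy for the inner minimization, none of which come from the paper) to compute H and G at the empirical optimum and report tr(G H^{-1}) and the eigenvalue range of H, rough stand-ins for the quantities ν, λ_min and λ_max appearing in (10):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    rng = np.random.default_rng(0)
    n, d = 2000, 5
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n))

    def C(w):
        # empirical cost: average logistic loss plus a small ridge (an added assumption
        # that keeps the Hessian non-singular in this toy setting)
        z = y * (X @ w)
        return np.mean(np.logaddexp(0.0, -z)) + 1e-4 * w @ w

    w_n = minimize(C, np.zeros(d), method="L-BFGS-B").x     # empirical optimum w_n

    # Per-example loss gradients at w_n (logistic loss: dl/dw = -sigma(-z) * y * x).
    z = y * (X @ w_n)
    s = expit(-z)
    grads = -(s * y)[:, None] * X

    G = grads.T @ grads / n                                  # gradient covariance matrix (9)
    H = np.einsum('n,ni,nj->ij', s * (1 - s), X, X) / n + 2e-4 * np.eye(d)   # Hessian (8)

    eigs = np.linalg.eigvalsh(H)
    print("tr(G H^-1) =", np.trace(G @ np.linalg.inv(H)))
    print("eigenvalues of H in [%.4f, %.4f]" % (eigs.min(), eigs.max()))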

Gradient Descent (GD) iterates

    w(t+1) = w(t) - \eta \frac{\partial C}{\partial w}(w(t)) = w(t) - \eta \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell\big( f_{w(t)}(x_i), y_i \big),

where η > 0 is a small enough gain. GD is an algorithm with linear convergence [16]. When η = 1/λ_max, this algorithm requires O(κ log(1/ρ)) iterations to reach accuracy ρ. The exact number of iterations depends on the choice of the initial parameter vector.

Second Order Gradient Descent (2GD) iterates

    w(t+1) = w(t) - H^{-1} \frac{\partial C}{\partial w}(w(t)) = w(t) - \frac{1}{n} H^{-1} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell\big( f_{w(t)}(x_i), y_i \big),

where the matrix H^{-1} is the inverse of the Hessian matrix (8). This is more favorable than Newton's algorithm because we do not evaluate the local Hessian at each iteration but simply assume that we know in advance the Hessian at the optimum. 2GD is a superlinear optimization algorithm with quadratic convergence [16]. When the cost is quadratic, a single iteration is sufficient. In the general case, O(log log(1/ρ)) iterations are required to reach accuracy ρ.

Stochastic Gradient Descent (SGD) picks a random training example (x_t, y_t) at each iteration and updates the parameter w on the basis of this example only:

    w(t+1) = w(t) - \frac{\eta}{t} \frac{\partial}{\partial w} \ell\big( f_{w(t)}(x_t), y_t \big).

Murata [17, section 2.2] characterizes the mean E_S[w(t)] and variance Var_S[w(t)] with respect to the distribution implied by the random examples drawn from the training set S at each iteration. Applying this result to the discrete training set distribution for η = 1/λ_min, we have ||δw(t)||² = O(1/t), where δw(t) is a shorthand notation for w(t) - w_n. We can then write

    E_S[C(w(t)) - \inf C] = E_S[ tr( H \, \delta w(t) \, \delta w(t)^{\top} ) ] + o(1/t)
                          = tr( H \, E_S[\delta w(t)] \, E_S[\delta w(t)]^{\top} + H \, Var_S[w(t)] ) + o(1/t)
                          \le \frac{tr(G H)}{t} + o(1/t) \le \frac{\nu \kappa^2}{t} + o(1/t).        (11)

Therefore the SGD algorithm reaches accuracy ρ after less than νκ²/ρ + o(1/ρ) iterations on average. The SGD convergence is essentially limited by the stochastic noise induced by the random choice of one example at each iteration. Neither the initial value of the parameter vector w nor the total number of examples n appear in the dominant term of this bound! When the training set is large, one could reach the desired accuracy ρ measured on the whole training set without even visiting all the training examples. This is in fact a kind of generalization bound.
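The shape of these update rules is easy to see in code. The sketch below contrasts the GD and SGD updates on the same synthetic least-squares objective; the data, gains and iteration counts are illustrative choices, not a reproduction of the paper's analysis (in particular, the offset t0 in the SGD gain is borrowed from the experiments of section 3 to keep the early steps stable):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5000, 10
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    C = lambda w: 0.5 * np.mean((X @ w - y) ** 2)        # empirical cost C(w) = E_n(f_w)
    lam_max = np.linalg.eigvalsh(X.T @ X / n).max()      # largest Hessian eigenvalue

    # Gradient Descent: w(t+1) = w(t) - eta * dC/dw(w(t)), one full pass per update.
    w_gd, eta = np.zeros(d), 1.0 / lam_max
    for t in range(100):
        w_gd -= eta * X.T @ (X @ w_gd - y) / n

    # Stochastic Gradient Descent: one randomly picked example per update.
    w_sgd, t0 = np.zeros(d), 10.0                        # offset t0 keeps early gains moderate
    for t in range(1, n + 1):
        i = rng.integers(n)
        eta_t = 1.0 / (t + t0)                           # decreasing gain of order 1/t
        w_sgd -= eta_t * (X[i] @ w_sgd - y[i]) * X[i]

    print("C(w) after 100 GD passes (100n gradients):", C(w_gd))
    print("C(w) after n SGD updates (n gradients)   :", C(w_sgd))

On this kind of problem a single SGD sweep already brings C(w) close to its minimum at a cost of O(d) per update, which is the behavior the asymptotic analysis above formalizes.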

Table 2. Asymptotic results for gradient algorithms (with probability 1). Compare the second last column (time to optimize) with the last column (time to reach the excess test error ε). Legend: n, number of examples; d, parameter dimension; κ, ν, see equation (10).

    Algorithm   Cost of one    Iterations         Time to reach               Time to reach
                iteration      to reach ρ         accuracy ρ                  E <= c(E_app + ε)
    GD          O(nd)          O(κ log(1/ρ))      O(ndκ log(1/ρ))             O(d²κ / ε^{1/α} log²(1/ε))
    2GD         O(d² + nd)     O(log log(1/ρ))    O((d² + nd) log log(1/ρ))   O(d² / ε^{1/α} log(1/ε) log log(1/ε))
    SGD         O(d)           νκ²/ρ + o(1/ρ)     O(dνκ²/ρ)                   O(dνκ²/ε)

The first three columns of Table 2 report, for each algorithm, the time for a single iteration, the number of iterations needed to reach a predefined accuracy ρ, and their product, the time needed to reach accuracy ρ. These asymptotic results are valid with probability 1, since the probability of their complement is smaller than η for any η > 0. The fourth column bounds the time necessary to reduce the excess error E below c(E_app + ε), where c is the constant from (6). This is computed by observing that choosing ρ ~ ((d/n) log(n/d))^α in (6) achieves the fastest rate for ε, with minimal computation time. We can then use the asymptotic equivalences ρ ~ ε and n ~ (d/ε^{1/α}) log(1/ε). Setting the fourth column expressions to T_max and solving for ε yields the best excess error achieved by each algorithm within the limited time T_max. This provides the asymptotic solution of the estimation-optimization tradeoff (3) for large-scale problems satisfying our assumptions.

These results clearly show that the generalization performance of large-scale learning systems depends on both the statistical properties of the estimation procedure and the computational properties of the chosen optimization algorithm. Their combination leads to surprising consequences:

- The SGD result does not depend on the estimation rate α. When the estimation rate is poor, there is less need to optimize accurately. That leaves time to process more examples. A potentially more useful interpretation leverages the fact that (11) is already a kind of generalization bound: its fast rate trumps the slower rate assumed for the estimation error.

- Superlinear optimization brings little asymptotic improvement in ε. Although the superlinear 2GD algorithm improves the logarithmic term, the learning performance of all these algorithms is dominated by the polynomial term in (1/ε). This explains why improving the constants d, κ and ν using preconditioning methods and sensible software engineering often proves more effective than switching to more sophisticated optimization techniques [18].

- The SGD algorithm yields the best generalization performance despite being the worst optimization algorithm. This had been described before [19] in the case of a second order stochastic gradient descent, and observed in experiments.

In contrast, since the optimization error E_opt of small-scale learning systems can be reduced to insignificant levels, their generalization performance is solely determined by the statistical properties of their estimation procedure.
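To see how the last column of Table 2 behaves, one can plug arbitrary constants into the three time-to-reach-ε expressions and ask which algorithm attains the smallest ε within a fixed budget T_max. All numbers below are made up for illustration; only the functional forms are taken from the table:

    import numpy as np
    from scipy.optimize import brentq

    d, kappa, nu, alpha = 100.0, 10.0, 1.0, 0.5      # illustrative constants only
    T_max = 1e9                                       # computing-time budget, arbitrary units

    # Time needed to reach excess error eps, per the last column of Table 2 (constants dropped).
    def t_gd(eps):  return d**2 * kappa / eps**(1.0 / alpha) * np.log(1.0 / eps)**2
    def t_2gd(eps): return d**2 / eps**(1.0 / alpha) * np.log(1.0 / eps) * np.log(np.log(1.0 / eps))
    def t_sgd(eps): return d * nu * kappa**2 / eps

    # Smallest eps reachable within the budget: solve T(eps) = T_max for each algorithm.
    for name, T in [("GD", t_gd), ("2GD", t_2gd), ("SGD", t_sgd)]:
        eps = brentq(lambda e: T(e) - T_max, 1e-12, 0.3)
        print(f"{name:>4}: smallest eps within budget ~ {eps:.2e}")

With these particular constants SGD reaches a much smaller ε than GD or 2GD, in line with the discussion above, but the ranking naturally shifts as d, κ, ν, α and the budget change.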

Table 3. Results with a linear SVM on the RCV1 dataset.

    Model                        Algorithm               Training Time    Objective    Test Error
    Hinge loss, λ = 10^-4        SVMLight                23,642 secs
    (see [21,22])                SVMPerf                 66 secs
                                 SGD                     1.4 secs
    Logistic loss, λ = 10^-5     LibLinear (ρ = 10^-2)   30 secs
    (see [23])                   LibLinear (ρ = 10^-3)   44 secs
                                 SGD                     2.3 secs

Figure 1. Training time and testing loss as a function of the optimization accuracy ρ for SGD and LibLinear [23].

Figure 2. Testing loss versus training time for SGD, and for Conjugate Gradients running on subsets of the training set.

3. Experiments

This section empirically compares the SGD algorithm with other optimization algorithms on a well-known text categorization task, the classification of documents belonging to the CCAT category in the RCV1-v2 dataset [20]. Refer to the projects/sgd page for source code and for additional experiments that could not fit in this paper because of space constraints.

In order to collect a large training set, we swap the RCV1-v2 official training and test sets. The resulting training set and testing set contain 781,265 and 23,149 examples respectively. The 47,152 TF/IDF features were recomputed on the basis of this new split. We use a simple linear model with the usual hinge loss SVM objective function

    \min_{w,b} C(w, b) = \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^{n} \ell\big( y_i (w x_i + b) \big), \qquad \ell(z) = \max\{0, 1 - z\}.

The first two rows of Table 3 replicate earlier results [21] reported for the same data and the same value of the hyper-parameter λ. The third row of Table 3 reports results obtained with the SGD algorithm

    w_{t+1} = w_t - \eta_t \left( \lambda w_t + \frac{\partial \ell( y_t (w x_t + b) )}{\partial w} \right), \qquad \eta_t = \frac{1}{\lambda (t + t_0)}.

The bias b is updated similarly. Since λ is a lower bound of the smallest eigenvalue of the Hessian, our choice of gains η_t approximates the optimal schedule (see section 2.2).
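The experimental update rule is easy to restate in code. The following sketch applies the update w_{t+1} = w_t - η_t(λ w_t + ∂ℓ/∂w) with the hinge loss and η_t = 1/(λ(t + t_0)) to synthetic unit-normalized data standing in for RCV1; the sample size, dimension and the simple choice t_0 = 1/λ are assumptions for illustration, not the paper's calibration:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10000, 50                                   # synthetic stand-in for the RCV1 task
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-normalized rows, like TF/IDF vectors
    y = np.sign(X @ rng.normal(size=d))

    lam = 1e-4                                         # hyper-parameter lambda
    t0 = 1.0 / lam                                     # illustrative offset; the paper calibrates t0 to the size of w
    w, b = np.zeros(d), 0.0

    for t, i in enumerate(rng.permutation(n), start=1):
        eta = 1.0 / (lam * (t + t0))                   # eta_t = 1 / (lambda (t + t0))
        if y[i] * (w @ X[i] + b) < 1.0:                # hinge loss subgradient is non-zero
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            b += eta * y[i]                            # bias updated similarly, without regularization
        else:
            w = (1.0 - eta * lam) * w                  # only the regularization term contributes

    err = np.mean(np.sign(X @ w + b) != y)
    print(f"training error after one SGD pass: {err:.3f}")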

The offset t_0 was chosen to ensure that the initial gain is comparable with the expected size of the parameter w. The results clearly indicate that SGD offers a good alternative to the usual SVM solvers. Comparable results were obtained in [22] using an algorithm that essentially amounts to a stochastic gradient corrected by a projection step. Our results indicate that the projection step is not an essential component of this performance.

Table 3 also reports results obtained with the logistic loss ℓ(z) = log(1 + e^{-z}) in order to avoid the issues related to the non-differentiability of the hinge loss. Note that this experiment uses a much better value for λ. Our comparison points were obtained with a state-of-the-art superlinear optimizer [23], for two values of the optimization accuracy ρ. Yet the very simple SGD algorithm learns faster.

Figure 1 shows how much time each algorithm takes to reach a given optimization accuracy. The superlinear algorithm reaches the optimum with 10 digits of accuracy in less than one minute. The stochastic gradient starts more quickly but is unable to deliver such a high accuracy. However, the upper part of the figure clearly shows that the testing set loss stops decreasing long before the moment when the superlinear algorithm overcomes the stochastic gradient.

Figure 2 shows how the testing loss evolves with the training time. The stochastic gradient descent curve can be compared with the curves obtained using conjugate gradients on subsets of the training examples with increasing sizes. (This experimental setup was suggested by Olivier Chapelle, personal communication. His specialized variant of the conjugate gradients algorithm works nicely in this context because it converges superlinearly with very limited overhead.) Assume for instance that our computing time budget is 1 second. Running the conjugate gradient algorithm on a random subset of training examples achieves a much better performance than running it on the whole training set. How to guess the right subset size a priori remains unclear. Meanwhile, running the SGD algorithm on the full training set reaches the same testing set performance much faster.

4. Conclusion

Taking into account budget constraints on both the number of examples and the computation time, we find qualitative differences between the generalization performance of small-scale learning systems and large-scale learning systems. The generalization properties of large-scale learning systems depend on both the statistical properties of the estimation procedure and the computational properties of the optimization algorithm. We illustrate this fact by deriving asymptotic results on gradient algorithms supported by an experimental validation.

Considerable refinements of this framework can be expected. Extending the analysis to regularized risk formulations would make results on the complexity of primal and dual optimization algorithms [21,24] directly exploitable. The choice of surrogate loss function [7,12] could also have a non-trivial impact in the large-scale case.

Acknowledgments

Part of this work was funded by an NSF CCR grant.

References

[1] Leslie G. Valiant. A theory of the learnable. Proc. of the 1984 STOC, 1984.
[2] Vladimir N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics. Springer-Verlag, Berlin.
[3] Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9.
[4] Peter L. Bartlett and Shahar Mendelson. Empirical minimization. Probability Theory and Related Fields, 135(3).
[5] J. Stephen Judd. On the complexity of loading shallow neural networks. Journal of Complexity, 4(3).
[6] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. Wiley and Sons.
[7] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56-85.
[8] Clint Scovel and Ingo Steinwart. Fast rates for support vector machines. In Peter Auer and Ron Meir, editors, Proceedings of the 18th Conference on Learning Theory (COLT 2005), volume 3559 of Lecture Notes in Computer Science, Bertinoro, Italy, 2005. Springer-Verlag.
[9] Vladimir N. Vapnik, Esther Levin, and Yann LeCun. Measuring the VC-dimension of a learning machine. Neural Computation, 6(5).
[10] Olivier Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. PhD thesis, Ecole Polytechnique.
[11] Alexandre B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32(1).
[12] Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473), March 2006.
[13] Pascal Massart. Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse, series 6, 9(2).
[14] Wee S. Lee, Peter L. Bartlett, and Robert C. Williamson. The importance of convexity in learning with squared loss. IEEE Transactions on Information Theory, 44(5).
[15] Shahar Mendelson. A few notes on statistical learning theory. In Shahar Mendelson and Alexander J. Smola, editors, Advanced Lectures in Machine Learning, volume 2600 of Lecture Notes in Computer Science. Springer-Verlag, Berlin.
[16] John E. Dennis, Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
[17] Noboru Murata. A statistical study of on-line learning. In David Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK.
[18] Yann Le Cun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient backprop. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science. Springer Verlag.
[19] Léon Bottou and Yann Le Cun. Large scale online learning. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
[20] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5.
[21] Thorsten Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference, Philadelphia, PA, August 2006. ACM Press.
[22] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Zoubin Ghahramani, editor, Proceedings of the 24th International Machine Learning Conference, Corvallis, OR, June 2007. ACM.
[23] Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton methods for large-scale logistic regression. In Zoubin Ghahramani, editor, Proceedings of the 24th International Machine Learning Conference, Corvallis, OR, June 2007. ACM.
[24] Don Hush, Patrick Kelly, Clint Scovel, and Ingo Steinwart. QP algorithms with guaranteed accuracy and run time for support vector machines. Journal of Machine Learning Research, 7, 2006.


More information

Inverse Trig Functions

Inverse Trig Functions Inverse Trig Functions c A Math Support Center Capsule February, 009 Introuction Just as trig functions arise in many applications, so o the inverse trig functions. What may be most surprising is that

More information

Chapter 9 AIRPORT SYSTEM PLANNING

Chapter 9 AIRPORT SYSTEM PLANNING Chapter 9 AIRPORT SYSTEM PLANNING. Photo creit Dorn McGrath, Jr Contents Page The Planning Process................................................... 189 Airport Master Planning..............................................

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

A Data Placement Strategy in Scientific Cloud Workflows

A Data Placement Strategy in Scientific Cloud Workflows A Data Placement Strategy in Scientific Clou Workflows Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen Faculty of Information an Communication Technologies, Swinburne University of Technology Hawthorn, Melbourne,

More information

Net Neutrality, Network Capacity, and Innovation at the Edges

Net Neutrality, Network Capacity, and Innovation at the Edges Net Neutrality, Network Capacity, an Innovation at the Eges Jay Pil Choi Doh-Shin Jeon Byung-Cheol Kim May 22, 2015 Abstract We stuy how net neutrality regulations affect a high-banwith content provier(cp)

More information

Lecture 6: Logistic Regression

Lecture 6: Logistic Regression Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

More information

Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

More information

Answers to the Practice Problems for Test 2

Answers to the Practice Problems for Test 2 Answers to the Practice Problems for Test 2 Davi Murphy. Fin f (x) if it is known that x [f(2x)] = x2. By the chain rule, x [f(2x)] = f (2x) 2, so 2f (2x) = x 2. Hence f (2x) = x 2 /2, but the lefthan

More information

Mandate-Based Health Reform and the Labor Market: Evidence from the Massachusetts Reform

Mandate-Based Health Reform and the Labor Market: Evidence from the Massachusetts Reform Manate-Base Health Reform an the Labor Market: Evience from the Massachusetts Reform Jonathan T. Kolsta Wharton School, University of Pennsylvania an NBER Amana E. Kowalski Department of Economics, Yale

More information

Optimal Control Of Production Inventory Systems With Deteriorating Items And Dynamic Costs

Optimal Control Of Production Inventory Systems With Deteriorating Items And Dynamic Costs Applie Mathematics E-Notes, 8(2008), 194-202 c ISSN 1607-2510 Available free at mirror sites of http://www.math.nthu.eu.tw/ amen/ Optimal Control Of Prouction Inventory Systems With Deteriorating Items

More information

Support Vector Machine. Tutorial. (and Statistical Learning Theory)

Support Vector Machine. Tutorial. (and Statistical Learning Theory) Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com 1 Support Vector Machines: history SVMs introduced

More information

View Synthesis by Image Mapping and Interpolation

View Synthesis by Image Mapping and Interpolation View Synthesis by Image Mapping an Interpolation Farris J. Halim Jesse S. Jin, School of Computer Science & Engineering, University of New South Wales Syney, NSW 05, Australia Basser epartment of Computer

More information

How To Understand The Structure Of A Can (Can)

How To Understand The Structure Of A Can (Can) Thi t t ith F M k 4 0 4 BOSCH CAN Specification Version 2.0 1991, Robert Bosch GmbH, Postfach 50, D-7000 Stuttgart 1 The ocument as a whole may be copie an istribute without restrictions. However, the

More information

Forecasting and Staffing Call Centers with Multiple Interdependent Uncertain Arrival Streams

Forecasting and Staffing Call Centers with Multiple Interdependent Uncertain Arrival Streams Forecasting an Staffing Call Centers with Multiple Interepenent Uncertain Arrival Streams Han Ye Department of Statistics an Operations Research, University of North Carolina, Chapel Hill, NC 27599, hanye@email.unc.eu

More information

SEC Issues Proposed Guidance to Fund Boards Relating to Best Execution and Soft Dollars

SEC Issues Proposed Guidance to Fund Boards Relating to Best Execution and Soft Dollars September 2008 / Issue 21 A legal upate from Dechert s Financial Services Group SEC Issues Propose Guiance to Fun Boars Relating to Best Execution an Soft Dollars The Securities an Exchange Commission

More information

Innovation Union means: More jobs, improved lives, better society

Innovation Union means: More jobs, improved lives, better society The project follows the Lisbon an Gothenburg Agenas, an supports the EU 2020 Strategy, in particular SMART Growth an the Innovation Union: Innovation Union means: More jobs, improve lives, better society

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information