From Maxent to Machine Learning and Back
1 From Maxent to Machine Learning and Back. T. Sears (ANU), March 2007.
2 50 Years Ago... "The principles and mathematical methods of statistical mechanics are seen to be of much more general applicability... In the problem of prediction, the maximization of entropy is not an application of a law of physics, but merely a method of reasoning which ensures that no unconscious arbitrary assumptions have been introduced." E.T. Jaynes, 1957
3 ... a method of reasoning... "Jenkins, if I want another yes-man I'll build one."
4 Outline: 1 Generalizing Maxent; 2 Two Examples; 3 Broader Comparisons; 4 Extensions/Conclusions.
5 You are here: 1 Generalizing Maxent.
6 Generalizing Maxent. The Classic Maxent Problem. Minimize negative entropy subject to linear constraints:

min_p S(p) := Σ_{i=1}^N p_i log(p_i)   subject to   Ap = b,  p_i ≥ 0

A is M × N with M < N, a wide matrix. b is a data vector. A := [B; 1^T] contains a normalization constraint.
7-13 Generalizing Maxent. Extending the Classic Maxent Problem.

min_p S(p)  subject to  Ap = b                        (the original problem)
min_p S(p) + δ_{0}(Ap − b)                            (convert the constraints to a convex function)
min_p S(p) + δ_{0}(‖Ap − b‖_P)                        (use any norm...)
min_p S(p) + δ_{εB_P}(Ap − b)                         (... and relax the constraints)
min_p F(p, p_0) + δ_{εB_P}(Ap − b)                    (generalize SBG entropy to a Bregman divergence)
min_μ F*(A^T μ + p_0*) − ⟨μ, b⟩ + ε‖μ‖_Q             (find the Fenchel dual problem to solve)

In the dual, F*(A^T μ + p_0*) − ⟨μ, b⟩ plays the role of the likelihood and ε‖μ‖_Q plays the role of the prior: it is a more general form of the MAP problem.
14-18 Generalizing Maxent. Characterizing the solution: compare to statistical models. After solving for μ* we can recover the optimal primal solution:

p* = ∇F*( A^T μ* + p_0* )

where ∇F* determines the family and the argument A^T μ* + p_0* is the score.

p* comes from a family of distributions. The entropy function (F) determines the family (∇F*). SBG entropy → exponential family. Any nice F → some family.
19 Generalizing Maxent. Generalizing the Exponential Family: the q-exponential.

exp_q(p) := [1 + (1 − q) p]_+^{1/(1−q)},   with exp_q(p) = exp(p) at q = 1.

(Figure: exp_q for q = 0.5, 1, 1.5; note the asymptote for q > 1.)
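The definition above is straightforward to implement. A minimal sketch (plain Python/NumPy; the function name is ours), checking the q → 1 limit and the q < 1 truncation:

```python
import numpy as np

def exp_q(p, q):
    """q-exponential: [1 + (1 - q) * p]_+ ** (1 / (1 - q)); exp(p) at q = 1."""
    if q == 1.0:
        return np.exp(p)
    base = np.maximum(1.0 + (1.0 - q) * p, 0.0)  # the [.]_+ truncation
    return base ** (1.0 / (1.0 - q))

print(exp_q(0.7, 1.0))         # the ordinary exponential, exp(0.7)
print(exp_q(0.7, 1.0 - 1e-8))  # q near 1 approaches exp(0.7)
print(exp_q(-3.0, 0.5))        # q < 1: an argument past the cutoff gives 0
```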
20 Generalizing Maxent. Tail Behavior. (Figure: tails of exp_q.) q > 1 naturally gives fat tails; q < 1 truncates the tail.
21 You are here: 2 Two Examples.
22-27 Two Examples. Loaded Die Example: setup.

A die with 6 faces. Expected value of 4.5, instead of 3.5 for a fair die. For this problem the constraints are normalization and the expected face value:

A = [ 1 1 1 1 1 1 ; 1 2 3 4 5 6 ]   and   b = [ 1 ; 4.5 ]

Find p*, assuming S → S_q, p_0 is uniform, ε = 0.
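The q = 1 (SBG) instance of this setup is the classic Brandeis dice problem and can be solved directly: the maxent solution has the exponential-family form p_i ∝ exp(λ i), with λ chosen so that the mean constraint holds. A sketch (plain NumPy with a simple bisection; variable names are ours):

```python
import numpy as np

faces = np.arange(1, 7)

def mean_at(lam):
    """Mean face value of the maxent (q = 1) distribution p_i ∝ exp(lam * i)."""
    w = np.exp(lam * faces)
    return faces @ w / w.sum()

# Bisect for lam with E[face] = 4.5 (mean_at is increasing in lam;
# mean_at(0) = 3.5 and mean_at(5) is close to 6, so a root lies in [0, 5]).
lo, hi = 0.0, 5.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_at(mid) < 4.5:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
p = np.exp(lam * faces)
p /= p.sum()
print(np.round(p, 4))  # weights increase monotonically in the face value; mean is 4.5
```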
28 Two Examples. Loaded Die Example. (Figure: probability of each face as a function of q.) The sensitivity of each event varies with q: higher q raises the weight on face 1 and face 6; the opposite holds for faces 3, 4, 5. Task: make a two-way market on each die face. Which is easiest?
29-32 Two Examples. Example: The Dantzig Selector (the entropy function as prior information).

Background: consider a variation on linear regression, ŷ = Xβ. Choose β via

min_β ‖β‖_1 + δ_{εB}(X^T(Xβ − y))

The non-zero entries of the solution can exactly identify the correct set of regressors with high probability under special conditions (Candès and Tao, Ann. Stat. 2007). Special conditions: low noise and a sparse true model β. Application area: compressed sensing.
33-35 Two Examples. Dantzig Selector Connection.

Change of variables (the +/− trick): β = [ I  −I ] p with p ≥ 0.
‖β‖_1 can be approached using S_q with q → 0.
The entropy function S_q captures part of the prior knowledge.
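The +/− trick is the usual split of β into positive and negative parts. A small NumPy sketch (names ours) showing that β = [I −I] p with p ≥ 0 recovers β, and that with the minimal split the l1 norm becomes the linear function 1^T p:

```python
import numpy as np

beta = np.array([1.5, -2.0, 0.0, 0.25])
n = beta.size

# Minimal split into positive and negative parts: p = [beta_+; beta_-], p >= 0.
p = np.concatenate([np.maximum(beta, 0.0), np.maximum(-beta, 0.0)])

# Change of variables: beta = [I  -I] p
M = np.hstack([np.eye(n), -np.eye(n)])
assert np.allclose(M @ p, beta)

# Under this split the l1 norm is linear in p:
print(p.sum())  # 3.75, which equals |1.5| + |-2.0| + |0.0| + |0.25|
```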
36 You are here: 3 Broader Comparisons.
37-40 Broader Comparisons. Making Broader Comparisons: value regularization.

Problem: model preferences over parameters can't be easily compared. Solution: compare outputs instead (Rifkin and Lippert, JMLR, 2007). Many methods can be viewed as also solving

min_y R(y) + L(y − b)

The regularizer R wants smooth outputs y. The loss L wants a close fit to the data b (e.g. match labels). These goals typically compete.
41-43 Broader Comparisons. Generalized Maxent and Value Regularization.

To apply this idea to maxent, change variables y = Ap. The regularizer corresponds to an image function:

R(y) = (AS)(y) = min_p { S(p) + δ_{0}(Ap − y) }

The loss is straightforward:

L(y) = δ_{εB_P}(y − b)
44-48 Broader Comparisons. SVMs and Value Regularization.

The Support Vector Machine (SVM, Vapnik) is one of the best known machine learning algorithms. Its loss function is the soft-margin hinge loss, max(0, 1 − b y). Its regularizer uses a data-dependent positive definite matrix K. In value-regularization terms the objective function is

(λ/2) y^T K^{−1} y + Σ_i hingeloss(y_i, b_i)

with the first term the regularizer R and the sum the loss L. Compare to the generalized maxent objective function:

(AS)(y) + δ_{εB_P}(y − b)

again with regularizer R and loss L.
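For concreteness, the SVM objective in value-regularization form can be evaluated directly. A sketch (NumPy; the kernel matrix and labels here are made up purely for illustration):

```python
import numpy as np

def svm_value_objective(y, b, K, lam):
    """(lam/2) y^T K^{-1} y + sum_i max(0, 1 - b_i * y_i)."""
    reg = 0.5 * lam * y @ np.linalg.solve(K, y)  # R: wants smooth outputs
    loss = np.maximum(0.0, 1.0 - b * y).sum()    # L: wants y_i to match labels b_i
    return reg + loss

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
K = X @ X.T + 1e-3 * np.eye(5)  # a data-dependent positive definite matrix
b = np.array([1.0, -1.0, 1.0, 1.0, -1.0])

print(svm_value_objective(b, b, K, lam=0.1))           # y = b: zero hinge loss, only R remains
print(svm_value_objective(np.zeros(5), b, K, lam=0.1)) # y = 0: R is zero, hinge loss is 5
```

The two evaluations show the competition named on the slide: matching the labels exactly leaves only the smoothness penalty, while the smoothest output y = 0 pays full hinge loss.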
49 You are here: 4 Extensions/Conclusions.
50-55 Extensions/Conclusions. Other Models, Briefly.
- Many NLP models owe a direct debt; the connection is easily seen.
- Conditional models, graphical models: these use the exponential family (SBG entropy) almost always, and often replace marginal distributions with empirical counterparts. A strong assumption, a big simplification.
- Non-probabilistic models: relax normalization; use the +/− trick.
- Continuous/mixed models: p becomes a function, A becomes an operator. Call in the mathematicians and approximation theory.
56-62 Extensions/Conclusions. Summary.
- There is a class of models based on convex functions which have interchangeable parts.
- Strong/exact connection to MAP estimation.
- Fenchel duality permits a quick switch of model assumptions.
- Benefit: the modular approach allows exploration of model space, by the modeler or the computer.
- Key required tool: flexible, non-smooth optimization.
- Harder: characterize the prior knowledge represented in the choice of regularizer and loss.
- Harder: incorporate/factor out knowledge of the task(s) to be performed with the model.
63 The End. Thank You.
64 You are here: 5 Appendix. Generalizing the Maxent Problem; The Consequences of Normalization; Phi-Exponential Families; p* as a Projection.
65-70 Appendix. Software for Experiments.
- Apply a quasi-Newton method (LMVM) to the dual problem.
- The objective function requires a matrix-vector multiplication (A^T v, with v of size M × 1).
- The gradient requires an additional matrix-vector multiplication (A v, with v of size N × 1).
- Built on PETSc/TAO/Elefant; runs single-process or parallel (MPI) with a simple switch.
- Additional features accommodate non-smooth duals to the constraint relaxations.
- Possible synergy: Choon-Hui, Alex, and Vishy announce a high-performance non-smooth optimization package.
71 Appendix: Generalizing the Maxent Problem. Classic Maxent Solution: the exponential family distribution.

The constraint is equivalent to: Ap = [B; 1^T] p = [b_B; 1]. Normalization is just another feature. Try to hide its existence in the solution:

p* = exp[A^T μ*]
   = exp[B^T μ_B* + 1 μ_1*]
   = exp[B^T μ_B* − 1 T*(μ_B*)]
   = (1/Z(μ_B*)) exp[B^T μ_B*]

T* is the log-partition function. Z is the partition function.
72 Appendix: Generalizing the Maxent Problem. Convex Analysis Recap: a quick detour.

The convex conjugate of a convex function F is

F*(p*) := sup_{p ∈ dom(F)} { ⟨p*, p⟩ − F(p) }.

F is Legendre if:
1. C = int(dom F) is non-empty;
2. F is differentiable on C;
3. |∇F(p)| → ∞ as p → bdry(dom F).

For Legendre functions (on int(dom F*)) we have p = ∇F*(p*).
73 Appendix: Generalizing the Maxent Problem. A More General Objective Function: the Bregman divergence.

F(p, q) := F(p) − F(q) − ⟨∇F(q), p − q⟩

Let q be uniform (q_i = 1/N) and let S be the SBG (negative) entropy. Then

S(p, q) = Σ_i [ p_i log(p_i) − q_i log(q_i) − (1 + log(q_i))(p_i − q_i) ]
        = −Σ_i p_i log(1/p_i) + Σ_i p_i log(N) − Σ_i p_i + Σ_i q_i
        = S(p) + log(N).

S(p, q) is the relative entropy when q is not uniform. But we are not restricted to SBG entropy...
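The uniform-reference computation above is easy to verify numerically. A sketch (NumPy; function names are ours) checking that the SBG Bregman divergence against the uniform distribution is S(p) + log N, and that against a general reference it is the relative entropy KL(p || q):

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman divergence: F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - gradF(q) @ (p - q)

S = lambda p: np.sum(p * np.log(p))  # negative SBG entropy
gradS = lambda p: 1.0 + np.log(p)

N = 4
p = np.array([0.1, 0.2, 0.3, 0.4])
u = np.full(N, 1.0 / N)

# Uniform reference: divergence is S(p) + log N
assert np.isclose(bregman(S, gradS, p, u), S(p) + np.log(N))

# General reference: divergence is the relative entropy KL(p || q)
q = np.array([0.4, 0.3, 0.2, 0.1])
assert np.isclose(bregman(S, gradS, p, q), np.sum(p * np.log(p / q)))
```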
74 Appendix: Generalizing the Maxent Problem. A More General Maxent Problem: the new objective function.

min_{p ∈ R^n} F(p, p_0)   subject to   Ap = b and p_i ≥ 0.

Solve it by using the Fenchel dual

max_{μ : A^T μ + p_0* ∈ dom F*}  −F*(A^T μ + p_0*) + ⟨b, μ⟩

where (if F is Legendre) p* = ∇F*(A^T μ* + p_0*).
75 Appendix: Generalizing the Maxent Problem. Solution to the Problem: new distribution families.

This solution is more general than, but similar to, the exponential family:

p* = ∇F*(B^T μ_B* + p_0* + 1 μ_1*)
   = ∇F*(B^T μ_B* + p_0* − 1 T*(μ_B*))

Here T*(μ_B) is defined implicitly via

1^T ∇F*(B^T μ_B + p_0* − 1 T*(μ_B)) = 1.
76 Appendix: The Consequences of Normalization. Scale Function Properties: the analog of the partition function.

T* is not simple to calculate. But we can deduce that T* is convex, and use implicit differentiation to calculate its gradient:

0 = (B − ∇T*(μ_B) 1^T) H 1,   where H := ∇²F*(B^T μ_B + p_0* − 1 T*(μ_B)),

which on rearrangement gives

∇T*(μ_B) = B H 1 / (1^T H 1) = B q*.
77 Appendix: The Consequences of Normalization. Escort Distribution.

When F is additively separable, q* is indeed a probability distribution. (Can you see why?)

q* := ∇²F* 1 / (1^T ∇²F* 1)   (Hessian of F* evaluated at the optimal dual point)

So B q* is an expectation. When does p* = q*?
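A numeric check for the SBG case (assuming, as is standard, that the conjugate of F(p) = Σ p_i log p_i is F*(θ) = Σ exp(θ_i − 1)): the Hessian of F* at the dual point of p* is diag(p*), so the escort q* equals p*, answering the question above for SBG entropy:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])  # some distribution p*

# For F(p) = sum p log p, F*(theta) = sum exp(theta - 1); at the dual point
# theta = gradF(p) = 1 + log p the Hessian is diag(exp(theta - 1)) = diag(p).
theta = 1.0 + np.log(p)
H = np.diag(np.exp(theta - 1.0))

escort = H @ np.ones(4) / (np.ones(4) @ H @ np.ones(4))
print(escort)  # equals p: for additively separable SBG entropy, q* = p*
```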
78-83 Appendix: Phi-Exponential Families. A Concrete Class of Entropies based on φ-logarithms.

Usual construction:  log(p) = ∫_1^p (1/x) dx
Deformed log:        log_φ(p) = ∫_1^p (1/φ(x)) dx

- Any positive increasing φ will do.
- Apply a scaling/smoothing normalization operation to obtain another such function: ψ(p).
- Form the negative entropy term: s_φ(p) = p log_ψ(1/p).
- This leads to a convenient gradient, ∇s_φ(p) = log_φ(p) + k_φ, and a φ-exponential family: p* = exp_φ[A^T μ* + p_0* − k_φ].
84 Appendix: Phi-Exponential Families. Example from the Physics Literature: φ(x) = x^q.

(Figure a: φ(p) = p^q for q = 0.5, 1, 1.5.)

Try this: pick q (between 0 and 2) and let φ(p) = p^q.
85 Appendix: Phi-Exponential Families. Example from the Physics Literature: φ(x) = x^q.

(Figure: log_φ = log_q for q = 0.5, 1, 1.5.)

This yields the q-logarithm from the non-extensive thermodynamics literature:

log_q(x) := (x^{1−q} − 1) / (1 − q)
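The q-logarithm and the q-exponential from slide 19 are inverse to each other where both are defined. A quick numeric check (plain NumPy; names are ours):

```python
import numpy as np

def log_q(x, q):
    """q-logarithm: (x^(1-q) - 1)/(1 - q), reducing to log(x) at q = 1."""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """q-exponential, inverse of log_q on its range."""
    if q == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

x = 2.3
for q in (0.5, 1.0, 1.5):
    assert np.isclose(exp_q(log_q(x, q), q), x)  # inverse pair
assert np.isclose(log_q(x, 1.0 - 1e-9), np.log(x), atol=1e-6)  # q -> 1 limit
```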
86 Appendix: Phi-Exponential Families. Example from the Physics Literature: φ(x) = x^q.

(Figure b: ψ(p) for q = 0.5, 1, 1.5.)

The scaling/smoothing operation produces ψ from φ. In this case the operation only scales and reparameterizes φ to yield ψ (a power function with exponent 2 − q).
87 Appendix: Phi-Exponential Families. Example from the Physics Literature: φ(x) = x^q.

(Figure d: log_ψ(p), proportional to log_{2−q}(p), for q = 0.5, 1, 1.5.)

Use this log to form the negative entropy: p log_ψ(1/p).
88 Appendix: Phi-Exponential Families. Example from the Physics Literature: φ(x) = x^q.

(Figure e: the negative entropy term s_φ(p) = p log_ψ(1/p) for q = 0.5, 1, 1.5.)

Only Legendre for q > 1. Why?
89-92 Appendix: p* as a Projection. Looking At Projections: q examples.

(Figures: the projection of p_0 onto the constraint set {p : Ap = b}, for several q.)

- q → 0: same as an orthogonal projection.
- q = 0.6: a curved projection.
- With the usual normalization: an oblique projection, which actually relates directly to projection under SBG entropy.
- q = 1.6: curved again.
93-96 Appendix: p* as a Projection. Four Views of Optimality.

- Solution to the primal problem: a Bregman projection of p_0 onto {p : Ap = b}.
- Intersection of e-flat and m-flat manifolds.
- The reverse-distance solution. Non-convex!
- Orthogonality conditions; sometimes used in algorithm design.
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationFixed Point Theorems
Fixed Point Theorems Definition: Let X be a set and let T : X X be a function that maps X into itself. (Such a function is often called an operator, a transformation, or a transform on X, and the notation
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationThe p-norm generalization of the LMS algorithm for adaptive filtering
The p-norm generalization of the LMS algorithm for adaptive filtering Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology
More informationALMOST COMMON PRIORS 1. INTRODUCTION
ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type
More informationLinear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.
1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationDuality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725
Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationLecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationBig Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationElasticity Theory Basics
G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold
More informationThe Cobb-Douglas Production Function
171 10 The Cobb-Douglas Production Function This chapter describes in detail the most famous of all production functions used to represent production processes both in and out of agriculture. First used
More informationLinear Programming. March 14, 2014
Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1
More informationPart II Redundant Dictionaries and Pursuit Algorithms
Aisenstadt Chair Course CRM September 2009 Part II Redundant Dictionaries and Pursuit Algorithms Stéphane Mallat Centre de Mathématiques Appliquées Ecole Polytechnique Sparsity in Redundant Dictionaries
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationTrading regret rate for computational efficiency in online learning with limited feedback
Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai
More informationMarkov Chain Monte Carlo Simulation Made Simple
Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical
More informationNatural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression
Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15
More informationDuality of linear conic problems
Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least
More informationReview of Fundamental Mathematics
Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationIncreasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.
1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.
More informationBindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8
Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e
More informationLevel Set Framework, Signed Distance Function, and Various Tools
Level Set Framework Geometry and Calculus Tools Level Set Framework,, and Various Tools Spencer Department of Mathematics Brigham Young University Image Processing Seminar (Week 3), 2010 Level Set Framework
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationLECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method
LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method Introduction to dual linear program Given a constraint matrix A, right
More informationSupport Vector Machine. Tutorial. (and Statistical Learning Theory)
Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com 1 Support Vector Machines: history SVMs introduced
More informationCHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.
Some Continuous Probability Distributions CHAPTER 6: Continuous Uniform Distribution: 6. Definition: The density function of the continuous random variable X on the interval [A, B] is B A A x B f(x; A,
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationWhat is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
More informationDefinition and Properties of the Production Function: Lecture
Definition and Properties of the Production Function: Lecture II August 25, 2011 Definition and : Lecture A Brief Brush with Duality Cobb-Douglas Cost Minimization Lagrangian for the Cobb-Douglas Solution
More informationConstrained optimization.
ams/econ 11b supplementary notes ucsc Constrained optimization. c 2010, Yonatan Katznelson 1. Constraints In many of the optimization problems that arise in economics, there are restrictions on the values
More informationLeast-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More information3 An Illustrative Example
Objectives An Illustrative Example Objectives - Theory and Examples -2 Problem Statement -2 Perceptron - Two-Input Case -4 Pattern Recognition Example -5 Hamming Network -8 Feedforward Layer -8 Recurrent
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationDual Methods for Total Variation-Based Image Restoration
Dual Methods for Total Variation-Based Image Restoration Jamylle Carter Institute for Mathematics and its Applications University of Minnesota, Twin Cities Ph.D. (Mathematics), UCLA, 2001 Advisor: Tony
More informationAn Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.
An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity
More informationA Potential-based Framework for Online Multi-class Learning with Partial Feedback
A Potential-based Framework for Online Multi-class Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationDuality in Linear Programming
Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow
More informationOn Adaboost and Optimal Betting Strategies
On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of
More informationSome stability results of parameter identification in a jump diffusion model
Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss
More informationLinear Programming in Matrix Form
Linear Programming in Matrix Form Appendix B We first introduce matrix concepts in linear programming by developing a variation of the simplex method called the revised simplex method. This algorithm,
More informationLog-Linear Models. Michael Collins
Log-Linear Models Michael Collins 1 Introduction This note describes log-linear models, which are very widely used in natural language processing. A key advantage of log-linear models is their flexibility:
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationPractical Guide to the Simplex Method of Linear Programming
Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationLinear Programming Notes V Problem Transformations
Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material
More informationLasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationLecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
More information10. Proximal point method
L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing
More informationReview D: Potential Energy and the Conservation of Mechanical Energy
MSSCHUSETTS INSTITUTE OF TECHNOLOGY Department of Physics 8.01 Fall 2005 Review D: Potential Energy and the Conservation of Mechanical Energy D.1 Conservative and Non-conservative Force... 2 D.1.1 Introduction...
More information1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.
Introduction Linear Programming Neil Laws TT 00 A general optimization problem is of the form: choose x to maximise f(x) subject to x S where x = (x,..., x n ) T, f : R n R is the objective function, S
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of
More informationInternational Doctoral School Algorithmic Decision Theory: MCDA and MOO
International Doctoral School Algorithmic Decision Theory: MCDA and MOO Lecture 2: Multiobjective Linear Programming Department of Engineering Science, The University of Auckland, New Zealand Laboratoire
More informationFitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,
More informationTwo-Stage Stochastic Linear Programs
Two-Stage Stochastic Linear Programs Operations Research Anthony Papavasiliou 1 / 27 Two-Stage Stochastic Linear Programs 1 Short Reviews Probability Spaces and Random Variables Convex Analysis 2 Deterministic
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationNonlinear Regression:
Zurich University of Applied Sciences School of Engineering IDP Institute of Data Analysis and Process Design Nonlinear Regression: A Powerful Tool With Considerable Complexity Half-Day : Improved Inference
More informationLinear Discrimination. Linear Discrimination. Linear Discrimination. Linearly Separable Systems Pairwise Separation. Steven J Zeil.
Steven J Zeil Old Dominion Univ. Fall 200 Discriminant-Based Classification Linearly Separable Systems Pairwise Separation 2 Posteriors 3 Logistic Discrimination 2 Discriminant-Based Classification Likelihood-based:
More informationOperation Research. Module 1. Module 2. Unit 1. Unit 2. Unit 3. Unit 1
Operation Research Module 1 Unit 1 1.1 Origin of Operations Research 1.2 Concept and Definition of OR 1.3 Characteristics of OR 1.4 Applications of OR 1.5 Phases of OR Unit 2 2.1 Introduction to Linear
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationVariational Mean Field for Graphical Models
Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not tree-structured) All basic
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationEpipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R.
Epipolar Geometry We consider two perspective images of a scene as taken from a stereo pair of cameras (or equivalently, assume the scene is rigid and imaged with a single camera from two different locations).
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationNon-Inferiority Tests for Two Proportions
Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which
More informationErrata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page
Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More information5 Scalings with differential equations
5 Scalings with differential equations 5.1 Stretched coordinates Consider the first-order linear differential equation df dx + f = 0. Since it is first order, we expect a single solution to the homogeneous
More informationFrom Sparse Approximation to Forecast of Intraday Load Curves
From Sparse Approximation to Forecast of Intraday Load Curves Mathilde Mougeot Joint work with D. Picard, K. Tribouley (P7)& V. Lefieux, L. Teyssier-Maillard (RTE) 1/43 Electrical Consumption Time series
More informationOnline Convex Optimization
E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More informationLinear Programming. April 12, 2005
Linear Programming April 1, 005 Parts of this were adapted from Chapter 9 of i Introduction to Algorithms (Second Edition) /i by Cormen, Leiserson, Rivest and Stein. 1 What is linear programming? The first
More informationBIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION
BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION Ş. İlker Birbil Sabancı University Ali Taylan Cemgil 1, Hazal Koptagel 1, Figen Öztoprak 2, Umut Şimşekli
More informationVariational approach to restore point-like and curve-like singularities in imaging
Variational approach to restore point-like and curve-like singularities in imaging Daniele Graziani joint work with Gilles Aubert and Laure Blanc-Féraud Roma 12/06/2012 Daniele Graziani (Roma) 12/06/2012
More informationOptimization. J(f) := λω(f) + R emp (f) (5.1) m l(f(x i ) y i ). (5.2) i=1
5 Optimization Optimization plays an increasingly important role in machine learning. For instance, many machine learning algorithms minimize a regularized risk functional: with the empirical risk min
More informationSummer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.
Summer course on Convex Optimization Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.Minnesota Interior-Point Methods: the rebirth of an old idea Suppose that f is
More informationMulti-variable Calculus and Optimization
Multi-variable Calculus and Optimization Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Multi-variable Calculus and Optimization 1 / 51 EC2040 Topic 3 - Multi-variable Calculus
More information