Multiplicative Update Algorithms, Boosting and Ensemble Methods


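CS369M: Algorithms for Modern Massive Data Set Analysis
Lecture 16 - 11/11/2009
Multiplicative Update Algorithms, Boosting and Ensemble Methods
Lecturer: Michael Mahoney
Scribes: Mark Wagner and Yuning Sun
*Unedited Notes

1 Graph Partitioning

Expansion of a cut, minimized over all cuts:

$$\phi = \min_{S \subseteq V} \frac{E(S,\bar S)}{\frac{1}{n}|S|\,|\bar S|} \qquad (1)$$

This can be relaxed to real numbers (the spectral relaxation):

$$d\lambda_2 = \min_{x \in \mathbb{R}^V} \frac{\sum_{ij} A_{ij}(x_i - x_j)^2}{\frac{1}{n}\sum_{ij}(x_i - x_j)^2} \qquad (2)$$

or it can be relaxed to vectors.

Claim:
$$d\lambda_2 = \min_{x_j \in \mathbb{R}^n} \frac{\sum_{ij} A_{ij}\|x_i - x_j\|_2^2}{\frac{1}{n}\sum_{ij}\|x_i - x_j\|_2^2} \qquad (3)$$

Proof: (3) is a relaxation of (2), so its minimum is at most that of (2). Conversely, the numerator and the denominator of (3) are each sums over the n coordinates of the vectors, so the ratio is at least the smallest per-coordinate ratio, and each per-coordinate ratio is a feasible value of (2).

Claim: (3) is equal to the SDP
$$\min \sum_{ij} A_{ij}\|x_i - x_j\|_2^2 \quad \text{s.t.} \quad \sum_{ij}\|x_i - x_j\|_2^2 = n,$$
which is equal to
$$\min\ \mathrm{Tr}(L_G X) \quad \text{s.t.} \quad \mathrm{Tr}(L_{K_n} X) = n,\ X \succeq 0. \qquad (4)$$

This problem is harder (an SDP versus an eigenvalue problem), but it is useful: look at the duals, which include extra information.

Fact: the dual of (4) is
$$\max\ y \quad \text{s.t.} \quad L_G \succeq \frac{y}{n} L_{K_n}.$$
A feasible solution is a number y and a matrix $Y \succeq 0$ such that $L_G = \frac{y}{n}L_{K_n} + Y$.

Recall: $|S|\,|\bar S| = 1_S^T L_{K_n} 1_S$ and $E(S,\bar S) = 1_S^T L_G 1_S$. So the cost of every cut is at least y. What is going on here?
$$1_S^T L_G 1_S = 1_S^T\Big(\frac{y}{n}L_{K_n} + Y\Big)1_S \ge \frac{y}{n}\,1_S^T L_{K_n} 1_S,$$
i.e. $\frac{E(S,\bar S)}{\frac{1}{n}|S|\,|\bar S|} \ge y$ for every S.

Recall: this embeds a scaled version of the complete graph in G. We know the expansion and cut values for $K_n$, and so can relate them to G. Note: $K_n$ is an expander.

Flow: if a graph H of known expansion can be embedded in G as a flow, then $h_H \le h_G$. The optimal solution for a fixed H can then be computed as the solution to a concurrent multicommodity flow problem. This gives an O(log n) approximation, which is tight.

Spectral: relax to an eigenvalue problem and use Cheeger's inequality.

ARV-type methods: can I iteratively construct a graph H (and test its expansion), stop when it is a good expander, and get a bound on $h_G$? Yes: write it as an SDP. It can be computed faster by using primal-dual ideas.
ARV: the original, $O(\sqrt{\log n})$ approximation.
AHK: primal-dual method from theoretical computer science. Both ARV and AHK use multicommodity flows.
KRV: single-commodity flows, using the cut-matching game.
OSVV: extends KRV.
LMO: empirical evaluation; describes the method as a modified spectral method.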
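As a small numerical check of relaxation (2) and the dual certificate of (4), the sketch below computes $\phi$ from (1) by brute force on a tiny graph and compares it with the minimum in (2), computed directly as the second-smallest eigenvalue of the combinatorial Laplacian $L_G = D - A$ (for $x \perp \mathbf{1}$ the ratio in (2) equals $x^T L_G x / x^T x$). It then checks that $y = \lambda_2(L_G)$ satisfies the dual constraint $L_G \succeq \frac{y}{n}L_{K_n}$, so every cut has expansion at least y. The helper names and the example graph are illustrative assumptions, not part of the notes; NumPy is assumed to be available.

```python
import itertools

import numpy as np


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L


def expansion_bruteforce(n, edges):
    """phi from (1): min over cuts of E(S, S_bar) / ((1/n)|S||S_bar|).

    Enumerates every nonempty proper subset S, so only usable for tiny graphs.
    """
    best = float("inf")
    for k in range(1, n):
        for subset in itertools.combinations(range(n), k):
            side = set(subset)
            cut = sum(1 for i, j in edges if (i in side) != (j in side))
            best = min(best, cut / (k * (n - k) / n))
    return best


def spectral_value(n, edges):
    """Minimum of relaxation (2): second-smallest eigenvalue of L_G."""
    return np.linalg.eigvalsh(laplacian(n, edges))[1]  # eigenvalues ascending


def dual_is_feasible(n, edges, y, tol=1e-9):
    """Dual of (4): is L_G - (y/n) L_{K_n} positive semidefinite?"""
    L_K = n * np.eye(n) - np.ones((n, n))  # Laplacian of the complete graph K_n
    M = laplacian(n, edges) - (y / n) * L_K
    return np.linalg.eigvalsh(M).min() >= -tol


# Hypothetical example: two triangles joined by a single edge.
n = 6
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = expansion_bruteforce(n, edges)  # 2/3, attained by S = {0, 1, 2}
lam2 = spectral_value(n, edges)
print(phi, lam2)
```

Any y larger than $\lambda_2(L_G)$ violates the dual constraint (the Fiedler vector is a witness), which is why the spectral value is exactly the best certificate of this form.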

2 Online Learning

Prediction/inference problem: given data, predict something. There are different ways to formalize this, depending on assumptions about what the data are (real numbers, graphs, strings), where they come from or how they are generated (e.g. according to an underlying distribution), and whether there is access to side information.

Traditional statistics: the data are generated according to an underlying distribution; learn parameters describing the distribution; evaluate quality by the risk, the expected value of some loss function over the distribution of the data (ERM, SRM).

What if the data are not generated by some underlying process? With no assumptions, it is hard to predict.

Idea: get data elements $\{(y_i, x_i)\}$ sequentially and predict the next element. Performance is evaluated by a loss function, e.g. the number of incorrect predictions. We have access to side information, namely the predictions of a set of experts. Experts make predictions according to some rule: deterministic, random, adversarial, etc. At each time step, the experts also incur a loss.

Goal: our loss should not be much worse than that of the best expert.

Also: when predicting at time t, you have access to your own past predictions and losses, and to the experts' past predictions and losses.

What are the experts? An oracle, a statistical model, certain steps in an algorithm, basis functions.

3 Multiplicative Weights Update Rule

Maintain a probability distribution over the experts.

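At each step, increase or decrease each weight multiplicatively, i.e. by multiplying it by $(1 \pm \varepsilon)$. The parameter $\varepsilon$ judges how much confidence to place in an expert's prediction; it acts as a regularization parameter.

Discrete experts: a set of experts E makes predictions $f_{E_i,t}$. The weights on the experts are a vector in $\{x \in \mathbb{R}^n : x_i \ge 0,\ \sum_{i=1}^n x_i = 1\}$.
$\ell_t(i)$ = loss of expert i at stage t.
$\ell(\hat p_t, y_t) = \sum_{i=1}^n x_{t,i}\,\ell_t(i)$ = loss of the algorithm.

Algorithm:
1. Initialize $w_{1,i} = 1$ for every expert i.
2. When $y_t$ and the experts' predictions are revealed, the algorithm uses the update rule
$$w_{t+1,i} = w_{t,i}(1-\varepsilon)^{\ell_t(i)} = (1-\varepsilon)^{\sum_{\tau=1}^{t}\ell_\tau(i)} = e^{-\eta\sum_{\tau=1}^{t}\ell_\tau(i)}, \quad \text{where } \eta = -\log(1-\varepsilon),$$
and predicts with $x_{t,i} = w_{t,i}/W_t$, where $W_t = \sum_{i=1}^n w_{t,i}$.

Thm: for losses in [0, 1] and $\varepsilon \le 1/2$, for any expert $E_j$, $j \in [n]$,
$$\sum_{t=1}^T \ell(\hat p_t, y_t) \le \frac{\log n}{\varepsilon} + (1+\varepsilon)\sum_{t=1}^T \ell_t(j).$$

Proof: use a potential function argument. First, relate the potential function $W_t$ to each expert:
$$W_{T+1} \ge w_{T+1,j} = (1-\varepsilon)^{\sum_{t=1}^T \ell_t(j)} = e^{-\eta\sum_{t=1}^T \ell_t(j)}.$$
Next, relate the potential function to the performance of the algorithm:
$$W_{t+1} = \sum_{i=1}^n w_{t+1,i} = \sum_{i=1}^n w_{t,i}(1-\varepsilon)^{\ell_t(i)}.$$
Note that $(1-\varepsilon)^x \le 1 - \varepsilon x$ for $0 \le x \le 1$, by convexity. So
$$W_{t+1} \le \sum_{i=1}^n w_{t,i}\big(1 - \varepsilon\,\ell_t(i)\big) = W_t\Big(1 - \varepsilon\sum_{i=1}^n \frac{w_{t,i}}{W_t}\ell_t(i)\Big) = W_t\big(1 - \varepsilon\,\ell(\hat p_t, y_t)\big) \le W_t \exp\big(-\varepsilon\,\ell(\hat p_t, y_t)\big),$$
and since $W_1 = n$,
$$W_{T+1} \le n \exp\Big(-\varepsilon\sum_{t=1}^T \ell(\hat p_t, y_t)\Big).$$
Combining the two bounds,
$$e^{-\eta\sum_t \ell_t(j)} \le n\,e^{-\varepsilon\sum_t \ell(\hat p_t, y_t)} \;\Longrightarrow\; \sum_{t=1}^T \ell(\hat p_t, y_t) \le \frac{\log n}{\varepsilon} + \frac{\eta}{\varepsilon}\sum_{t=1}^T \ell_t(j) \le \frac{\log n}{\varepsilon} + (1+\varepsilon)\sum_{t=1}^T \ell_t(j),$$
using $\eta/\varepsilon = -\log(1-\varepsilon)/\varepsilon \le 1+\varepsilon$ for $\varepsilon \le 1/2$.

Define the regret, with $f^*$ the best expert:
$$R_T = \sum_{t=1}^T \ell(\hat p_t, y_t) - \min_{f \in E} \sum_{t=1}^T \ell_t(f) \le \frac{\log n}{\varepsilon} + \varepsilon\sum_{t=1}^T \ell_t(f^*) \le 2\sqrt{T\log n} \quad \text{if } \varepsilon = \sqrt{\frac{\log n}{T}},$$
where the last step uses $\sum_t \ell_t(f^*) \le T$ (this choice of $\varepsilon$ needs $T \ge 4\log n$ so that $\varepsilon \le 1/2$).

Q: is log n large or small?

If we are given the extra information that one expert will be perfect, we can find the best expert in log n mistakes. The multiplicative weights update rule says that, in more general cases, you are not much worse than this scenario.

Applications to algorithms:
AHK: generalize the losses to matrix losses to solve SDPs in $O(n^2)$ time.
KRV: the cut-matching game, used to solve sparsest cut. There are two players: a cut player and a matching player.
1. $G_0 = \emptyset$ (no edges).
2. In each round t, the cut player chooses a bisection $(S, \bar S)$ and the matching player chooses a perfect matching $M_t$ across $(S, \bar S)$. Then $G_{t+1} \leftarrow G_t + M_t$.
3. The game stops when $G_t$ is an expander (e.g. once its expansion reaches a constant such as 1/10).
4. The value of the game is the number of steps it took. Goal: the cut player wants to stop soon (find an expander fast); the matching player wants to delay the stop.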
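The multiplicative weights algorithm and its regret bound from Section 3 can be sketched as follows. This is a minimal illustration assuming losses in [0, 1] and the $(1-\varepsilon)^{\ell_t(i)}$ update exactly as in the algorithm above; the function name and the random loss sequence are hypothetical, chosen only to exercise the bound, which holds for any loss sequence.

```python
import math
import random


def multiplicative_weights(loss_rounds, eps):
    """(1 - eps)^loss multiplicative weights over n experts.

    loss_rounds: sequence of rounds; round t is a list with l_t(i) in [0, 1]
    for each expert i. Returns (algorithm's total loss, each expert's total
    loss), where the algorithm's loss in round t is sum_i x_{t,i} l_t(i).
    """
    n = len(loss_rounds[0])
    w = [1.0] * n                      # w_{1,i} = 1, so W_1 = n
    alg_total = 0.0
    expert_totals = [0.0] * n
    for losses in loss_rounds:
        W = sum(w)                     # potential W_t
        x = [wi / W for wi in w]       # distribution x_t over experts
        alg_total += sum(xi * li for xi, li in zip(x, losses))
        for i, li in enumerate(losses):
            expert_totals[i] += li
            w[i] *= (1.0 - eps) ** li  # w_{t+1,i} = w_{t,i} (1 - eps)^{l_t(i)}
    return alg_total, expert_totals


# Arbitrary loss sequence: the bound does not assume any distribution.
rng = random.Random(0)
n, T = 10, 1000
loss_rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
eps = math.sqrt(math.log(n) / T)       # tuning from the regret bound
alg_total, expert_totals = multiplicative_weights(loss_rounds, eps)
best = min(expert_totals)
print(alg_total - best, 2 * math.sqrt(T * math.log(n)))  # regret vs. its bound
```

With one perfect expert (loss 0 every round) and all others at loss 1, the same code illustrates the "not much worse than knowing the perfect expert" remark: the theorem then caps the algorithm's total loss at $\log n / \varepsilon$.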

Dual algorithm:
1. Let G = γg.
2. Approximate the 2nd eigenvector of $L_G$. The degree of approximation is governed by a regularization parameter.
3. Use the bisection $(S_{n/2}, \bar S_{n/2})$ from the sweep cut. Call a flow-based improvement algorithm to get a cut $(T, \bar T)$ and a matching M. Let G = G + M, and repeat until the stopping rule is satisfied.
Return the best cut $(T, \bar T)$.

Why would you hope or expect that these multiplicative weights update algorithms would perform well in practice? They are faster than naive computation, and they often give better answers than the exact algorithm.

Boosting: an example of an ensemble method

Given X, learn $C: X \to \{0, 1\}$, a classification rule from some concept class $\mathcal{C}$. Risk = E[error]. Define a γ-weak learning algorithm to be one that has error at most $\frac{1}{2} - \gamma$. A strong learning algorithm has error at most ε. Can one combine a set of weak learners into a strong learner?

Idea: weak learners are a little better than chance, so combining them does not make things worse. If they are different, then we can hope for improvement by averaging their predictions.

Boosting: AdaBoost. Do boosting by sampling: take a sample of the data and use the algorithm to boost on that sample. Do this iteratively, updating weights on the data points in order to find a new classification rule for the data points that are misclassified.
Events: the hypotheses for the classification rule output at each step.
At each step, get a classification rule $h_t$ (a weak learner); the final classification algorithm uses the $h_t$ to make its prediction.
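The boosting loop described above can be sketched as standard AdaBoost (Freund and Schapire) in its reweighting form, rather than the boosting-by-sampling variant mentioned in the notes. The assumptions here are: labels are encoded as {-1, +1} instead of {0, 1}; the weak learner is a 1-D decision stump; and the toy data and function names are hypothetical. The weight update on the data points is multiplicative, which is the link to Section 3.

```python
import math


def stump_predict(theta, s, x):
    """Decision stump: predict s if x > theta, else -s (labels in {-1, +1})."""
    return s if x > theta else -s


def best_stump(xs, ys, d):
    """Weak learner: the stump with the smallest weighted error under d."""
    best = None
    for theta in sorted(set(xs)) + [min(xs) - 1]:
        for s in (1, -1):
            err = sum(di for xi, yi, di in zip(xs, ys, d)
                      if stump_predict(theta, s, xi) != yi)
            if best is None or err < best[0]:
                best = (err, theta, s)
    return best


def adaboost(xs, ys, rounds):
    m = len(xs)
    d = [1.0 / m] * m                      # weights on the data points
    ensemble = []                          # (alpha_t, theta_t, s_t) per round
    for _ in range(rounds):
        err, theta, s = best_stump(xs, ys, d)
        if err >= 0.5:                     # no better than chance: stop
            break
        err = max(err, 1e-12)              # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, s))
        # Multiplicative update: misclassified points gain weight, others lose it.
        d = [di * math.exp(-alpha * yi * stump_predict(theta, s, xi))
             for xi, yi, di in zip(xs, ys, d)]
        z = sum(d)
        d = [di / z for di in d]
    return ensemble


def predict(ensemble, x):
    """Final classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(a * stump_predict(theta, s, x) for a, theta, s in ensemble)
    return 1 if score >= 0 else -1


# Positive on an interval: no single stump fits it.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, 1, 1, -1, -1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data the best single stump misclassifies 2 of the 8 points, while the weighted vote of three stumps classifies all of them correctly, which is the "combine weak learners into a strong learner" idea in miniature.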