Transfer Learning and Prior Estimation for VC Classes


15-859(B) Machine Learning Theory
Transfer Learning and Prior Estimation for VC Classes
Liu Yang, 04/25/2012

Notation
- Instance space X = R^n.
- Concept space C of classifiers h: X -> {0,1}.
  - Assume C has VC dimension vc < ∞.
- Data distribution D on X.
- Unknown target function h*: the true labeling function (realizable case: h* ∈ C).
- Assume ρ(h, g) = P_{x~D}[h(x) ≠ g(x)], for any classifiers h, g, is a metric on C.
- err(h) = P_{x~D}[h(x) ≠ h*(x)]. (A small Monte Carlo sketch of these quantities follows.)
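To make the notation concrete, here is a minimal Monte Carlo sketch (my addition, not from the slides): D is taken to be a standard Gaussian on R^n, and h, g, h* are hypothetical linear classifiers chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: D = N(0, I) on X = R^n; h, g, h* are linear classifiers.
n = 3
w_h, w_g, w_star = (rng.standard_normal(n) for _ in range(3))
h = lambda X: (X @ w_h >= 0).astype(int)
g = lambda X: (X @ w_g >= 0).astype(int)
h_star = lambda X: (X @ w_star >= 0).astype(int)  # the target h*

X = rng.standard_normal((100_000, n))    # i.i.d. sample from D
rho_hg = np.mean(h(X) != g(X))           # estimates rho(h, g) = P[h(x) != g(x)]
err_h = np.mean(h(X) != h_star(X))       # estimates err(h) = rho(h, h*)
print(rho_hg, err_h)
```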

Transfer Learning
- Principle: solving a new learning problem is easier given that we have solved several already.
- How does it help?
  - The new task may be directly "related" to previous tasks [e.g., Ben-David & Schuller 03; Evgeniou, Micchelli, & Pontil 05].
  - Previous tasks may give us useful sub-concepts [e.g., Thrun 96].
  - We can gather statistical information on the variety of concepts [e.g., Baxter 97; Ando & Zhang 04].
- Example: speech recognition.
  - After training a few times, we have figured out the dialects.
  - Next time, just identify the dialect.
  - Much easier than training a recognizer from scratch.

Model of Transfer Learning
- Motivation: learners are often not too altruistic.
- Layer 1: draw tasks i.i.d. from an unknown prior: target functions h_1*, h_2*, ..., h_T* ~ prior.
- Layer 2: per task t, draw data i.i.d. from the target: (x_1^t, y_1^t), ..., (x_k^t, y_k^t).
- As tasks accumulate, we obtain a better estimate of the prior.
[Figure: two-layer sampling diagram; Tasks 1, ..., T drawn from the prior, each task t generating labeled examples (x_1^t, y_1^t), ..., (x_k^t, y_k^t).]

Identifiability of Priors from Joint Distributions
- Let the prior π be any distribution on C.
  - Example: (w, b) ~ multivariate normal.
- Target h*_π ~ π.
- Data X = (X_1, X_2, ...) i.i.d. D, independent of h*_π.
- Z(π) = ((X_1, h*_π(X_1)), (X_2, h*_π(X_2)), ...).
- Let [m] = {1, ..., m}.
- Denote X_I = {X_i : i ∈ I} (I a subset of the natural numbers), and Z_I(π) = {(X_i, h*_π(X_i)) : i ∈ I}.
- Theorem: Z_[vc](π_1) =_d Z_[vc](π_2) iff π_1 = π_2 (where =_d denotes equality in distribution).

Identifiability of Priors from VC-Dimensional Joint Distributions
- Thresholds (vc = 1): for two points x_1 < x_2, the realizable labelings of (x_1, x_2) are (-,-), (-,+), (+,+), so
  Pr(+,+) = Pr(x_1 is +), Pr(-,-) = Pr(x_2 is -), Pr(+,-) = 0,
  and hence Pr(-,+) = Pr(x_2 is +) - Pr(+,+) = Pr(x_2 is +) - Pr(x_1 is +).
- For any k > 1 points, we can directly reduce the number of labels in the joint probability from k to 1. For example:
  P((-,+)) = P(x_2 is +) - P((+,+))
           = P(x_2 is +) - P(x_1 is +) + P((+,-))   (an unrealized labeling, so this term is 0)
           = P(x_2 is +) - P(x_1 is +).
  So the 1-dimensional (i.e., vc-dimensional) joints determine all higher-dimensional joints. (A numerical check of these identities follows.)
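Here is a quick simulation (my addition) of the threshold identities above, assuming D = Uniform[0,1] and a uniform prior over threshold locations; both choices are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior over thresholds: t ~ Uniform[0,1]; h_t(x) = + iff x >= t.
x1, x2 = 0.3, 0.7                       # two fixed points, x1 < x2
T = rng.uniform(0, 1, size=1_000_000)   # draws of the target threshold

lab1 = (x1 >= T)   # True means x1 labeled +
lab2 = (x2 >= T)   # True means x2 labeled +

print((lab1 & ~lab2).mean())                              # Pr(+,-) ~ 0 (unrealized)
print((lab1 & lab2).mean(), lab1.mean())                  # Pr(+,+) = Pr(x1 is +)
print((~lab1 & lab2).mean(), lab2.mean() - lab1.mean())   # Pr(-,+) = Pr(x2 +) - Pr(x1 +)
```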

Theorem (Proof Sketch): Z_[vc](π_1) =_d Z_[vc](π_2) iff π_1 = π_2
- Let ρ_m(h, g) = (1/m) Σ_{i=1}^m 1[h(X_i) ≠ g(X_i)].
- Then vc < ∞ implies that, with probability 1, for all h, g ∈ C with h ≠ g,
  lim_{m→∞} ρ_m(h, g) = ρ(h, g) > 0.
- ρ is a metric on C by assumption, so w.p. 1 each h ∈ C labels the sequence (X_1, X_2, ...) distinctly: (h(X_1), h(X_2), ...).
- Hence, w.p. 1, the conditional distribution of the label sequence, Z(π) | X, identifies π.
- Hence the distribution of Z(π) identifies π, i.e., Z(π_1) =_d Z(π_2) implies π_1 = π_2. (A small simulation of ρ_m → ρ follows.)

Identifiability of Priors from Joint Distributions
[Figure: the reduction steps, via lower-dimensional conditional distributions and label vectors y moved closer to an unrealized labeling ỹ.]
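A small numerical illustration (my addition) of ρ_m converging to ρ, assuming threshold classifiers and D = Uniform[0,1], where ρ(h, g) = |a - b| for thresholds at a and b:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two distinct thresholds; under D = Uniform[0,1], rho(h, g) = |a - b| = 0.35.
a, b = 0.25, 0.6
h = lambda x: (x >= a).astype(int)
g = lambda x: (x >= b).astype(int)

for m in [100, 10_000, 1_000_000]:
    X = rng.uniform(0, 1, size=m)
    rho_m = np.mean(h(X) != g(X))   # empirical distance rho_m(h, g)
    print(m, rho_m)                 # approaches 0.35 as m grows
```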

Identifiability of Priors from Joint Distributions (continued)
[Two slides of figures only; no text transcribed.]

Transfer Learning Setting
- Collection Π of distributions on C (known).
- Target distribution π* ∈ Π (unknown).
- Independent target functions h_1*, ..., h_T* ~ π* (unknown).
- Independent i.i.d.-D data sets X^(t) = (X_1^(t), X_2^(t), ...), t ∈ [T].
- Define Z^(t) = ((X_1^(t), h_t*(X_1^(t))), (X_2^(t), h_t*(X_2^(t))), ...).
- The learning algorithm gets Z^(1), then produces ĥ_1, then gets Z^(2), then produces ĥ_2, etc., in sequence. (This two-layer sampling process is sketched in code below.)
- Interested in: the values ρ(ĥ_t, h_t*), and the number of label values h_t*(X_j^(t)) the algorithm needs to access.

Estimating the Prior
- Principle: learning would be easier if we knew π*.
- Fact: π* is identifiable from the distribution of Z^(t)_[vc].
- Strategy: take the samples Z^(i)_[vc] from past tasks i = 1, ..., t-1; use them to estimate the distribution of Z^(i)_[vc]; convert that into an estimate π̂_{t-1} of π*; then use π̂_{t-1} in a prior-dependent learning algorithm for the new task h_t*.
- Assume Π is totally bounded in total variation.
- Then we can estimate π* at a bounded rate: ||π* - π̂_t|| < R_t, where R_t → 0 (holds w.h.p.).
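The two-layer sampling process can be sketched as follows (my addition); the Gaussian prior over linear separators and the Gaussian data distribution are stand-ins chosen for illustration, not the slides' choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, k = 5, 20, 50   # input dimension, number of tasks, examples per task

def draw_task():
    """Two-layer sampling: a target from the prior, then data i.i.d. from D."""
    # Layer 1: h_t* ~ pi* (here: weights of a linear separator, ~ N(0, I)).
    w_star = rng.standard_normal(n)
    # Layer 2: X_j^(t) i.i.d. D (here: standard normal), labeled by h_t*.
    X = rng.standard_normal((k, n))
    y = (X @ w_star >= 0).astype(int)
    return X, y

tasks = [draw_task() for _ in range(T)]   # the data sets Z^(1), ..., Z^(T)
```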

Main Theorem
- Proof idea: relate convergence of the estimator for the d-dimensional joint (d = vc) to convergence of the estimator for the prior.
- Chain of reductions:
  prior (∞-dim joint) <- k-dim joints <- k-dim conditionals <- d-dim joints <- d-dim conditionals.
- Standard result: a converging estimator exists for distributions on d examples, for totally bounded families.

Transfer Learning Algorithm
- Given: a prior-dependent learning algorithm A(ε, π) with E[# labels accessed] = Λ(ε, π), producing ĥ with E[ρ(ĥ, h*)] ≤ ε.
- For t = 1, ..., T:
  - If R_{t-1} > ε/4, run prior-independent learning on Z^(t)_[vc/ε] to get ĥ_t.
  - Else let π̂_t = argmin_{π ∈ B(π̂_{t-1}, R_{t-1})} Λ(ε/2, π), and run A(ε/2, π̂_t) on Z^(t) to get ĥ_t.
  (A structural sketch of this loop follows.)
- Theorem: for all t, E[ρ(ĥ_t, h_t*)] ≤ ε, and
  limsup_{T→∞} E[# labels accessed] / T ≤ Λ(ε/2, π*) + vc.
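A structural sketch of the loop (my addition; every helper name here is a placeholder stub standing in for the estimators and learners described above):

```python
def estimate_prior(past_tasks):
    """Stub: return (prior estimate, rate bound R_{t-1}); the 1/(1+t) rate
    stands in for the convergence guarantee from the estimation theorem."""
    return "pi_hat", 1.0 / (1 + len(past_tasks))

def prior_indep_learn(Z, eps):
    return "h_hat (prior-independent)"             # uses ~vc/eps labels

def prior_dep_learn(Z, eps, pi):
    return f"h_hat (prior-dependent, prior={pi})"  # algorithm A(eps, pi)

def transfer_learn(tasks, eps):
    h_hats = []
    for t, Z_t in enumerate(tasks, start=1):
        pi_hat, R = estimate_prior(tasks[: t - 1])
        if R > eps / 4:
            # Prior estimate still too coarse: fall back to a
            # prior-independent learner on the first vc/eps examples.
            h_hats.append(prior_indep_learn(Z_t, eps))
        else:
            # Run A(eps/2, pi_t); the argmin over the ball B(pi_hat, R)
            # is abstracted away in this stub.
            h_hats.append(prior_dep_learn(Z_t, eps / 2, pi_hat))
    return h_hats

print(transfer_learn([object()] * 10, eps=0.5))
```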

Lemma (Relate Prior to k-dim Joint): there is a sequence r_k → 0 s.t. for all θ, θ' ∈ Θ,
  ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| ≤ ||π_θ - π_θ'|| ≤ ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| + r_k.
Proof:
- The left inequality follows from: for any θ, θ' ∈ Θ and t ∈ N,
  ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| ≤ ||P_{Z_t(θ)} - P_{Z_t(θ')}|| = ||π_θ - π_θ'||,
  since marginalizing to k examples cannot increase the total variation distance.
- To show the right inequality: fix θ, θ' ∈ Θ, let γ > 0, and let B ⊆ (X × {-1,+1})^∞ be a measurable set s.t. B nearly attains the supremum defining ||P_{Z_t(θ)} - P_{Z_t(θ')}||.
- Carathéodory's extension theorem implies there exist disjoint sets {A_i}_{i ∈ N}, where each A_i is an event depending on a finite number of data points, s.t.
  P_{Z_t(θ)}(B) - P_{Z_t(θ')}(B) < Σ_{i ∈ N} P_{Z_t(θ)}(A_i) - Σ_{i ∈ N} P_{Z_t(θ')}(A_i) + γ.
- Since these sums are bounded, there must exist n ∈ N s.t.
  Σ_{i ∈ N} P_{Z_t(θ)}(A_i) < γ + Σ_{i=1}^n P_{Z_t(θ)}(A_i),

Relate Prior to k-dim Joint (continued)
so that the tail Σ_{i > n} contributes at most γ.
- As each A_i is a finite-dimensional event, there exist k ∈ N and a measurable A ⊆ (X × {-1,+1})^k s.t. A_1 ∪ ... ∪ A_n is expressible as {Z_{t,k} ∈ A}. Thus [the difference of the two probabilities is bounded by ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| up to γ terms; displayed equations lost in transcription].
- Taking the limit as γ → 0 implies the bound; in particular, it implies there exists a sequence r_k → 0 satisfying the lemma. QED

Relate k-dim Joint to k-dim Conditional
- Want to bound the total variation distance (tvd) between the k-dim joints.
- It is easier to bound the difference between the k-dim conditional distributions.
- Use Jensen's inequality to relate the tvd of the k-dim joints to the k-dim conditionals (written out below):
  ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| ≤ E[ ||P_{Y_{t,k}(θ) | X_{t,k}} - P_{Y_{t,k}(θ') | X_{t,k}}|| ].

Relate k-dim Conditional to d-dim Conditional
- Start from the definition of the total variation distance [as a supremum over label events; displayed equation lost in transcription].
- By Sauer's lemma, this supremum effectively ranges over the realized labelings of the k points [bound lost in transcription].
- Key identity: P(A and B) = P(A) - P(A and not B). Applied to the conditional label probabilities, it produces two terms: one reduces the dimension (the index set I) by 1, and the other brings the y vector one bit closer to an unrealizable labeling.
- Apply this to θ and θ'; we are interested in the tvd between the conditional probabilities.
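Written out, the Jensen step is |E[·]| ≤ E[|·|] applied inside the supremum over events (my rendering; B_x denotes the section of the event B at x):

```latex
\[
\big\| P_{Z_{t,k}(\theta)} - P_{Z_{t,k}(\theta')} \big\|
 = \sup_{B}\, \Big|\, \mathbb{E}\big[\, P(Y_{t,k}(\theta) \in B_{X_{t,k}} \mid X_{t,k})
   - P(Y_{t,k}(\theta') \in B_{X_{t,k}} \mid X_{t,k}) \,\big] \Big|
 \le \mathbb{E}\Big[\, \big\| P_{Y_{t,k}(\theta) \mid X_{t,k}}
   - P_{Y_{t,k}(\theta') \mid X_{t,k}} \big\| \Big],
\]
```

i.e., the joint tvd is at most the expected tvd between the conditionals, using that the marginal distribution of X_{t,k} is the same (D^k) under both θ and θ'.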

Tree Argument: Combinatorics
- The two terms above inductively define a binary tree; branch based on the modification made to the y vector:
  - Branching left gives a difference of probabilities for a set I with one less element than its parent.
  - Branching right gives a difference of probabilities for a y one bit closer to an unrealized labeling.
- At any level, from left to right the nodes have decreasing |I| values. Stop branching upon reaching a set I and a ỹ s.t. either ỹ is an unrealized labeling, or |I| = d.
- Any path can branch left k - d times (total) before reaching a set I with only d elements, and can branch right at most d + 1 times in a row before reaching a ỹ for which both probabilities are zero, so that the difference is zero.

Tree Argument: Conclusions
- Bound the original (root-node) difference of probabilities by the sum of the differences of probabilities over the leaf nodes with |I| = d.
- The depth of any leaf node with |I| = d is at most (k - d)d.
- The maximum width of the tree is at most k - d, so the total number of leaf nodes with |I| = d is at most d(k - d)^2.
- For any ỹ ∈ {-1,+1}^k, [the resulting bound on the conditional tvd follows; displayed equation lost in transcription].

Relate k-dim Joint to d-dim Joint
- Note: [expansion of the k-dim bound in terms of d-dim conditional probabilities; displayed equations lost in transcription].
- By exchangeability, the last line equals [the same expression with a fixed index set of size d].
- We want the d-dim joint instead of the d-dim conditional. Claim: [the expected d-dim conditional distance is bounded in terms of the d-dim joint distance; statement lost in transcription].

Proof of the Claim
- Proof: suppose [the displayed argument is lost in transcription].

Reflect the Path of Proof
- Earlier (Carathéodory's extension theorem, in general form): ||π_θ - π_θ'|| ≤ ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| + r_k.
- Just showed:
  ||P_{Z_t(θ)} - P_{Z_t(θ')}|| ≤ ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| + r_k ≤ (tree) g_k(||P_{Z_{t,d}(θ)} - P_{Z_{t,d}(θ')}||) + r_k,
  where ||P_{Z_{t,k}(θ)} - P_{Z_{t,k}(θ')}|| ≤ 4(2ek)^{2d+2} ||P_{Z_{t,d}(θ)} - P_{Z_{t,d}(θ')}||^{1/2}.
- So in total: for any k ∈ N,
  ||π_θ - π_θ'|| ≤ 4(2ek)^{2d+2} ||P_{Z_{t,d}(θ)} - P_{Z_{t,d}(θ')}||^{1/2} + r_k.
- In particular, r_k → 0 as k → ∞. Let g(ε) = min_k (4(2ek)^{2d+2} ε^{1/2} + r_k).
  (Why does g(ε) → 0 as ε → 0? Let ε_k = (r_k / (4(2ek)^{2d+2}))^2. Then ε_k = o(1) and g(ε_k) ≤ 4(2ek)^{2d+2} ε_k^{1/2} + r_k = 2r_k. Since g is monotonic in ε, lim_{ε→0} g(ε) = lim_{k→∞} g(ε_k) = lim_{k→∞} 2r_k = 0. A numerical check of this limit follows.)

Distribution Estimation Rate
- The last component: the rate of convergence of our estimate of the d-dim joint P_{Z_{t,d}(θ*)}.
- N(ε) is the ε-covering number of the family of d-dim joints.
- Taking θ̂ as the minimum-distance skeleton estimate of Yatracos (1985) achieves expected tvd ε from P_{Z_{t,d}(θ*)}, for some T = [sample-size bound lost in transcription].
- Solving for ε in terms of T implies E[tvd of d-dim joint] → 0 as T → ∞.
- Conclusion for prior estimation:
  - Let w_t be the tvd of the d-dim estimate, so E[w_t] → 0. Pick a sequence R_t s.t. R_t → 0 but E[w_t]/R_t → 0; then Markov's inequality gives P(w_t > R_t) ≤ E[w_t]/R_t → 0.
  - So there is a bound on the tvd that → 0 and holds with probability → 1, as T → ∞.
  - If the tvd of the d-dim joints → 0, plugging into g(·) (just proved) shows the tvd of the priors → 0.
- Together, this proves the theorem.
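A numerical check (my addition) that g(ε) → 0 along the sequence ε_k from the argument above; the particular vanishing sequence r_k used here is a placeholder, not the slides' rate.

```python
import numpy as np

d = 2
def r(k):                  # assumed vanishing sequence r_k (placeholder rate)
    return np.sqrt(d / k)

def coef(k):               # the factor 4(2ek)^(2d+2) from the tree argument
    return 4 * (2 * np.e * k) ** (2 * d + 2)

def g(eps, k_max=2000):
    ks = np.arange(d + 1, k_max, dtype=float)
    return np.min(coef(ks) * np.sqrt(eps) + r(ks))

for k in [10, 100, 1000]:
    eps_k = (r(k) / coef(k)) ** 2
    print(k, eps_k, g(eps_k), 2 * r(k))   # g(eps_k) <= 2 r_k, and r_k -> 0
```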

Rate of Convergence under Hölder-Smoothness
- Definition: [Hölder-smoothness of the prior family, with constant L and exponent α; definition lost in transcription].
- Theorem:
  ||π_θ - π_θ'|| ≤ min_k [ 4(2ek)^{2d+2} ||P_{Z_{t,d}(θ)} - P_{Z_{t,d}(θ')}||^{1/2} + r_k ],
  and under Hölder-smoothness,
  r_k = O(L (d/k · log(k/d))^α).

Rate of Convergence under Hölder-Smoothness (proof)
- By a PAC bound, for any γ > 0, w.p. > 1-γ, a sample partitions C into regions of width < γ.
- [Two displayed steps lost in transcription.]
- By smoothness, w.p. > 1-γ, we have [the prior mass of each region nearly determined] everywhere.

Rate of Convergence under Hölder-Smoothness (continued)
- Thus, for any [pair θ, θ'], w.p. > 1-γ: [displayed bound lost in transcription].
- Since the regions that define [the two estimates] are the same, r_k ≤ [bound lost in transcription].
- Thus, w.p. ≥ 1-γ, [...]; proceeding as before, we get [...]; plugging in, we get (*).

Rate of Convergence under Hölder-Smoothness: Estimation Rate
- Rate of convergence of the estimate of the d-dim joint:
  - The ε-cover size is bounded by a grid argument under Hölder-smoothness; plugging that into the sample complexity of Yatracos (1985) gives
    T = O(ε^{-2} (L/ε)^{d/α} log(1/ε)).
  - Solving for ε, we get (a rough inversion check is given below)
    ε = O(L (log(TL)/T)^{α/(d+2α)}).
- Plugging this into (*) gives the stated theorem (holding for any γ); with [an appropriate choice of γ], QED.
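As a rough consistency check (my addition; the log factor and constants are ignored, so the power of L comes out slightly differently), inverting the sample-complexity bound recovers the stated T^{-α/(d+2α)} dependence:

```latex
\[
T \;\approx\; \varepsilon^{-2}\,(L/\varepsilon)^{d/\alpha}
  \;=\; L^{d/\alpha}\,\varepsilon^{-(2 + d/\alpha)}
\quad\Longrightarrow\quad
\varepsilon \;\approx\; \big( L^{d/\alpha} / T \big)^{\frac{1}{2 + d/\alpha}}
  \;=\; L^{\frac{d}{d + 2\alpha}}\; T^{-\frac{\alpha}{d + 2\alpha}}.
\]
```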

Is this Better than without Transfer?
- The question becomes: how much does knowledge of the target distribution π* help?
- There are some (constant-factor) gains for passive learning [e.g., HKS 1992].
- It really helps in active learning:
  - Earlier, we showed we can get o(1/ε) expected label complexity for all π.
  - For many C (e.g., linear separators), no prior-independent algorithm has this guarantee.
  - Plugging in that method, the transfer method accesses o(1/ε) labels on average.

An Example of Prior-Dependent Learning
- Self-verifying Bayesian active learning (a special type of stopping criterion):
  - Given ε, it adaptively decides the number of queries, then halts.
  - It has the property that E[err] < ε when it halts.
- Question: can you do this with E[# queries] = o(1/ε)? (Passive learning needs ~1/ε labels.)

Example: Intervals (Verification Lower Bound)
- In the non-Bayesian setting, suppose h* is the empty interval and D is uniform on [0,1].
- Given any classifier h, just to verify err(h) < ε:
  - Need to verify that h* is not an interval of width 2ε.
  - Need an example in each of Ω(1/ε) regions to verify this fact.
[Figure: the interval [0,1] split into Ω(1/ε) contiguous regions of width 2ε; h* is the empty interval.]

Interval Example with a Prior
- Algorithm: query random points until finding the first +; do binary search to find the end-points. Halt upon reaching a prespecified prior-based query budget N. Output the posterior's Bayes classifier. (Sketched in code below.)
- Let the budget N be high enough so E[err] < ε:
  - N = o(1/ε) is sufficient for E[err | w* > 0] < ε: if the target width w* > 0, even a prior-independent analysis needs only E[# queries | w*] = O(1/w* + log(1/ε)) = o(1/ε).
  - N = o(1/ε) is sufficient for E[err | w* = 0] < ε: if P(w* = 0) > 0, then after some L = O(log(1/ε)) queries, w.p. > 1-ε, most of the posterior mass is on the empty interval, so the posterior's Bayes classifier has 0 error rate.
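A runnable sketch of the query strategy (my addition and a simplification: the prior-based budget N is passed in directly, and instead of the posterior's Bayes classifier the sketch returns the empirically localized interval):

```python
import numpy as np

rng = np.random.default_rng(4)

def interval_active_learn(target, budget):
    """Query random points until the first +, then binary-search both
    end-points, stopping once the query budget is exhausted."""
    a, b = target  # h* labels x as + iff a <= x <= b (empty interval if a > b)
    label = lambda x: a <= x <= b
    queries, first_plus = 0, None
    while queries < budget:
        x = rng.uniform(0, 1)
        queries += 1
        if label(x):
            first_plus = x
            break
    if first_plus is None:
        return (1.0, 0.0)  # no + found within budget: predict the empty interval

    lo, hi = 0.0, first_plus        # left end-point a lies in [lo, hi]
    while queries < budget and hi - lo > 1e-6:
        mid = (lo + hi) / 2
        queries += 1
        lo, hi = (lo, mid) if label(mid) else (mid, hi)
    lo2, hi2 = first_plus, 1.0      # right end-point b lies in [lo2, hi2]
    while queries < budget and hi2 - lo2 > 1e-6:
        mid = (lo2 + hi2) / 2
        queries += 1
        lo2, hi2 = (mid, hi2) if label(mid) else (lo2, mid)
    return (hi, lo2)                # approximate [a, b]

print(interval_active_learn(target=(0.3, 0.55), budget=60))
```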

Can Do o(1/ε) for Any VC Class
- Theorem: with the prior, can get o(1/ε) query complexity.
- There are methods that find a good classifier in o(1/ε) queries (though they aren't self-verifying) [BHW08].
- We need to set a stopping criterion for those algorithms.
- The stopping criterion we use: a budget.
- Set the budget to be just large enough so that E[err] < ε.
