



COS 511: Theoretical Machine Learning
Lecturer: Rob Schapire
Lecture #6
Scribe: Aaron Schild
February 21, 2013

Last class, we discussed an analogue of Occam's Razor for infinite hypothesis spaces that, in conjunction with VC-dimension, reduced the problem of finding a good PAC-learning algorithm to the problem of computing the VC-dimension of a given hypothesis space. Recall that VC-dimension is defined using the notion of a shattered set, i.e. a subset $S$ of the domain such that $|\Pi_H(S)| = 2^{|S|}$. In this lecture, we compute the VC-dimension of several hypothesis spaces by computing the maximum size of a shattered set.

1 Example 1: Axis-aligned rectangles

Not all sets of four points are shattered. For example, the following arrangement is impossible:

Figure 1: An impossible assignment of $+$/$-$ to the data, as all rectangles that contain the outer three points (marked $+$) must also contain the one $-$ point.

However, this is not sufficient to conclude that the VC-dimension is at most three. Note that the following set does shatter:

Figure 2: A set of four points that shatters, as there is an axis-aligned rectangle that contains any given subset of the points but contains no others.

Therefore, the VC-dimension is at least four. In fact, it is exactly four. Consider any set of five distinct points $\{v_1, v_2, v_3, v_4, v_5\} \subseteq \mathbb{R}^2$. Consider a rectangle that contains the points with maximum x-coordinate, minimum x-coordinate, maximum y-coordinate, and minimum y-coordinate. These points may not be distinct. However, there are at most four such points. Call this set of points $S \subset \{v_1, v_2, v_3, v_4, v_5\}$. Any axis-aligned rectangle that contains $S$ must also contain all of the points $v_1, v_2, v_3, v_4$, and $v_5$. There is at least one $v_i$ that is not in $S$, but still must be in the rectangle. Therefore, the labeling that labels all points in $S$ with $+$ and $v_i$ with $-$ cannot be consistent with any axis-aligned rectangle. This means that there is no shattered set of size 5, since all possible labelings of a shattered set must be realized by some concept.

By a similar argument, we can show that the VC-dimension of axis-aligned rectangles in $\mathbb{R}^n$ is $2n$. By generalizing the approach for proving that the VC-dimension of the positive half interval learning problem is 1, one can show that the VC-dimension of $(n-1)$-dimensional hyperplanes in $\mathbb{R}^n$ that pass through the origin is $n$. These concepts are inequalities of the form $w \cdot x > 0$ for any fixed $w \in \mathbb{R}^n$ and variable $x \in \mathbb{R}^n$. In this case, concepts label points with $+$ if they are on one side of a hyperplane and $-$ otherwise.

2 Other remarks on VC-dimension

In the cases mentioned previously, note that the VC-dimension is similar to the number of parameters needed to specify any particular concept. In the case of axis-aligned rectangles, for example, they are equal, since rectangles require a left boundary, a right boundary, a top boundary, and a bottom boundary. Unfortunately, this similarity does not always hold, although it often does. There are some hypothesis spaces with infinite VC-dimension that can be specified with one parameter.

Note that if $H$ is finite, the VC-dimension is at most $\log_2 |H|$, as at least $2^r$ distinct hypotheses must exist to shatter a set of size $r$.

For a hypothesis space with infinite VC-dimension, there is a set of size $m$ that is shattered for any $m > 0$. Therefore, $\Pi_H(m) = 2^m$, which we mentioned last class as an indication of a class that is hard to learn. In the next section, we will show that all classes with bounded VC-dimension $d$ have $\Pi_H(m) = O(m^d)$, completing the description of PAC-learnability by VC-dimension.

3 Sauer's Lemma

Recall that $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$ if $0 \le k \le n$ and $\binom{n}{k} = 0$ if $k < 0$ or $k > n$. Here $k$ and $n$ are integers and $n$ is nonnegative for our purposes.
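The convention that $\binom{n}{k}$ vanishes outside $0 \le k \le n$ is used repeatedly in this section, so it may help to see it written out. The following is a minimal sketch, not part of the original notes; the helper name `binom` is our own, and it wraps Python's `math.comb`, which already returns 0 for $k > n$ but rejects negative $k$.

```python
from math import comb


def binom(n: int, k: int) -> int:
    """Binomial coefficient with the convention binom(n, k) = 0 if k < 0 or k > n.

    Hypothetical helper for illustration; n must be a nonnegative integer.
    """
    if n < 0:
        raise ValueError("n must be nonnegative")
    if k < 0 or k > n:
        return 0
    return comb(n, k)


print(binom(5, 2))   # 10
print(binom(5, -1))  # 0
print(binom(5, 6))   # 0
```

With this convention, sums such as $\binom{0}{0} + \binom{0}{1} + \cdots + \binom{0}{d}$ make sense even when the lower index exceeds the upper one.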
Note that $\binom{n}{k} = O(n^k)$ when $k$ is regarded as a positive constant. We will show the following lemma, which immediately implies the desired result:

Lemma 3.1 (Sauer's Lemma). Let $H$ be a hypothesis space with finite VC-dimension $d$. Then,
\[
\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} =: \Phi_d(m).
\]

Proof. We will prove this by induction on $m + d$. There are two base cases:

Case 1 ($m = 0$). There is only one possible assignment of $+$ and $-$ to the empty set, i.e. $\Pi_H(m) = 1$ here. Note that $\Phi_d(0) = \binom{0}{0} + \binom{0}{1} + \cdots + \binom{0}{d} = 1$, as desired.

Case 2 ($d = 0$). Not even a single point can be shattered in this situation. Therefore, on any given point, all hypotheses have the same value. Therefore, there is only one possible hypothesis and $\Pi_H(m) = 1$. This agrees with $\Phi$, as $\Phi_0(m) = \binom{m}{0} = 1$.

Now, we will prove the induction step. For this, we will need Pascal's Identity, which states that
\[
\binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}
\]
for all integers $n$ and $k$ with $n \ge 0$.

Consider a hypothesis space $H$ with VC-dimension $d$ and a set of $m$ examples $S := \{x_1, x_2, \ldots, x_m\}$. Let $T := \{x_1, x_2, \ldots, x_{m-1}\}$. Form two hypothesis spaces $H_1$ and $H_2$ on $T$ as follows (an example is in Figure 3). Let $H_1$ be the set of restrictions of hypotheses from $H$ to $T$. Let $h|_T$ denote the restriction of $h$ to $T$ for $h \in H$, i.e. the function $h|_T : T \to \{+, -\}$ such that $h|_T(x) = h(x)$ for all $x \in T$. An element $\rho$ on $T$ is added to $H_2$ if and only if there are two hypotheses $h_1, h_2 \in H$, distinct as labelings of $S$, such that $h_1|_T = h_2|_T = \rho$. Note that $|\Pi_H(S)| = |\Pi_{H_1}(T)| + |\Pi_{H_2}(T)|$.

H:
x1 x2 x3 x4 x5
0  1  1  0  0
0  1  1  0  1
0  1  1  1  0
1  0  0  1  0
1  0  0  1  1
1  1  0  0  1

H_1:
x1 x2 x3 x4
0  1  1  0
0  1  1  1
1  0  0  1
1  1  0  0

H_2:
x1 x2 x3 x4
0  1  1  0
1  0  0  1

Figure 3: The construction of $H_1$ and $H_2$.

What are the VC-dimensions of $H_1$ and $H_2$? First, note that the VC-dimension of $H_1$ is at most $d$, as any shattered set of size $d + 1$ in $T$ is also a subset of $S$ that is shattered by the elements of $H$, contradicting the fact that the VC-dimension of $H$ is $d$.

Suppose that there is a set of size $d$ in $T$ that is shattered by $H_2$. Since every hypothesis in $H_2$ is the restriction of two different hypotheses in $H$, $x_m$ can be added to the shattered set of size $d$ in $T$ to obtain a set shattered by $H$ of size $d + 1$. This is a contradiction, so the VC-dimension of $H_2$ is at most $d - 1$.

By the inductive hypothesis, $\Pi_{H_1}(m-1) \le \Phi_d(m-1)$. Similarly, $\Pi_{H_2}(m-1) \le \Phi_{d-1}(m-1)$. Combining these two inequalities shows that
\[
\begin{aligned}
\Pi_H(m) &\le \Phi_d(m-1) + \Phi_{d-1}(m-1) \\
&= \sum_{i=0}^{d} \binom{m-1}{i} + \sum_{j=0}^{d-1} \binom{m-1}{j} \\
&= \binom{m-1}{0} + \sum_{i=0}^{d-1} \left[ \binom{m-1}{i+1} + \binom{m-1}{i} \right] \\
&= \binom{m}{0} + \sum_{i=0}^{d-1} \binom{m}{i+1} \\
&= \Phi_d(m),
\end{aligned}
\]
completing the inductive step. The fourth line uses Pascal's Identity with $n = m - 1$ together with $\binom{m-1}{0} = \binom{m}{0} = 1$.

Often, the polynomial $\Phi_d(m)$ is hard to work with. Instead, we often use the following result:

Lemma 3.2. $\Phi_d(m) \le (em/d)^d$ when $m \ge d \ge 1$.

Proof. $m \ge d \ge 1$ implies that $d/m \le 1$. Therefore, since $i \le d$ in the summand,
\[
\left(\frac{d}{m}\right)^{d} \sum_{i=0}^{d} \binom{m}{i}
\le \sum_{i=0}^{d} \left(\frac{d}{m}\right)^{i} \binom{m}{i}
\le \sum_{i=0}^{m} \left(\frac{d}{m}\right)^{i} \binom{m}{i}
= \left(1 + \frac{d}{m}\right)^{m}
\le e^{d}.
\]
Multiplying both sides by $(m/d)^d$ gives the desired result.

Plugging this result into the examples bound proven last class shows that
\[
\mathrm{err}(h) \le O\!\left( \frac{1}{m} \left( d \ln \frac{m}{d} + \ln \frac{1}{\delta} \right) \right).
\]
We can also write this in terms of the number of examples required to learn:
\[
m = O\!\left( \frac{1}{\epsilon} \left( \ln \frac{1}{\delta} + d \ln \frac{1}{\epsilon} \right) \right).
\]
Note that the number of examples required to learn scales linearly with the VC-dimension.

4 Lower bounds on learning

The bound proven in the previous section shows that the VC-dimension of a hypothesis space yields an upper bound on the number of examples needed to learn. Lower bounds on the required number of examples also exist. If the VC-dimension of a hypothesis space is $d$, there is a shattered set of size $d$. Intuitively, any hypothesis learned from a subset of size at most $d - 1$ cannot predict the value of the last element with probability better than 1/2. This suggests that at least $\Omega(d)$ examples are required to learn. In future classes, we will prove the following:

Theorem 4.1. For all learning algorithms $A$, there is a concept $c \in C$ and a distribution $D$ such that if $A$ is given $m \le d/2$ examples labeled by $c$ and distributed according to $D$, then
\[
\Pr\left[\mathrm{err}(h_A) > 1/8\right] \ge \frac{1}{8}.
\]

One can try to prove this as follows. Choose a uniform distribution $D$ on examples $\{z_1, \ldots, z_d\}$ (a shattered set of size $d$, which exists because the VC-dimension is $d$) and run $A$ on $m = d/2$ examples. Call this set of examples $S$. Label the elements of $S$ arbitrarily with $+$ and $-$. Suppose that $c \in C$ is selected to be consistent with all of the labels on $S$ and $c(x) \ne h_A(x)$ for all $x \notin S$. Then $\mathrm{err}_D(h_A) \ge \frac{1}{2}$, since $c$ agrees with $h_A$ on at most $(d/2)/d = 1/2$ of the probability mass of the domain, which means that there is no PAC-learning algorithm on $d/2$ examples.

This proof is flawed, as $c$ needs to be chosen before the examples. We will discuss a correct proof in future classes.
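As a sanity check on Example 1 and on Lemmas 3.1 and 3.2, the quantities in these notes can be computed by brute force on small point sets. The sketch below is not part of the original lecture; the function names `phi`, `rectangle_labelings`, and `is_shattered`, the diamond-shaped point set, and the random 6-by-6 grid sample are all our own illustrative choices. It relies on the observation that any labeling realized by an axis-aligned rectangle is also realized by a rectangle whose sides pass through point coordinates (the bounding box of the positive points).

```python
from itertools import combinations, combinations_with_replacement
from math import comb, e
import random


def phi(d, m):
    """Phi_d(m) = sum_{i=0}^{d} C(m, i), the bound in Sauer's Lemma."""
    return sum(comb(m, i) for i in range(d + 1))


def rectangle_labelings(points):
    """Set of labelings (True means +) that axis-aligned rectangles induce on points."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    # A rectangle that misses every point gives the all-minus labeling.
    labelings = {tuple(False for _ in points)}
    # Only rectangles whose sides pass through point coordinates matter.
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            labelings.add(tuple(
                x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in points
            ))
    return labelings


def is_shattered(points):
    """True if rectangles realize all 2^|points| labelings of the points."""
    return len(rectangle_labelings(points)) == 2 ** len(points)


# Four points in a diamond are shattered; adding the center breaks shattering.
diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(len(rectangle_labelings(diamond)))   # 16
print(is_shattered(diamond + [(0, 0)]))    # False
print(phi(4, 5))                           # 31, which is less than 2**5

# On a random set of 8 distinct points, no 5-subset is shattered and the
# number of labelings stays within the Sauer bound with d = 4.
rng = random.Random(0)
grid = [(x, y) for x in range(6) for y in range(6)]
sample = rng.sample(grid, 8)
print(any(is_shattered(list(s)) for s in combinations(sample, 5)))  # False
print(len(rectangle_labelings(sample)) <= phi(4, len(sample)))      # True

# Lemma 3.2: Phi_d(m) <= (e*m/d)^d whenever m >= d >= 1.
print(all(phi(d, m) <= (e * m / d) ** d
          for d in range(1, 7) for m in range(d, 31)))              # True
```

The enumeration is polynomial in the number of points (roughly the fourth power), which is itself a reflection of $\Pi_H(m) = O(m^4)$ for rectangles in the plane.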