Nearest Neighbor Classification: The Nearest-Neighbor Rule, Error Bounds, k-Nearest-Neighbor Rule, Computational Considerations


Neal Miles
7 years ago

1 Nearest Neighbor Classification
The Nearest-Neighbor Rule
Error Bounds
k-Nearest-Neighbor Rule
Computational Considerations

2 Example of the Nearest-Neighbor Rule
Two-class problem: yellow triangles and blue squares. The circle represents the unknown sample x. Since its nearest neighbor comes from class θ₁, x is labeled θ₁.
Figure 1: The NN rule
CSE 555: Srihari 1

3 Example of the k-NN Rule with k = 3
There are two classes: yellow triangles and blue squares. The circle represents the unknown sample x. Since two of its three nearest neighbors come from class θ₂, x is labeled θ₂.
The choice of k is a trade-off. k should be:
1) large, to minimize the probability of misclassifying x;
2) small with respect to the number of samples, so that the k neighbors are close enough to x to give an accurate estimate of the true class of x.


4 Nearest Neighbor and Voronoi Tessellation
The NN classifier effectively partitions the feature space into cells, each consisting of all points closer to a given training point x than to any other training point. All points in such a cell are therefore labeled with the category of that training point. The resulting partition is the Voronoi tessellation of the space.
(Figure: Voronoi tessellation in 2 dimensions and in 3 dimensions.)

5 Nearest-Neighbor Rule: Probability of Error
Let Dₙ = {x₁, x₂, ..., xₙ} be a set of n labeled prototypes, and let x′ ∈ Dₙ be the prototype nearest to a test point x. The nearest-neighbor rule classifies x by assigning it the label associated with x′.
The nearest-neighbor rule is a sub-optimal procedure: in general it does not yield the Bayes error rate. Yet, in the limit of unlimited samples, its error rate is never worse than twice the Bayes error rate.
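A minimal sketch of the rule as just defined, assuming Euclidean distance and prototypes stored as (feature vector, label) pairs. The function name `nn_classify` and the coordinates are hypothetical, chosen only to mirror the triangles-and-squares example.

```python
import math
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], str]  # (feature vector, class label)


def nn_classify(prototypes: List[Prototype], x: Sequence[float]) -> str:
    """Assign x the label of its nearest prototype x' in D_n (Euclidean distance)."""
    if not prototypes:
        raise ValueError("D_n must contain at least one labeled prototype")
    # x' = argmin over D_n of ||x - x_j||; min() keeps the earlier prototype on ties
    _, label = min(prototypes, key=lambda item: math.dist(item[0], x))
    return label


# Hypothetical two-class data set
D_n = [((1.0, 1.0), "triangle"), ((1.5, 2.0), "triangle"),
       ((4.0, 4.0), "square"), ((5.0, 4.5), "square")]
print(nn_classify(D_n, (2.0, 2.0)))  # triangle
print(nn_classify(D_n, (4.2, 3.9)))  # square
```

Every query scans all n prototypes, which is the brute-force cost discussed under computational considerations.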

6 Why Does the Nearest-Neighbor Rule Work Well?
The label θ′ associated with the nearest neighbor x′ is a random variable. The probability that θ′ = ωᵢ is the a posteriori probability P(ωᵢ | x′). As n → ∞, it is always possible to find x′ sufficiently close to x that P(ωᵢ | x′) ≈ P(ωᵢ | x). Because this is exactly the probability that nature will be in state ωᵢ, the nearest-neighbor rule is effectively matching probabilities with nature.
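The "matching probabilities" argument can be checked numerically. The sketch below assumes the large-sample limit, in which the neighbor's label and the true class are independent draws from the same posteriors P(ωᵢ | x); the posterior vector is hypothetical. It contrasts the Bayes conditional error 1 − maxᵢ P(ωᵢ | x) with the matching error 1 − Σᵢ P²(ωᵢ | x), and confirms the latter with a small Monte Carlo simulation.

```python
import random


def bayes_conditional_error(posteriors):
    """P*(e|x) = 1 - max_i P(w_i|x): the Bayes rule always picks the most probable class."""
    return 1.0 - max(posteriors)


def matching_conditional_error(posteriors):
    """Error when the decision is drawn with probability P(w_i|x), independently of
    the true class (also drawn with probability P(w_i|x)): 1 - sum_i P(w_i|x)^2."""
    return 1.0 - sum(p * p for p in posteriors)


def simulate_matching(posteriors, trials=100_000, seed=0):
    """Monte Carlo estimate: draw nature's class and the neighbor's label independently."""
    rng = random.Random(seed)
    classes = range(len(posteriors))
    errors = 0
    for _ in range(trials):
        truth = rng.choices(classes, weights=posteriors)[0]
        guess = rng.choices(classes, weights=posteriors)[0]
        errors += truth != guess
    return errors / trials


posteriors = [0.7, 0.2, 0.1]  # hypothetical P(w_i|x) at a single point x
print(bayes_conditional_error(posteriors))
print(matching_conditional_error(posteriors))
print(simulate_matching(posteriors))
```

For these posteriors the Bayes error is 0.30 while probability matching gives 0.46, and the simulated rate lands close to 0.46: worse than Bayes, but well under twice the Bayes rate.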

7 Bayesian Probability of Error
If we define ωₘ(x) by
$$P(\omega_m \mid x) = \max_i P(\omega_i \mid x),$$
then the Bayes decision rule always selects ωₘ. From this, the Bayesian conditional probability of error is
$$P^*(e \mid x) = 1 - P(\omega_m \mid x).$$

8 Bayesian Probability of Error
If we let P*(e | x) be the minimum possible value of P(e | x), and P* be the minimum possible value of P(e), then by averaging over the distribution of x we get
$$P^* = \int P^*(e \mid x)\, p(x)\, dx = \int \bigl(1 - P(\omega_m \mid x)\bigr)\, p(x)\, dx.$$
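To make the integral concrete, here is a sketch that evaluates P* numerically for a hypothetical one-dimensional problem: two equally likely classes with unit-variance Gaussian densities centered at −1 and +1. The example, the function names, and the midpoint-rule quadrature are assumptions for illustration only. For this symmetric case the exact answer is Φ(−1) = ½·erfc(1/√2) ≈ 0.1587, which the code also computes for comparison.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_error_two_gaussians(mu1=-1.0, mu2=1.0, prior1=0.5,
                              lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule estimate of P* = integral of (1 - P(w_m|x)) p(x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        joint1 = prior1 * normal_pdf(x, mu1)        # P(w1) p(x|w1)
        joint2 = (1.0 - prior1) * normal_pdf(x, mu2)  # P(w2) p(x|w2)
        p_x = joint1 + joint2                       # mixture density p(x)
        p_max = max(joint1, joint2) / p_x           # P(w_m|x), the Bayes choice
        total += (1.0 - p_max) * p_x * h            # P*(e|x) p(x) dx
    return total


exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))  # Phi(-1) for this symmetric case
print(bayes_error_two_gaussians(), exact)
```

Note that (1 − P(ωₘ | x)) p(x) reduces to the smaller of the two joint densities, so P* is the area under the lower envelope of the class-conditional curves weighted by their priors.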

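9 Evaluation of Nearest-Neighbor Error
If Pₙ(e) is the n-sample error rate, and if
$$P = \lim_{n \to \infty} P_n(e),$$
then we want to show that, for a problem with c categories,
$$P^* \le P \le P^*\left(2 - \frac{c}{c-1} P^*\right).$$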

10 Nearest-Neighbor Probability of Error: The Random Variables
Begin by looking at all the random variables in the construction of an (x, xₙ, θ, θₙ) system. We denote by θ the true class of x and by θₙ the label of xₙ, where xₙ is the nearest neighbor of x. Clearly x and its class θ are random inputs to the problem. The labeled training set is random too, so the pair (xₙ, θₙ) is also an unknown, random input. Given x and xₙ, the event that x has true class θ is independent of the event that xₙ is labeled θₙ. Thus we have
$$P(\theta, \theta_n \mid x, x_n) = P(\theta \mid x)\, P(\theta_n \mid x_n).$$

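11 Expressing the Probability of Error
An error occurs whenever θ ≠ θₙ. Since θ and θₙ are conditionally independent given x and xₙ,
$$P(e \mid x, x_n) = 1 - \sum_{i=1}^{c} P(\theta = \omega_i, \theta_n = \omega_i \mid x, x_n) = 1 - \sum_{i=1}^{c} P(\omega_i \mid x)\, P(\omega_i \mid x_n).$$
Averaging over the nearest neighbor gives the n-sample conditional error
$$P_n(e \mid x) = \int P(e \mid x, x_n)\, p(x_n \mid x)\, dx_n.$$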

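12 Convergence of Probability of Error
As n approaches infinity, the space of labeled samples becomes increasingly densely filled. Thus the nearest neighbor xₙ converges to x with probability 1, and the conditional density p(xₙ | x) approaches a delta function centered at x. So we can say that:
$$\lim_{n \to \infty} P_n(e \mid x) = \int \Bigl[1 - \sum_{i=1}^{c} P(\omega_i \mid x)\, P(\omega_i \mid x_n)\Bigr] \delta(x_n - x)\, dx_n = 1 - \sum_{i=1}^{c} P^2(\omega_i \mid x).$$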

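13 Final Expression for Nearest-Neighbor Probability of Error
Averaging the limiting conditional error over x gives
$$P = \lim_{n \to \infty} P_n(e) = \int \Bigl[1 - \sum_{i=1}^{c} P^2(\omega_i \mid x)\Bigr] p(x)\, dx.$$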

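14 Bounds on the Conditional Probability of Error
For a fixed P*(e | x), the sum Σᵢ P²(ωᵢ | x) is minimized when P(ωₘ | x) = 1 − P*(e | x) and all other posteriors are equal, P(ωᵢ | x) = P*(e | x)/(c − 1) for i ≠ m. Hence
$$\sum_{i=1}^{c} P^2(\omega_i \mid x) \ge \bigl(1 - P^*(e \mid x)\bigr)^2 + \frac{P^{*2}(e \mid x)}{c-1},$$
so that
$$1 - \sum_{i=1}^{c} P^2(\omega_i \mid x) \le 2P^*(e \mid x) - \frac{c}{c-1} P^{*2}(e \mid x).$$
In the other direction, Σᵢ P²(ωᵢ | x) ≤ P(ωₘ | x), so the conditional NN error is never below P*(e | x).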

15 Nearest Neighbor Error Bound Derivation
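A sketch of the derivation, following the standard textbook argument. It starts from the conditional bound 1 − Σᵢ P²(ωᵢ | x) ≤ 2P*(e | x) − c/(c−1) P*²(e | x) and from P = ∫ [1 − Σᵢ P²(ωᵢ | x)] p(x) dx, and uses only the fact that a variance is non-negative:

$$P \le \int \Bigl[2P^*(e \mid x) - \frac{c}{c-1} P^{*2}(e \mid x)\Bigr] p(x)\, dx = 2P^* - \frac{c}{c-1} \int P^{*2}(e \mid x)\, p(x)\, dx,$$

$$\int P^{*2}(e \mid x)\, p(x)\, dx = P^{*2} + \operatorname{Var}\bigl[P^*(e \mid x)\bigr] \ge P^{*2},$$

$$P^* \le P \le 2P^* - \frac{c}{c-1} P^{*2} = P^*\Bigl(2 - \frac{c}{c-1} P^*\Bigr).$$

The lower bound follows from integrating 1 − Σᵢ P²(ωᵢ | x) ≥ P*(e | x) over x.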

16 Error Bound Conclusion
The error bounds are tight, in that for any P* there exist conditional and prior distributions for which the bounds are achieved.

17 Bounds on the Nearest-Neighbor Error Rate in a c-Category Problem
(Figure: the range of possible asymptotic error rates P as a function of the Bayes rate P*, assuming infinite training data.)
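The region in that figure is bounded below by P* and above by P*(2 − c/(c−1)·P*). A minimal sketch that tabulates both bounds; the function name is hypothetical, and the valid range 0 ≤ P* ≤ (c−1)/c is enforced because no Bayes rate can exceed the error of always guessing among c equally likely classes.

```python
def nn_error_bounds(bayes_rate: float, c: int) -> tuple:
    """Return (lower, upper) asymptotic NN error bounds for a c-category problem."""
    if c < 2:
        raise ValueError("need at least two categories")
    if not 0.0 <= bayes_rate <= (c - 1) / c:
        raise ValueError("Bayes rate must lie in [0, (c-1)/c]")
    upper = bayes_rate * (2.0 - c / (c - 1) * bayes_rate)
    return bayes_rate, upper


for c in (2, 10):
    for p_star in (0.05, 0.1, 0.3):
        lower, upper = nn_error_bounds(p_star, c)
        print(f"c={c:2d}  P*={p_star:.2f}  P in [{lower:.4f}, {upper:.4f}]")
```

For small P* the upper bound is close to 2P*, and at the largest possible Bayes rate, (c−1)/c, the two bounds meet.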

18 The k-Nearest-Neighbor Rule
Classify x by assigning it the label most frequently represented among the k nearest samples, i.e., use a voting scheme.
(Figure: k = 3.)
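A minimal sketch of the voting scheme, assuming Euclidean distance and (feature vector, label) pairs. The function name and the coordinates are hypothetical; they are arranged so that, as in the k = 3 example, the single nearest neighbor belongs to θ₁ while two of the three nearest belong to θ₂. Ties in the vote go to the label of the closer neighbor, which is one reasonable convention among several.

```python
import math
from collections import Counter


def knn_classify(prototypes, x, k=3):
    """Label x by majority vote among its k nearest prototypes (Euclidean distance)."""
    if not 1 <= k <= len(prototypes):
        raise ValueError("k must lie between 1 and the number of prototypes")
    # Full sort of the n distances, then keep the k closest
    nearest = sorted(prototypes, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common() lists equal counts in first-encountered order, and nearest is
    # sorted by distance, so a tie goes to the label of the closer neighbor
    return votes.most_common(1)[0][0]


D_n = [((0.5, 0.0), "theta1"), ((0.0, 0.8), "theta2"), ((-0.9, 0.0), "theta2"),
       ((2.0, 2.0), "theta1"), ((2.5, 1.5), "theta1")]
x = (0.0, 0.0)
print(knn_classify(D_n, x, k=1))  # theta1
print(knn_classify(D_n, x, k=3))  # theta2
```

With k = 1 this reduces to the nearest-neighbor rule; with k = 5 every prototype votes and the larger class, θ₁, wins again.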

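19 Analysis of the k-Nearest-Neighbor Rule
Select ωₘ if a majority of the k nearest neighbors are labeled ωₘ, an event of probability
$$\sum_{i=(k+1)/2}^{k} \binom{k}{i} P(\omega_m \mid x)^i \bigl[1 - P(\omega_m \mid x)\bigr]^{k-i}.$$
It can be shown that if k is odd, the large-sample two-class error rate for the k-nearest-neighbor rule is bounded above by the function C_k(P*), where C_k(P*) is defined to be the smallest concave function of P* greater than
$$\sum_{i=0}^{(k-1)/2} \binom{k}{i} \Bigl[(P^*)^{i+1} (1 - P^*)^{k-i} + (P^*)^{k-i} (1 - P^*)^{i+1}\Bigr].$$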
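Both expressions can be evaluated directly. The sketch below assumes the large-sample, two-class setting, where each of the k neighbors independently carries label ωₘ with probability P(ωₘ | x). It computes the majority-vote probability and the conditional k-NN error at a point whose Bayes conditional error is P*(e | x) (the sum inside the C_k definition), and cross-checks the two against each other. The function names are hypothetical; C_k itself, the concave envelope over all P*, is not computed here.

```python
from math import comb


def prob_majority(p, k):
    """P(majority of k neighbors are labeled w_m) when each one is, independently, w.p. p."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range((k + 1) // 2, k + 1))


def knn_conditional_error(p_star, k):
    """Large-sample two-class k-NN error at x whose Bayes conditional error is p_star."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    return sum(comb(k, i) * (p_star**(i + 1) * (1 - p_star)**(k - i)
                             + p_star**(k - i) * (1 - p_star)**(i + 1))
               for i in range((k - 1) // 2 + 1))


p_star = 0.1  # hypothetical Bayes conditional error at x
for k in (1, 3, 5, 7, 9):
    print(k, round(knn_conditional_error(p_star, k), 4))
```

For k = 1 the formula reduces to 2P*(1 − P*), the two-class nearest-neighbor result, and as k grows the conditional error decreases toward P*.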

20 Bounds on the Error Rate of the k-Nearest-Neighbor Rule
The bound is C_k(P*). As k gets larger, C_k(P*) approaches the Bayes rate P*, so the k-NN error rate approaches the Bayes rate. However, this holds only while k remains a small fraction of the total number of samples.

21 Computational Complexity of the k-Nearest-Neighbor Rule
Each distance calculation is O(d). Finding the single nearest neighbor requires n distance calculations, i.e., O(dn) per query. Finding the k nearest neighbors additionally involves ordering the n distances; with a full sort this costs O(dn + n log n).
Methods for speed-up:
- Parallelism
- Partial distance
- Pre-structuring
- Editing, pruning, or condensing

22 Parallel Implementation of the k-Nearest-Neighbor Rule
The search runs in constant, O(1), time and O(n) space.
(Figure: three units corresponding to the three Voronoi cells associated with ω₁. Each box corresponds to a face of a cell and determines whether x lies on its close side, the side containing the cell's prototype, or on its far side.)
x lies in a cell if it is on the close side of every face of that cell; classify x as ω₁ if any one of the ω₁ cells says yes.

23 Partial Distance Method of NN Speed-up
The partial distance based on r selected dimensions is
$$D_r(a, b) = \Bigl(\sum_{k=1}^{r} (a_k - b_k)^2\Bigr)^{1/2}, \qquad r < d.$$
Terminate a distance calculation once its partial distance is greater than the full (r = d) Euclidean distance to the current closest prototype.
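A sketch of the partial-distance search for a single nearest neighbor, with hypothetical names and random data. It compares squared partial sums against the current best squared distance, which is equivalent to comparing D_r with the full distance because the square root is monotone. The search also counts how many coordinate differences it actually evaluated, so the savings over the d·n of a full scan are visible.

```python
import math
import random


def nearest_with_partial_distance(prototypes, x):
    """Return (index, distance, dims_visited) of the prototype nearest to x,
    abandoning each distance computation once its partial sum is too large."""
    if not prototypes:
        raise ValueError("need at least one prototype")
    best_index, best_sq = -1, math.inf
    dims_visited = 0
    for idx, proto in enumerate(prototypes):
        partial_sq = 0.0
        for a, b in zip(proto, x):   # accumulate D_r^2 for r = 1, 2, ..., d
            partial_sq += (a - b) ** 2
            dims_visited += 1
            if partial_sq >= best_sq:  # already no closer than the current best
                break
        else:                          # all d dimensions summed: new closest prototype
            best_index, best_sq = idx, partial_sq
    return best_index, math.sqrt(best_sq), dims_visited


rng = random.Random(42)
d = 16
prototypes = [[rng.random() for _ in range(d)] for _ in range(1000)]
x = [rng.random() for _ in range(d)]
idx, dist, visited = nearest_with_partial_distance(prototypes, x)
print(idx, round(dist, 4), visited, "of", len(prototypes) * d, "coordinate terms")
```

The result is identical to a brute-force search; only the amount of arithmetic changes, and the savings grow as a good candidate is found early.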

24 Search Tree Method of NN Speed-up
Create a search tree in which prototypes are selectively linked to a small number of entry points. For a test point, consider only the prototypes linked to its closest entry point.
(Figure: entry points and their linked prototypes.)
Points in a neighboring region may actually be closer to x, so this method trades accuracy for speed.

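25 Editing Method of NN Speed-up
Eliminate prototypes that are surrounded by training points of the same category, i.e., whose Voronoi neighbors all share their label; removing them leaves the nearest-neighbor decision boundary unchanged.
The complexity is $O(d^3 n^{\lfloor d/2 \rfloor} \ln n)$.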
