Nearest Neighbor Classification. The Nearest-Neighbor Rule Error Bounds k-nearest Neighbor Rule Computational Considerations


1 Nearest Neighbor Classification
The Nearest-Neighbor Rule
Error Bounds
k-nearest Neighbor Rule
Computational Considerations

2 Example of Nearest Neighbor Rule
Two-class problem: yellow triangles and blue squares. The circle represents the unknown sample x. Because its nearest neighbor comes from class $\theta_1$, x is labeled as class $\theta_1$.
Figure 1: The NN rule
CSE 555: Srihari
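
The rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: it assumes Euclidean distance, and the function name `nearest_neighbor_label` and the toy points are hypothetical.

```python
import math

def nearest_neighbor_label(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance).

    prototypes: list of (point, label) pairs; points are equal-length tuples.
    """
    best_label, best_dist = None, math.inf
    for point, label in prototypes:
        dist = math.dist(x, point)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: two classes in the plane.
prototypes = [((0.0, 0.0), "theta1"), ((1.0, 0.5), "theta1"),
              ((4.0, 4.0), "theta2"), ((5.0, 3.5), "theta2")]
print(nearest_neighbor_label((0.8, 0.9), prototypes))  # theta1
```

The query point is labeled `theta1` because its single closest prototype, (1.0, 0.5), carries that label.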

3 Example of k-NN rule with k = 3
There are two classes: yellow triangles and blue squares. The circle represents the unknown sample x. Because two of its three nearest neighbors come from class $\theta_2$, x is labeled class $\theta_2$.
The number k should be:
1) large, to minimize the probability of misclassifying x;
2) small (with respect to the number of samples), so that the k points are close enough to x to give an accurate estimate of the true class of x.

4 Nearest Neighbor and Voronoi Tessellation
The nearest-neighbor classifier effectively partitions the feature space into cells, each consisting of all points closer to a given training point $x'$ than to any other training point. All points in such a cell are thus labeled by the category of that training point. This partition is the Voronoi tessellation of the space.
Figure: Voronoi tessellation in 2 dimensions and in 3 dimensions.
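
To make the cell picture concrete, the sketch below assigns every point of a coarse grid to the training point it is closest to, which is the same as asking which Voronoi cell it falls in. The helper `cell_index`, the three training points, and their labels are hypothetical and chosen only for illustration.

```python
import math

def cell_index(x, training_points):
    """Index of the training point whose Voronoi cell contains x.

    Ties on a cell boundary go to the lower index.
    """
    return min(range(len(training_points)),
               key=lambda i: math.dist(x, training_points[i]))

training_points = [(1.0, 1.0), (4.0, 1.0), (2.5, 4.0)]
labels = ["A", "B", "A"]

# Print a coarse 6x6 picture of the decision regions induced by the cells.
for row in range(5, -1, -1):
    print("".join(labels[cell_index((col, row), training_points)]
                  for col in range(6)))
```

Each printed character is the label of the cell that grid point belongs to, so regions of equal letters are unions of Voronoi cells of the same category.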

5 Nearest Neighbor Rule Probability of Error
Let $D_n = \{x_1, x_2, \ldots, x_n\}$ be a set of n labeled prototypes.
Let $x' \in D_n$ be the prototype nearest to a test point x.
The nearest-neighbor rule for classifying x is to assign it the label associated with $x'$.
The nearest-neighbor rule is a sub-optimal procedure: it does not yield the Bayes error rate.
Yet with an unlimited number of prototypes, its error rate is never worse than twice the Bayes error rate.

6 Why does the Nearest Neighbor rule work well?
The label $\theta'$ associated with the nearest neighbor $x'$ is a random variable.
The probability that $\theta' = \omega_i$ is the a posteriori probability $P(\omega_i \mid x')$.
As $n \to \infty$, it is always possible to find $x'$ sufficiently close to x so that $P(\omega_i \mid x') \approx P(\omega_i \mid x)$.
Because this is exactly the probability that nature will be in state $\omega_i$, the nearest-neighbor rule is effectively matching probabilities with nature.

7 Bayesian Probability of Error
If we define $\omega_m(x)$ by
$P(\omega_m \mid x) = \max_i P(\omega_i \mid x)$,
then the Bayes decision rule always selects $\omega_m$. From this, the Bayesian conditional probability of error is
$P^*(e \mid x) = 1 - P(\omega_m \mid x)$.

8 Bayesian Probability of Error
If we let $P^*(e \mid x)$ be the minimum possible value of $P(e \mid x)$, and $P^*$ be the minimum possible value of $P(e)$, then by averaging over the distribution $p(x)$ of x we get
$P^* = \int P^*(e \mid x)\, p(x)\, dx = \int \bigl(1 - P(\omega_m \mid x)\bigr)\, p(x)\, dx$.
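
As a small numeric illustration of the Bayes error, the sketch below evaluates the discrete analogue of the integral, $P^* = \sum_x (1 - \max_i P(\omega_i \mid x))\, p(x)$. The three-valued feature and its posteriors are made-up numbers, and the function name `bayes_error` is hypothetical.

```python
def bayes_error(p_x, posteriors):
    """Bayes error for a discrete feature: sum_x (1 - max_i P(w_i | x)) p(x).

    p_x:        probabilities of each feature value (must sum to 1)
    posteriors: for each feature value, a tuple of class posteriors
    """
    return sum(p * (1.0 - max(post)) for p, post in zip(p_x, posteriors))

# Hypothetical three-valued feature with two classes.
p_x = [0.5, 0.3, 0.2]
posteriors = [(0.9, 0.1), (0.6, 0.4), (0.5, 0.5)]
print(bayes_error(p_x, posteriors))  # approximately 0.27
```

The three terms are 0.5 × 0.1, 0.3 × 0.4 and 0.2 × 0.5, so even the optimal rule errs about 27% of the time on this toy distribution.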

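9 Evaluation of Nearest Neighbor Error
If $P_n(e)$ is the n-sample error rate, and if
$P = \lim_{n \to \infty} P_n(e)$,
then we want to show that
$P^* \le P \le P^*\left(2 - \frac{c}{c-1} P^*\right)$,
where c is the number of categories.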

10 Nearest-Neighbor Probability of Error: The Random Variables
Begin by looking at all the random variables in the $x, x_n, \theta, \theta_n$ system. We denote by $\theta$ the true class of x and by $\theta_n$ the label of $x_n$, where $x_n$ is the nearest neighbor of x. The test point x and its class $\theta$ are random inputs to the problem. The labeled samples are drawn at random too, so the pair $(x_n, \theta_n)$ is also random. Given x and $x_n$, the class $\theta$ of x and the label $\theta_n$ of $x_n$ are independent. Thus we have
$P(\theta, \theta_n \mid x, x_n) = P(\theta \mid x)\, P(\theta_n \mid x_n)$.

11 Expressing the Probability of Error

12 Convergence of Probability of Error
As n approaches infinity, the space becomes increasingly densely filled with labeled samples, so the nearest neighbor $x_n$ converges to x with probability 1. So we can say that
$\lim_{n \to \infty} P_n(e \mid x, x_n) = \lim_{n \to \infty} P_n(e \mid x, x) = \lim_{n \to \infty} P_n(e \mid x)$.

13 Final Expression for Nearest-Neighbor Probability of Error

14 Bounds on the Conditional Probability of Error

15 Nearest Neighbor Error Bound Derivation
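
The equations for slides 11 through 15 did not survive transcription. The block below is a sketch of the standard textbook argument under the independence assumption stated on slide 10; it is a reconstruction, not the lecturer's own slides, and the notation ($c$ categories, $\omega_m$ the Bayes choice at x) is assumed to match the rest of the deck.

```latex
\begin{align}
% An error occurs unless theta and theta_n agree.
P_n(e \mid x, x_n) &= 1 - \sum_{i=1}^{c} P(\theta = \omega_i, \theta_n = \omega_i \mid x, x_n)
                    = 1 - \sum_{i=1}^{c} P(\omega_i \mid x)\, P(\omega_i \mid x_n), \\
% As n grows, x_n converges to x.
\lim_{n \to \infty} P_n(e \mid x) &= 1 - \sum_{i=1}^{c} P^2(\omega_i \mid x), \\
P = \lim_{n \to \infty} P_n(e) &= \int \Bigl[ 1 - \sum_{i=1}^{c} P^2(\omega_i \mid x) \Bigr] p(x)\, dx, \\
% For fixed P(omega_m | x), the sum of squares is smallest when the
% remaining c - 1 posteriors are all equal.
\sum_{i=1}^{c} P^2(\omega_i \mid x) &\ge \bigl(1 - P^*(e \mid x)\bigr)^2 + \frac{P^*(e \mid x)^2}{c - 1}, \\
1 - \sum_{i=1}^{c} P^2(\omega_i \mid x) &\le 2 P^*(e \mid x) - \frac{c}{c-1}\, P^*(e \mid x)^2, \\
% Integrate over x and use \int P^*(e|x)^2 p(x) dx >= (P^*)^2.
P &\le P^* \Bigl( 2 - \frac{c}{c-1}\, P^* \Bigr).
\end{align}
```

The last step relies on the variance of $P^*(e \mid x)$ being non-negative, which is why the bound is attained only when $P^*(e \mid x)$ is constant in x.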

16 Error Bound Conclusion
The error bounds are tight: for any $P^*$ there exist conditional and prior distributions for which the bounds are achieved.

17 Bounds on the nearest-neighbor error rate in a c-category problem
Figure: possible asymptotic error rates, assuming infinite training data.
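
The region in the figure can be tabulated numerically. The sketch below assumes the standard asymptotic bounds $P^* \le P \le P^*(2 - \frac{c}{c-1} P^*)$ for a c-category problem; the function name `nn_error_bounds` is hypothetical.

```python
def nn_error_bounds(p_star, c):
    """Lower and upper asymptotic bounds on the nearest-neighbor error rate.

    p_star: Bayes error rate, which cannot exceed (c - 1) / c
    c:      number of categories (c >= 2)
    """
    if not 0.0 <= p_star <= (c - 1) / c:
        raise ValueError("Bayes error must lie in [0, (c - 1) / c]")
    upper = p_star * (2.0 - c / (c - 1) * p_star)
    return p_star, upper

for p_star in (0.0, 0.1, 0.25, 0.5):
    print(p_star, nn_error_bounds(p_star, c=2))
```

For two classes the upper bound reduces to $2P^*(1 - P^*)$, so the two bounds coincide at $P^* = 0$ and at $P^* = 1/2$ and are furthest apart in between.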

18 The k Nearest-Neighbor Rule
Classify x by assigning it the label most frequently represented among the k nearest samples, using a voting scheme.
Figure: example with k = 3.
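
A minimal sketch of the voting scheme follows. It assumes Euclidean distance and breaks no ties explicitly (with two classes and odd k there are none); the name `knn_label` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_label(x, prototypes, k=3):
    """Label x by majority vote among its k nearest prototypes."""
    nearest = sorted(prototypes, key=lambda pl: math.dist(x, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data in which the 1-NN and 3-NN decisions differ.
prototypes = [((0.0, 0.0), "theta1"), ((1.0, 1.0), "theta1"),
              ((1.2, 0.2), "theta2"), ((1.5, 0.8), "theta2"),
              ((3.0, 3.0), "theta2")]
x = (0.9, 0.9)
print(knn_label(x, prototypes, k=1))  # theta1
print(knn_label(x, prototypes, k=3))  # theta2
```

The single closest prototype is from `theta1`, but two of the three closest are from `theta2`, so the 3-NN vote overrules the 1-NN decision.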

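19 Analysis of k Nearest-Neighbor Rule
Select $\omega_m$ if a majority of the k nearest neighbors are labeled $\omega_m$, an event of probability
$\sum_{i=(k+1)/2}^{k} \binom{k}{i} P(\omega_m \mid x)^i \left[1 - P(\omega_m \mid x)\right]^{k-i}$.
It can be shown that if k is odd, the large-sample two-class error rate for the k-nearest-neighbor rule is bounded above by the function $C_k(P^*)$, where $C_k(P^*)$ is defined to be the smallest concave function of $P^*$ greater than
$\sum_{i=0}^{(k-1)/2} \binom{k}{i} \left[(P^*)^{i+1} (1 - P^*)^{k-i} + (P^*)^{k-i} (1 - P^*)^{i+1}\right]$.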

20 Bounds on Error Rate of k-nearest Neighbor Rule
The bound is $C_k(P^*)$.
As k grows the bound tightens, and in the limit $k \to \infty$ (with infinite training data) the error rate equals the Bayes rate.
In practice, k should be a small fraction of the total number of samples.
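
How the bound tightens with k can be checked numerically. The sketch below evaluates the two-class sum $\sum_{i=0}^{(k-1)/2} \binom{k}{i} [(P^*)^{i+1}(1-P^*)^{k-i} + (P^*)^{k-i}(1-P^*)^{i+1}]$, whose smallest concave majorant is $C_k(P^*)$. That expression is the standard textbook form and is an assumption here, since the formula itself is missing from the transcription; the code does not compute the concave majorant, and the name `knn_bound_sum` is hypothetical.

```python
from math import comb

def knn_bound_sum(p_star, k):
    """Two-class sum whose smallest concave majorant is C_k(P*), for odd k."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    q = 1.0 - p_star
    return sum(comb(k, i) * (p_star ** (i + 1) * q ** (k - i)
                             + p_star ** (k - i) * q ** (i + 1))
               for i in range((k - 1) // 2 + 1))

for k in (1, 3, 5, 21):
    print(k, knn_bound_sum(0.1, k))
```

For k = 1 the sum is $2P^*(1 - P^*)$, the two-class nearest-neighbor bound, and as k increases the values move down toward $P^*$ itself.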

21 Computational Complexity of k-nearest-Neighbor Rule
Each distance calculation is O(d).
Finding the single nearest neighbor by scanning all n prototypes is $O(dn)$.
Finding the k nearest neighbors involves sorting the n distances; $O(dn^2)$ is a loose upper bound, since one query costs $O(dn + n \log n)$.
Methods for speed-up:
Parallelism
Partial distance
Pre-structuring
Editing, pruning or condensing
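
The per-query cost can be seen in a brute-force search. The sketch below is one possible implementation, not the one assumed by the slide: it computes all n distances at O(d) each and uses the standard-library `heapq.nsmallest` to keep only k candidates instead of fully sorting. The name `k_nearest` and the sample points are hypothetical.

```python
import heapq
import math

def k_nearest(x, points, k):
    """Indices of the k points nearest to x, closest first.

    The distance pass costs O(dn); selecting the k smallest costs O(n log k).
    """
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(x, points[i]))

points = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (2.0, 2.0)]
print(k_nearest((0.9, 0.9), points, k=2))  # [2, 0]
```

With k fixed and small, the selection step is dominated by the O(dn) distance pass, which is what the speed-up methods listed on the slide try to reduce.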

22 Parallel Implementation of k-nearest-neighbor Rule
Constant time, O(1), at the cost of O(n) in space.
Figure: three units correspond to the 3 cells associated with $\omega_1$. Each box corresponds to a face of a cell and determines whether x lies on its close or open side. Classify x as $\omega_1$ if one of the cells says yes.

23 Partial Distance Method of NN speedup
The partial distance based on r selected dimensions is
$D_r(a, b) = \left(\sum_{k=1}^{r} (a_k - b_k)^2\right)^{1/2}$, where $r < d$.
Terminate a distance calculation once its partial distance is greater than the full (r = d) Euclidean distance to the current closest prototype.
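
The early-termination idea can be sketched as follows. The code accumulates squared differences one dimension at a time and abandons a prototype as soon as the running sum reaches the best full squared distance found so far; comparing squared values is an implementation choice made here to avoid repeated square roots. The function name `nearest_with_partial_distance` and the sample data are hypothetical.

```python
import math

def nearest_with_partial_distance(x, prototypes):
    """1-NN search that abandons a distance sum once it cannot win.

    Returns (index, distance) of the nearest prototype; the first one wins ties.
    """
    best_index, best_sq = -1, math.inf
    for index, p in enumerate(prototypes):
        partial_sq = 0.0
        for a, b in zip(x, p):
            partial_sq += (a - b) ** 2
            if partial_sq >= best_sq:   # partial distance already too large
                break
        else:
            # All d dimensions were summed and the total beats the best so far.
            best_index, best_sq = index, partial_sq
    return best_index, math.sqrt(best_sq)

prototypes = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
index, dist = nearest_with_partial_distance((0.9, 1.0, 1.1), prototypes)
print(index, dist)
```

Because partial sums only grow as dimensions are added, stopping early never changes the answer; it only skips work on prototypes that are already known to be farther away.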

24 Search Tree Method of NN speedup
Create a search tree in which prototypes are selectively linked.
During search, find the closest entry point and consider only the prototypes linked to it.
Points in a neighboring region may actually be closer, so there is a tradeoff of accuracy versus speed.
Figure: entry points.
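
The accuracy-versus-speed tradeoff can be shown with a deliberately simple, one-level version of the idea: each prototype is linked to its closest entry point, and a query inspects only the prototypes linked to its own closest entry point. This is an illustrative simplification, not the tree structure from the lecture; the names `build_links` and `approximate_nearest` and the data are hypothetical.

```python
import math

def build_links(prototypes, entry_points):
    """Link every prototype to its closest entry point."""
    links = {i: [] for i in range(len(entry_points))}
    for p in prototypes:
        owner = min(links, key=lambda i: math.dist(p, entry_points[i]))
        links[owner].append(p)
    return links

def approximate_nearest(x, entry_points, links):
    """Search only the prototypes linked to the entry point closest to x."""
    entry = min(links, key=lambda i: math.dist(x, entry_points[i]))
    candidates = links[entry]
    return min(candidates, key=lambda p: math.dist(x, p)) if candidates else None

entry_points = [(0.0, 0.0), (10.0, 0.0)]
prototypes = [(1.0, 0.0), (4.0, 0.0), (5.5, 0.0), (9.0, 0.0)]
links = build_links(prototypes, entry_points)

x = (4.9, 0.0)
print(approximate_nearest(x, entry_points, links))     # (4.0, 0.0)
print(min(prototypes, key=lambda p: math.dist(x, p)))  # (5.5, 0.0), the true nearest
```

The query sits just on the left entry point's side, so the restricted search returns (4.0, 0.0) even though (5.5, 0.0), linked to the other entry point, is closer.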

25 Editing Method of NN speedup
Eliminate prototypes that are surrounded by training points of the same category.
Complexity is $O(d^3 n^{\lfloor d/2 \rfloor} \ln n)$.
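
Editing is easiest to see in one dimension, where the Voronoi neighbors of a prototype are simply the adjacent prototypes in sorted order. The sketch below keeps a prototype only if at least one neighbor has a different label. The restriction to 1-D is an assumption made to keep the example self-contained (the general case needs a Voronoi or Delaunay construction), and the name `edit_prototypes_1d` is hypothetical.

```python
def edit_prototypes_1d(prototypes):
    """Drop 1-D prototypes whose Voronoi neighbors all share their label.

    prototypes: list of (value, label) pairs. Returns the kept pairs, sorted.
    """
    ordered = sorted(prototypes)
    kept = []
    for i, (value, label) in enumerate(ordered):
        # In 1-D the Voronoi neighbors are the left and right neighbors.
        neighbors = ordered[max(i - 1, 0):i] + ordered[i + 1:i + 2]
        if any(other != label for _, other in neighbors):
            kept.append((value, label))
    return kept

data = [(0, "a"), (1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "a")]
print(edit_prototypes_1d(data))
# [(2, 'a'), (3, 'b'), (4, 'b'), (5, 'a')]
```

The two interior `a` prototypes at 0 and 1 are removed, yet the decision boundaries at 2.5 and 4.5 are unchanged, which is the point of editing.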


More information

Representation of functions as power series

Representation of functions as power series Representation of functions as power series Dr. Philippe B. Laval Kennesaw State University November 9, 008 Abstract This document is a summary of the theory and techniques used to represent functions

More information

Math 120 Final Exam Practice Problems, Form: A

Math 120 Final Exam Practice Problems, Form: A Math 120 Final Exam Practice Problems, Form: A Name: While every attempt was made to be complete in the types of problems given below, we make no guarantees about the completeness of the problems. Specifically,

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

Answer: The relationship cannot be determined.

Answer: The relationship cannot be determined. Question 1 Test 2, Second QR Section (version 3) In City X, the range of the daily low temperatures during... QA: The range of the daily low temperatures in City X... QB: 30 Fahrenheit Arithmetic: Ranges

More information

Claudio J. Tessone. Pau Amengual. Maxi San Miguel. Raúl Toral. Horacio Wio. Eur. Phys. J. B 39, 535 (2004) http://www.imedea.uib.

Claudio J. Tessone. Pau Amengual. Maxi San Miguel. Raúl Toral. Horacio Wio. Eur. Phys. J. B 39, 535 (2004) http://www.imedea.uib. Horacio Wio Raúl Toral Eur. Phys. J. B 39, 535 (2004) Claudio J. Tessone Pau Amengual Maxi San Miguel http://www.imedea.uib.es/physdept Models of Consensus vs. Polarization, or Segregation: Voter model,

More information

Lesson #13 Congruence, Symmetry and Transformations: Translations, Reflections, and Rotations

Lesson #13 Congruence, Symmetry and Transformations: Translations, Reflections, and Rotations Math Buddies -Grade 4 13-1 Lesson #13 Congruence, Symmetry and Transformations: Translations, Reflections, and Rotations Goal: Identify congruent and noncongruent figures Recognize the congruence of plane

More information

Closest Pair Problem

Closest Pair Problem Closest Pair Problem Given n points in d-dimensions, find two whose mutual distance is smallest. Fundamental problem in many applications as well as a key step in many algorithms. p q A naive algorithm
