Nearest Neighbor Classification: The Nearest-Neighbor Rule, Error Bounds, k-Nearest-Neighbor Rule, Computational Considerations
- Neal Miles
- 7 years ago
Transcription
1 Nearest Neighbor Classification: The Nearest-Neighbor Rule; Error Bounds; k-Nearest-Neighbor Rule; Computational Considerations
2 Example of Nearest Neighbor Rule. Two-class problem: yellow triangles and blue squares. The circle represents the unknown sample $x$; because its nearest neighbor comes from class $\theta_1$, $x$ is labeled as class $\theta_1$. Figure 1: The NN rule. CSE 555: Srihari 1
3 Example of k-NN rule with k = 3. There are two classes: yellow triangles and blue squares. The circle represents the unknown sample $x$; because two of its three nearest neighbors come from class $\theta_2$, it is labeled class $\theta_2$. The number k should be: 1) large, to minimize the probability of misclassifying $x$; 2) small (with respect to the number of samples), so that the points are close enough to $x$ to give an accurate estimate of the true class of $x$.
4 Nearest Neighbor and Voronoi Tessellation. The nearest-neighbor classifier effectively partitions the feature space into cells, each consisting of all points closer to a given training point $x'$ than to any other training point. All points in such a cell are thus labeled by the category of that training point. Figure: Voronoi tessellation of the space (panels: 2 dimensions, 3 dimensions).
5 Nearest Neighbor Rule: Probability of Error. Let $D_n = \{x_1, x_2, \ldots, x_n\}$ be a set of n labeled prototypes. Let $x' \in D_n$ be the nearest prototype to a test point $x$. The nearest-neighbor rule for classifying $x$ is to assign it the label associated with $x'$. The nearest-neighbor rule is a sub-optimal procedure: it does not yield the Bayes error rate, yet it is never worse than twice the Bayes error rate.
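The rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `nearest_neighbor_label`, the Euclidean metric, and the toy prototypes are all assumptions made for the example.

```python
import math

def nearest_neighbor_label(prototypes, labels, x):
    """Return the label of the prototype closest to x.

    Assumes Euclidean distance; the slides do not fix a metric here.
    """
    best_index = min(range(len(prototypes)),
                     key=lambda i: math.dist(prototypes[i], x))
    return labels[best_index]

# Hypothetical labeled prototypes D_n and a test point x.
prototypes = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
labels = ["theta1", "theta1", "theta2"]
predicted = nearest_neighbor_label(prototypes, labels, (3.0, 3.5))
print(predicted)  # the closest prototype is (4.0, 4.0), so "theta2"
```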
6 Why does the Nearest Neighbor rule work well? The label $\theta'$ associated with the nearest neighbor is a random variable. The probability that $\theta' = \omega_i$ is the a posteriori probability $P(\omega_i \mid x')$. As $n \to \infty$, it is always possible to find $x'$ sufficiently close to $x$ that $P(\omega_i \mid x') \approx P(\omega_i \mid x)$. Because this is exactly the probability that nature will be in state $\omega_i$, the nearest-neighbor rule is effectively matching probabilities with nature.
7 Bayesian Probability of Error. If we define $\omega_m(x)$ by $P(\omega_m \mid x) = \max_i P(\omega_i \mid x)$, then the Bayes decision rule always selects $\omega_m$. From this, the Bayesian conditional probability of error is $P^*(e \mid x) = 1 - P(\omega_m \mid x)$.
8 Bayesian Probability of Error. If we let $P^*(e \mid x)$ be the minimum possible value of $P(e \mid x)$, and $P^*$ be the minimum possible value of $P(e)$, then by averaging over the distribution of $x$ we get $P^* = \int P^*(e \mid x)\, p(x)\, dx = \int \left(1 - P(\omega_m \mid x)\right) p(x)\, dx$.
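A small worked example may help make the averaging concrete. The sketch below replaces the integral with a sum over a discrete $x$; the function name `bayes_error` and the numbers are hypothetical, chosen only for illustration.

```python
def bayes_error(p_x, posteriors):
    """Discrete analogue of P* = integral of [1 - P(w_m|x)] p(x) dx.

    p_x[j] is the probability of the j-th value of x; posteriors[j] lists
    P(w_i | x_j) for every class i. Both are assumed inputs.
    """
    return sum(px * (1.0 - max(post)) for px, post in zip(p_x, posteriors))

# Two equally likely values of x, two classes.
p_x = [0.5, 0.5]
posteriors = [[0.9, 0.1], [0.4, 0.6]]
p_star = bayes_error(p_x, posteriors)  # 0.5 * 0.1 + 0.5 * 0.4 = 0.25
```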
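9 Evaluation of Nearest Neighbor Error. If $P_n(e)$ is the n-sample error rate, and if $P = \lim_{n \to \infty} P_n(e)$, then we want to show that $P^* \le P \le P^*\left(2 - \frac{c}{c-1} P^*\right)$, where c is the number of categories.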
10 Nearest-Neighbor Probability of Error: The Random Variables. Begin by looking at all the random variables in the construction of the $(x, x_n, \theta, \theta_n)$ system. We denote by $\theta$ the true class of $x$ and by $\theta_n$ the labeled class of $x_n$, where $x_n$ is the nearest neighbor of $x$. It is clear that $x$ and its $\theta$ are random input parameters to the problem. The underlying statistics of the labeled space are random too, so the pair $(x_n, \theta_n)$ is also unknown and thus a random input. The events that $x$ has true class $\theta$ and that $x_n$ is labeled $\theta_n$ are independent. Thus we have $P(\theta, \theta_n \mid x, x_n) = P(\theta \mid x)\, P(\theta_n \mid x_n)$.
11 Expressing the Probability of Error
12 Convergence of Probability of Error. Notice that as n approaches infinity, the space of labeled items becomes increasingly filled, so the nearest neighbor $x_n$ converges to $x$ with probability 1. So we can say that $\lim_{n \to \infty} P(e \mid x, x_n) = \lim_{n \to \infty} P(e \mid x, x) = \lim_{n \to \infty} P(e \mid x)$.
13 Final Expression for Nearest-Neighbor Probability of Error
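As a sketch of the usual asymptotic argument, assuming independent labels for $x$ and $x_n$ and $x_n \to x$ as $n \to \infty$: an error occurs whenever $\theta \ne \theta_n$, and the limit follows by letting $x_n$ coincide with $x$. The steps below are the standard textbook form of the result and should be read as a reconstruction under those assumptions, with c denoting the number of categories.

```latex
\begin{align*}
P_n(e \mid x, x_n)
  &= 1 - \sum_{i=1}^{c} P(\theta = \omega_i,\ \theta_n = \omega_i \mid x, x_n)
   = 1 - \sum_{i=1}^{c} P(\omega_i \mid x)\, P(\omega_i \mid x_n), \\
\lim_{n \to \infty} P_n(e \mid x)
  &= 1 - \sum_{i=1}^{c} P^2(\omega_i \mid x), \\
P = \lim_{n \to \infty} P_n(e)
  &= \int \Bigl[\, 1 - \sum_{i=1}^{c} P^2(\omega_i \mid x) \Bigr]\, p(x)\, dx .
\end{align*}
```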
14 Bounds on the Conditional Probability of Error
15 Nearest Neighbor Error Bound Derivation
16 Error Bound Conclusion. The error bounds are tight in that, for any $P^*$, there exist conditional and prior distributions for which the bounds are achieved.
17 Bounds on Nearest-Neighbor Error Rate in a c-Category Problem, assuming infinite training data. Figure: possible asymptotic error rates.
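To get a feel for the size of the interval, the bounds can be evaluated numerically. The sketch below assumes the standard form of the asymptotic bound, $P^* \le P \le P^*\left(2 - \frac{c}{c-1} P^*\right)$; the function name `nn_error_bounds` is hypothetical.

```python
def nn_error_bounds(p_star, c):
    """Return (lower, upper) bounds on the asymptotic NN error rate P.

    Assumes the bound P* <= P <= P* (2 - c/(c-1) P*), where c is the
    number of categories and P* is the Bayes error rate.
    """
    if c < 2:
        raise ValueError("need at least two categories")
    if not 0.0 <= p_star <= (c - 1) / c:
        raise ValueError("Bayes error must lie in [0, (c-1)/c]")
    upper = p_star * (2.0 - c / (c - 1.0) * p_star)
    return p_star, upper

lower, upper = nn_error_bounds(0.1, 2)  # upper = 0.1 * (2 - 0.2) = 0.18
```

At the extremes the interval collapses: with $P^* = 0$ both bounds are 0, and with $P^* = (c-1)/c$ both equal $(c-1)/c$.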
18 The k Nearest-Neighbor Rule. Classify $x$ by assigning it the label most frequently represented among the k nearest samples, i.e. use a voting scheme. Figure: example with k = 3.
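The voting scheme can be sketched as follows. This is an illustrative brute-force version, assuming Euclidean distance; `knn_classify` and the sample data are hypothetical. Ties in the vote are broken arbitrarily by `Counter.most_common`, which is one reason an odd k is convenient in the two-class case.

```python
import math
from collections import Counter

def knn_classify(samples, labels, x, k=3):
    """Label x by majority vote among its k nearest samples."""
    # Rank all sample indices by distance to x, then keep the first k.
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], x))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: the single nearest sample is theta1, but two of the
# three nearest are theta2, so k = 1 and k = 3 disagree.
samples = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0)]
labels = ["theta1", "theta2", "theta2", "theta1"]
x = (1.2, 1.2)
label_k1 = knn_classify(samples, labels, x, k=1)
label_k3 = knn_classify(samples, labels, x, k=3)
```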
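19 Analysis of k Nearest-Neighbor Rule. Select $\omega_m$ if a majority of the k nearest neighbors are labeled $\omega_m$, an event of probability $\sum_{i=(k+1)/2}^{k} \binom{k}{i} P(\omega_m \mid x)^i \left[1 - P(\omega_m \mid x)\right]^{k-i}$. It can be shown that if k is odd, the large-sample two-class error rate for the k-nearest-neighbor rule is bounded above by the function $C_k(P^*)$, where $C_k(P^*)$ is defined to be the smallest concave function of $P^*$ greater than $\sum_{i=0}^{(k-1)/2} \binom{k}{i} \left[ (P^*)^{i+1} (1 - P^*)^{k-i} + (P^*)^{k-i} (1 - P^*)^{i+1} \right]$.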
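Both quantities in this analysis are binomial sums and are easy to evaluate. The sketch below assumes the majority-vote probability $\sum_{i=(k+1)/2}^{k} \binom{k}{i} p^i (1-p)^{k-i}$ with $p = P(\omega_m \mid x)$, and the expression $\sum_{i=0}^{(k-1)/2} \binom{k}{i} \left[ (P^*)^{i+1}(1-P^*)^{k-i} + (P^*)^{k-i}(1-P^*)^{i+1} \right]$ that $C_k(P^*)$ must dominate. The function names are hypothetical, and the code evaluates only the expression itself, not its concave envelope $C_k$.

```python
from math import comb

def majority_probability(p, k):
    """Probability that more than half of k neighbors carry label w_m,
    when each does so independently with probability p (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range((k + 1) // 2, k + 1))

def knn_error_expression(p_star, k):
    """The two-class expression that C_k(P*) must exceed (k odd).

    Note: C_k is the smallest concave function above this curve; this
    function does not construct that envelope.
    """
    return sum(comb(k, i) * (p_star**(i + 1) * (1 - p_star)**(k - i)
                             + p_star**(k - i) * (1 - p_star)**(i + 1))
               for i in range((k - 1) // 2 + 1))

maj = majority_probability(0.8, 3)    # 3 * 0.64 * 0.2 + 0.512 = 0.896
err1 = knn_error_expression(0.1, 1)   # k = 1 reduces to 2 P* (1 - P*) = 0.18
```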
20 Bounds on Error Rate of k-Nearest-Neighbor Rule. The bound is $C_k(P^*)$. As k grows, the bound approaches the Bayes rate; at the same time, k should remain a small fraction of the total number of samples.
21 Computational Complexity of k-Nearest-Neighbor Rule. Each distance calculation is $O(d)$. Finding the single nearest neighbor takes $O(n)$ distance calculations. Finding the k nearest neighbors involves sorting; thus $O(dn^2)$. Methods for speed-up: parallelism; partial distance; pre-structuring; editing, pruning, or condensing.
22 Parallel Implementation of k-Nearest-Neighbor Rule. Constant time, $O(1)$, and $O(n)$ in space. Classify as $\omega_1$ if any one of the cells says yes. Figure: three units corresponding to the 3 cells associated with $\omega_1$; each box corresponds to a face of the cell and determines whether $x$ lies on its close or open side.
23 Partial Distance Method of NN speedup. The partial distance based on r selected dimensions is $D_r(a, b) = \left( \sum_{k=1}^{r} (a_k - b_k)^2 \right)^{1/2}$, where $r < d$. Terminate a distance calculation once its partial distance is greater than the full ($r = d$) Euclidean distance to the current closest prototype.
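A minimal sketch of the early-termination idea follows. It assumes Euclidean distance and compares squared partial sums, which is equivalent to comparing the distances themselves because the square root is monotone; the function name `nn_partial_distance` and the test data are hypothetical.

```python
import math

def nn_partial_distance(prototypes, x):
    """Nearest-neighbor search that abandons a distance computation as
    soon as its partial (squared) distance reaches the best full
    (squared) distance found so far."""
    best_index, best_sq = -1, math.inf
    for index, prototype in enumerate(prototypes):
        partial_sq = 0.0
        for a, b in zip(prototype, x):
            partial_sq += (a - b) ** 2
            if partial_sq >= best_sq:
                break  # this prototype cannot beat the current closest one
        else:
            # All d dimensions were used and the total stayed below best_sq.
            best_index, best_sq = index, partial_sq
    return best_index, math.sqrt(best_sq)

prototypes = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
index, distance = nn_partial_distance(prototypes, (0.9, 1.2, 1.0))
```

The saving depends on the order in which prototypes and dimensions are visited: the sooner a close prototype is found, the earlier later computations terminate.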
24 Search Tree Method of NN speedup. Create a search tree in which prototypes are selectively linked. At query time, consider only the prototypes linked to the chosen entry point. Points in a neighboring region may actually be closer, so this is a tradeoff of accuracy versus speed.
25 Editing Method of NN speedup. Eliminate prototypes that are surrounded by training points of the same category. Complexity is $O(d^3 n^{d/2} \ln n)$.