Prototype based methods: Mathematical foundations, interpretability, and data visualization


1 Prototype based methods: Mathematical foundations, interpretability, and data visualization
Barbara Hammer, Xibin Zhu
CITEC Centre of Excellence, Bielefeld University
ijcnn14_tutorial.html

2

3 Why LVQ? [Machine Learning that Matters, Kiri L. Wagstaff, ICML 2012]
... of 152 non-cross-conference papers published at ICML 2011: there is a need for machine learning techniques which facilitate a direct interpretation of the results

4 Why LVQ?
- LVQ is a prime example of a machine learning model which is intuitive and interpretable
- but classical LVQ is a mere heuristic
- this tutorial: modern LVQ variants and their mathematics

5 Prototypes
- prototypes are points in the data space: $\vec w_i \in \mathbb{R}^n$
- they decompose the space into receptive fields: $R(\vec w_i) = \{\vec x \mid \|\vec w_i - \vec x\|^2 \le \|\vec w_j - \vec x\|^2 \ \forall j \ne i\}$
- and thereby induce a classification
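As an illustration, a minimal sketch (not from the slides) of winner determination and nearest-prototype classification with numpy; W, c, and X are assumed arrays of prototypes, prototype labels, and data points:

```python
import numpy as np

def winner(x, W):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))

def classify(X, W, c):
    """Each point receives the label of the prototype whose receptive field it falls into."""
    return np.array([c[winner(x, W)] for x in X])
```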

6 Prototypes
- prototypes offer a sparse encoding
- prototypes represent data
- manual inspection possible

7 Prototypes (figure, WSOM 2005, Paris)

8 Prototypes (figure, WSOM 2005, Paris)

9 Prototype learning
- supervised: classes are known a priori; training set $P = \{(\vec x_i, y_i) \mid i = 1,\dots,p\} \subset \mathbb{R}^n \times \{1,\dots,C\}$; methods: LVQ, GLVQ, RSLVQ, ...
- unsupervised: clusters are not known a priori; methods: NG, GTM, AP, ...
- ... usually a solid mathematical foundation is available

10 LVQ
Learning vector quantization [Kohonen, 1988]
init positions of $\vec w_j$, labels are $c(\vec w_j)$
repeat:
  pick a data point $(\vec x_i, y_i)$ randomly
  determine the winner $\vec w_I$
  if $y_i = c(\vec w_I)$: $\vec w_I \mathrel{+}= \eta\,(\vec x_i - \vec w_I)$
  otherwise: $\vec w_I \mathrel{-}= \eta\,(\vec x_i - \vec w_I)$
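A minimal sketch of this LVQ1 loop; the learning rate eta and the epoch count are hypothetical hyperparameters, W and c as in the previous sketch:

```python
import numpy as np

def train_lvq1(X, y, W, c, eta=0.05, n_epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    W = W.copy()
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):                  # pick data points randomly
            I = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # determine the winner
            if c[I] == y[i]:
                W[I] += eta * (X[i] - W[I])                # attract the correct winner
            else:
                W[I] -= eta * (X[i] - W[I])                # repel the wrong winner
    return W
```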

11 LVQ
LVQ 2.1 [Kohonen, 1990]
init positions of $\vec w_j$, labels are $c(\vec w_j)$
repeat:
  pick a data point $(\vec x_i, y_i)$ randomly
  determine the closest prototype $\vec w^+$ with $y_i = c(\vec w^+)$
  determine the closest prototype $\vec w^-$ with $y_i \ne c(\vec w^-)$
  if the prototypes fall into a window around the decision boundary:
    $\vec w^+ \mathrel{+}= \eta\,(\vec x_i - \vec w^+)$
    $\vec w^- \mathrel{-}= \eta\,(\vec x_i - \vec w^-)$
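A minimal sketch of one LVQ2.1 step including the window rule; the window width w is a hypothetical choice:

```python
import numpy as np

def lvq21_step(x, y, W, c, eta=0.05, w=0.25):
    d = ((W - x) ** 2).sum(axis=1)
    plus = np.argmin(np.where(c == y, d, np.inf))    # closest prototype with correct label
    minus = np.argmin(np.where(c != y, d, np.inf))   # closest prototype with wrong label
    dp, dm = d[plus], d[minus]
    # update only if x falls into a window around the decision boundary
    if min(dp / dm, dm / dp) > (1 - w) / (1 + w):
        W[plus] += eta * (x - W[plus])
        W[minus] -= eta * (x - W[minus])
```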

12

13 Online detection of faults (figure: sensors)

14 Online detection of faults [T. Bojer et al., 2003]
Setting: high-dimensional features, few training data, online training
LVQ: close to 100% accuracy; prototypes can be stored and inspected

15 Clinical proteomics
unhappy because possibly ill ... take serum, put it into a mass spectrometer, and observe a characteristic spectrum which tells us more about the peptides in the serum

16 Clinical proteomics [F.-M. Schleif et al., 2009]
prostate cancer [National Cancer Institute, Prostate Cancer Dataset]:
- 318 examples, SELDI-TOF from blood serum, 130 dim after preprocessing (normalization, peak detection)
- 2 classes (healthy versus cancer in different states)
Accuracy: LVQ 62.5%, GRLVQ 93.7%, SVM 92.7%

17 Steroid metabolomics
unhappy because possibly ill ... take serum, extract steroid markers (32 selected steroid metabolites) by means of GC/MS; ACC / ACA

18 Steroid metabolomics [W. Arlt, M. Biehl et al., 2011] (figure)

19 Object recognition [S. Kirstein, H. Wersing, H.-M. Gross, E. Körner, 2012] (figure)

20 Take home message
- LVQ offers an intuitive classifier with high potential for industrial applications
- interpretability of the technique is a big plus

21 LVQ code
- LVQ_PAK: only basic versions
- included in popular software such as WEKA: only basic versions
- SOM toolbox: also GLVQ, matrix learning
- mloss: also GLVQ, matrix learning
- see also the material at the tutorial web site, in particular for the advanced versions covered in the following

22

23 LVQ
- LVQ 1 does not have a valid cost function: $\sum_i f_{LVQ}(d^+, d^-)$ where $d^\pm = (\vec x_i - \vec w^\pm)^2$ is the squared distance to the closest correct / wrong prototype, and $f_{LVQ}(a, b) = a$ if $a \le b$, else $-b$

24 LVQ2.1
- LVQ2.1 has a valid cost function: $\sum_i f_{LVQ2.1}(d^+, d^-)$ where $d^\pm = (\vec x_i - \vec w^\pm)^2$ is the squared distance to the closest correct / wrong prototype, and $f_{LVQ2.1}(a, b) = (a - b)$ restricted to a window around the decision boundary
- but this is unbounded!

25 LVQ2.1
- behavior without the window in simple model situations: the generalization error of LVQ depends on its initialization, and the result can be far from the optimum [Biehl, Ghosh, Hammer, 2007] (figure, prior probabilities $p^+ > p^-$)
- so a tricky choice of the window is necessary ...

26 More reasonable cost functions for LVQ
- based on margin maximization: GLVQ [Sato/Yamada 1996, Hammer/Villmann 2002, Crammer et al. 2002, Schneider et al. 2009]
- based on probabilistic modeling: RSLVQ [Seo/Obermayer 2003]

27 COLT for LVQ in a nutshell
- function class F given by the possible LVQ networks
- training data $(x_i, y_i)$
- the machine learner yields an LVQ function f in F
- often: $f(x_i) = y_i$ for training points (i.e. small empirical error)
- desired: $P(f(x) = y)$ should be large (i.e. small real error)

28 COLT for LVQ in a nutshell
safe vs. insecure classification
- (hypothesis) margin of $x_i$: $m(x_i) = d^- - d^+$, where $d^+$ / $d^-$ is the squared distance to the closest correct / wrong prototype
- mathematics: the generalization error is bounded by $E/m + O\big(p^2 (B^3 \ln 1/\delta)^{1/2} / (\rho\, m^{1/2})\big)$, where E = number of misclassified training data with margin smaller than $\rho$ (including errors), $\delta$ = confidence, m = number of examples, B = support, p = number of prototypes
- good bounds for few training errors and large margin; the bound does not include the dimensionality

29 COLT for LVQ in a nutshell
the same bound with the terms annotated: $E/m$ is the empirical error term (data with a (too) small margin), and the summand $O\big(p^2 (B^3 \ln 1/\delta)^{1/2} / (\rho\, m^{1/2})\big)$ is the margin term; good bounds for few training errors and large margin, and the bound does not include the dimensionality

30 Margin maximization
- mathematical objective: maximize the margin (figure)

31 Margin maximization
- mathematical objective: $\min \sum_i \big(d^+(\vec x_i) - d^-(\vec x_i)\big)$, but this is unbounded

32 Margin maximization
- mathematical objective: $\min \sum_i \dfrac{d^+(\vec x_i) - d^-(\vec x_i)}{d^+(\vec x_i) + d^-(\vec x_i)}$
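A minimal sketch of this cost; d_plus and d_minus are assumed per-sample arrays of squared distances to the closest correct and closest wrong prototype:

```python
import numpy as np

def glvq_cost(d_plus, d_minus):
    # each summand lies in (-1, 1); negative means correct classification
    mu = (d_plus - d_minus) / (d_plus + d_minus)
    return mu.sum()
```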

33 Generalized LVQ (GLVQ) [Sato/Yamada 1996]
(figure: derivatives of the GLVQ cost)

34 Generalized LVQ (GLVQ)
(figure: derivatives of the GLVQ cost)

35 Generalized LVQ (GLVQ)
(figure: derivatives; the GLVQ update resembles LVQ2.1 with an additional scaling factor)

36 Probabilistic modeling: mixture of Gaussians with labels

37 Robust soft LVQ (RSLVQ)

38 RSLVQ

39 RSLVQ

40 Prototype locations

41 Take home
- LVQ can be substantiated by large margin generalization bounds (independent of the dimensionality)
- LVQ can be based on cost functions:
  - probabilistic modeling: excellent results, but the bandwidth is a very critical parameter (the crisp limit does not perform well) and the prototypes are not always representative
  - margin maximization: very good results, parameters are not critical, prototypes are representative for the data
- this enables stable training and principled mathematical modelling

42

43 Why metric learning?
Example: acceptance of papers at some conference. L - layout, T - technical quality, I - interesting subject, F - famous author, S - appropriate subject, Q - overall quality, P - author registers for conference, E - appropriate length, B - likes beer, P - looks pretty, G - gives good talks, K - knows program committee, M - member of program committee, C - special session, R - has red hair

44 Why metric learning?
- data are usually represented by feature vectors
- feature vectors are compared using the Euclidean distance
- but this might tell you nothing useful; e.g. features (smell, head, belly, human): (42,42,42,0,...) vs. (41,43,44,1,...) vs. (-41,43,44,1,...)

45 Why metric learning?

46 Metric parameterization

47 Metric learning: Generalized Relevance LVQ (GRLVQ)
- mathematical objective: $\min \sum_i \dfrac{d_\lambda^+(\vec x_i) - d_\lambda^-(\vec x_i)}{d_\lambda^+(\vec x_i) + d_\lambda^-(\vec x_i)}$ where $d_\lambda(\vec x, \vec y) = \sum_l \lambda_l (x_l - y_l)^2$
- normalize the relevance terms: relevance learning

48 GRLVQ
- mathematical objective: $\min \sum_i \big(d_\lambda^+(\vec x_i) - d_\lambda^-(\vec x_i)\big) / \big(d_\lambda^+(\vec x_i) + d_\lambda^-(\vec x_i)\big)$ (figure: derivatives)
- intuitive, fast, well founded, flexible, suited for large dimensions

49 GRLVQ
- same objective; the derivatives resemble LVQ2.1 with a scaling factor, plus a relevance update (a sketch follows below)
- intuitive, fast, well founded, flexible, suited for large dimensions
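A minimal sketch of the relevance update under the GRLVQ cost above; x, w_plus, w_minus, and the learning rate eta_lam are assumed, and the gradient is that of $(d_\lambda^+ - d_\lambda^-)/(d_\lambda^+ + d_\lambda^-)$ with respect to each $\lambda_l$:

```python
import numpy as np

def grlvq_relevance_step(lam, x, w_plus, w_minus, eta_lam=0.01):
    d_plus = lam @ (x - w_plus) ** 2        # weighted squared distances
    d_minus = lam @ (x - w_minus) ** 2
    # derivative of (d+ - d-)/(d+ + d-) with respect to each lambda_l
    grad = (2 * d_minus * (x - w_plus) ** 2
            - 2 * d_plus * (x - w_minus) ** 2) / (d_plus + d_minus) ** 2
    lam = np.clip(lam - eta_lam * grad, 0.0, None)   # keep relevances non-negative
    return lam / lam.sum()                           # normalize the relevance terms
```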

50 GRLVQ (figure: 2D data embedded in 10D with noise / noisy copies)

51 Generalized Matrix LVQ (GMLVQ)
Substitute the metric by a general quadratic form: $d_\Lambda(\vec x, \vec w) = (\vec x - \vec w)^T \Lambda\, (\vec x - \vec w)$
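A minimal sketch of this distance with the parametrization $\Lambda = \Omega^T \Omega$ (used on the following slides), which keeps $\Lambda$ positive semidefinite by construction; Omega may be square (full GMLVQ) or rectangular (low rank):

```python
import numpy as np

def gmlvq_distance(x, w, Omega):
    # d_Lambda(x, w) = (x - w)^T Omega^T Omega (x - w)
    diff = Omega @ (x - w)
    return float(diff @ diff)
```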

52 LGMLVQ

53 UCI benchmarks ...

54 Interpretability: Steroid metabolomics [W. Arlt, M. Biehl et al., 2011] (figure)

55

56 GMLVQ yields (local) matrices, i.e. (local) scalings and rotations of the space
- GRLVQ: global scaling
- GMLVQ: global scaling and rotation
- LGMLVQ: local scaling and rotation

57 GMLVQ
- GMLVQ with positive semidefinite matrices: $\Lambda = \Omega^T \Omega$
- quadratic complexity w.r.t. the data dimensionality

58 Low rank GMLVQ
- GMLVQ with positive semidefinite low rank matrices: $\Lambda = \Omega^T \Omega$ with rectangular $\Omega$
- linear complexity w.r.t. the data dimensionality
- equivalent to the full version (if the data are intrinsically low dimensional)

59 Low rank GMLVQ

60 LiRaM LVQ [Bunte et al. 2012]
(figure: global vs. local low rank matrices)
$\Omega$ induces a global projection $f: \vec x \mapsto \Omega\, \vec x$
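A minimal sketch of this induced projection for a rank-2 $\Omega$, which is what enables the discriminative 2D visualization on the next slide; Omega is assumed to come from (Li)RaM LVQ training:

```python
import numpy as np

def project_2d(X, Omega):
    """Map the rows of X into the 2D subspace given by the 2 x n matrix Omega."""
    return X @ Omega.T
```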

61 Discriminative visualization Example: USPS digits

62

63 Stationary solutions of GMLVQ
- assume fixed receptive fields; what is the optimum metric?
- the update of the matrix has the form of a repeated multiplication (the prefactor indicates the sign; x centered at the prototype) plus normalization
- similar to the von Mises iteration
- converges to the first eigenvector of the driving matrix; in particular, convergence to a low rank matrix!
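A minimal sketch of the von Mises (power) iteration that the matrix update resembles: repeated multiplication plus normalization converges to the dominant eigenvector of a symmetric matrix A:

```python
import numpy as np

def power_iteration(A, n_steps=100, seed=0):
    v = np.random.default_rng(seed).normal(size=A.shape[0])
    for _ in range(n_steps):
        v = A @ v                    # repeated multiplication ...
        v /= np.linalg.norm(v)       # ... plus normalization
    return v                         # approximates the first eigenvector of A
```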

64 Stationary solution
(figure: terms contributing with + and terms contributing with -)

65

66 Interpretation of matrix terms
infra-red spectral data: 124 wine samples, 256 wavelengths, 30 training data, 94 test spectra (figure: high / medium / low alcohol content)

67 Interpretation of matrix terms
- often: the diagonal terms are interpreted as relevances
- problem: for high-dimensional data, the classification is identical for all matrices whose differences lie in the null space of $C = X X^T$

68 Interpretation of matrix terms
- dividing out the null space yields the relevance profile
- a direct interpretation of the relevance profile is misleading for high-dimensional data; get rid of the null space first!
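A minimal sketch (an assumption on my part, not the slides' code) of dividing out the null space before reading off the relevance profile; X is the data matrix with rows as samples, Lambda the learned relevance matrix:

```python
import numpy as np

def corrected_relevance_profile(Lambda, X, tol=1e-10):
    C = X.T @ X                                  # data correlation matrix
    eigval, eigvec = np.linalg.eigh(C)
    V = eigvec[:, eigval > tol * eigval.max()]   # orthonormal basis of the data span
    P = V @ V.T                                  # projector that divides out the null space
    return np.diag(P @ Lambda @ P)               # relevance profile after correction
```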

69 Interpretation of matrix terms
(figure: GMLVQ over-fitting effect; best performance with 7 dimensions remaining after null-space correction, P = 30 dimensions)

70 Take home
- metric adaptation increases the accuracy without deteriorating the generalization ability
- a low rank matrix allows efficient training and data visualization, with no restriction as compared to the optimum metric
- interpretation: by looking at the feature weighting; for high-dimensional data, normalization (null-space correction) is necessary

71 Schneider, Biehl, Hammer: ... matrix learning is cool! Neural Computation 2009

72

73 Dissimilarity or similarity data
- feature extraction → vectorial data (size, softness, color, curvature, ...) → (20, 7, ...)
- pairwise (dis)similarity measurement → (dis)similarity matrix

74 (Dis)similarity data
(dis)similarity measures, e.g.:
1. alignment (e.g. of the sequences GTTACAGGT, GGTACACGT, GTGACAAGT)
2. normalized compression distance
3. graph structure kernels
4. ...

75 LVQ for dis-/similarities
- kernel GLVQ (Suganthan et al.)
- differentiable kernel GLVQ (Villmann et al.)
- relational GLVQ / RSLVQ (Zhu et al.)
- kernel RSLVQ (Hofmann et al.)
- ...

76 Relational GLVQ
Assumption: prototypes are expressed as linear combinations $\vec w_i = \sum_j \alpha_{ij} \vec x_j$ where $\sum_j \alpha_{ij} = 1$.
Fact: for every symmetric bilinear form and a linear representation as above we find $\|\vec x_j - \vec w_i\|^2 = (D \alpha_i)_j - \tfrac12\, \alpha_i^T D\, \alpha_i$.
Method: substitute all terms $\|\vec x_j - \vec w_i\|^2$ in the original methods and use this identity.
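A minimal sketch of this identity in numpy; D is the dissimilarity matrix and alpha_i the coefficient vector of prototype $\vec w_i$ (summing to one):

```python
import numpy as np

def relational_distances(D, alpha_i):
    # d(x_j, w_i) = (D alpha_i)_j - 0.5 * alpha_i^T D alpha_i, for all j at once
    return D @ alpha_i - 0.5 * (alpha_i @ D @ alpha_i)
```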

77 Relational GLVQ
assume prototypes have the form above; then the GLVQ costs become ... ugly formulas

78 Benchmark data

79 Similarities/dissimilarities
- Euclidean: $\|\vec x_i - \vec x_j\|^2$ and $\langle \vec x_i, \vec x_j \rangle$; general: $d_{ij} = d(x_i, x_j)$ and $s_{ij} = s(x_i, x_j)$
- assumptions: symmetric, $d_{ij} = d_{ji}$ and $s_{ij} = s_{ji}$; zero diagonal, $d_{ii} = 0$; normalization of s is possible, $s_{ii} = 1$

80 Similarities/dissimilarities
from similarities to dissimilarities: $d_{ij} = s_{ii} - 2 s_{ij} + s_{jj}$

81 Similarities/dissimilarities
from dissimilarities to similarities (double centering): $s_{ij} = -\tfrac12 \big( d_{ij} - \tfrac1n \sum_l d_{il} - \tfrac1n \sum_l d_{lj} + \tfrac{1}{n^2} \sum_{l,l'} d_{ll'} \big)$
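A minimal sketch of both conversions; sim_from_dissim implements the double-centering formula above via the centering matrix $J = I - \mathbf{1}\mathbf{1}^T/n$:

```python
import numpy as np

def dissim_from_sim(S):
    d = np.diag(S)
    return d[:, None] - 2 * S + d[None, :]   # d_ij = s_ii - 2 s_ij + s_jj

def sim_from_dissim(D):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return -0.5 * J @ D @ J                  # double centering
```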

82 Pseudo-Euclidean embedding
$\|\vec x_i - \vec x_j\|^2_{pq} = \|\vec x^1_i - \vec x^1_j\|^2 - \|\vec x^2_i - \vec x^2_j\|^2$ and $\langle \vec x_i, \vec x_j \rangle_{pq} = \langle \vec x^1_i, \vec x^1_j \rangle - \langle \vec x^2_i, \vec x^2_j \rangle$, with $d_{ij} = d(x_i, x_j)$, $s_{ij} = s(x_i, x_j)$
signature $(p, q, n - p - q)$; Euclideanity can be obtained by clip / flip

83 Pseudo-Euclidean space
For every symmetric D, a vector space embedding in pseudo-Euclidean space exists; the symmetric bilinear form induces the dissimilarities. (figure: example embedding with P1=(6.1,1), P2=(-6.1,1), P3=(0.1,0), P4=(-0.1,0), P5=(4,-1), P6=(-4,-1); signs +1 / -1 for the two parts)

84 LVQ for dis-/similarities
- classification based on $\|\vec x_i - \vec w_j\|^2 = \|\vec x_i\|^2 - 2 \langle \vec x_i, \vec w_j \rangle + \|\vec w_j\|^2$
- training optimizes $f\big(\|\vec x_i - \vec w_j\|^2\big)_{i,j}$

85 LVQ for dis-/similarities
- classification and training as before, with prototypes as linear combinations $\vec w_j = \sum_i \gamma_{ji} \vec x_i$
- possible assumptions: $\sum_i \gamma_{ji} = 1$, $\gamma_{ji} \ge 0$

86 LVQ for dis-/similarities
- kernel approach: $\|\vec x_i - \vec w_j\|^2 = s_{ii} - 2 \sum_l \gamma_{jl}\, s_{il} + \sum_{l,l'} \gamma_{jl}\gamma_{jl'}\, s_{ll'}$

87 LVQ for dis-/similarities
- relational approach: $\|\vec x_i - \vec w_j\|^2 = \sum_l \gamma_{jl}\, d_{il} - \tfrac12 \sum_{l,l'} \gamma_{jl}\gamma_{jl'}\, d_{ll'}$ for normalized $\gamma_{jl}$

88 LVQ for dis-/similarities
optimize $f\Big(\sum_l \gamma_{jl}\, d_{il} - \tfrac12 \sum_{l,l'} \gamma_{jl}\gamma_{jl'}\, d_{ll'}\Big)_{i,j}$ or $f\Big(s_{ii} - 2 \sum_l \gamma_{jl}\, s_{il} + \sum_{l,l'} \gamma_{jl}\gamma_{jl'}\, s_{ll'}\Big)_{i,j}$
gradient descent with respect to $\gamma_{jl}$, followed by normalization → relational GLVQ / RSLVQ

89 LVQ for dis-/similarities
gradient descent with respect to the prototype $\vec w_j = \sum_l \gamma_{jl} \vec x_l$: $\partial f\big(\|\vec x_i - \vec w_j\|^2\big) / \partial \vec w_j = -2 f'\, (\vec x_i - \vec w_j) = -2 f'\, \big(\vec x_i - \sum_l \gamma_{jl} \vec x_l\big)$; this can be decomposed into contributions of the coefficients, but only for the Euclidean form! → kernel GLVQ / RSLVQ

90 LVQ for dis-/similarities
- GLVQ: similarities, gradient w.r.t. the coefficients; RSLVQ: dissimilarities, gradient w.r.t. the prototypes
- only in the Euclidean case do the kernel variants resemble the gradient w.r.t. $\vec w$
- large margin generalization bounds; interpretation as likelihood ratio

91 Results

92 Computational effort
Size of the full n x n matrix in double precision ($n^2 \cdot 8$ bytes):
n = 10,000: ~800 MB; n = 20,000: ~3.2 GB; n = 50,000: ~20 GB; n = 200,000: ~320 GB

93 Computational effort?
$\|\vec x_i - \vec w_j\|^2 = s_{ii} - 2 \sum_l \gamma_{jl}\, s_{il} + \sum_{l,l'} \gamma_{jl}\gamma_{jl'}\, s_{ll'} = e_i^T S\, e_i - 2\, e_i^T S\, \gamma_j + \gamma_j^T S\, \gamma_j$
sample m landmarks only; approximate $S \approx S_{n,m}\, S_{m,m}^{-1}\, S_{m,n}$ [Nyström approximation, Williams/Seeger]
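A minimal sketch of the Nyström approximation; in practice one would compute only the n x m landmark block rather than the full S, which this toy version slices out for clarity:

```python
import numpy as np

def nystroem(S, m, seed=0):
    idx = np.random.default_rng(seed).choice(S.shape[0], size=m, replace=False)
    S_nm = S[:, idx]                          # n x m block: all points vs. m landmarks
    S_mm_pinv = np.linalg.pinv(S_nm[idx])     # pseudo-inverse of the m x m landmark block
    return S_nm @ S_mm_pinv @ S_nm.T          # S ~ S_{n,m} S_{m,m}^{-1} S_{m,n}
```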

94 Experiments

95

96 Take home
- there exist cool methods which enable the application of LVQ to similarities / dissimilarities
- quadratic complexity; the Nyström approximation reduces this to linear complexity for low rank data
- metric adaptation is possible in a similar way as for GMLVQ: adapt w.r.t. the similarity/dissimilarity parameters (has been done for the alignment distance → ESANN 14)

97

98 Confidence measures: how certain is a classification for a new point x?

99 Conformal prediction
framework to accompany pointwise classification of online methods by provable guarantees: a classifier trained on N (exchangeable) data points and a conformity measure yield a set of possible labels such that, for a new point, the predicted label set contains the true label with probability at least $1 - \epsilon$ [Shafer & Vovk, 2008]

100 Conformal prediction
- pick a conformity measure
- it induces two terms: credibility (how sure we are that the prediction is correct) and confidence (how sure we are that ALL OTHER labels are incorrect)
- any measure is valid, but some measures are more useful (figure: higher credibility / higher confidence vs. lower credibility / lower confidence)

101 Conformal prediction algorithm [Shafer,Vovk]

102 Simplified conformal prediction
given training data and a new point x:
1. train the model on the training data
2. compute the nonconformity values of the training set
3. for every candidate label y, compute the nonconformity of (x, y)
4. compare the values: the r-value of y is the fraction of training nonconformities at least as large
5. output the label with the best r-value
credibility: largest r-value; confidence: 1 - second largest r-value
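A minimal sketch of these steps; nonconf is a hypothetical nonconformity function (for LVQ, e.g. the ratio $d^+/d^-$), and scores holds the precomputed nonconformity values of the training set:

```python
import numpy as np

def conformal_predict(x, labels, scores, nonconf):
    # r-value of a label: fraction of training points at least as nonconforming
    r = {y: float(np.mean(scores >= nonconf(x, y))) for y in labels}
    ranked = sorted(r.values(), reverse=True)
    y_hat = max(r, key=r.get)
    credibility = ranked[0]          # largest r-value
    confidence = 1.0 - ranked[1]     # 1 - second largest r-value
    return y_hat, credibility, confidence
```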

103 Qualitative result

104 Growing conformal semi-supervised LVQ
given labeled data and unlabeled data:
init the model with a minimum number of prototypes; train the model on the labeled data
Loop:
  predict confidence/credibility on the unlabeled data
  predict labels on the unlabeled data and consider the secure part
  add the part of the unlabeled data with high confidence/credibility to the training set
  identify regions with poor confidence/credibility and generate a new prototype there

105 Growing conformal LVQ

106 Semi-supervised growing conformal LVQ

107 Example evaluations

108 Take home
- conformal prediction enables us to accompany classification results by confidence values
- it can be realised efficiently for LVQ based on distance measures
- it allows incremental versions (also for the relational setting and semi-supervised training)

109

110 Literature
- T. Kohonen. Self-Organizing Maps. Springer, Berlin, 1997.
- T. Kohonen. Learning vector quantization. In: M.A. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, MA, 1995.
- M. Biehl, B. Hammer, P. Schneider, T. Villmann. Metric learning for prototype-based classification. In: Innovations in Neural Information Paradigms and Applications, M. Bianchini, M. Maggini, F. Scarselli, L.C. Jain (eds.), Springer Studies in Computational Intelligence, Vol. 247, 2009.
- M. Biehl, B. Hammer, F.-M. Schleif, P. Schneider, T. Villmann. Stationarity of matrix relevance learning vector quantization. Machine Learning Reports 01/2009, Univ. Leipzig, 2009.
- M. Biehl, A. Ghosh, B. Hammer. Dynamics and generalization ability of LVQ algorithms. Journal of Machine Learning Research 8(Feb), 2007.
- W. Arlt, M. Biehl, A.E. Taylor, S. Hahner, R. Libe, B.A. Hughes, P. Schneider, D.J. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat, F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C.H.L. Shackleton, X. Bertagna, M. Fassnacht, P.M. Stewart. Urine steroid metabolomics as a biomarker tool for detecting malignancy in adrenal tumors. Journal of Clinical Endocrinology & Metabolism 96, 2011.
- F.-M. Schleif, T. Villmann, M. Kostrzewa, B. Hammer, A. Gammerman. Cancer informatics by prototype networks in mass spectrometry. Artificial Intelligence in Medicine 45(2-3), 2009.
- S. Kirstein, H. Wersing, H.-M. Gross, E. Körner. A life-long learning vector quantization approach for interactive learning of multiple categories. Neural Networks 28, 2012.
- S. Seo, K. Obermayer. Soft learning vector quantization. Neural Computation 15(7), 2003.
- B. Hammer, D. Hofmann, F.-M. Schleif, X. Zhu. Learning vector quantization for (dis-)similarities. Neurocomputing 131:43-51, 2014.
- M. Strickert, B. Hammer, T. Villmann, M. Biehl. Regularization and improved interpretation of linear data mappings and adaptive distance measures. CIDM 2013:10-17.
- A. Sato, K. Yamada. Generalized learning vector quantization. NIPS 1996.

111 Literature
- B. Mokbel, B. Paassen, B. Hammer. Adaptive distance measures for sequential data. In: M. Verleysen (ed.), ESANN, 2014.
- D. Hofmann, F.-M. Schleif, B. Paassen, B. Hammer. Learning interpretable kernelized prototype-based models. Neurocomputing, accepted, 2013.
- X. Zhu, F.-M. Schleif, B. Hammer. Semi-supervised vector quantization for proximity data. In: ESANN, pages 89-94, 2013.
- F.-M. Schleif, X. Zhu, B. Hammer. Sparse conformal prediction for dissimilarity data. Annals of Mathematics and Artificial Intelligence (AMAI), 2014.
- B. Hammer, D. Hofmann, F.-M. Schleif, X. Zhu. Learning vector quantization for (dis-)similarities. Neurocomputing 131:43-51, 2014.
- X. Zhu, F.-M. Schleif, B. Hammer. Patch processing for relational learning vector quantization. In: J. Wang, G.G. Yen, M.M. Polycarpou (eds.), Advances in Neural Networks - ISNN 2012, 9th International Symposium on Neural Networks, Shenyang, China, July 11-14, Proceedings, Part I, volume 7367, Springer, 2012.
- A. Gisbrecht, B. Mokbel, F.-M. Schleif, X. Zhu, B. Hammer. Linear time relational prototype based learning. Int. J. Neural Syst. 22(5), 2012.
- K. Bunte, P. Schneider, B. Hammer, F.-M. Schleif, T. Villmann, M. Biehl. Limited rank matrix learning, discriminative dimension reduction and visualization. Neural Networks 26, 2012.
- P. Schneider, K. Bunte, H. Stiekema, B. Hammer, T. Villmann, M. Biehl. Regularization in matrix relevance learning. IEEE Transactions on Neural Networks 21, 2010.
- M. Biehl, B. Hammer, F.-M. Schleif, P. Schneider, T. Villmann. Stationarity of matrix relevance learning vector quantization. Technical Report 01/2009, University of Leipzig, 2009.
- P. Schneider, M. Biehl, B. Hammer. Adaptive relevance matrices in learning vector quantization. Neural Computation 21(12), 2009.
- K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby. Margin analysis of the LVQ algorithm. NIPS 2002.
- G. Shafer, V. Vovk. A tutorial on conformal prediction. JMLR 9, 2008.
