Learning, Sparsity and Big Data
|
|
- Anissa Park
- 8 years ago
- Views:
Transcription
1 Learning, Sparsity and Big Data M. Magdon-Ismail (Joint Work) January 22, 2014.
2 Out-of-Sample is What Counts NO YES A pattern exists We don t know it We have data to learn it Tested on new cases? Teaching Machine Learning to a Diverse Audience: the Foundation-based Approach, Lin, M-I, Abu-Mostafa, ICML c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 2 /19 DMOZ Data and SVMs
3 DMOZ Data and SVMs Good way to generate human labeled text data. US:Pennsylvania:Business&Economy vs. US:Arkansas:Localities:M Bag of (18K) words and SVMs: 300 important words All words Error 24.7% 27.6% important words: pennsylvania, arkansas, business, baptist, company,... Other kinds of tasks: polarity relevance interest topic clustering/classification. c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 3 /19 Financial Data
4 Financial Market Price Data 50 stocks in S&P stocks reconstruct price changes(30% error): AA: Alcoa AMGN: Amgen AVP: Avon BHI: Baker Hughes WMB: Williams Companies DIS: Disney C: Citigroup AEP: American Express F: Ford GD: General Dynamics INTC: Intel IBM: IBM MCD: MacDonald s TXN: Texas Instruments XRX: Xerox Optimal, 15 d.o.f. (PCA): 24% error. c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 4 /19 Sparsity is good
5 Throwing Out Unnecessary Features is Good Sparsity: represent your solution using only a few features. Sparse solutions generalize to out-of-sample better less overfitting. Sparse solutions are easier to interpret few important features. Computations are more efficient. Problem (NP-Hard): Find the few relevant features quickly. Question: Are those features provably good? c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 5 /19 Three Learning Problems
6 Three Learning Problems 1. Data Summarization/Compression (PCA) 2. Categorization/Clustering (k-means) 3. Prediction (Regression) c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 6 /19 Data
7 Data Data Matrix Response Matrix d dimensions n data points name age debt income hair weight sex John 21yrs $10K $65K black 175lbs M Joe 74yrs $100K $25K blonde 275lbs M Jane 27yrs $20K $85K blonde 135lbs F. Jen 37yrs $400K $105K brun 155lbs F credit? limit risk 2K high 0 10K low. 15K high X R n d Y R n ω c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 7 /19 More beautiful data
8 More Beautiful Data X R Y R % reconstruction error 10 5 % reconstruction error number of principal components number of principal components c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 8 /19 PCA, K-means, Linear Regression
9 PCA, K-means, Linear Regression k = 20 r = 2k PCA K-Means Regression [ ] = c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 9 /19 Sparsity example
10 Sparsity Represent your solution using only a few... Example: linear regression [ ] = Xw = y y is an optimal linear combination of only a few columns in X. (sparse regression; regularization ( w 0 k); feature subset selection;...) c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 10 /19 SVD
11 Singular Value Decomposition (SVD) X = [ ] [ ] [ ] Σ U k U k 0 V t k d k 0 Σ d k Vd k t O(nd 2 ) U Σ V t (n d) (d d) (d d) X k = U k Σ k V t k = XV k V t k X k is the best rank-k approximation to X. Reconstruction of X using only a few deg. of freedom. X X 20 X 40 X 60 V k is an orthonormal basis for the best k-dimensional subspace of the row space of X. c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 11 /19 Fast approximate SVD
12 Fast Approximate SVD 1: Z = XR R N(d r), Z R n r 2: Q = qr.factorize(z) 3: ˆVk svd k (Q t X) Theorem. Let r = k(1+ 1 ǫ ) and E = X XˆV kˆvt k. Then, E[ E ] (1+ǫ) X X k running time is O(ndk) = o(svd) [BDM, FOCS 2011] We can construct V k quickly c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 12 /19 V k and sparsity
13 V k and Sparsity Important dimensions of V t k are important for X s 1 s 2 s 3 s 4 s 5 V t k ˆV t k Rk r The sampled r columns are good if I = V t kv k ˆV t kˆv k. Sampling schemes: Largest norm (Jollife, 1972); Randomized norm sampling (Rudelson, 1999; RudelsonVershynin, 2007); Greedy (Batson et al, 2009; BDM, 2011). V k is important c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 13 /19 Sparse PCA algorithm
14 Sparse PCA Algorithm 1: Choose a few columns C of X; C R n r. 2: Find the best rank-k approximation of X in the span of C, X C,k. 3: Compute the SVD k of X C,k = U C,k Σ C,k V t C,k. 4: Z = XV C,k. Each feature in Z is a mixture of only the few original r feature dimensions in C. X XV C,k V t C,k X X C,k V C,k V t C,k = X X C,k ( 1+O( 2k r )) X X k. [BDM, FOCS 2011] c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 14 /19 Sparse PCA - pictures
15 Sparse PCA k = 20 k = 40 k = 60 Dense PCA Sparse PCA, r = 2k Theorem. One can construct, in o(svd), k features that are r-sparse, r = O(k), that are as good as exact dense top-k PCA-features. c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 15 /19 Clustering: K-Means
16 Full, slow Clustering: K-Means Fast, sparse 3 clusters 4 Clusters Theorem. There is a subset of features of size O(#clusters) which produces nearly the optimal partition (within a constant factor). One can quickly produce features with a log-approximation factor. [BDM,2013] c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 16 /19 Regression using few important features
17 PCA Regression using Few Important Features PCA, slow, dense Sparse, fast k = 20 k = 40 Theorem. Can find O(k) pure features which performs as well top-k pca-regression (additive error controlled by X X k F /σ k ). [BDM,2013] c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 17 /19 Proofs
18 The Proofs All the algorithms use the sparsifier of V t k in [BDM,FOCS2011]. 1.Choose columns of V t k to preserve its singular values. 2. Ensure that the selected columns preserve the structural properties of the objective with respect to the columns of X that are sampled. 3. Use dual set sparsification algorithms to accomplish (2). c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 18 /19 Thanks
19 THANKS! Data Compression (PCA): Unsupervised k-means Clustering: Supervised PCA Regression: Support Vector Machines Coresets (subsets of data instead features) Few features: easy to interpret; better generalizers; faster computations. c Creator: Malik Magdon-Ismail Learning, Sparsity and Big Data: 19 /19
CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationLinear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka kosecka@cs.gmu.edu http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
More informationNimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
More informationGreedy Column Subset Selection for Large-scale Data Sets
Knowledge and Information Systems manuscript No. will be inserted by the editor) Greedy Column Subset Selection for Large-scale Data Sets Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel Received:
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationMultidimensional data analysis
Multidimensional data analysis Ella Bingham Dept of Computer Science, University of Helsinki ella.bingham@cs.helsinki.fi June 2008 The Finnish Graduate School in Astronomy and Space Physics Summer School
More informationSketch As a Tool for Numerical Linear Algebra
Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania April, 2015 Sepehr Assadi (Penn)
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationCOMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers
COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Instructor: (jpineau@cs.mcgill.ca) TAs: Pierre-Luc Bacon (pbacon@cs.mcgill.ca) Ryan Lowe (ryan.lowe@mail.mcgill.ca)
More information6. Cholesky factorization
6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 3, 2014 Text Analytics (Text Mining) LSI (uses SVD), Visualization Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey
More informationα = u v. In other words, Orthogonal Projection
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationLecture Topic: Low-Rank Approximations
Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original
More informationThe Many Facets of Big Data
Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong ACPR 2013 Big Data 1 volume sample size is big feature dimensionality is big 2 variety multiple formats:
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationChapter 6. Orthogonality
6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationLearning Big (Image) Data via Coresets for Dictionaries
Learning Big (Image) Data via Coresets for Dictionaries Dan Feldman, Micha Feigin, and Nir Sochen Abstract. Signal and image processing have seen in the last few years an explosion of interest in a new
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationW. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
More informationMachine Learning for Data Science (CS4786) Lecture 1
Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:
More informationOrthogonal Projections and Orthonormal Bases
CS 3, HANDOUT -A, 3 November 04 (adjusted on 7 November 04) Orthogonal Projections and Orthonormal Bases (continuation of Handout 07 of 6 September 04) Definition (Orthogonality, length, unit vectors).
More informationRandomized Robust Linear Regression for big data applications
Randomized Robust Linear Regression for big data applications Yannis Kopsinis 1 Dept. of Informatics & Telecommunications, UoA Thursday, Apr 16, 2015 In collaboration with S. Chouvardas, Harris Georgiou,
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationBIG Biomedicine and the Foundations of BIG Data Analysis
BIG Biomedicine and the Foundations of BIG Data Analysis Insider s vs outsider s views (1 of 2) Ques: Genetics vs molecular biology vs biochemistry vs biophysics: What s the difference? Insider s vs outsider
More informationTurning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering Dan Feldman Melanie Schmidt Christian Sohler Abstract We prove that the sum of the squared Euclidean distances
More informationInference from sub-nyquist Samples
Inference from sub-nyquist Samples Alireza Razavi, Mikko Valkama Department of Electronics and Communications Engineering/TUT Characteristics of Big Data (4 V s) Volume: Traditional computing methods are
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationSyllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015
Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015 Lecture: MWF: 1:00-1:50pm, GEOLOGY 4645 Instructor: Mihai
More informationText Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
More informationScalable Machine Learning - or what to do with all that Big Data infrastructure
- or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationDefending Networks with Incomplete Information: A Machine Learning Approach. Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject
Defending Networks with Incomplete Information: A Machine Learning Approach Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Agenda Security Monitoring: We are doing it wrong Machine Learning
More informationMonotonicity Hints. Abstract
Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationOrthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationWhen is missing data recoverable?
When is missing data recoverable? Yin Zhang CAAM Technical Report TR06-15 Department of Computational and Applied Mathematics Rice University, Houston, TX 77005 October, 2006 Abstract Suppose a non-random
More informationRole of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationWavelet analysis. Wavelet requirements. Example signals. Stationary signal 2 Hz + 10 Hz + 20Hz. Zero mean, oscillatory (wave) Fast decay (let)
Wavelet analysis In the case of Fourier series, the orthonormal basis is generated by integral dilation of a single function e jx Every 2π-periodic square-integrable function is generated by a superposition
More informationOrthogonal Projections
Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
More informationNMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
More informationSYMMETRIC EIGENFACES MILI I. SHAH
SYMMETRIC EIGENFACES MILI I. SHAH Abstract. Over the years, mathematicians and computer scientists have produced an extensive body of work in the area of facial analysis. Several facial analysis algorithms
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationGoing Big in Data Dimensionality:
LUDWIG- MAXIMILIANS- UNIVERSITY MUNICH DEPARTMENT INSTITUTE FOR INFORMATICS DATABASE Going Big in Data Dimensionality: Challenges and Solutions for Mining High Dimensional Data Peer Kröger Lehrstuhl für
More informationInner Product Spaces and Orthogonality
Inner Product Spaces and Orthogonality week 3-4 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,
More informationMLlib: Scalable Machine Learning on Spark
MLlib: Scalable Machine Learning on Spark Xiangrui Meng Collaborators: Ameet Talwalkar, Evan Sparks, Virginia Smith, Xinghao Pan, Shivaram Venkataraman, Matei Zaharia, Rean Griffith, John Duchi, Joseph
More informationRANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING
= + RANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING Stefan Savev Berlin Buzzwords June 2015 KEYWORD-BASED SEARCH Document Data 300 unique words per document 300 000 words in vocabulary Data sparsity:
More informationHealth Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important
Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Floyd Ray Martin, FSA, MAAA Thomas A. McInteer, FSA, MAAA Jonathan P. Polon, FSA Dental Insurance Fraud Detection
More informationLecture 5: Singular Value Decomposition SVD (1)
EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationTurning Big Data Into Small Data: Hardware Aware Approximate Clustering With Randomized SVD and Coresets
Turning Big Data Into Small Data: Hardware Aware Approximate Clustering With Randomized SVD and Coresets The Harvard community has made this article openly available. Please share how this access benefits
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationMachine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationModels for Millions. Bob Stine Department of Statistics The Wharton School, University of Pennsylvania stat.wharton.upenn.
Models for Millions Bob Stine The School, University of Pennsylvania stat.wharton.upenn.edu/~stine 34 th NJ ASA Spring Symposium June 7, 2013 Introduction Statistics in the News Hot topics Big Data Business
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationBig learning: challenges and opportunities
Big learning: challenges and opportunities Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure December 2013 Omnipresent digital media Scientific context Big data Multimedia, sensors, indicators,
More informationCompact Representations and Approximations for Compuation in Games
Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions
More informationMATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.
MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. Nullspace Let A = (a ij ) be an m n matrix. Definition. The nullspace of the matrix A, denoted N(A), is the set of all n-dimensional column
More informationLearning Tools for Big Data Analytics
Learning Tools for Big Data Analytics Georgios B. Giannakis Acknowledgments: Profs. G. Mateos and K. Slavakis NSF 1343860, 1442686, and MURI-FA9550-10-1-0567 Center for Advanced Signal and Image Sciences
More informationThese axioms must hold for all vectors ū, v, and w in V and all scalars c and d.
DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms
More informationThe Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression
The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More informationFionn Murtagh, Pedro Contreras International Conference p-adic MATHEMATICAL PHYSICS AND ITS APPLICATIONS. p-adics.2015, September 2015
Constant Time Search and Retrieval in Big Data, with Linear Time and Space Preprocessing, through Randomly Projected Piling and Sparse Ultrametric Coding Fionn Murtagh, Pedro Contreras International Conference
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationClustering Very Large Data Sets with Principal Direction Divisive Partitioning
Clustering Very Large Data Sets with Principal Direction Divisive Partitioning David Littau 1 and Daniel Boley 2 1 University of Minnesota, Minneapolis MN 55455 littau@cs.umn.edu 2 University of Minnesota,
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationLarge-Scale Spectral Clustering on Graphs
Large-Scale Spectral Clustering on Graphs Jialu Liu Chi Wang Marina Danilevsky Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL {jliu64, chiwang1, danilev1, hanj}@illinois.edu Abstract
More informationLearning from Diversity
Learning from Diversity Epitope Prediction with Sequence and Structure Features using an Ensemble of Support Vector Machines Rob Patro and Carl Kingsford Center for Bioinformatics and Computational Biology
More informationBUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor
More informationBilinear Prediction Using Low-Rank Models
Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationKATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
More informationAnalysis of Financial Data Using Non-Negative Matrix Factorization
International Mathematical Forum,, 008, no. 8, 8-870 Analysis of Financial Data Using Non-Negative Matrix Factorization Konstantinos Drakakis UCD CASL, University College Dublin Belfield, Dublin, Ireland
More informationA Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationSo which is the best?
Manifold Learning Techniques: So which is the best? Todd Wittman Math 8600: Geometric Data Analysis Instructor: Gilad Lerman Spring 2005 Note: This presentation does not contain information on LTSA, which
More informationCOLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining
More information3D Tranformations. CS 4620 Lecture 6. Cornell CS4620 Fall 2013 Lecture 6. 2013 Steve Marschner (with previous instructors James/Bala)
3D Tranformations CS 4620 Lecture 6 1 Translation 2 Translation 2 Translation 2 Translation 2 Scaling 3 Scaling 3 Scaling 3 Scaling 3 Rotation about z axis 4 Rotation about z axis 4 Rotation about x axis
More informationPredictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
More informationHyperspectral Data Analysis and Supervised Feature Reduction Via Projection Pursuit
Hyperspectral Data Analysis and Supervised Feature Reduction Via Projection Pursuit Medical Image Analysis Luis O. Jimenez and David A. Landgrebe Ion Marqués, Grupo de Inteligencia Computacional, UPV/EHU
More informationMathematical Models of Supervised Learning and their Application to Medical Diagnosis
Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical
More informationDistance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More information