Learning Tools for Big Data Analytics
- Alisha Caldwell
- 10 years ago
Transcription
1 Learning Tools for Big Data Analytics. Georgios B. Giannakis. Acknowledgments: Profs. G. Mateos and K. Slavakis; NSF , , and MURI-FA. Center for Advanced Signal and Image Sciences (CASIS), Livermore, May 13, 2015
2 Growing data torrent Source: McKinsey Global Institute, Big Data: The next frontier for innovation, competition, and productivity, May
3 Big Data: Capturing its value Source: McKinsey Global Institute, Big Data: The next frontier for innovation, competition, and productivity, May
4 Challenges. BIG: sheer volume of data; decentralized and parallel processing; security and privacy measures; modern massive datasets involve many attributes; parsimonious models to ease interpretability and enhance learning performance. Fast: real-time streaming data; online processing; quick-rough answer vs. slow-accurate answer? Messy: outliers and misses; robust imputation approaches.
5 Opportunities: Big tensor data models and factorizations; High-dimensional statistical SP; Network data visualization; Theoretical and Statistical Foundations of Big Data Analytics; Pursuit of low-dimensional structure; Analysis of multi-relational data; Common principles across networks; Resource tradeoffs; Randomized algorithms; Scalable online, decentralized optimization; Algorithms and Implementation Platforms to Learn from Massive Datasets; Convergence and performance guarantees; Information processing over graphs; Novel architectures for large-scale data analytics; Robustness to outliers and missing data; Graph SP
6 Roadmap: Context and motivation; Critical Big Data tasks (encompassing and parsimonious data modeling; dimensionality reduction; data cleansing, anomaly detection, and inference); Randomized learning via data sketching; Conclusions and future research directions
7 Encompassing model: Background (low rank); Patterns, innovations, (co-)clusters, outliers; Observed data; Dictionary; Noise; Sparse matrix; Subset of observations and projection operator allow for misses; Large-scale data and/or h; Any of unknown
8 Subsumed paradigms. Structure-leveraging criterion. Nuclear norm: : singular val. of -norm. (With or without misses) known: Compressive sampling (CS) [Candes-Tao 05]; Dictionary learning (DL) [Olshausen-Field 97]; Non-negative matrix factorization (NMF) [Lee-Seung 99]; Principal component pursuit (PCP) [Candes et al. 11]; Principal component analysis (PCA) [Pearson 1901]
9 PCA formulations: Training data; Minimum reconstruction error; Compression; Reconstruction; Component analysis model; Solution:
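The compression and reconstruction steps named on this slide can be sketched in NumPy. This is a minimal illustration and not the presenter's code: it assumes the training data are the columns of a D x T matrix, and the function names `pca_fit`, `pca_compress`, and `pca_reconstruct` are hypothetical. It relies on the standard result that the minimum-reconstruction-error subspace of dimension q is spanned by the q leading left singular vectors of the centered data.

```python
import numpy as np

def pca_fit(Y, q):
    """Return the sample mean and the q leading left singular vectors of centered Y (D x T)."""
    mean = Y.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, U[:, :q]

def pca_compress(Y, mean, U_q):
    """Compression: project centered data onto the q-dimensional subspace."""
    return U_q.T @ (Y - mean)

def pca_reconstruct(Z, mean, U_q):
    """Reconstruction: map low-dimensional coordinates back to the ambient space."""
    return mean + U_q @ Z

# Synthetic example (assumed setup): data close to a 3-dimensional subspace of R^20.
rng = np.random.default_rng(0)
D, T, q = 20, 200, 3
Y = rng.standard_normal((D, q)) @ rng.standard_normal((q, T))
Y += 0.01 * rng.standard_normal((D, T))

mean, U_q = pca_fit(Y, q)
Z = pca_compress(Y, mean, U_q)
Y_hat = pca_reconstruct(Z, mean, U_q)
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
print(Z.shape, rel_err)
```

Because the synthetic data are nearly rank 3, the relative reconstruction error should be small; on real data the choice of q trades compression against error.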
10 Dual and kernel PCA. SVD; Gram matrix; inner products. Q. What if the approximating low-dim space is not a hyperplane? A1. Stretch it to become linear: Kernel PCA; e.g., [Scholkopf-Smola 01] maps to, and leverages dual PCA in high-dim spaces. A2. General (non)linear models; e.g., union of hyperplanes, or locally linear tangential hyperplanes. B. Schölkopf and A. J. Smola, Learning with Kernels, Cambridge, MIT Press,
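A short sketch may help connect dual PCA to kernel PCA. It assumes data stored as columns, a Gaussian (RBF) kernel chosen purely for illustration, and hypothetical helper names (`center_gram`, `kernel_pca`, `rbf_gram`). With the linear kernel, the embedding obtained from the Gram matrix of inner products matches primal PCA up to a sign per component; swapping in a nonlinear kernel gives kernel PCA without ever forming the high-dimensional map.

```python
import numpy as np

def center_gram(K):
    """Center a T x T Gram matrix, i.e. remove the mean in feature space."""
    T = K.shape[0]
    J = np.eye(T) - np.ones((T, T)) / T
    return J @ K @ J

def kernel_pca(K, q):
    """Return the q x T embedding from the top-q eigenpairs of the centered Gram matrix."""
    evals, evecs = np.linalg.eigh(center_gram(K))   # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:q]
    lam = np.clip(evals[idx], 0.0, None)
    return (evecs[:, idx] * np.sqrt(lam)).T

def rbf_gram(Y, gamma):
    """Gaussian-kernel Gram matrix for the columns of Y (illustrative kernel choice)."""
    sq = np.sum(Y ** 2, axis=0)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * Y.T @ Y, 0.0, None)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
D, T, q = 5, 40, 2
Y = rng.standard_normal((D, T))

# Primal PCA via the SVD of the centered data.
Yc = Y - Y.mean(axis=1, keepdims=True)
_, s, Vt = np.linalg.svd(Yc, full_matrices=False)
Z_primal = s[:q, None] * Vt[:q]

# Dual PCA: only inner products (the linear-kernel Gram matrix) are needed.
Z_dual = kernel_pca(Y.T @ Y, q)

# Kernel PCA: same routine, nonlinear kernel.
Z_rbf = kernel_pca(rbf_gram(Y, gamma=0.1), q)
```

The dual route costs an eigendecomposition of a T x T matrix, so it pays off when the feature dimension is much larger than the number of data.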
11 Identification of network communities. Kernel PCA instrumental for partitioning of large graphs (spectral clustering); relies on graph Laplacian to capture nodal correlations. Facebook egonet: 744 nodes, 30,023 edges. arXiv collaboration network (General Relativity): 4,158 nodes, 13,422 edges. For random sketching and validation reduces complexity to P. A. Traganitis, K. Slavakis, and G. B. Giannakis, Spectral clustering of large-scale communities via random sketching and validation, Proc. Conf. on Info. Science and Systems, Baltimore, Maryland, March 18-20,
12 Local linear embedding. For each find neighborhood, e.g., k-nearest neighbors. Weight matrix captures local affine relations. Sparse and [Elhamifar-Vidal 11]. Identify low-dimensional vectors preserving local geometry [Saul-Roweis 03]. Solution: The rows of are the minor, excluding, L. K. Saul and S. T. Roweis, Think globally, fit locally: Unsupervised learning of low dimensional manifolds, J. Machine Learning Research, vol. 4, pp ,
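The three LLE steps on this slide (neighborhoods, affine weights, embedding) can be sketched as follows. This follows the textbook Saul-Roweis recipe rather than any code from the talk; the function name `lle`, the regularization constant `reg`, and the spiral test data are assumptions made for illustration. The embedding is read off the eigenvectors of (I - W)^T (I - W) with the smallest eigenvalues, skipping the first one, which is the constant vector when the neighborhood graph is connected.

```python
import numpy as np

def lle(Y, k, q, reg=1e-3):
    """Locally linear embedding of the columns of Y (D x T) into q dimensions."""
    D, T = Y.shape
    W = np.zeros((T, T))
    for t in range(T):
        d2 = np.sum((Y - Y[:, [t]]) ** 2, axis=0)
        nbrs = np.argsort(d2)[1:k + 1]            # k nearest neighbors, excluding the point itself
        G = Y[:, nbrs] - Y[:, [t]]                # neighbors shifted to the current point
        C = G.T @ G
        C += reg * np.trace(C) / k * np.eye(k)    # regularize: C is singular when k > D
        w = np.linalg.solve(C, np.ones(k))
        W[t, nbrs] = w / w.sum()                  # affine weights sum to one
    M = (np.eye(T) - W).T @ (np.eye(T) - W)
    _, evecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    return evecs[:, 1:q + 1].T, W                 # skip the first (smallest) eigenvector

# Assumed test data: a noisy planar spiral embedded in R^3.
rng = np.random.default_rng(2)
T = 100
s = np.sort(rng.uniform(1.0, 8.0, T))
Y = np.vstack([s * np.cos(s), s * np.sin(s), 0.01 * rng.standard_normal(T)])
Z, W = lle(Y, k=8, q=2)
```

The weight matrix has at most k nonzeros per row, which is what keeps the eigenproblem sparse in large-scale implementations; the dense arrays here are only for brevity.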
13 Dictionary learning. Solve for dictionary and sparse : (Lasso task; sparse coding) (Constrained LS task). Alternating minimization; both and are convex. Under conditions, converges to a stationary point of [Tseng 01]. B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1? Vis. Res., vol. 37, no. 23, pp ,
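The alternation between the Lasso (sparse coding) task and the constrained LS (dictionary) task can be sketched as below. This is an illustrative solver and not the method used in the talk: each subproblem is handled with proximal or projected gradient steps (ISTA for the sparse codes, projected gradient onto unit-norm-bounded columns for the dictionary), and the names, the penalty `lam`, and the synthetic data are all assumptions. With step sizes set from the Lipschitz constants, neither half-step can increase the objective.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(Y, D, S, lam):
    return 0.5 * np.linalg.norm(Y - D @ S) ** 2 + lam * np.abs(S).sum()

def sparse_code(Y, D, S, lam, n_iter=50):
    """Lasso task for fixed dictionary D, solved approximately by ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12
    for _ in range(n_iter):
        S = soft(S - D.T @ (D @ S - Y) / L, lam / L)
    return S

def update_dictionary(Y, D, S, n_iter=50):
    """Constrained LS task for fixed S: projected gradient, columns kept in the unit ball."""
    L = np.linalg.norm(S, 2) ** 2 + 1e-12
    for _ in range(n_iter):
        D = D - (D @ S - Y) @ S.T / L
        D = D / np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D

# Assumed synthetic data: 10-dimensional signals, 15 atoms, sparse ground-truth codes.
rng = np.random.default_rng(3)
n, K, T, lam = 10, 15, 100, 0.1
D_true = rng.standard_normal((n, K))
D_true /= np.linalg.norm(D_true, axis=0)
S_true = rng.standard_normal((K, T)) * (rng.random((K, T)) < 0.2)
Y = D_true @ S_true + 0.01 * rng.standard_normal((n, T))

D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)
S = np.zeros((K, T))
history = [objective(Y, D, S, lam)]
for _ in range(10):
    S = sparse_code(Y, D, S, lam)
    D = update_dictionary(Y, D, S)
    history.append(objective(Y, D, S, lam))
```

The joint problem is non-convex, so the sketch only illustrates monotone descent of the objective; it makes no claim about recovering the generating dictionary.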
14 Joint DL-LLE paradigm. DL fit; LLE fit; sparsity regularization. Dictionary morphs data to a smooth basis; reduces noise and complexity. DL-LLE offers data-driven non-linear embedding for robust (de-)compression. Inpainting by local-affine-geometry-preserving DL; misses; PSNR dB. K. Slavakis, G. B. Giannakis, and G. Leus, Robust sparse embedding and reconstruction via dictionary learning, Proc. of Conf. on Info. Science and Systems, JHU, Baltimore, March
15 From low-rank matrices to tensors. Data cube, e.g., sub-sampled MRI frames. PARAFAC decomposition per slab t [Harshman 70]; factor matrices A, B, C (columns a_r, b_r, c_r; rows α_i, β_i, γ_i). Tensor subspace comprises R rank-one matrices. Goal: Given streaming, learn the subspace matrices (A,B) recursively, and impute possible misses of Y_t. J. A. Bazerque, G. Mateos, and G. B. Giannakis, "Rank regularization and Bayesian inference for tensor completion and extrapolation," IEEE Trans. on Signal Processing, vol. 61, no. 22, pp , November
16 Online tensor subspace learning. Image domain low tensor rank. Tikhonov regularization promotes low rank. Proposition [Bazerque-GG 13]: With Stochastic alternating minimization; parallelizable across bases. Real-time reconstruction (FFT per iteration). M. Mardani, G. Mateos, and G. B. Giannakis, "Subspace learning and imputation for streaming big data matrices and tensors," IEEE Trans. on Signal Processing, vol. 63, pp , May
17 Dynamic cardiac MRI test. In vivo dataset: 256 k-space 200x256 frames. Ground-truth frame; sampling trajectory; R=100, 90% misses; R=150, 75% misses. Potential for accelerating MRI at high spatio-temporal resolution. Low-rank plus can also capture motion effects. M. Mardani and G. B. Giannakis, "Accelerating dynamic MRI via tensor subspace learning," Proc. of ISMRM 23rd Annual Meeting and Exhibition, Toronto, Canada, May 30 - June 5,
18 Roadmap: Context and motivation; Critical Big Data tasks; Randomized learning via data sketching (Johnson-Lindenstrauss lemma; randomized linear regression; randomized clustering); Conclusions and future research directions
19 Randomized linear algebra. Basic tools: random sampling and random projections. Attractive features: reduced dimensionality to lower complexity with Big Data; rigorous error analysis at reduced dimension. Ordinary least-squares (LS): Given If SVD incurs complexity. Q: What if? M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, vol. 3, no. 2, pp , Nov
20 Randomized LS for linear regression. LS estimate using (pre-conditioned) random projection matrix R d x D. Random diagonal w/ and Hadamard matrix. Subsets of data obtained by uniform sampling/scaling via yield LS estimates of comparable quality. Select reduced dimension. Complexity reduced from to. N. Ailon and B. Chazelle, The fast Johnson-Lindenstrauss transform and approximate nearest neighbors, SIAM Journal on Computing, 39(1): ,
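A small sketch of the preconditioned sampling idea: flip the sign of each data row at random, mix the rows with a Hadamard transform, then keep a uniformly sampled, rescaled subset of d rows and solve the reduced LS problem. It assumes D is a power of two and uses hypothetical names (`fwht`, `sketched_ls`); the sizes and noise level are illustrative and are not the settings reported in the talk.

```python
import numpy as np

def fwht(a):
    """Orthonormal fast Walsh-Hadamard transform along axis 0 (length must be a power of 2)."""
    a = np.array(a, dtype=float)
    n = a.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a / np.sqrt(n)

def sketched_ls(X, y, d, rng):
    """LS estimate from d uniformly sampled rows of the sign-flipped, Hadamard-mixed data."""
    D = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=D)        # random diagonal matrix
    HX = fwht(signs[:, None] * X)                  # preconditioning spreads row importance
    Hy = fwht(signs * y)
    rows = rng.choice(D, size=d, replace=False)    # uniform sampling
    scale = np.sqrt(D / d)                         # rescaling of the sampled rows
    theta, *_ = np.linalg.lstsq(scale * HX[rows], scale * Hy[rows], rcond=None)
    return theta

rng = np.random.default_rng(4)
D, p, d = 1024, 10, 200
X = rng.standard_normal((D, p))
theta0 = rng.standard_normal(p)
y = X @ theta0 + 0.1 * rng.standard_normal(D)

theta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_sketch = sketched_ls(X, y, d, rng)
rel_gap = np.linalg.norm(theta_sketch - theta_full) / np.linalg.norm(theta_full)
print(rel_gap)
```

The Hadamard step matters most when a few rows of X carry most of the information; for i.i.d. Gaussian rows as in this toy example, plain uniform sampling would already work well.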
21 Johnson-Lindenstrauss lemma. The workhorse for proofs involving random projections. JL lemma: If, integer, and reduced dimension satisfies then for any there exists a mapping s.t. ( ) Almost preserves pairwise distances! If with i.i.d. entries of and reduced dimension, then ( ) holds w.h.p. [Indyk-Motwani '98]. If with i.i.d. uniform over {+1,-1} entries of and reduced dimension as in JL lemma, then ( ) holds w.h.p. [Achlioptas 01]. W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz maps into a Hilbert space, Contemp. Math, vol. 26, pp ,
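The distance-preservation property can be checked numerically. The sketch below projects a few points with a Gaussian matrix and with a random-sign matrix, both scaled by 1/sqrt(d), and reports the worst relative distortion of squared pairwise distances. The dimensions and function names are illustrative assumptions; the code demonstrates the behavior empirically and does not verify the lemma's bound on d.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between the columns of X."""
    sq = np.sum(X ** 2, axis=0)
    return np.clip(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0, None)

def max_distortion(X, R):
    """Largest relative change of squared pairwise distances after projecting with R."""
    iu = np.triu_indices(X.shape[1], k=1)
    orig = pairwise_sq_dists(X)[iu]
    proj = pairwise_sq_dists(R @ X)[iu]
    return np.max(np.abs(proj / orig - 1.0))

rng = np.random.default_rng(5)
D, T, d = 1000, 30, 500
X = rng.standard_normal((D, T))

R_gauss = rng.standard_normal((d, D)) / np.sqrt(d)            # i.i.d. Gaussian entries
R_sign = rng.choice([-1.0, 1.0], size=(d, D)) / np.sqrt(d)    # i.i.d. +1/-1 entries

eps_gauss = max_distortion(X, R_gauss)
eps_sign = max_distortion(X, R_sign)
print(eps_gauss, eps_sign)
```

Shrinking d makes the distortion grow roughly like 1/sqrt(d), and the random-sign variant needs no floating-point multiplications when applied to the data.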
22 Performance of randomized LS Theorem For any, if, then w.h.p. condition number of ; and Uniform sampling vs Hadamard preconditioning D = 10,000 and p =50 Performance depends on X M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends In Machine Learning, vol. 3, no. 2, pp , Nov
23 Online censoring for large-scale regression Key idea: Sequentially test and update RLS estimates only for informative data Adaptive censoring (AC) rule Criterion reveals causal support vectors (SVs) Threshold controls avg. data reduction: D. K. Berberidis, G. Wang, G. B. Giannakis, and V. Kekatos, "Adaptive Estimation from Big Data via Censored Stochastic Approximation," Proc. of Asilomar Conf., Pacific Grove, CA, Nov
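A minimal sketch of censored RLS, under an assumed form of the rule: a datum is skipped (censored) when its normalized prediction residual is at most a threshold tau, and the standard RLS recursion runs only on the remaining data. The exact rule on the slide is not preserved in this transcript, so the test `abs(e) <= tau * sigma`, the function name `ac_rls`, and all parameter values should be read as illustrative assumptions.

```python
import numpy as np

def ac_rls(X, y, tau, sigma, delta=1e-2):
    """RLS that updates only on data whose residual exceeds tau * sigma (assumed censoring rule)."""
    p = X.shape[1]
    theta = np.zeros(p)
    P = np.eye(p) / delta              # initial inverse-covariance surrogate
    used = 0
    for x_n, y_n in zip(X, y):
        e = y_n - x_n @ theta          # prediction residual with the current estimate
        if abs(e) <= tau * sigma:      # uninformative datum: censor, no update
            continue
        Px = P @ x_n
        k = Px / (1.0 + x_n @ Px)      # RLS gain vector
        theta = theta + k * e
        P = P - np.outer(k, Px)
        used += 1
    return theta, used

rng = np.random.default_rng(6)
D, p, sigma = 5000, 10, 0.1
X = rng.standard_normal((D, p))
theta0 = rng.standard_normal(p)
y = X @ theta0 + sigma * rng.standard_normal(D)

theta_hat, used = ac_rls(X, y, tau=1.5, sigma=sigma)
print(used, np.linalg.norm(theta_hat - theta0))
```

Raising tau censors more data and lowers the update cost, at the price of slower error decay; the per-datum residual check costs only one inner product.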
24 Censoring algorithms and performance AC least mean-squares (LMS) AC recursive least-squares (RLS) at complexity Proposition: AC-RLS AC-LMS AC Kalman Filtering and Smoothing for tracking with a budget D. K. Berberidis, and G. B. Giannakis, "Online Censoring for Large-Scale Regressions," IEEE Trans. on SP, 2015 (submitted); also in Proc. of ICASSP, Brisbane, Australia, April 2015.
25 Censoring vis-a-vis random projections. Random projections for linear regression [Mahoney 11]: data-agnostic reduction decoupled from LS solution. Adaptive censoring (AC-RLS): data-driven measurement selection; suitable also for streaming data; minimal memory requirements.
26 Performance comparison. Synthetic: D=10,000, p=300 (50 MC runs); Real data: estimated from full set. Highly non-uniform data: AC-RLS outperforms alternatives at comparable complexity. Robustness to uniform data (all rows of X equally important).
27 Big data clustering. Given with assign them to clusters. Key idea: Reduce dimensionality via random projections. Desiderata: Preserve the pairwise data distances in lower dimensions. Feature extraction: construct combined features (e.g., via ); apply K-means to -space. Feature selection: select of input features (rows of ); apply K-means to -space.
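Both routes on this slide can be sketched with a plain Lloyd's K-means: feature extraction mixes all input features through a random projection, while feature selection keeps a random subset of the rows. The helper names, the farthest-point initialization, and the two-cluster synthetic data are assumptions made for illustration; the selection route here samples rows uniformly, which is only one of several possible selection schemes.

```python
import numpy as np

def kmeans(Z, K, n_iter=50):
    """Lloyd's K-means on the columns of Z with a deterministic farthest-point initialization."""
    centroids = [Z[:, 0]]
    for _ in range(1, K):
        d2 = np.min([np.sum((Z - c[:, None]) ** 2, axis=0) for c in centroids], axis=0)
        centroids.append(Z[:, np.argmax(d2)])
    C = np.stack(centroids, axis=1)                              # d x K centroid matrix
    for _ in range(n_iter):
        d2 = np.sum((Z[:, :, None] - C[:, None, :]) ** 2, axis=0)   # T x K squared distances
        labels = np.argmin(d2, axis=1)
        for k in range(K):
            if np.any(labels == k):
                C[:, k] = Z[:, labels == k].mean(axis=1)
    return labels, C

def extract_features(X, d, rng):
    """Feature extraction: d random linear combinations of all D input features."""
    R = rng.standard_normal((d, X.shape[0])) / np.sqrt(d)
    return R @ X

def select_features(X, d, rng):
    """Feature selection: keep d of the D input features (rows of X), chosen uniformly."""
    rows = rng.choice(X.shape[0], size=d, replace=False)
    return X[rows]

# Assumed synthetic data: two well-separated clusters in D = 500 dimensions.
rng = np.random.default_rng(7)
D, T, d, K = 500, 200, 50, 2
truth = np.arange(T) % 2
means = np.where(truth == 0, -3.0, 3.0)
X = means[None, :] + rng.standard_normal((D, T))

labels_rp, _ = kmeans(extract_features(X, d, rng), K)
labels_fs, _ = kmeans(select_features(X, d, rng), K)
acc_rp = max(np.mean(labels_rp == truth), np.mean(labels_rp != truth))
acc_fs = max(np.mean(labels_fs == truth), np.mean(labels_fs != truth))
print(acc_rp, acc_fs)
```

K-means then runs on d x T data instead of D x T, so each Lloyd iteration is cheaper by a factor of about D/d.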
28 Random sketching and validation (SkeVa). Randomly select informative dimensions. Algorithm: For Sketch dimensions: Run k-means on; Re-sketch dimensions; Augment centroids; Validate using consensus set. Similar approaches possible for. Sequential and kernel variants available. P. A. Traganitis, K. Slavakis, and G. B. Giannakis, Clustering High-Dimensional Data via Random Sampling and Consensus, Proc. of GlobalSIP, Atlanta, GA, December
29 RP versus SkeVa comparisons. KDDb dataset (subset): D = 2,990,384, T = 10,000, K = 2. RP: [Boutsidis et al. 13]. SkeVa: Sketch and validate. P. A. Traganitis, K. Slavakis, and G. B. Giannakis, "Sketch and Validate for Big Data Clustering," IEEE Journal on Special Topics in Signal Processing, June
30 Closing comments. Big Data modeling and tasks: dimensionality reduction; succinct representations; vectors, matrices, and tensors. Learning algorithms: data sketching via random projections; streaming, parallel, decentralized. Implementation platforms: scalable computing platforms; analytics in the cloud. K. Slavakis, G. B. Giannakis, and G. Mateos, Modeling and optimization for Big Data analytics, IEEE Signal Processing Magazine, vol. 31, no. 5, pp , Sep
Robust and Scalable Algorithms for Big Data Analytics
Robust and Scalable Algorithms for Big Data Analytics Georgios B. Giannakis Acknowledgment: Drs. G. Mateos, K. Slavakis, G. Leus, and M. Mardani Arlington, VA, USA March 22, 2013 1 Roadmap n Robust principal
Signal Processing for Big Data
Signal Processing for Big Data G. B. Giannakis, K. Slavakis, and G. Mateos Acknowledgments: NSF Grants EARS-1343248, EAGER-1343860 MURI Grant No. AFOSR FA9550-10-1-0567 Lisbon, Portugal 1 September 1,
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
Randomized Robust Linear Regression for big data applications
Randomized Robust Linear Regression for big data applications Yannis Kopsinis 1 Dept. of Informatics & Telecommunications, UoA Thursday, Apr 16, 2015 In collaboration with S. Chouvardas, Harris Georgiou,
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
So which is the best?
Manifold Learning Techniques: So which is the best? Todd Wittman Math 8600: Geometric Data Analysis Instructor: Gilad Lerman Spring 2005 Note: This presentation does not contain information on LTSA, which
Signal Processing Tools for. Big Data Analy5cs. G. B. Giannakis, K. Slavakis, and G. Mateos. Call for Papers. Nice, France Location and venue
Signal Processing Tools for General chairs Big Data Analy5cs Jean-Luc Dugelay, EURECOM Dirk Slock, EURECOM Technical chairs Marc Antonini, I3S/UNS/CNRS Nicholas Evans, EURECOM Cédric Richard, UNS/OCA Plenary
Going Big in Data Dimensionality:
LUDWIG- MAXIMILIANS- UNIVERSITY MUNICH DEPARTMENT INSTITUTE FOR INFORMATICS DATABASE Going Big in Data Dimensionality: Challenges and Solutions for Mining High Dimensional Data Peer Kröger Lehrstuhl für
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015
Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015 Lecture: MWF: 1:00-1:50pm, GEOLOGY 4645 Instructor: Mihai
Greedy Column Subset Selection for Large-scale Data Sets
Knowledge and Information Systems manuscript No. will be inserted by the editor) Greedy Column Subset Selection for Large-scale Data Sets Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel Received:
A Survey on Pre-processing and Post-processing Techniques in Data Mining
, pp. 99-128 http://dx.doi.org/10.14257/ijdta.2014.7.4.09 A Survey on Pre-processing and Post-processing Techniques in Data Mining Divya Tomar and Sonali Agarwal Indian Institute of Information Technology,
Visualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland [email protected]
CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Subspace clustering of dimensionality-reduced data
Subspace clustering of dimensionality-reduced data Reinhard Heckel, Michael Tschannen, and Helmut Bölcskei ETH Zurich, Switzerland Email: {heckel,boelcskei}@nari.ee.ethz.ch, [email protected] Abstract
A Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,
Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering
IV International Congress on Ultra Modern Telecommunications and Control Systems 22 Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering Antti Juvonen, Tuomo
Geometric-Guided Label Propagation for Moving Object Detection
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Geometric-Guided Label Propagation for Moving Object Detection Kao, J.-Y.; Tian, D.; Mansour, H.; Ortega, A.; Vetro, A. TR2016-005 March 2016
Machine Learning for Data Science (CS4786) Lecture 1
Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:
SYMMETRIC EIGENFACES MILI I. SHAH
SYMMETRIC EIGENFACES MILI I. SHAH Abstract. Over the years, mathematicians and computer scientists have produced an extensive body of work in the area of facial analysis. Several facial analysis algorithms
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
Bilinear Prediction Using Low-Rank Models
Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 63, NO. 10, MAY 15, 2015 2663
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 63, NO 10, MAY 15, 2015 2663 Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors Morteza Mardani, StudentMember,IEEE, Gonzalo Mateos,
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Maximum-Margin Matrix Factorization
Maximum-Margin Matrix Factorization Nathan Srebro Dept. of Computer Science University of Toronto Toronto, ON, CANADA [email protected] Jason D. M. Rennie Tommi S. Jaakkola Computer Science and Artificial
Sketch As a Tool for Numerical Linear Algebra
Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania April, 2015 Sepehr Assadi (Penn)
See All by Looking at A Few: Sparse Modeling for Finding Representative Objects
See All by Looking at A Few: Sparse Modeling for Finding Representative Objects Ehsan Elhamifar Johns Hopkins University Guillermo Sapiro University of Minnesota René Vidal Johns Hopkins University Abstract
Bag of Pursuits and Neural Gas for Improved Sparse Coding
Bag of Pursuits and Neural Gas for Improved Sparse Coding Kai Labusch, Erhardt Barth, and Thomas Martinetz University of Lübec Institute for Neuro- and Bioinformatics Ratzeburger Allee 6 23562 Lübec, Germany
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining [email protected] Outline Predictive modeling methodology k-nearest Neighbor
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic
Sparse Nonnegative Matrix Factorization for Clustering
Sparse Nonnegative Matrix Factorization for Clustering Jingu Kim and Haesun Park College of Computing Georgia Institute of Technology 266 Ferst Drive, Atlanta, GA 30332, USA {jingu, hpark}@cc.gatech.edu
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 1, JANUARY 2002 237 ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization Hujun Yin Abstract When used for visualization of
Big Data Analytics in Future Internet of Things
Big Data Analytics in Future Internet of Things Guoru Ding, Long Wang, Qihui Wu College of Communications Engineering, PLA University of Science and Technology, Nanjing 210007, China email: [email protected];
When is missing data recoverable?
When is missing data recoverable? Yin Zhang CAAM Technical Report TR06-15 Department of Computational and Applied Mathematics Rice University, Houston, TX 77005 October, 2006 Abstract Suppose a non-random
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Learning, Sparsity and Big Data
Learning, Sparsity and Big Data M. Magdon-Ismail (Joint Work) January 22, 2014. Out-of-Sample is What Counts NO YES A pattern exists We don t know it We have data to learn it Tested on new cases? Teaching
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Large-Scale Similarity and Distance Metric Learning
Large-Scale Similarity and Distance Metric Learning Aurélien Bellet Télécom ParisTech Joint work with K. Liu, Y. Shi and F. Sha (USC), S. Clémençon and I. Colin (Télécom ParisTech) Séminaire Criteo March
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Big learning: challenges and opportunities
Big learning: challenges and opportunities Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure December 2013 Omnipresent digital media Scientific context Big data Multimedia, sensors, indicators,
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Soft Clustering with Projections: PCA, ICA, and Laplacian
1 Soft Clustering with Projections: PCA, ICA, and Laplacian David Gleich and Leonid Zhukov Abstract In this paper we present a comparison of three projection methods that use the eigenvectors of a matrix
Tensor Factorization for Multi-Relational Learning
Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany [email protected] 2 Siemens AG, Corporate
Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Previous Lecture 13 Indexes for Multimedia Data 13.1
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
Sparsity-promoting recovery from simultaneous data: a compressive sensing approach
SEG 2011 San Antonio Sparsity-promoting recovery from simultaneous data: a compressive sensing approach Haneet Wason*, Tim T. Y. Lin, and Felix J. Herrmann September 19, 2011 SLIM University of British
1 Spectral Methods for Dimensionality
1 Spectral Methods for Dimensionality Reduction Lawrence K. Saul Kilian Q. Weinberger Fei Sha Jihun Ham Daniel D. Lee How can we search for low dimensional structure in high dimensional data? If the data
Rank one SVD: un algorithm pour la visualisation d une matrice non négative
Rank one SVD: un algorithm pour la visualisation d une matrice non négative L. Labiod and M. Nadif LIPADE - Universite ParisDescartes, France ECAIS 2013 November 7, 2013 Outline Outline 1 Data visualization
Visualization of Large Font Databases
Visualization of Large Font Databases Martin Solli and Reiner Lenz Linköping University, Sweden ITN, Campus Norrköping, Linköping University, 60174 Norrköping, Sweden [email protected], [email protected]
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
350 Serra Mall, Stanford, CA 94305-9515
Meisam Razaviyayn Contact Information Room 260, Packard Building 350 Serra Mall, Stanford, CA 94305-9515 E-mail: [email protected] Research Interests Education Appointments Large scale data driven optimization
Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Signal and Information Processing
The Fu Foundation School of Engineering and Applied Science Department of Electrical Engineering COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK Signal and Information Processing Prof. John Wright SIGNAL AND
THE problem of computing sparse solutions (i.e., solutions
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 7, JULY 2005 2477 Sparse Solutions to Linear Inverse Problems With Multiple Measurement Vectors Shane F. Cotter, Member, IEEE, Bhaskar D. Rao, Fellow,
How To Make Visual Analytics With Big Data Visual
Big-Data Visualization Customizing Computational Methods for Visual Analytics with Big Data Jaegul Choo and Haesun Park Georgia Tech O wing to the complexities and obscurities in large-scale datasets (
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition
CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition Neil P. Slagle College of Computing Georgia Institute of Technology Atlanta, GA [email protected] Abstract CNFSAT embodies the P
Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining
Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining Kun Liu Hillol Kargupta and Jessica Ryan Abstract This paper explores the possibility of using multiplicative
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: [email protected]
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
A New Method for Dimensionality Reduction using K- Means Clustering Algorithm for High Dimensional Data Set
A New Method for Dimensionality Reduction using K- Means Clustering Algorithm for High Dimensional Data Set D.Napoleon Assistant Professor Department of Computer Science Bharathiar University Coimbatore
Spark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Forschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
Scalable Machine Learning - or what to do with all that Big Data infrastructure
- or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection
