Signal Processing for Big Data


1 Signal Processing for Big Data. G. B. Giannakis, K. Slavakis, and G. Mateos. Acknowledgments: NSF Grants EARS , EAGER ; MURI Grant No. AFOSR FA . Lisbon, Portugal, September 1, 2014.
2 Big Data: A growing torrent. Source: McKinsey Global Institute, "Big Data: The next frontier for innovation, competition, and productivity," May
3 Big Data: Capturing its value. Source: McKinsey Global Institute, "Big Data: The next frontier for innovation, competition, and productivity," May
4 Big Data and NetSci analytics. Application domains: online social media; Internet; clean energy and grid analytics; robot and sensor networks; biological networks; Square Kilometer Array telescope. Desiderata: process, analyze, and learn from large pools of network data.
5 Challenges. Big: the sheer volume of data calls for distributed and parallel processing and raises security and privacy concerns; modern massive datasets involve many attributes, calling for parsimonious models that ease interpretability and enhance predictive performance. Fast: real-time streaming data call for online processing (quick-rough answer vs. slow-accurate answer?). Messy: outliers and misses call for robust imputation algorithms. Good news: ample research opportunities arise!
6 Opportunities. Theoretical and statistical foundations of Big Data analytics: big tensor data models and factorizations; high-dimensional statistical SP; network data visualization; pursuit of low-dimensional structure; analysis of multi-relational data; common principles across networks; resource tradeoffs. Algorithms and implementation platforms to learn from massive datasets: randomized algorithms; scalable online, decentralized optimization; convergence and performance guarantees; information processing over graphs; novel architectures for large-scale data analytics; robustness to outliers and missing data; graph SP.
7 SP-relevant Big Data themes. K. Slavakis, G. B. Giannakis, and G. Mateos, "Modeling and optimization for Big Data analytics," IEEE Signal Processing Magazine, vol. 31, no. 5, pp , Sep
8 Roadmap. Context and motivation. Critical Big Data tasks: encompassing and parsimonious data modeling; dimensionality reduction, data visualization; data cleansing, anomaly detection, and inference. Optimization algorithms for Big Data. Randomized learning. Scalable computing platforms for Big Data. Conclusions and future research directions.
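9 Encompassing model. $\mathcal{P}_\Omega(Y) = \mathcal{P}_\Omega(L + DS + V)$, with observed data $Y$; background (low rank) $L$; dictionary $D$; sparse matrix $S$ capturing patterns, innovations, (co-)clusters, outliers; and noise $V$. The subset of observations $\Omega$ and the projection operator $\mathcal{P}_\Omega$ allow for misses. Large-scale data and/or h . Any of $\{L, D, S\}$ may be unknown.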
10 Subsumed paradigms. Structure-leveraging criterion: nuclear norm $\|X\|_* := \sum_i \sigma_i(X)$, where $\sigma_i(X)$ are the singular values of $X$, together with the $\ell_1$-norm. Special cases (with or without misses), depending on which quantities are known: compressive sampling (CS) [Candes-Tao 05]; dictionary learning (DL) [Olshausen-Field 97]; nonnegative matrix factorization (NMF) [Lee-Seung 99]; principal component pursuit (PCP) [Candes et al 11]; principal component analysis (PCA) [Pearson 1901].
11 PCA formulations. Training data. Minimum reconstruction error: compression followed by reconstruction. Maximum variance. Component analysis model. Solution:
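The maximum-variance formulation can be illustrated with a minimal sketch. It assumes the standard PCA solution (the leading eigenvector of the sample covariance) and finds it by power iteration; the function name `top_principal_component` and the toy data are hypothetical and not taken from the slides.

```python
import math

def top_principal_component(data, iters=200):
    """Return (unit direction, variance) of the leading principal component."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d) of the centered training data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration; the start vector is assumed not orthogonal to the answer.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the Rayleigh quotient v' C v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance

# Toy data lying exactly on the line y = x.
points = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
direction, variance = top_principal_component(points)
```

For these points the recovered direction is the diagonal of the plane, and projecting onto it loses nothing, so minimum reconstruction error and maximum variance agree.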
12 Dual and kernel PCA. SVD of the data matrix; the Gram matrix collects inner products. Kernel (K)PCA, e.g., [Schölkopf-Smola 01]: mapping the data to a feature space stretches the data geometry; the kernel trick evaluates inner products in that space through a kernel. B. Schölkopf and A. J. Smola, Learning with Kernels, Cambridge, MIT Press,
13 Multidimensional scaling. Given dissimilarities or distances, identify low-dimensional vectors that preserve them. LS or Kruskal-Shepard MDS. Classical MDS: the solution is tied to dual PCA. Distance-preserving downscaling of data dimension. I. Borg and P. J. Groenen, Modern Multidimensional Scaling: Theory and Applications, NY, Springer,
14 Local linear embedding. For each datum, find a neighborhood, e.g., its k-nearest neighbors. A weight matrix captures local affine relations; sparse weights are also possible [Elhamifar-Vidal 11]. Identify low-dimensional vectors preserving local geometry [Saul-Roweis 03]. Solution: the embedding is given by the minor eigenvectors of the LLE cost matrix, excluding the constant one. L. K. Saul and S. T. Roweis, "Think globally, fit locally: Unsupervised learning of low dimensional manifolds," J. Machine Learning Research, vol. 4, pp ,
15 Application to graph visualization. Undirected graph with nodes (vertices) and edges (communicating pairs of nodes); the neighborhood of a node comprises all nodes that communicate with it. Given centrality metrics: identify weights or geometry via centrality-constrained (CC) LLE, then visualize. Centrality constraints, e.g., node degree. Only correlations or dissimilarities need be known. B. Baingana and G. B. Giannakis, "Embedding graphs under centrality constraints for network visualization," 2014, [Online]. Available: arxiv:
16 Visualizing the Gnutella network. Gnutella: peer-to-peer file-sharing network. CC-LLE: centrality captured by node degree. Snapshot (08/04/2012), processing time: sec; snapshot (08/24/2012), processing time: sec. B. Baingana and G. B. Giannakis, "Embedding graphs under centrality constraints for network visualization," 2014, [Online]. Available: arxiv:
17 Dictionary learning. Solve for the dictionary $D$ and the sparse coefficients $S$ by alternating minimization: with $D$ fixed, update $S$ (Lasso task; sparse coding); with $S$ fixed, update $D$ (constrained LS task); both subproblems are convex. Special case of block coordinate descent methods (BCDMs) [Tseng 01]. Under certain conditions, the iterates converge to a stationary point of the DL cost. B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vis. Res., vol. 37, no. 23, pp ,
18 Joint DL-LLE paradigm. Image inpainting by local-affine-geometry-preserving DL; missing values; PSNR dB. K. Slavakis, G. B. Giannakis, and G. Leus, "Robust sparse embedding and reconstruction via dictionary learning," in Proc. CISS, Baltimore, USA,
19 Online dictionary learning. Data arrive sequentially. Inpainting of a 12-Mpixel damaged image: S1. Learn dictionary from clean patches. S2. Remove text by sparse coding from clean . J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online learning for matrix factorization and sparse coding," J. Machine Learning Research, vol. 11, pp , Mar
20 Union of subspaces and subspace clustering. Parsimonious model for clustered data. Subspace clustering (SC): identify bases for the subspaces and the data-cluster associations. SC as DL with block-sparse coefficients. With a single subspace, SC reduces to PCA. R. Vidal, "Subspace clustering," IEEE Signal Processing Magazine, vol. 28, no. 2, pp , March
21 Modeling outliers. Outlier variables $o_n$ s.t. $o_n \neq 0$ if datum $n$ is an outlier, and $o_n = 0$ otherwise. Nominal data obey the model; outliers obey something else. Linear regression [Fuchs 99], [Wright-Ma 10], [Giannakis et al 11]. Both the model parameters and the outlier variables are unknown, so the problem is underdetermined; but the outliers are typically sparse!
22 Robustifying PCA. Natural (sparsity-leveraging) estimator (P1). A tuning parameter controls sparsity in the outlier variables, i.e., the number of outliers. Q: Does (P1) yield robust estimates? A: Yes! The Huber estimator is a special case. Formally justifies the outlier-aware model and its estimator. Ties sparse regression with robust statistics. G. Mateos and G. B. Giannakis, "Robust PCA as bilinear decomposition with outlier sparsity regularization," IEEE Transactions on Signal Processing, pp , Oct
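The link to the Huber estimator can be checked numerically in the scalar case. The sketch below assumes a single residual $r$ and an $\ell_1$-penalized outlier variable $o$, and compares $\min_o \tfrac12 (r-o)^2 + \lambda|o|$ with the Huber function of $r$; the function names are hypothetical, and the scalar setting is a simplification of the matrix estimator on the slide.

```python
def soft_threshold(r, lam):
    """Minimizer over o of 0.5*(r - o)**2 + lam*|o|."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0

def outlier_aware_cost(r, lam):
    """Residual cost after optimizing out the sparse outlier variable."""
    o = soft_threshold(r, lam)
    return 0.5 * (r - o) ** 2 + lam * abs(o)

def huber(r, lam):
    """Huber function: quadratic near zero, linear in the tails."""
    return 0.5 * r * r if abs(r) <= lam else lam * abs(r) - 0.5 * lam * lam

residuals = [-5.0, -1.0, -0.3, 0.0, 0.7, 1.0, 4.0]
gaps = [abs(outlier_aware_cost(r, 1.0) - huber(r, 1.0)) for r in residuals]
```

The two costs coincide for every residual, which is the sense in which minimizing out the outlier variables leaves a Huber-type robust criterion.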
23 Prior art. Robust covariance matrix estimators [Campbell 80], [Huber 81]. M-type estimators in computer vision [Xu-Yuille 95], [De la Torre-Black 03]. Rank minimization with the nuclear norm, e.g., [Recht-Fazel-Parrilo 10]. Matrix decomposition [Candes et al 10], [Chandrasekaran et al 11]: principal component pursuit (PCP), observed = low rank + sparse. Broadening the model: singing voice separation [Huang et al 12]; face recognition [Wright-Ma 10].
24 Video surveillance. Background modeling from video feeds [De la Torre-Black 01]. Figure panels: data; PCA; robust PCA; outliers. Data:
25 Big Five personality factors. Measure five broad dimensions of personality traits [Costa-McCrae 92]. Eugene-Springfield BFI sample (WEIRD). Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller. Big Five Inventory (BFI): short questionnaire (44 items); rate 1-5, e.g., "I see myself as someone who is talkative" or "is full of energy". Outlier ranking. Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press,
26 Robust unveiling of communities. Robust kernel PCA for identification of cohesive subgroups. Network: NCAA football teams (vertices), Fall 00 games (edges). Figure panels: partitioned graph with outliers; row/column-permuted adjacency matrix. ARI= . Identified exactly: Big 10, Big 12, MWC, SEC, ; outliers: independent teams. Data:
27 Load curve data cleansing and imputation. Load curve: electric power consumption recorded periodically. Reliable data: key to realizing the smart grid vision [Hauser 09]. Missing data: faulty meters, communication errors, few PMUs. Outliers: unscheduled maintenance, strikes, sport events [Chen et al 10]. Low-rank nominal load profiles; sparse outliers across buses and time. Approach: load cleansing and imputation via distributed RPCA. G. Mateos and G. B. Giannakis, "Load curve data cleansing and imputation via sparsity and low rank," IEEE Transactions on Smart Grid, pp , Dec
28 NorthWrite data. Power consumption of schools, a government building, and a grocery store (05-10). Figure panels: cleansing; imputation. Outliers: building operational transition shoulder periods. Prediction error: 6% for 30% missing data (8% for 50%). Data: courtesy of NorthWrite Energy Group.
29 Anomalies in social networks. Approach: given graph data, decompose the egonet feature matrix using PCP. Example: arXiv collaboration network (General Relativity). Figure: low-rank component vs. index (i), with 3-regular and 6-regular egonets marked. Payoff: unveil anomalous nodes and features. Outlook: change detection, macro analysis to identify outlying graphs.
30 Modeling Internet traffic anomalies. Anomalies: changes in origin-destination (OD) flows [Lakhina et al 04], due to failures, congestion, DoS attacks, intrusions, flooding. Graph G(N, L) with N nodes, L links, and F flows (F >> L); OD flow $z_{f,t}$. Packet counts per link $l$ and time slot $t$: $y_{l,t} = \sum_{f} r_{l,f}\,(z_{f,t} + a_{f,t}) + v_{l,t}$, with anomaly $a_{f,t}$ and routing entries $r_{l,f} \in \{0,1\}$. Matrix model across T time slots: $Y = R(Z + A) + V$. M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Transactions on Information Theory, pp , Aug
31 Low-rank plus sparse matrices. Z (and X := RZ) is low rank, e.g., [Zhang et al 05]; A is sparse across time and flows. Figure: anomaly amplitudes $a_{f,t}$ vs. time index (t). Estimator (P1). Data:
32 Internet2 data. Real network data, Dec. 8-28, 2003. Figure: detection probability vs. false alarm probability for [Lakhina04] (rank = 1, 2, 3), [Zhang05] (rank = 1, 2, 3), and the proposed method. Figure: true and estimated anomaly volume across flows and time; P_fa = 0.03, P_d = . Improved performance by leveraging sparsity and low rank. Succinct depiction of the network health state across flows and time. Data:
33 Online estimator. Construct an estimated map of anomalies in real time. Streaming data model. Approach: regularized exponentially-weighted LS formulation. Figure: tracking cleansed link traffic (link traffic level vs. time index t, estimated and true; links ATLA-HSTN, DNVR-KSCY, HSTN-ATLA). Figure: real-time unveiling of anomalies (anomaly amplitude vs. time index t, estimated and true; flows CHIN-ATLA, WASH-STTL, WASH-WASH). M. Mardani, G. Mateos, and G. B. Giannakis, "Dynamic anomalography: Tracking network anomalies via sparsity and low rank," IEEE Journal of Selected Topics in Signal Processing, pp , Feb
34 Roadmap. Context and motivation. Critical Big Data tasks. Optimization algorithms for Big Data: decentralized (in-network) operation; parallel processing; streaming analytics. Randomized learning. Scalable computing platforms for Big Data. Conclusions and future research directions.
35 Decentralized processing paradigms. Goal: learning over networks. Why? Decentralized data, privacy. Three paradigms: fusion center (FC); incremental; in-network. Limitations of FC-based architectures: lack of robustness (isolated point of failure, non-ideal links); high Tx power and routing overhead (as geographical area grows); less suitable for real-time applications. Limitations of incremental processing: non-robust to node failures; (re-)routing? Hamiltonian routes are NP-hard to establish.
36 In-network decentralized processing. Network anomaly detection: spatially-distributed link count data across agents 1, ..., N. Centralized vs. decentralized formulations. In-network processing model: local processing and single-hop communications. Given local link counts per agent, unveil anomalies in a decentralized fashion. Challenge: the nuclear norm is not separable across rows (links/agents).
37 Separable rank regularization. Neat identity [Srebro 05] for the nuclear norm, using factors with $\rho$ columns (e.g., $L \times \rho$), $\rho \geq \mathrm{rank}[X]$. Nonconvex, separable formulation (P2) equivalent to (P1). Proposition: If is a stat. pt. of (P2) and , then is a global optimum of (P1). Key for parallel [Recht-Re 12], decentralized, and online rank min. [Mardani et al 12].
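For reference, the identity usually attributed to [Srebro 05] can be written as follows. The factor names $P$ and $Q$ and the row notation $p_l$ are assumptions chosen for illustration; the slide's own symbols were lost in transcription.

```latex
\|X\|_* \;=\; \min_{\{P,\,Q\}:\; X = P Q^\top}\;
  \tfrac{1}{2}\left( \|P\|_F^2 + \|Q\|_F^2 \right),
\qquad P \in \mathbb{R}^{L \times \rho},\;
       Q \in \mathbb{R}^{T \times \rho},\;
       \rho \ge \mathrm{rank}[X],
\qquad\text{where}\quad
\|P\|_F^2 = \sum_{l=1}^{L} \|p_l\|_2^2 .
```

Because the Frobenius norm of $P$ splits into a sum over its rows $p_l$, the regularizer becomes separable across links (agents), which the nuclear norm itself is not.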
38 Decentralized algorithm. Alternating-direction method of multipliers (ADMM) solver for (P2). Method [Glowinski-Marrocco 75], [Gabay-Mercier 76]; learning over networks [Schizas-Ribeiro-Giannakis 07]. Consensus-based optimization; attains centralized performance; potential for scalable computing. M. Mardani, G. Mateos, and G. B. Giannakis, "Decentralized sparsity-regularized rank minimization: Algorithms and applications," IEEE Transactions on Signal Processing, pp , Nov
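39 Alternating direction method of multipliers. Canonical problem: $\min_{x,z} f(x) + g(z)$ s.t. $Ax + Bz = c$ (two sets of variables, separable cost, affine constraints). Augmented Lagrangian: $L_\rho(x,z,y) = f(x) + g(z) + y^\top(Ax + Bz - c) + (\rho/2)\|Ax + Bz - c\|_2^2$. ADMM: $x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k)$; $z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k)$; $y^{k+1} = y^k + \rho(Ax^{k+1} + Bz^{k+1} - c)$. One Gauss-Seidel pass over primal variables + dual ascent. D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods,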
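40 Scaled form and variants. Complete the squares in the augmented Lagrangian and define the scaled dual variable $u := y/\rho$. ADMM (scaled): the primal updates minimize the cost plus $(\rho/2)\|Ax + Bz - c + u\|_2^2$, and $u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c$. Proximal-splitting interpretation. Variants: more than two sets of variables; multiple Gauss-Seidel passes, or inexact primal updates; strictly convex : S. Boyd et al, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3,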
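A minimal sketch of scaled-form ADMM on a toy problem may help fix ideas. It assumes the scalar Lasso-type problem $\min_{x,z} \tfrac12 (x-a)^2 + \lambda|z|$ s.t. $x - z = 0$, which is not from the slides; the function names and parameter values are hypothetical.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa*|.| (soft-thresholding)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0

def admm_scalar_lasso(a, lam, rho=1.0, iters=200):
    """Scaled ADMM for min 0.5*(x - a)**2 + lam*|z| subject to x - z = 0."""
    x = z = u = 0.0
    for _ in range(iters):
        # x-update: minimizes 0.5*(x - a)**2 + (rho/2)*(x - z + u)**2.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on the nonsmooth term.
        z = soft_threshold(x + u, lam / rho)
        # Scaled dual ascent on the constraint residual x - z.
        u = u + x - z
    return x, z, u

x_hat, z_hat, u_hat = admm_scalar_lasso(a=3.0, lam=1.0)
x_small, z_small, _ = admm_scalar_lasso(a=0.5, lam=1.0)
```

The known closed-form solution of this toy problem is the soft-thresholded value of `a`, so the iterates can be checked against it: 2 for `a=3.0` and 0 for `a=0.5` when `lam=1.0`.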
41 Convergence. Theorem: If $f$ and $g$ have closed and convex epigraphs, and the unaugmented Lagrangian has a saddle point, then as the iteration index grows: feasibility (the constraint residual vanishes); objective convergence; dual variable convergence. Under additional assumptions: primal variable convergence; linear convergence. No results for nonconvex objectives; good empirical performance for, e.g., bi-convex problems. M. Hong and Z. Q. Luo, "On the linear convergence of the alternating direction method of multipliers," Mathematical Programming Series A,
42 Decentralized consensus optimization. Generic learning problem solved in-network: local costs per agent; agents connected by a graph. Neat trick: local copies of primal variables + consensus constraints. Equivalent problems for a connected graph. Amenable to decentralized implementation with ADMM. I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links - Part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, pp , Jan
43 ADMM for in-network optimization. ADMM (in-network optimization at node ): auxiliary variables eliminated! Communication of primal variables within neighborhoods only. Attractive features: fully decentralized, devoid of coordination; robust to non-ideal links, additive noise, and intermittent edges [Zhu et al 10]; provably convergent, attains centralized performance. G. Mateos, J. A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Transactions on Signal Processing, pp , Oct
44 Example 1: Decentralized SVM. ADMM-based D-SVM attains centralized performance; outperforms local SVM. Nonlinear discriminant functions effected via kernels. P. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed support vector machines," Journal of Machine Learning Research, pp , May
45 Example 2: RF cartography. Idea: collaborate to form a spatial map of the spectrum. Goal: find a map whose value is the spectrum at each position. Approach: basis expansion for the map; decentralized, nonparametric basis pursuit. Identify idle bands across space and frequency. J. A. Bazerque, G. Mateos, and G. B. Giannakis, "Group-Lasso on splines for spectrum cartography," IEEE Transactions on Signal Processing, pp , Oct
46 Parallel algorithms for BD optimization. Computer clusters offer ample opportunities for parallel processing (PP); recent software platforms promote PP. Cost: smooth loss + nonsmooth regularizer. Main idea: divide into blocks and conquer; e.g., [Kim-GG 11]. Parallelization: each block is updated with all blocks other than itself held fixed. S.-J. Kim and G. B. Giannakis, "Optimal resource allocation for MIMO ad hoc cognitive radio networks," IEEE Transactions on Information Theory, vol. 57, no. 5, pp , May
47 A challenging parallel optimization paradigm. (As1) The cost is the sum of a smooth + nonconvex term and a nonsmooth + convex + separable term. Ex.: having the other blocks available, find the minimizer over one block; nonconvex in general, computationally cumbersome! Approximate the smooth term locally by a surrogate s.t. (As2) the surrogate is convex in the block variable. Ex.: a quadratic proximal term renders the task strongly convex!
48 Flexible parallel algorithm (FLEXA). S1. In parallel, find for all blocks a solution of the approximate subproblem with accuracy . S2. Theorem: In addition to (As1) and (As2), if the stepsizes and accuracies satisfy , , , and for some , then every cluster point of the iterates is a stationary solution of the problem. F. Facchinei, S. Sagratella, and G. Scutari, "Flexible parallel algorithms for big data optimization," Proc. ICASSP, Florence, Italy, 2014 [Online]. Available: arxiv.org/abs/
49 FLEXA on Lasso. Figure: relative error. Regression matrix of dimensions ; sparse vector with only nonzero entries. F. Facchinei, S. Sagratella, and G. Scutari, "Flexible parallel algorithms for big data optimization," Proc. ICASSP, Florence, Italy, 2014 [Online]. Available: arxiv.org/abs/
50 Streaming BD analytics. Data arrive sequentially, timely response needed: 400M tweets/day; 500TB data/day. Limits of storage and computational capability: process one new datum at a time. Goal: develop simple online algorithms with performance guarantees. K. Slavakis, S. J. Kim, G. Mateos, and G. B. Giannakis, "Stochastic approximation vis-à-vis online learning for Big Data," IEEE Signal Processing Magazine, vol. 31, no. 6, Nov
51 Roadmap of stochastic approximation. Deterministic variable: Newton-Raphson iteration. Random variable, pdfs not available: law of large numbers, stochastic approximation. Problems solved: stochastic optimization; adaptive filtering, with LMS and RLS as special cases. Applications: estimation of pdfs, Gaussian mixtures, maximum likelihood estimation, etc.
52 Numerical analysis basics. Problem 1: find a root of $f$, i.e., $\theta$ s.t. $f(\theta) = 0$. Select $\theta_0$ and run $\theta_{n+1} = \theta_n - \mu_n f(\theta_n)$; the popular stepsize choice $\mu_n = 1/f'(\theta_n)$ yields the Newton-Raphson iteration. Problem 2: find the min or max of a function with the gradient method. Stochastic counterparts of these deterministic iterations?
53 Robbins-Monro algorithm. Robbins-Monro (RM) iteration: the unavailable expectation is estimated on-the-fly by sample averaging. Find a root of using online estimates and . Q: How general can the class of such problems be? A: As general as adaptive algorithms are in SP, communications, control, machine learning, pattern recognition. Function and data pdfs unknown! H. Robbins and S. Monro, "A stochastic approximation method," Annals Math. Statistics, vol. 22, no. 3, pp ,
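A minimal sketch of the RM idea, under the assumption that the root-finding problem is $g(\theta) = \theta - E[y] = 0$ and that only noisy samples of $y$ are observed. The function name, the Gaussian data, and the $1/n$ stepsize are illustrative choices, not taken from the slides.

```python
import random

def robbins_monro_mean(samples, theta0=0.0):
    """RM iteration for the root of g(theta) = theta - E[y]."""
    theta = theta0
    for n, y in enumerate(samples, start=1):
        mu = 1.0 / n                      # diminishing stepsize
        # (theta - y) is a noisy, unbiased evaluation of g(theta).
        theta = theta - mu * (theta - y)
    return theta

rng = random.Random(0)
samples = [rng.gauss(2.0, 1.0) for _ in range(5000)]
theta_hat = robbins_monro_mean(samples)
sample_mean = sum(samples) / len(samples)
```

With this particular stepsize the recursion reproduces the running sample average, which makes the "estimated on-the-fly by sample averaging" remark concrete: no pdf is ever needed, only the data stream.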
54 Convergence analysis. Theorem: If (As1) and (As2) hold, then RM converges in the m.s.s. (As1) ensures a unique root (two lines with positive slope cross at ). (As2) is a finite-variance requirement. Diminishing stepsize (e.g., $\mu_n = 1/n$): the limit does not oscillate around the root (as in LMS), but the decay of the stepsize to zero should not be too fast. I.i.d. and (As1) assumptions can be relaxed! H. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Second ed., Springer,
55 Online learning and SA. Given basis functions, learn the nonlinear function as a linear combination of them, using training data. Solve the LMMSE problem via the normal equations (batch LMMSE solution). RM iteration: least mean-squares (LMS).
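The LMS recursion can be sketched as follows, assuming two hypothetical basis functions $\phi(s) = [s, s^2]$, noise-free targets, and a constant stepsize; none of these specifics come from the slides.

```python
import random

def lms(stream, dim, mu=0.1):
    """LMS: stochastic-gradient (RM) iteration for the normal equations."""
    theta = [0.0] * dim
    for x, y in stream:
        err = y - sum(t * xi for t, xi in zip(theta, x))   # prediction error
        theta = [t + mu * err * xi for t, xi in zip(theta, x)]
    return theta

def basis(s):
    """Hypothetical basis functions used to expand the unknown function."""
    return [s, s * s]

rng = random.Random(1)
true_theta = [1.5, -0.5]
stream = []
for _ in range(5000):
    s = rng.uniform(-1.0, 1.0)
    x = basis(s)
    y = sum(t * xi for t, xi in zip(true_theta, x))   # noise-free target
    stream.append((x, y))

theta_hat = lms(stream, dim=2)
max_err = max(abs(a - b) for a, b in zip(theta_hat, true_theta))
```

Each update touches one datum only, so the cost per step is linear in the number of basis functions, in contrast with the batch solution that requires forming and inverting a correlation matrix.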
56 RLS as SA. Scalar step size: 1st-order iteration. Matrix step size: 2nd-order iteration; the exact correlation matrix is usually unavailable, so the sample correlation is used instead: RLS.
57 SA vis-à-vis stochastic optimization. Find the minimizer of an expected cost over a convex set. Online convex optimization utilizes projection onto the convex set. RM iteration. Strong convexity. Theorem: Under (As1), (As2) with i.i.d. data and with . Corollary: If is Lipschitz, then . A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, "Robust stochastic approximation approach to stochastic programming," SIAM J. Optimization, vol. 19, no. 4, pp , Jan
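58 Online convex optimization. Motivating example: a website advertising company spends on advertising at different sites and obtains the clicks at the end of the week. Online learning: a repeated game between learner and nature (adversary). OCO, for $t = 1, 2, \ldots$: the learner updates $w_t$; nature provides a convex loss $f_t$; the learner suffers loss $f_t(w_t)$.
59 Regret as performance metric. Def. 1: $\mathrm{Regret}_T(u) := \sum_{t=1}^{T} f_t(w_t) - \sum_{t=1}^{T} f_t(u)$. Def. 2 (worst-case regret): $\mathrm{Regret}_T(U) := \max_{u \in U} \mathrm{Regret}_T(u)$. Goal: select $\{w_t\}$ s.t. $\mathrm{Regret}_T(U)/T \to 0$, i.e., regret is sublinear! How to update?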
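60 Online gradient descent (OGD). Let $w_1 = 0$ and for $t = 1, 2, \ldots$: choose a stepsize $\eta > 0$ and a (sub)gradient $z_t \in \partial f_t(w_t)$; update $w_{t+1} = w_t - \eta z_t$. Theorem: For convex $f_t$, it holds that $\mathrm{Regret}_T(u) \le \frac{\|u\|_2^2}{2\eta} + \eta \sum_{t=1}^{T} \|z_t\|_2^2$. If $f_t$ is $L$-Lipschitz cont. w.r.t. $\|\cdot\|_2$, and $U = \{u : \|u\|_2 \le B\}$, then for $\eta = B/(L\sqrt{2T})$, $\mathrm{Regret}_T(U) \le BL\sqrt{2T}$; regret is sublinear. Unlike SA, no stochastic assumptions; but also no primal var. convergence. S. Shalev-Shwartz, "Online learning and online convex optimization," Found. Trends Machine Learn., vol. 4, no. 2, pp ,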
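A small sketch of OGD and its regret, under stated assumptions: scalar decisions, absolute-value losses $f_t(w) = |w - z_t|$ (1-Lipschitz), and a projection onto $[-B, B]$ added to keep iterates in the comparator set. The loss sequence and all names are hypothetical.

```python
import math
import statistics

def project(w, radius):
    """Euclidean projection onto the interval [-radius, radius]."""
    return max(-radius, min(radius, w))

def ogd_abs_loss(targets, radius=1.0, lipschitz=1.0):
    """Projected OGD on f_t(w) = |w - z_t|; returns the cumulative loss."""
    T = len(targets)
    eta = radius / (lipschitz * math.sqrt(2 * T))   # fixed stepsize
    w, total = 0.0, 0.0
    for z in targets:
        total += abs(w - z)                # learner suffers f_t(w_t)
        g = (w > z) - (w < z)              # a subgradient of f_t at w_t
        w = project(w - eta * g, radius)   # gradient step, then projection
    return total

# A deterministic loss sequence: no stochastic assumptions are needed.
targets = [0.8 if t % 3 else -0.4 for t in range(3000)]
learner_loss = ogd_abs_loss(targets)
best_fixed = statistics.median(targets)    # best fixed decision in hindsight
best_loss = sum(abs(best_fixed - z) for z in targets)
regret = learner_loss - best_loss
bound = 1.0 * 1.0 * math.sqrt(2 * len(targets))    # B * L * sqrt(2T)
```

Here `regret` stays below `bound`, so the per-round regret `regret / len(targets)` is small even though the learner never sees a loss before committing to its decision.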
61 Online mirror descent (OMD). Given a regularization function , , then for : update via . Ex. 1: the squared Euclidean norm yields online gradient descent! Ex. 2: the negative entropy yields exponentiated gradient! S. Shalev-Shwartz, "Online learning and online convex optimization," Found. Trends Machine Learn., vol. 4, no. 2, pp ,
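The exponentiated-gradient special case admits a short sketch. It assumes decisions on the probability simplex and linear losses $\langle w, g_t \rangle$ (so the gradient is $g_t$); the three-expert loss vectors and the stepsize are made up for illustration.

```python
import math

def exponentiated_gradient(loss_vectors, eta=0.1):
    """OMD with negative-entropy regularizer on the probability simplex."""
    d = len(loss_vectors[0])
    w = [1.0 / d] * d                       # start from the uniform weights
    for g in loss_vectors:                  # gradient of <w, g> is g
        w = [wi * math.exp(-eta * gi) for wi, gi in zip(w, g)]
        total = sum(w)
        w = [wi / total for wi in w]        # renormalize onto the simplex
    return w

# Three hypothetical experts; the second one has the smallest loss.
losses = [[0.9, 0.1, 0.5]] * 100
weights = exponentiated_gradient(losses)
```

The multiplicative update keeps the weights positive and normalized without an explicit projection, and the mass concentrates on the expert with the smallest cumulative loss.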
62 OMD regret. Def.: is called strongly convex w.r.t. , if s.t. . Theorem: If is strongly convex w.r.t. a norm, is Lipschitz cont. w.r.t. the same norm, and s.t. , then . If in addition and , then . S. Shalev-Shwartz, "Online learning and online convex optimization," Found. Trends Machine Learn., vol. 4, no. 2, pp ,
63 Gradient-free OCO. Setting: the form of the loss $f$ is unknown; only samples can be obtained. Main idea: obtain an unbiased gradient estimate on-the-fly. Def.: given $\delta > 0$, define a smoothed version of $f$ as $\hat{f}(w) := E_{v}[f(w + \delta v)]$, with $v$ uniformly distributed over the unit ball. Theorem: It holds that $\nabla \hat{f}(w) = (d/\delta)\, E_{u}[f(w + \delta u)\, u]$, with $u$ uniformly distributed over the unit sphere and $d$ the dimension of $w$. If $f$ is $L$-Lipschitz continuous, then $|\hat{f}(w) - f(w)| \le L\delta$. If is differentiable, then . A. D. Flaxman, A. T. Kalai, and H. B. McMahan, "Online convex optimization in the bandit setting: Gradient descent without a gradient," Proc. ACM-SIAM Symp. Discrete Algorithms,
64 When OCO meets SA. For : S1. Find . S2. Pick , and form . S3. Let . S4. . Theorem: If are convex, Lipschitz continuous, and is convex, with and , then . In particular, if and , then regret is bounded by . Compare $O(T^{3/4})$ in the agnostic vs. $O(\sqrt{T})$ in the full-information case. A. D. Flaxman, A. T. Kalai, and H. B. McMahan, "Online convex optimization in the bandit setting: Gradient descent without a gradient," Proc. ACM-SIAM Symp. Discrete Algorithms,
65 Roadmap. Context and motivation. Critical Big Data tasks. Optimization algorithms for Big Data. Randomized learning: randomized linear regression, leverage scores; Johnson-Lindenstrauss lemma; randomized classification and clustering. Scalable computing platforms for Big Data. Conclusions and future research directions.
66 Randomized linear algebra. Basic tools: random sampling and random projections. Attractive features: reduced dimensionality to lower complexity with Big Data; rigorous error analysis at reduced dimension. Ordinary least-squares (LS): given , if SVD , LS-optimal prediction . SVD incurs complexity . Q: What if ? M. W. Mahoney, "Randomized Algorithms for Matrices and Data," Foundations and Trends in Machine Learning, vol. 3, no. 2, pp , Nov
67 Randomized linear regression. Key idea: sample and rescale (down to ) "important" observations, using a random sampling matrix and a diagonal rescaling matrix. Random sampling: for , pick the th row (entry) of w.p. . Q: How to determine ?
68 Statistical leverage scores. Prediction error , where and . Large : small contribution of to prediction error! Definition: statistical leverage scores (SLS). Sample rows (entries) of according to . Theorem: For any , if , then w.h.p. condition number of ; and . SLS complexity . M. W. Mahoney, "Randomized Algorithms for Matrices and Data," Foundations and Trends in Machine Learning, vol. 3, no. 2, pp , Nov
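69 Johnson-Lindenstrauss lemma. The workhorse for proofs involving random projections. JL lemma: If $\varepsilon \in (0,1)$, integer $T$, and reduced dimension $d$ satisfies $d \ge c\,\varepsilon^{-2} \ln T$ for a suitable constant $c$, then for any $T$ points $\{x_t\} \subset \mathbb{R}^D$ there exists a mapping $f: \mathbb{R}^D \to \mathbb{R}^d$ s.t. $(1-\varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1+\varepsilon)\|x_i - x_j\|_2^2$ (*). Almost preserves pairwise distances! If $f(x) = Rx/\sqrt{d}$ with i.i.d. $\mathcal{N}(0,1)$ entries of $R$ and reduced dimension as in the JL lemma, then (*) holds w.h.p. [Indyk-Motwani 98]. If $f(x) = Rx/\sqrt{d}$ with i.i.d. uniform over {+1,-1} entries of $R$ and reduced dimension as in the JL lemma, then (*) holds w.h.p. [Achlioptas 01]. W. B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz maps into a Hilbert space," Contemp. Math, vol. 26, pp ,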
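A rough numerical illustration of the random-sign variant, assuming the projection $f(x) = Rx/\sqrt{d}$ with i.i.d. $\pm 1$ entries. The dimensions (500 reduced to 300), the five Gaussian test points, and the seeds are arbitrary choices made only for this sketch.

```python
import math
import random

def random_sign_projection(points, d, seed=0):
    """Project points with a d x D matrix of i.i.d. +/-1 entries, scaled by 1/sqrt(d)."""
    rng = random.Random(seed)
    D = len(points[0])
    R = [[rng.choice((-1.0, 1.0)) for _ in range(D)] for _ in range(d)]
    scale = 1.0 / math.sqrt(d)
    return [[scale * sum(r * x for r, x in zip(row, p)) for row in R]
            for p in points]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(42)
points = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(5)]
projected = random_sign_projection(points, d=300)

# Ratio of projected to original squared distance for every pair of points.
ratios = [sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
          for i in range(5) for j in range(i + 1, 5)]
```

The ratios cluster around one, with a spread that shrinks as the reduced dimension grows; note that the reduced dimension needed depends on the number of points and the accuracy, not on the original dimension.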
70 Bypassing cumbersome SLS. Preconditioned LS estimate, using a Hadamard matrix and a random diagonal matrix with . Select reduced dimension . Sample/rescale via with uniform pdf. Complexity lower than that incurred by SLS computation. N. Ailon and B. Chazelle, "The fast Johnson-Lindenstrauss transform and approximate nearest neighbors," SIAM Journal on Computing, 39(1): ,
71 Testing randomized linear regression. M. W. Mahoney, "Randomized Algorithms for Matrices and Data," Foundations and Trends in Machine Learning, vol. 3, no. 2, pp , Nov
72 Big data classification. Support vector machines (SVM): the workhorse for linear discriminant analysis. Data: . Goal: separating hyperplane with labels to maximize the between-class margin. Binary classification. Primal problem. Margin:
73 Randomized SVM classifier. Dual problem. Random projections approach: given , let . Solve the transformed dual problem. S. Paul, C. Boutsidis, M. Magdon-Ismail, P. Drineas, "Random projections for linear support vector machines," ACM Trans. Knowledge Discovery from Data,
74 Performance analysis. Theorem: Given , accuracy , , and there exists such that if and are SVM-optimal margins corresponding to and , respectively, then w.p. . Lemma: Vapnik-Chervonenkis dimension almost unchanged! Out-of-sample error kept almost unchanged in the reduced-dimension solution. S. Paul, C. Boutsidis, M. Magdon-Ismail, P. Drineas, "Random projections for linear support vector machines," ACM Trans. Knowledge Discovery from Data, to appear,
75 Testing randomized SVM. 295 datasets from the Open Directory Project (ODP). Each dataset includes 150 to 280 documents and 10,000 to 40,000 words (features). Binary classification. Performance in terms of: out-of-sample error ( ); margin ( ); projection and run time ( and ). In-sample error matches for full/reduced dimensions and improves as increases. S. Paul, C. Boutsidis, M. Magdon-Ismail, P. Drineas, "Random projections for linear support vector machines," ACM Trans. Knowledge Discovery from Data, to appear,
76 Big data clustering. Given with , assign them to clusters. Key idea: reduce dimensionality via random projections. Desiderata: preserve the pairwise data distances in lower dimensions. Feature extraction: construct combined features (e.g., via ); apply K-means to the -space. Feature selection: select of the input features (rows of ); apply K-means to the -space.
77 Randomized K-means. Cluster with centroid . Data-cluster association matrix. K-means criterion: . Def. -approximation algorithm: any algorithm yielding s.t. w.h.p. . Given a projection matrix and . C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas, "Randomized dimensionality reduction for K-means clustering," arxiv: ,
78 Feature extraction via random projections. Theorem: Given number of clusters , accuracy , and , construct i.i.d. Bernoulli with . Then, any -approximation K-means on , yielding , satisfies w.h.p. . Complexity: . Related results [Boutsidis et al 14]: for , ( : principal comp. of ); compl. . Approx. by rand. alg. ; complexity . C. Boutsidis, P. Drineas, and M. Magdon-Ismail, "Near-optimal column-based matrix reconstruction," SIAM Journal on Computing,
79 Feature selection via random projections. Given , select features by constructing : S1. S2. S3. S3a. Leverage scores! S3b. and for : pick w.p. . Complexity: . S4. Theorem: Any -approximation K-means on , yielding , satisfies w.p. . C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas, "Randomized dimensionality reduction for K-means clustering," arxiv: ,
80 Random projection matrices. Table columns: random projection matrix; reduced dimension; projection cost. Table rows: i.i.d. random Gaussian (RG); random sign matrix (RS); random Hadamard transform (FHT); sparse embedding matrix* [Clarkson-Woodruff 13]. * Exploits sparsity in data matrix. ** Number of nonzero entries in . Polynomial complexity w.r.t. . K. L. Clarkson and D. P. Woodruff, "Low rank approximation and regression in input sparsity time," arxiv: v4,
81 Testing randomized clustering. T = 1,000, D = 2,000, and K = 5 well-separated clusters. C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas, "Randomized dimensionality reduction for K-means clustering," arxiv: ,
82 Roadmap. Context and motivation. Critical Big Data tasks. Optimization algorithms for Big Data. Randomized learning. Scalable computing platforms for Big Data: MapReduce platform; least-squares and k-means in MapReduce; graph algorithms. Conclusions and future research directions.
83 Parallel programming issues. Challenges: load balancing; communications; scattered data; synchronization (deadlocks, read-write racing); general vs. application-specific implementations; efficiency vs. development-overhead tradeoffs. Choosing the right tool is important!
84 MapReduce. General parallel programming (PP) paradigm for large-scale problems. Easy parallelization at a high level; synchronization and communication challenges are taken care of. Initially developed by Google. Processing divided in two stages (S1 and S2). S1 (Map): PP for portions of the input data. S2 (Reduce): aggregation of S1 results (possibly in parallel). Example: Map per processor, followed by Reduce.
85 Basic ingredients of MapReduce. All data are in the form <key,value>, e.g., key = machine, value = 5. MapReduce model consists of [Lin-Dyer 10]: input key-value pairs; mappers (programs applying a Map operation to a subset of input data; run the same operation in parallel; produce intermediate key-value pairs); reducers (programs applying a Reduce operation, e.g., aggregation, to a subset of intermediate key-value pairs with the same key; execute the same operation in parallel to generate output key-value pairs); execution framework (transparent to the programmer; performs tasks such as synchronization and communication). J. Lin and C. Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool Pubs.,
86 Running MapReduce. Each job has three phases. Phase 1 (Map): receive input key-value pairs to produce intermediate key-value pairs. Phase 2 (Shuffle and sort): intermediate key-value pairs are sorted and sent to the proper reducers; handled by the execution framework; single point of communication; slowest phase of MapReduce. Phase 3 (Reduce): receive intermediate key-value pairs and produce the final ones. All mappers must have finished for reducers to start! Diagram: Input, Map, Shuffle, Reduce, Output.
87 The MapReduce model. Map phase: aggregates intermediate key-value pairs of each mapper. Reduce phase: assign intermediate key-value pairs to reducers, e.g., key mod #reducers.
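The three phases can be mimicked in a single process with a toy word count. This is only a sketch of the programming model, not of any real MapReduce framework's API; the function names and the character-sum partitioner (a stand-in for "key mod #reducers" with string keys) are assumptions.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit an intermediate (word, 1) pair for every word in its chunk."""
    return [(word, 1) for word in chunk.split()]

def partition(key, num_reducers):
    """Assign a key to a reducer; a deterministic stand-in for 'key mod #reducers'."""
    return sum(ord(c) for c in key) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Group intermediate values by key and route each key to one reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Reducer: aggregate all values that share the same key."""
    return {key: sum(values) for key, values in bucket.items()}

chunks = ["big data big models", "data streams", "big streams"]
mapped = [map_phase(c) for c in chunks]               # one mapper per chunk
reduced = [reduce_phase(b) for b in shuffle(mapped, num_reducers=2)]
counts = {k: v for part in reduced for k, v in part.items()}
```

Because every occurrence of a key lands at the same reducer, the reducers never need to talk to each other; all communication is confined to the shuffle step.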
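88 Least-squares in MapReduce. Given input-output pairs $\{(x_t, y_t)\}$, find $\theta$ minimizing $\sum_t (y_t - x_t^\top \theta)^2$. Map phase: mappers (here 3) calculate partial sums of $x_t x_t^\top$ and $x_t y_t$ over their data. Reduce phase: the reducer aggregates the partial sums and solves the resulting normal equations for $\theta$. C. Chu, S. K. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, K. Olukotun, "Map-Reduce for machine learning on multicore," pp , NIPS,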
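The partial-sum structure can be sketched for a two-parameter model, assuming three data blocks (one per mapper) and a 2 x 2 system solved in closed form by the reducer. The function names and the toy line-fitting data are hypothetical.

```python
def map_partial_sums(block):
    """Mapper: partial sums of x x^T (2 x 2) and x*y over its data block."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [0.0, 0.0]
    for x, y in block:
        for i in range(2):
            sxy[i] += x[i] * y
            for j in range(2):
                sxx[i][j] += x[i] * x[j]
    return sxx, sxy

def reduce_solve(partials):
    """Reducer: aggregate partial sums and solve the 2 x 2 normal equations."""
    A = [[sum(p[0][i][j] for p in partials) for j in range(2)] for i in range(2)]
    b = [sum(p[1][i] for p in partials) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Noise-free samples of y = 2 + 3 t, split across three mappers.
data = [([1.0, float(t)], 2.0 + 3.0 * t) for t in range(9)]
blocks = [data[0:3], data[3:6], data[6:9]]
theta = reduce_solve([map_partial_sums(b) for b in blocks])
```

Each mapper sends only a small matrix and vector whose size depends on the number of parameters, not on how many data points it processed.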
89 K-means in MapReduce. Serial k-means: pick K random initial centroids; iterate: assign points to the cluster of the closest centroid, then update centroids by averaging data per cluster. MapReduce k-means: a central unit picks K random initial centroids; each iteration is a MapReduce job (multiple required!). Mappers: each is assigned a subset of data; find closest centroid, partial sum of data, and #data per cluster; generate key-value pairs with key = cluster, value = (partial sum, #points). Reducers: each is assigned a subset of clusters; aggregate partial sums for corresponding clusters; update centroid location. Figure legend: centroids; Mapper 1; Mapper 2; Reducer 1; Reducer 2.
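One way to sketch the mapper and reducer roles, assuming scalar data, two mappers, and a single reducer handling all clusters; the helper names and the toy data are hypothetical, and fixed initial centroids replace the random pick to keep the example reproducible.

```python
from collections import defaultdict

def kmeans_map(block, centroids):
    """Mapper: per cluster, emit (cluster, (partial sum, #points)) for its block."""
    partial = defaultdict(lambda: [0.0, 0])
    for x in block:
        k = min(range(len(centroids)), key=lambda c: (x - centroids[c]) ** 2)
        partial[k][0] += x
        partial[k][1] += 1
    return [(k, (s, n)) for k, (s, n) in partial.items()]

def kmeans_reduce(pairs, centroids):
    """Reducer: aggregate partial sums per cluster and update the centroids."""
    totals = defaultdict(lambda: [0.0, 0])
    for k, (s, n) in pairs:
        totals[k][0] += s
        totals[k][1] += n
    # A cluster that received no points keeps its previous centroid.
    return [totals[k][0] / totals[k][1] if totals[k][1] else c
            for k, c in enumerate(centroids)]

blocks = [[0.0, 1.0, 10.0], [2.0, 11.0, 12.0]]   # data held by two mappers
centroids = [0.0, 12.0]                          # initial centroids
for _ in range(5):                               # each pass is one MapReduce job
    pairs = [kv for block in blocks for kv in kmeans_map(block, centroids)]
    centroids = kmeans_reduce(pairs, centroids)
```

The centroids must be broadcast to all mappers before every pass, which is why each k-means iteration costs a full MapReduce job.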
90 Graph algorithms. Require communication between processing nodes across multiple iterations; MapReduce not ideal! Pregel: based on the Bulk Synchronous Parallel (BSP) model; developed by Google for use on large graph problems. Each vertex is characterized by a user-defined value; each edge is characterized by source vertex, destination vertex, and a user-defined value. Computation is performed in supersteps: a user-defined function is executed on each vertex (in parallel); messages from the previous superstep are received; messages for the next superstep are sent. Supersteps are separated by global synchronization points (sync barriers). Diagram: vertices 1, 2, 3 at superstep i and superstep i+1. More efficient than MapReduce! G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, "Pregel: A system for large-scale graph processing," SIGMOD, Indianapolis,
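The superstep mechanics can be simulated in a few lines. The sketch below assumes the classic maximum-value propagation task on an undirected graph and runs all vertices sequentially inside each superstep; it imitates the vertex-centric model only and does not use Pregel's actual API.

```python
def pregel_max(values, edges, max_supersteps=50):
    """Propagate the maximum vertex value through the graph, superstep by superstep."""
    neighbors = {v: [] for v in values}
    for a, b in edges:                      # undirected: messages flow both ways
        neighbors[a].append(b)
        neighbors[b].append(a)
    values = dict(values)
    # Superstep 0: every vertex sends its value to its neighbors.
    inbox = {v: [] for v in values}
    for v in values:
        for nb in neighbors[v]:
            inbox[nb].append(values[v])
    supersteps = 1
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():   # conceptually executed in parallel
            # A vertex reacts only if a received value improves its own.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for nb in neighbors[v]:
                    outbox[nb].append(values[v])
        inbox = outbox                      # global synchronization barrier
        supersteps += 1
    return values, supersteps

final_values, used = pregel_max({1: 3, 2: 6, 3: 2, 4: 1},
                                edges=[(1, 2), (2, 3), (3, 4)])
```

Vertices whose value did not change send nothing, so the computation halts on its own once no messages are in flight, without any vertex needing a global view of the graph.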
91 Wrap-up for scalable computing. MapReduce: simple parallel processing; code easy to understand; great for batch problems; applicable to many problems. However: the framework can be restrictive; not designed for online algorithms; inefficient for graph processing. Pregel: tailored for graph algorithms; more efficient relative to MapReduce. Available open-source implementations: Hadoop (MapReduce), Hama (Pregel), Giraph (Pregel). Machine learning algorithms can be found at Mahout!
92 Roadmap: Context and motivation; Critical Big Data tasks; Optimization algorithms for Big Data; Randomized learning; Scalable computing platforms for Big Data; Conclusions and future research directions
93 Tutorial summary: Big Data modeling and tasks; Dimensionality reduction; Succinct representations; Vectors, matrices, and tensors; Optimization algorithms; Decentralized, parallel, streaming; Data sketching; Implementation platforms; Scalable computing platforms; Analytics in the cloud
94 Timeliness. Special sessions in recent IEEE SP Society / EURASIP meetings: Signal and Information Processing for Big Data; Challenges in High-Dimensional Learning; Information Processing over Networks; Trends in Sparse Signal Processing; Sparse Signal Techniques for Web Information Processing; SP for Big Data; Optimization Algorithms for High-Dimensional SP; Tensor-based SP; Information Processing for Big Data
95 Importance to funding agencies. NSF, NIH, DARPA, DoD, DoE, and US Geological Survey. Sample programs: NSF (BIGDATA); DARPA ADAMS (Anomaly Detection on Multiple Scales); DoD: 20+ solicitations (Data to decisions, Autonomy, Human-machine systems). Source:
96 NSF-ECCS sponsored workshop. NSF Workshop on Big Data: From Signal Processing to Systems Engineering, March 21-22, 2013, Arlington, Virginia, USA. Workshop program and slides. Workshop final report. Sponsored by Electrical, Communications and Cyber Systems (ECCS)
97 Open JSTSP special issue. Special issue on Signal Processing for Big Data, IEEE Journal of Selected Topics in Signal Processing. Guest editors: Georgios B. Giannakis (University of Minnesota, USA); Raphael Cendrillon (Google, USA); Volkan Cevher (EPFL, Switzerland); Ananthram Swami (Army Research Laboratory, USA); Zhi Tian (Michigan Tech Univ. and NSF, USA). Full manuscripts due by July 1, 2014. Sept. 2014 issue of the IEEE Signal Processing Magazine (also on SP for Big Data)