Compressed Sensing of Correlated Social Network Data


Abstract

In this paper, we present preliminary work on compressed sensing of social network data and identify its non-trivial sparse structure. A feature basis learnt from data samples and the inherent sparse structure induced by graphs are combined into a tensor-based sparse representation of high-dimensional social network data. By solving the resulting compressed sensing problem, we can effectively recover correlated social data from a few samples. An efficient greedy algorithm, called Tensor Matching Pursuit (TMP), is also proposed to handle the computational intractability of big data. We extensively test our algorithm and implementations on social network datasets from Twitter, Facebook and Weibo. The results show that our approach robustly outperforms the baseline algorithms and hence, in a sense, captures the sparsity of social network data better.

Network Science Term Project. Unpublished preliminary work. Do not distribute.

1. Introduction

This paper considers the data gathering problem of large-scale social networks, where each user is modeled with a vector of features such as education, hobby, opinion, etc. Social networks such as Twitter and Weibo are pervasive nowadays and extensively involved in network analysis, data mining (Russell, 2011) and machine learning. In general, social network data is acquired in an accumulative fashion, with user profiles and features collected and stored independently. As far as we know, however, social network data is far from independent (Anagnostopoulos et al., 2008). We therefore ask the fundamental question: can one essentially reduce the number of samples required for social network analysis with hopefully little accuracy tradeoff? The answer lies in the haystack of data correlations in social networks. In particular, two representative types of correlations are frequently encountered, which we highlight as follows:

Social Correlation. Your friend and you tend to like the same TV show. Social correlation, such as social influence (Anagnostopoulos et al., 2008) or information cascading (Easley & Kleinberg, 2010), characterizes the coordination of people's behavior and features over a connected graph component.

Feature Correlation. Being a geek is being cool. The high-dimensional feature vector of each entity can have correlations among multiple dimensions.

In contemporary signal processing and machine learning, the silver bullet for recovering correlated data is compressed sensing (Candès & Wakin, 2008). This paper presents our preliminary attempt to apply compressed sensing techniques to social network data gathering. Our general idea is as follows: instead of collecting the feature vectors of the entire dataset, we randomly sample a subset of the feature entries and later use sparse recovery algorithms to acquire the rest. Two major contributions are made in this paper. First, a novel sparse representation that simultaneously deals with both social and feature correlations is introduced by exploring the idea of hierarchical sparse modeling. Second, an efficient implementation of the sparse recovery algorithm under the new representation is devised that allows for fast solving of our optimization problem. The algorithms are extensively tested on several datasets and the results show the advantage of our approach against the baseline algorithms. The rest of the paper is organized as follows. Section 2 reviews related work.
Section 3 establishes the compressed sensing framework for social graphs with various types of data gathering strategies. Section 4 introduces the combination of feature basis learning and diffusion wavelets in a hierarchical style for sparse representation of social networks. Section 5 presents two efficient implementations of our proposed algorithm. Section 6 demonstrates the test results on three datasets. Finally, Section 7 concludes this paper and discusses potential future work.

2. Related Works

Over the past decades, network science research has become increasingly popular. The mainstream splits into two parts: network modeling (e.g., the small world model (Newman, 2000)) and network analysis (e.g., the link prediction problem (Liben-Nowell & Kleinberg, 2007)). These research directions share a common feature: dealing with big data. Kwak et al. crawled the entire Twitter network and gathered the data for analysis (Kwak et al., 2010).

The field of compressed sensing grew out of the work by Candès, Romberg, Tao and Donoho (Donoho, 2006). Recent years have witnessed the advancement of compressed sensing in both theory and application. Baraniuk gives a simple proof of the RIP property of random matrices (Baraniuk et al., 2008). Candès et al. consider signal recovery from highly incomplete information (Candès et al., 2006). Compressed sensing has found great applications in multiple fields such as image processing, signal processing, audio analysis and sensor networks. The first application of the compressed sensing technique dates back to Luo et al. (Luo et al., 2009). They construct a sensor network with a sink collecting compressed measurements, which is equivalent to a random matrix projection. Xu et al. (Liwen, 2013) consider more general compressed sparse functions for sparse representation of signals over graphs.

In terms of sparse representations, dictionary learning originates from efforts to reproduce V1-like visual neurons through sparse coding (Lee et al., 2007). Mairal et al. proposed online dictionary learning methods, which lead to efficient computation of sparse coding (Mairal et al., 2009). Diffusion wavelets were first invented by Maggioni in 2004 (Mahadevan & Maggioni, 2006) and have found application in compressed sensing over graphs (Coates et al., 2007).

The typical numerical approach to solving the compressed sensing problem is l1 minimization. Nevertheless, several greedy pursuit algorithms have been proposed. The pursuit idea dates back to 1974 (Friedman & Tukey, 1974). The classic Matching Pursuit algorithm was proposed by Mallat and Zhang in 1993 (Mallat & Zhang, 1993). There is now a family of greedy pursuit algorithms, and some theoretical guarantees have been established (Lu & Do, 2008).

3. Compressed Social Data Gathering

Compressed Sensing, also known as Compressive Sensing, is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems (Candès & Wakin, 2008). The measurement model is

m = M Φ x    (1)

which takes advantage of the signal's sparseness or compressibility (representability by a few non-zero coefficients x) in some domain Φ, allowing the entire signal Φx to be determined from relatively few measurements m. A compressed data gathering scheme should cleverly devise the measurement matrix M so as to maximize information expressiveness. We propose two appealing measurement schemes for compressed social data gathering in this section.

3.1. Sampling

According to the traditional theory of compressed sensing, we need at least C k log(n/k) samples for the ideal recovery of k-sparse signals, where C is a small constant. In the setting of graph compressed sensing, we hence need to select Θ(k log(n/k)) feature entries at various vertices to achieve compressed sampling. For simplicity, we just sample the data uniformly at random.
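To make the sampling scheme concrete, the following is a minimal, self-contained Python sketch of Equation (1) on a toy signal, not the paper's implementation: observe on the order of C k log(n/k) randomly chosen entries and recover the k-sparse coefficients with a simple Orthogonal Matching Pursuit loop. The basis, the sparsity level and the sample budget are illustrative choices.

# Toy compressed sampling and greedy recovery for the model m = M*Phi*x.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5                                  # signal length, sparsity
m = int(4 * k * np.log(n / k))                 # on the order of C*k*log(n/k) samples

Phi = np.linalg.qr(rng.standard_normal((n, n)))[0]           # an orthonormal basis
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse coefficients
signal = Phi @ x

sample_idx = rng.choice(n, m, replace=False)   # uniform random entry sampling
M = np.eye(n)[sample_idx]                      # sampling (measurement) matrix
y = M @ signal                                 # observed entries, y = M*Phi*x

# Orthogonal Matching Pursuit: greedily pick the dictionary column most
# correlated with the residual, then refit the coefficients on the support.
D = M @ Phi
support, residual = [], y.copy()
for _ in range(k):
    support.append(int(np.argmax(np.abs(D.T @ residual))))
    coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
    residual = y - D[:, support] @ coef

x_hat = np.zeros(n)
x_hat[support] = coef
print("relative error:", np.linalg.norm(Phi @ x_hat - signal) / np.linalg.norm(signal))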
3.2. Message Propagation

On social networks, messages such as tweets and microblogs are propagated along the edges from vertex to vertex. When a message m passes a vertex v (representing a client or a person), the properties of m can be influenced by the properties of the vertex v. Measuring the changes on a message may help us learn more about the graph's features, and can thus in a way be viewed as measuring information during the compressed sensing process. Suppose that each vertex u has a high-dimensional feature vector v_u. Also, for each message m spread in the network, there is a corresponding vector v_m ∈ R^l that captures the properties of this message; v_m may change during propagation. Since traditional compressed sensing considers problem settings with linear transformations, we can assume that v_m is changed linearly according to the weight of the edge as it passes the edge.

More precisely, suppose that for each edge e = (s, t) there is a vector α_e ∈ R^l that represents the weight of each property on this edge. Then we can assume that

v_{m,i} = (v_{s,i} + α_i v_{m,i}) / (1 + α_i),  1 ≤ i ≤ l.    (2)

With this assumption, we measure social networks through the paths of message propagation: we measure the difference in a message's property vector after it passes an edge, and from that we can construct a set of linear equations for solving the properties of the vertices. However, what limits the use of this paradigm is how to properly set the edge weights, which requires strong prior information and is somewhat restrictive. Therefore, we consider the simpler, more practical way of gathering the network data by sampling.

4. Sparse Recovery

In this section, we address a fundamental issue encountered in sparse recovery of gathered social network data. Social network data can be big in two dimensions: rows (i.e., the number of entities on the graph) and columns (i.e., the number of features of each entity). Since correlation can occur in both dimensions, it is unclear how one should sparsely represent the data.

There are basically two straightforward paradigms for identifying a sparse representation of data on social graphs. One trivial approach would simply ignore the network structure and adopt dictionary learning methods (Lee et al., 2007; Mairal et al., 2009) to find a basis under which the feature vectors of entities {v_i}_{i=1}^n have sparse coefficients. This approach overlooks the significant role of social ties shown in previous works (Anagnostopoulos et al., 2008), which could possibly lead to a sparser representation. The other paradigm follows preliminary work by Coates et al. (Coates et al., 2007) as well as Xu et al. (Liwen, 2013), both of which, in different ways, consider sparse decomposition of signal values with respect to some specific functional basis on sensor networks. Although it suggests a direct way to model social correlation, it does not generalize naturally to high-dimensional social vector graphs with feature correlations. We could in principle concatenate the features of all entities {v_i}_{i=1}^n into one large vector so as to capture both feature and topological correlations, but neither the computational burden nor the data size would be tractable. To overcome these shortcomings, we propose a novel construction of sparse basis functions for high-dimensional social graph data by combining the above two paradigms in a reciprocal way.

4.1. Feature Basis

Identifying a sparse feature basis is a learning problem: given κ statistically representative feature samples of network entities x_{i_1}, x_{i_2}, ..., x_{i_κ} ∈ R^d, we seek a set of vectors b_1, b_2, ..., b_{m_b} ∈ R^d under which the samples can be sparsely represented by L ≪ d non-zero coefficients α_{i_1}, α_{i_2}, ..., α_{i_κ}, and this property should generalize to the entire data universe. Mathematically, it can be formalized as a joint optimization problem:

min_{B,α} (1/κ) Σ_{j=1}^{κ} ( (1/2) ||x_{i_j} − B α_{i_j}||² + λ ||α_{i_j}||_1 )    (3)

where the basis vector b_i forms the i-th column of the matrix B and λ is a regularization parameter controlling sparsity. This problem can be solved efficiently with second-order gradient descent (Lee et al., 2007), online learning methods (Mairal et al., 2009), etc.
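As a concrete illustration of problem (3), the sketch below learns a feature basis from a small batch of sampled feature vectors using an off-the-shelf dictionary learner; the random data, the dictionary size L and the penalty λ are placeholder values rather than the settings used in the paper.

# Sketch of learning the feature basis B of Equation (3) from sampled features.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
kappa, d, L, lam = 300, 42, 60, 0.1            # samples, feature dim, dictionary size, lambda
X_samples = rng.standard_normal((kappa, d))    # stand-in for sampled user feature vectors

learner = DictionaryLearning(n_components=L, alpha=lam, max_iter=200,
                             transform_algorithm="lasso_lars",
                             transform_alpha=lam, random_state=0)
alpha = learner.fit_transform(X_samples)       # sparse codes, shape (kappa, L)
B = learner.components_.T                      # basis vectors as columns, shape (d, L)

recon = alpha @ learner.components_            # approx. X_samples ≈ alpha * B^T
print("mean reconstruction error:", np.mean(np.linalg.norm(X_samples - recon, axis=1)))
print("avg nonzeros per code:", np.mean(np.count_nonzero(alpha, axis=1)))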
4.2. Diffusion Wavelets

Wavelet transforms are a staple of modern compression and signal processing methods due to their ability to represent piecewise-smooth signals efficiently (signals which are smooth everywhere except for a few discontinuities). In general, wavelet transforms produce a multi-scale function decomposition defined on regularly sampled intervals. Ronald R. Coifman and Mauro Maggioni (Ronald R. Coifman, 2006) introduced diffusion wavelets. They start from a semi-group of diffusion operators {T^t}, associated with a diffusion process, to induce a multi-resolution analysis, interpreting the powers of T as dilation operators acting on functions and constructing precise downsampling operators to efficiently represent the multi-scale structure. This yields a construction of multi-scale scaling functions and wavelets in a very general setting. Our goal for the diffusion wavelet transform is to compute a collection {B_i}_{i=1}^n of orthonormal wavelet basis vectors. A function y on the graph can then be written as y = Σ_{i=1}^n β_i B_i, where β_i is the i-th wavelet coefficient (Mark Coates & Rabbat, 2007).
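The following toy computation illustrates the two ingredients above: dyadic powers of a degree-normalized diffusion operator T (the operator adopted in Equation (4) below) acting as coarser and coarser smoothings, and the expansion y = Σ_i β_i B_i of a graph signal in an orthonormal basis. The 4-cycle graph, the basis and the signal are made-up inputs; this is not the full construction of Algorithm 1.

# Dyadic powers of a degree-normalized diffusion operator and an orthonormal expansion.
import numpy as np

A = np.array([[0, 1, 0, 1],             # adjacency of a 4-cycle
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
T = A / np.sqrt(np.outer(deg, deg))     # T_ij = 1/sqrt(d_i d_j) on edges, 0 otherwise

# Powers T, T^2, T^4, ... smooth graph signals at increasingly coarse scales.
powers = [np.linalg.matrix_power(T, 2 ** j) for j in range(3)]

# Any orthonormal basis {B_i} of R^n gives y = sum_i beta_i B_i with beta = B^T y.
B, _ = np.linalg.qr(T + np.eye(4))      # a toy orthonormal basis derived from T
y = np.array([1.0, 0.0, -1.0, 0.0])     # a graph signal
beta = B.T @ y                          # wavelet-style coefficients
assert np.allclose(B @ beta, y)         # exact reconstruction from the coefficients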

The process of computing the basis needs a sparse QR decomposition. Given an input A (a sparse n × n matrix) and a precision parameter ε, the sparse QR decomposition SpQR(A, ε) returns an n × n orthogonal matrix Q and an upper triangular matrix R such that A ≈_ε QR. To compute the orthonormal bases of scaling functions Φ_j and wavelets Ψ_j, the algorithm works as follows:

Algorithm 1 Diffusion Wavelet Construction (Ronald R. Coifman, 2006)
1: j ← 0
2: while j < N do
3:   [Φ_{j+1}]_{Φ_j}, [T^{2^j}]_{Φ_j}^{Φ_{j+1}} ← SpQR([T^{2^j}]_{Φ_j}^{Φ_j}, ε)
4:   T_{j+1} := [T^{2^{j+1}}]_{Φ_{j+1}}^{Φ_{j+1}} ← [Φ_{j+1}]_{Φ_j} [T^{2^j}]_{Φ_j}^{Φ_j} [Φ_{j+1}]*_{Φ_j}
5:   [Ψ_j]_{Φ_j} ← SpQR(I_{⟨Φ_j⟩} − [Φ_{j+1}]_{Φ_j} [Φ_{j+1}]*_{Φ_j}, ε)
6:   j ← j + 1
7: end while

where [B_1]_{B_2} represents the set of vectors B_1 represented on the basis B_2, and [L]_{B_1}^{B_2} denotes the matrix representing the linear operator L with respect to the basis B_1 in the domain and B_2 in the range.

The key point of this process is how to choose the initial diffusion operator T. T must be a matrix such that T_{ij} > 0 if and only if (i, j) ∈ E; moreover, the value of T_{ij} should represent the correlation of the vertices i and j. Hence, we choose T to be the degree-normalized Laplacian of the graph, i.e.

L(i, j) = 0 if (i, j) ∉ E;  L(i, j) = 1/√(d_i d_j) if (i, j) ∈ E.    (4)

4.3. Graph Tensor Basis

We propose to unify the feature basis and the diffusion wavelets in a hierarchical fashion. First, the feature vectors associated with each node, {x_i}_{i=1}^n ⊂ R^d, are decomposed sparsely into coefficients {α_i}_{i=1}^n under the basis B:

x_i = B α_i.    (5)

Let X, A ∈ R^{d×n} be the matrices whose columns are the vectors x_i and α_i, respectively. Each row of A, denoted {A_{i,:}}_{i=1}^d, is then a value function on the graph, and we proceed by decomposing it over the diffusion wavelets W:

A_{i,:} = W u_i.    (6)

Using matrix notation, the entire pipeline can be written as

X = B A = B U W    (7)

where the i-th row of the coefficient matrix U is u_i. The focus of our sparse recovery algorithm then turns to minimizing the l1 norms ||u_1||_1, ||u_2||_1, ..., ||u_d||_1 subject to Equation (7):

minimize_U  Σ_{i=1}^d ||u_i||_1  subject to  Y = M(B U W).    (8)

The above optimization problem can be rewritten in the familiar form of compressed sensing if we concatenate the columns of X and U into long vectors X̄ and Ū, which are connected through the Kronecker product B ⊗ W:

minimize_Ū  Σ_{i=1}^d ||u_i||_1  subject to  Ȳ = M (B ⊗ W) Ū.    (9)

In other words, the feature basis and the diffusion wavelets are combined through a tensor product to produce a new basis for hierarchical sparse decomposition.
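A quick numerical sanity check of this tensor construction is given below: with column-major stacking, X = B U W is equivalent to vec(X) = (Wᵀ ⊗ B) vec(U), so entry-wise measurements of X act on a single combined basis as in problem (9). The paper writes the combined basis as B ⊗ W; the exact factor order depends on the vectorization convention. All matrices and sizes here are random stand-ins.

# Numerical check of the tensor-basis reformulation vec(BUW) = (W^T kron B) vec(U).
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 8                                   # feature dimension, number of nodes
B = rng.standard_normal((d, d))               # learned feature basis
W = rng.standard_normal((n, n))               # diffusion wavelet basis
U = rng.standard_normal((d, n))               # hierarchical coefficients

X = B @ U @ W                                 # Equation (7): X = B U W

vec = lambda M: M.flatten(order="F")          # stack columns into one long vector
lhs = vec(X)
rhs = np.kron(W.T, B) @ vec(U)                # vec(BUW) = (W^T kron B) vec(U)
assert np.allclose(lhs, rhs)

# A measurement that samples entries of X acts on vec(U) through M (W^T kron B),
# so the sparse recovery runs over the single combined (tensor) basis.
mask = rng.random(d * n) < 0.3                # observe ~30% of the entries
M = np.eye(d * n)[mask]
y = M @ lhs                                   # observed measurements
assert np.allclose(y, M @ np.kron(W.T, B) @ vec(U))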

5. Efficient Implementation

For networks with a large number of nodes, the optimization problem (9) could be daunting due to the high dimensionality of the tensor basis. To clear away the obstacles to the applicability of our recovery algorithm, we introduce in this section two approximations: Tensor Matching Pursuit and patch-based sparse recovery.

5.1. Tensor Matching Pursuit

Although compressed sensing is often synonymous with l1-based optimization, many applications require efficient storage and fast speed. This is especially true for the tensor version of our joint basis optimization problem (9), which might inevitably consume up to n²d² space for d-dimensional vector graphs of n nodes. We show that these burdens are not fundamental obstacles to our sparse recovery paradigm by introducing a new greedy algorithm that tackles the optimization approximately. Our technique belongs to a large family of greedy pursuit methods used in compressed sensing and generalizes the classic Matching Pursuit algorithm. Like other greedy pursuit algorithms, our Tensor Matching Pursuit (TMP) has two fundamental steps: element selection and coefficient update. In particular, the approximation is incremental: at each iteration we first select one column from the basis B and then update the coefficients associated with that column such that the residual of the constraints is decreased.

To derive Tensor Matching Pursuit, we need to put a weak constraint on the measurements. Let M_1, M_2 be two linear operators; the following is a slightly restricted version of our optimization problem:

minimize_U  Σ_{i=1}^d ||u_i||_1  subject to  Y = M_1 B U W M_2.    (10)

This restriction requires simultaneously measuring the same feature dimensions and is reasonable in practice. The proposed TMP method is elaborated in Algorithm 2.

Algorithm 2 Tensor Matching Pursuit
input Y, B, W
1: Set R^(0) = Y, X̂^(0) = 0
2: for round t = 1 until the stopping criterion is met do
3:   Calculate the projections R_B = B† R and R_W = W† R, where B† and W† are the Moore-Penrose pseudoinverses.
4:   Compute the correlation matrices C_B = R_B [w_1/||w_1||_2, ..., w_p/||w_p||_2] and C_W = R_W [b_1/||b_1||_2, ..., b_q/||b_q||_2], where b_i and w_j are the column vectors of B and W.
5:   Select the entry e = (i, j) from either C_B or C_W that has the highest absolute correlation value c(e).
6:   if e is chosen from C_B then
7:     X_{ij} ← X_{ij} + η_t c(e)
8:   else
9:     X_{ji} ← X_{ji} + η_t c(e)
10:  end if
11:  Update R = Y − B X W.
12: end for
output X, R

5.2. Patch-Based Sparse Recovery

The notion of patch-based sparse recovery originates from the compressed sensing of natural images with patch-based representations (Yang et al., 2008). It is naturally applicable here once we observe that social interactions are likely to be local. Technically, we divide the nodes of the graph into groups G_1, G_2, ..., G_q. Let W_G be the wavelet basis restricted to G (with rows not involved in group G removed) and M_G be the corresponding measurements. The patch-based sparse recovery can be formulated as:

minimize_Ū  Σ_{i=1}^d ||u_i||_1  subject to  Ȳ_G = M_G (B ⊗ W_G) Ū,  for G ∈ {G_i}_{i=1}^q.    (11)

The choice of grouping patches is of primary concern for applications. We consider two basic candidates here; a short sketch of both follows.

Mini-batch. Divide the nodes {v_i}_{i=1}^n into K mini-batches {v_{(n/K)·j+1}, ..., v_{(n/K)·(j+1)}}, j = 0, ..., K − 1.

k-hop. Patch groups consist of k-hop neighborhoods of the vertices {v_i}_{i=1}^n on the graph: G_i is the set of vertices within distance k of v_i.
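The sketch below shows one plausible way to form the two kinds of patches; the adjacency dictionary, the batch count K and the hop radius k are toy inputs, and the paper does not prescribe a specific implementation.

# Sketch of the two patch-grouping strategies of Section 5.2.
from collections import deque

def mini_batches(nodes, K):
    """Split the node list into K contiguous mini-batches."""
    size = (len(nodes) + K - 1) // K
    return [nodes[j * size:(j + 1) * size] for j in range(K)]

def k_hop_patch(adj, root, k):
    """All vertices within graph distance k of root (BFS)."""
    seen, frontier = {root}, deque([(root, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == k:
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, dist + 1))
    return seen

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}   # toy social graph
print(mini_batches(list(adj), K=2))            # e.g. [[0, 1, 2], [3, 4]]
print(k_hop_patch(adj, root=0, k=2))           # {0, 1, 2, 3}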

6. Experiment

The proposed compressed sensing algorithm and its fast implementations are extensively tested on three datasets, two of which are real-world social network data.

6.1. Synthetic Data

Synthetic data is generated from classical network models that capture many characteristic aspects of practical social networks, for example, a constant clustering coefficient, the small-world effect, etc. (Newman, 2009). It gives a simple demonstration of how our compressed sensing approach works on social networks. Furthermore, since real-world data is noisy, the gap between the algorithm's performance on synthetic and real social network data can, in a way, measure its robustness in the presence of noise. In particular, we utilize preferential attachment and the small world model (Newman, 2009) to synthesize graphs with topology akin to that of social networks. To generate a synthetic graph G = (V, E) of n vertices, our algorithm works as follows.

Algorithm 3 Graph Synthesis
1: V ← {v_1}
2: i ← 1
3: while i < N do
4:   i ← i + 1
5:   V ← V ∪ {v_i}
6:   Choose v_j ∈ {v_1, v_2, ..., v_{i−1}} according to the Preferential Attachment rule
7:   E ← E ∪ {(v_i, v_j)}
8:   Choose a long-range link (v_s, v_t) according to Kleinberg's model, where v_s, v_t ∈ V and (v_s, v_t) ∉ E
9:   E ← E ∪ {(v_s, v_t)}
10: end while
11: Sample a random basis B.
12: for i = 1 to n do
13:   Randomly sample a k-sparse feature vector under the basis B.
14: end for
15: Use the similarity of the feature vectors to define weights for the Markov chain built from G = (V, E).
16: Simulate the Markov chain with Gibbs sampling.

Here, the Preferential Attachment rule selects the vertex v_j with probability proportional to the in-degree of v_j, i.e.

Pr[v_j is chosen] = in-degree(v_j) / Σ_{k=1}^{i−1} in-degree(v_k).    (12)

Kleinberg's model (Kleinberg, 2000) chooses a long-range link (v_s, v_t) with probability proportional to d(v_s, v_t)^{−α}, where α is a constant. Kleinberg showed that for α = 2, a grid network can be routed in O(log² N) steps in expectation for two vertices with distance N (Kleinberg, 2000). In our synthetic graph model, we also choose α = 2. Experimental results show that this synthetic graph model has a small diameter, a large clustering coefficient and a power-law degree distribution similar to real social network graphs. Each node of the synthetic graph is then assigned a randomly generated k-sparse feature under a certain basis. A Markov network corresponding to the synthetic graph is generated to incorporate correlations between neighboring nodes.
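For illustration, here is a small sketch of the preferential-attachment step of Algorithm 3 (Equation (12)). Since all in-degrees start at zero, the sketch adds +1 smoothing so that the first draws are well defined; that smoothing, and the omission of the Kleinberg long-range links and the Markov-chain feature simulation, are simplifications of ours rather than part of the paper.

# Preferential-attachment growth: new node attaches proportionally to in-degree.
import random

random.seed(0)
N = 100
in_degree = {1: 0}
edges = []

for i in range(2, N + 1):
    candidates = list(in_degree)                           # v_1 ... v_{i-1}
    weights = [in_degree[v] + 1 for v in candidates]       # +1 smoothing (assumption)
    j = random.choices(candidates, weights=weights, k=1)[0]
    edges.append((i, j))                                   # directed edge (v_i, v_j)
    in_degree[j] += 1
    in_degree[i] = 0

top = sorted(in_degree, key=in_degree.get, reverse=True)[:5]
print("highest in-degree hubs:", [(v, in_degree[v]) for v in top])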

6.2. SNAP Datasets

The Stanford Large Network Dataset Collection (a.k.a. the SNAP library) (sna, 2009) provides open access to popular network data with anonymized features. The collection subsets used in our experiment involve social circles from Facebook and Twitter. The Facebook dataset contains 10 circles with in all 4,039 nodes and 88,234 edges. For each node, corresponding binary features are collected, including education, school, year, location, work, etc. It is noticeable that these features are binary 0/1 vectors. The Twitter dataset follows similar settings but contains a much larger number of nodes and edges, up to 81,306 and 1,768,149 respectively.

Figure 1. Synthetic social graph of (a) 100 nodes and (b) a larger number of nodes, which follow preferential attachment and a power-law degree distribution.

Figure 2. Sample social circles from the Facebook (left, circle 348) and Twitter (right) datasets in the SNAP Dataset Collection.

Table 1. Sample features for the Facebook and Twitter datasets in the SNAP Dataset Collection.
Facebook: 0 birthday; 4 education; 10 education; 34 first name; 44 languages; 46 last name
Twitter: 19 #CES; 24 #Dell; 28 #Facebook; 41 #NBA

We choose circle 3980 (#nodes 59, #dimensions 42) from the Facebook dataset for the validation of the algorithms' performance. Around 300 samples are first drawn uniformly at random from these data to learn a dictionary (of size L) of basis feature vectors for the sparse representation of the high-dimensional data. Given the specific network topology, the wavelet diffusion process is simulated at 2 scales with the graph Laplacian operator, which is computed from the social graph with weights set to the cosine similarity (Newman, 2009) of adjacent nodes. To emulate a compressed sensing setting, m · #nodes randomly selected feature entries of nodes on the social graphs are observed and then recovered with the OMP algorithm under the sparse tensor basis. The experiment is performed with MATLAB on an Intel 4-core i5 2.4 GHz machine and utilizes the software packages SPAMS (spa, 2012) and Diffusion Wavelets (mau, 2009).

Figure 3. Relative recovery error of social circle 3980 in the Facebook dataset when m · #nodes of the data is observed and a dictionary of size L is prepared.

Figure 4. Relative recovery error of a social circle in the Twitter dataset when m · #nodes of the data is observed and a dictionary of size L is prepared.

By varying the number of items L in the dictionary as well as the number of measurements m · #nodes, the relative reconstruction error under the l2 norm is plotted in Figures 3 and 4. We observe that, in spite of the stochastic reconstruction error due to measurement uncertainty, a larger over-complete dictionary results in a better outcome. We also find that the relative recovery error is usually large, and conjecture that this is because binary and categorical features do not admit a natural sparse representation. Even with this drawback, the graph tensor basis is shown to be more stable and outperforms the baseline methods in some parameter settings.

6.3. Weibo Data

Most open-access datasets have restrictions on the use of social data due to privacy concerns. This makes it inconvenient to find an appropriate testbed with highly correlated and abundant social features for our algorithm. We therefore resort to the Internet and use collections of data crawled from Weibo.com, which contain not only detailed user profiles but also complete microblog texts. The Weibo dataset contains 200 million users and 200 GB of microblogs. A connected subgraph of 965 nodes and 3 million microblogs is selected, and a 2000-dimensional feature vector is established for each node simply by counting the word distribution of its microblog posts. This representation is in a sense raw and hence inevitably leads to stronger correlation among features as well as among neighboring nodes. We evaluate our algorithm's performance following the same settings as for the Facebook and Twitter datasets. However, we find that for this dataset the traditional l1 minimization and greedy algorithms require prohibitive time and space to solve the tensor compressed sensing problem (9). In contrast, our proposed Tensor Matching Pursuit algorithm is much faster and achieves comparable accuracy. As shown in Figure 6, our algorithm significantly outperforms the baseline ones for small numbers of measurements.
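To make the evaluation protocol explicit, the following sketch reproduces its two bookkeeping pieces, cosine-similarity edge weights for adjacent nodes and the relative l2 reconstruction error, on random stand-in data, with a trivial mean-fill baseline in place of the actual sparse recovery.

# Sketch of the evaluation protocol: random entry observation and relative l2 error.
import numpy as np

rng = np.random.default_rng(0)
d, n = 42, 59                                   # feature dim, #nodes (circle 3980 sizes)
X = rng.random((d, n))                          # stand-in for ground-truth node features

def cosine_weight(xi, xj):
    """Edge weight = cosine similarity of the two endpoints' feature vectors."""
    return float(xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj) + 1e-12))

def relative_error(X_true, X_hat):
    """Relative reconstruction error under the l2 (Frobenius) norm."""
    return np.linalg.norm(X_true - X_hat) / np.linalg.norm(X_true)

observed = rng.random((d, n)) < 0.4             # observe ~40% of entries (the m knob)
X_hat = np.where(observed, X, X.mean())         # trivial mean-fill baseline recovery
print("edge weight (nodes 0,1):", cosine_weight(X[:, 0], X[:, 1]))
print("relative error of baseline:", relative_error(X, X_hat))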

Figure 5. The social subgraph taken from the Weibo data, with 965 users and 3 million microblogs.

Figure 6. Relative recovery error on the Weibo dataset when m · #nodes of the data is observed and a dictionary of size L is prepared.

7. Conclusion and Future Works

In this paper, we focus on compressive sensing of correlated social network data. Based on the assumption that the network has good sparsity, we propose a novel algorithm for both the data gathering process and the recovery process, based on hierarchical sparse modeling and a tensor representation. Efficient implementations such as Tensor Matching Pursuit and patch-based optimization are also presented to allow for fast solving of the tensor compressed sensing problem. To show the robustness and effectiveness of our approach, we test the algorithms on several datasets. The results show that our model, in a way, captures the sparsity of social networks better and is therefore more desirable in practice.

Simple observation shows that social networks share a non-trivial sparse structure. The graph tensor sparse representation is our very preliminary attempt to identify that structure. The future work, as we see it, is three-fold. Technically, we may continue to explore more sophisticated numerical methods to deal with real big data from large social networks; algorithmically, we can devise more diverse ways of gathering the data, including path measurements and message propagation measurements; theoretically, we could prove lower bounds on the number of samples needed for our sparse recovery algorithm as well as guarantees for the pursuit algorithm.

Figure 7. Tensor Matching Pursuit: change of the residual and the number of non-zero items over iterations.

References

Matlab code for diffusion wavelets, 2009.

SNAP: Stanford network analysis platform. snap.stanford.edu, 2009.

Sparse modeling software (SPAMS). spams-devel.gforge.inria.fr/, 2012.

Anagnostopoulos, Aris, Kumar, Ravi, and Mahdian, Mohammad. Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008.

Baraniuk, Richard, Davenport, Mark, DeVore, Ronald, and Wakin, Michael. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3), 2008.

Candès, Emmanuel J and Wakin, Michael B. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30, 2008.

Candès, Emmanuel J, Romberg, Justin, and Tao, Terence. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2), 2006.

Coates, Mark, Pointurier, Yvan, and Rabbat, Michael. Compressed network monitoring for IP and all-optical networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. ACM, 2007.

Donoho, David Leigh. Compressed sensing. IEEE Transactions on Information Theory, 52(4), 2006.

Easley, David and Kleinberg, Jon. Networks, Crowds, and Markets, volume 8. Cambridge University Press, 2010.

Friedman, Jerome H and Tukey, John W. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 100(9), 1974.

Kleinberg, Jon. The small-world phenomenon: an algorithmic perspective. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing. ACM, 2000.

Kwak, Haewoon, Lee, Changhyun, Park, Hosung, and Moon, Sue. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. ACM, 2010.

Lee, Honglak, Battle, Alexis, Raina, Rajat, and Ng, Andrew Y. Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19:801, 2007.

Liben-Nowell, David and Kleinberg, Jon. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 2007.

Liwen, Xu. Efficient data gathering using compressed sparse functions, 2013.

Lu, Yue M and Do, Minh N. A theory for sampling signals from a union of subspaces. IEEE Transactions on Signal Processing, 56(6), 2008.

Luo, Chong, Wu, Feng, Sun, Jun, and Chen, Chang Wen. Compressive data gathering for large-scale wireless sensor networks. In Proceedings of the 15th Annual International Conference on Mobile Computing and Networking. ACM, 2009.

Mahadevan, Sridhar and Maggioni, Mauro. Value function approximation with diffusion wavelets and Laplacian eigenfunctions. Advances in Neural Information Processing Systems, 18:843, 2006.

Mairal, Julien, Bach, Francis, Ponce, Jean, and Sapiro, Guillermo. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.

Mallat, Stephane G and Zhang, Zhifeng. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 1993.

Mark Coates, Yvan Pointurier and Rabbat, Michael. Compressed network monitoring for IP and all-optical networks, 2007.

Newman, Mark. Networks: An Introduction. Oxford University Press, 2009.

Newman, Mark EJ. Models of the small world. Journal of Statistical Physics, 101(3-4), 2000.

Ronald R. Coifman and Mauro Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21, 2006.

Russell, Matthew A. Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. O'Reilly Media, 2011.

Yang, Jianchao, Wright, John, Huang, Thomas, and Ma, Yi. Image super-resolution as sparse representation of raw image patches. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2008.


More information

Weakly Secure Network Coding

Weakly Secure Network Coding Weakly Secure Network Coding Kapil Bhattad, Student Member, IEEE and Krishna R. Narayanan, Member, IEEE Department of Electrical Engineering, Texas A&M University, College Station, USA Abstract In this

More information

Differential Privacy Preserving Spectral Graph Analysis

Differential Privacy Preserving Spectral Graph Analysis Differential Privacy Preserving Spectral Graph Analysis Yue Wang, Xintao Wu, and Leting Wu University of North Carolina at Charlotte, {ywang91, xwu, lwu8}@uncc.edu Abstract. In this paper, we focus on

More information

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

Single Depth Image Super Resolution and Denoising Using Coupled Dictionary Learning with Local Constraints and Shock Filtering

Single Depth Image Super Resolution and Denoising Using Coupled Dictionary Learning with Local Constraints and Shock Filtering Single Depth Image Super Resolution and Denoising Using Coupled Dictionary Learning with Local Constraints and Shock Filtering Jun Xie 1, Cheng-Chuan Chou 2, Rogerio Feris 3, Ming-Ting Sun 1 1 University

More information

The Image Deblurring Problem

The Image Deblurring Problem page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,

More information

How To Find Local Affinity Patterns In Big Data

How To Find Local Affinity Patterns In Big Data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

Decentralized Utility-based Sensor Network Design

Decentralized Utility-based Sensor Network Design Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Correlated Compressive Sensing for Networked Data

Correlated Compressive Sensing for Networked Data Correlated Compressive Sensing for Networked Data Tianlin Shi Da Tang Liwen Xu Thomas Moscibroda The Institute for Theoretical Computer Science (ITCS Institute for Interdisciplinary Information Sciences

More information

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Maximilian Hung, Bohyun B. Kim, Xiling Zhang August 17, 2013 Abstract While current systems already provide

More information