Compressed Sensing of Correlated Social Network Data




Abstract

In this paper, we present preliminary work on compressed sensing of social network data and identify its non-trivial sparse structure. A feature basis learnt from data samples and the inherent sparse structure induced by graphs are combined into a tensor-based sparse representation of high-dimensional social network data. By solving the resulting compressed sensing problem, we can effectively recover correlated social data from a few samples. An efficient greedy algorithm, called Tensor Matching Pursuit (TMP), is also proposed to handle the computational intractability that comes with big data. We extensively test our algorithm and implementations on social network datasets from Twitter, Facebook and Weibo. The results show that our approach robustly outperforms the baseline algorithms and hence, in a sense, captures the sparsity of social network data better.

1. Introduction

This paper considers the data gathering problem for large-scale social networks, where each user is modeled as a vector of features such as education, hobbies, opinions, etc. Social networks such as Twitter and Weibo are pervasive nowadays and are studied extensively with network analysis, data mining (Russell, 2011) and machine learning. In general, social network data is acquired in an accumulative fashion, with user profiles and features collected and stored independently. Social network data, however, is far from independent (Anagnostopoulos et al., 2008). We therefore ask the fundamental question: can one essentially reduce the number of samples required for social network analysis with little accuracy tradeoff? The answer lies in the haystack of data correlations in social networks. In particular, two representative types of correlations are frequently encountered, which we highlight as follows:

Social correlation. Your friend and you tend to like the same TV show. Social correlation, such as social influence (Anagnostopoulos et al., 2008) or information cascading (Easley & Kleinberg, 2010), characterizes the coordination of people's behavior and features over a connected graph component.

Feature correlation. Being a geek is being cool. The high-dimensional feature vector of each entity can have correlations among multiple dimensions.

In contemporary signal processing and machine learning, the silver bullet for recovering correlated data is compressed sensing (Candès & Wakin, 2008). This paper presents our preliminary attempt to apply compressed sensing techniques to social network data gathering. Our general idea is depicted in the figure: instead of collecting the feature vectors of the entire dataset, we can randomly sample a subset of the feature entries and later use sparse recovery algorithms to acquire the rest.

Two major contributions are made in this paper. First, a novel sparse representation that simultaneously deals with both social and feature correlations is introduced by exploring the idea of hierarchical sparse modeling. Second, an efficient implementation of the sparse recovery algorithm under the new representation is devised that allows for fast solving of our optimization problem. The algorithms are extensively tested on several datasets and the results show the advantage of our approach over baseline algorithms.

The rest of the paper is organized as follows. Section 2 reviews related work.
Section 3 establishes the compressed sensing framework for social graphs with various types of data gathering strategies. Section 4 introduces the combination of feature basis learning and diffusion wavelets in a hierarchical style for sparse representation of social networks. Section 5 presents two efficient implementations of our proposed algorithm. Section 6 demonstrates the test results on three datasets. Finally, Section 7 concludes this paper and discusses potential future work.

2. Related Works

Over the past decades, network science research has become increasingly popular. The mainstream splits into two parts: network modeling (e.g. the small world model (Newman, 2000)) and network analysis (e.g. the link prediction problem (Liben-Nowell & Kleinberg, 2007)). These research directions share a common feature: dealing with big data. Kwak et al. crawled the entire Twitter network and gathered the data for analysis (Kwak et al., 2010).

The field of compressed sensing grew out of the work of Candès, Romberg, Tao and Donoho (Donoho, 2006). Recent years have witnessed its advancement in both theory and application. Baraniuk et al. give a simple proof of the restricted isometry property (RIP) for random matrices (Baraniuk et al., 2008). Candès et al. consider signal recovery from highly incomplete information (Candès et al., 2006). Compressed sensing has found great applications in multiple fields such as image processing, signal processing, audio analysis and sensor networks. The first application of compressed sensing techniques to sensor networks dates back to Luo et al. (Luo et al., 2009), who construct a sensor network with a sink collecting compressed measurements, which is equivalent to a random matrix projection. Xu et al. (Liwen, 2013) consider more general compressed sparse functions for sparse representation of signals over graphs.

In terms of sparse representations, dictionary learning originates from efforts to reproduce V1-like visual neurons through sparse coding (Lee et al., 2007). Mairal et al. proposed online dictionary learning methods, which lead to efficient computation of sparse coding (Mairal et al., 2009). Diffusion wavelets were first introduced by Coifman and Maggioni around 2004 (Mahadevan & Maggioni, 2006) and have found application in compressed sensing over graphs (Coates et al., 2007).

The typical numerical approach to solving the compressed sensing problem is l1 minimization. Nevertheless, several greedy pursuit algorithms have been proposed. Pursuit algorithms date back to 1974 (Friedman & Tukey, 1974); the classic Matching Pursuit algorithm was proposed by Mallat and Zhang in 1993 (Mallat & Zhang, 1993). There is now a family of greedy pursuit algorithms, and some theoretical guarantees have been established (Lu & Do, 2008).

3. Compressed Social Data Gathering

Compressed sensing, also known as compressive sensing, is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems (Candès & Wakin, 2008):

m = MΦx    (1)

It takes advantage of the signal's sparseness or compressibility (representability by a few non-zero coefficients x) in some domain Φ, allowing the entire signal Φx to be determined from relatively few measurements m. A compressed data gathering scheme should cleverly devise the measurement matrix M so as to maximize information expressiveness. We propose two appealing measurement schemes for compressed social data gathering in this section.

3.1. Sampling

According to the traditional theory of compressed sensing, we need at least Ck log(n/k) samples for the ideal recovery of k-sparse signals, where C is a small constant. In the setting of graph compressed sensing, we hence need to select Θ(k log(n/k)) feature entries at various vertices to achieve compressed sampling. For simplicity, we sample the data uniformly at random.
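To make the sampling scheme concrete, here is a minimal, self-contained sketch (not the paper's code; the orthonormal basis, the sizes and the constant C = 4 are illustrative choices) that observes uniformly random entries and recovers the signal with scikit-learn's OMP solver:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, k = 512, 8
Phi = np.linalg.qr(rng.standard_normal((n, n)))[0]  # assumed orthonormal sparsity basis
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)                 # k-sparse coefficient vector
signal = Phi @ x                                    # the full data vector Phi x

m = int(4 * k * np.log(n / k))                      # C k log(n/k) samples, with C = 4
rows = rng.choice(n, size=m, replace=False)         # uniform random sampling operator M
y = signal[rows]                                    # measurements m = M Phi x, Equation (1)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(Phi[rows, :], y)                            # solve the underdetermined system for sparse x
x_hat = omp.coef_
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```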
3.2. Message Propagation

On social networks, messages such as tweets and microblogs propagate along edges from vertex to vertex. When a message m passes a vertex v (representing a client or a person), the properties of m can be influenced by the properties of v. Measuring the changes to a message can therefore help us learn the graph's features, and can be viewed as taking measurements during the compressed sensing process. Suppose that each vertex u has a high-dimensional feature vector v_u. Also, for each message m spread in the network, there is a corresponding vector v_m ∈ R^l that describes the properties of this message; v_m can change during propagation. Since traditional compressed sensing considers problem settings with linear transformations, we assume that v_m changes linearly according to the weight of each edge it passes. More precisely, suppose that for each edge e = (s, t) there is a vector α_e ∈ R^l representing the weights of the different properties of this edge. Then we can assume that

$$v_{m_i} = \frac{v_{s_i} + \alpha_i\, v_{m_{i-1}}}{1 + \alpha_i}, \qquad 1 \le i \le l. \qquad (2)$$

With this assumption, we measure social networks through the paths of message propagation: we measure the difference in a message's property vector after it passes each edge, and from these differences we construct a set of linear equations for solving for the properties of the vertices. What limits this paradigm, however, is that setting the edge weights properly requires strong prior information, which is somewhat restrictive. Therefore, we consider the simpler, more applicable way of gathering network data by sampling.
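The following toy rendering of Equation (2) (as reconstructed above; all names and sizes are illustrative, not from the paper's code) propagates a message vector along a path and applies the elementwise linear update at each edge:

```python
import numpy as np

def propagate(v_m, path_sources, path_alphas):
    """Pass message vector v_m along a path; path_sources holds the feature
    vectors v_s of traversed vertices, path_alphas the edge weight vectors."""
    for v_s, alpha in zip(path_sources, path_alphas):
        v_m = (v_s + alpha * v_m) / (1.0 + alpha)  # elementwise update, Equation (2)
    return v_m

# One hypothetical 3-hop path with l = 4 properties per vector:
rng = np.random.default_rng(1)
v_m = rng.random(4)
sources = [rng.random(4) for _ in range(3)]
alphas = [rng.random(4) for _ in range(3)]
print(propagate(v_m, sources, alphas))
```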

4. Sparse Recovery

In this section, we address a fundamental issue encountered in the sparse recovery of gathered social network data. Social network data can be big in two dimensions: rows (the number of entities on the graph) and columns (the number of features per entity). Since correlation can occur in both dimensions, it is unclear how to find a suitable way to sparsely represent the data.

There are two straightforward paradigms for identifying a sparse representation of data on social graphs. One trivial approach simply ignores the network structure and adopts dictionary learning methods (Lee et al., 2007; Mairal et al., 2009) to find a basis under which the feature vectors of the entities {v_i}_{i=1}^n have sparse coefficients. This approach overlooks the significant role of social ties shown in previous work (Anagnostopoulos et al., 2008), which could lead to sparser representations. The other paradigm follows the preliminary work of Coates et al. (Coates et al., 2007) and Xu et al. (Liwen, 2013), both of which, in different ways, consider sparse decompositions of signal values with respect to some specific functional basis on sensor networks. Although this suggests a direct way to model social correlation, it does not generalize naturally to high-dimensional social vector graphs with feature correlations. We could in principle concatenate the features of all entities {v_i}_{i=1}^n into one large vector so as to capture both feature and topological correlations, but neither the computational burden nor the data size would be tractable. To overcome these shortcomings, we propose a novel construction of sparse basis functions for high-dimensional social graph data by combining the above two paradigms in a reciprocal way.

4.1. Feature Basis

Identifying a sparse feature basis is a learning problem: given κ statistically representative feature samples of network entities x_{i_1}, x_{i_2}, ..., x_{i_κ} ∈ R^d, we seek a set of vectors b_1, b_2, ..., b_{m_b} ∈ R^d under which the samples can be sparsely represented by L ≪ d non-zero coefficients α_{i_1}, α_{i_2}, ..., α_{i_κ}, such that this property generalizes to the entire data universe. Mathematically, this can be formalized as a joint optimization problem:

$$\min_{B,\alpha} \; \frac{1}{\kappa} \sum_{j=1}^{\kappa} \Big( \frac{1}{2}\,\|x_{i_j} - B\alpha_{i_j}\|^2 + \lambda\,\|\alpha_{i_j}\|_1 \Big) \qquad (3)$$

where basis vector b_i sits at the i-th column of matrix B and λ is a regularization parameter controlling sparsity. This problem can be solved efficiently with second-order gradient descent (Lee et al., 2007), online learning methods (Mairal et al., 2009), etc.
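As a concrete stand-in for the solvers cited above, the sketch below learns such a basis with scikit-learn's online dictionary learner; the sample count and feature dimension mirror the Facebook circle used in Section 6.2, while the dictionary size and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 42))   # kappa = 300 feature samples, d = 42

learner = MiniBatchDictionaryLearning(
    n_components=64,   # number of basis vectors b_1, ..., b_mb
    alpha=1.0,         # the sparsity weight lambda in Equation (3)
    random_state=0,
)
codes = learner.fit_transform(X)     # sparse coefficients alpha_{i_j}
B = learner.components_.T            # learnt basis vectors as the columns of B
print(B.shape, float(np.mean(codes != 0)))  # dictionary shape, fraction of non-zeros
```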
4.2. Diffusion Wavelets

Wavelet transforms are a staple of modern compression and signal processing methods due to their ability to represent piecewise-smooth signals efficiently (signals which are smooth everywhere except for a few discontinuities). In general, wavelet transforms produce a multi-scale function decomposition defined on a regularly sampled interval. Coifman and Maggioni (Coifman & Maggioni, 2006) introduced diffusion wavelets to extend this machinery to graphs. They start from a semi-group of diffusion operators {T^t} associated with a diffusion process to induce a multi-resolution analysis, interpreting the powers of T as dilation operators acting on functions and constructing precise downsampling operators to efficiently represent the multi-scale structure. This yields a construction of multi-scale scaling functions and wavelets in a very general setting.

Our goal for the diffusion wavelet transform is to compute a collection {B_i}_{i=1}^n of orthonormal wavelet basis vectors. A function y on the graph can then be written as y = Σ_{i=1}^n β_i B_i, where β_i is the i-th wavelet coefficient (Coates et al., 2007).
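The toy example below (a random walk operator on a path graph; purely illustrative, not the paper's construction) shows the property the construction exploits: the dyadic powers T, T², T⁴, ... of a diffusion operator have rapidly decreasing numerical rank, so each coarser scale can be represented with fewer functions:

```python
import numpy as np

n = 64
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path-graph adjacency
d = A.sum(axis=1)
T = A / np.sqrt(np.outer(d, d))     # normalized diffusion operator, cf. Equation (4)

Tj = T.copy()
for j in range(6):
    rank = int(np.sum(np.linalg.svd(Tj, compute_uv=False) > 1e-6))
    print(f"numerical rank of T^(2^{j}) = {rank}")
    Tj = Tj @ Tj                    # dyadic squaring gives the next, coarser scale
```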

Computing the basis requires a sparse QR decomposition: given a sparse n × n matrix A and a precision parameter ε as input, SpQR returns an n × n orthogonal matrix Q and an upper triangular matrix R such that A ≈_ε QR. To compute the orthonormal bases of scaling functions Φ_j and wavelets Ψ_j, the algorithm works as follows:

Algorithm 1 Diffusion Wavelet Construction (Coifman & Maggioni, 2006)
1: for j = 0 to N − 1 do
2:   [Φ_{j+1}]_{Φ_j}, [T^{2^j}]_{Φ_j}^{Φ_{j+1}} ← SpQR([T^{2^j}]_{Φ_j}^{Φ_j}, ε)
3:   T_{j+1} := [T^{2^{j+1}}]_{Φ_{j+1}}^{Φ_{j+1}} ← [Φ_{j+1}]_{Φ_j} [T^{2^j}]_{Φ_j}^{Φ_j} [Φ_{j+1}]_{Φ_j}^*
4:   [Ψ_j]_{Φ_j} ← SpQR(I_{⟨Φ_j⟩} − [Φ_{j+1}]_{Φ_j}^* [Φ_{j+1}]_{Φ_j}, ε)
5: end for

where [B_1]_{B_2} represents the set of vectors B_1 expressed in a basis B_2, and [L]_{B_1}^{B_2} denotes the matrix representing the linear operator L with respect to basis B_1 in the domain and B_2 in the range.

The key point in this process is choosing the initial diffusion operator T. T must be a matrix such that T_ij > 0 if and only if (i, j) ∈ E, and the value of T_ij should represent the correlation of vertices i and j. Hence, we choose T to be the Laplacian-type operator of the graph, i.e.

$$L(i,j) = \begin{cases} \dfrac{1}{\sqrt{d_i d_j}}, & \text{if } (i,j) \in E \\ 0, & \text{if } (i,j) \notin E \end{cases} \qquad (4)$$

4.3. Graph Tensor Basis

We propose to unify the feature basis and the diffusion wavelets in a hierarchical fashion. First, the feature vectors associated with each node, {x_i}_{i=1}^n ⊂ R^d, are decomposed sparsely into coefficients {α_i}_{i=1}^n under basis B:

x_i = B α_i    (5)

Let X, A ∈ R^{d×n}, whose columns are the vectors x_i and α_i respectively. Each row of A, denoted A_{i,:}, is then a value function on the graph, and we proceed by decomposing it over the diffusion wavelets W:

A_{i,:} = W u_i    (6)

In matrix notation, the entire pipeline can be written as

X = B A = B U W    (7)

where the i-th row of the coefficient matrix U is u_i. The focus of our sparse recovery algorithm then turns to minimizing the l1 norms ||u_1||_1, ||u_2||_1, ..., ||u_d||_1 subject to Equation (7):

minimize_U  Σ_{i=1}^d ||u_i||_1    subject to  Y = M(BUW)    (8)

The above optimization problem can be rewritten in the familiar form of compressed sensing if we concatenate the columns of X and U into long vectors X⃗ and U⃗, which are connected through the Kronecker product B ⊗ W:

minimize_U  Σ_{i=1}^d ||u_i||_1    subject to  Y = M (B ⊗ W) U⃗    (9)

In other words, the feature basis and the diffusion wavelets are combined through a tensor product to produce a new basis for hierarchical sparse decomposition.
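The snippet below verifies the vectorization identity behind Equation (9) numerically. Note one assumption about conventions: with numpy's column-stacking vectorization the factor is kron(W.T, B), whereas the paper abbreviates the combined basis as B ⊗ W:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 5
B = rng.standard_normal((d, d))   # feature basis
W = rng.standard_normal((n, n))   # diffusion wavelet basis
U = rng.standard_normal((d, n))   # hierarchical sparse coefficients

vec = lambda M: M.reshape(-1, order="F")   # column-stacking vectorization
lhs = vec(B @ U @ W)                       # flatten X = B U W, Equation (7)
rhs = np.kron(W.T, B) @ vec(U)             # Kronecker form, Equation (9)
print(np.allclose(lhs, rhs))               # True
```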

5. Efficient Implementation

For networks with a large number of nodes, the optimization problem (9) can be daunting due to the high dimensionality of the tensor basis. To clear away these obstacles to the applicability of our recovery algorithm, we introduce in this section two approximations: Tensor Matching Pursuit and patch-based sparse recovery.

5.1. Tensor Matching Pursuit

Although compressed sensing is often synonymous with l1-based optimization, many applications require efficient storage and fast computation. This is especially true for the tensor version of our joint basis optimization problem (9), which can consume up to n²d² space for d-dimensional vector graphs of n nodes. We show that these burdens are not fundamental obstacles to our sparse recovery paradigm by introducing a new greedy algorithm that tackles the optimization approximately. Our technique belongs to the large family of greedy pursuit methods used in compressed sensing and generalizes the classic Matching Pursuit algorithm.

Like other greedy pursuit algorithms, our Tensor Matching Pursuit (TMP) has two fundamental steps: element selection and coefficient update. The approximation is incremental: at each iteration, one column of the basis is selected and its associated coefficient is updated so that the residual of the constraints decreases. To derive Tensor Matching Pursuit, we place a weak constraint on the measurements. Let M_1, M_2 be two linear operators; the following is a slightly restricted version of our optimization problem (9):

minimize_U  Σ_{i=1}^d ||u_i||_1    subject to  Y = M_1 B U W M_2    (10)

This restriction requires measuring the same feature dimensions simultaneously and is reasonable in practice. The TMP method is elaborated in Algorithm 2.

Algorithm 2 Tensor Matching Pursuit
input Y, B, W
1: Set R^[0] = Y, X̂^[0] = 0
2: for round t = 1 until the stopping criterion is met do
3:   Calculate the projections R_B = B† R and R_W = W† R, where B† and W† are the Moore-Penrose pseudoinverses of B and W.
4:   Compute the correlation matrices C_B = R_B [w_1/||w_1||_2, ..., w_p/||w_p||_2] and C_W = R_W [b_1/||b_1||_2, ..., b_q/||b_q||_2], where b_i and w_j are the column vectors of B and W.
5:   Select the entry e = (i, j) from either C_B or C_W with the highest absolute correlation value c(e).
6:   if e is chosen from C_B then
7:     X_ij = X_ij + η_t c(e)
8:   else
9:     X_ji = X_ji + η_t c(e)
10:  end if
11:  Update R = Y − AXB.
12: end for
output X, R
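For reference, here is a compact rendering of the classic Matching Pursuit (Mallat & Zhang, 1993) that TMP generalizes. It keeps the two fundamental steps, element selection and coefficient update, but omits TMP's tensor-specific selection over C_B and C_W; it is a sketch, not the paper's implementation:

```python
import numpy as np

def matching_pursuit(y, D, n_iters=100):
    """y: measurement vector; D: dictionary whose columns have unit l2 norm."""
    coef = np.zeros(D.shape[1])
    residual = y.astype(float).copy()
    for _ in range(n_iters):
        corr = D.T @ residual             # correlate every atom with the residual
        j = int(np.argmax(np.abs(corr)))  # element selection
        coef[j] += corr[j]                # coefficient update
        residual -= corr[j] * D[:, j]     # peel the selected component off
    return coef
```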

5.2. Patch-Based Sparse Recovery

The notion of patch-based sparse recovery originates from compressed sensing of natural images with patch-based representations (Yang et al., 2008). It is naturally applicable here once we observe that social interactions are likely to be local. Technically, we divide the nodes of the graph into groups G_1, G_2, ..., G_q. Let W_G be the wavelet basis restricted to group G (with rows not involved in G removed) and let M_G be the corresponding measurements. Patch-based sparse recovery can then be formulated as:

minimize_U  Σ_{i=1}^d ||u_i||_1    subject to  Y_G = M_G (B ⊗ W_G) U⃗    (11)

for each G ∈ {G_i}_{i=1}^q. The choice of grouping is of primary concern in applications. We consider two basic candidates here:

Mini-batch. Divide the nodes {v_i}_{i=1}^n into K mini-batches {v_{(n/K)j+1}, ..., v_{(n/K)(j+1)}}, j = 0, ..., K − 1.

k-hop. Patch groups consist of the k-hop neighborhoods of vertices: G_i is the set of vertices within distance k of v_i.

6. Experiment

The proposed compressed sensing algorithm and its fast implementations are extensively tested on three datasets, two of which are real-world social network data.

6.1. Synthetic Data

Synthetic data is generated from classical network models that capture many characteristic aspects of practical social networks, for example a constant clustering coefficient, the small-world effect, etc. (Newman, 2009). It gives a simple demonstration of how our compressed sensing approach works on social networks. Furthermore, since real-world data is noisy, the gap between the algorithm's performance on synthetic and real social network data can, in a way, measure its robustness in the presence of noise. In particular, we utilize preferential attachment and the small world model (Newman, 2009) to synthesize graphs with topology akin to that of social networks. To generate a synthetic graph G = (V, E) of n vertices, our algorithm works as follows:

Algorithm 3 Graph Synthesis
1: V ← {v_1}
2: i ← 1
3: while i < N do
4:   i ← i + 1
5:   V = V ∪ {v_i}
6:   Choose v_j ∈ {v_1, v_2, ..., v_{i−1}} according to the Preferential Attachment rule
7:   E = E ∪ {(v_i, v_j)}
8:   Choose a long-range link (v_s, v_t) according to Kleinberg's model, where v_s, v_t ∈ V and (v_s, v_t) ∉ E
9:   E = E ∪ {(v_s, v_t)}
10: end while
11: Sample a random basis B.
12: for i = 1 to n do
13:   Randomly sample a k-sparse feature vector under basis B.
14: end for
15: Use the similarity of feature vectors to define weights for the Markov chain built from G = (V, E).
16: Simulate the Markov chain with Gibbs sampling.

Here, the Preferential Attachment rule selects vertex v_j with probability proportional to its in-degree, i.e.

$$\Pr[v_j \text{ is chosen}] = \frac{\text{in-degree}(v_j)}{\sum_{k=1}^{i-1} \text{in-degree}(v_k)} \qquad (12)$$

Kleinberg's model (Kleinberg, 2000) chooses a long-range link (v_s, v_t) with probability proportional to d(v_s, v_t)^{−α}, where α is a constant. Kleinberg showed that for α = 2, a grid network can be routed in O(log² N) steps in expectation (Kleinberg, 2000); in our synthetic graph model we also choose α = 2. Experimental results show that this synthetic model produces graphs with small diameter, large clustering coefficient and a power-law degree distribution similar to real social network graphs. Each node of the synthetic graph is then assigned a randomly generated k-sparse feature vector under a certain basis, and a Markov network corresponding to the synthetic graph is generated to incorporate correlations into neighboring nodes.

Figure 1. Synthetic social graphs of (a) 100 nodes and (b) 10,000 nodes, which follow preferential attachment and a power-law degree distribution.
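A compact sketch of the topology-generating steps of Algorithm 3 follows. It is a simplification, not the paper's generator: undirected degree stands in for in-degree, graph distance is approximated by vertex-index distance, and the feature and Gibbs-sampling steps (lines 11 to 16) are omitted:

```python
import random

def synthesize_topology(n, seed=0):
    """Return the edge list of a preferential-attachment graph with
    Kleinberg-style (alpha = 2) long-range links."""
    rnd = random.Random(seed)
    edges = []
    degree = [1] + [0] * (n - 1)          # seed vertex starts with unit weight
    for i in range(1, n):
        # Preferential Attachment, Equation (12): pick v_j proportionally to degree.
        j = rnd.choices(range(i), weights=degree[:i])[0]
        edges.append((i, j))
        degree[i] += 1
        degree[j] += 1
        # Long-range link with probability proportional to distance^(-2);
        # vertex-index distance approximates graph distance for brevity.
        s = rnd.randrange(i + 1)
        others = [t for t in range(i + 1) if t != s]
        t = rnd.choices(others, weights=[abs(s - u) ** -2.0 for u in others])[0]
        edges.append((s, t))
        degree[s] += 1
        degree[t] += 1
    return edges

print(len(synthesize_topology(100)))      # 2 edges per new vertex: 198
```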

6.2. SNAP Datasets

The Stanford Large Network Dataset Collection (the SNAP library) (sna, 2009) provides open access to popular network data with anonymized features. The subsets used in our experiment involve social circles from Facebook and Twitter. The Facebook dataset contains 10 circles with 4,039 nodes and 88,234 edges in total. For each node, binary features are collected, including education, school, year, location, work, etc.; note that these features are binary 0/1 vectors. The Twitter dataset follows a similar setting but contains a much larger number of nodes and edges: 81,306 and 1,768,149 respectively.

Figure 2. Sample social circles from the Facebook (left) and Twitter (right) datasets in the SNAP collection.

Table 1. Sample features for the Facebook and Twitter datasets in the SNAP collection.

    Facebook              Twitter
    0    birthday         19   #CES
    4    education        24   #Dell
    10   education        28   #Facebook
    34   first name       41   #NBA
    44   languages        241  @DIY
    46   last name        355  @Microsoft
    ...                   ...

We choose circle 3980 (59 nodes, 42 feature dimensions) from the Facebook dataset to validate the algorithm's performance. Around 300 samples are first drawn uniformly at random from these data to learn a dictionary (of size L) of basis feature vectors for sparse representation of the high-dimensional data. Given the specific network topology, the wavelet diffusion process is simulated at 2 scales with the graph Laplacian operator, which is computed from the social graph with edge weights set to the cosine similarity (Newman, 2009) of adjacent nodes (a small sketch of this weighting step appears at the end of this subsection). To emulate a compressed sensing setting, m × #nodes randomly selected feature entries on the social graphs are observed and then recovered with the OMP algorithm under the sparse tensor basis. The experiments are performed in MATLAB on an Intel 4-core i5 2.4 GHz machine, using the software packages SPAMS (spa, 2012) and Diffusion Wavelets (mau, 2009).

By varying the number of items L in the dictionary as well as the number of measurements m × #nodes, the relative reconstruction error under the l2 norm is plotted in Figures 3 and 4.

Figure 3. Relative recovery error on social circle 3980 in the Facebook dataset when m × #nodes entries are observed and a dictionary of size L is prepared.

Figure 4. Relative recovery error on social circle 116808228 in the Twitter dataset when m × #nodes entries are observed and a dictionary of size L is prepared.

We observe that, in spite of the stochastic reconstruction error due to measurement uncertainty, a larger over-complete dictionary yields better outcomes. We also find that the relative recovery error is usually large, and conjecture that this is because binary and categorical features do not admit a natural sparse representation. Even with this drawback, the graph tensor basis is more stable and outperforms the baseline methods in some parameter settings.
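The edge-weighting step referenced above is a simple computation; here is an illustrative sketch (the function name is hypothetical, and non-zero feature vectors are assumed):

```python
import numpy as np

def cosine_weights(features, edges):
    """features: (n, d) array of node feature vectors; edges: iterable of
    (i, j) pairs. Returns the cosine-similarity weight of each edge."""
    norms = np.linalg.norm(features, axis=1)
    return {(i, j): float(features[i] @ features[j] / (norms[i] * norms[j]))
            for (i, j) in edges}

# e.g. cosine_weights(node_features, graph_edges), before forming the
# graph operator of Equation (4) from the weighted adjacency structure.
```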

6.3. Weibo Data

Most open-access datasets place restrictions on the use of social data due to privacy concerns, which makes it inconvenient to find a testbed with highly correlated and abundant social features for our algorithm. We therefore resort to the Internet and use collections of data crawled from Weibo.com, which contain not only detailed user profiles but also complete microblog texts. The Weibo dataset contains 200 million users and 200 GB of microblogs. A connected subgraph of 965 nodes and 3 million microblogs is selected, and a 2000-dimensional feature vector is established for each node simply by counting the word distribution of its microblog posts. This representation is raw, and hence inevitably leads to stronger correlations among features as well as among neighboring nodes.

Figure 5. The social subgraph taken from the Weibo data, with 965 users and 3 million microblogs.

We evaluate our algorithm's performance following the same settings as for the Facebook and Twitter datasets. However, we find that on this dataset traditional l1 minimization and standard greedy algorithms require prohibitive time and space to solve the tensor compressed sensing problem (9). In contrast, our proposed Tensor Matching Pursuit algorithm runs much faster and achieves comparable accuracy. As shown in Figure 6, our algorithm significantly outperforms the baselines for small numbers of measurements.

Figure 6. Relative recovery error on the Weibo dataset when m × #nodes entries are observed and a dictionary of size L is prepared.

Figure 7. Tensor Matching Pursuit: change of the residual and of the number of non-zero items over iterations.

7. Conclusion and Future Works

In this paper, we focus on compressed sensing of correlated social network data. Based on the assumption that the network has good sparsity, we propose a novel algorithm for both the data gathering process and the recovery process, based on hierarchical sparse modeling and a tensor representation. Efficient implementations, namely Tensor Matching Pursuit and patch-based optimization, are also presented to allow fast solving of the tensor compressed sensing problem. To show the robustness and effectiveness of our approach, we test the algorithms on several datasets. The results show that our model in a way captures the sparsity of social networks better and is therefore more desirable in practice.

Simple observation shows that social networks share a non-trivial sparse structure, and the graph tensor sparse representation is our preliminary attempt to identify that structure. Future work, as we see it, is three-fold. Technically, we may explore more sophisticated numerical methods to deal with real big data from large social networks; algorithmically, we can devise more diverse ways of gathering the data, including path measurements and message propagation measurements; theoretically, we could prove lower bounds on the number of samples needed by our sparse recovery algorithm, as well as guarantees for the pursuit algorithm.

References

Matlab code for diffusion wavelets. http://www.math.duke.edu/~mauro/code.html, 2009.

SNAP: Stanford network analysis platform. http://snap.stanford.edu, 2009.

Sparse modeling software (SPAMS). http://spams-devel.gforge.inria.fr/, 2012.

Anagnostopoulos, Aris, Kumar, Ravi, and Mahdian, Mohammad. Influence and correlation in social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 7-15. ACM, 2008.

Baraniuk, Richard, Davenport, Mark, DeVore, Ronald, and Wakin, Michael. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253-263, 2008.

Candès, Emmanuel J and Wakin, Michael B. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21-30, 2008.

Candès, Emmanuel J, Romberg, Justin, and Tao, Terence. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489-509, 2006.

Coates, Mark, Pointurier, Yvan, and Rabbat, Michael. Compressed network monitoring for IP and all-optical networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 241-252. ACM, 2007.

Coifman, Ronald R. and Maggioni, Mauro. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53-94, 2006.

Donoho, David L. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289-1306, 2006.

Easley, David and Kleinberg, Jon. Networks, Crowds, and Markets. Cambridge University Press, 2010.

Friedman, Jerome H and Tukey, John W. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23(9):881-890, 1974.

Kleinberg, Jon. The small-world phenomenon: an algorithmic perspective. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pp. 163-170. ACM, 2000.

Kwak, Haewoon, Lee, Changhyun, Park, Hosung, and Moon, Sue. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, pp. 591-600. ACM, 2010.

Lee, Honglak, Battle, Alexis, Raina, Rajat, and Ng, Andrew Y. Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19:801, 2007.

Liben-Nowell, David and Kleinberg, Jon. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019-1031, 2007.

Liwen, Xu. Efficient data gathering using compressed sparse functions. 2013.

Lu, Yue M and Do, Minh N. A theory for sampling signals from a union of subspaces. IEEE Transactions on Signal Processing, 56(6):2334-2345, 2008.

Luo, Chong, Wu, Feng, Sun, Jun, and Chen, Chang Wen. Compressive data gathering for large-scale wireless sensor networks. In Proceedings of the 15th Annual International Conference on Mobile Computing and Networking, pp. 145-156. ACM, 2009.

Mahadevan, Sridhar and Maggioni, Mauro. Value function approximation with diffusion wavelets and Laplacian eigenfunctions. Advances in Neural Information Processing Systems, 18:843, 2006.

Mairal, Julien, Bach, Francis, Ponce, Jean, and Sapiro, Guillermo. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 689-696. ACM, 2009.

Mallat, Stephane G and Zhang, Zhifeng. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397-3415, 1993.

Newman, Mark. Networks: An Introduction. Oxford University Press, 2009.

Newman, Mark EJ. Models of the small world. Journal of Statistical Physics, 101(3-4):819-841, 2000.

Russell, Matthew A. Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. O'Reilly Media, 2011.

Yang, Jianchao, Wright, John, Huang, Thomas, and Ma, Yi. Image super-resolution as sparse representation of raw image patches. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8. IEEE, 2008.