A Tensor-based Approach for Big Data Representation and Dimensionality Reduction

Liwei Kuang, Fei Hao, Laurence T. Yang, Man Lin, Changqing Luo, and Geyong Min

Abstract: Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. Efficiently representing and processing big data with a unified scheme has been a great challenge. In this paper, a unified tensor model is proposed to represent unstructured, semi-structured and structured data. With a tensor extension operator, various types of data are represented as sub-tensors and then merged into a unified tensor. In order to extract the core tensor, which is small but contains valuable information, an Incremental High Order Singular Value Decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage and approximation accuracy of the proposed method are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can guarantee 93% accuracy in general. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.

Index Terms: Tensor, HOSVD, Dimensionality Reduction, Data Representation

L. Kuang, F. Hao and C. Luo are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. L. T. Yang is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. M. Lin is with the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. G. Min is with the College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QF, United Kingdom.

1 INTRODUCTION

Big data are a collection of datasets consisting of massive unstructured, semi-structured, and structured data. The four main characteristics of big data are volume (amount of data), variety (range of data types and sources), veracity (data quality), and velocity (speed of incoming data). Although many studies have been done on big data processing, very few have addressed the following two key issues: (1) how to represent the various types of data with a simple model; (2) how to extract core data sets that are smaller but still contain valuable information, especially for streaming data. The purpose of this paper is to explore these two issues, which are closely related to the variety and veracity characteristics of big data.

Logic and Ontology [1], two knowledge representation methodologies, have been investigated widely. Composed of syntax, semantics and proof theory, Logic is used for making statements about the world. Although Logic is concise, unambiguous and expressive, it works only with statements that are true or false, and is therefore hard to use for reasoning about unstructured data. Ontology is the set of concepts and relationships that help people communicate and share knowledge. It is definitive and exhaustive, but it also causes incompatibility among different application domains, and thus is not suitable for representing and integrating heterogeneous big data.
The study of data dimensionality reduction has been widely reported in the literature. Previous approaches include Principal Component Analysis (PCA) [2], Incremental Singular Value Decomposition (SVD) [3], and Dynamic Tensor Analysis (DTA) [4]. These methods are suitable for low-dimensional reduction but suffer from some limitations: they are time-consuming when performed on high-dimensional data, and they fail to extract core data sets from streaming big data. This paper presents a unified tensor model for big data representation and an incremental dimensionality reduction method for high-quality core set extraction. Data with different formats are employed to illustrate the representation approach, and equivalence theorems are proven to support the proposed reduction method. The major contributions are summarized as follows.

Unified Data Representation Model: We propose a unified tensor model to integrate and represent unstructured, semi-structured, and structured data. The tensor model has extensible orders to which new orders can be dynamically appended through the proposed tensor extension operator.

Core Tensor Equivalence Theorem: To tackle the recalculation and order inconsistency problems in big data processing with the tensor model, we prove a core tensor equivalence theorem which
can serve as the theoretical foundation for designing incremental decomposition algorithms.

Recursive Incremental HOSVD Method: We present a recursive Incremental High Order Singular Value Decomposition method for streaming data dimensionality reduction. Detailed analyses in terms of time complexity, memory usage and approximation accuracy are also provided.

The remainder of this paper is organized as follows. Section 2 recalls the preliminaries of tensor decomposition. Section 3 presents a framework for big data representation and processing. A unified tensor model for big data representation is proposed in Section 4. Section 5 presents a novel incremental dimensionality reduction method. A case study of intelligent transportation is investigated in Section 6. After reviewing the related work in Section 7, we conclude the paper in Section 8.

2 PRELIMINARIES

This section reviews the preliminaries of singular value decomposition [5] and tensor decomposition [6]. The core tensor and truncated bases described below can be employed to make big data smaller.

Definition 1: Singular Value Decomposition (SVD). Let M ∈ R^{m×n} denote a matrix; the factorization

M = U Σ V^T    (1)

is called the SVD of M. Matrices U and V refer to the left singular vector space and the right singular vector space of M respectively. Both U and V are unitary orthogonal matrices. Matrix Σ = diag(σ_1, σ_2, ..., σ_k, ..., σ_l), l = min{m, n}, is a diagonal matrix that contains the singular values of M. In particular,

M_k = U_k Σ_k V_k^T    (2)

is called the rank-k truncated SVD of M, where U_k = [u_1, ..., u_k], V_k = [v_1, ..., v_k], Σ_k = diag(σ_1, ..., σ_k), k < l. The truncated SVD of M is much smaller to store and faster to compute. Among all rank-k matrices, M_k is the unique minimizer of ||M - M_k||_F.

Definition 2: Tensor Unfolding. Given a P-order tensor T ∈ R^{I_1×I_2×...×I_P}, the mode-p tensor unfolding [7] T_(p) ∈ R^{I_p×(I_{p+1} I_{p+2} ... I_P I_1 I_2 ... I_{p-1})} contains the element t_{i_1 i_2 ... i_p i_{p+1} ... i_P} at the position with row number i_p and column number equal to

(i_{p+1} - 1) I_{p+2} I_{p+3} ... I_P I_1 I_2 ... I_{p-1} + (i_{p+2} - 1) I_{p+3} ... I_P I_1 I_2 ... I_{p-1} + ... + (i_1 - 1) I_2 I_3 ... I_{p-1} + (i_2 - 1) I_3 I_4 ... I_{p-1} + ... + i_{p-1}.

Example 1. Consider a three-order tensor T ∈ R^{2×4×3}; Fig. 1 shows the three unfolded matrices T_(1), T_(2) and T_(3).

Fig. 1. Three-order tensor unfolding; tensor T is unfolded to three matrices.

Definition 3: p-mode product of a tensor by a matrix. Suppose a tensor T ∈ R^{I_1×I_2×...×I_{p-1}×I_p×I_{p+1}×...×I_P} and a matrix U ∈ R^{J_p×I_p}; the p-mode product (T ×_p U) ∈ R^{I_1×I_2×...×I_{p-1}×J_p×I_{p+1}×...×I_P} is defined as

(T ×_p U)_{i_1 i_2 ... i_{p-1} j_p i_{p+1} ... i_P} = Σ_{i_p=1}^{I_p} (a_{i_1 i_2 ... i_{p-1} i_p i_{p+1} ... i_P} u_{j_p i_p}).    (3)

The p-mode product is a key linear operation for dimensionality reduction, and the truncated left singular vector matrix U ∈ R^{J_p×I_p} (J_p < I_p) is used to reduce the dimensionality of order p from I_p to J_p.

Fig. 2. Tensor dimensionality reduction with p-mode product; the dimensionality of the 2nd order is reduced from 8 to 2 by a 2×8 matrix.

Definition 4: Core Tensor and Approximate Tensor. For an initial tensor T, the core tensor S [8] and the approximate tensor T̂ are defined as

S = T ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T,    (4)

and

T̂ = S ×_1 U_1 ×_2 U_2 ... ×_P U_P.    (5)

The core tensor S is viewed as a compressed version of the initial tensor T. By keeping only the leading k unitary orthogonal vectors of each unfolded matrix, the principal characteristics are preserved. Big data applications can simply keep the core tensor S and the truncated bases U_1, U_2, ..., U_P.
When needed, the data can be reconstructed by generating the approximate tensor with Eq. (5). The right singular vector matrices V_1, V_2, ..., V_P and the singular values are folded into the core tensor, which contains the coordinates of the approximate tensor with respect to the left singular vector matrices.
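To make Definitions 2-4 concrete, the following minimal NumPy sketch (ours, not from the paper) implements unfolding, the p-mode product, and a truncated HOSVD. The unfolding below orders columns differently from Definition 2, but it spans the same set of columns, so the left singular vectors, and hence the core tensor, are unaffected.

```python
import numpy as np

def unfold(T, p):
    """Mode-p unfolding (Definition 2, up to column order)."""
    return np.moveaxis(T, p, 0).reshape(T.shape[p], -1)

def fold(M, p, shape):
    """Inverse of unfold for a target tensor shape."""
    rest = [s for i, s in enumerate(shape) if i != p]
    return np.moveaxis(M.reshape([shape[p]] + rest), 0, p)

def mode_product(T, U, p):
    """p-mode product T x_p U (Definition 3, Eq. (3))."""
    shape = list(T.shape)
    shape[p] = U.shape[0]
    return fold(U @ unfold(T, p), p, shape)

def hosvd(T, ranks):
    """Core tensor S (Eq. (4)) and truncated bases U_1..U_P."""
    Us = [np.linalg.svd(unfold(T, p), full_matrices=False)[0][:, :k]
          for p, k in enumerate(ranks)]
    S = T
    for p, U in enumerate(Us):
        S = mode_product(S, U.T, p)
    return S, Us

T = np.random.rand(2, 4, 3)          # the tensor of Example 1
S, Us = hosvd(T, T.shape)            # full ranks: lossless
T_hat = S                            # reconstruction, Eq. (5)
for p, U in enumerate(Us):
    T_hat = mode_product(T_hat, U, p)
print(np.allclose(T, T_hat))         # True
```

With truncated ranks the same loop yields the compressed core data sets described above, at the cost of a reconstruction error.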
In general, the reconstructed data are more useful than the original data, as noise, inconsistency and redundancy are removed.

Fig. 3. Illustration of the core tensor and the approximate tensor. The core tensor S and the truncated unitary orthogonal bases (U_1, U_2, U_3) are called core data sets that can be used to make big data smaller, while the reconstructed approximate tensor is a substitute for the initial tensor.

3 DATA REPRESENTATION AND PROCESSING FRAMEWORK

In this section, a tensor-based data representation and processing framework is proposed. Fig. 4 depicts a three-tier framework in which different modules are enabled in each layer. We elaborate the functions and responsibilities of each module in a bottom-up view.

Fig. 4. Data representation and processing framework; unstructured data (e.g., video, audio), semi-structured data (e.g., XML, HTML, GPS, EHR) and structured data flow through the data collection, data tensorization, data dimensionality reduction, data analysis (mining algorithms, inference methods, data visualization) and data service layers, serving applications such as transportation, finance and healthcare.

1) Data Collection Module. This module is in charge of collecting various types of data from different areas, for example, video clips, XML documents and GPS data. The streaming data incrementally arrive and are temporarily agglomerated together without changing their original format.

2) Data Tensorization Module. Since the collected unstructured, semi-structured and structured data are not uniform, these data need to be represented with a unified tensor model. Sub-tensors with various orders are generated to model the data according to their initial format. Then, all the sub-tensors are integrated into a unified heterogeneous tensor.

3) Data Dimensionality Reduction Module. This module efficiently processes the high-dimensional tensorized data and extracts the core data sets, which are much smaller for storage and computation. The reduction is carried out by the proposed IHOSVD algorithm, which incrementally updates the orthogonal bases of each unfolded matrix.

4) Data Analysis Module. Numerous algorithms, such as clustering algorithms and multi-aspect prediction algorithms, are included in this module. The module helps uncover the potential value behind large-scale heterogeneous data. The data visualization module in this layer helps users easily understand the data values.

5) Data Service Module. The data service module provides services according to the requirements of different applications. For instance, with smart monitoring appliances, proactive health care services can be provided to users based on a thorough understanding of their physical status.

This paper mainly focuses on the data tensorization module and the data dimensionality reduction module.

4 A UNIFIED DATA REPRESENTATION MODEL

This section proposes a tensor-based data representation model and a tensorization approach for transforming heterogeneous data into a unified model. Firstly, an extensible order tensor model and a tensor extension operator are presented. Secondly, we illustrate how to tensorize unstructured, semi-structured and structured data as sub-tensors. Thirdly, the integration of sub-tensors into a unified tensor is studied. Tensor order and tensor dimension, two easily confused concepts, are discussed at the end.

4.1 Extensible Order Tensor

In general, time and space are two basic characteristics of data collected from different areas, while users are the major recipients of data services.
Therefore, a general tensor-based data model is defined as

T ∈ R^{I_t × I_s × I_u × I_1 × ... × I_P}.    (6)

Eq. (6) shows a (P + 3)-order tensor which contains two parts, namely the fixed part R^{I_t × I_s × I_u} and the extensible part R^{I_1 × ... × I_P}. The tensor orders I_t, I_s and I_u denote time, space and user respectively.
In the tensor model, data characteristics are represented as tensor orders. For example, the color space characteristic of unstructured video data can be modeled as an order I_c. For heterogeneous data, various characteristics are represented as tensor orders and attached to the fixed part using the proposed tensor extension operator.

Definition 5: Tensor Extension Operator. Let A ∈ R^{I_t × I_s × I_u × I_1} and B ∈ R^{I_t × I_s × I_u × I_2}; the tensor extension operator ⊕ is given by the following function:

f_⊕ : A ⊕ B → C, C ∈ R^{I_t × I_s × I_u × I_1 × I_2}.    (7)

Operator ⊕ satisfies the associative law; in other words, (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C). By virtue of Eq. (7), heterogeneous data can first be tensorized as low-order sub-tensors and then extended to a high-order unified tensor. The operator merges the identical orders while keeping the diverse orders; elements of an identical order are accumulated together. For instance, suppose sub-tensors T_sub1 and T_sub2 have time orders denoted as I_t^1 and I_t^2, where I_t^1 = {i_1, i_2} and I_t^2 = {i_1, i_3}. After extension, the time order of the new tensor T = T_sub1 ⊕ T_sub2 becomes I_t = {i_1, i_2, i_3}.

4.2 Tensorization Method

Examples of unstructured data include video data and audio data, while semi-structured data comprise XML documents, ontology data, etc. Representative structured data are numbers and character strings stored in relational databases. In this paper, a video clip, an XML document and GPS data are employed to illustrate the tensorization process.

Video data can be represented as a four-order tensor or a three-order tensor. To represent a video clip of MPEG-4 format, 25 frames per second and RGB color space, a four-order tensor T ∈ R^{I_f × I_w × I_h × I_c} is adopted, with I_f, I_w, I_h, I_c indicating frame, width, height and color space. For instance, a 750-frame MPEG-4 RGB video clip can be tensorized as a four-order tensor with I_f = 750 and I_c = 3. In some applications, RGB color is transformed to gray level using the equation Gray = 0.299R + 0.587G + 0.114B, and the representation is replaced by a three-order tensor. Fig. 5 shows the process of transforming a video clip to a four-order tensor.

Fig. 5. Represent a video clip as a four-order tensor (frames I_f, width I_w, height I_h, and color I_c with red, green and blue channels).

Extensible Markup Language (XML) is semi-structured. Fig. 6 shows a simple XML document with seven elements and one attribute. The elements contain tags and contents, both consisting of characters from the Unicode repertoire. An XML document has a hierarchical structure and can be parsed as a tree; Fig. 6(b) is the parsed tree of Fig. 6(a). An XML document can be tensorized as a three-order tensor, where I_er and I_ec indicate the row and column orders of the markup matrix, and I_en represents the content vector order. For example, the XML document in Fig. 6(a) is tensorized as a three-order tensor whose content order has 28 dimensions, 28 being the length of the element Focus. Relationships among element, attribute and text are represented as numbers; in Fig. 6(c), the number 1 is used to indicate the parent-child relationship.

<?xml version='1.0' encoding='UTF-8'?>
<University>
  <Student Category='doctoral'>
    <ID></ID>
    <Name>Liang Chen</Name>
    <Research>
      <Area>Internet of Things</Area>
      <Focus>Architecture;Sensor Ontology</Focus>
    </Research>
  </Student>
</University>

Fig. 6. Represent XML document data as a three-order tensor; (a) gives the initial XML document shown above, (b) is the parsed tree, (c) shows the relationships between elements, attributes and text, and (d) illustrates the three-order tensor.

Relational databases are widely used to manage structured data. In a database table, simple fields with number or character string types can be modeled as a matrix. For complex fields, e.g., BLOBs, new orders are needed for representation. In Fig. 7, structured GPS data and student data are unified as a five-order tensor.

Fig. 7. The upper table (GPS records with student ID, longitude, latitude and time) is modeled as a four-order sub-tensor over I_t, I_y, I_x, I_id; the lower table (student ID and student name) is modeled as a two-order sub-tensor over I_id, I_name; the two sub-tensors are unified as a five-order tensor.
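As a sketch of the video tensorization in this subsection (the clip's true resolution was lost from the text, so the 64x48 size below is a made-up placeholder), the gray-level conversion collapses the color order of the four-order tensor into a three-order tensor:

```python
import numpy as np

# Hypothetical clip: four-order tensor (frame I_f, width I_w, height I_h, color I_c).
clip = np.random.randint(0, 256, size=(750, 64, 48, 3)).astype(np.float64)

# Gray = 0.299 R + 0.587 G + 0.114 B removes the color order I_c.
weights = np.array([0.299, 0.587, 0.114])
gray = np.tensordot(clip, weights, axes=([3], [0]))
print(clip.shape, '->', gray.shape)   # (750, 64, 48, 3) -> (750, 64, 48)
```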
4.3 Unified Tensor Representation Model

Big data are composed of unstructured data d_u, semi-structured data d_semi and structured data d_s. Due to the requirement of processing all types of heterogeneous data, a unified data tensorization operation is performed using the following equation:

f : (d_u, d_semi, d_s) → T_u ⊕ T_semi ⊕ T_s.    (8)

With Eq. (7) and Eq. (8), d_u, d_semi and d_s are transformed into sub-tensors T_u, T_semi and T_s, which are later integrated into a unified tensor. For example, on the basis of the transformed video clip, XML document and structured tables described in Figs. 5-7, the final tensor is obtained as

T ∈ R^{I_t × I_s × I_u × I_w × I_h × I_er × I_ec × I_en × I_id × I_na}.    (9)

In Eq. (9), order I_f is identical to order I_t, orders I_x and I_y are combined into order I_s, and order I_c is unnecessary because the gray level is adopted. Since too many orders may increase the decomposition complexity, fewer orders are preferable at the data representation stage. An element of the ten-order tensor in Eq. (9) is described as an eleven-tuple

e = (TI, SP, U, W, H, ER, EC, EN, ID, NA, V),    (10)

where TI, SP and U refer to the fixed orders time, space and user, W and H denote the orders from video data, ER, EC and EN are XML document characteristics, ID and NA are for GPS data, and V is the value of element e. Tuples generated from a heterogeneous tensor are usually sparse, and only the nonzero elements are essential for storage and computation. The generalized tuple format according to Eq. (6) is defined as

e = (TI, SP, U, i_1, ..., i_P, V).    (11)

Fig. 8 illustrates the extensible order tensor model from another point of view. The fixed part containing TI, SP and U is seen as an overall layer, while the extensible part is deemed an inner layer. The tensor is simplified as a two-layer model in which the inner model is embedded into the three-order (I_t × I_s × I_u) overall model. Using the tensorization method, the heterogeneous data are modeled as sub-tensors that are inserted into the two-layer model to generate the unified tensor.

Fig. 8. Visualization of the two-layer model for data representation; GPS, video and XML document sub-tensors are embedded into the (I_t × I_s × I_u) overall model.

4.4 Tensor Order and Tensor Dimension

As tensor order and tensor dimension are two key concepts for data representation, we give a brief comparison between them. Tensor T ∈ R^{I_1 × I_2 × ... × I_P} has P orders, and order i (1 ≤ i ≤ P) has I_i dimensions. A P-order tensor can be unfolded to P matrices. For the mode-i unfolded matrix T_(i), the number of rows is equal to I_i, while the number of columns is equal to ∏_{1 ≤ j ≤ P, j ≠ i} I_j. In many big data applications, it is impractical to store all dimensions of big data, which contain redundancy, uncertainty, inconsistency and incompleteness; thus it is essential to extract the valuable core data. During the extraction of the core data set, the number of tensor orders remains the same while the dimensionality is significantly reduced.
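A minimal sketch of the sparse tuple storage of Eqs. (10)-(11) in Section 4.3: the field names follow Eq. (10), while the coordinates and value below are made up for illustration.

```python
from typing import NamedTuple

class Element(NamedTuple):
    """Eleven-tuple of Eq. (10): fixed orders, extensible orders, value."""
    TI: int; SP: int; U: int          # fixed part: time, space, user
    W: int; H: int                    # video orders
    ER: int; EC: int; EN: int         # XML orders
    ID: int; NA: int                  # GPS orders
    V: float                          # element value

# Only nonzero elements of the sparse unified tensor are stored.
store = [Element(TI=3, SP=1, U=2, W=10, H=20, ER=0, EC=0, EN=0, ID=0, NA=0, V=0.87)]
print(store[0].V)
```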
5 INCREMENTAL TENSOR DIMENSIONALITY REDUCTION

In this section, a novel method is proposed for dimensionality reduction on streaming data. Firstly, two problems of tensor decomposition are defined. Then two equivalence theorems are proven, and an Incremental High-Order Singular Value Decomposition (IHOSVD) method that can efficiently compute the core data sets on streaming data is presented. Finally, the complexity and accuracy of the proposed method are discussed.

5.1 Problems Definition

Two important problems related to incremental tensor dimensionality reduction are: (1) the recalculation problem; (2) the order inconsistency problem. They are formally defined below.

Problem 1: Tensor Decomposition Recalculation. Let S_1 denote the core tensor obtained from the previous tensor T_1, and let T denote a new tensor. Combining T_1 with T, we obtain T_2 = T_1 ⊕ T. According to Eq. (4), the new core tensor S_2 of tensor T_2 is computed as

S_2 = T_2 ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T.    (12)
Decomposition recalculation occurs in Eq. (12) because the previous decomposition results obtained while computing core tensor S_1 are not reused. Problem 1 can be solved using Algorithm 1 and Algorithm 2, which are designed with the proposed recursive incremental singular value decomposition method.

Problem 2: Tensor Order Inconsistency. Assume T_1, S_2 and T_2 are defined as the previous tensor, the new core tensor and the new combined tensor. To compute S_2 with Eq. (4), the row number of each truncated orthogonal matrix U must be consistent with the dimensionality of the corresponding tensor order. However, one order dimensionality of the combined tensor T_2 is not equal to the row number of the truncated orthogonal matrix U. For instance, let T_1 ∈ R^{2×2×2} be a three-order tensor, whose three unfolded matrices are T_1(1) ∈ R^{2×4}, T_1(2) ∈ R^{2×4} and T_1(3) ∈ R^{2×4}. Given a new tensor χ ∈ R^{2×2×2}, combining it with the previous tensor T_1 along the third order I_3, we obtain T_2 ∈ R^{2×2×4}. The third order dimensionality of T_2 is 4, while the row number of the truncated orthogonal basis computed from matrix T_1(3) is 2. This leads to order inconsistency. In this paper, Theorem 1, Theorem 2 and Algorithm 3 are presented to address Problem 2.

5.2 Basis and Core Tensor Equivalence Theorems

The left singular vector matrix U plays a key role in dimensionality reduction and data reconstruction. Similarly, the truncated rank-k unitary orthogonal bases U_1, U_2, ..., U_P of the unfolded matrices construct the most basic coordinate axes of a P-order tensor. For heterogeneous big data dimensionality reduction, the major difficulty lies in computing the bases over variable dimensions. Our approach extends dimensions to a fixed length and finds equivalent bases. Two theorems are presented and proven to support this approach.

Theorem 1: Basis Equivalence of SVD. Let M_1 be an m × n_1 matrix, and let M_2 be an m × n_2 matrix whose left n_1 columns contain matrix M_1 and whose right n_2 - n_1 columns are zeros. Namely, M_2 = [M_1 0], M_1 ∈ R^{m×n_1}, M_2 ∈ R^{m×n_2}, n_1 < n_2. If the singular value decompositions of matrix M_1 and matrix M_2 are expressed as

M_1 = U_1 Σ_1 V_1^T, M_2 = U_2 Σ_2 V_2^T,    (13)

then the unitary orthogonal basis U_1 is equivalent to U_2.

Proof. From Eq. (13), we obtain

M_2 M_2^T = [M_1 0] [M_1 0]^T = M_1 M_1^T.    (14)

Considering

M_2 M_2^T = U_2 Σ_2 V_2^T V_2 Σ_2^T U_2^T = U_2 (Σ_2 Σ_2^T) U_2^T,    (15)

and

M_1 M_1^T = U_1 Σ_1 V_1^T V_1 Σ_1^T U_1^T = U_1 (Σ_1 Σ_1^T) U_1^T,    (16)

we obtain

U_1 (Σ_1 Σ_1^T) U_1^T = U_2 (Σ_2 Σ_2^T) U_2^T.    (17)

Note that both sides of Eq. (17) are spectral decompositions of two equal symmetric matrices. Additionally, the diagonal matrices Σ_1 Σ_1^T and Σ_2 Σ_2^T consist of the eigenvalues of these equal matrices. According to the uniqueness of eigenvalues, Σ_1 Σ_1^T and Σ_2 Σ_2^T are equal. It can be concluded that U_1 is equivalent to U_2. The equivalence implies that U_1 can be calculated by multiplying U_2 with a series of elementary matrices [9].

Based on Theorem 1, the following two corollaries can be derived.

Corollary 1: Let M_1 = [v_1, v_2, ..., v_n] and M_2 = [v_1, v_2, ..., 0, ..., 0, ..., v_n], where each v_i is a column vector; then the two matrices have equivalent left singular vector bases.

Corollary 2: Suppose M_2 = [M_1; 0], i.e., M_2 is M_1 with rows of zeros appended at the bottom; then matrices M_1 and M_2 have equivalent left singular vector bases. With Corollary 2, the orthogonal basis U_1 can be obtained by trimming the bottom zeros of the orthogonal basis U_2.
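A quick numerical check of Theorem 1 (our sketch, random data, assuming distinct singular values): padding zero columns onto M_1 leaves the left singular vectors unchanged up to column signs.

```python
import numpy as np

m, n1, pad = 5, 3, 4
M1 = np.random.rand(m, n1)
M2 = np.hstack([M1, np.zeros((m, pad))])          # M2 = [M1 0]

U1 = np.linalg.svd(M1, full_matrices=False)[0]
U2 = np.linalg.svd(M2, full_matrices=False)[0][:, :n1]

# Corresponding columns agree up to sign: |<u1_i, u2_i>| = 1.
print(np.allclose(np.abs(np.sum(U1 * U2, axis=0)), 1.0))  # True
```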
Theorem 1 and Corollaries 1 and 2 are employed to prove Theorem 2, defined as follows. Before the proof, we introduce a special matrix which will be used in Theorem 2.

Definition 6: Extension Matrix. An extension matrix is defined as M = [I; 0] ∈ R^{J_p × I_p}, J_p > I_p, i.e., an I_p × I_p identity matrix stacked on top of J_p - I_p rows of zeros. Multiplying a P-order tensor T ∈ R^{I_1 × I_2 × ... × I_p × ... × I_P} by the extension matrix M along order p extends the dimensionality of this order from I_p to J_p.

Theorem 2: Core Tensor Equivalence of HOSVD. Let T and G be P-order tensors, where T ∈ R^{I_1 × I_2 × ... × I_P} and G ∈ R^{I_1 × I_2 × ... × (l I_p) × ... × I_P} is obtained from T by padding zeros along order p, with l a positive integer. Define the trimming counterpart of the extension matrix as M = [I_{I_p} 0] ∈ R^{I_p × (l I_p)}, so that tensors T and G satisfy

T = G ×_p M = G ×_p [I_{I_p} 0].

Then the core tensors of T and G are equivalent in the sense that S_T = S_G ×_p M.

Proof. Unfold tensors T and G into P matrices T_(1), T_(2), ..., T_(P) and G_(1), G_(2), ..., G_(P). According to Theorem 1 and Corollaries 1 and 2, the corresponding unfolded matrices of tensors T and G have equivalent left singular vector bases. Besides, the p-mode product of a tensor by matrices A and B possesses the following properties:

T ×_i A ×_j B = T ×_j B ×_i A (i ≠ j),    (18)

and

T ×_i A ×_i B = T ×_i (BA).    (19)
Employing Eq. (4), the core tensors S_T and S_G are calculated with the following equations:

S_T = T ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T,    (20)

and

S_G = G ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T.    (21)

With Eqs. (18)-(21), we obtain

S_T = T ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T
    = (G ×_p M) ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T
    = G ×_1 U_1^T ×_2 U_2^T ... ×_P U_P^T ×_p M
    = S_G ×_p M.    (22)

Theorem 2 reveals that extending a tensor by padding zero elements does not transform the core tensor. After the unified representation of big data, the order numbers of the incremental tensor and the initial tensor are equal, but the dimensionalities are different. Theorem 2 can be used to solve this problem by resizing the dimensionality.

5.3 Incremental High Order Singular Value Decomposition

We propose an IHOSVD method for incremental dimensionality reduction on streaming data. The IHOSVD method consists of three algorithms that are used for recursive matrix singular value decomposition and incremental tensor decomposition. The three algorithms are described in detail below.

Algorithm 1 is a recursive algorithm with the recursive function given in Eq. (23). During execution, function f calls itself (Step 4) over and over again to decompose matrices M_i and C_i. Each successive call reduces the size of the matrix and moves closer to a solution; when matrix M_1 is finally reached, the recursion stops and the function exits.

f(M_i, C_i) = { svd(M_1),                         i = 1
              { mix(f(M_{i-1}, C_{i-1}), C_i),    i > 1    (23)

Algorithm 1 Recursive matrix singular value decomposition, (U, Σ, V) = R_MSvd(M_i, C_i).
Input: Initial matrix M_i. Incremental matrix C_i.
Output: Decomposition results U, Σ, V of matrix [M_i C_i].
1: if (i == 1) then
2:   [U, Σ, V] = svd(M_1).
3: else
4:   [U_j, Σ_j, V_j] = R_MSvd(M_{i-1}, C_{i-1}).
5:   [U, Σ, V] = mix(M_{i-1}, C_{i-1}, U_j, Σ_j, V_j).
6: end if
7: return U, Σ, V.

Algorithm 1 calls function mix (Step 5) to merge the column vectors of the incremental matrix with the decomposed components of the initial matrix. The additional vectors are projected onto the orthogonal bases, and the coordinates are combined with the singular values. The detailed procedure of function mix is described in Algorithm 2. For most tensor unfoldings, the number of rows is less than the number of columns; for such matrices, Algorithm 1 can efficiently compute the singular values and singular vectors by splitting the columns for recursive decomposition.

Fig. 9. (a) Incrementally incoming column vectors C are projected on the unitary orthogonal bases U_j, yielding coordinates L and the orthogonal directions J with coordinates K; (b) the middle quasi-diagonal matrix is diagonalized and the previous singular vector matrices are updated.

Algorithm 2 Merge an incremental matrix with previous decomposition results, (U, Σ, V) = mix(M_{i-1}, C_{i-1}, U_j, Σ_j, V_j).
Input: Initial matrix M_{i-1} and incremental matrix C_{i-1}. Decomposition results U_j, Σ_j, V_j of matrix M_{i-1}.
Output: New decomposition results U, Σ, V.
1: Project C_{i-1} onto the orthogonal space spanned by U_j: L = U_j^T C_{i-1}.
2: Compute H, which is orthogonal to U_j: H = C_{i-1} - U_j L.
3: Obtain the unitary orthogonal basis J from matrix H.
4: Compute the coordinates of matrix H: K = J^T H.
5: Execute SVD on the quasi-diagonal middle matrix of Eq. (25): [Ū, Σ̄, V̄] = svd([Σ_j L; 0 K]).
6: Obtain the new decomposition results per Eq. (26): ([U_j J] Ū) → U, Σ̄ → Σ, ([V_j 0; 0 I] V̄) → V.
7: return U, Σ, V.

Algorithm 2 applies the SVD-updating technique [3] for incremental matrix factorization. The additional columns in matrix C_{i-1} are projected onto the unitary orthogonal bases of the previous matrix M_{i-1} (Step 1).
Some column vectors are linear combinations of the orthogonal unitary basis U_j; the others have components orthogonal to the space spanned by U_j. As illustrated in Fig. 9, these two types of vectors are separated
to obtain the bases U_j, J and the coordinates L, K. These operations are implemented as Steps 2-4. The column space of the singular vector matrix U is spanned by the direct sum of the above two unitary orthogonal bases:

CS(U) = span([U_j J]).    (24)

Combining the coordinates with the previous singular values, we obtain a quasi-diagonal sparse matrix which is easy to decompose. The new equation consisting of the above orthogonal bases and coordinates is

[M_{i-1}, C_{i-1}] = [U_j, J] [Σ_j L; 0 K] [V_j^T 0; 0 I].    (25)

Let Ū and V̄ denote the left and right unitary orthogonal bases of the quasi-diagonal matrix in Eq. (25); the updated singular vector matrices are

U = [U_j J] Ū,  V = [V_j 0; 0 I] V̄.    (26)

Eq. (4) suggests that only the left singular vector matrix U is essential for tensor decomposition; therefore, the computation of matrix V can be omitted in Step 6 of Algorithm 2.

Employing the above two algorithms, we propose Algorithm 3 for incrementally computing the core tensor. In this algorithm, the extension matrix is used to ensure order consistency (Step 1). The unitary orthogonal bases U_(1), ..., U_(P) are updated in Steps 2-4, and the new core tensor S is obtained in Step 6. For demonstration purposes, Fig. 10 shows a simple example with a three-order tensor.

Algorithm 3 Incremental tensor singular value decomposition, (S, [U, Σ, V]_new) = IT_Svd(χ, T, [U, Σ, V]_initial).
Input: New tensor χ. Previous tensor T ∈ R^{I_1 × I_2 × ... × I_P}. SVD results [U, Σ, V]_initial of the previous unfolded matrices.
Output: New truncated SVD results [U, Σ, V]_new. New core tensor S.
1: Extend tensor χ and tensor T to identical dimensionalities.
2: Unfold the new tensor χ to matrices χ_(1), ..., χ_(P).
3: Call algorithm R_MSvd to update the above unfolded matrices.
4: Truncate the new orthogonal bases.
5: Combine the new tensor χ with the initial tensor T.
6: Obtain the new core tensor S with the p-mode products.
7: return S and [U, Σ, V]_new.

Fig. 10. Example of incremental tensor decomposition on a three-order tensor: after extension, unfolding and HOSVD of the increment, the truncated orthogonal bases U_1, U_2, U_3 of the new tensor are updated incrementally and the new core tensor S is computed.
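The following sketch condenses Algorithm 2 into one NumPy function (the name mix follows the paper; the rank-trimming tolerance is our addition). New columns C are folded into an existing factorization, and only U and the singular values are maintained, since Eq. (4) needs just U.

```python
import numpy as np

def mix(U, s, C, tol=1e-10):
    """One incremental SVD update in the spirit of Algorithm 2 (a sketch)."""
    L = U.T @ C                       # coordinates of C in span(U)      (Step 1)
    H = C - U @ L                     # component orthogonal to span(U)  (Step 2)
    J, K = np.linalg.qr(H)            # orthonormal basis of H           (Step 3)
    keep = np.abs(np.diag(K)) > tol   # drop numerically null directions
    J, K = J[:, keep], K[keep, :]     # coordinates K = J^T H            (Step 4)
    # Quasi-diagonal middle matrix of Eq. (25), rediagonalized (Step 5).
    Q = np.block([[np.diag(s), L],
                  [np.zeros((K.shape[0], s.size)), K]])
    Ub, sb, _ = np.linalg.svd(Q, full_matrices=False)
    return np.hstack([U, J]) @ Ub, sb  # updated basis and values, Eq. (26)

# Usage: decompose a matrix column-block by column-block.
M = np.random.rand(6, 10)
U, s, _ = np.linalg.svd(M[:, :4], full_matrices=False)
U, s = mix(U, s, M[:, 4:])
print(np.allclose(s, np.linalg.svd(M, compute_uv=False)))  # True
```

Applying mix repeatedly over column blocks realizes the recursion of Eq. (23) without ever re-decomposing the previously seen columns.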
5.4 Complexity and Approximation Accuracy

5.4.1 Time Complexity

The execution time of the proposed IHOSVD method consists of matrix unfolding, incremental singular value decomposition of each unfolded matrix, and the product of the tensor by the truncated bases. Let Time_unf, Time_isvd and Time_prod denote the time used by these processes respectively; the total time consumption satisfies

Time = Time_unf + Time_isvd + Time_prod.    (27)

Tensor unfolding is a simple transformation with O(1) time complexity. Time_isvd is equal to Time_1 + Time_2 + ... + Time_P = Σ_{i=1}^P Time_i, where Time_i refers to the time consumed by unfolded matrix T_(i). According to Eq. (23), Time_isvd can be obtained with

Time(i) = { C_1,                  i = 1
          { Time(i-1) + C_2,      i > 1,    (28)

where C_1 and C_2 are constants. The recursive calling process first adds columns and then updates them with the previous decomposition results. The time complexity of decomposing one unfolded matrix is O(k²n), where k refers to the number of truncated left singular vectors. For a truncated orthogonal basis U with k column vectors, the time complexity of the product of a tensor by a matrix is also O(k²n). To decompose a p-order tensor with p unfolded matrices, the time complexity of the proposed IHOSVD method is O(1) + O(pk²n) + O(pk²n), namely O(pk²n).

5.4.2 Memory Usage

Let Mem_u denote the memory used to store all truncated orthogonal bases, and let Mem_rmsvd and Mem_mix refer to the memory usage of the recursive process in Algorithm 1; then the total memory used by the proposed IHOSVD method is

Mem = Mem_u + Mem_rmsvd + Mem_mix.    (29)

The complexity of Mem_u is equal to O(kn). To incrementally compute the core tensor, the IHOSVD method needs to keep all the truncated orthogonal bases, and the
memory usage is Σ_{i=1}^P k_i I_i. According to Eq. (23), the memory needed during the recursive process is equal to

|M_i| + |C_i| + |M_{i-1}| + |C_{i-1}| + ... + |M_1| + |C_1|.    (30)

The complexity of this memory usage is O(kn). Therefore, the complexity of the total memory usage is O(kn) + O(kn), i.e., O(kn). For a p-order tensor with p unfolded matrices, the complexity is O(pkn).

5.4.3 Approximation Accuracy

The reconstruction error between the initial tensor T and the approximate tensor T̂ can be exactly measured with the Frobenius norm [10] as

||T - T̂||_F = ( Σ_{i_1=1}^{I_1} ... Σ_{i_P=1}^{I_P} (a_{i_1,...,i_P} - â_{i_1,...,i_P})² )^{1/2}.    (31)

For the unfolded matrix T_(i) of the initial tensor, the approximate matrix is T̂_(i) = U_i Σ_i V_i^T. The reconstruction error is caused by the approximation of all unfolded matrices. To clearly analyze the degree of tensor dimensionality reduction and tensor approximation, we present two ratios.

Definition 7: The Dimensionality Reduction Ratio of tensor T is defined as

ρ = (nnz(S) + Σ_{i=1}^P nnz(U_i)) / nnz(T),    (32)

where S denotes the core tensor and U_i is the mode-i truncated orthogonal basis. The core data sets of tensor T are composed of S (the core tensor) and U_1, U_2, ..., U_P. Because only nonzero elements of the core data sets are stored, the ratio ρ accurately reflects the degree of dimensionality reduction.

Definition 8: The Reconstruction Error Ratio of tensor T is defined as

e = ||T - T̂||_F / ||T||_F.    (33)

Ratio e reflects the degree of reconstruction error under the tensor Frobenius norm. In this paper, the pair (ρ, e) is employed to describe the dimensionality reduction degree and the reconstruction error degree; the two ratios are inversely related.

Computation accuracy is important for tensor data approximation, and in most applications HOSVD-type algorithms can find a good approximation. To obtain higher accuracy, the High-Order Orthogonal Iteration (HOOI) [11] method can be utilized to find the best low-rank approximation. The High-Order Singular Value Decomposition (HOSVD) and the Higher-Order Orthogonal Iteration (HOOI) of a tensor can be viewed as extensions of the Singular Value Decomposition (SVD).
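The two ratios of Definitions 7-8 can be computed directly from the core data sets; a sketch reusing the hosvd and mode_product helpers from the Section 2 example:

```python
import numpy as np

def ratios(T, S, Us, T_hat):
    """Dimensionality reduction ratio rho (Eq. (32)) and error ratio e (Eq. (33))."""
    rho = (np.count_nonzero(S)
           + sum(np.count_nonzero(U) for U in Us)) / np.count_nonzero(T)
    e = np.linalg.norm(T - T_hat) / np.linalg.norm(T)   # Frobenius, Eqs. (31), (33)
    return rho, e
```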
6 CASE STUDY

In this section, we illustrate the proposed unified data representation model and incremental dimensionality reduction method with an intelligent transportation case. The test data used in the experiments consist of unstructured video data collected with fixed cameras and mobile phones, semi-structured XML documents about traffic information, and structured trajectory data. After dimensionality reduction, the core tensor and the truncated bases are small to store, but accurate and fast for reconstructing the big data.

6.1 Demonstration of Tensor Unfolding

We construct a five-order tensor by extracting three frames from the unstructured video clip and three users from the semi-structured XML document. Fig. 11(a) shows the five unfolded matrices of the tensor; the five orders represent height, width, color space, time and user respectively.

Fig. 11. Heterogeneous tensor unfolding and incremental tensor unfolding; (a) the five unfolded matrices of the five-order tensor, (b) incremental video and user data on the unfolded matrices of the eight-order tensor, where appending along the time order causes order inconsistency.

To demonstrate incremental tensor unfolding, an eight-order tensor T ∈ R^{I_t × I_s × I_u × I_h × I_w × I_c × I_ec × I_er} is constructed. Incremental data are appended along the time order I_t. The unfolded matrices of the combined new tensor (initial tensor plus incremental tensor) are shown in Fig. 11(b). Order inconsistency of the new tensor occurs in order I_t, because the incremental data are appended as rows at the bottom of the unfolded matrix.
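A toy illustration of the append just described (the sizes below are hypothetical): growing the time order adds rows to the mode-t unfolding, so the previously computed basis has too few rows until Theorem 2 and Algorithm 3 resize the dimensionality.

```python
import numpy as np

old = np.random.rand(3, 4, 5)                                  # I_t = 3
new = np.concatenate([old, np.random.rand(2, 4, 5)], axis=0)   # I_t -> 5
unfold_t = lambda T: T.reshape(T.shape[0], -1)                 # mode t is axis 0 here
print(unfold_t(old).shape, '->', unfold_t(new).shape)          # (3, 20) -> (5, 20)
```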
Fig. 11(a), Fig. 11(b) and Fig. 8 in Section 4 illustrate the tensor model from different viewpoints and demonstrate how the heterogeneous data are stacked together. Fig. 8 shows the procedure of embedding unstructured video data and a semi-structured XML document into a three-order tensor, while Fig. 11(a) and Fig. 11(b) show the inner elements of the unified tensor model.

6.2 Dimensionality Reduction and Approximation Error

There exists a tradeoff between dimensionality reduction and approximation error. Fig. 12 shows two video frames reconstructed from the above five-order tensor under three different approximation error ratios, namely 4%, 7%, and 24%.

Fig. 12. Video frames reconstructed with different approximation error ratios.

Fig. 13(a) plots the two ratios together and illustrates that the reconstruction error ratio increases gradually as the dimensionality reduction ratio decreases. The core data sets are composed of the core tensor S and the truncated orthogonal bases U_1, ..., U_5. Fig. 13(b) shows their proportions within the dimensionality reduction ratio; generally, the proportion of the core tensor is bigger than that of the truncated bases.

Fig. 13. (a) Tradeoff between dimensionality reduction and reconstruction error; (b) proportion of the core tensor to the truncated orthogonal bases.

Diverse data types can result in different dimensionality reduction ratios and approximation error ratios. Repeated experiments on video clips, XML documents and GPS data show that a core set containing 18% of the elements can guarantee 93% accuracy in general. In practice, the balance between dimensionality reduction and computation accuracy is determined by the application requirements.

6.3 Time and Memory Comparison

Compared with the general High Order Singular Value Decomposition method, the proposed incremental High Order Singular Value Decomposition method is efficient and memory-saving. To evaluate the two decomposition methods, we run them on computers with an Intel Core(TM) i5 CPU at 3.2 GHz (4 cores) and 8 GB RAM. We divide the unified tensor into four blocks and normalize the tensor size as well as the decomposition time for better comparison. During the process of dimensionality reduction, the general HOSVD method integrates the additional tensor blocks with the previous tensor blocks to generate a new tensor, which is then repeatedly decomposed. Different from this repeated HOSVD method, the incremental HOSVD method updates the truncated orthogonal bases and dynamically computes the core tensor. Fig. 14 demonstrates that the decomposition time of the repeated HOSVD method is greater than that of the incremental HOSVD method. Additionally, the decomposition time of the incremental HOSVD method increases more gently than that of the repeated HOSVD method as the normalized tensor size grows. As the normalized tensor size grows beyond 0.75, the repeated HOSVD method runs out of memory while the incremental HOSVD method continues to run.
From a theoretical point of view, as more orthogonal bases are appended to the left singular vector matrix, the middle quasi-diagonal matrix contains fewer columns that still require orthogonalization, and the time consumed by the diagonalization process decreases. In brief, the incremental HOSVD method is more efficient because it projects the additional tensor unfoldings onto the previously truncated orthogonal bases rather than directly executing the orthogonalization procedure.
Fig. 14. Comparison between the repeated HOSVD method and the incremental HOSVD method: normalized decomposition time versus normalized tensor size; the repeated method runs out of memory on large tensors.

7 RELATED WORK

This section reviews related work on data representation and high order singular value decomposition.

Data Representation: Big data are composed of unstructured, semi-structured and structured data. In particular, multimedia, as unstructured data, is mostly encoded as MPEG-4 or H.264. MPEG-4 [12] is a method for defining compression of audio and visual digital data. H.264 [13] is a widely used standard for video compression. The semi-structured Extensible Markup Language (XML) [14] is a flexible text format that defines a set of rules for encoding documents. XML is both human-readable and machine-readable. The characteristics making up an XML document are divided into markup and content. Kim and Candan [15] proposed a tensor-based relational data model that can process multi-dimensional structured data. Ontology, such as the Resource Description Framework (RDF) [16] and the Web Ontology Language (OWL) [17], is playing an ever more important role in the exchange of a wide variety of data.

Higher Order Singular Value Decomposition: A tensor [6, 7] is the generalization of a matrix and is usually called a multidimensional array. The tensor is an effective data representation model from which valuable information can be extracted using the high order singular value decomposition (HOSVD) [8] method. Because HOSVD imposes orthogonality constraints on the truncated column bases, it may be considered a special case of the commonly used Tucker [18] decomposition. Although the low-rank truncation of the HOSVD is not the best approximation of the initial data, it is considered sufficiently good for many applications. Analysis and mining of data with HOSVD has been adopted in many applications such as tag recommendation [19, 20], trajectory indexing and retrieval [21], and hand-written digit classification [22].

Studies of data representation and dimensionality reduction have been reported in the literature. However, a unified model for heterogeneous data representation has been neglected, and the decomposition problems arising during incremental data processing have not been considered. The contributions of this paper are a unified tensor model for representing large-scale heterogeneous data and an efficient approach for extracting the high-quality core tensor, which is small but contains valuable information.

8 CONCLUSION

This paper aims at representing and processing large-scale heterogeneous data generated from multiple sources. Firstly, we presented a unified tensor-based data representation model that can integrate unstructured, semi-structured and structured data. Secondly, based on the proposed model, an incremental high order singular value decomposition (IHOSVD) method was proposed for dimensionality reduction on big data. We proved two theorems that solve the problems of decomposition recalculation and order inconsistency. Finally, an intelligent transportation case was investigated to evaluate the method. Theoretical analyses and the experimental results of the case study provide evidence that the proposed data representation model and incremental dimensionality reduction method are promising, and they pave the way for efficient mining and analysis in big data applications.
9 ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China and by the Fundamental Research Funds for the Central Universities, HUST: CXY13Q017 and 2013QN122.

REFERENCES

[1] I. F. Cruz and H. Xiao, "Ontology Driven Data Integration in Heterogeneous Networks," in Complex Systems in Knowledge-Based Environments: Theory, Models and Applications. Springer, 2009.
[2] H. Abdi and L. J. Williams, "Principal Component Analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, 2010.
[3] M. Brand, "Incremental Singular Value Decomposition of Uncertain Data with Missing Values," in Computer Vision: ECCV 2002. Springer, 2002.
[4] J. Sun, D. Tao, and C. Faloutsos, "Beyond Streams and Graphs: Dynamic Tensor Analysis," in Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
[5] E. Henry and J. Hofrichter, "Singular Value Decomposition: Application to Analysis of Experimental Data," Essential Numerical Computer Methods, vol. 210.
[6] C. M. Martin, "Tensor Decompositions Workshop Discussion Notes," American Institute of Mathematics.
[7] T. G. Kolda and B. W. Bader, "Tensor Decompositions and Applications," SIAM Review, vol. 51, no. 3, 2009.
[8] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A Multilinear Singular Value Decomposition," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, 2000.
[9] H. Anton, Elementary Linear Algebra. Wiley.
[10] C. Meyer, Matrix Analysis and Applied Linear Algebra. SIAM, 2000, vol. 2.
[11] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the Best Rank-1 and Rank-(R_1, R_2, ..., R_N) Approximation of Higher-Order Tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, 2000.
[12] I. E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley.
[13] D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG4 Advanced Video Coding Standard and Its Applications," IEEE Communications Magazine, vol. 44, no. 8, 2006.
[14] E. van der Vlist, XML Schema: The W3C's Object-Oriented Descriptions for XML. O'Reilly Media, Inc.
[15] M. Kim and K. S. Candan, "Approximate Tensor Decomposition within a Tensor-Relational Algebraic Framework," in Proc. of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011.
[16] I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen, "From SHIQ and RDF to OWL: The Making of a Web Ontology Language," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 1, no. 1, 2003.
[17] D. L. McGuinness and F. van Harmelen, "OWL Web Ontology Language Overview," W3C Recommendation, vol. 10, 2004.
[18] L. R. Tucker, "Some Mathematical Notes on Three-Mode Factor Analysis," Psychometrika, vol. 31, no. 3, 1966.
[19] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, "Tag Recommendations Based on Tensor Dimensionality Reduction," in Proc. of the 2008 ACM Conference on Recommender Systems. ACM, 2008.
[20] R. Wetzker, C. Zimmermann, C. Bauckhage, and S. Albayrak, "I Tag, You Tag: Translating Tags for Advanced User Models," in Proc. of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 2010.
[21] Q. Li, X. Shi, and D. Schonfeld, "A General Framework for Robust HOSVD-Based Indexing and Retrieval with High-Order Tensor Data," in Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011.
[22] B. Savas and L. Eldén, "Handwritten Digit Classification Using Higher Order Singular Value Decomposition," Pattern Recognition, vol. 40, no. 3, 2007.

Liwei Kuang is currently pursuing the PhD degree in the School of Computer Science and Technology at Huazhong University of Science and Technology, Wuhan, China. He received the master's degree in computer science from Hubei University of Technology, Wuhan, China. From 2004 to 2012, he was a Research Engineer with FiberHome Technologies Group, Wuhan, China. His research interests include big data, pervasive computing and cloud computing.

Fei Hao is an assistant professor at Huazhong University of Science and Technology. He received the B.S. and M.S.
degrees from the School of Mathematics and Computer Engineering, Xihua University, Chengdu, China, in 2005 and 2008, respectively. He was a research assistant at the Korea Advanced Institute of Science and Technology and at the Hangul Engineering Research Center, Korea. He has published over 30 research papers in international and national journals and conferences. His research interests include social computing, big data analysis and processing, and mobile cloud computing.

Laurence T. Yang received the B.E. degree in Computer Science and Technology from Tsinghua University, China, and the PhD degree in Computer Science from the University of Victoria, Canada. He is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology, China, and in the Department of Computer Science, St. Francis Xavier University, Canada. His research interests include parallel and distributed computing, embedded and ubiquitous/pervasive computing, and big data. His research has been supported by the Natural Sciences and Engineering Research Council and the Canada Foundation for Innovation.
Man Lin received the B.E. degree in Computer Science and Technology from Tsinghua University, China. She received the Lic. and Ph.D. degrees from the Department of Computer and Information Science at Linköping University, Sweden, in 1997 and 2000, respectively. She is currently an associate professor in Computer Science at St. Francis Xavier University, Canada. Her research interests include system design and analysis, power-aware scheduling, and optimization algorithms. Her research is supported by NSERC (the Natural Sciences and Engineering Research Council, Canada) and CFI (the Canada Foundation for Innovation).

Changqing Luo received his B.E. and M.E. degrees from Chongqing University of Posts and Telecommunications in 2004 and 2007, respectively, and the Ph.D. from Beijing University of Posts and Telecommunications in 2011, all in Electrical Engineering. After graduation, he joined the School of Computer Science and Technology, Huazhong University of Science and Technology in 2011, where he currently works as an Assistant Professor. His current research focuses on algorithms and optimization for wireless networks, cooperative communication, green communication, resource management in heterogeneous wireless networks, and mobile cloud computing.

Geyong Min is a Professor of High Performance Computing and Networking in the Department of Mathematics and Computer Science within the College of Engineering, Mathematics and Physical Sciences at the University of Exeter, United Kingdom. He received the PhD degree in Computing Science from the University of Glasgow, United Kingdom, in 2003, and the B.Sc. degree in Computer Science from Huazhong University of Science and Technology, China. His research interests include next generation Internet, wireless communications, multimedia systems, information security, high performance computing, ubiquitous computing, and modelling and performance engineering.
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments
23rd European Signal Processing Conference EUSIPCO) MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments Hock Siong LIM hales Research and echnology, Singapore hales Solutions
Solving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
Search Engine Based Intelligent Help Desk System: iassist
Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India [email protected], [email protected]
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
Data Storage 3.1. Foundations of Computer Science Cengage Learning
3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how
Server Load Prediction
Server Load Prediction Suthee Chaidaroon ([email protected]) Joon Yeong Kim ([email protected]) Jonghan Seo ([email protected]) Abstract Estimating server load average is one of the methods that
Lecture 5: Singular Value Decomposition SVD (1)
EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system
Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research
Algorithmic Techniques for Big Data Analysis Barna Saha AT&T Lab-Research Challenges of Big Data VOLUME Large amount of data VELOCITY Needs to be analyzed quickly VARIETY Different types of structured
Load Distribution on a Linux Cluster using Load Balancing
Load Distribution on a Linux Cluster using Load Balancing Aravind Elango M. Mohammed Safiq Undergraduate Students of Engg. Dept. of Computer Science and Engg. PSG College of Technology India Abstract:
FCE: A Fast Content Expression for Server-based Computing
FCE: A Fast Content Expression for Server-based Computing Qiao Li Mentor Graphics Corporation 11 Ridder Park Drive San Jose, CA 95131, U.S.A. Email: qiao [email protected] Fei Li Department of Computer Science
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
The Image Deblurring Problem
page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation
On the Standardization of Semantic Web Services-based Network Monitoring Operations
On the Standardization of Semantic Web Services-based Network Monitoring Operations ChenglingZhao^, ZihengLiu^, YanfengWang^ The Department of Information Techonlogy, HuaZhong Normal University; Wuhan,
Low-resolution Character Recognition by Video-based Super-resolution
2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro
Keywords: Image Generation and Manipulation, Video Processing, Video Factorization, Face Morphing
TENSORIAL FACTORIZATION METHODS FOR MANIPULATION OF FACE VIDEOS S. Manikandan, Ranjeeth Kumar, C.V. Jawahar Center for Visual Information Technology International Institute of Information Technology, Hyderabad
Process Mining by Measuring Process Block Similarity
Process Mining by Measuring Process Block Similarity Joonsoo Bae, James Caverlee 2, Ling Liu 2, Bill Rouse 2, Hua Yan 2 Dept of Industrial & Sys Eng, Chonbuk National Univ, South Korea jsbae@chonbukackr
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER
HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER Gholamreza Anbarjafari icv Group, IMS Lab, Institute of Technology, University of Tartu, Tartu 50411, Estonia [email protected]
Linear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka [email protected] http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
A simple and fast algorithm for computing exponentials of power series
A simple and fast algorithm for computing exponentials of power series Alin Bostan Algorithms Project, INRIA Paris-Rocquencourt 7815 Le Chesnay Cedex France and Éric Schost ORCCA and Computer Science Department,
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
Chapter 6. Orthogonality
6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be
Binary Image Scanning Algorithm for Cane Segmentation
Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch [email protected] Tom
Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries
First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition
P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic
Solution of Linear Systems
Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning
Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning By: Shan Suthaharan Suthaharan, S. (2014). Big data classification: Problems and challenges in network
ADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING
Development of a Software Tool for Performance Evaluation of MIMO OFDM Alamouti using a didactical Approach as a Educational and Research support in Wireless Communications JOSE CORDOVA, REBECA ESTRADA
Internet Video Streaming and Cloud-based Multimedia Applications. Outline
Internet Video Streaming and Cloud-based Multimedia Applications Yifeng He, [email protected] Ling Guan, [email protected] 1 Outline Internet video streaming Overview Video coding Approaches for video
MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015
Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015 Lecture: MWF: 1:00-1:50pm, GEOLOGY 4645 Instructor: Mihai
Similar matrices and Jordan form
Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms
A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms Dr. Mohammad V. Malakooti Faculty and Head of Department of Computer Engineering, Islamic Azad University, UAE
Speed Performance Improvement of Vehicle Blob Tracking System
Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA [email protected], [email protected] Abstract. A speed
Operation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point
Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder
Performance Analysis and Comparison of 15.1 and H.264 Encoder and Decoder K.V.Suchethan Swaroop and K.R.Rao, IEEE Fellow Department of Electrical Engineering, University of Texas at Arlington Arlington,
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
Masters in Human Computer Interaction
Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from
Big Data Driven Knowledge Discovery for Autonomic Future Internet
Big Data Driven Knowledge Discovery for Autonomic Future Internet Professor Geyong Min Chair in High Performance Computing and Networking Department of Mathematics and Computer Science College of Engineering,
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Data Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
RN-coding of Numbers: New Insights and Some Applications
RN-coding of Numbers: New Insights and Some Applications Peter Kornerup Dept. of Mathematics and Computer Science SDU, Odense, Denmark & Jean-Michel Muller LIP/Arénaire (CRNS-ENS Lyon-INRIA-UCBL) Lyon,
SURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
NETCONF-based Integrated Management for Internet of Things using RESTful Web Services
NETCONF-based Integrated Management for Internet of Things using RESTful Web Services Hui Xu, Chunzhi Wang, Wei Liu and Hongwei Chen School of Computer Science, Hubei University of Technology, Wuhan, China
RN-Codings: New Insights and Some Applications
RN-Codings: New Insights and Some Applications Abstract During any composite computation there is a constant need for rounding intermediate results before they can participate in further processing. Recently
by the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that
General Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Three Effective Top-Down Clustering Algorithms for Location Database Systems
Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr
Applied Linear Algebra I Review page 1
Applied Linear Algebra Review 1 I. Determinants A. Definition of a determinant 1. Using sum a. Permutations i. Sign of a permutation ii. Cycle 2. Uniqueness of the determinant function in terms of properties
Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.
Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry
Latent Semantic Indexing with Selective Query Expansion Abstract Introduction
Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes
Statistical Modeling of Huffman Tables Coding
Statistical Modeling of Huffman Tables Coding S. Battiato 1, C. Bosco 1, A. Bruna 2, G. Di Blasi 1, G.Gallo 1 1 D.M.I. University of Catania - Viale A. Doria 6, 95125, Catania, Italy {battiato, bosco,
Greedy Column Subset Selection for Large-scale Data Sets
Knowledge and Information Systems manuscript No. will be inserted by the editor) Greedy Column Subset Selection for Large-scale Data Sets Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel Received:
