A Tensor-based Approach for Big Data Representation and Dimensionality Reduction

Liwei Kuang, Fei Hao, Laurence T. Yang, Man Lin, Changqing Luo, and Geyong Min

L. Kuang, F. Hao and C. Luo are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. L. T. Yang is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. M. Lin is with the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. G. Min is with the College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QF, United Kingdom.

Abstract — Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. It has been a great challenge to efficiently represent and process big data with a unified scheme. In this paper, a unified tensor model is proposed to represent unstructured, semi-structured and structured data. With a tensor extension operator, the various types of data are represented as sub-tensors and then merged into a unified tensor. In order to extract the core tensor, which is small but contains valuable information, an Incremental High Order Singular Value Decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage and approximation accuracy of the proposed method are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can generally guarantee 93% accuracy. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.

Index Terms — Tensor, HOSVD, Dimensionality Reduction, Data Representation

1 INTRODUCTION

Big data are a collection of datasets consisting of massive unstructured, semi-structured, and structured data. The four main characteristics of big data are volume (amount of data), variety (range of data types and sources), veracity (data quality), and velocity (speed of incoming data). Although many studies have been done on big data processing, very few have addressed the following two key issues: (1) how to represent the various types of data with a simple model; (2) how to extract core data sets that are smaller but still contain valuable information, especially for streaming data. The purpose of this paper is to explore these issues, which are closely related to the variety and veracity characteristics of big data.

Logic and Ontology [1], two knowledge representation methodologies, have been investigated widely. Composed of syntax, semantics and proof theory, Logic is used for making statements about the world. Although Logic is concise, unambiguous and expressive, it works only with statements that are true or false and is hard to use for reasoning with unstructured data. Ontology is a set of concepts and relationships that helps people communicate and share knowledge. It is definitive and exhaustive, but it also causes incompatibility among different application domains, and is thus not suitable for representing and integrating heterogeneous big data.
The study of data dimensionality reduction has been widely reported in the literature. Previous approaches include Principal Component Analysis (PCA) [2], Incremental Singular Value Decomposition (SVD) [3], and Dynamic Tensor Analysis (DTA) [4]. These methods work well for low-dimensional reduction but suffer from limitations: they are time-consuming when performed on high-dimensional data, and they fail to extract the core data sets from streaming big data.

This paper presents a unified tensor model for big data representation and an incremental dimensionality reduction method for high-quality core set extraction. Data with different formats are employed to illustrate the representation approach, and equivalence theorems are proven to support the proposed reduction method. The major contributions are summarized as follows.

Unified Data Representation Model: We propose a unified tensor model to integrate and represent unstructured, semi-structured, and structured data. The tensor model has extensible orders to which new orders can be dynamically appended through the proposed tensor extension operator.

Core Tensor Equivalence Theorem: To tackle the recalculation and order inconsistency problems in big data processing with the tensor model, we prove a core tensor equivalence theorem that can serve as the theoretical foundation for designing incremental decomposition algorithms.

Recursive Incremental HOSVD Method: We present a recursive Incremental High Order Singular Value Decomposition method for streaming data dimensionality reduction. Detailed analyses of its time complexity, memory usage and approximation accuracy are also provided.

The remainder of this paper is organized as follows. Section 2 recalls the preliminaries of tensor decomposition. Section 3 presents a framework for big data representation and processing. A unified tensor model for big data representation is proposed in Section 4. Section 5 presents a novel incremental dimensionality reduction method. A case study of intelligent transportation is investigated in Section 6. After reviewing the related work in Section 7, we conclude the paper in Section 8.

2 PRELIMINARIES

This section reviews the preliminaries of singular value decomposition [5] and tensor decomposition [6]. The core tensor and truncated bases described here can be employed to make big data smaller.

Definition 1: Singular Value Decomposition (SVD). Let $M \in \mathbb{R}^{m \times n}$ denote a matrix. The factorization

$M = U \Sigma V^T$ (1)

is called the SVD of $M$. Matrices $U$ and $V$ contain the left and right singular vectors of $M$ respectively; both are unitary orthogonal matrices. $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, \ldots, \sigma_l)$, $l = \min\{m, n\}$, is a diagonal matrix containing the singular values of $M$. In particular,

$M_k = U_k \Sigma_k V_k^T$ (2)

is called the rank-$k$ truncated SVD of $M$, where $U_k = [u_1, \ldots, u_k]$, $V_k = [v_1, \ldots, v_k]$, $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$, $k < l$. The truncated SVD of $M$ is much smaller to store and faster to compute, and among all rank-$k$ matrices, $M_k$ is the unique minimizer of $\|M - M_k\|_F$.

Definition 2: Tensor Unfolding. Given a $P$-order tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_P}$, the tensor unfolding [7] $T_{(p)} \in \mathbb{R}^{I_p \times (I_{p+1} I_{p+2} \cdots I_P I_1 I_2 \cdots I_{p-1})}$ contains the element $t_{i_1 i_2 \ldots i_p i_{p+1} \ldots i_P}$ at row number $i_p$ and column number

$(i_{p+1}-1) I_{p+2} \cdots I_P I_1 \cdots I_{p-1} + (i_{p+2}-1) I_{p+3} \cdots I_P I_1 \cdots I_{p-1} + \cdots + (i_2-1) I_3 I_4 \cdots I_{p-1} + \cdots + i_{p-1}$.

Example 1. Consider a three-order tensor $\mathcal{T} \in \mathbb{R}^{2 \times 4 \times 3}$; Fig. 1 shows the three unfolded matrices $T_{(1)}$, $T_{(2)}$ and $T_{(3)}$.

Fig. 1. Three-order tensor unfolding; the tensor is unfolded into three matrices.

Definition 3: p-mode Product of a Tensor by a Matrix. Given a tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times \cdots \times I_{p-1} \times I_p \times I_{p+1} \times \cdots \times I_P}$ and a matrix $U \in \mathbb{R}^{J_p \times I_p}$, the p-mode product $\mathcal{T} \times_p U \in \mathbb{R}^{I_1 \times \cdots \times I_{p-1} \times J_p \times I_{p+1} \times \cdots \times I_P}$ is defined as

$(\mathcal{T} \times_p U)_{i_1 i_2 \ldots i_{p-1} j_p i_{p+1} \ldots i_P} = \sum_{i_p=1}^{I_p} a_{i_1 i_2 \ldots i_{p-1} i_p i_{p+1} \ldots i_P} \, u_{j_p i_p}$. (3)

The p-mode product is a key linear operation for dimensionality reduction: a truncated left singular vector matrix $U \in \mathbb{R}^{J_p \times I_p}$ ($J_p < I_p$) reduces the dimensionality of order $p$ from $I_p$ to $J_p$.

Fig. 2. Tensor dimensionality reduction with the p-mode product; the dimensionality of the 2nd order is reduced from 8 to 2 by a 2 × 8 matrix.

Definition 4: Core Tensor and Approximate Tensor. For an initial tensor $\mathcal{T}$, the core tensor $\mathcal{S}$ [8] and the approximate tensor $\hat{\mathcal{T}}$ are defined as

$\mathcal{S} = \mathcal{T} \times_1 U_1^T \times_2 U_2^T \cdots \times_P U_P^T$ (4)

and

$\hat{\mathcal{T}} = \mathcal{S} \times_1 U_1 \times_2 U_2 \cdots \times_P U_P$. (5)

The core tensor $\mathcal{S}$ can be viewed as a compressed version of the initial tensor $\mathcal{T}$: by keeping only the leading $k$ unitary orthogonal vectors of each unfolded matrix, the principal characteristics are preserved. Big data applications can simply keep the core tensor $\mathcal{S}$ and the truncated bases $U_1, U_2, \ldots, U_P$.
When needed, the data can be reconstructed by generating the approximate tensor with Eq. (5). The right singular vector matrices $V_1, V_2, \ldots, V_P$ and the singular values are absorbed into the core tensor, which holds the coordinates of the approximate tensor with respect to the left singular vector bases. A minimal numerical sketch of these preliminaries follows.
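To make Definitions 1-4 concrete, here is a small numpy sketch of unfolding, the p-mode product, and the core and approximate tensors of Eqs. (4) and (5). It is an illustrative reconstruction rather than the authors' code; the unfolding uses numpy's reshape ordering, which differs from the column ordering of Definition 2 only by a permutation and therefore yields the same left singular vectors.

```python
import numpy as np

def unfold(T, p):
    """Mode-p unfolding: rows indexed by order p, columns by all other orders
    (column order differs from Definition 2 by a permutation only)."""
    return np.moveaxis(T, p, 0).reshape(T.shape[p], -1)

def mode_product(T, U, p):
    """p-mode product T x_p U for U of shape (J_p, I_p), Eq. (3)."""
    Tp = np.moveaxis(T, p, 0)                    # bring order p to the front
    out = np.tensordot(U, Tp, axes=([1], [0]))   # contract over I_p
    return np.moveaxis(out, 0, p)

T = np.random.rand(2, 4, 3)                      # the tensor of Example 1
ranks = (2, 2, 2)

# Truncated left singular bases of the unfolded matrices (Eqs. (1)-(2)).
U = [np.linalg.svd(unfold(T, p), full_matrices=False)[0][:, :k]
     for p, k in enumerate(ranks)]

S = T                                            # core tensor, Eq. (4)
for p in range(T.ndim):
    S = mode_product(S, U[p].T, p)

T_hat = S                                        # approximate tensor, Eq. (5)
for p in range(T.ndim):
    T_hat = mode_product(T_hat, U[p], p)

print(S.shape)                                          # (2, 2, 2)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))    # relative error
```

Only the small core $\mathcal{S}$ and the truncated bases need to be stored; the last line measures how much is lost by the truncation.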

In general, the reconstructed data are more efficient to work with than the original data, as noise, inconsistency and redundancy have been removed.

Fig. 3. Illustration of the core tensor and the approximate tensor. The core tensor and the truncated unitary orthogonal bases ($U_1$, $U_2$, $U_3$) are called core data sets and can be used to make big data smaller, while the reconstructed approximate tensor is a substitute for the initial tensor.

3 DATA REPRESENTATION AND PROCESSING FRAMEWORK

In this section, a tensor-based data representation and processing framework is proposed. Fig. 4 depicts a three-tier framework in which different modules are enabled in each layer. We elaborate the functions and responsibilities of each module in a bottom-up view.

Fig. 4. Data representation and processing framework. The bottom layer collects unstructured, semi-structured and structured data (video, audio, XML, HTML, GPS, EHR and other streaming sources); the middle layer performs data tensorization and dimensionality reduction; the top layer hosts data analysis (mining algorithms, inference methods, data visualization) and data services for domains such as transportation, finance and healthcare.

1) Data Collection Module. This module is in charge of collecting various types of data from different areas, for example video clips, XML documents and GPS data. Streaming data arrive incrementally and are temporarily accumulated without changing their original format.

2) Data Tensorization Module. Since the collected unstructured, semi-structured and structured data are not uniform, they need to be represented with a unified tensor model. Sub-tensors with various orders are generated to model the data according to their initial format. Then all the sub-tensors are integrated into a unified heterogeneous tensor.

3) Data Dimensionality Reduction Module. This module efficiently processes the high-dimensional tensorized data and extracts the core data sets, which are much smaller for storage and computation. The reduction is carried out by the proposed IHOSVD algorithm, which incrementally updates the orthogonal bases of each unfolded matrix.

4) Data Analysis Module. Numerous algorithms, such as clustering and multi-aspect prediction algorithms, are included in this module. The module helps uncover the potential value behind large-scale heterogeneous data. The data visualization module in this layer helps users easily understand the data.

5) Data Service Module. The data service module provides services according to the requirements of different applications. For instance, with smart monitoring appliances, proactive health care services can be provided to users based on a thorough understanding of their physical status.

This paper mainly focuses on the data tensorization module and the data dimensionality reduction module.

4 A UNIFIED DATA REPRESENTATION MODEL

This section proposes a tensor-based data representation model and a tensorization approach for transforming heterogeneous data into a unified model. Firstly, an extensible order tensor model and a tensor extension operator are presented. Secondly, we illustrate how to tensorize unstructured, semi-structured and structured data as sub-tensors. Thirdly, the integration of sub-tensors into a unified tensor is studied. Tensor order and tensor dimension, two easily confused concepts, are discussed at the end.

4.1 Extensible Order Tensor

In general, time and space are two basic characteristics of data collected from different areas, while users are the major recipients of data services.
Therefore, a general tensor-based data model is defined as

$\mathcal{T} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1 \times \cdots \times I_P}$. (6)

Eq. (6) describes a $(P+3)$-order tensor that contains two parts, namely the fixed part $\mathbb{R}^{I_t \times I_s \times I_u}$ and the extensible part $\mathbb{R}^{I_1 \times \cdots \times I_P}$. The tensor orders $I_t$, $I_s$ and $I_u$ denote time, space and user respectively.

In the tensor model, data characteristics are represented as tensor orders. For example, the color space characteristic of unstructured video data can be modeled as an order $I_c$. For heterogeneous data, the various characteristics are represented as tensor orders and attached to the fixed part using the proposed tensor extension operator.

Definition 5: Tensor Extension Operator. Let $\mathcal{A} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1}$ and $\mathcal{B} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_2}$. The tensor extension operator $\oplus$ is given by the function

$f: \mathcal{A} \oplus \mathcal{B} \rightarrow \mathcal{C}, \quad \mathcal{C} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1 \times I_2}$. (7)

The operator $\oplus$ satisfies the associative law; in other words, $(\mathcal{A} \oplus \mathcal{B}) \oplus \mathcal{C} = \mathcal{A} \oplus (\mathcal{B} \oplus \mathcal{C})$. By virtue of Eq. (7), heterogeneous data can first be tensorized as low-order sub-tensors and then extended to a high-order unified tensor. The operator merges the identical orders while keeping the diverse orders; elements of an identical order are accumulated together. For instance, suppose sub-tensors $\mathcal{T}_{sub1}$ and $\mathcal{T}_{sub2}$ have time orders denoted $I_{t1}$ and $I_{t2}$, where $I_{t1} \supseteq \{i_1, i_2\}$ and $I_{t2} \supseteq \{i_1, i_3\}$. After extension, the time order of the new tensor $\mathcal{T} = \mathcal{T}_{sub1} \oplus \mathcal{T}_{sub2}$ satisfies $I_t \supseteq \{i_1, i_2, i_3\}$.

4.2 Tensorization Method

Examples of unstructured data include video and audio data, while semi-structured data comprise XML documents, ontology data, etc. Representatives of structured data are the numbers and character strings stored in relational databases. In this paper, a video clip, an XML document and GPS data are employed to illustrate the tensorization process.

Video data can be represented as a four-order or a three-order tensor. To represent a video clip in MPEG-4 format at 25 frames per second in RGB color space, a four-order tensor $\mathcal{T} \in \mathbb{R}^{I_f \times I_w \times I_h \times I_c}$ is adopted, with $I_f$, $I_w$, $I_h$ and $I_c$ indicating frame, width, height and color space. For instance, a 750-frame MPEG-4 video clip in RGB color can be tensorized as a four-order tensor with $I_f = 750$. In some applications, RGB color is transformed to gray level using the equation $Gray = 0.299R + 0.587G + 0.114B$, and the representation is replaced by a three-order tensor. Fig. 5 shows the process of transforming a video clip into a four-order tensor; a short sketch follows.

Fig. 5. Representing a video clip as a four-order tensor: frames × width × height × color channels (red, green, blue).
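As an illustration of the video tensorization just described, the sketch below builds a four-order (frame × width × height × color) tensor from decoded frames and collapses it to a three-order gray-level tensor. The synthetic frames and the 720 × 576 resolution are hypothetical stand-ins, not values taken from the paper.

```python
import numpy as np

# Hypothetical clip: 750 decoded MPEG-4 frames at an assumed 720x576 resolution,
# arranged as a 4-order tensor with orders (frame, width, height, color).
frames, width, height = 750, 720, 576
video = np.random.randint(0, 256, size=(frames, width, height, 3),
                          dtype=np.uint8)   # stand-in for real decoded frames
print(video.shape)                          # (750, 720, 576, 3)

# Gray-level conversion drops the color order, Gray = 0.299R + 0.587G + 0.114B,
# turning the 4-order tensor into a 3-order tensor (frame, width, height).
gray = video @ np.array([0.299, 0.587, 0.114])
print(gray.shape)                           # (750, 720, 576)
```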
Extensible Markup Language (XML) is semi-structured. Fig. 6 shows a simple XML document with seven elements and one attribute. The elements contain tags and contents, both consisting of characters from the Unicode repertoire. An XML document has a hierarchical structure and can be parsed as a tree; Fig. 6(b) is the parsed tree of the document in Fig. 6(a), reproduced below:

<?xml version="1.0" encoding="UTF-8"?>
<University>
  <Student Category="doctoral">
    <ID>...</ID>
    <Name>Liang Chen</Name>
    <Research>
      <Area>Internet of Things</Area>
      <Focus>Architecture;Sensor Ontology</Focus>
    </Research>
  </Student>
</University>

An XML document can be tensorized as a three-order tensor, where $I_{er}$ and $I_{ec}$ indicate the row and column orders of the markup matrix and $I_{en}$ represents the content vector order. For example, the content dimensionality of the tensorized document above is 28, the length of the element Focus. Relationships among element, attribute and text are represented as numbers; in Fig. 6(c), the number 1 indicates a parent-child relationship.

Fig. 6. Representing XML document data as a three-order tensor: (a) the initial XML document, (b) the parsed tree, (c) the relationships between elements, and (d) the resulting three-order tensor.
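Returning to the tensor extension operator of Definition 5, the following sketch shows one plausible reading of it on sparse sub-tensors: identical orders are merged, values on coinciding indices are accumulated, and diverse orders are kept. The dict layout and the convention that an element takes index 0 along orders it did not originally possess are assumptions of this sketch, not prescribed by the paper.

```python
def extend(A, B):
    """Sketch of the tensor extension operator of Definition 5.
    A sub-tensor is a dict with an ordered tuple of order names and a sparse
    {index-tuple: value} map. Shared orders are merged; values that land on
    the same full index are accumulated; diverse orders are kept, with the
    assumed convention that missing orders contribute index 0."""
    orders = list(dict.fromkeys(list(A["orders"]) + list(B["orders"])))
    data = {}
    for src in (A, B):
        for key, val in src["data"].items():
            idx = dict(zip(src["orders"], key))
            full = tuple(idx.get(o, 0) for o in orders)  # pad missing orders
            data[full] = data.get(full, 0) + val         # accumulate overlaps
    return {"orders": orders, "data": data}

# Two sub-tensors sharing the fixed orders (t, s, u), each with one
# extensible order (i1 for video, i2 for XML, say):
A = {"orders": ("t", "s", "u", "i1"), "data": {(1, 0, 0, 2): 5.0}}
B = {"orders": ("t", "s", "u", "i2"), "data": {(1, 0, 0, 3): 7.0}}
C = extend(A, B)
print(C["orders"])   # ['t', 's', 'u', 'i1', 'i2']
print(C["data"])     # {(1, 0, 0, 2, 0): 5.0, (1, 0, 0, 0, 3): 7.0}
```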

Relational databases are widely used to manage structured data. In a database table, simple fields of number or character string type can be modeled as a matrix; for a complex field, e.g. a BLOB, new orders are needed for its representation. In Fig. 7, the structured GPS data (Record, StudentID, Longitude, Latitude, Time) and student data (Record, StudentID, StudentName) are unified as a five-order tensor.

Fig. 7. The upper table is modeled as a four-order sub-tensor over $I_t \times I_y \times I_x \times I_{id}$, the lower table as a two-order sub-tensor over $I_{id} \times I_{name}$, and the two sub-tensors are unified as a five-order tensor.

4.3 Unified Tensor Representation Model

Big data are composed of unstructured data $d_u$, semi-structured data $d_{semi}$ and structured data $d_s$. To process all types of heterogeneous data, a unified data tensorization operation is performed:

$f: (d_u, d_{semi}, d_s) \rightarrow \mathcal{T}_u \oplus \mathcal{T}_{semi} \oplus \mathcal{T}_s$. (8)

With Eq. (7) and Eq. (8), $d_u$, $d_{semi}$ and $d_s$ are transformed into sub-tensors $\mathcal{T}_u$, $\mathcal{T}_{semi}$ and $\mathcal{T}_s$, which are then integrated into a unified tensor. For example, on the basis of the transformed video clip, XML document and structured tables described in Figs. 5-7, the final tensor is obtained as

$\mathcal{T} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_w \times I_h \times I_{er} \times I_{ec} \times I_{en} \times I_{id} \times I_{na}}$. (9)

In Eq. (9), order $I_f$ is identical to order $I_t$, orders $I_x$ and $I_y$ are combined into order $I_s$, and order $I_c$ is unnecessary because gray level is adopted. Since too many orders may increase the decomposition complexity, fewer orders are preferable at the data representation stage. An element of the ten-order tensor in Eq. (9) is described as an eleven-tuple

$e = (T, SP, U, W, H, ER, EC, EN, ID, NA, V)$, (10)

where $T$, $SP$ and $U$ refer to the fixed orders time, space and user, $W$ and $H$ denote the orders from video data, $ER$, $EC$ and $EN$ are XML document characteristics, $ID$ and $NA$ are for GPS data, and $V$ is the value of element $e$. Tuples generated from a heterogeneous tensor are usually sparse, and only the nonzero elements are essential for storage and computation. The generalized tuple format according to Eq. (6) is defined as

$e = (T, SP, U, i_1, \ldots, i_P, V)$. (11)

Fig. 8 illustrates the extensible order tensor model from another point of view. The fixed part containing $T$, $SP$ and $U$ is seen as an overall layer, while the extensible part is deemed an inner layer. The tensor is simplified as a two-layer model in which the inner model is embedded into the three-order ($I_t \times I_s \times I_u$) overall model. Using the tensorization method, the heterogeneous data are modeled as sub-tensors that are inserted into the two-layer model to generate the unified tensor.

Fig. 8. Visualization of the two-layer model for data representation, with GPS, video and XML document sub-tensors embedded in the $I_t \times I_s \times I_u$ overall model.

4.4 Tensor Order and Tensor Dimension

As tensor order and tensor dimension are two key concepts for data representation, we give a brief comparison between them. Tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_P}$ has $P$ orders, and order $i$ ($1 \le i \le P$) has $I_i$ dimensions. A $P$-order tensor can be unfolded into $P$ matrices. For the mode-$i$ unfolded matrix $T_{(i)}$, the number of rows is equal to $I_i$, while the number of columns is equal to $\prod_{1 \le j \le P,\, j \ne i} I_j$. In many big data applications, it is impractical to store all dimensions of big data, which contain redundancy, uncertainty, inconsistency and incompleteness; thus it is essential to extract the valuable core data. During the extraction of the core data set, the number of tensor orders remains the same while the dimensionality is significantly reduced. A sketch of the sparse tuple storage of Eq. (11) follows.
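Eq. (11) amounts to coordinate-format (COO) sparse storage: each nonzero element of the unified tensor is kept as an index tuple plus its value. A minimal sketch, with hypothetical indices:

```python
import numpy as np

# Generalized tuples e = (T, SP, U, i1, ..., iP, V) from Eq. (11), i.e.
# coordinate-format (COO) storage: indices plus value, nonzeros only.
# The concrete indices and shape below are hypothetical.
elements = [
    # (time, space, user, i1, i2, value)
    (0, 3, 1, 2, 0, 0.8),
    (1, 3, 1, 0, 5, 1.0),
    (1, 4, 2, 2, 5, 0.3),
]
shape = (2, 8, 4, 3, 6)

# Densify on demand, e.g. before unfolding and decomposition:
T = np.zeros(shape)
for *idx, v in elements:
    T[tuple(idx)] = v
print(np.count_nonzero(T), "nonzeros out of", T.size, "entries")
```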
5 INCREMENTAL TENSOR DIMENSIONALITY REDUCTION

In this section, a novel method is proposed for dimensionality reduction on streaming data. Firstly, two problems of tensor decomposition are defined. Then two equivalence theorems are proven, and an Incremental High-Order Singular Value Decomposition (IHOSVD) method that can efficiently compute the core data sets on streaming data is presented. Finally, the complexity and accuracy of the proposed method are discussed.

5.1 Problem Definitions

Two important problems related to incremental tensor dimensionality reduction are (1) the recalculation problem and (2) the order inconsistency problem. They are formally defined below.

Problem 1: Tensor Decomposition Recalculation. Let $\mathcal{S}_1$ denote the core tensor obtained from the previous tensor $\mathcal{T}_1$, and let $\mathcal{T}$ denote a new tensor. Combining $\mathcal{T}_1$ with $\mathcal{T}$, we obtain $\mathcal{T}_2 = \mathcal{T}_1 \oplus \mathcal{T}$. According to Eq. (4), the new core tensor $\mathcal{S}_2$ of the new tensor $\mathcal{T}_2$ is computed as

$\mathcal{S}_2 = \mathcal{T}_2 \times_1 U_1^T \times_2 U_2^T \cdots \times_P U_P^T$. (12)

Decomposition recalculation occurs in Eq. (12) because the previous decomposition results, obtained while computing the core tensor $\mathcal{S}_1$, are not reused. Problem 1 can be solved using Algorithm 1 and Algorithm 2, which are designed around the proposed recursive incremental singular value decomposition method.

Problem 2: Tensor Order Inconsistency. Let $\mathcal{T}_1$, $\mathcal{S}_2$ and $\mathcal{T}_2$ be defined as above: the previous tensor, the new core tensor and the new combined tensor. To compute $\mathcal{S}_2$ with Eq. (4), the row number of each truncated orthogonal matrix $U$ must be consistent with the dimensionality of the corresponding tensor order. However, one order dimensionality of the combined tensor $\mathcal{T}_2$ is not equal to the row number of the corresponding truncated orthogonal matrix. For instance, let $\mathcal{T}_1 \in \mathbb{R}^{2 \times 2 \times 2}$ be a three-order tensor, so that $T_{1(1)} \in \mathbb{R}^{2 \times 4}$, $T_{1(2)} \in \mathbb{R}^{2 \times 4}$ and $T_{1(3)} \in \mathbb{R}^{2 \times 4}$ are its three unfolded matrices. Given a new tensor $\chi \in \mathbb{R}^{2 \times 2 \times 2}$, combining it with the previous tensor $\mathcal{T}_1$ along the third order $I_3$ yields $\mathcal{T}_2 \in \mathbb{R}^{2 \times 2 \times 4}$. The third order dimensionality of $\mathcal{T}_2$ is 4, while the row number of the truncated orthogonal basis computed from matrix $T_{1(3)}$ is 2. This leads to order inconsistency. In this paper, Theorem 1, Theorem 2 and Algorithm 3 are presented to address Problem 2.

5.2 Basis and Core Tensor Equivalence Theorems

The left singular vector matrix $U$ plays a key role in dimensionality reduction and data reconstruction. Similarly, the truncated rank-$k$ unitary orthogonal bases $U_1, U_2, \ldots, U_P$ of the unfolded matrices form the most basic coordinate axes of a $P$-order tensor. For heterogeneous big data dimensionality reduction, the major difficulty lies in computing the bases over variable dimensions. Our approach extends the dimension to a fixed length and finds an equivalent basis. Two theorems are presented and proven to support this approach.

Theorem 1: Basis Equivalence of SVD. Let $M_1 \in \mathbb{R}^{m \times n_1}$, and let $M_2 \in \mathbb{R}^{m \times n_2}$ be the matrix whose left $n_1$ columns contain $M_1$ and whose remaining $n_2 - n_1$ columns are zeros; namely, $M_2 = [M_1\ 0]$, $n_1 < n_2$. If the singular value decompositions of $M_1$ and $M_2$ are

$M_1 = U_1 \Sigma_1 V_1^T, \quad M_2 = U_2 \Sigma_2 V_2^T$, (13)

then the unitary orthogonal basis $U_1$ is equivalent to $U_2$.

Proof. From Eq. (13), we obtain

$M_2 M_2^T = [M_1\ 0] \begin{bmatrix} M_1^T \\ 0 \end{bmatrix} = M_1 M_1^T$. (14)

Considering

$M_2 M_2^T = U_2 \Sigma_2 V_2^T V_2 \Sigma_2^T U_2^T = U_2 (\Sigma_2 \Sigma_2^T) U_2^T$ (15)

and

$M_1 M_1^T = U_1 \Sigma_1 V_1^T V_1 \Sigma_1^T U_1^T = U_1 (\Sigma_1 \Sigma_1^T) U_1^T$, (16)

we obtain

$U_1 (\Sigma_1 \Sigma_1^T) U_1^T = U_2 (\Sigma_2 \Sigma_2^T) U_2^T$. (17)

Note that the two sides of Eq. (17) are spectral decompositions of two equal symmetric matrices. Additionally, the diagonal matrices $\Sigma_1 \Sigma_1^T$ and $\Sigma_2 \Sigma_2^T$ consist of the eigenvalues of this matrix. By the uniqueness of eigenvalues, $\Sigma_1 \Sigma_1^T$ and $\Sigma_2 \Sigma_2^T$ are equal, and it can be concluded that $U_1$ is equivalent to $U_2$. The equivalence implies that $U_1$ can be calculated by multiplying $U_2$ with a series of elementary matrices [9].

Based on Theorem 1, the following two corollaries can be derived.

Corollary 1: Let $M_1 = [v_1, v_2, \ldots, v_n]$ and $M_2 = [v_1, v_2, \ldots, 0, \ldots, 0, \ldots, v_n]$, where each $v_i$ is a column vector; then the two matrices have equivalent left singular vector bases.

Corollary 2: Suppose $M_2 = \begin{bmatrix} M_1 \\ 0 \end{bmatrix}$; then matrices $M_1$ and $M_2$ have equivalent left singular vector bases. With Corollary 2, the orthogonal basis $U_1$ can be obtained by trimming the bottom zeros of the orthogonal basis $U_2$.

Theorem 1 and Corollaries 1 and 2 are employed to prove Theorem 2. Before the proof, we introduce a special matrix that will be used in Theorem 2.
Definition 6: Extension Matrix. An extension matrix is defined as

$M = \begin{bmatrix} I \\ 0 \end{bmatrix}, \quad M \in \mathbb{R}^{J_p \times I_p}, \ J_p > I_p$.

Multiplying a $P$-order tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_p \times \cdots \times I_P}$ by the extension matrix $M$ along order $p$ extends the dimensionality of that order from $I_p$ to $J_p$.

Theorem 2: Core Tensor Equivalence of HOSVD. Let $\mathcal{G} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_p \times \cdots \times I_P}$ and $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times (l I_p) \times \cdots \times I_P}$ be $P$-order tensors, where $l$ is a positive integer, and let $M \in \mathbb{R}^{(l I_p) \times I_p}$ be an extension matrix, so that

$\mathcal{T} = \mathcal{G} \times_p M = \mathcal{G} \times_p \begin{bmatrix} I \\ 0 \end{bmatrix}$.

Then the core tensors satisfy $\mathcal{S}_T = \mathcal{S}_G \times_p M$. A numerical sanity check is given next; the formal proof follows it.
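Both equivalences are easy to check numerically. The sketch below verifies Theorem 1 (zero columns appended) and the effect of the extension matrix on a mode-p unfolding (Definition 6 with Corollary 2); the usual caveat applies that singular vectors are determined only up to sign, which the sketch normalizes away.

```python
import numpy as np

# Theorem 1: appending zero columns leaves the left singular basis unchanged.
m, n1, n2 = 5, 3, 7
M1 = np.random.rand(m, n1)
M2 = np.hstack([M1, np.zeros((m, n2 - n1))])          # M2 = [M1 0]
U1 = np.linalg.svd(M1, full_matrices=False)[0]
U2 = np.linalg.svd(M2, full_matrices=False)[0][:, :n1]
signs = np.sign(np.sum(U1 * U2, axis=0))              # columns match up to sign
print(np.allclose(U1, U2 * signs))                    # True

# Definition 6 / Corollary 2: the extension matrix M = [I; 0] pads zero rows
# onto the mode-p unfolding, and the left basis of the extended unfolding is
# the original basis padded with zeros.
Ip, l, cols = 3, 2, 8
M = np.vstack([np.eye(Ip), np.zeros(((l - 1) * Ip, Ip))])
Gp = np.random.rand(Ip, cols)                         # mode-p unfolding of G
Tp = M @ Gp                                           # unfolding of T = G x_p M
UG = np.linalg.svd(Gp, full_matrices=False)[0]
UT = np.linalg.svd(Tp, full_matrices=False)[0][:, :Ip]
signs = np.sign(np.sum(UG * UT[:Ip, :], axis=0))
print(np.allclose(UT[:Ip, :] * signs, UG))            # True
print(np.allclose(UT[Ip:, :], 0))                     # True: trailing zero rows
```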

Proof. Unfold tensors $\mathcal{T}$ and $\mathcal{G}$ into $P$ matrices $T_{(1)}, T_{(2)}, \ldots, T_{(P)}$ and $G_{(1)}, G_{(2)}, \ldots, G_{(P)}$. According to Theorem 1 and Corollaries 1 and 2, the corresponding unfolded matrices of $\mathcal{T}$ and $\mathcal{G}$ have equivalent left singular vector bases. Besides, the p-mode products of a tensor by matrices $A$ and $B$ possess the following properties:

$\mathcal{T} \times_i A \times_j B = \mathcal{T} \times_j B \times_i A \quad (i \ne j)$ (18)

and

$\mathcal{T} \times_i A \times_i B = \mathcal{T} \times_i (BA)$. (19)

Employing Eq. (4), the core tensors $\mathcal{S}_T$ and $\mathcal{S}_G$ are calculated as

$\mathcal{S}_T = \mathcal{T} \times_1 U_1^T \times_2 U_2^T \cdots \times_P U_P^T$ (20)

and

$\mathcal{S}_G = \mathcal{G} \times_1 U_1^T \times_2 U_2^T \cdots \times_P U_P^T$. (21)

With Eqs. (18)-(21), we obtain

$\mathcal{S}_T = \mathcal{T} \times_1 U_1^T \cdots \times_P U_P^T = (\mathcal{G} \times_p M) \times_1 U_1^T \cdots \times_P U_P^T = (\mathcal{G} \times_1 U_1^T \cdots \times_P U_P^T) \times_p M = \mathcal{S}_G \times_p M$. (22)

Theorem 2 reveals that extending a tensor by padding zero elements does not substantially transform the core tensor. After the unified representation of big data, the order numbers of the incremental tensor and the initial tensor are equal, but the dimensionalities differ; Theorem 2 solves this problem by resizing the dimensionality.

5.3 Incremental High Order Singular Value Decomposition

We propose the IHOSVD method for incremental dimensionality reduction on streaming data. IHOSVD consists of three algorithms used for recursive matrix singular value decomposition and incremental tensor decomposition. The three algorithms are described in detail below.

Algorithm 1 is a recursive algorithm with the recursive function given in Eq. (23). During execution, function $f$ calls itself (Step 4) over and over again to decompose matrices $M_i$ and $C_i$. Each successive call reduces the size of the matrix and moves closer to a solution; when matrix $M_1$ is finally reached, the recursion stops and the function exits.

$f(M_i, C_i) = \begin{cases} \mathrm{svd}(M_1), & i = 1 \\ \mathrm{mix}(f(M_{i-1}, C_{i-1}), C_i), & i > 1 \end{cases}$ (23)

Algorithm 1: Recursive matrix singular value decomposition, $(U, \Sigma, V) = R\_MSvd(M_i, C_i)$.
Input: Initial matrix $M_i$. Incremental matrix $C_i$.
Output: Decomposition results $U$, $\Sigma$, $V$ of matrix $[M_i\ C_i]$.
1: if ($i == 1$) then
2:   $[U, \Sigma, V] = \mathrm{svd}(M_1)$.
3: else
4:   $[U_j, \Sigma_j, V_j] = R\_MSvd(M_{i-1}, C_{i-1})$.
5:   $[U, \Sigma, V] = \mathrm{mix}(M_{i-1}, C_{i-1}, U_j, \Sigma_j, V_j)$.
6: end if
7: return $U$, $\Sigma$, $V$.

Algorithm 1 calls function mix (Step 5) to merge the column vectors of the incremental matrix with the decomposed components of the initial matrix. The additional vectors are projected onto the orthogonal bases, and their coordinates are combined with the singular values. The detailed procedure of function mix is described in Algorithm 2. For most tensor unfoldings, the number of rows is less than the number of columns; for such matrices, Algorithm 1 can efficiently compute the singular values and singular vectors by splitting the columns for recursive decomposition.

Fig. 9. (a) Incrementally incoming column vectors are projected onto the unitary orthogonal bases; (b) the middle quasi-diagonal matrix is diagonalized and the previous singular vector matrices are updated.

Algorithm 2: Merge an incremental matrix with previous decomposition results, $(U, \Sigma, V) = \mathrm{mix}(M_{i-1}, C_{i-1}, U_j, \Sigma_j, V_j)$.
Input: Initial matrix $M_{i-1}$ and incremental matrix $C_{i-1}$. Decomposition results $U_j$, $\Sigma_j$, $V_j$ of matrix $M_{i-1}$.
Output: New decomposition results $U$, $\Sigma$, $V$.
1: Project $C_{i-1}$ onto the orthogonal space spanned by $U_j$: $L = U_j^T C_{i-1}$.
2: Compute $H$, which is orthogonal to $U_j$: $H = C_{i-1} - U_j L$.
3: Obtain the unitary orthogonal basis $J$ from matrix $H$.
4: Compute the coordinates of matrix $H$: $K = J^T H$.
5: Execute SVD on the quasi-diagonal middle matrix: $[\bar{U}, \bar{\Sigma}, \bar{V}] = \mathrm{svd}\left(\begin{bmatrix} \Sigma_j & L \\ 0 & K \end{bmatrix}\right)$.
6: Obtain the new decomposition results: $U = [U_j\ J]\,\bar{U}$, $\Sigma = \bar{\Sigma}$, $V = \begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix} \bar{V}$.
7: return $U$, $\Sigma$, $V$.

Algorithm 2 applies the SVD updating technique [3] for incremental matrix factorization. The additional columns in matrix $C_{i-1}$ are projected onto the unitary orthogonal bases of the previous matrix $M_{i-1}$ (Step 1).
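Before walking through the remaining steps in detail, here is a runnable numpy sketch of the whole update: the function below is Algorithm 2 (mix), and the loop driving it is the recursion of Eq. (23) unrolled, i.e. Algorithm 1. It is an illustrative reconstruction of the SVD-updating technique, not the authors' implementation.

```python
import numpy as np

def svd_append_columns(U, S, Vt, C):
    """Algorithm 2 (mix): update the thin SVD M = U @ diag(S) @ Vt when new
    columns C arrive, returning the thin SVD of [M C]."""
    k, c = U.shape[1], C.shape[1]
    L = U.T @ C                                  # Step 1: coordinates in span(U)
    H = C - U @ L                                # Step 2: part orthogonal to U
    J, K = np.linalg.qr(H)                       # Steps 3-4: new basis J, coords K
    # Quasi-diagonal middle matrix of Eq. (25)
    Q = np.block([[np.diag(S), L],
                  [np.zeros((c, k)), K]])
    Ub, Sb, Vbt = np.linalg.svd(Q, full_matrices=False)   # Step 5
    U_new = np.hstack([U, J]) @ Ub               # Step 6, first half of Eq. (26)
    n = Vt.shape[1]
    Vt_new = Vbt @ np.block([[Vt, np.zeros((k, c))],      # second half of Eq. (26)
                             [np.zeros((c, n)), np.eye(c)]])
    return U_new, Sb, Vt_new

# Algorithm 1 unrolled: start from svd(M_1), then mix in column batches.
rng = np.random.default_rng(0)
M = rng.random((20, 5))
U, S, Vt = np.linalg.svd(M, full_matrices=False)
for _ in range(3):
    C = rng.random((20, 2))
    U, S, Vt = svd_append_columns(U, S, Vt, C)
    M = np.hstack([M, C])
print(np.allclose(M, U @ np.diag(S) @ Vt))       # True up to round-off
```

In practice the factors would be truncated back to rank $k$ after each update, which is what keeps the per-update cost at the $O(k^2 n)$ analyzed in Section 5.4.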
Some of the incoming column vectors are linear combinations of the orthogonal unitary basis $U_j$, while others have components orthogonal to the space spanned by $U_j$. As illustrated in Fig. 9, these two types of vectors are separated to obtain the bases $U_j$ and $J$ and the coordinates $L$ and $K$.

These operations are implemented as Steps 2-4 of Algorithm 2. The column space of the singular vector matrix $U$ is spanned by the direct sum of the two unitary orthogonal bases:

$CS(U) = \mathrm{span}([U_j\ J])$. (24)

Combining the coordinates with the previous singular values, we obtain a quasi-diagonal sparse matrix that is easy to decompose. The new factorization consisting of the above orthogonal bases and coordinates is

$[M_{i-1}\ C_{i-1}] = [U_j\ J] \begin{bmatrix} \Sigma_j & L \\ 0 & K \end{bmatrix} \begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix}^T$. (25)

Let $\bar{U}$ and $\bar{V}$ denote the unitary orthogonal bases of the quasi-diagonal matrix in Eq. (25); the updated singular vector matrices are

$U = [U_j\ J]\,\bar{U}, \quad V = \begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix} \bar{V}$. (26)

Eq. (4) suggests that only the left singular vector matrix $U$ is essential for tensor decomposition; therefore, the computation of matrix $V$ can be omitted in Step 6 of Algorithm 2.

Employing the above two algorithms, we propose Algorithm 3 for incrementally computing the core tensor. In this algorithm, the extension matrix is used to ensure order consistency (Step 1). The unitary orthogonal bases $U_{(1)}, \ldots, U_{(P)}$ are updated in Steps 2 to 4, and the new core tensor $\mathcal{S}$ is obtained in Step 6. For demonstration purposes, Fig. 10 shows a simple example with a three-order tensor.

Algorithm 3: Incremental tensor singular value decomposition, $(\mathcal{S}, [U, \Sigma, V]_{new}) = I\_TSvd(\chi, \mathcal{T}, [U, \Sigma, V]_{initial})$.
Input: New tensor $\chi$. Previous tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_P}$. Previous SVD results $[U, \Sigma, V]_{initial}$ of the unfolded matrices.
Output: New truncated SVD results $[U, \Sigma, V]_{new}$. New core tensor $\mathcal{S}$.
1: Extend tensor $\chi$ and tensor $\mathcal{T}$ to identical dimensionalities.
2: Unfold the new tensor $\chi$ into matrices $\chi_{(1)}, \ldots, \chi_{(P)}$.
3: Call algorithm R_MSvd to update the above unfolded matrices.
4: Truncate the new orthogonal bases.
5: Combine the new tensor $\chi$ with the initial tensor $\mathcal{T}$.
6: Obtain the new core tensor $\mathcal{S}$ with p-mode products.
7: return $\mathcal{S}$ and $[U, \Sigma, V]_{new}$.

Fig. 10. Example of incremental tensor decomposition; the truncated orthogonal bases $U_1$, $U_2$, $U_3$ of the new tensor are updated incrementally through extension, unfolding, HOSVD and update steps.

A sketch of Algorithm 3, built on the update routine above, follows.
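Building on the earlier sketches (unfold, mode_product and svd_append_columns), the following is a compact reading of Algorithm 3 for the common case where new data arrive along the last order; in the general case the smaller tensor would first be padded with the extension matrix of Definition 6, which Theorem 2 shows is harmless.

```python
import numpy as np

def ihosvd_update(T_old, X, bases, ranks):
    """Sketch of Algorithm 3 (IHOSVD) for a new block X arriving along the
    last order. bases[p] holds the thin SVD (U, S, Vt) of the mode-p
    unfolding of T_old; unfold, mode_product and svd_append_columns are
    the helpers defined in the earlier sketches."""
    P = T_old.ndim
    # Step 1 (extension) is trivial here because T_old and X already agree
    # on every order except the last.
    T_new = np.concatenate([T_old, X], axis=P - 1)        # Step 5
    U_trunc = []
    for p in range(P - 1):
        # Steps 2-3: along every other order, the new block appends columns
        # to the mode-p unfolding (up to a column permutation, which leaves
        # the left basis unchanged), so the stored SVD is updated in place.
        bases[p] = svd_append_columns(*bases[p], unfold(X, p))
        U_trunc.append(bases[p][0][:, :ranks[p]])         # Step 4: truncate
    # The grown order itself is refactorized directly in this simple sketch.
    bases[P - 1] = np.linalg.svd(unfold(T_new, P - 1), full_matrices=False)
    U_trunc.append(bases[P - 1][0][:, :ranks[P - 1]])
    # Step 6: core tensor via p-mode products, S = T x_1 U1^T ... x_P UP^T.
    S = T_new
    for p in range(P):
        S = mode_product(S, U_trunc[p].T, p)
    return S, bases, T_new

# Usage: seed with a full HOSVD of the first block, then stream new blocks.
T0 = np.random.rand(6, 6, 4)
bases = [np.linalg.svd(unfold(T0, p), full_matrices=False) for p in range(3)]
S, bases, T0 = ihosvd_update(T0, np.random.rand(6, 6, 2), bases, ranks=(3, 3, 3))
print(S.shape)   # (3, 3, 3)
```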
5.4 Complexity and Approximation Accuracy

5.4.1 Time Complexity

The execution time of the proposed IHOSVD method consists of matrix unfolding, incremental singular value decomposition of each unfolded matrix, and the product of the tensor by the truncated bases. Let $Time_{unf}$, $Time_{isvd}$ and $Time_{prod}$ denote the time used by these processes respectively; the total time consumption satisfies

$Time = Time_{unf} + Time_{isvd} + Time_{prod}$. (27)

Tensor unfolding is a simple transformation with $O(1)$ time complexity. $Time_{isvd}$ is equal to $Time_1 + Time_2 + \cdots + Time_P = \sum_{i=1}^{P} Time_i$, where $Time_i$ refers to the time consumed by unfolded matrix $T_{(i)}$. According to Eq. (23), $Time_{isvd}$ can be obtained with

$Time(i) = \begin{cases} C_1, & i = 1 \\ Time(i-1) + C_2, & i > 1, \end{cases}$ (28)

where $C_1$ and $C_2$ are constants. The recursive calling process first adds columns and then updates them with the previous decomposition results. The time complexity of decomposing one unfolded matrix is $O(k^2 n)$, where $k$ refers to the number of truncated left singular vectors. For a truncated orthogonal basis $U$ with $k$ column vectors, the time complexity of the product of a tensor by the matrix is also $O(k^2 n)$. To decompose a $P$-order tensor with $P$ unfolded matrices, the time complexity of the proposed IHOSVD method is $O(1) + O(P k^2 n) + O(P k^2 n)$, namely $O(P k^2 n)$.

5.4.2 Memory Usage

Let $Mem_u$ denote the memory used to store all truncated orthogonal bases, and let $Mem_{r\_msvd}$ and $Mem_{mix}$ refer to the memory usage of the recursive process in Algorithm 1. The total memory used by the proposed IHOSVD method is

$Mem = Mem_u + Mem_{r\_msvd} + Mem_{mix}$. (29)

The complexity of $Mem_u$ is $O(kn)$. To incrementally compute the core tensor, the IHOSVD method needs to keep all the truncated orthogonal bases; the corresponding memory usage is $\sum_{i=1}^{P} k_i I_i$.

According to Eq. (23), the memory needed during the recursive process is

$|M_i| + |C_i| + |M_{i-1}| + |C_{i-1}| + \cdots + |M_1| + |C_1|$. (30)

The complexity of this memory usage is $O(kn)$. Therefore, the complexity of the total memory usage is $O(kn) + O(kn)$, i.e. $O(kn)$; for a $P$-order tensor with $P$ unfolded matrices, the complexity is $O(Pkn)$.

5.4.3 Approximation Accuracy

The reconstruction error between the initial tensor $\mathcal{T}$ and the approximate tensor $\hat{\mathcal{T}}$ can be exactly measured with the Frobenius norm [10]:

$\|\mathcal{T} - \hat{\mathcal{T}}\|_F = \left( \sum_{i_1=1}^{I_1} \cdots \sum_{i_P=1}^{I_P} (a_{i_1 \ldots i_P} - \hat{a}_{i_1 \ldots i_P})^2 \right)^{1/2}$. (31)

For the unfolded matrix $T_{(i)}$ of the initial tensor, the approximate matrix is $\hat{T}_{(i)} = U_i \Sigma_i V_i^T$; the reconstruction error is caused by the approximation of all unfolded matrices. To clearly analyze the degrees of tensor dimensionality reduction and tensor approximation, we present two ratios.

Definition 7: The Dimensionality Reduction Ratio of tensor $\mathcal{T}$ is defined as

$\rho = \frac{nnz(\mathcal{S}) + \sum_{i=1}^{P} nnz(U_i)}{nnz(\mathcal{T})}$, (32)

where $\mathcal{S}$ denotes the core tensor and $U_i$ is the mode-$i$ truncated orthogonal basis. The core data sets of tensor $\mathcal{T}$ are composed of $\mathcal{S}$ and $U_1, U_2, \ldots, U_P$. Because only the nonzero elements of the core data sets are stored, the ratio $\rho$ accurately reflects the degree of dimensionality reduction.

Definition 8: The Reconstruction Error Ratio of tensor $\mathcal{T}$ is defined as

$e = \frac{\|\mathcal{T} - \hat{\mathcal{T}}\|_F}{\|\mathcal{T}\|_F}$. (33)

The ratio $e$ reflects the degree of reconstruction error under the tensor Frobenius norm. In this paper, the pair $(\rho, e)$ is employed to describe the dimensionality reduction degree and the reconstruction error degree; the ratio $\rho$ is inversely related to the ratio $e$.

Computation accuracy is important for tensor data approximation, and in most applications HOSVD-type algorithms find a good approximation. To obtain higher accuracy, the High-Order Orthogonal Iteration (HOOI) [11] method can be utilized to find the best low-rank approximation. The High-Order Singular Value Decomposition (HOSVD) and the Higher Order Orthogonal Iteration (HOOI) of a tensor can both be viewed as extensions of the SVD.

6 CASE STUDY

In this section, we illustrate the proposed unified data representation model and incremental dimensionality reduction method with an intelligent transportation case. The test data used in the experiments consist of unstructured video data collected with fixed cameras and mobile phones, semi-structured XML documents about traffic information, and structured trajectory data. After dimensionality reduction, the core tensor and the truncated bases are small to store, yet allow accurate and fast reconstruction of the big data.

6.1 Demonstration of Tensor Unfolding

We construct a five-order tensor by extracting three frames from an unstructured video clip and three users from a semi-structured XML document. Fig. 11(a) shows the five unfolded matrices of this tensor; the five orders represent height, width, color space, time and user respectively.

Fig. 11. Heterogeneous tensor unfolding and incremental tensor unfolding: (a) the five unfolded matrices of the five-order tensor; (b) incremental data on the unfolded matrices of an eight-order tensor.

To demonstrate incremental tensor unfolding, an eight-order tensor $\mathcal{T} \in \mathbb{R}^{I_t \times I_s \times I_u \times I_h \times I_w \times I_c \times I_{ec} \times I_{er}}$ is constructed. Incremental data are appended along the time order $I_t$. The unfolded matrices of the combined new tensor (initial tensor plus incremental tensor) are shown in Fig. 11(b).
Order inconsistency of the new tensor occurs in order $I_t$ because the incremental data are appended as rows at the bottom of the unfolded matrix. The sketch below computes the $(\rho, e)$ pair of Definitions 7 and 8 for a small truncated HOSVD.
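To make the measurements that follow concrete, this sketch evaluates the dimensionality reduction ratio ρ (Eq. (32)) and the reconstruction error ratio e (Eq. (33)) for a small random tensor, reusing the unfold and mode_product helpers from the Section 2 sketch; the tensor and the ranks are hypothetical stand-ins for the paper's test data.

```python
import numpy as np

T = np.random.rand(10, 10, 10)          # hypothetical stand-in for test data
ranks = (4, 4, 4)

# Truncated HOSVD: bases and core (Eqs. (2) and (4), via the earlier helpers).
U = [np.linalg.svd(unfold(T, p), full_matrices=False)[0][:, :k]
     for p, k in enumerate(ranks)]
S = T
for p in range(T.ndim):
    S = mode_product(S, U[p].T, p)
T_hat = S
for p in range(T.ndim):
    T_hat = mode_product(T_hat, U[p], p)

# Dimensionality reduction ratio rho (Eq. 32) and error ratio e (Eq. 33).
nnz = np.count_nonzero
rho = (nnz(S) + sum(nnz(Up) for Up in U)) / nnz(T)
e = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"rho = {rho:.2f}, e = {e:.3f}")   # a smaller core set means a larger error
```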

Fig. 11(a), Fig. 11(b) and Fig. 8 in Section 4 illustrate the tensor model from different viewpoints and demonstrate how the heterogeneous data are stacked together. Fig. 8 demonstrates the procedure of embedding unstructured video data and a semi-structured XML document into a three-order tensor, while Fig. 11(a) and Fig. 11(b) show the inner elements of the unified tensor model.

6.2 Dimensionality Reduction and Approximation Error

There exists a tradeoff between dimensionality reduction and approximation error. Fig. 12 shows two video frames reconstructed from the above five-order tensor under three different approximation error ratios, namely 4%, 7%, and 24%. Fig. 13(a) plots the two ratios together and illustrates that the reconstruction error ratio increases gradually as the dimensionality reduction ratio decreases. The core data sets are composed of the core tensor $\mathcal{S}$ and the truncated orthogonal bases $U_1, \ldots, U_5$. Fig. 13(b) shows their proportions within the dimensionality reduction ratio; generally, the proportion of the core tensor is bigger than that of the truncated bases.

Fig. 12. Video frames reconstructed with different approximation error ratios.

Fig. 13. (a) Tradeoff between the dimensionality reduction ratio (ρ) and the reconstruction error ratio (e) across experiments; (b) proportion of the core tensor versus the truncated bases $U_1$, $U_2$, $U_3$.

Diverse data types can result in different dimensionality reduction ratios and approximation error ratios. With repeated experiments on video clips, XML documents and GPS data, the results show that a core set containing 18% of the elements can generally guarantee 93% accuracy. In practice, the balance between dimensionality reduction and computation accuracy is determined by the application requirements.

6.3 Time and Memory Comparison

Compared with the general High Order Singular Value Decomposition method, the proposed incremental High Order Singular Value Decomposition method is efficient and memory-saving. To evaluate the two decomposition methods, we run them on computers with an Intel Core(TM) i5 CPU at 3.2 GHz (4 cores in total) and 8 GB RAM. We divide the unified tensor into four blocks and normalize the tensor size as well as the decomposition time for better comparison. During dimensionality reduction, the general HOSVD method integrates the additional tensor blocks with the previous tensor blocks to generate a new tensor, which is then decomposed again from scratch. Different from this repeated HOSVD method, the incremental HOSVD method updates the truncated orthogonal bases and dynamically computes the core tensor. Fig. 14 demonstrates that the decomposition time of the repeated HOSVD method is greater than that of the incremental HOSVD method. Additionally, the decomposition time of the incremental HOSVD method increases more gently than that of the repeated HOSVD method as the normalized tensor size grows. Once the normalized tensor size grows beyond 0.75, the repeated HOSVD method runs out of memory while the incremental HOSVD method continues to run.
From a theoretical point of view, as more orthogonal bases are appended to the left singular vector matrix, the middle quasi-diagonal matrix contains fewer columns left to orthogonalize, and the time consumed by the diagonalization process decreases. In brief, the incremental HOSVD method is more efficient because it projects the additional tensor unfoldings onto the previously truncated orthogonal bases rather than directly executing the full orthogonalization procedure.

Fig. 14. Comparison between the repeated HOSVD method and the incremental HOSVD method: normalized decomposition time versus normalized tensor size; the repeated method runs out of memory at large tensor sizes.

7 RELATED WORK

This section reviews related work on data representation and high order singular value decomposition.

Data Representation: Big data are composed of unstructured, semi-structured and structured data. In particular, multimedia, as unstructured data, is mostly encoded with MPEG-4 and H.264. MPEG-4 [12] is a method for defining the compression of audio and visual digital data. H.264 [13] is a widely used standard for video compression. The semi-structured Extensible Markup Language (XML) [14] is a flexible text format that defines a set of rules for encoding documents; XML is both human-readable and machine-readable, and the characteristics making up an XML document are divided into markup and content. Kim and Candan [15] proposed a tensor-based relational data model that can process multi-dimensional structured data. Ontology, such as the Resource Description Framework (RDF) [16] and the Web Ontology Language (OWL) [17], is playing an ever more important role in the exchange of a wide variety of data.

Higher Order Singular Value Decomposition: A tensor [6, 7] is the generalization of a matrix and is usually called a multidimensional array. The tensor is an effective data representation model from which valuable information can be extracted using the high order singular value decomposition (HOSVD) [8] method. Because HOSVD imposes orthogonality constraints on the truncated column bases, it may be considered a special case of the commonly used Tucker [18] decomposition. Although the low-rank truncation of the HOSVD is not the best approximation of the initial data, it is considered sufficiently good for many applications. Analysis and mining of data with HOSVD has been adopted in many applications such as tag recommendation [19, 20], trajectory indexing and retrieval [21], and hand-written digit classification [22].

Studies of data representation and dimensionality reduction have been widely reported in the literature. However, a unified model for heterogeneous data representation has been neglected, and the decomposition problems that arise during incremental data processing have not been considered. The contributions of this paper are a unified tensor model for representing large-scale heterogeneous data and an efficient approach for extracting a high-quality core tensor that is small but contains valuable information.

8 CONCLUSION

This paper aims at representing and processing the large-scale heterogeneous data generated from multiple sources. Firstly, we present a unified tensor-based data representation model that can integrate unstructured, semi-structured and structured data. Secondly, based on the proposed model, an incremental high order singular value decomposition (IHOSVD) method is proposed for dimensionality reduction on big data. We prove two theorems that solve the problems of decomposition recalculation and order inconsistency. Finally, an intelligent transportation case is investigated to evaluate the method. Theoretical analyses and experimental results of the case study provide evidence that the proposed data representation model and incremental dimensionality reduction method are promising, and they pave the way for efficient mining and analysis in big data applications.
9 ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China and by the Fundamental Research Funds for the Central Universities, HUST: CXY13Q017 and 2013QN122.

REFERENCES

[1] I. F. Cruz and H. Xiao, "Ontology Driven Data Integration in Heterogeneous Networks," in Complex Systems in Knowledge-Based Environments: Theory, Models and Applications. Springer, 2009.
[2] H. Abdi and L. J. Williams, "Principal Component Analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, 2010.
[3] M. Brand, "Incremental Singular Value Decomposition of Uncertain Data with Missing Values," in Computer Vision — ECCV 2002. Springer, 2002.
[4] J. Sun, D. Tao, and C. Faloutsos, "Beyond Streams and Graphs: Dynamic Tensor Analysis," in Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.

[5] E. Henry, J. Hofrichter et al., "Singular Value Decomposition: Application to Analysis of Experimental Data," Essential Numerical Computer Methods, vol. 210.
[6] C. M. Martin, "Tensor Decompositions Workshop Discussion Notes," American Institute of Mathematics.
[7] T. G. Kolda and B. W. Bader, "Tensor Decompositions and Applications," SIAM Review, vol. 51, no. 3, 2009.
[8] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A Multilinear Singular Value Decomposition," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, 2000.
[9] H. Anton, Elementary Linear Algebra. Wiley.
[10] C. Meyer, Matrix Analysis and Applied Linear Algebra. SIAM, 2000.
[11] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, 2000.
[12] I. E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley.
[13] D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG4 Advanced Video Coding Standard and Its Applications," IEEE Communications Magazine, vol. 44, no. 8, 2006.
[14] E. Van der Vlist, XML Schema: The W3C's Object-Oriented Descriptions for XML. O'Reilly Media, Inc.
[15] M. Kim and K. S. Candan, "Approximate Tensor Decomposition within a Tensor-Relational Algebraic Framework," in Proc. of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011.
[16] I. Horrocks, P. F. Patel-Schneider, and F. Van Harmelen, "From SHIQ and RDF to OWL: The Making of a Web Ontology Language," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 1, no. 1, 2003.
[17] D. L. McGuinness, F. Van Harmelen et al., "OWL Web Ontology Language Overview," W3C Recommendation, 2004.
[18] L. R. Tucker, "Some Mathematical Notes on Three-Mode Factor Analysis," Psychometrika, vol. 31, no. 3, 1966.
[19] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, "Tag Recommendations Based on Tensor Dimensionality Reduction," in Proc. of the 2008 ACM Conference on Recommender Systems. ACM, 2008.
[20] R. Wetzker, C. Zimmermann, C. Bauckhage, and S. Albayrak, "I Tag, You Tag: Translating Tags for Advanced User Models," in Proc. of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 2010.
[21] Q. Li, X. Shi, and D. Schonfeld, "A General Framework for Robust HOSVD-Based Indexing and Retrieval with High-Order Tensor Data," in Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011.
[22] B. Savas and L. Eldén, "Handwritten Digit Classification Using Higher Order Singular Value Decomposition," Pattern Recognition, vol. 40, no. 3, 2007.

Liwei Kuang is currently studying for the PhD degree in the School of Computer Science and Technology at Huazhong University of Science and Technology, Wuhan, China. He received the master's degree in computer science from Hubei University of Technology, Wuhan, China. From 2004 to 2012, he was a Research Engineer with FiberHome Technologies Group, Wuhan, China. His research interests include big data, pervasive computing and cloud computing.

Fei Hao is an assistant professor at Huazhong University of Science and Technology.
He received the B.S. and M.S. degrees from the School of Mathematics and Computer Engineering, Xihua University, Chengdu, China, in 2005 and 2008, respectively. He was a research assistant at the Korea Advanced Institute of Science and Technology and the Hangul Engineering Research Center, Korea. He has published over 30 research papers in international and national journals and conferences. His research interests include social computing, big data analysis and processing, and mobile cloud computing.

Laurence T. Yang received the B.E. degree in Computer Science and Technology from Tsinghua University, China, and the PhD degree in Computer Science from the University of Victoria, Canada. He is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology, China, and in the Department of Computer Science at St. Francis Xavier University, Canada. His research interests include parallel and distributed computing, embedded and ubiquitous/pervasive computing, and big data. His research has been supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation.

Man Lin received the B.E. degree in Computer Science and Technology from Tsinghua University, China. She received the Lic. and Ph.D. degrees from the Department of Computer and Information Science at Linköping University, Sweden, in 1997 and 2000, respectively. She is currently an associate professor in Computer Science at St. Francis Xavier University, Canada. Her research interests include system design and analysis, power-aware scheduling, and optimization algorithms. Her research is supported by NSERC (the Natural Sciences and Engineering Research Council of Canada) and CFI (the Canada Foundation for Innovation).

Changqing Luo received his B.E. and M.E. degrees from Chongqing University of Posts and Telecommunications in 2004 and 2007, respectively, and the Ph.D. degree from Beijing University of Posts and Telecommunications in 2011, all in Electrical Engineering. After graduation, he joined the School of Computer Science and Technology, Huazhong University of Science and Technology in 2011, where he currently works as an Assistant Professor. His current research focuses on algorithms and optimization for wireless networks, cooperative communication, green communication, resource management in heterogeneous wireless networks, and mobile cloud computing.

Geyong Min is a Professor of High Performance Computing and Networking in the Department of Mathematics and Computer Science within the College of Engineering, Mathematics and Physical Sciences at the University of Exeter, United Kingdom. He received the PhD degree in Computing Science from the University of Glasgow, United Kingdom, in 2003, and the B.Sc. degree in Computer Science from Huazhong University of Science and Technology, China. His research interests include next generation Internet, wireless communications, multimedia systems, information security, high performance computing, ubiquitous computing, and modelling and performance engineering.


More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 [email protected],

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

An Efficient and Scalable Management of Ontology

An Efficient and Scalable Management of Ontology An Efficient and Scalable Management of Ontology Myung-Jae Park 1, Jihyun Lee 1, Chun-Hee Lee 1, Jiexi Lin 1, Olivier Serres 2, and Chin-Wan Chung 1 1 Korea Advanced Institute of Science and Technology,

More information

Nonlinear Iterative Partial Least Squares Method

Nonlinear Iterative Partial Least Squares Method Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

More information

Computational Optical Imaging - Optique Numerique. -- Deconvolution --

Computational Optical Imaging - Optique Numerique. -- Deconvolution -- Computational Optical Imaging - Optique Numerique -- Deconvolution -- Winter 2014 Ivo Ihrke Deconvolution Ivo Ihrke Outline Deconvolution Theory example 1D deconvolution Fourier method Algebraic method

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING

TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING Sunghae Jun 1 1 Professor, Department of Statistics, Cheongju University, Chungbuk, Korea Abstract The internet of things (IoT) is an

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments

MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments 23rd European Signal Processing Conference EUSIPCO) MUSIC-like Processing of Pulsed Continuous Wave Signals in Active Sonar Experiments Hock Siong LIM hales Research and echnology, Singapore hales Solutions

More information

Solving Systems of Linear Equations

Solving Systems of Linear Equations LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India [email protected], [email protected]

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Data Storage 3.1. Foundations of Computer Science Cengage Learning

Data Storage 3.1. Foundations of Computer Science Cengage Learning 3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how

More information

Server Load Prediction

Server Load Prediction Server Load Prediction Suthee Chaidaroon ([email protected]) Joon Yeong Kim ([email protected]) Jonghan Seo ([email protected]) Abstract Estimating server load average is one of the methods that

More information

Lecture 5: Singular Value Decomposition SVD (1)

Lecture 5: Singular Value Decomposition SVD (1) EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

More information

Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research

Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research Algorithmic Techniques for Big Data Analysis Barna Saha AT&T Lab-Research Challenges of Big Data VOLUME Large amount of data VELOCITY Needs to be analyzed quickly VARIETY Different types of structured

More information

Load Distribution on a Linux Cluster using Load Balancing

Load Distribution on a Linux Cluster using Load Balancing Load Distribution on a Linux Cluster using Load Balancing Aravind Elango M. Mohammed Safiq Undergraduate Students of Engg. Dept. of Computer Science and Engg. PSG College of Technology India Abstract:

More information

FCE: A Fast Content Expression for Server-based Computing

FCE: A Fast Content Expression for Server-based Computing FCE: A Fast Content Expression for Server-based Computing Qiao Li Mentor Graphics Corporation 11 Ridder Park Drive San Jose, CA 95131, U.S.A. Email: qiao [email protected] Fei Li Department of Computer Science

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

The Image Deblurring Problem

The Image Deblurring Problem page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation

More information

On the Standardization of Semantic Web Services-based Network Monitoring Operations

On the Standardization of Semantic Web Services-based Network Monitoring Operations On the Standardization of Semantic Web Services-based Network Monitoring Operations ChenglingZhao^, ZihengLiu^, YanfengWang^ The Department of Information Techonlogy, HuaZhong Normal University; Wuhan,

More information

Low-resolution Character Recognition by Video-based Super-resolution

Low-resolution Character Recognition by Video-based Super-resolution 2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro

More information

Keywords: Image Generation and Manipulation, Video Processing, Video Factorization, Face Morphing

Keywords: Image Generation and Manipulation, Video Processing, Video Factorization, Face Morphing TENSORIAL FACTORIZATION METHODS FOR MANIPULATION OF FACE VIDEOS S. Manikandan, Ranjeeth Kumar, C.V. Jawahar Center for Visual Information Technology International Institute of Information Technology, Hyderabad

More information

Process Mining by Measuring Process Block Similarity

Process Mining by Measuring Process Block Similarity Process Mining by Measuring Process Block Similarity Joonsoo Bae, James Caverlee 2, Ling Liu 2, Bill Rouse 2, Hua Yan 2 Dept of Industrial & Sys Eng, Chonbuk National Univ, South Korea jsbae@chonbukackr

More information

Email Spam Detection Using Customized SimHash Function

Email Spam Detection Using Customized SimHash Function International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

More information

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER Gholamreza Anbarjafari icv Group, IMS Lab, Institute of Technology, University of Tartu, Tartu 50411, Estonia [email protected]

More information

Linear Algebra Review. Vectors

Linear Algebra Review. Vectors Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka [email protected] http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

More information

A simple and fast algorithm for computing exponentials of power series

A simple and fast algorithm for computing exponentials of power series A simple and fast algorithm for computing exponentials of power series Alin Bostan Algorithms Project, INRIA Paris-Rocquencourt 7815 Le Chesnay Cedex France and Éric Schost ORCCA and Computer Science Department,

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

Chapter 6. Orthogonality

Chapter 6. Orthogonality 6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

More information

Binary Image Scanning Algorithm for Cane Segmentation

Binary Image Scanning Algorithm for Cane Segmentation Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch [email protected] Tom

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning By: Shan Suthaharan Suthaharan, S. (2014). Big data classification: Problems and challenges in network

More information

ADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING

ADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING Development of a Software Tool for Performance Evaluation of MIMO OFDM Alamouti using a didactical Approach as a Educational and Research support in Wireless Communications JOSE CORDOVA, REBECA ESTRADA

More information

Internet Video Streaming and Cloud-based Multimedia Applications. Outline

Internet Video Streaming and Cloud-based Multimedia Applications. Outline Internet Video Streaming and Cloud-based Multimedia Applications Yifeng He, [email protected] Ling Guan, [email protected] 1 Outline Internet video streaming Overview Video coding Approaches for video

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015 Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015 Lecture: MWF: 1:00-1:50pm, GEOLOGY 4645 Instructor: Mihai

More information

Similar matrices and Jordan form

Similar matrices and Jordan form Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms

A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms Dr. Mohammad V. Malakooti Faculty and Head of Department of Computer Engineering, Islamic Azad University, UAE

More information

Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA [email protected], [email protected] Abstract. A speed

More information

Operation Count; Numerical Linear Algebra

Operation Count; Numerical Linear Algebra 10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

More information

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder Performance Analysis and Comparison of 15.1 and H.264 Encoder and Decoder K.V.Suchethan Swaroop and K.R.Rao, IEEE Fellow Department of Electrical Engineering, University of Texas at Arlington Arlington,

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

Masters in Human Computer Interaction

Masters in Human Computer Interaction Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from

More information

Big Data Driven Knowledge Discovery for Autonomic Future Internet

Big Data Driven Knowledge Discovery for Autonomic Future Internet Big Data Driven Knowledge Discovery for Autonomic Future Internet Professor Geyong Min Chair in High Performance Computing and Networking Department of Mathematics and Computer Science College of Engineering,

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker

More information

RN-coding of Numbers: New Insights and Some Applications

RN-coding of Numbers: New Insights and Some Applications RN-coding of Numbers: New Insights and Some Applications Peter Kornerup Dept. of Mathematics and Computer Science SDU, Odense, Denmark & Jean-Michel Muller LIP/Arénaire (CRNS-ENS Lyon-INRIA-UCBL) Lyon,

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

NETCONF-based Integrated Management for Internet of Things using RESTful Web Services

NETCONF-based Integrated Management for Internet of Things using RESTful Web Services NETCONF-based Integrated Management for Internet of Things using RESTful Web Services Hui Xu, Chunzhi Wang, Wei Liu and Hongwei Chen School of Computer Science, Hubei University of Technology, Wuhan, China

More information

RN-Codings: New Insights and Some Applications

RN-Codings: New Insights and Some Applications RN-Codings: New Insights and Some Applications Abstract During any composite computation there is a constant need for rounding intermediate results before they can participate in further processing. Recently

More information

by the matrix A results in a vector which is a reflection of the given

by the matrix A results in a vector which is a reflection of the given Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that

More information

General Framework for an Iterative Solution of Ax b. Jacobi s Method

General Framework for an Iterative Solution of Ax b. Jacobi s Method 2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

More information

Applied Linear Algebra I Review page 1

Applied Linear Algebra I Review page 1 Applied Linear Algebra Review 1 I. Determinants A. Definition of a determinant 1. Using sum a. Permutations i. Sign of a permutation ii. Cycle 2. Uniqueness of the determinant function in terms of properties

More information

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively. Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

More information

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

More information

Statistical Modeling of Huffman Tables Coding

Statistical Modeling of Huffman Tables Coding Statistical Modeling of Huffman Tables Coding S. Battiato 1, C. Bosco 1, A. Bruna 2, G. Di Blasi 1, G.Gallo 1 1 D.M.I. University of Catania - Viale A. Doria 6, 95125, Catania, Italy {battiato, bosco,

More information

Greedy Column Subset Selection for Large-scale Data Sets

Greedy Column Subset Selection for Large-scale Data Sets Knowledge and Information Systems manuscript No. will be inserted by the editor) Greedy Column Subset Selection for Large-scale Data Sets Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel Received:

More information