The Data-as-a-Service Framework for Cyber-Physical-Social Big Data Laurence Tianruo Yang ( 杨 天 若 ) Huazhong University of Science and Technology, China St Francis Xavier University, Canada 1
Outline 1. Hyper (Cyber-Physical-Social)World 2. Big Data and Challenges 3. A Tensor-Based Approach 4. Illustrations and Examples 5. Conclusion Huazhong University of Science and Technology 2
Hyper (Cyber-Physical-Social) World Social World Physical World Cyber World Hyper World What we are and what we can do in the Hyper World? Huazhong University of Science and Technology 4
Hyperspace Cyber computation, communication, and control that are discrete, logical, and switched Physical natural and human-made systems governed by the laws of physics and operating in continuous time Social socially produced space, serves as a tool of thoughts, action, mind, sense, interaction, communications, etc. Hyperspace = Cyber Space + Physical Space + Social Space Huazhong University of Science and Technology 5
Social Space Social Comuting (Internet of People) u-human-computer Interface (BCI) Social networks Cyber networks Pervasive/Ubiquitous Intelligence Cyber Space (hybrid world) u-things (Internet of Things) u-systems/services Physical Space Physical networks 6
Characteristics of Hyperworld Its basic characteristic is direct mapping between virtual and real worlds via active devices including sensors, actuators, micro-machines, robots, etc. -- Kunii, Ma & Huang, Keynote Hyperworld Modeling in VIS 96 C Data/Info Explosion! Last 5 years web data > 5,000 years whole data C C C Connection Explosion! C2C D2D H2H M2M CPS T2T (IoT) Service/App Explosion! WbS SaaS PaaS IaaS EaaS Cloud Intelligence Explosion! Ubi-Sensing CA TTT AmI Smart W/P Huazhong University of Science and Technology 7
Intelligent Computing Waves 1 st : AI (Logic/KL-based, cyber-based) - Machine learning - NLP & Comp-Vision - Robot & game theory - Expert system - Knowledge/Reasoning - DAI (Distributed AI) 3 rd : Social-based - Autonomous software - Multi/Massive agents - Agent language - Agent negotiation & cooperation - Personal/social behavior - Web intelligence/semantics 2 nd : Physical-based - Probabilistic computing - Fuzzy logic - Neural network - GA/Evolutionary computing - Chaotic/Swarm computing - Biologic computing 4 th : Cyber-Physical-Social - Physical/everyday things intelligence - Atop of the above 3 intelligent comp - Scale, dynamic, heterogeneous, spontaneous - Predictable, controllable, adaptable, manageable, ethic, - Others-aware & self-aware mind/spirit? Huazhong University of Science and Technology 8
HyperWorld Individuals Companies/Societies Data Social Space Social networks Internet /WWW SEA-net Data Cyber space Cyber networks Data Physical Space Physical networks Huazhong University of Science and Technology 10
Workstation Workstation Agents Workstation Workstation Workstation Workstation Social World Workstation Human Companies/Societies Socio-culture & organizational components Individuals activities Services Wisdom Knowledge Information Data Data Cycle Cyber World Things The Internet of Things Physical World Huazhong University of Science and Technology 11
Outline 1. Smart Word and Hyper World 2. Big Data and Challenges 3. A Tensor-Based Approach 4. Illustrations and Examples 5. Conclusion Huazhong University of Science and Technology 12
Big Data What s the difference between large data and big data? Large Data: GB Massive Data: TB Big Data: PB Just Size? Huazhong University of Science and Technology 13
Features of Big Data 4Vs Features Volume Terabytes, Petabytes, Volume Variety Structured Semi-Structured Un-Structured Velocity Variety Big Data Veracity Streaming Data Processing time requirement Veracity Velocity Quality of Data Huazhong University of Science and Technology 14
Cyber-Physical-Social Data Huazhong University of Science and Technology 15
Challenges Caused by Big Data Volume Large Scale Data, Limited Compute Resources Variety Diversified Format, Unified Representation Velocity Requirement for quick processing Veracity Inconsistent, Incomplete, Redundant Data Huazhong University of Science and Technology 16
How to tackle the problems caused by the 4Vs of Big Data? Huazhong University of Science and Technology 17
Outline 1. Smart Word and Hyper World 2. Big Data and Challenges 3. A Tensor-Based Framework 4. Illustrations and Examples 5. Conclusion Huazhong University of Science and Technology 18
A Tensor-Based Approach 3.1 Tensor and Tensor Decomposition 3.2 4Vs Problem with Tensor 3.3 A Framework for Big Data Analysis and Mining Huazhong University of Science and Technology 19
Tensor Definition Mathematics: Tensor Computer Science: Multiple Dimensional Arrays One Order Tensor: Vector Three Order Tensor Two Order Tensor: Matrix L.Ddecomposition. SIAM Journal of Matrix Analysis and Applications, 21(4), pages 1253 1278, (2000) Huazhong University of Science and Technology 20
Tensor Computations and Algebra Kronecker product. Tensor-tensor product Inverse Identity tensor Transpose Tensor-matrix mode product Stefan Ragnarsson, Ph.D thesis, 2012, Cornell 1 2 3 4 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 Equivalently: Unfolding on mode 1 (take mode 1 fibers and linearize).
Illustration Model-N Unfolding Huazhong University of Science and Technology 22
Illustration d 2 d 1 d 3 d 2 d 2 d 1 d 3 unfold transpose d 2 d 1 d 3 transpose d 2 d 1 d 3 fold d 1 d 2 d 3 Rotate 90º clockwise. d 1 d 3 d 1 Rotate and repeat. d 3 d 2 d 3 d 2 d 1 d 2 d 1 d 3
Formal Description a, i I, j I, k I, l I, m I, n I ijklmn t x y z c u Huazhong University of Science and Technology 24
Tensor Decomposition Tensor decomposition for dimension reduction Big data are modeled as tensors Big Data Huazhong University of Science and Technology 25
Huazhong University of Science and Technology 26
Decomposition Process Data SVD Matrix data SVD Approximation Data A (1) A (2) a a 11 1n m1 a a mn A U V Coresets A (k) Huazhong University of Science and Technology 27
Huazhong University of Science and Technology 28
Singular Value Decomposition (SVD) X = U V T X U V T 1 2 v 1 x (1) x (2) x (M) = u 1 u 2 u k.. v 2 k v k singular values right singular vectors input data left singular vectors 2007 Jimeng Sun
Huazhong University of Science and Technology 30
Huazhong University of Science and Technology 31
Huazhong University of Science and Technology 32
Huazhong University of Science and Technology 33
Huazhong University of Science and Technology 34
Huazhong University of Science and Technology 35
Huazhong University of Science and Technology 36
Huazhong University of Science and Technology 37
Huazhong University of Science and Technology 38
Huazhong University of Science and Technology 39
Huazhong University of Science and Technology 40
Huazhong University of Science and Technology 41
Huazhong University of Science and Technology 42
Huazhong University of Science and Technology 43
Huazhong University of Science and Technology 44
Incremental SVD A U V 0 Update the singular vectors and singular values by using the incremental decomposition results of matrix A1 Incremental Matrix 1 ' ' ' V 0 A0 A U J U V 0 I Updated U Updated V Updated Embedded and Pervasive Computing Lab 45
HO-SVD Tensor Decomposition Approximation Assemble Model-N Unfolding Huazhong University of Science and Technology 46
Approximate Tensor Calculation Matrix Unfolding SVD for Every Unfolding Matrix c Approximate Tensor Core Tensor Huazhong University of Science and Technology 47
Incremental HOSVD Original SVD Incremental SVD Huazhong University of Science and Technology 48
A Tensor-Based Approach 3.1 Tensor and Tensor Decomposition 3.2 4Vs Problem with Tensor 3.3 A Framework for Big Data Analysis and Mining Huazhong University of Science and Technology 49
Variety Volume <University> <Student> <Category> One frame <ID> <Name> <Research> Variety Big Data Veracity I y I x I name I name I y I x I id I id I t I id Velocity Huazhong University of Science and Technology 50
Frame y a Video z l k j x XML Documents RDF OWL m n i t Users Social Relationships c 6-Order Tensor u
Variety 201 76 44 234 123 67 9 134 84 Embedded and Pervasive Computing Lab 52
I s GPS Video I u L. Kuang, F. Hao, L.T. Yang, M. Lin, C. Luo and G. Min, A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction, IEEE Transactions on Emerging Topics in Computing (TETC), 2014, DOI: 10.1109/TETC.2014.2330516. I t XML Document Unified Representation for Big Data L. Kuang, F. Hao, L.T. Yang, M. Lin, C. Luo and G. Min, A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction, published at IEEE Transactions on Emerging Topics in Computing (TETC), 2014.
Velocity Recalculation Volume time Variety Big Data Veracity t n t 3 t 2 t 1 Velocity CPU Memory... CPU Memory Huazhong University of Science and Technology 54
研 究 内 容 3: 大 数 据 简 约 计 算 方 法 研 究 正 交 基 增 量 张 量 增 量 核 心 张 量 增 量 Updated Basis Incremental Data 55
Incremental HO-SVD Huazhong University of Science and Technology 56
Veracity Volume Variety Big Data Veracity Original Tensor Core Tensor Velocity Extract High Quality Core Tensor from PB(TB) Level Medial Tensor Error Constraints Huazhong University of Science and Technology 57
Volume Volume High Order Tensor Variety Big Data Value Matrix Unfolding A A( u ) () t... Velocity Incremental Lanczos at matrix level...... Ci 1 C i k Ci 1 C i k Three Layers Incremental Parallel Strategies Two layers incremental SVD parallel strategies on servers Huazhong University of Science and Technology 58
Transparent Computing vs. Cloud Computing Computation in sever: Synthetic SVD Results Compute Core Tensor All computation are performed in sever SVD Incremental SVD Compute Core Tensor Transparent Server Transparent Network Cloud Computing Transparent Client Computation in client: SVD Incremental SVD Embedded and Pervasive Computing Lab 59
Distributed Computing with Multi-Objective Optimization f 31 f 32 f...... f 21 f 22 f... TN......... f 11 f 12 f...... Computation Time Memory Usage Energy Consumption Communication Traffic OPT : minz Tim Mem Egy Tra 1 2 3 4 L. Kuang, Y. Zhang, L.T. Yang, J. Chen, F. Hao, and C. Luo, A Holistic Approach to Distributed Dimensionality Reduction of Big Data, Accepted at IEEE Transactions on Cloud Computing (TCC), 2014.. Huazhong University of Science and Technology 60
A Tensor-Based Approach 3.1 Tensor and Tensor Decomposition 3.2 4Vs Problem with Tensor 3.3 A Framework for Big Data Analysis and Mining Huazhong University of Science and Technology 61
Big Data Framework Huazhong University of Science and Technology 62
The Framework 1. Secure Dimensionality Reduction 2. Deep Computation Models 3. Big Data Ranking and Retrieval Huazhong University of Science and Technology 63
1. Secure Dimensionality Reduction Huazhong University of Science and Technology
1. Secure Dimensionality Reduction Huazhong University of Science and Technology 65
1. Secure Dimensionality Reduction L. Kuang, L.T. Yang, and J. Feng, A Novel Secure Big Data Dimensionality Reduction Scheme, Accepted at IEEE Transactions on Cloud Computing (TCC), 2015. Huazhong University of Science and Technology 66
2. Big Data + Tensor +Deep Learning = Deep Computation Basic Deep Computation Model: Basic Deep Computation Model: Input/Output: Tensor-Based Data A General Deep Learning Model for Big Data Hidden: Tensor-Based Features Unsupervised Feature Learning Parameter: Tensor-Based Parameters Bridge Between Representation and Mining Clustering Classification Prediction Deep Computation Model Tensor-Based Representation Huazhong University of Science and Technology
2. Big Data + Tensor+ Deep Learning = Deep Computation Basic Deep Computation Model: Basic Deep Computation Model Input/Output: Tensor-Based Data Tensor-Based Computing Hidden: Tensor-Based Features Tensor Distance Data Distribution Parameter: Tensor-Based Parameters Cost Function Tensor Distance Huazhong University of Science and Technology I 1 I 2 I N X R x X i I j N l i 12 i... i N j j N ; (1, 1 ) j 1 l i ( i 1) I 1 j 2 j 0 1 0 I 1 I 2 I N T TD l, m 1 l m l l m m d g ( x y ) ( x y ) ( x y ) G( x y ) g lm 2 1 p p 2 2 exp{ l m 2 } 2 2 p p ( i i ) ( i i ) ( i i ) 2 2 2 l m 2 1 1 2 2 N N Q. Zhang, L.T. Yang, and Z. Chen, A Deep Computational Model for Unsupervised Feature Learning for Big Data, IEEE Transactions on Service Computers (TSC), major revision, 2015.
2. Incremental Deep Computation Model Incremental Model Requirements: Key Challenges: Adaption: Learn features on new samples Parameters updating incrementally: nonlinear optimization problem Preservation: Preserve previous results Structure updating incrementally: adapting the structure of tensor autoencoder with preserving the previous knowledge Fast: Update parameters and structure in real-time Huazhong University of Science and Technology
2. Incremental Deep Computation Model Parameters Updating Incrementally: Structure Updating Incrementally: Update: parameters to Update: Structure and Parameters Preserving: Structure Idea: Incorporate neutrons into hidden layers Objective function: Special case: auto-encoder Approximation: 1 ( x, ) T T ( x, ) 1 T J ( x; ) (( ) x) (( ) x) 2 2 Solution: General case: Q. Zhang, L.T. Yang, and Z. Chen, An Incremental Deep Computation Model for Big Data Feature Learning, IEEE Transactions on Service Computers (TSC), Major Revision, 2015. Huazhong University of Science and Technology
2. Secure Deep Computation Model on Cloud Goals: Overall Scheme: Improve efficiency: Using cloud computing Protect privacy: Adopting BGV encryption Challenges: Choose high efficient full encryption scheme Activation function approximation Client: Only encryption and decryption operations Cloud: All the computing tasks Q. Zhang, L.T. Yang, and Z. Chen, Privacy Preserving Deep Computation Model on Cloud for Big Data Feature Learning, Accepted at IEEE Transactions on Computers (TC), 2015. Huazhong University of Science and Technology
3. Big Data Ranking and Retrieval Video 5-order Tensor XML Documents 4-order Tensor Users Tensorization 3-order Tensor Data Space: Represent heterogeneous data as sub tensors. Huazhong University of Science and Technology 72
3. Big Data Ranking and Retrieval Sub-Tensor 子 张 量 Sub-Tensor Sub-Tensor... (Relation 关 联 张 量 Tensor) 子 张 量 Sub-Tensor Sub-Tensor Metric Space: Map all the sub tensors to a Hamming Space, and obtain the similarity between all sub tensors. The similarities are stored in the relation tensor. Huazhong University of Science and Technology 73
Huazhong University of Science and Technology 74
Huazhong University of Science and Technology 75
3. Big Data Ranking and Retrieval Sub-Tensor r = T n r Vector r denotes the rank of the data object Huazhong University of Science and Technology 76
3. Big Data Ranking and Retrieval Similarity Tensor Distance... I1 I2... I N d g ( x y )( x y ) TD lm l l m m lm, T ( x y) G( x y) In the metric space, the similarity is computed with tensor distance. g lm 1 P P l m 2 exp 2 2 2 1 2 1 2 L. Kuang, L.T. Yang, and H. Liu, 3T5R: A Tensor-based Data-as-a-Service Framework for Cyber-Physical-Social Big Data, Accepted at IEEE Transactions on Cloud Computing (TCC), 2015. Huazhong University of Science and Technology 77
Outline 1. Smart Word and Hyper World 2. Big Data and Challenges 3. A Tensor-Based Approach 4. Illustration and Example 5. Conclusion Huazhong University of Science and Technology 81
Illustration 1: Huazhong University of Science and Technology 82
研 究 内 容 4: 校 园 大 数 据 环 境 下 人 的 影 子 数 据 应 用 实 例 子 张 量 子 张 量 子 张 量 Original Tensor Shadow Data 学 生 Relation Tensor 子 张 量 Sub Tensor Sub Tensor Shadow Data 学 生 学 生 子 张 量 子 张 量 子 张 量 子 张 量 子 张 量 Cloud V 2 Client 正 交 基 增 量 Tensor Deep Computation for Mining and Analysis Basis : 1 1... 2 T V 1 S SubTensor 2 V 3 n n 核 心 张... Original 量 增 量 Core Tensor Tensor 张 量 增 量 OPT minz F F F 83
Illustration 2: Huazhong University of Science and Technology 84
Heterogeneous Data from Intelligent Device Huazhong University of Science and Technology 85
Four Step to Construct Six-Order Tensor Smart Home Social Space: Father, Mother, Children Physical Space: Bedroom, Dining room Cyber Space: Computer, Electronic devices Huazhong University of Science and Technology 86
Initial Tensor Huazhong University of Science and Technology 87
Tensor Decomposition Huazhong University of Science and Technology 88
Approximate Tensor Construction Huazhong University of Science and Technology 89
Likelihood Generated by Approximation Tensor Huazhong University of Science and Technology 90
Outline 1. Smart Word and Hyper World 2. Big Data and Challenges 3. A Tensor-Based Approach 4. Illustration and Example 5. Conclusion Huazhong University of Science and Technology 91
Problem A tensor-based framework for big data Representations, Relations, Reductions, Retrieval, Reasoning and Recommendations! Huazhong University of Science and Technology 92