
Tensor Factorization for Multi-Relational Learning

Maximilian Nickel¹ and Volker Tresp²

¹ Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany, nickel@dbs.ifi.lmu.de
² Siemens AG, Corporate Technology, Otto-Hahn-Ring 6, Munich, Germany, volker.tresp@siemens.com

Abstract. Tensor factorization has emerged as a promising approach for solving relational learning tasks. Here we review recent results on a particular tensor factorization approach, Rescal, which has demonstrated state-of-the-art relational learning results while scaling to knowledge bases with millions of entities and billions of known facts.

1 Introduction

Exploiting the information contained in the relationships between entities has been essential for solving a number of important machine learning tasks. For instance, social network analysis, bioinformatics, and artificial intelligence all make extensive use of relational information, as do large knowledge bases such as Google's Knowledge Graph or the Semantic Web. It is well known that, in these and similar domains, statistical relational learning (SRL) can improve learning results significantly over non-relational methods. However, despite the success of SRL in specific applications, wider adoption has been hindered by multiple factors: without extensive prior knowledge about a domain, existing SRL methods often have to resort to structure learning, a process that is both time-consuming and error-prone. Moreover, inference is often based on methods such as MCMC and variational inference, which introduce additional scalability issues. Recently, tensor factorization has been explored as an approach that overcomes some of these problems and leads to highly scalable solutions. Tensor factorizations realize multilinear latent factor models and contain commonly used matrix factorizations as the special case of bilinear models.
We will discuss tensor factorization for relational learning by means of Rescal [6,7,5], which is based on the factorization of a third-order tensor and which has shown excellent learning results, outperforming state-of-the-art SRL methods and related tensor-based approaches on benchmark data sets. Moreover, Rescal is highly scalable, such that large knowledge bases can be factorized, which is currently out of scope for most SRL methods. In our review of this model, we will also exemplify the general benefits of tensor factorization for relational learning, as considered recently in approaches like [10,8,1,4,2]. In the following, we will mostly follow the notation outlined in [3] and assume that all relationships are of dyadic form.

Fig. 1. Factorization of an adjacency tensor $\mathcal{X}$ using the Rescal model.
Fig. 2. Graphical plate model of Rescal.

2 Relational Learning via Tensor Factorization

Dyadic relational data has a natural representation as an adjacency tensor $\mathcal{X} \in \mathbb{R}^{n \times n \times m}$ whose entries $x_{ijk}$ correspond to all possible relationships between $n$ entities over $m$ different relations. The entries of $\mathcal{X}$ are set to

$$x_{ijk} = \begin{cases} 1, & \text{if the relationship } \mathrm{relation}_k(i, j) \text{ exists} \\ 0, & \text{otherwise.} \end{cases}$$

Rescal [6] is a latent factor model for relational learning, which factorizes an adjacency tensor $\mathcal{X}$ into a core tensor $\mathcal{R} \in \mathbb{R}^{r \times r \times m}$ and a factor matrix $A \in \mathbb{R}^{n \times r}$ such that

$$\mathcal{X} \approx \mathcal{R} \times_1 A \times_2 A. \tag{1}$$

Equation (1) can be equivalently specified as $x_{ijk} \approx a_i^T R_k a_j$, where the column vector $a_i \in \mathbb{R}^r$ denotes the $i$-th row of $A$ and the matrix $R_k \in \mathbb{R}^{r \times r}$ denotes the $k$-th frontal slice of $\mathcal{R}$. Consequently, $a_i$ corresponds to the latent representation of the $i$-th entity, while $R_k$ models the interactions of the latent variables for relation $k$. The dimensionality $r$ of the latent space $A$ is a user-given parameter which specifies the complexity of the model. The symbol $\approx$ denotes approximation under a given loss function. Figure 1 shows a visualization of the factorization. Probabilistically, eq. (1) can be interpreted as estimating the joint distribution over all possible relationships

$$P(\mathcal{X} \mid A, \mathcal{R}) = \prod_{i=1}^{n} \prod_{j=1}^{n} \prod_{k=1}^{m} P(x_{ijk} \mid a_i^T R_k a_j). \tag{2}$$

Hence, a Rescal factorization of an adjacency tensor $\mathcal{X}$ computes a complete model of a relational domain, where the state of a relationship $x_{ijk}$ depends on the matrix-vector product $a_i^T R_k a_j$. Here, a Gaussian likelihood model would imply a least-squares loss function, while a Bernoulli likelihood model would imply a logistic loss function [6,5]. To model attributes of entities efficiently, coupled tensor factorization can be employed [7,11], where simultaneously to

eq. (1) an attribute matrix $F \in \mathbb{R}^{n \times l}$ is factorized such that $F \approx AW$, where the latent factor $A$ is shared between the factorizations of $\mathcal{X}$ and $F$.

Rescal and other tensor factorizations feature a number of important properties that can be exploited for tasks like link prediction, entity resolution, or link-based clustering:

Efficient Inference. The latent variable structure of Rescal decouples inference such that global dependencies are captured during learning, whereas prediction relies only on a typically small number of latent variables. It can be seen from eq. (2) that a variable $x_{ijk}$ is conditionally independent from all other variables given the expression $a_i^T R_k a_j$. The computational complexity of these matrix-vector multiplications depends only on the dimensionality of the latent space $A$, which enables, for instance, fast query answering on knowledge bases. It is important to note that this locality of computation does not imply that the likelihood of a relationship is only influenced by local information. On the contrary, the conditional independence assumptions depicted in fig. 2 show that information is propagated globally when computing the factorization. Due to the colliders in fig. 2, the latent variables $(a_i, a_j, R_k)$ are not d-separated from other variables, such that they are possibly dependent on all remaining variables. Therefore, as the variable $x_{ijk}$ depends on its associated latent variables $\{a_i, a_j, R_k\}$, it depends indirectly on the state of any other variable, such that global dependencies between relationships can be captured. Similar arguments apply to tensor factorizations such as the Tucker decomposition and CP, which explains the strong relational learning results of Rescal and CP compared to state-of-the-art methods such as MLN or IRM [6,7,2,5].

Unique Representation. A distinctive feature of Rescal is the unique representation of entities via the latent space $A$.
Standard tensor factorization models such as CP and Tucker compute a bipartite model of relational data, meaning that entities have different latent representations as subjects or objects in a relationship. For instance, a Tucker-2 model would factorize the frontal slices of an adjacency tensor $\mathcal{X}$ as $X_k \approx A R_k B^T$, such that entities are represented as subjects via the latent factor $A$ and as objects via the latent factor $B$. However, relations are usually not bipartite, and in these cases a bipartite modeling would effectively break the flow of information from subjects to objects, as it does not account for the fact that the latent variables $a_i$ and $b_i$ refer to the identical entity. In contrast, Rescal uses a unique latent representation $a_i$ for each entity in the data set, which enables efficient information propagation via the dependency structure shown in fig. 2 and which has been demonstrated to be critical for capturing correlations over relational chains. For instance, consider the task of predicting the party membership of presidents of the United States of America. When the party membership of a president's vice president is known, this can be done with high accuracy, as both persons have usually been members of the same party, meaning that the formula $\mathrm{vicepresident}(x, y) \wedge \mathrm{party}(y, z) \Rightarrow \mathrm{party}(x, z)$ holds with high probability. For this and similar examples, it has been shown that bipartite models such as CP and Tucker fail to capture the necessary correlations, as, for instance, the object representation $b_y$ does not reflect that

person $y$ is in a relationship to party $z$ as a subject. Rescal, on the other hand, is able to propagate the required information, e.g. the party membership of $y$, via the unique latent representations of the involved entities [6,7].

Latent Representation. In relational data, the similarity of entities is determined by the similarity of their relationships, following the intuition that "if two objects are in the same relation to the same object, this is evidence that they may be the same object" [9]. This notion of similarity is reflected in Rescal via the latent space $A$. For the $i$-th entity, all possible occurrences as a subject are grouped in the slice $X_{i,:,:}$ of an adjacency tensor, while all possible occurrences as an object are grouped in the slice $X_{:,i,:}$ (see figs. 3 and 4). According to the Rescal model, these slices are computed by $\mathrm{vec}(X_{i,:,:}) \approx a_i^T R_{(1)} (I \otimes A)^T$ and $\mathrm{vec}(X_{:,i,:}) \approx a_i^T R_{(2)} (I \otimes A)^T$. Since the terms $R_{(1)} (I \otimes A)^T$ and $R_{(2)} (I \otimes A)^T$ are constant for different values of $i$, it is sufficient to consider only the similarity of $a_p$ and $a_q$ to determine the relational similarity of the entities $p$ and $q$. As this measure of similarity is based on the latent representations of entities, it is not only based on counting identical relationships of identical entities, but it also considers the similarity of the entities that are involved in a relationship. The previous intuition could therefore be restated as "if two objects are in similar relations to similar objects, this is evidence that they may be the same object." Latent representations of entities have been exploited very successfully for entity resolution and have also enabled large-scale hierarchical clustering on relational data [6,7]. Moreover, since the matrix $A$ is a vector space representation of the entities, non-relational machine learning algorithms such as k-means or kernel methods can be conveniently applied to any of these tasks.

Fig. 3. Incoming Links.
Fig. 4.
Outgoing Links.

High Scalability. The scalability of algorithms has become of utmost importance, as relational data is generated in unprecedented amounts and the size of knowledge bases grows rapidly. Rescal-ALS is a highly scalable algorithm to compute the Rescal model under a least-squares loss. It has been shown that it can efficiently exploit the sparsity of relational data as well as the structure of the factorization, such that it features linear runtime complexity with regard to the number of entities $n$, the number of relations $m$, and the number of known relationships $\mathrm{nnz}(\mathcal{X})$, while being cubic in the model complexity $r$. This property made it possible, for instance, to predict various high-level classes of entities in the Yago 2 ontology, which consists of over three million entities, over 80 relations or attributes, and over 33 million existing relationships, by computing low-rank factorizations of its adjacency tensor on a single desktop computer [7].
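To make the operations discussed in this section concrete, the following minimal numpy sketch (with hypothetical toy dimensions and randomly drawn factors, not a fitted model) illustrates the scoring function $x_{ijk} \approx a_i^T R_k a_j$, the slice-wise reconstruction used for link prediction, the cheap query answering noted under Efficient Inference, and the latent similarity of entities from Latent Representation:

```python
import numpy as np

n, m, r = 5, 2, 3                     # hypothetical sizes: entities, relations, rank
rng = np.random.default_rng(42)
A = rng.standard_normal((n, r))       # latent entity representations (row i is a_i)
R = rng.standard_normal((m, r, r))    # core tensor: one r x r frontal slice R_k per relation

def score(i, j, k):
    """Predicted confidence for relation_k(entity i, entity j): a_i^T R_k a_j."""
    return A[i] @ R[k] @ A[j]

# Link prediction: reconstructing a frontal slice X_k ~ A R_k A^T gives all
# n^2 scores of relation k at once.
X0_hat = A @ R[0] @ A.T

# Query answering for relation_0(i, ?): one r-dimensional matrix-vector
# product followed by n dot products -- cost depends on r, not on data size.
i = 1
object_scores = (A[i] @ R[0]) @ A.T

# Relational similarity of entities p and q reduces to comparing a_p and a_q.
def latent_similarity(p, q):
    return A[p] @ A[q] / (np.linalg.norm(A[p]) * np.linalg.norm(A[q]))
```

In a trained model, $A$ and $\mathcal{R}$ would of course come from minimizing the chosen loss over eq. (1) rather than from random draws as here.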

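The Rescal-ALS updates themselves are not spelled out above; the following sketch shows a simplified, unregularized alternating least-squares variant on dense synthetic data (the actual algorithm in [6,7] adds regularization and exploits sparsity to achieve the linear runtime just described). For fixed $A$, each core slice has the closed-form least-squares solution $R_k = A^+ X_k (A^+)^T$; the $A$-update uses normal equations that combine each entity's occurrences as subject ($X_k$) and as object ($X_k^T$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 8, 2, 3   # hypothetical toy sizes: entities, relations, latent rank

# Synthetic low-rank adjacency tensor so the factorization can be recovered.
A_true = rng.random((n, r))
R_true = rng.random((m, r, r))
X = np.stack([A_true @ R_true[k] @ A_true.T for k in range(m)])

A = rng.standard_normal((n, r))                     # random init of entity factors
Ap = np.linalg.pinv(A)
R = np.stack([Ap @ X[k] @ Ap.T for k in range(m)])  # initial core slices

for _ in range(100):
    # A-update: normal equations combining subject (X_k) and object (X_k^T) roles.
    num = sum(X[k] @ A @ R[k].T + X[k].T @ A @ R[k] for k in range(m))
    den = sum(R[k] @ A.T @ A @ R[k].T + R[k].T @ A.T @ A @ R[k] for k in range(m))
    A = num @ np.linalg.pinv(den)
    # R-update: closed-form least squares for fixed A, R_k = A^+ X_k (A^+)^T.
    Ap = np.linalg.pinv(A)
    R = np.stack([Ap @ X[k] @ Ap.T for k in range(m)])

X_hat = np.stack([A @ R[k] @ A.T for k in range(m)])
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

On sparse relational data, the same updates are computed without ever materializing dense slices, which is what yields the runtime linear in $n$, $m$, and $\mathrm{nnz}(\mathcal{X})$ and cubic in $r$.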
3 Conclusion and Outlook

Rescal has shown state-of-the-art relational learning results while scaling up to the size of complete knowledge bases. Due to its latent variable structure, Rescal does not require deep domain knowledge and can therefore be easily applied to most domains. Its latent representation of entities enables the application of non-relational algorithms to relational data for a wide range of tasks such as cluster analysis or entity resolution. Rescal is applicable if latent factors are suitable for capturing the essential information in a domain. In ongoing research, we explore situations where plain latent factor models are not a very efficient approach to relational learning and examine how to overcome the underlying causes of these situations.

References

1. Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: Proc. of the 25th Conference on Artificial Intelligence. SF, USA (2011)
2. Jenatton, R., Le Roux, N., Bordes, A., Obozinski, G.: A latent factor model for highly multi-relational data. In: Advances in Neural Information Processing Systems, vol. 25 (2012)
3. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51(3), 455-500 (2009)
4. Kolda, T.G., Bader, B.W., Kenny, J.P.: Higher-order web link analysis using multilinear algebra. In: Proc. of the Fifth International Conference on Data Mining, pp. 242-249 (2005)
5. Nickel, M., Tresp, V.: Logistic tensor-factorization for multi-relational data. In: ICML Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs. Atlanta, GA, USA (2013)
6. Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multi-relational data. In: Proc. of the 28th International Conference on Machine Learning, pp. 809-816. Bellevue, WA, USA (2011)
7.
Nickel, M., Tresp, V., Kriegel, H.P.: Factorizing YAGO: scalable machine learning for linked data. In: Proc. of the 21st International World Wide Web Conference. Lyon, France (2012)
8. Rendle, S., Freudenthaler, C., Schmidt-Thieme, L.: Factorizing personalized Markov chains for next-basket recommendation. In: Proc. of the 19th International Conference on World Wide Web, pp. 811-820 (2010)
9. Singla, P., Domingos, P.: Entity resolution with Markov logic. In: Proc. of the Sixth International Conference on Data Mining, pp. 572-582. Washington, DC, USA (2006)
10. Sutskever, I., Salakhutdinov, R., Tenenbaum, J.B.: Modelling relational data using Bayesian clustered tensor factorization. In: Advances in Neural Information Processing Systems, vol. 22 (2009)
11. Yılmaz, Y.K., Cemgil, A.T., Simsekli, U.: Generalised coupled tensor factorisation. In: Advances in Neural Information Processing Systems (2011)