Cell Phone based Activity Detection using Markov Logic Network

Somdeb Sarkhel
sxs104721@utdallas.edu

1 Introduction

Mobile devices are becoming increasingly sophisticated, and the latest generation of smart phones incorporates many diverse and powerful sensors, such as GPS sensors, light sensors, temperature sensors, direction sensors (i.e., magnetic compasses), and acceleration sensors (i.e., accelerometers). In this project we build a system that uses phone-based accelerometers, gyroscopes and magnetometers to perform activity recognition, a task which involves identifying the physical activity a user is performing. To address activity recognition as a sequential supervised learning problem, we represent the collected sensor data as a time series and use a Markov Logic Network to model this time series.

2 Motivation

Dealing with sequential data has become an important application area of machine learning. Such data are frequently found in speech recognition, activity recognition, information extraction, etc. One of the main problems in this area is assigning labels to sequences of objects. This class of problems has been called sequential supervised learning. Probabilistic graphical models such as Hidden Markov Models (HMM) or their generalization, Dynamic Bayesian Networks (DBN, [Murphy, 2002]), have been quite successful in modeling sequential phenomena. However, the main weaknesses of these models are:

- An HMM (as well as a DBN) is a generative model. The goal of a generative graphical model is to model the joint probability distribution $p(X, Y)$, where $X$ is the set of observed features and $Y$ is the label. If there are many observed features, all combinations of the observed features must be enumerated in order to compute the joint distribution. This is generally intractable.
- It is hard to change the model of a DBN or HMM, and dependencies among the input data are hard to specify.
For these reasons we propose to use Markov Logic Networks (MLN) for modelling. An MLN can be seen as a template for generating an undirected graphical model (a Markov network). MLNs are easy to specify because the structure of the graphical model is written in first-order logic. An MLN also gives us the flexibility to use it either as a discriminative model or as a generative model.

3 Background on Markov Logic Network

Markov Logic Networks (MLN, [Richardson and Domingos, 2006]) are one type of unrolled graphical model developed in Statistical Relational Learning (SRL, [Getoor and Taskar, 2007]) to combine logical and probabilistic reasoning. An MLN $L$ is a set of pairs $(F_i, w_i)$, where $F_i$ is a formula in first-order logic and $w_i$ is a real number (called the weight of the formula $F_i$). Every instantiation of $F_i$ is given the same weight. Together with a finite set of constants $C = \{c_1, c_2, \ldots, c_{|C|}\}$, it defines a Markov network $M_{L,C}$ as follows:

- $M_{L,C}$ contains one binary node for each possible grounding of each predicate appearing in $L$. The value of the node is 1 if the ground predicate is true, and 0 otherwise.
- $M_{L,C}$ contains one feature for each possible grounding of each formula $F_i$ in $L$. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the $w_i$ associated with $F_i$ in $L$.

Thus the first-order logic formulae in our knowledge base serve as templates for constructing the Markov network. This network models the joint distribution of the set of all ground atoms, $X$, each of which is a binary variable, and it provides a means for performing probabilistic inference. The probability distribution over possible worlds $x$ specified by the ground Markov network $M_{L,C}$ is given by

$$P(X = x) = \frac{1}{Z} \exp\Big(\sum_i w_i \, n_i(x)\Big) = \frac{1}{Z} \prod_i \phi_i(x_{\{i\}})^{n_i(x)}$$

where $n_i(x)$ is the number of true groundings of $F_i$ in $x$, $x_{\{i\}}$ is the state (truth values) of the predicates appearing in $F_i$, $\phi_i(x_{\{i\}}) = e^{w_i}$, and $Z$ is the normalizing factor, $Z = \sum_{x \in \mathcal{X}} \exp\big(\sum_i w_i \, n_i(x)\big)$.

However, classic Markov logic deals with discrete features, whereas the current project requires continuous features. We therefore use a Hybrid Markov Logic Network, which is an extension of MLNs to numeric domains.
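The distribution over possible worlds defined for a classic MLN can be illustrated by brute-force enumeration on a very small example. The sketch below is not part of the project: the formula Smokes(A) => Cancer(A), its weight 1.5, and all function names are hypothetical choices made only to show how $n_i(x)$ and $Z$ enter the formula.

```python
import itertools
import math

# Hypothetical toy MLN (an assumption for illustration): one formula
# Smokes(A) => Cancer(A) with weight 1.5 over a single constant A,
# which yields two ground atoms.
ATOMS = ["Smokes(A)", "Cancer(A)"]
WEIGHT = 1.5


def n_true_groundings(world):
    """Number of true groundings of Smokes(A) => Cancer(A) in a world (0 or 1)."""
    smokes, cancer = world
    return 1 if (not smokes) or cancer else 0


def world_distribution(weight=WEIGHT):
    """Return {world: P(X = world)} by enumerating every possible world."""
    worlds = list(itertools.product([False, True], repeat=len(ATOMS)))
    unnormalized = {w: math.exp(weight * n_true_groundings(w)) for w in worlds}
    z = sum(unnormalized.values())  # normalizing factor Z
    return {w: u / z for w, u in unnormalized.items()}


if __name__ == "__main__":
    for world, p in world_distribution().items():
        print(dict(zip(ATOMS, world)), round(p, 4))
```

Only the world in which Smokes(A) is true and Cancer(A) is false violates the formula, so it is the least probable world, but it keeps a non-zero probability. Enumeration is exponential in the number of ground atoms and is usable only for toy cases like this one.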
The Hybrid Markov Logic Network is defined as follows. A hybrid Markov logic network (HMLN) $L$ is a set of pairs $(F_i, w_i)$, where $F_i$ is a formula or a numeric term, and $w_i$ is a real number. Together with a finite set of constants $C = \{c_1, c_2, \ldots, c_{|C|}\}$, it defines a Markov network $M_{L,C}$ as follows:

- $M_{L,C}$ contains one node for each possible grounding, with constants in $C$, of each predicate or numeric property appearing in $L$. The value of a predicate node is 1 if the ground predicate is true, and 0 otherwise. The value of a numeric node is the value of the corresponding ground term.
- $M_{L,C}$ contains one feature for each possible grounding, with constants in $C$, of each formula or numeric term $F_i$ in $L$. The value of a numeric feature is the value of the corresponding ground term. The weight of the feature is the $w_i$ associated with $F_i$ in $L$.

HMLNs also allow a few extensions of first-order syntax, the most important of which is soft equality. This is written as $(\alpha = \beta)$ for numeric terms and is shorthand for $-(\alpha - \beta)^2$, where $\alpha$ and $\beta$ are arbitrary numeric terms. This makes it possible to state numeric constraints as equations, with an implied Gaussian penalty for diverging from them. If the weight of a formula is $w$, the standard deviation of the Gaussian is $\sigma = 1/\sqrt{2w}$. A numeric domain can now be modeled simply by writing down the equations that describe it.
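The relation between the weight of a soft equality and the standard deviation of the implied Gaussian follows from matching exponents. The short derivation below uses only the definitions given above; the numeric value $w = 2$ is an arbitrary illustration.

```latex
% A soft equality (\alpha = \beta) with weight w contributes the factor
\exp\big(-w(\alpha - \beta)^2\big).
% A Gaussian in \alpha with mean \beta and standard deviation \sigma is
% proportional to
\exp\Big(-\frac{(\alpha - \beta)^2}{2\sigma^2}\Big).
% Matching the two exponents gives
w = \frac{1}{2\sigma^2}
\quad\Longleftrightarrow\quad
\sigma = \frac{1}{\sqrt{2w}}.
% Example: w = 2 gives \sigma = 1/\sqrt{4} = 0.5.
```

A larger weight therefore corresponds to a narrower Gaussian, i.e. a stricter penalty for violating the equation.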

4 Challenges

Several challenges were faced when modelling the collected cell phone sensor data as a sequential graphical model:

- The gathered data was not sequential. The data was collected from a classification point of view, so it contains no transitions from one activity to another. This effectively rules out modelling the data as a sequential graphical model (such as an HMM).
- Creating a graphical model with continuous variables is an open research area. In most cases a distribution over the continuous features is assumed, and like any assumption it may or may not hold.
- Alchemy is a work in progress (at least for continuous features), and there are a few problems with it in terms of system specification. For example, only soft equality is implemented in Alchemy, whereas soft inequality is not implemented (though it is mentioned in the documentation).
- Inference in graphical models (even approximate inference) is intractable in general. As a result, inference and learning for the graphical model take a long time.

5 Implementation

5.1 Feature Extraction

To classify activity from the time series data, the time component was first removed using a sliding window technique. The window size is 2 seconds, and two consecutive windows overlap by 1 second. A few basic features are then extracted from each window. These features are mostly simple summary statistics. The extracted features are as follows:

- Mean: The average of the values in the window (in statistical terms, $E[X]$).
- Variance: The variance of the values in the window (in statistical terms, $E[X^2] - (E[X])^2$).
- k-th moment: The third, fourth, fifth and sixth moments are used as features. (The k-th moment of a random variable is defined as $E[X^k]$.)
- k-th central moment: The third, fourth, fifth and sixth central moments are used as features. (The k-th central moment of a random variable is defined as $E[(X - \mu_X)^k]$, where $\mu_X = E[X]$.)
- Amplitude: The difference between the highest and lowest values in the window.

These features are extracted from the accelerometer values on the three axes. The gyroscope values are not used for feature extraction, as they had a very weak correlation with activity. Among these features the most important is variance: it received the highest weight, and on its own it can classify 60% of the instances.
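The windowing and feature definitions of Section 5.1 can be sketched for a single accelerometer axis as follows. This is a minimal illustration and not the project's code: the function names and the `rate_hz` parameter are hypothetical, and the sampling rate of the sensor data is not stated in this report, so it must be supplied by the caller.

```python
def sliding_windows(samples, rate_hz, window_s=2.0, overlap_s=1.0):
    """Yield windows of window_s seconds that overlap by overlap_s seconds.

    rate_hz is an assumed, caller-supplied sampling rate in samples/second.
    """
    size = int(round(window_s * rate_hz))
    step = int(round((window_s - overlap_s) * rate_hz))
    if size <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def raw_moment(values, k):
    """Empirical k-th moment E[X^k]."""
    return sum(v ** k for v in values) / len(values)


def central_moment(values, k):
    """Empirical k-th central moment E[(X - mu_X)^k]."""
    mu = raw_moment(values, 1)
    return sum((v - mu) ** k for v in values) / len(values)


def window_features(values):
    """Compute the features listed in Section 5.1 for one window of one axis."""
    mean = raw_moment(values, 1)
    feats = {
        "mean": mean,
        "variance": raw_moment(values, 2) - mean ** 2,  # E[X^2] - (E[X])^2
        "amplitude": max(values) - min(values),
    }
    for k in range(3, 7):  # third to sixth moments
        feats[f"moment_{k}"] = raw_moment(values, k)
        feats[f"central_moment_{k}"] = central_moment(values, k)
    return feats


if __name__ == "__main__":
    axis = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    for window in sliding_windows(axis, rate_hz=2):
        print(window, window_features(window)["variance"])
```

In practice the same function would be applied to each of the three accelerometer axes and the resulting feature dictionaries concatenated into one feature vector per window.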

5.2 Validation

For a classification problem, the method of validation is of the utmost importance. Among the different validation methods available, I have chosen Leave One Person Out (LOPO). The data was collected from ten different people, so the classifier is trained on the data of nine people and tested on the data of the tenth. I could instead have performed 10-fold cross validation, but LOPO was chosen because it gives a better estimate of how well the classifier generalizes to a person it has not seen.

5.3 Technique

Using Alchemy, the system has been modeled as a Gaussian Naïve Bayes classifier, with Activity as the class variable and accelerometer variance and amplitude as the observed variables. As all the features are continuous, a Hybrid Markov Logic Network is used to model the system. The model parameters are set by hand from the mean and standard deviation computed from the data. As we have assumed a Normal distribution for the features, and as all the features are assumed to be conditionally independent, we computed the weight of each formula as $w = 1/(2\sigma^2)$. Also, as we have a different number of examples for each activity (and most of the examples were walking), the prior over Activity is assumed to be uniform (otherwise there is a strong bias towards classifying every instance as walking). For comparison, the system is also modelled using Logistic Regression and a linear SVM.

6 Results

The MLN was trained and tested on the left and right pocket data sets only. The accuracy obtained with the MLN is 65.82%. The confusion matrix is as follows:

                                Predicted Class
             ClimbUp  Running  ClimbDown  Jogging  Walking  Still
ClimbUp           57        1          9        3        4      0
Running            9       60         11        2        8      0
ClimbDown          5        0         15       15        9      0
Jogging            6        1         54        9       26      0
Walking            4        0         25       12       51      2
Still              1        0          0        0        0     75

Table 1: Confusion Matrix, using MLN
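The classifier described in Section 5.3 can be approximated outside Alchemy by a Gaussian Naive Bayes scorer with uniform class priors, in which each feature contributes a soft-equality term $-w(x - \mu)^2$ with $w = 1/(2\sigma^2)$. The sketch below is an assumption-laden illustration, not the Alchemy model used in the project: all function names are hypothetical, the $-\log\sigma$ normalization term is included so that the score is a proper Gaussian log-likelihood, and the small floor on $\sigma$ is an added safeguard against zero variance.

```python
import math

MIN_SIGMA = 1e-6  # assumed floor to avoid division by zero for constant features


def fit_class_stats(rows, labels):
    """Return {label: [(mean, sigma), ...]} with one pair per feature."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    stats = {}
    for label, members in grouped.items():
        per_feature = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            per_feature.append((mu, max(math.sqrt(var), MIN_SIGMA)))
        stats[label] = per_feature
    return stats


def class_score(row, per_feature):
    """Log-likelihood of row under one class, up to an additive constant."""
    score = 0.0
    for x, (mu, sigma) in zip(row, per_feature):
        w = 1.0 / (2.0 * sigma ** 2)      # weight of the soft equality x = mu
        score += -w * (x - mu) ** 2       # Gaussian penalty term
        score += -math.log(sigma)         # normalization term of the Gaussian
    return score


def classify(row, stats):
    """Pick the class with the highest score; priors are uniform, so omitted."""
    return max(stats, key=lambda label: class_score(row, stats[label]))


if __name__ == "__main__":
    # Made-up (variance, amplitude) feature rows, for illustration only.
    rows = [[0.10, 0.20], [0.20, 0.30], [0.15, 0.25],
            [5.0, 6.0], [5.5, 6.5], [6.0, 7.0]]
    labels = ["Still"] * 3 + ["Running"] * 3
    model = fit_class_stats(rows, labels)
    print(classify([0.12, 0.22], model), classify([5.4, 6.4], model))
```

Because the priors are uniform, the number of training windows per activity does not influence the decision, which mirrors the choice made above to counter the dominance of walking examples.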

For Logistic Regression, the classifier was trained and tested on data from all three positions, using LOPO validation. The accuracy obtained with Logistic Regression is 74.35%. The confusion matrix is as follows:

    a     b     c     d     e     f    classified as
  675    43     0     0     2    11    a = Still
   32  5840    24   158   126   217    b = Walking
    1   105   764   225    23    44    c = Running
   10   548   133  1384    76    59    d = Jogging
   17   452    15   167   414   103    e = ClimbDown
    4   551    71    73    68   656    f = ClimbUp

Table 2: Confusion Matrix, using Logistic Regression

For the linear SVM, the classifier was trained and tested on the left and right pocket data sets only, and the validation method was 10-fold cross validation. The accuracy obtained with the SVM is 77.84%. The confusion matrix is as follows:

    a     b     c     d     e     f    classified as
  709     5     0     0     1     0    a = Still
   10  4322    21   104    69   101    b = Walking
    0   132   497   102    13    31    c = Running
    6   655    69   655    47    41    d = Jogging
   11   300     2    76   486    32    e = ClimbDown
    1   252     6    13    26   799    f = ClimbUp

Table 3: Confusion Matrix, using linear SVM

References

[Richardson and Domingos, 2006] Richardson, M., Domingos, P. Markov Logic Networks. Machine Learning, 2006.

[Getoor and Taskar, 2007] Getoor, L., Taskar, B. Introduction to Statistical Relational Learning. MIT Press, 2007.

[Murphy, 2002] Murphy, K.P. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, 2002.